Plotting in R

LMS Bioinformatics

April 2022

Introduction

Visualisation of large data sets has become an essential task in molecular biology and medicine. While there are many software solutions which can be used for visualisation of data starting, for instance, with Excel or LibraOffice Calc, the R platform has become a de facto standard for visualisation in genomics. Rstudio is an interface to R which facilitates the writing and execution of R code. We will use it in our tutorial.

The course is intended to be self-contained, but if you are new to R, we recommend that you go through our previous introduction to R:

[[https://lmsbioinformatics.github.io/LMS_Reproducible-R/]]

Materials

Set the Working directory

Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.

Set working directory - in Rstudio

You can navigate to the unarchived LMS_PlottingInR/course folder in the Rstudio menu

Session -> Set Working Directory -> Choose Directory

path

Set working directory - in the console

Use getwd() to see where your current directory is

setwd("/PathToMyDownload/LMS_PlottingInR/course")
# e.g. setwd("~/Downloads/LMS_PlottingInR/course")

Motivation

Typical results

a typical table

Data visualization

This task will be much easier when we visualise the data and highlight the significant genes by the colour red:

Data visualization: A view of every Points of View column

DataVisualCol

http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html

Design of data figures

Salience

Salience

Salience

Visal conjunctions

Visal conjunctions

Color blindness

CB

https://pubmed.ncbi.nlm.nih.gov/33116149/

A short intro to the most important data structures in R

For our purpose, we need to know two data structures in R:

Data frame

A data frame is a table in which each column contains values of one variable and each row contains one set of values from each column. Both row and column names should be unique.

# reading in table 
df <- read.csv("data/DiffEx.csv")
# showing first 6 rows of the data frame
head(df)
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145     21316.8751      2945.51899       Scg2
## 2 4.03e-120       626.6633        29.26926      Gfra1
## 3 1.94e-109      3080.2873       921.18292       Dlg1
## 4  1.46e-96       472.8374      2126.62645      Mex3a
## 5  4.51e-96       820.3147        71.59931       Rin1
## 6  5.58e-96      4687.6749      1193.30291      Ildr2
# different options to access the data 
head(df$baseMean)
## [1] 10893.6671   296.2968  1967.0850  1359.6740   414.7163  2986.8342
head(df[,"baseMean"])
## [1] 10893.6671   296.2968  1967.0850  1359.6740   414.7163  2986.8342
head(df[,2])
## [1] 10893.6671   296.2968  1967.0850  1359.6740   414.7163  2986.8342
# define the DE genes using logical vector 
diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
# diffExpressed is a vector with TRUE or FALSE values
head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
class(diffExpressed)
## [1] "logical"
# make sure the length of the logical vector is consistent with the original dataset
## check the dimensions of the original dataset
dim(df)
## [1] 18357    10
## check the length of the logical vector
length(diffExpressed)
## [1] 18357
# use table function to see TRUE & FALSE
table(diffExpressed)
## diffExpressed
## FALSE  TRUE 
## 17120  1237
# subset the original df using diffExpressed vector
DE_res <- df[diffExpressed,]

# check the dimension of the new data frame
dim(DE_res)
## [1] 1237   10
# showing first 6 rows of the data frame
head(DE_res)
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145     21316.8751      2945.51899       Scg2
## 2 4.03e-120       626.6633        29.26926      Gfra1
## 3 1.94e-109      3080.2873       921.18292       Dlg1
## 4  1.46e-96       472.8374      2126.62645      Mex3a
## 5  4.51e-96       820.3147        71.59931       Rin1
## 6  5.58e-96      4687.6749      1193.30291      Ildr2
# different options to access the data 
str(df)
## 'data.frame':    18357 obs. of  10 variables:
##  $ ensembl_id     : chr  "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770" "ENSMUSG00000074480" ...
##  $ baseMean       : num  10894 296 1967 1360 415 ...
##  $ log2FoldChange : num  2.86 4.42 1.74 -2.17 3.52 ...
##  $ lfcSE          : num  0.109 0.187 0.077 0.102 0.166 ...
##  $ stat           : num  26.1 23.7 22.6 -21.3 21.2 ...
##  $ pvalue         : num  6.09e-150 4.39e-124 3.18e-113 3.17e-100 1.23e-99 ...
##  $ padj           : num  1.12e-145 4.03e-120 1.94e-109 1.46e-96 4.51e-96 ...
##  $ baseMean_DOX7D : num  21317 627 3080 473 820 ...
##  $ baseMean_DOX24H: num  2945.5 29.3 921.2 2126.6 71.6 ...
##  $ mgi_symbol     : chr  "Scg2" "Gfra1" "Dlg1" "Mex3a" ...

Vector

A vector is a sequence of data elements of the same type. We use it often in the definiton of colors or genes to be displayed. A vector can be constructed using the c() function.

colo <- c("blue","red","green")

colo
## [1] "blue"  "red"   "green"

Two philosophies of plotting in R

In our tutorial, we use two sources of plotting capabilities. The first one (base graphics) might resemble a printer, where you put in your data, make some choices, press the button and voila you have the picture. The second one (ggplot2) is more like an oil painting, it is layered and needs some setting up, but it is more easily modifiable (although it might be more laborious to get to the full painting in the first place).

Two philosophies for plotting in R

Disclaimer: The distinction between the two plotting philosophies is not that strict, we can still do some limited modification after initial plotting also with base graphics using auxiliary functions.

base graphics

This functionality for plotting is easy to use, but becomes difficult or impossible to draw complicated plots.

For using base graphics, we start with the plot() function and then could add features using auxilary functions such as

usage

This scatterplot is an example of the use of the plot() function. Basically, it takes the x and y arguments for the x and y coordinates of the plot.

# plotting 
plot(x=log2(df$baseMean_DOX24H), y=log2(df$baseMean_DOX7D))

Additional arguments provide options for adjusting the plot such as xlab, ylab, pch and main). These are explained using help(plot) or ?plot.

# help(plot) or ?plot
?plot
plot.default {graphics} R Documentation
The Default Scatterplot Function
Description
Draw a scatter plot with decorations such as axes and titles in the active graphics window.

Usage
## Default S3 method:
plot(x, y = NULL, type = "p",  xlim = NULL, ylim = NULL,
     log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
     ann = par("ann"), axes = TRUE, frame.plot = axes,
     panel.first = NULL, panel.last = NULL, asp = NA,
     xgap.axis = NA, ygap.axis = NA,
     ...)
        "p" for points,
        "l" for lines,
        "b" for both,
        "c" for the lines part alone of "b",
        "o" for both ‘overplotted’,
        "h" for ‘histogram’ like (or ‘high-density’) vertical lines,
        "s" for stair steps,
        "S" for other steps, see ‘Details’ below,
        "n" for no plotting.

In the following, we also define the differentially expressed genes and highlight them by red color using points() to add them to our origin plots.

# plotting 
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
     xlab="log2 expression DOX24H", 
     ylab="log2 expression DOX7D", pch=20, 
     main="Comparison of expression at 24h and 7 days")

diffExpressed <- (!is.na(df$padj) & df$padj < 0.01 & abs(df$log2FoldChange) > 1)
head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
length(diffExpressed)
## [1] 18357
table(diffExpressed)
## diffExpressed
## FALSE  TRUE 
## 17120  1237
points(log2(df$baseMean_DOX24H[diffExpressed]),log2(df$baseMean_DOX7D[diffExpressed]),
       col="red",pch=20)

ggplot2 package

A very versatile package which has become a de facto standard for producing high quality figures is an R package based on the “grammar of graphics”. This enables developing plots by layering graphical elements on top of each other.

“In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).” - Wickham, 2016

Every ggplot2 plot has three key components:

  1. data provided as data frame,

  2. aesthetic mappings between variables in the data and visual properties, and

  3. layer: usually created with a geom function.

usage

Start with the ggplot function which initializes a ggplot object. Here you need to declare the input data frame and to specify the set of plot aesthetics. The actual plotting will be done by applying another layer (with the “+” sign) using various specific geom functions. For a scatter plot like the one above, we will use the geom_point() function.

# install.packages("ggplot2")
library("ggplot2")

# (1) ggplot function with initializes a ggplot object
ggplot(df)

# (2) specify the set of plot aesthetics
ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D)))

# (3) specify plot type using geom_point()
ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D))) +
  geom_point()

Note that we do not need to indicate that the x and y are specific columns of the dataframe df e.g. by using the $ operator as we already defined in the ggplot function that df should be used.

Now we add more layers to get a similar plot as we got using base graphics.

ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D))) +
  geom_point() + 
  ggtitle("Comparison of expression at 24h and 7 days")+
  xlab("log2 expression DOX24H") +
  ylab("log2 expression DOX7D")

This looks promising but we would like to highlight the differentially expressed genes. Since ggplot works with the input data frame, we need to include this information into the data frame df. Then, we add color to the aesthetic mappings and set it the added column. geom_point() will select its color for the different values of the columns, but if we do not like it, we can overwrite or more exactly put another layer on it (like on an oil painting.)

# check the dimension for object "df"
dim(df)
## [1] 18357    10
diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)

# diffExpressed is a vector with TRUE or FALSE values
head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
length(diffExpressed)
## [1] 18357
df2 <- data.frame(df, diffEx=diffExpressed)

head(df2)
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol diffEx
## 1 1.12e-145     21316.8751      2945.51899       Scg2   TRUE
## 2 4.03e-120       626.6633        29.26926      Gfra1   TRUE
## 3 1.94e-109      3080.2873       921.18292       Dlg1   TRUE
## 4  1.46e-96       472.8374      2126.62645      Mex3a   TRUE
## 5  4.51e-96       820.3147        71.59931       Rin1   TRUE
## 6  5.58e-96      4687.6749      1193.30291      Ildr2   TRUE
table(df2$diffEx)
## 
## FALSE  TRUE 
## 17120  1237
g <- ggplot(df2, aes(x=log2(baseMean_DOX24H), 
                     y=log2(baseMean_DOX7D),color=diffEx)) +
  geom_point() + 
  ggtitle("Comparison of expression at 24h and 7 days")+
  xlab("log2 expression DOX24H") +
  ylab("log2 expression DOX7D")
g

# less fancy colors please  
g + scale_color_manual(values=c('black','red')) + 
  labs(color = "Diff. exp.")

# more classic look please  
g + scale_color_manual(values=c('black','red')) + 
  labs(color = "Diff. exp.") + 
  theme_classic()

Note that we assigned first the ggplot object to the variable g and then plot by entering it again. The nice thing here is that we can put further layers on the plot without calling the ggplot() and other functions again. This is convenient if we need to generate various different version of one figure.

Exercise I

Solutions I

Our case study

In 1933, the Dutch pediatrician Cornelia De Lange described a disorder in two children, which became known as Cornelia de Lange syndrome (CdLS). Its features can include short stature, intellectual disability, developmental delay and signs of autism. Over the years, research showed that Cornelia de Lange syndrome is a genetic disorder and that it is frequently linked to mutations in components of cohesin complex. This complex is fundamental for obtaining higher order structures of DNA within the nucleus.

Classical facial dysmorphisms of CdLS and related overlapping syndromes
Classical facial dysmorphisms of CdLS and related overlapping syndromes

Clinical Genetics, Volume: 97, Issue: 1, Pages: 3-11, First published: 13 November 2019, DOI: (10.1111/cge.13674)

How cohesin mutations cause Cornelia de Lange syndrome is not well understood. To elucidate the potential role of cohesin for Cornelia de Lange syndrom, a group at the LMS applied several genomic approaches including RNA-seq technology.

Abstract of our case study https://doi.org/10.1038/s41467-021-23141-9
Weiss et al., Nat Commun. 2021

After comparing expression in cortical neurons from post-mortem brains of Cornelia de Lange syndrome patients with controls, the researchers showed in a mouse model that doxycycline (dox) inducible depletion of one cohesin component (RAD21) leads to similar changes of expression including the downregulation of genes with neuronal and synaptic functions. The researchers asked then the crucial question if the normal expression could be restored when the depletion of RAD21 is lifted. A positive answer might give us hope that correction of cohesin functions could provide a potential therapeutic approach for Cornelia de Lange patients.

The figure below shows one of the experiments in this study. Rad21-TEV neurons were grown in culture, for 10 days. On day 10, cells were treated for 6 hours with either 100ng/ml of doxycycline, or vehicle. After 6 hours, the samples were washed and fresh media was added. Samples for cohesin depletion (24H) were collected 18 hours later, and samples for rescue (7D) were collected 7 days after addition of doxycyline/vehicle.

Experimental design

The file "data/DiffEx.csv" contains the expression after RAD21 depletion at 24 hours (baseMean_DOX24H) and RAD21 restoration at day 7 (baseMean_DOX7D). It also contains log2 fold changes and adjusted p-values (padj). "data/GO_UpDE.csv" contains gene ontology analysis results. "data/DoxExpression.txt" contains normalised gene counts information. To gain insights into these data and to present them to the research community, we need to turn the table into some informative plots. In the following, we would like to complete following tasks:

Our tasks:

  1. Display the distribution of log2 fold changes

  2. Visualise the expression and changes in expression

  3. Plot the functional categories enriched in significantly differentially expressed genes

Let’s get started.

Displaying distributions - Base graphics

Histograms - Base graphics

hist(df$log2FoldChange,breaks=100,xlab = "Log2 fold changes",
     main="Histogram of log2 fold changes",col="red")

Density plots - Base graphics

d <- density(df$log2FoldChange)
plot(d,xlab="Log2 fold changes",
     main="Distribution of log2 fold changes")
polygon(d,col="red")

log2_DOX24H <- log2(df$baseMean_DOX24H)
log2_DOX7D <- log2(df$baseMean_DOX7D)

d1 <- density(log2_DOX7D)
d2 <- density(log2_DOX24H)

plot(d1, lwd=2, xlab="Log2 baseMean",
     main="Distribution of log2 fold changes")
lines(d2,col=rgb(0,0,1),lwd=2)
polygon(d1,col=rgb(1,0,0,0.2))
polygon(d2,col=rgb(0,0,1,0.2))
legend(x=18,y=0.08,legend=c("DOX24H","DOX7D"),
       fill=c(rgb(1,0,0,0.2),rgb(0,0,1,0.2)))

Displaying distributions - ggplot2

Histograms - ggplot2

# install.package(ggplot2)
library(ggplot2)

# basic histogram
ggplot(df, aes(x=log2FoldChange)) + 
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# change the number of bins and add x and y lab
ggplot(df, aes(x=log2FoldChange)) + 
  geom_histogram(bins=100) + 
  xlab("Log2 fold changes") + ylab("Frequency")

# change colours 
ggplot(df, aes(x=log2FoldChange)) + 
  geom_histogram(bins=100, col="black",fill="red") + 
  xlab("Log2 fold changes") + ylab("Frequency")

# use "classic background" and add title
ggplot(df, aes(x=log2FoldChange)) + 
  geom_histogram(bins=100, col="black",fill="red") + 
  xlab("Log2 fold changes") + ylab("Frequency") +
  theme_classic() +
  ggtitle("Histogram of log2 fold changes")

Density plots - ggplot2

# basic density plot
ggplot(df, aes(x=log2FoldChange)) + 
  geom_density()

# change colour and use "theme_bw"
ggplot(df, aes(x=log2FoldChange)) + 
  geom_density(fill="red") + 
  theme_bw()

# subset the data.frame and only keep the information of interest
df_subset <- df[,colnames(df) %in% 
                  c("ensembl_id","baseMean_DOX7D","baseMean_DOX24H")]
dim(df_subset)
## [1] 18357     3
head(df_subset)
##           ensembl_id baseMean_DOX7D baseMean_DOX24H
## 1 ENSMUSG00000050711     21316.8751      2945.51899
## 2 ENSMUSG00000025089       626.6633        29.26926
## 3 ENSMUSG00000022770      3080.2873       921.18292
## 4 ENSMUSG00000074480       472.8374      2126.62645
## 5 ENSMUSG00000024883       820.3147        71.59931
## 6 ENSMUSG00000040612      4687.6749      1193.30291
# convert df_subset into a "long" format
# install.packages("reshape2")
library("reshape2")
df_subset_long <- melt(df_subset,id="ensembl_id")
dim(df_subset_long)
## [1] 36714     3
str(df_subset_long)
## 'data.frame':    36714 obs. of  3 variables:
##  $ ensembl_id: chr  "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770" "ENSMUSG00000074480" ...
##  $ variable  : Factor w/ 2 levels "baseMean_DOX7D",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ value     : num  21317 627 3080 473 820 ...
head(df_subset_long)
##           ensembl_id       variable      value
## 1 ENSMUSG00000050711 baseMean_DOX7D 21316.8751
## 2 ENSMUSG00000025089 baseMean_DOX7D   626.6633
## 3 ENSMUSG00000022770 baseMean_DOX7D  3080.2873
## 4 ENSMUSG00000074480 baseMean_DOX7D   472.8374
## 5 ENSMUSG00000024883 baseMean_DOX7D   820.3147
## 6 ENSMUSG00000040612 baseMean_DOX7D  4687.6749
# now it is ready for ggplot2
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) + 
  geom_density() 

# setup the alpha value to make the colours transparent
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) + 
  geom_density(alpha=0.5) 

# use the "theme_classic" and add xlab
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) + 
  geom_density(alpha=0.5) + theme_classic() + xlab("Log2 baseMean")

ggplot(df_subset_long, aes(x=variable,y=log2(value), fill=variable)) + 
  geom_violin(alpha=0.5) + theme_classic() + 
  ylab("Log2 baseMean") + xlab("Samples")

# add geom_boxplot() layer directly
ggplot(df_subset_long, aes(x=variable,y=log2(value),fill=variable)) + 
  geom_violin(alpha=0.5) + 
  geom_boxplot() + 
  theme_classic() + 
  ylab("Log2 baseMean") + xlab("Samples")

# what to do if we want violin plot filled with 2 different colours and 
# box plot filled with "white" colour
ggplot(df_subset_long, aes(x=variable,y=log2(value))) + 
  geom_violin(aes(fill=variable),alpha=0.5) + 
  geom_boxplot(width=0.2) + 
  theme_classic() + 
  ylab("Log2 baseMean") + xlab("Samples")

Exercise II

Regular Expression

Regular Expression

Solutions II

Visualising expression and changes in expression

Scatterplots - base graphics

We already did this before. Now we want to highlight some genes and put their gene symbol into the plot. For the start, we will highlight the 5 most significant genes.

# plotting 
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
     xlab="log2 expression DOX24H", 
     ylab="log2 expression DOX7D",
     pch=20, main="Comparison of expression at 24h and 7 days",
     col="gray")

diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)

points(log2(df$baseMean_DOX24H[diffExpressed]),
       log2(df$baseMean_DOX7D[diffExpressed]),
       col="red",pch=20)

o <- order(df$padj)[1:5]
df[o,]
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145     21316.8751      2945.51899       Scg2
## 2 4.03e-120       626.6633        29.26926      Gfra1
## 3 1.94e-109      3080.2873       921.18292       Dlg1
## 4  1.46e-96       472.8374      2126.62645      Mex3a
## 5  4.51e-96       820.3147        71.59931       Rin1
points(log2(df$baseMean_DOX24H[o]),
       log2(df$baseMean_DOX7D[o]),
       col="black",pch=21,cex=1.5,lwd=2)

text(log2(df$baseMean_DOX24H[o]),
     log2(df$baseMean_DOX7D[o]),
     labels=df$mgi_symbol[o], 
     pos = 3,offset=0.5,font=2)

This looks good. Now let’s label the top 20 genes.

# plotting 
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
     xlab="log2 expression DOX24H", ylab="log2 expression DOX7D",pch=20, 
     main="Comparison of expression at 24h and 7 days",col="gray")

diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)

points(log2(df$baseMean_DOX24H[diffExpressed]),
       log2(df$baseMean_DOX7D[diffExpressed]),
       col="red",pch=20)

o <- order(df$padj)[1:20]

points(log2(df$baseMean_DOX24H[o]),
       log2(df$baseMean_DOX7D[o]),
       col="black",pch=21,cex=1.5,lwd=2)

text(log2(df$baseMean_DOX24H[o]),
     log2(df$baseMean_DOX7D[o]),
     labels=df$mgi_symbol[o], pos = 3,offset=0.5,font=2)

Oh, no.

ggplot2 and ggrepel

# install.packages("ggrepel")
library(ggrepel)

diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
# diffExpressed is a vector with TRUE or FALSE values

df4ggplot <- data.frame(df, diffEx=diffExpressed)

head(df4ggplot)
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol diffEx
## 1 1.12e-145     21316.8751      2945.51899       Scg2   TRUE
## 2 4.03e-120       626.6633        29.26926      Gfra1   TRUE
## 3 1.94e-109      3080.2873       921.18292       Dlg1   TRUE
## 4  1.46e-96       472.8374      2126.62645      Mex3a   TRUE
## 5  4.51e-96       820.3147        71.59931       Rin1   TRUE
## 6  5.58e-96      4687.6749      1193.30291      Ildr2   TRUE
table(df4ggplot$diffEx)
## 
## FALSE  TRUE 
## 17120  1237
df4ggplot <- df4ggplot[order(df4ggplot$padj),]

g_top20 <- ggplot(df4ggplot, aes(x=log2(baseMean_DOX24H), 
                     y=log2(baseMean_DOX7D),color=diffEx)) +
  geom_point() + 
  scale_color_manual(values=c('black','red')) + 
  labs(color = "Diff. exp.") + 
  theme_classic() +
  ggtitle("Comparison of expression at 24h and 7 days")+
  xlab("log2 expression DOX24H") +
  ylab("log2 expression DOX7D") +
  geom_label_repel(data=df4ggplot[c(1:20),],
                   mapping=aes(label = mgi_symbol),
                   box.padding = 0.5,max.overlaps = 20)

g_top20

Volcano plots - base graphics

# plotting 
plot(x=df$log2FoldChange,y=-log10(df$pvalue),
     xlab="log2 fold change", ylab="log10 p-value",
     pch=20, main="Volcano plot",col="gray")

diffExpressed <- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)

points(df$log2FoldChange[diffExpressed],
       -log10(df$pvalue[diffExpressed]),col="red",pch=20)

o <- order(df$padj)[1:10]

points(df$log2FoldChange[o],-log10(df$pvalue[o]),
       col="black",pch=21,cex=1.5,lwd=2)

text(df$log2FoldChange[o],-log10(df$pvalue[o]),
     labels=df$mgi_symbol[o], pos = 1,offset=0.5,font=2)

Volcano plots - ggplot2

# install.packages("ggrepel")
library(ggrepel)

df4ggplot <- df
df4ggplot$DE.cat <- ifelse(
  df4ggplot$padj < 0.01 & abs(df4ggplot$log2FoldChange) > 1,
  "TRUE", "FALSE"
)

df4ggplot$DE.cat <- factor(df4ggplot$DE.cat,
                           levels=c("TRUE","FALSE"))
df4ggplot <- df4ggplot[order(df4ggplot$padj),]

vol_plot <- ggplot(data=df4ggplot, 
                   aes(x=log2FoldChange, y=-log10(pvalue),col=DE.cat)) +
  geom_point() +  
  xlab("log2FoldChange") + ylab("-log10 PValue")+ 
  scale_color_manual(values=c("red","grey"))+
  geom_label_repel(data=df4ggplot[c(1:10),],
                   mapping=aes(label = mgi_symbol),
                   box.padding = 0.5,max.overlaps = 20)+
  theme_classic()

vol_plot

Exercise III

Solutions III

Heatmaps - base graphics

# install.packages("RColorBrewer")
library(RColorBrewer)
DoxEx <-  read.csv("data/DoxExpression.txt",row.names=1)
head(DoxEx)
##                      DOX24H_1  DOX24H_2   DOX24H_3    DOX7D_3    DOX7D_1
## ENSMUSG00000090025  2.4928423  2.430152  2.5169965  2.5058382  2.6820156
## ENSMUSG00000051951 12.4809504 12.339864 12.3060097 11.9827350 11.7679441
## ENSMUSG00000025900 -0.8083506 -0.736612 -0.8085774 -0.8090216 -0.8091383
## ENSMUSG00000025902 -2.2108951 -2.210912 -2.2110380 -2.2113214 -2.2113968
## ENSMUSG00000064376 -1.9723416 -1.972370 -1.9725807 -1.9730551 -1.9569306
## ENSMUSG00000088000 -2.2107546 -2.210769 -2.2108726 -2.2111068 -2.2111690
##                       DOX7D_2
## ENSMUSG00000090025  2.2964760
## ENSMUSG00000051951 12.1394396
## ENSMUSG00000025900 -0.7711231
## ENSMUSG00000025902 -2.2107869
## ENSMUSG00000064376 -1.9721606
## ENSMUSG00000088000 -2.2106652
class(DoxEx)
## [1] "data.frame"
# order df based on padj
o <- order(df$padj)
head(df[o,])
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145     21316.8751      2945.51899       Scg2
## 2 4.03e-120       626.6633        29.26926      Gfra1
## 3 1.94e-109      3080.2873       921.18292       Dlg1
## 4  1.46e-96       472.8374      2126.62645      Mex3a
## 5  4.51e-96       820.3147        71.59931       Rin1
## 6  5.58e-96      4687.6749      1193.30291      Ildr2
# select top 30 genes based on padj values
selectedGenes <- df[o,]$ensembl_id[1:30]

head(selectedGenes)
## [1] "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770"
## [4] "ENSMUSG00000074480" "ENSMUSG00000024883" "ENSMUSG00000040612"
# subset data of interest and convert it to matrix
DoxExSelected_mx <- as.matrix(DoxEx[selectedGenes,])

heatmap(DoxExSelected_mx,cexCol=1,scale="row", 
        col=colorRampPalette(brewer.pal(8, "Blues"))(25))

Heatmaps and RColorBrewer

# install.packages("RColorBrewer")
library(RColorBrewer)

# display all color palettes
display.brewer.all()

# display colorblind-Friendly color palettes
display.brewer.all(colorblindFriendly = TRUE)

# return 8 colours from the color palette "RdBu"
brewer.pal(8, "RdBu")
## [1] "#B2182B" "#D6604D" "#F4A582" "#FDDBC7" "#D1E5F0" "#92C5DE" "#4393C3"
## [8] "#2166AC"

# use `colorRampPalette()` to create a colour ramp from the 8 colours from the color palette "RdBu"
colorRampPalette(brewer.pal(8, "RdBu"))(25)
##  [1] "#B2182B" "#BC2C34" "#C7413E" "#D15748" "#DB6B55" "#E37F65" "#EC9374"
##  [8] "#F4A784" "#F7B799" "#F9C6AD" "#FCD6C1" "#F3DDCF" "#E7E0DB" "#DAE2E7"
## [15] "#CBE2EE" "#B9D9E9" "#A6CFE3" "#94C6DE" "#7EB8D7" "#67A9CF" "#509BC7"
## [22] "#3E8DC0" "#3480B9" "#2A73B2" "#2166AC"

Heatmaps - ComplexHeatmap I

# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("ComplexHeatmap")
library("ComplexHeatmap")
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.10.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================

https://jokergoo.github.io/ComplexHeatmap-reference/book/

?Heatmap

Heatmap {ComplexHeatmap}    R Documentation
  Constructor method for Heatmap class

## Description
  Constructor method for Heatmap class

## Usage
Heatmap(matrix, col, name,
    na_col = "grey",
    color_space = "LAB",
    rect_gp = gpar(col = NA),
    border = NA,
    border_gp = gpar(col = "black"),
    cell_fun = NULL,
    layer_fun = NULL,
    jitter = FALSE,
    
    row_title = character(0),
    row_title_side = c("left", "right"),
    row_title_gp = gpar(fontsize = 13.2),
    row_title_rot = switch(row_title_side[1], "left" = 90, "right" = 270),
    column_title = character(0),
    column_title_side = c("top", "bottom"),
    column_title_gp = gpar(fontsize = 13.2),
    column_title_rot = 0,
    
    cluster_rows = TRUE,
    cluster_row_slices = TRUE,
    clustering_distance_rows = "euclidean",
    clustering_method_rows = "complete",
    row_dend_side = c("left", "right"),
    row_dend_width = unit(10, "mm"),
    show_row_dend = TRUE,
    row_dend_reorder = is.logical(cluster_rows) || is.function(cluster_rows),
    row_dend_gp = gpar(),
    cluster_columns = TRUE,
    cluster_column_slices = TRUE,
    clustering_distance_columns = "euclidean",
    clustering_method_columns = "complete",
    column_dend_side = c("top", "bottom"),
    column_dend_height = unit(10, "mm"),
    show_column_dend = TRUE,
    column_dend_gp = gpar(),
    column_dend_reorder = is.logical(cluster_columns) || is.function(cluster_columns),
    
    row_order = NULL,
    column_order = NULL,
    
    row_labels = rownames(matrix),
    row_names_side = c("right", "left"),
    show_row_names = TRUE,
    row_names_max_width = unit(6, "cm"),
    row_names_gp = gpar(fontsize = 12),
    row_names_rot = 0,
    row_names_centered = FALSE,
    column_labels = colnames(matrix),
    column_names_side = c("bottom", "top"),
    show_column_names = TRUE,
    column_names_max_height = unit(6, "cm"),
    column_names_gp = gpar(fontsize = 12),
    column_names_rot = 90,
    column_names_centered = FALSE,
    
    top_annotation = NULL,
    bottom_annotation = NULL,
    left_annotation = NULL,
    right_annotation = NULL,
    
    km = 1,
    split = NULL,
    row_km = km,
    row_km_repeats = 1,
    row_split = split,
    column_km = 1,
    column_km_repeats = 1,
    column_split = NULL,
    gap = unit(1, "mm"),
    row_gap = unit(1, "mm"),
    column_gap = unit(1, "mm"),
    show_parent_dend_line = ht_opt$show_parent_dend_line,
    
    heatmap_width = unit(1, "npc"),
    width = NULL,
    heatmap_height = unit(1, "npc"),
    height = NULL,
    
    show_heatmap_legend = TRUE,
    heatmap_legend_param = list(title = name),
    
    use_raster = NULL,
    raster_device = c("png", "jpeg", "tiff", "CairoPNG", "CairoJPEG", "CairoTIFF", "agg_png"),
    raster_quality = 1,
    raster_device_param = list(),
    raster_resize_mat = FALSE,
    raster_by_magick = requireNamespace("magick", quietly = TRUE),
    raster_magick_filter = NULL,
    
    post_fun = NULL)

Heatmaps - ComplexHeatmap II

DoxEx <-  read.csv("data/DoxExpression.txt",row.names=1)

# order df based on padj
o <- order(df$padj)
head(df[o,])
##           ensembl_id   baseMean log2FoldChange      lfcSE      stat    pvalue
## 1 ENSMUSG00000050711 10893.6671       2.855441 0.10948634  26.08034 6.09e-150
## 2 ENSMUSG00000025089   296.2968       4.421013 0.18660593  23.69170 4.39e-124
## 3 ENSMUSG00000022770  1967.0850       1.741321 0.07700282  22.61373 3.18e-113
## 4 ENSMUSG00000074480  1359.6740      -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883   414.7163       3.518405 0.16605551  21.18812  1.23e-99
## 6 ENSMUSG00000040612  2986.8342       1.974114 0.09325275  21.16950  1.82e-99
##        padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145     21316.8751      2945.51899       Scg2
## 2 4.03e-120       626.6633        29.26926      Gfra1
## 3 1.94e-109      3080.2873       921.18292       Dlg1
## 4  1.46e-96       472.8374      2126.62645      Mex3a
## 5  4.51e-96       820.3147        71.59931       Rin1
## 6  5.58e-96      4687.6749      1193.30291      Ildr2
# select top 30 genes based on padj values
selectedGenes <- df[o,]$ensembl_id[1:30]

head(selectedGenes)
## [1] "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770"
## [4] "ENSMUSG00000074480" "ENSMUSG00000024883" "ENSMUSG00000040612"
# subset data of interest and convert it to matrix
DoxExSelected <- as.matrix(DoxEx[selectedGenes,])
DEmatok <- t(scale(t(DoxExSelected)))

# print default heatmap
ht <- Heatmap(DEmatok)
draw(ht)

# change font size for the row names and row names
ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6))
draw(ht)

# change colours
ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = colorRampPalette(brewer.pal(8, "RdBu"))(25))
draw(ht)

# change colours - upregulated as red and downregulated as blue
ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)))
draw(ht)

# change figure legend
ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
        heatmap_legend_param =
          list(title = "Row z-score",
               title_position = "topcenter"))
draw(ht)

# make it "slimmer" 
ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
        heatmap_legend_param =
          list(title = "Row z-score",width = unit(6, "mm"),
               title_position = "topcenter",direction = "horizontal"))

draw(ht,heatmap_legend_side = "bottom")

# turn off "cluster by columns"

ht <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
        heatmap_legend_param =
          list(title = "Row z-score",width = unit(6, "mm"),
               title_position = "topcenter",direction = "horizontal"),
        cluster_columns=F)

draw(ht,heatmap_legend_side = "bottom")

# create top annotation
anno <- data.frame(SampleNames=colnames(DEmatok))
anno$TimePoint <- sub("(.+)(_)(.+)","\\1",anno$SampleNames)
anno$TimePoint <- factor(anno$TimePoint)
time_col <- brewer.pal(10, "Paired")[c(9:10)]
names(time_col) <- levels(anno$TimePoint)
col_ha = HeatmapAnnotation(TimePoint = anno$TimePoint,
                           col = list(TimePoint = time_col))
ht_top <- Heatmap(DEmatok, 
              row_names_gp = gpar(fontsize = 6),
              column_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
        heatmap_legend_param =
          list(title = "Row z-score",width = unit(6, "mm"),
               title_position = "topcenter",direction = "horizontal"),
        cluster_columns=F,top_annotation = col_ha)

draw(ht_top,heatmap_legend_side = "bottom")

# remove column names and remove row names

ht_top_NoL <- Heatmap(DEmatok, row_names_gp = gpar(fontsize = 6),
        col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
        heatmap_legend_param =
          list(title = "Row z-score",width = unit(6, "mm"),
               title_position = "topcenter",direction = "horizontal"),
        cluster_columns=F,top_annotation = col_ha,
        show_column_names = F,show_row_names = F)

draw(ht_top_NoL,heatmap_legend_side = "bottom")

Saving your plots

  1. useful functions: pdf(), png(), jpeg(), postscript(), svg()

  2. plot object that you would like to save

  3. close the file with dev.off()

?svg()
## Usage
svg(filename = if(onefile) "Rplots.svg" else "Rplot%03d.svg",
    width = 7, height = 7, pointsize = 12,
    onefile = FALSE, family = "sans", bg = "white",
    antialias = c("default", "none", "gray", "subpixel"),
    symbolfamily)
filename4ht <- "heatmapCourseOutput.svg"
svg(file=filename4ht, width = 4, height=5.5)
  draw(ht_top_NoL,heatmap_legend_side = "bottom")
dev.off()

Exercise IV

Solutions IV

Plots of functional enrichment

Gene Ontology analysis was performed based on Up-regulated genes in Dox7D vs Dox24H using thresholds padj < 0.01 and log2(FC) > log2(2). Please see “data/GO_UpDE.csv”

Gene Ontology results

Bar plot

GOres <-  read.csv("data/GO_UpDE.csv")
# rank "GOres" based on "FDR.over"
GOres <- GOres[order(GOres$FDR.over),]

# subset top 15 GO terms
GOres_top15 <- GOres[c(1:15),]

# basic bar plot
ori_bar <- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) + 
  geom_bar(stat="identity") +
  xlab("-log10(FDR)")
ori_bar

# add colour
blue_bar <- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) + 
  geom_bar(stat="identity",fill="blue") +
  xlab("-log10(FDR)") + theme_classic()

blue_bar

# the "GOres_top15" already ordered by the "FDR.over" but it is not shown in the bar plot
head(GOres_top15)
##     category over_represented_pvalue under_represented_pvalue numDEInCat
## 1 GO:0023052                4.98e-22                        1        318
## 2 GO:0007154                5.35e-21                        1        317
## 3 GO:0032501                5.47e-21                        1        369
## 4 GO:0050896                2.89e-19                        1        386
## 5 GO:0007165                1.00e-17                        1        282
## 6 GO:0051716                3.41e-17                        1        339
##   numInCat                             term ontology FDR.over FDR.under
## 1     4390                        signaling       BP 9.90e-18         1
## 2     4436               cell communication       BP 3.63e-17         1
## 3     5461 multicellular organismal process       BP 3.63e-17         1
## 4     6054             response to stimulus       BP 1.44e-15         1
## 5     3932              signal transduction       BP 3.99e-14         1
## 6     5189    cellular response to stimulus       BP 1.13e-13         1
# We use 'factor'
class(GOres_top15$term)
## [1] "character"
## version 1
ordered_term <- GOres_top15$term
head(GOres_top15$term)
## [1] "signaling"                        "cell communication"              
## [3] "multicellular organismal process" "response to stimulus"            
## [5] "signal transduction"              "cellular response to stimulus"
GOres_top15$term <- factor(GOres_top15$term,levels=ordered_term)

ordered_bar <- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) + 
  geom_bar(stat="identity",fill="blue") +
  xlab("-log10(FDR)") + theme_classic()

ordered_bar

## version 2
ordered_term_rev <- rev(ordered_term)
head(ordered_term_rev)
## [1] "tissue development"                         
## [2] "system development"                         
## [3] "multicellular organism development"         
## [4] "regulation of cell population proliferation"
## [5] "cell-cell signaling"                        
## [6] "synaptic signaling"
GOres_top15$term <- factor(GOres_top15$term,levels=ordered_term_rev)

ordered_barRev <- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) + 
  geom_bar(stat="identity",fill="blue") +
  xlab("-log10(FDR)") + theme_classic()

ordered_barRev

Bubble plot

# calculate number of up.DE
df <- read.csv("data/DiffEx.csv")
upDE_df <- df[!is.na(df$padj) & df$padj<0.01 & df$log2FoldChange> 1,]
num.UpDE <- nrow(upDE_df)

GOres_top15$geneRatio <- GOres_top15$numDEInCat/num.UpDE
head(GOres_top15)
##     category over_represented_pvalue under_represented_pvalue numDEInCat
## 1 GO:0023052                4.98e-22                        1        318
## 2 GO:0007154                5.35e-21                        1        317
## 3 GO:0032501                5.47e-21                        1        369
## 4 GO:0050896                2.89e-19                        1        386
## 5 GO:0007165                1.00e-17                        1        282
## 6 GO:0051716                3.41e-17                        1        339
##   numInCat                             term ontology FDR.over FDR.under
## 1     4390                        signaling       BP 9.90e-18         1
## 2     4436               cell communication       BP 3.63e-17         1
## 3     5461 multicellular organismal process       BP 3.63e-17         1
## 4     6054             response to stimulus       BP 1.44e-15         1
## 5     3932              signal transduction       BP 3.99e-14         1
## 6     5189    cellular response to stimulus       BP 1.13e-13         1
##   geneRatio
## 1 0.4555874
## 2 0.4541547
## 3 0.5286533
## 4 0.5530086
## 5 0.4040115
## 6 0.4856734
GOres_top15_bubble <- GOres_top15[order(GOres_top15$geneRatio),]
GOres_top15_bubble$term <- factor(GOres_top15_bubble$term,
                                  levels = GOres_top15_bubble$term)

bubblePlot <- ggplot(GOres_top15_bubble,aes(x=geneRatio,y=term)) + 
  geom_point(aes(size=numDEInCat,color=FDR.over)) +
  xlab("geneRatio") + theme_classic()+
  scale_colour_gradient(name = "FDR",
                         low = "#D55E00", high = "#0072B2")
bubblePlot

Saving your plots - ggsave

ggsave()

?ggsave()
## Usage
ggsave(filename, plot = last_plot(), device = NULL, path = NULL,
  scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
  dpi = 300, limitsize = TRUE, ...)
  
data(mtcars)

plot1 <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
plot2 <- ggplot(mtcars, aes(mpg, wt,col=as.factor(vs))) + geom_point()

ggsave("mtcars_default.png", plot1)
ggsave("mtcars_col.png", plot2)

Exercise V

Solutions V

Resources

sessionInfo

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ComplexHeatmap_2.10.0 RColorBrewer_1.1-2    ggrepel_0.9.1        
## [4] reshape2_1.4.4        ggplot2_3.3.5        
## 
## loaded via a namespace (and not attached):
##  [1] shape_1.4.6         circlize_0.4.13     GetoptLong_1.0.5   
##  [4] tidyselect_1.1.1    xfun_0.28           bslib_0.3.1        
##  [7] purrr_0.3.4         colorspace_2.0-2    vctrs_0.3.8        
## [10] generics_0.1.1      stats4_4.1.2        htmltools_0.5.2    
## [13] yaml_2.2.2          utf8_1.2.2          rlang_0.4.12       
## [16] jquerylib_0.1.4     pillar_1.6.5        glue_1.6.1         
## [19] withr_2.4.3         DBI_1.1.2           BiocGenerics_0.40.0
## [22] matrixStats_0.61.0  foreach_1.5.1       lifecycle_1.0.1    
## [25] plyr_1.8.6          stringr_1.4.0       munsell_0.5.0      
## [28] gtable_0.3.0        GlobalOptions_0.1.2 codetools_0.2-18   
## [31] evaluate_0.14       labeling_0.4.2      knitr_1.36         
## [34] IRanges_2.28.0      fastmap_1.1.0       doParallel_1.0.16  
## [37] parallel_4.1.2      fansi_1.0.2         highr_0.9          
## [40] Rcpp_1.0.8          scales_1.1.1        S4Vectors_0.32.2   
## [43] magick_2.7.3        jsonlite_1.7.3      farver_2.1.0       
## [46] rjson_0.2.21        png_0.1-7           digest_0.6.29      
## [49] stringi_1.7.6       dplyr_1.0.7         clue_0.3-60        
## [52] tools_4.1.2         magrittr_2.0.1      sass_0.4.0         
## [55] tibble_3.1.6        cluster_2.1.2       crayon_1.4.2       
## [58] pkgconfig_2.0.3     ellipsis_0.3.2      assertthat_0.2.1   
## [61] rmarkdown_2.11      iterators_1.0.13    R6_2.5.1           
## [64] compiler_4.1.2