Plotting in R
1 Introduction
Visualisation of large data sets has become an essential task in molecular biology and medicine. While there are many software solutions which can be used for visualisation of data starting, for instance, with Excel or LibraOffice Calc, the R platform has become a de facto standard for visualisation in genomics. Rstudio is an interface to R which facilitates the writing and execution of R code. We will use it in our tutorial.
The course is intended to be self-contained, but if you are new to R, we recommend that you go through our previous introduction to R:
2 Materials
All prerequisites, links to material and slides for this course can be found on github - PlottingInR
Or can be downloaded as a zip archive from here - Download zip
3 Set the Working directory
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
3.1 Set working directory - in Rstudio
You can navigate to the unarchived LMS_PlottingInR/course folder in the Rstudio menu
Session -> Set Working Directory -> Choose Directory
3.2 Set working directory - in the console
Use getwd() to see where your current directory is
setwd("/PathToMyDownload/LMS_PlottingInR/course")
# e.g. setwd("~/Downloads/LMS_PlottingInR/course")
4 Motivation
Visualisation plays a key part in the analysis of genomic data and presentation of the results. Why is this so?
The reason for the importance of visualisation is simply that we humans are not good in reading large tables, which generally are produced in genomics. For instance, can you easily spot the significant genes (padj < 0.01 and abs(log2 fold change) > 1) in the table below?
4.1 Typical results
4.2 Data visualization
This task will be much easier when we visualise the data and highlight the significant genes by the colour red:
5 Data visualization: A view of every Points of View column
http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html
5.1 Design of data figures
- accuracy
5.2 Salience
- speed
Salience
Visal conjunctions
5.3 Color blindness
- Reference: Crameri F, Shephard GE, Heron PJ. The misuse of colour in science communication. Nat Commun. 2020 Oct 28;11(1):5444. doi: 10.1038/s41467-020-19160-7. PMID: 33116149; PMCID: PMC7595127.
https://pubmed.ncbi.nlm.nih.gov/33116149/
- Colour vision tests
- Colour map classes and types
- Guideline for choosing the right scientific colour map
6 A short intro to the most important data structures in R
For our purpose, we need to know two data structures
in R:
6.1 Data frame
A data frame is a table in which each column contains values of one variable and each row contains one set of values from each column. Both row and column names should be unique.
A data frame is produced when we upload a comma-delimited table into R using the read.csv() command.
A useful command to check the content of a data frame is head() which displays the top of the data.frame.
# reading in table
<- read.csv("data/DiffEx.csv")
df # showing first 6 rows of the data frame
head(df)
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145 21316.8751 2945.51899 Scg2
## 2 4.03e-120 626.6633 29.26926 Gfra1
## 3 1.94e-109 3080.2873 921.18292 Dlg1
## 4 1.46e-96 472.8374 2126.62645 Mex3a
## 5 4.51e-96 820.3147 71.59931 Rin1
## 6 5.58e-96 4687.6749 1193.30291 Ildr2
subset the data.frame based on columns
- Single columns can either be accessed via the $ operator, writing the explicit name [,name] (if the column has a name) or more generally by [,j] for the jth column.
# different options to access the data
head(df$baseMean)
## [1] 10893.6671 296.2968 1967.0850 1359.6740 414.7163 2986.8342
head(df[,"baseMean"])
## [1] 10893.6671 296.2968 1967.0850 1359.6740 414.7163 2986.8342
head(df[,2])
## [1] 10893.6671 296.2968 1967.0850 1359.6740 414.7163 2986.8342
subset the data frame based on rows
- Another common way to subset the data frame is to supply a logical (Boolean) vector. For example, if we use the following thresholds to define significant genes (padj < 0.01 and abs(log2 fold change) > 1) from the DESeq2 analysis results.
# define the DE genes using logical vector
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed # diffExpressed is a vector with TRUE or FALSE values
head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
class(diffExpressed)
## [1] "logical"
# make sure the length of the logical vector is consistent with the original dataset
## check the dimensions of the original dataset
dim(df)
## [1] 18357 10
## check the length of the logical vector
length(diffExpressed)
## [1] 18357
# use table function to see TRUE & FALSE
table(diffExpressed)
## diffExpressed
## FALSE TRUE
## 17120 1237
# subset the original df using diffExpressed vector
<- df[diffExpressed,]
DE_res
# check the dimension of the new data frame
dim(DE_res)
## [1] 1237 10
# showing first 6 rows of the data frame
head(DE_res)
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145 21316.8751 2945.51899 Scg2
## 2 4.03e-120 626.6633 29.26926 Gfra1
## 3 1.94e-109 3080.2873 921.18292 Dlg1
## 4 1.46e-96 472.8374 2126.62645 Mex3a
## 5 4.51e-96 820.3147 71.59931 Rin1
## 6 5.58e-96 4687.6749 1193.30291 Ildr2
- Another useful command to check the type of each columns from a data frame is str() which compactly displays the structure of a R object.
# different options to access the data
str(df)
## 'data.frame': 18357 obs. of 10 variables:
## $ ensembl_id : chr "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770" "ENSMUSG00000074480" ...
## $ baseMean : num 10894 296 1967 1360 415 ...
## $ log2FoldChange : num 2.86 4.42 1.74 -2.17 3.52 ...
## $ lfcSE : num 0.109 0.187 0.077 0.102 0.166 ...
## $ stat : num 26.1 23.7 22.6 -21.3 21.2 ...
## $ pvalue : num 6.09e-150 4.39e-124 3.18e-113 3.17e-100 1.23e-99 ...
## $ padj : num 1.12e-145 4.03e-120 1.94e-109 1.46e-96 4.51e-96 ...
## $ baseMean_DOX7D : num 21317 627 3080 473 820 ...
## $ baseMean_DOX24H: num 2945.5 29.3 921.2 2126.6 71.6 ...
## $ mgi_symbol : chr "Scg2" "Gfra1" "Dlg1" "Mex3a" ...
6.2 Vector
A vector is a sequence of data elements of the same type. We use it often in the definiton of colors or genes to be displayed. A vector can be constructed using the c() function.
<- c("blue","red","green")
colo
colo
## [1] "blue" "red" "green"
7 Two philosophies of plotting in R
In our tutorial, we use two sources of plotting capabilities. The first one (base graphics) might resemble a printer, where you put in your data, make some choices, press the button and voila you have the picture. The second one (ggplot2) is more like an oil painting, it is layered and needs some setting up, but it is more easily modifiable (although it might be more laborious to get to the full painting in the first place).
Disclaimer: The distinction between the two plotting philosophies is not that strict, we can still do some limited modification after initial plotting also with base graphics using auxiliary functions.
8 base graphics
This functionality for plotting is easy to use, but becomes difficult or impossible to draw complicated plots.
For using base graphics, we start with the plot() function and then could add features using auxilary functions such as
lines: add lines to a plot, given a vector x values and a corresponding vector of y values (or a 2-column matrix); the function connects the points
points: add points to a plot
text: add text labels to a plot using specified x, y coordinates
title: add annotations to x, y axis labels, title, subtitle, outer margin
mtext: add arbitrary text to the margins (inner or outer) of the plot
axis: adding axis ticks/labels
8.1 usage
This scatterplot is an example of the use of the plot() function. Basically, it takes the x and y arguments for the x and y coordinates of the plot.
# plotting
plot(x=log2(df$baseMean_DOX24H), y=log2(df$baseMean_DOX7D))
Additional arguments provide options for adjusting the plot such as xlab, ylab, pch and main). These are explained using help(plot) or ?plot.
# help(plot) or ?plot
?plot
plot.default {graphics} R Documentation The Default Scatterplot Function Description Draw a scatter plot with decorations such as axes and titles in the active graphics window. Usage ## Default S3 method: plot(x, y = NULL, type = "p", xlim = NULL, ylim = NULL, log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL, ann = par("ann"), axes = TRUE, frame.plot = axes, panel.first = NULL, panel.last = NULL, asp = NA, xgap.axis = NA, ygap.axis = NA, ...)
- type: what type of plot should be drawn. Possible types are
"p" for points, "l" for lines, "b" for both, "c" for the lines part alone of "b", "o" for both ‘overplotted’, "h" for ‘histogram’ like (or ‘high-density’) vertical lines, "s" for stair steps, "S" for other steps, see ‘Details’ below, "n" for no plotting.
In the following, we also define the differentially expressed genes and highlight them by red color using points() to add them to our origin plots.
# plotting
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
xlab="log2 expression DOX24H",
ylab="log2 expression DOX7D", pch=20,
main="Comparison of expression at 24h and 7 days")
<- (!is.na(df$padj) & df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
length(diffExpressed)
## [1] 18357
table(diffExpressed)
## diffExpressed
## FALSE TRUE
## 17120 1237
points(log2(df$baseMean_DOX24H[diffExpressed]),log2(df$baseMean_DOX7D[diffExpressed]),
col="red",pch=20)
- Plotting ‘character’ (pch) - symbol to use
9 ggplot2 package
A very versatile package which has become a de facto standard for producing high quality figures is an R package based on the “grammar of graphics”. This enables developing plots by layering graphical elements on top of each other.
“In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).” - Wickham, 2016
Every ggplot2 plot has three key components:
data provided as data frame,
aesthetic mappings between variables in the data and visual properties, and
layer: usually created with a geom function.
9.1 usage
Start with the ggplot function which initializes a ggplot object. Here you need to declare the input data frame and to specify the set of plot aesthetics. The actual plotting will be done by applying another layer (with the “+” sign) using various specific geom functions. For a scatter plot like the one above, we will use the geom_point() function.
# install.packages("ggplot2")
library("ggplot2")
# (1) ggplot function with initializes a ggplot object
ggplot(df)
# (2) specify the set of plot aesthetics
ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D)))
# (3) specify plot type using geom_point()
ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D))) +
geom_point()
Note that we do not need to indicate that the x and y are specific columns of the dataframe df e.g. by using the $ operator as we already defined in the ggplot function that df should be used.
Now we add more layers to get a similar plot as we got using base graphics.
ggplot(df, aes(x=log2(baseMean_DOX24H), y=log2(baseMean_DOX7D))) +
geom_point() +
ggtitle("Comparison of expression at 24h and 7 days")+
xlab("log2 expression DOX24H") +
ylab("log2 expression DOX7D")
This looks promising but we would like to highlight the differentially expressed genes. Since ggplot works with the input data frame, we need to include this information into the data frame df. Then, we add color to the aesthetic mappings and set it the added column. geom_point() will select its color for the different values of the columns, but if we do not like it, we can overwrite or more exactly put another layer on it (like on an oil painting.)
# check the dimension for object "df"
dim(df)
## [1] 18357 10
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed
# diffExpressed is a vector with TRUE or FALSE values
head(diffExpressed)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE
length(diffExpressed)
## [1] 18357
<- data.frame(df, diffEx=diffExpressed)
df2
head(df2)
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol diffEx
## 1 1.12e-145 21316.8751 2945.51899 Scg2 TRUE
## 2 4.03e-120 626.6633 29.26926 Gfra1 TRUE
## 3 1.94e-109 3080.2873 921.18292 Dlg1 TRUE
## 4 1.46e-96 472.8374 2126.62645 Mex3a TRUE
## 5 4.51e-96 820.3147 71.59931 Rin1 TRUE
## 6 5.58e-96 4687.6749 1193.30291 Ildr2 TRUE
table(df2$diffEx)
##
## FALSE TRUE
## 17120 1237
<- ggplot(df2, aes(x=log2(baseMean_DOX24H),
g y=log2(baseMean_DOX7D),color=diffEx)) +
geom_point() +
ggtitle("Comparison of expression at 24h and 7 days")+
xlab("log2 expression DOX24H") +
ylab("log2 expression DOX7D")
g
# less fancy colors please
+ scale_color_manual(values=c('black','red')) +
g labs(color = "Diff. exp.")
# more classic look please
+ scale_color_manual(values=c('black','red')) +
g labs(color = "Diff. exp.") +
theme_classic()
Note that we assigned first the ggplot object to the variable g and then plot by entering it again. The nice thing here is that we can put further layers on the plot without calling the ggplot() and other functions again. This is convenient if we need to generate various different version of one figure.
10 Exercise I
note: In this exercise, you will need to know the difference between “numeric” and “factor” R data types
11 Solutions I
12 Our case study
In 1933, the Dutch pediatrician Cornelia De Lange described a disorder in two children, which became known as Cornelia de Lange syndrome (CdLS). Its features can include short stature, intellectual disability, developmental delay and signs of autism. Over the years, research showed that Cornelia de Lange syndrome is a genetic disorder and that it is frequently linked to mutations in components of cohesin complex. This complex is fundamental for obtaining higher order structures of DNA within the nucleus.
- Classical facial dysmorphisms of CdLS and related overlapping syndromes
- Five major CdLS facial features and their different severity
Clinical Genetics, Volume: 97, Issue: 1, Pages: 3-11, First published: 13 November 2019, DOI: (10.1111/cge.13674)
How cohesin mutations cause Cornelia de Lange syndrome is not well understood. To elucidate the potential role of cohesin for Cornelia de Lange syndrom, a group at the LMS applied several genomic approaches including RNA-seq technology.
After comparing expression in cortical neurons from post-mortem brains of Cornelia de Lange syndrome patients with controls, the researchers showed in a mouse model that doxycycline (dox) inducible depletion of one cohesin component (RAD21) leads to similar changes of expression including the downregulation of genes with neuronal and synaptic functions. The researchers asked then the crucial question if the normal expression could be restored when the depletion of RAD21 is lifted. A positive answer might give us hope that correction of cohesin functions could provide a potential therapeutic approach for Cornelia de Lange patients.
The figure below shows one of the experiments in this study. Rad21-TEV neurons were grown in culture, for 10 days. On day 10, cells were treated for 6 hours with either 100ng/ml of doxycycline, or vehicle. After 6 hours, the samples were washed and fresh media was added. Samples for cohesin depletion (24H) were collected 18 hours later, and samples for rescue (7D) were collected 7 days after addition of doxycyline/vehicle.
The file "data/DiffEx.csv"
contains the expression after RAD21 depletion at 24 hours (baseMean_DOX24H) and RAD21 restoration at day 7 (baseMean_DOX7D). It also contains log2 fold changes and adjusted p-values (padj). "data/GO_UpDE.csv"
contains gene ontology analysis results. "data/DoxExpression.txt"
contains normalised gene counts information. To gain insights into these data and to present them to the research community, we need to turn the table into some informative plots. In the following, we would like to complete following tasks:
12.1 Our tasks:
Display the distribution of log2 fold changes
Visualise the expression and changes in expression
Plot the functional categories enriched in significantly differentially expressed genes
Let’s get started.
13 Displaying distributions - Base graphics
13.1 Histograms - Base graphics
hist(df$log2FoldChange,breaks=100,xlab = "Log2 fold changes",
main="Histogram of log2 fold changes",col="red")
13.2 Density plots - Base graphics
- log2 FC density plot
<- density(df$log2FoldChange)
d plot(d,xlab="Log2 fold changes",
main="Distribution of log2 fold changes")
polygon(d,col="red")
- log2 normalised expression levels in two conditions
<- log2(df$baseMean_DOX24H)
log2_DOX24H <- log2(df$baseMean_DOX7D)
log2_DOX7D
<- density(log2_DOX7D)
d1 <- density(log2_DOX24H)
d2
plot(d1, lwd=2, xlab="Log2 baseMean",
main="Distribution of log2 fold changes")
lines(d2,col=rgb(0,0,1),lwd=2)
polygon(d1,col=rgb(1,0,0,0.2))
polygon(d2,col=rgb(0,0,1,0.2))
legend(x=18,y=0.08,legend=c("DOX24H","DOX7D"),
fill=c(rgb(1,0,0,0.2),rgb(0,0,1,0.2)))
14 Displaying distributions - ggplot2
14.1 Histograms - ggplot2
# install.package(ggplot2)
library(ggplot2)
# basic histogram
ggplot(df, aes(x=log2FoldChange)) +
geom_histogram()
# change the number of bins and add x and y lab
ggplot(df, aes(x=log2FoldChange)) +
geom_histogram(bins=100) +
xlab("Log2 fold changes") + ylab("Frequency")
# change colours
ggplot(df, aes(x=log2FoldChange)) +
geom_histogram(bins=100, col="black",fill="red") +
xlab("Log2 fold changes") + ylab("Frequency")
# use "classic background" and add title
ggplot(df, aes(x=log2FoldChange)) +
geom_histogram(bins=100, col="black",fill="red") +
xlab("Log2 fold changes") + ylab("Frequency") +
theme_classic() +
ggtitle("Histogram of log2 fold changes")
14.2 Density plots - ggplot2
- log2 FC density plot
# basic density plot
ggplot(df, aes(x=log2FoldChange)) +
geom_density()
# change colour and use "theme_bw"
ggplot(df, aes(x=log2FoldChange)) +
geom_density(fill="red") +
theme_bw()
log2 normalised expression levels in two conditions
we have to change the data format in to the format that ggplot2 likes
we are going to convert the “wide” data.frame into “long” data.frame
# subset the data.frame and only keep the information of interest
<- df[,colnames(df) %in%
df_subset c("ensembl_id","baseMean_DOX7D","baseMean_DOX24H")]
dim(df_subset)
## [1] 18357 3
head(df_subset)
## ensembl_id baseMean_DOX7D baseMean_DOX24H
## 1 ENSMUSG00000050711 21316.8751 2945.51899
## 2 ENSMUSG00000025089 626.6633 29.26926
## 3 ENSMUSG00000022770 3080.2873 921.18292
## 4 ENSMUSG00000074480 472.8374 2126.62645
## 5 ENSMUSG00000024883 820.3147 71.59931
## 6 ENSMUSG00000040612 4687.6749 1193.30291
# convert df_subset into a "long" format
# install.packages("reshape2")
library("reshape2")
<- melt(df_subset,id="ensembl_id")
df_subset_long dim(df_subset_long)
## [1] 36714 3
str(df_subset_long)
## 'data.frame': 36714 obs. of 3 variables:
## $ ensembl_id: chr "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770" "ENSMUSG00000074480" ...
## $ variable : Factor w/ 2 levels "baseMean_DOX7D",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ value : num 21317 627 3080 473 820 ...
head(df_subset_long)
## ensembl_id variable value
## 1 ENSMUSG00000050711 baseMean_DOX7D 21316.8751
## 2 ENSMUSG00000025089 baseMean_DOX7D 626.6633
## 3 ENSMUSG00000022770 baseMean_DOX7D 3080.2873
## 4 ENSMUSG00000074480 baseMean_DOX7D 472.8374
## 5 ENSMUSG00000024883 baseMean_DOX7D 820.3147
## 6 ENSMUSG00000040612 baseMean_DOX7D 4687.6749
# now it is ready for ggplot2
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) +
geom_density()
# setup the alpha value to make the colours transparent
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) +
geom_density(alpha=0.5)
# use the "theme_classic" and add xlab
ggplot(df_subset_long, aes(x=log2(value), fill=variable)) +
geom_density(alpha=0.5) + theme_classic() + xlab("Log2 baseMean")
- violin plot
ggplot(df_subset_long, aes(x=variable,y=log2(value), fill=variable)) +
geom_violin(alpha=0.5) + theme_classic() +
ylab("Log2 baseMean") + xlab("Samples")
- violin plot + box plot
# add geom_boxplot() layer directly
ggplot(df_subset_long, aes(x=variable,y=log2(value),fill=variable)) +
geom_violin(alpha=0.5) +
geom_boxplot() +
theme_classic() +
ylab("Log2 baseMean") + xlab("Samples")
# what to do if we want violin plot filled with 2 different colours and
# box plot filled with "white" colour
ggplot(df_subset_long, aes(x=variable,y=log2(value))) +
geom_violin(aes(fill=variable),alpha=0.5) +
geom_boxplot(width=0.2) +
theme_classic() +
ylab("Log2 baseMean") + xlab("Samples")
15 Exercise II
note, In this exercise, apart from
ggplot2
, you will also be working with some commonly used tools in R: (1) the melt() function from thereshape2
package, (2) use ofregular expression
to manipulate data. More details aboutregular expression
[[https://en.wikipedia.org/wiki/Regular_expression]]
Regular Expression
16 Solutions II
17 Visualising expression and changes in expression
17.1 Scatterplots - base graphics
We already did this before. Now we want to highlight some genes and put their gene symbol into the plot. For the start, we will highlight the 5 most significant genes.
# plotting
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
xlab="log2 expression DOX24H",
ylab="log2 expression DOX7D",
pch=20, main="Comparison of expression at 24h and 7 days",
col="gray")
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed
points(log2(df$baseMean_DOX24H[diffExpressed]),
log2(df$baseMean_DOX7D[diffExpressed]),
col="red",pch=20)
<- order(df$padj)[1:5]
o df[o,]
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145 21316.8751 2945.51899 Scg2
## 2 4.03e-120 626.6633 29.26926 Gfra1
## 3 1.94e-109 3080.2873 921.18292 Dlg1
## 4 1.46e-96 472.8374 2126.62645 Mex3a
## 5 4.51e-96 820.3147 71.59931 Rin1
points(log2(df$baseMean_DOX24H[o]),
log2(df$baseMean_DOX7D[o]),
col="black",pch=21,cex=1.5,lwd=2)
text(log2(df$baseMean_DOX24H[o]),
log2(df$baseMean_DOX7D[o]),
labels=df$mgi_symbol[o],
pos = 3,offset=0.5,font=2)
This looks good. Now let’s label the top 20 genes.
# plotting
plot(x=log2(df$baseMean_DOX24H),y=log2(df$baseMean_DOX7D),
xlab="log2 expression DOX24H", ylab="log2 expression DOX7D",pch=20,
main="Comparison of expression at 24h and 7 days",col="gray")
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed
points(log2(df$baseMean_DOX24H[diffExpressed]),
log2(df$baseMean_DOX7D[diffExpressed]),
col="red",pch=20)
<- order(df$padj)[1:20]
o
points(log2(df$baseMean_DOX24H[o]),
log2(df$baseMean_DOX7D[o]),
col="black",pch=21,cex=1.5,lwd=2)
text(log2(df$baseMean_DOX24H[o]),
log2(df$baseMean_DOX7D[o]),
labels=df$mgi_symbol[o], pos = 3,offset=0.5,font=2)
Oh, no.
17.2 ggplot2 and ggrepel
# install.packages("ggrepel")
library(ggrepel)
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed # diffExpressed is a vector with TRUE or FALSE values
<- data.frame(df, diffEx=diffExpressed)
df4ggplot
head(df4ggplot)
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol diffEx
## 1 1.12e-145 21316.8751 2945.51899 Scg2 TRUE
## 2 4.03e-120 626.6633 29.26926 Gfra1 TRUE
## 3 1.94e-109 3080.2873 921.18292 Dlg1 TRUE
## 4 1.46e-96 472.8374 2126.62645 Mex3a TRUE
## 5 4.51e-96 820.3147 71.59931 Rin1 TRUE
## 6 5.58e-96 4687.6749 1193.30291 Ildr2 TRUE
table(df4ggplot$diffEx)
##
## FALSE TRUE
## 17120 1237
<- df4ggplot[order(df4ggplot$padj),]
df4ggplot
<- ggplot(df4ggplot, aes(x=log2(baseMean_DOX24H),
g_top20 y=log2(baseMean_DOX7D),color=diffEx)) +
geom_point() +
scale_color_manual(values=c('black','red')) +
labs(color = "Diff. exp.") +
theme_classic() +
ggtitle("Comparison of expression at 24h and 7 days")+
xlab("log2 expression DOX24H") +
ylab("log2 expression DOX7D") +
geom_label_repel(data=df4ggplot[c(1:20),],
mapping=aes(label = mgi_symbol),
box.padding = 0.5,max.overlaps = 20)
g_top20
18 Volcano plots - base graphics
# plotting
plot(x=df$log2FoldChange,y=-log10(df$pvalue),
xlab="log2 fold change", ylab="log10 p-value",
pch=20, main="Volcano plot",col="gray")
<- (df$padj < 0.01 & abs(df$log2FoldChange) > 1)
diffExpressed
points(df$log2FoldChange[diffExpressed],
-log10(df$pvalue[diffExpressed]),col="red",pch=20)
<- order(df$padj)[1:10]
o
points(df$log2FoldChange[o],-log10(df$pvalue[o]),
col="black",pch=21,cex=1.5,lwd=2)
text(df$log2FoldChange[o],-log10(df$pvalue[o]),
labels=df$mgi_symbol[o], pos = 1,offset=0.5,font=2)
19 Volcano plots - ggplot2
# install.packages("ggrepel")
library(ggrepel)
<- df
df4ggplot $DE.cat <- ifelse(
df4ggplot$padj < 0.01 & abs(df4ggplot$log2FoldChange) > 1,
df4ggplot"TRUE", "FALSE"
)
$DE.cat <- factor(df4ggplot$DE.cat,
df4ggplotlevels=c("TRUE","FALSE"))
<- df4ggplot[order(df4ggplot$padj),]
df4ggplot
<- ggplot(data=df4ggplot,
vol_plot aes(x=log2FoldChange, y=-log10(pvalue),col=DE.cat)) +
geom_point() +
xlab("log2FoldChange") + ylab("-log10 PValue")+
scale_color_manual(values=c("red","grey"))+
geom_label_repel(data=df4ggplot[c(1:10),],
mapping=aes(label = mgi_symbol),
box.padding = 0.5,max.overlaps = 20)+
theme_classic()
vol_plot
20 Exercise III
21 Solutions III
22 Heatmaps - base graphics
# install.packages("RColorBrewer")
library(RColorBrewer)
<- read.csv("data/DoxExpression.txt",row.names=1)
DoxEx head(DoxEx)
## DOX24H_1 DOX24H_2 DOX24H_3 DOX7D_3 DOX7D_1
## ENSMUSG00000090025 2.4928423 2.430152 2.5169965 2.5058382 2.6820156
## ENSMUSG00000051951 12.4809504 12.339864 12.3060097 11.9827350 11.7679441
## ENSMUSG00000025900 -0.8083506 -0.736612 -0.8085774 -0.8090216 -0.8091383
## ENSMUSG00000025902 -2.2108951 -2.210912 -2.2110380 -2.2113214 -2.2113968
## ENSMUSG00000064376 -1.9723416 -1.972370 -1.9725807 -1.9730551 -1.9569306
## ENSMUSG00000088000 -2.2107546 -2.210769 -2.2108726 -2.2111068 -2.2111690
## DOX7D_2
## ENSMUSG00000090025 2.2964760
## ENSMUSG00000051951 12.1394396
## ENSMUSG00000025900 -0.7711231
## ENSMUSG00000025902 -2.2107869
## ENSMUSG00000064376 -1.9721606
## ENSMUSG00000088000 -2.2106652
class(DoxEx)
## [1] "data.frame"
# order df based on padj
<- order(df$padj)
o head(df[o,])
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145 21316.8751 2945.51899 Scg2
## 2 4.03e-120 626.6633 29.26926 Gfra1
## 3 1.94e-109 3080.2873 921.18292 Dlg1
## 4 1.46e-96 472.8374 2126.62645 Mex3a
## 5 4.51e-96 820.3147 71.59931 Rin1
## 6 5.58e-96 4687.6749 1193.30291 Ildr2
# select top 30 genes based on padj values
<- df[o,]$ensembl_id[1:30]
selectedGenes
head(selectedGenes)
## [1] "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770"
## [4] "ENSMUSG00000074480" "ENSMUSG00000024883" "ENSMUSG00000040612"
# subset data of interest and convert it to matrix
<- as.matrix(DoxEx[selectedGenes,])
DoxExSelected_mx
heatmap(DoxExSelected_mx,cexCol=1,scale="row",
col=colorRampPalette(brewer.pal(8, "Blues"))(25))
23 Heatmaps and RColorBrewer
# install.packages("RColorBrewer")
library(RColorBrewer)
# display all color palettes
display.brewer.all()
# display colorblind-Friendly color palettes
display.brewer.all(colorblindFriendly = TRUE)
# return 8 colours from the color palette "RdBu"
brewer.pal(8, "RdBu")
## [1] "#B2182B" "#D6604D" "#F4A582" "#FDDBC7" "#D1E5F0" "#92C5DE" "#4393C3"
## [8] "#2166AC"
# use `colorRampPalette()` to create a colour ramp from the 8 colours from the color palette "RdBu"
colorRampPalette(brewer.pal(8, "RdBu"))(25)
## [1] "#B2182B" "#BC2C34" "#C7413E" "#D15748" "#DB6B55" "#E37F65" "#EC9374"
## [8] "#F4A784" "#F7B799" "#F9C6AD" "#FCD6C1" "#F3DDCF" "#E7E0DB" "#DAE2E7"
## [15] "#CBE2EE" "#B9D9E9" "#A6CFE3" "#94C6DE" "#7EB8D7" "#67A9CF" "#509BC7"
## [22] "#3E8DC0" "#3480B9" "#2A73B2" "#2166AC"
24 Heatmaps - ComplexHeatmap I
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
# BiocManager::install("ComplexHeatmap")
library("ComplexHeatmap")
https://jokergoo.github.io/ComplexHeatmap-reference/book/
?Heatmap
Heatmap {ComplexHeatmap} R Documentation Constructor method for Heatmap class ## Description Constructor method for Heatmap class ## Usage Heatmap(matrix, col, name, na_col = "grey", color_space = "LAB", rect_gp = gpar(col = NA), border = NA, border_gp = gpar(col = "black"), cell_fun = NULL, layer_fun = NULL, jitter = FALSE, row_title = character(0), row_title_side = c("left", "right"), row_title_gp = gpar(fontsize = 13.2), row_title_rot = switch(row_title_side[1], "left" = 90, "right" = 270), column_title = character(0), column_title_side = c("top", "bottom"), column_title_gp = gpar(fontsize = 13.2), column_title_rot = 0, cluster_rows = TRUE, cluster_row_slices = TRUE, clustering_distance_rows = "euclidean", clustering_method_rows = "complete", row_dend_side = c("left", "right"), row_dend_width = unit(10, "mm"), show_row_dend = TRUE, row_dend_reorder = is.logical(cluster_rows) || is.function(cluster_rows), row_dend_gp = gpar(), cluster_columns = TRUE, cluster_column_slices = TRUE, clustering_distance_columns = "euclidean", clustering_method_columns = "complete", column_dend_side = c("top", "bottom"), column_dend_height = unit(10, "mm"), show_column_dend = TRUE, column_dend_gp = gpar(), column_dend_reorder = is.logical(cluster_columns) || is.function(cluster_columns), row_order = NULL, column_order = NULL, row_labels = rownames(matrix), row_names_side = c("right", "left"), show_row_names = TRUE, row_names_max_width = unit(6, "cm"), row_names_gp = gpar(fontsize = 12), row_names_rot = 0, row_names_centered = FALSE, column_labels = colnames(matrix), column_names_side = c("bottom", "top"), show_column_names = TRUE, column_names_max_height = unit(6, "cm"), column_names_gp = gpar(fontsize = 12), column_names_rot = 90, column_names_centered = FALSE, top_annotation = NULL, bottom_annotation = NULL, left_annotation = NULL, right_annotation = NULL, km = 1, split = NULL, row_km = km, row_km_repeats = 1, row_split = split, column_km = 1, column_km_repeats = 1, column_split = NULL, gap = unit(1, "mm"), row_gap = unit(1, "mm"), column_gap = unit(1, "mm"), show_parent_dend_line = ht_opt$show_parent_dend_line, heatmap_width = unit(1, "npc"), width = NULL, heatmap_height = unit(1, "npc"), height = NULL, show_heatmap_legend = TRUE, heatmap_legend_param = list(title = name), use_raster = NULL, raster_device = c("png", "jpeg", "tiff", "CairoPNG", "CairoJPEG", "CairoTIFF", "agg_png"), raster_quality = 1, raster_device_param = list(), raster_resize_mat = FALSE, raster_by_magick = requireNamespace("magick", quietly = TRUE), raster_magick_filter = NULL, post_fun = NULL)
25 Heatmaps - ComplexHeatmap II
<- read.csv("data/DoxExpression.txt",row.names=1)
DoxEx
# order df based on padj
<- order(df$padj)
o head(df[o,])
## ensembl_id baseMean log2FoldChange lfcSE stat pvalue
## 1 ENSMUSG00000050711 10893.6671 2.855441 0.10948634 26.08034 6.09e-150
## 2 ENSMUSG00000025089 296.2968 4.421013 0.18660593 23.69170 4.39e-124
## 3 ENSMUSG00000022770 1967.0850 1.741321 0.07700282 22.61373 3.18e-113
## 4 ENSMUSG00000074480 1359.6740 -2.167605 0.10199629 -21.25180 3.17e-100
## 5 ENSMUSG00000024883 414.7163 3.518405 0.16605551 21.18812 1.23e-99
## 6 ENSMUSG00000040612 2986.8342 1.974114 0.09325275 21.16950 1.82e-99
## padj baseMean_DOX7D baseMean_DOX24H mgi_symbol
## 1 1.12e-145 21316.8751 2945.51899 Scg2
## 2 4.03e-120 626.6633 29.26926 Gfra1
## 3 1.94e-109 3080.2873 921.18292 Dlg1
## 4 1.46e-96 472.8374 2126.62645 Mex3a
## 5 4.51e-96 820.3147 71.59931 Rin1
## 6 5.58e-96 4687.6749 1193.30291 Ildr2
# select top 30 genes based on padj values
<- df[o,]$ensembl_id[1:30]
selectedGenes
head(selectedGenes)
## [1] "ENSMUSG00000050711" "ENSMUSG00000025089" "ENSMUSG00000022770"
## [4] "ENSMUSG00000074480" "ENSMUSG00000024883" "ENSMUSG00000040612"
# subset data of interest and convert it to matrix
<- as.matrix(DoxEx[selectedGenes,])
DoxExSelected <- t(scale(t(DoxExSelected)))
DEmatok
# print default heatmap
<- Heatmap(DEmatok)
ht draw(ht)
# change font size for the row names and row names
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6))
draw(ht)
# change colours
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = colorRampPalette(brewer.pal(8, "RdBu"))(25))
draw(ht)
# change colours - upregulated as red and downregulated as blue
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)))
draw(ht)
# change figure legend
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
heatmap_legend_param =
list(title = "Row z-score",
title_position = "topcenter"))
draw(ht)
# make it "slimmer"
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
heatmap_legend_param =
list(title = "Row z-score",width = unit(6, "mm"),
title_position = "topcenter",direction = "horizontal"))
draw(ht,heatmap_legend_side = "bottom")
# turn off "cluster by columns"
<- Heatmap(DEmatok,
ht row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
heatmap_legend_param =
list(title = "Row z-score",width = unit(6, "mm"),
title_position = "topcenter",direction = "horizontal"),
cluster_columns=F)
draw(ht,heatmap_legend_side = "bottom")
# create top annotation
<- data.frame(SampleNames=colnames(DEmatok))
anno $TimePoint <- sub("(.+)(_)(.+)","\\1",anno$SampleNames)
anno$TimePoint <- factor(anno$TimePoint)
anno<- brewer.pal(10, "Paired")[c(9:10)]
time_col names(time_col) <- levels(anno$TimePoint)
= HeatmapAnnotation(TimePoint = anno$TimePoint,
col_ha col = list(TimePoint = time_col))
<- Heatmap(DEmatok,
ht_top row_names_gp = gpar(fontsize = 6),
column_names_gp = gpar(fontsize = 6),
col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
heatmap_legend_param =
list(title = "Row z-score",width = unit(6, "mm"),
title_position = "topcenter",direction = "horizontal"),
cluster_columns=F,top_annotation = col_ha)
draw(ht_top,heatmap_legend_side = "bottom")
# remove column names and remove row names
<- Heatmap(DEmatok, row_names_gp = gpar(fontsize = 6),
ht_top_NoL col = rev(colorRampPalette(brewer.pal(8, "RdBu"))(25)),
heatmap_legend_param =
list(title = "Row z-score",width = unit(6, "mm"),
title_position = "topcenter",direction = "horizontal"),
cluster_columns=F,top_annotation = col_ha,
show_column_names = F,show_row_names = F)
draw(ht_top_NoL,heatmap_legend_side = "bottom")
- More details please see: https://jokergoo.github.io/ComplexHeatmap-reference/book/index.html
25.1 Saving your plots
useful functions: pdf(), png(), jpeg(), postscript(), svg()
plot object that you would like to save
close the file with dev.off()
svg() ?
## Usage svg(filename = if(onefile) "Rplots.svg" else "Rplot%03d.svg", width = 7, height = 7, pointsize = 12, onefile = FALSE, family = "sans", bg = "white", antialias = c("default", "none", "gray", "subpixel"), symbolfamily)
<- "heatmapCourseOutput.svg"
filename4ht svg(file=filename4ht, width = 4, height=5.5)
draw(ht_top_NoL,heatmap_legend_side = "bottom")
dev.off()
26 Exercise IV
27 Solutions IV
28 Plots of functional enrichment
Gene Ontology analysis was performed based on Up-regulated genes in Dox7D vs Dox24H using thresholds padj < 0.01 and log2(FC) > log2(2). Please see “data/GO_UpDE.csv”
28.1 Bar plot
- We use bar plot to show top 15 GO terms
<- read.csv("data/GO_UpDE.csv")
GOres # rank "GOres" based on "FDR.over"
<- GOres[order(GOres$FDR.over),]
GOres
# subset top 15 GO terms
<- GOres[c(1:15),]
GOres_top15
# basic bar plot
<- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) +
ori_bar geom_bar(stat="identity") +
xlab("-log10(FDR)")
ori_bar
# add colour
<- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) +
blue_bar geom_bar(stat="identity",fill="blue") +
xlab("-log10(FDR)") + theme_classic()
blue_bar
- present the GO terms based on their significance
# the "GOres_top15" already ordered by the "FDR.over" but it is not shown in the bar plot
head(GOres_top15)
## category over_represented_pvalue under_represented_pvalue numDEInCat
## 1 GO:0023052 4.98e-22 1 318
## 2 GO:0007154 5.35e-21 1 317
## 3 GO:0032501 5.47e-21 1 369
## 4 GO:0050896 2.89e-19 1 386
## 5 GO:0007165 1.00e-17 1 282
## 6 GO:0051716 3.41e-17 1 339
## numInCat term ontology FDR.over FDR.under
## 1 4390 signaling BP 9.90e-18 1
## 2 4436 cell communication BP 3.63e-17 1
## 3 5461 multicellular organismal process BP 3.63e-17 1
## 4 6054 response to stimulus BP 1.44e-15 1
## 5 3932 signal transduction BP 3.99e-14 1
## 6 5189 cellular response to stimulus BP 1.13e-13 1
# We use 'factor'
class(GOres_top15$term)
## [1] "character"
## version 1
<- GOres_top15$term
ordered_term head(GOres_top15$term)
## [1] "signaling" "cell communication"
## [3] "multicellular organismal process" "response to stimulus"
## [5] "signal transduction" "cellular response to stimulus"
$term <- factor(GOres_top15$term,levels=ordered_term)
GOres_top15
<- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) +
ordered_bar geom_bar(stat="identity",fill="blue") +
xlab("-log10(FDR)") + theme_classic()
ordered_bar
## version 2
<- rev(ordered_term)
ordered_term_rev head(ordered_term_rev)
## [1] "tissue development"
## [2] "system development"
## [3] "multicellular organism development"
## [4] "regulation of cell population proliferation"
## [5] "cell-cell signaling"
## [6] "synaptic signaling"
$term <- factor(GOres_top15$term,levels=ordered_term_rev)
GOres_top15
<- ggplot(GOres_top15,aes(x=-log10(FDR.over),y=term)) +
ordered_barRev geom_bar(stat="identity",fill="blue") +
xlab("-log10(FDR)") + theme_classic()
ordered_barRev
28.2 Bubble plot
- We can further define
geneRatio
as number of DE genes associated with GO process / total number of DE genes.
# calculate number of up.DE
<- read.csv("data/DiffEx.csv")
df <- df[!is.na(df$padj) & df$padj<0.01 & df$log2FoldChange> 1,]
upDE_df <- nrow(upDE_df)
num.UpDE
$geneRatio <- GOres_top15$numDEInCat/num.UpDE
GOres_top15head(GOres_top15)
## category over_represented_pvalue under_represented_pvalue numDEInCat
## 1 GO:0023052 4.98e-22 1 318
## 2 GO:0007154 5.35e-21 1 317
## 3 GO:0032501 5.47e-21 1 369
## 4 GO:0050896 2.89e-19 1 386
## 5 GO:0007165 1.00e-17 1 282
## 6 GO:0051716 3.41e-17 1 339
## numInCat term ontology FDR.over FDR.under
## 1 4390 signaling BP 9.90e-18 1
## 2 4436 cell communication BP 3.63e-17 1
## 3 5461 multicellular organismal process BP 3.63e-17 1
## 4 6054 response to stimulus BP 1.44e-15 1
## 5 3932 signal transduction BP 3.99e-14 1
## 6 5189 cellular response to stimulus BP 1.13e-13 1
## geneRatio
## 1 0.4555874
## 2 0.4541547
## 3 0.5286533
## 4 0.5530086
## 5 0.4040115
## 6 0.4856734
<- GOres_top15[order(GOres_top15$geneRatio),]
GOres_top15_bubble $term <- factor(GOres_top15_bubble$term,
GOres_top15_bubblelevels = GOres_top15_bubble$term)
<- ggplot(GOres_top15_bubble,aes(x=geneRatio,y=term)) +
bubblePlot geom_point(aes(size=numDEInCat,color=FDR.over)) +
xlab("geneRatio") + theme_classic()+
scale_colour_gradient(name = "FDR",
low = "#D55E00", high = "#0072B2")
bubblePlot
29 Saving your plots - ggsave
29.1 ggsave()
It defaults to saving the last plot that you displayed, using the size of the current graphics device
It also guesses the type of graphics device from the extension
device: “eps”, “ps”, “tex” (pictex), “pdf”, “jpeg”, “tiff”, “png”, “bmp”, “svg” or “wmf” (windows only)
Plot to save, defaults to last plot displayed.
ggsave() ?
## Usage ggsave(filename, plot = last_plot(), device = NULL, path = NULL, scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE, ...)
data(mtcars)
<- ggplot(mtcars, aes(mpg, wt)) + geom_point()
plot1 <- ggplot(mtcars, aes(mpg, wt,col=as.factor(vs))) + geom_point()
plot2
ggsave("mtcars_default.png", plot1)
ggsave("mtcars_col.png", plot2)
30 Exercise V
31 Solutions V
32 Resources
cheatsheet
https://www.rstudio.com/resources/cheatsheets/
+ Data visualization with ggplot2 cheatsheet + String manipulation with stringr cheatsheet
Useful information for colour palettes
https://www.datanovia.com/en/blog/the-a-z-of-rcolorbrewer-palette/
33 sessionInfo
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ComplexHeatmap_2.10.0 RColorBrewer_1.1-2 ggrepel_0.9.1
## [4] reshape2_1.4.4 ggplot2_3.3.5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8 circlize_0.4.13 png_0.1-7
## [4] assertthat_0.2.1 digest_0.6.29 foreach_1.5.1
## [7] utf8_1.2.2 R6_2.5.1 plyr_1.8.6
## [10] stats4_4.1.2 evaluate_0.14 highr_0.9
## [13] pillar_1.6.5 GlobalOptions_0.1.2 rlang_0.4.12
## [16] jquerylib_0.1.4 magick_2.7.3 S4Vectors_0.32.2
## [19] GetoptLong_1.0.5 rmarkdown_2.11 labeling_0.4.2
## [22] stringr_1.4.0 munsell_0.5.0 compiler_4.1.2
## [25] xfun_0.28 pkgconfig_2.0.3 BiocGenerics_0.40.0
## [28] shape_1.4.6 htmltools_0.5.2 tidyselect_1.1.1
## [31] tibble_3.1.6 bookdown_0.24 IRanges_2.28.0
## [34] codetools_0.2-18 matrixStats_0.61.0 fansi_1.0.2
## [37] crayon_1.4.2 dplyr_1.0.7 withr_2.4.3
## [40] jsonlite_1.7.3 gtable_0.3.0 lifecycle_1.0.1
## [43] DBI_1.1.2 magrittr_2.0.1 scales_1.1.1
## [46] rmdformats_1.0.3 stringi_1.7.6 farver_2.1.0
## [49] doParallel_1.0.16 bslib_0.3.1 ellipsis_0.3.2
## [52] generics_0.1.1 vctrs_0.3.8 rjson_0.2.21
## [55] iterators_1.0.13 tools_4.1.2 glue_1.6.1
## [58] purrr_0.3.4 parallel_4.1.2 fastmap_1.1.0
## [61] yaml_2.2.2 clue_0.3-60 colorspace_2.0-2
## [64] cluster_2.1.2 knitr_1.36 sass_0.4.0