These exercises cover the sections on Plotting in R (PlottingInR).
The paired-end RNAseq data for our case study were aligned to the mouse genome (mm9) using TopHat2. The alignment summary can be found in "data/tophat_alignment_overallstat.txt",
with the following columns.
num.aligned.pairs = num.uniquely_mapped.pairs + num.multiple_mapped.pairs + num.discordant_mapped.pairs
Use the read.table() function to load tophat_alignment_overallstat.txt into aligned_res
head(aligned_res)
## sample num.input.pairs num.aligned.pairs num.uniquely_mapped.pairs
## 1 DOX24H_1 41113500 30527725 29078260
## 2 DOX24H_2 38166396 29575699 28188066
## 3 DOX24H_3 42506262 31376922 29572848
## 4 DOX7D_1 46156954 34147421 32392402
## 5 DOX7D_2 40372425 32876374 31409139
## 6 DOX7D_3 46160430 34722360 32896496
## num.multiple_mapped.pairs num.discordant_mapped.pairs
## 1 1249793 199672
## 2 1173844 213789
## 3 1553296 250778
## 4 1551960 203059
## 5 1201468 265767
## 6 1615952 209912
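The loading step can be sketched as follows. The separator of the real file is not stated in the exercise, so the tab separator here is an assumption, and the temporary file is only a stand-in built from two rows of the head() output so that the snippet runs on its own.

```r
# Stand-in for data/tophat_alignment_overallstat.txt (two rows copied from
# the head() output). The tab separator is an assumption about the real file.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  paste("sample", "num.input.pairs", "num.aligned.pairs",
        "num.uniquely_mapped.pairs", "num.multiple_mapped.pairs",
        "num.discordant_mapped.pairs", sep = "\t"),
  paste("DOX24H_1", 41113500, 30527725, 29078260, 1249793, 199672, sep = "\t"),
  paste("DOX24H_2", 38166396, 29575699, 28188066, 1173844, 213789, sep = "\t")
), tmp)

# header = TRUE keeps the first line as column names
aligned_res <- read.table(tmp, header = TRUE, sep = "\t",
                          stringsAsFactors = FALSE)
head(aligned_res)
```

For the real exercise, replace `tmp` with the path to the alignment summary and adjust `sep` if the file is not tab-delimited.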
Add a new column 'num.unaligned.pairs' to aligned_res
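One way to add the column is sketched below. It assumes that the unaligned pairs are the input pairs minus the aligned pairs; the exercise does not spell out this definition. The small data frame repeats two rows from the head() output so the snippet is self-contained.

```r
# Two rows copied from the head() output above
aligned_res <- data.frame(
  sample = c("DOX24H_1", "DOX24H_2"),
  num.input.pairs = c(41113500, 38166396),
  num.aligned.pairs = c(30527725, 29575699),
  stringsAsFactors = FALSE
)

# Assumption: unaligned pairs = input pairs - aligned pairs
aligned_res$num.unaligned.pairs <-
  aligned_res$num.input.pairs - aligned_res$num.aligned.pairs

aligned_res
```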
create the bar plot below using geom_bar(stat = "identity",position=position_dodge())
hints:
you need to convert the 'wide' data.frame to a 'long' data.frame using the melt() function from the reshape2 package
check ?geom_bar and see the description for position, and see what happens after you put position=position_dodge() in geom_bar()
The x-axis labels can be rotated 45 degrees by adding the layer theme(axis.text.x = element_text(angle=45, hjust=1))
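The hints above can be combined into a minimal sketch like the one below. Which columns the target plot shows is not recoverable from the text, so the choice of the four categories in `plot_cols` is an assumption, as are the names `category` and `num.pairs` for the melted columns. The data frame holds two rows from the head() output.

```r
library(ggplot2)
library(reshape2)

# Two rows copied from the head() output above
aligned_res <- data.frame(
  sample = c("DOX24H_1", "DOX24H_2"),
  num.input.pairs = c(41113500, 38166396),
  num.aligned.pairs = c(30527725, 29575699),
  num.uniquely_mapped.pairs = c(29078260, 28188066),
  num.multiple_mapped.pairs = c(1249793, 1173844),
  num.discordant_mapped.pairs = c(199672, 213789),
  stringsAsFactors = FALSE
)
# Assumption: unaligned pairs = input pairs - aligned pairs
aligned_res$num.unaligned.pairs <-
  aligned_res$num.input.pairs - aligned_res$num.aligned.pairs

# Assumption: these are the categories to plot
plot_cols <- c("sample", "num.uniquely_mapped.pairs",
               "num.multiple_mapped.pairs", "num.discordant_mapped.pairs",
               "num.unaligned.pairs")

# Wide -> long: one row per sample/category combination
aligned_long <- melt(aligned_res[, plot_cols], id.vars = "sample",
                     variable.name = "category", value.name = "num.pairs")

# position_dodge() places the bars of one sample side by side
p_dodge <- ggplot(aligned_long,
                  aes(x = sample, y = num.pairs, fill = category)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Printing `p_dodge` draws the plot. Without `position = position_dodge()`, geom_bar() stacks the categories of each sample into a single bar.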
geom_bar(position = "fill")
factor data type works
facet_grid()
?facet_grid
facet_wrap()
?facet_wrap
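The pieces named above (position = "fill", factor levels, facet_grid() and facet_wrap()) are commonly used as in the sketch below. This is only an illustration: the toy values, the `category` labels and the `timepoint` column derived from the sample names are assumptions, not part of the exercise data.

```r
library(ggplot2)

# Toy long-format data; aligned and unaligned counts derived from the
# head() output, assuming unaligned = input - aligned
aligned_long <- data.frame(
  sample = rep(c("DOX7D_1", "DOX24H_1"), each = 2),
  category = rep(c("aligned", "unaligned"), times = 2),
  num.pairs = c(34147421, 12009533, 30527725, 10585775),
  stringsAsFactors = FALSE
)

# Hypothetical grouping column: sample name without the replicate suffix
aligned_long$timepoint <- sub("_[0-9]+$", "", aligned_long$sample)

# Factor levels control the order in which samples appear on the x-axis
aligned_long$sample <- factor(aligned_long$sample,
                              levels = c("DOX24H_1", "DOX7D_1"))

# position = "fill" rescales every bar to height 1 (proportions)
p_fill <- ggplot(aligned_long,
                 aes(x = sample, y = num.pairs, fill = category)) +
  geom_bar(stat = "identity", position = "fill")

# One panel per timepoint, laid out by facet_wrap() or facet_grid()
p_wrap <- p_fill + facet_wrap(~ timepoint, scales = "free_x")
p_grid <- p_fill + facet_grid(. ~ timepoint, scales = "free_x")
```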
Gene Set Enrichment Analysis (GSEA) is a popular functional analysis tool for bulk RNAseq data. For more details, see [https://www.gsea-msigdb.org/gsea/index.jsp]. The file "data/GSEA_hallmark_Dox7D_Minus_Dox1D.csv"
summarizes the GSEA results for the bulk RNAseq data based on Dox_7D vs Dox_24h using the hallmark gene sets.
GSEA_hallmark_Dox7D_Minus_Dox1D.csv contains the following columns:
For more details, see [http://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?_Interpreting_GSEA_Results]
Please create the bubble plot below with the following criteria:
use the 50 hallmark gene sets as the y-axis and NES as the x-axis.
if NES > 0, order the gene sets by NES from largest to smallest
if NES < 0, order the gene sets by NES from smallest to largest
use FDR < 0.25 as the threshold to show significant gene sets with a solid circle: scale_shape_manual(values=c(1,19)), where 1 is a hollow circle and 19 is a solid circle
use continuous colours to show FDR.q.val: scale_colour_gradient(name = "FDR.q.val",low = "#D55E00", high = "#0072B2")
use the circle size to show no.genes.in.Core
Then save it as a PDF with width=8 and height=10
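The criteria above could be implemented roughly as follows. The toy rows, the gene set column name `NAME`, the helper column `significant` and the output file name are all hypothetical. The ordering is one reading of the rule: positive sets listed top to bottom from largest to smallest NES, followed by negative sets from smallest to largest.

```r
library(ggplot2)

# Hypothetical toy rows; the real table comes from
# data/GSEA_hallmark_Dox7D_Minus_Dox1D.csv. 'NAME' is an assumed column name.
gsea <- data.frame(
  NAME = c("HALLMARK_A", "HALLMARK_B", "HALLMARK_C", "HALLMARK_D"),
  NES = c(2.1, 1.3, -1.8, -0.9),
  FDR.q.val = c(0.01, 0.30, 0.05, 0.60),
  no.genes.in.Core = c(40, 15, 32, 10),
  stringsAsFactors = FALSE
)

# TRUE for gene sets below the FDR threshold (drawn as solid circles)
gsea$significant <- gsea$FDR.q.val < 0.25

# Order gene sets per sign of NES (assumes no NES is exactly 0)
pos <- gsea[gsea$NES > 0, ]
neg <- gsea[gsea$NES < 0, ]
top_to_bottom <- c(pos$NAME[order(-pos$NES)], neg$NAME[order(neg$NES)])
# ggplot2 draws the first factor level at the bottom, hence rev()
gsea$NAME <- factor(gsea$NAME, levels = rev(top_to_bottom))

p_bubble <- ggplot(gsea, aes(x = NES, y = NAME, colour = FDR.q.val,
                             size = no.genes.in.Core, shape = significant)) +
  geom_point() +
  scale_shape_manual(values = c(1, 19)) +  # FALSE -> hollow, TRUE -> solid
  scale_colour_gradient(name = "FDR.q.val",
                        low = "#D55E00", high = "#0072B2")

# Written to a temporary directory here; use your own path in practice
out_file <- file.path(tempdir(), "gsea_bubble.pdf")
ggsave(out_file, plot = p_bubble, width = 8, height = 10)
```

Because `significant` is logical, scale_shape_manual() assigns the first value (1) to FALSE and the second (19) to TRUE.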