css: Rpress.css author: MRC London Institute of Medical Sciences (LMS) date: https://lmsbioinformatics.github.io/LMS_PlottingInR/ width: 1440 height: 1100 autosize: true font-import: font-family: ‘Slabo 27px’, serif;
id: materials
All prerequisites, links to material, and slides for this course can be found on GitHub.
* PlottingInR

They can also be downloaded as a zip archive.
* Download zip
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You can navigate to the unarchived LMS_PlottingInR/course folder from the RStudio menu:
Session -> Set Working Directory -> Choose Directory
Use getwd() to see where your current directory is
getwd()
Use setwd() to set the working directory
setwd("/PathToMyDownload/LMS_PlottingInR/course")
# if you are working with Mac
# e.g. setwd("~/Downloads/LMS_PlottingInR/course")
# if you are working with Windows
# e.g. setwd("~\\Downloads\\LMS_PlottingInR\\course")
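The getwd()/setwd() steps above can be sketched as a small, defensive snippet. The folder used here is a hypothetical stand-in created under tempdir() so the example runs anywhere; replace course_dir with the path to your own unarchived LMS_PlottingInR/course folder.

```r
# Hypothetical location (an assumption for illustration only)
course_dir <- file.path(tempdir(), "LMS_PlottingInR", "course")
# Created here only so the sketch is runnable; your real folder already exists
dir.create(course_dir, recursive = TRUE, showWarnings = FALSE)

if (dir.exists(course_dir)) {
  old_dir <- setwd(course_dir)  # setwd() invisibly returns the previous directory
  print(getwd())                # confirm where we are now
  setwd(old_dir)                # restore the original working directory
} else {
  message("Folder not found: ", course_dir)
}
```

Checking dir.exists() first gives a clearer message than the error setwd() raises when the path is wrong.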
type:section id:plotting
R has excellent graphics and plotting capabilities. They are mostly found in the following three sources.
+ base graphics
  - easy to use, conceptually motivated by drawing on a canvas
  - becomes difficult or impossible to use for complicated plots
+ the lattice package
+ the ggplot2 package
  - high-level approach: grammar of graphics
saving your plots
Useful Resources
Summary
The lattice and ggplot2 packages are built on the grid graphics package, while the base graphics routines adopt a pen-and-paper model for plotting.
We will start with base graphics and then focus on ggplot2.
type:section id: baseGraph
First we’ll create a small data.frame and produce a very simple graph from its values:
base_graph_df<- data.frame(sample_num=c(1:6),
treatment=c(0.02,1.8, 17.5, 55,75.7, 80),
control= c(0, 20, 40, 60, 80,100))
base_graph_df
sample_num treatment control
1 1 0.02 0
2 2 1.80 20
3 3 17.50 40
4 4 55.00 60
5 5 75.70 80
6 6 80.00 100
Plot the treatment with default parameters
?plot
* Usage
plot(x, y, ...)
plot(x=base_graph_df$sample_num, y=base_graph_df$treatment)
# or just
plot(base_graph_df$sample_num, base_graph_df$treatment)
What will happen if we change the order of the arguments?
from
plot(base_graph_df$sample_num, base_graph_df$treatment)
to
plot(base_graph_df$treatment, base_graph_df$sample_num)
plot(base_graph_df$treatment,base_graph_df$sample_num)
========================================================
plot(y= base_graph_df$treatment, x= base_graph_df$sample_num)
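Because plot() matches unnamed arguments by position, swapping them swaps the axes, while named arguments can be given in any order. A minimal sketch of the alternatives follows; it rebuilds the same data.frame so that it is self-contained, and it also shows the formula interface of plot(), which is not used elsewhere in these slides.

```r
base_graph_df <- data.frame(sample_num = 1:6,
                            treatment  = c(0.02, 1.8, 17.5, 55, 75.7, 80),
                            control    = c(0, 20, 40, 60, 80, 100))

# Named arguments: the order in which they are written does not matter
plot(y = base_graph_df$treatment, x = base_graph_df$sample_num)

# Formula interface: read as "treatment as a function of sample_num";
# column names are looked up in the data argument
plot(treatment ~ sample_num, data = base_graph_df)
```

Both calls put sample_num on the x axis and treatment on the y axis.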
=======================================================
Now, let’s add a title, a line to connect the points, and some colour:
Plot treatment using blue points overlaid by a line
hint: look into the “type” argument
?plot
type: what type of plot should be drawn. Possible types are
"p" for points,
"l" for lines,
"b" for both,
"c" for the lines part alone of "b",
"o" for both ‘overplotted’,
"h" for ‘histogram’ like (or ‘high-density’) vertical lines,
"s" for stair steps,
"S" for other steps, see ‘Details’ below,
"n" for no plotting.
=======================================================
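The type values listed above are easier to compare side by side. The following is a rough sketch that draws the treatment values once per type in a 3 x 3 grid; the variable names x, y and plot_types are introduced here for illustration only.

```r
x <- 1:6
y <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
plot_types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")

# par() returns the previous settings, so they can be restored afterwards
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (tp in plot_types) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)
```

The last panel (type "n") shows only the axes, which is useful when you want to set up a plot region and add everything yourself.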
plot(base_graph_df$sample_num,base_graph_df$treatment, type="o", col="blue")
Create a title with a red, bold/italic font
hint: 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol
title(main="Treatment", col.main="red", font.main=4)
========================================================
Now let’s add a red line for the control column of the data.frame base_graph_df and specify the y-axis range directly so it will be large enough to fit the data:
base_graph_df$control
[1] 0 20 40 60 80 100
plot(base_graph_df$sample_num,base_graph_df$treatment, type="o", col="blue", ylim=c(0,100))
lines(base_graph_df$control, type="o", pch=0, lty="dashed", col="red")
title(main="Expression Data", col.main="red", font.main=4)
==========================================================
lty can be given by name, one of c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash"), or by the corresponding number c(0, 1, 2, 3, 4, 5, 6)
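To see how the lty names and numbers line up, one option is to draw each line type at a height equal to its number. This is only an illustrative sketch; lty_names is a helper vector introduced here.

```r
lty_names <- c("blank", "solid", "dashed", "dotted",
               "dotdash", "longdash", "twodash")

old_par <- par(mar = c(2, 8, 2, 1))  # wider left margin for the labels
plot(c(0, 1), c(0, 6), type = "n", xlab = "", ylab = "", yaxt = "n",
     main = "lty by number and by name")
axis(2, at = 0:6, labels = paste(0:6, lty_names), las = 1)
for (i in 0:6) {
  abline(h = i, lty = i)  # same style as lty = lty_names[i + 1]
}
par(old_par)
```

The row for 0 ("blank") stays empty because that line type draws nothing.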
Next let’s change the axes labels to match our data and add a legend.
We’ll also compute the y-axis limits using the range function so any changes to our data will be automatically reflected in our graph.
g_range <- range(0, base_graph_df$treatment, base_graph_df$control)
g_range
[1] 0 100
range returns a vector containing the minimum and maximum of all the given arguments.
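A short sketch of why range() is handy here: the limits follow the data, so adding a larger value widens the axis without editing the plotting code. The extra value 130 is made up for illustration.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)

# Including 0 forces the axis to start at zero even if all values are positive
g_range <- range(0, treatment, control)
g_range                                  # 0 100

# A hypothetical new measurement above the old maximum
treatment_new <- c(treatment, 130)
range(0, treatment_new, control)         # 0 130

# Tick positions derived from the limits
seq(g_range[1], g_range[2], by = 20)     # 0 20 40 60 80 100
```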
plot(base_graph_df$sample_num ,base_graph_df$treatment,
type="o", col="blue",
ylim=g_range,axes=FALSE, ann=FALSE)
========================================================
Make the x axis, using the sample numbers as labels
axis(1, at=1:6, labels=base_graph_df$sample_num)
Make the y axis with horizontal labels and tick marks every 20 units.
axis(2, las=1, at=seq(g_range[1],g_range[2],20))
Create box around plot
box()
========================================================
Plot control vector with red dashed line and square points
lines(base_graph_df$control, type="o", pch=0, lty=2, col="red")
Create a title with a red, bold/italic font
title(main="Expression Data", col.main="red", font.main=4)
Label the x and y axes with purple text
title(xlab="Samples", col.lab="purple")
title(ylab="Values", col.lab="purple")
========================================================
Create a legend at (1, g_range[2]) that is slightly smaller (cex) and uses the same line colors and points used by the actual plots
legend(1, g_range[2], c("treatment","control"), cex=0.8, col=c("blue","red"), pch=1:0, lty=1:2)
type:section id:ggplot2
ggplot2 is a powerful R package based on the grammar of graphics (Wilkinson, 2005).
“In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars).” - Wickham, 2016
We’ll use the GoT_dataset that was kindly provided by Dr. Reidar P. Lystad. Inj Epidemiol. 2018. 5(1):44. doi: 10.1186/s40621-018-0174-7.
We will use "Got_dataset/episode_data.csv"
and "Got_dataset/subset_GoT.csv"
today.
We can use the head function to look at the first few rows of file "episode_data.csv"
[csv: comma-separated values file]
library(ggplot2)
episode_data<-read.csv("GoT_dataset/episode_data.csv")
# show first 6 rows for this dataset
head(episode_data)
season episode_number episode_name
1 1 1 "Winter Is Coming"
2 1 2 "The Kingsroad"
3 1 3 "Lord Snow"
4 1 4 "Cripples, Bastards, and Broken Things"
5 1 5 "The Wolf and the Lion"
6 1 6 "A Golden Crown"
gross_running_time opening_credits_time closing_credits_time
1 3546 110 33
2 3182 111 34
3 3294 96 27
4 3201 96 26
5 3123 101 24
6 3027 103 26
net_running_time cumulative_net_running_time
1 3403 3403
2 3037 6440
3 3171 9611
4 3079 12690
5 2998 15688
6 2898 18586
str() is another useful function; it shows the structure of the episode_data object
str(episode_data)
'data.frame': 73 obs. of 8 variables:
$ season : int 1 1 1 1 1 1 1 1 1 1 ...
$ episode_number : int 1 2 3 4 5 6 7 8 9 10 ...
$ episode_name : Factor w/ 73 levels " \"What Is Dead May Never Die\"",..: 71 47 25 13 65 2 73 56 6 17 ...
$ gross_running_time : int 3546 3182 3294 3201 3123 3027 3325 3345 3238 3028 ...
$ opening_credits_time : int 110 111 96 96 101 103 105 105 116 116 ...
$ closing_credits_time : int 33 34 27 26 24 26 26 28 32 32 ...
$ net_running_time : int 3403 3037 3171 3079 2998 2898 3194 3212 3090 2880 ...
$ cumulative_net_running_time: int 3403 6440 9611 12690 15688 18586 21780 24992 28082 30962 ...
========================================================
A ggplot2 plot has three key components:
1. data,
2. aesthetic mappings between variables in the data and visual properties, and
3. layer: usually created with a geom function.
?ggplot
ggplot(data = NULL, mapping = aes(), ...,
environment = parent.frame())
========================================================
Use ggplot2’s ggplot() function to set up the data and aesthetic mappings
g<-ggplot(data=episode_data,
aes(x=gross_running_time,y=net_running_time))
print(g)
g<-ggplot(data=episode_data,
aes(x=gross_running_time,y=net_running_time))
g + geom_point()
ghis<-ggplot(data=episode_data, aes(x=net_running_time))
ghis + geom_histogram()
change binwidth
#ghis<-ggplot(data=episode_data, aes(x=net_running_time))
ghis + geom_histogram(binwidth=200)
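When neither binwidth nor bins is given, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better value. The sketch below contrasts the two ways of controlling the bins; it uses the built-in mtcars data as a stand-in for episode_data (an assumption, so that it runs without the course files).

```r
library(ggplot2)

h <- ggplot(data = mtcars, aes(x = mpg))

# Fix the width of each bar in data units
p_width <- h + geom_histogram(binwidth = 5)

# Or fix the number of bars and let ggplot2 work out the width
p_bins <- h + geom_histogram(bins = 10)

print(p_width)
print(p_bins)
```

binwidth is usually the better choice when the units are meaningful, for example 200 seconds of running time as in the slide above.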
add geom_density layer
ghis<-ggplot(data=episode_data, aes(x=net_running_time))
ghis + geom_density()
ghis<- ggplot(data=episode_data,
aes(x=net_running_time,fill=as.factor(season)))
ghis + geom_density()
ghis<- ggplot(data=episode_data,
aes(x=net_running_time,fill=as.factor(season)))
ghis + geom_density(alpha=0.25)
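In the density plots above, fill sits inside aes() and is therefore mapped from the data (one colour per season), whereas alpha sits outside aes() and is a fixed value applied to every curve. A small sketch of the difference, again using mtcars as a stand-in dataset (an assumption for self-containment):

```r
library(ggplot2)

# fill is mapped from a variable, alpha is a constant
p_mapped <- ggplot(mtcars, aes(x = mpg, fill = as.factor(cyl))) +
  geom_density(alpha = 0.25)

# both fill and alpha are constants, so there is no legend
p_fixed <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(fill = "steelblue", alpha = 0.25)

print(p_mapped)
print(p_fixed)
```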
add geom_bar layer
episode_data$season<-as.factor(episode_data$season)
gbar<- ggplot(data=episode_data, aes(x=season))
gbar + geom_bar()
use xlab(), ylab(), and ggtitle()
gbar<- ggplot(data=episode_data, aes(x=season))
gbar + geom_bar() +
xlab("Season")+ ylab("Number of episodes")+ ggtitle("Bar plot")
or just use labs()
gbar<- ggplot(data=episode_data, aes(x=season))
gbar + geom_bar() +
labs(x="Season",y="Number of episodes",title="Bar plot")
swap the x and y axes with coord_flip() to get horizontal bars
gbar + geom_bar() + coord_flip()
use the "Got_dataset/subset_GoT.csv" dataset
subset_GoT<-read.csv(file="Got_dataset/subset_GoT.csv")
# use head function to see first few rows (default = 6)
# we use the argument n=4 to limit the number of rows to be shown
head(subset_GoT, n=4)
id name sex religion occupation social_status
1 100 Waymar Royce M Unknown/Unclear Boiled leather collar Lowborn
2 101 Gared Tuttle M Unknown/Unclear Boiled leather collar Lowborn
3 102 Will M Unknown/Unclear Boiled leather collar Lowborn
4 103 Irri F Great Stallion Boiled leather collar Lowborn
allegiance_last allegiance_switched dth_flag exp_time_sec exp_time_hrs
1 Night's Watch N 1 342 0.10
2 Night's Watch N 1 405 0.11
3 Night's Watch N 1 692 0.19
4 Targaryen Y 1 48489 13.47
str(subset_GoT)
'data.frame': 359 obs. of 11 variables:
$ id : int 100 101 102 103 104 105 106 107 108 109 ...
$ name : Factor w/ 357 levels "Adrack Humble",..: 341 89 344 125 136 33 262 266 260 76 ...
$ sex : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 2 ...
$ religion : Factor w/ 8 levels "Drowned God",..: 8 8 8 3 6 6 2 8 6 6 ...
$ occupation : Factor w/ 3 levels "Boiled leather collar",..: 1 1 1 1 1 2 2 1 3 2 ...
$ social_status : Factor w/ 2 levels "Highborn","Lowborn": 2 2 2 2 1 1 1 2 1 1 ...
$ allegiance_last : Factor w/ 9 levels "Bolton","Frey",..: 5 5 5 8 5 7 7 7 7 6 ...
$ allegiance_switched: Factor w/ 2 levels "N","Y": 1 1 1 2 2 1 1 1 1 1 ...
$ dth_flag : int 1 1 1 1 0 0 1 1 1 1 ...
$ exp_time_sec : int 342 405 692 48489 230347 230347 87621 45722 176937 27606 ...
$ exp_time_hrs : num 0.1 0.11 0.19 13.47 63.99 ...
add geom_boxplot layer
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs))+
geom_boxplot()
add another layer
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs))+
geom_boxplot() + geom_point()
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs))+
geom_boxplot() + geom_point(position = position_jitter())
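position_jitter() adds random noise, so the points land somewhere different every time the plot is drawn. The sketch below shows one way to make the jitter reproducible and to keep it horizontal only; it uses the built-in iris data as a stand-in for subset_GoT (an assumption), and the width, seed and alpha values are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  # hide the boxplot's own outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # jitter sideways only (height = 0) and fix the random seed
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
             alpha = 0.5)

print(p)
```

Setting height = 0 matters for a plot like this: vertical jitter would move the points away from their true y values.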
map fill to occupation to split each box by occupation
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs,fill=occupation))+
geom_boxplot()
add facet_wrap() layer
facet_wrap(~variable)
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs,fill=occupation))+
geom_boxplot()+facet_wrap(~sex)
add facet_grid() layer
facet_grid(Rows.for.var1~Columns.for.var2)
ggplot(data=subset_GoT,aes(x=social_status,y=exp_time_hrs,fill=occupation))+
geom_boxplot()+facet_grid(dth_flag~sex)
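With facet_grid() the strip labels show only the values (for example 0 and 1 for dth_flag), which can be hard to read. One possible remedy, sketched here on the built-in mtcars data (a stand-in, not the course dataset), is the label_both labeller, which prints the variable name alongside each value.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # rows are split by am, columns by cyl; strips read "am: 0", "cyl: 4", ...
  facet_grid(am ~ cyl, labeller = label_both)

print(p)
```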
ggsave() defaults to saving the last plot that you displayed, using the size of the current graphics device.
It also guesses the type of graphics device from the file extension.
device: "eps", "ps", "tex" (pictex), "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf" (Windows only)
data(mtcars)
ggplot(mtcars, aes(mpg, wt)) + geom_point()
ggsave("mtcars_default.pdf")
ggsave("mtcars.pdf", width = 4, height = 4)
plot: the plot to save, defaults to the last plot displayed.
ggsave(filename, plot = last_plot(), device = NULL, path = NULL, scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"), dpi = 300, limitsize = TRUE, ...)
data(mtcars)
plot1<-ggplot(mtcars, aes(mpg, wt)) + geom_point()
plot2<-ggplot(mtcars, aes(mpg, wt,col=as.factor(vs))) + geom_point()
ggsave("mtcars_default.png",plot1)
ggsave("mtcars_col.png",plot2)
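Passing the plot object and the size explicitly makes the saved file independent of whatever happens to be on screen. A minimal sketch, writing to a temporary folder so nothing is left in the working directory; the file name and the 10 x 8 cm size are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()

# tempdir() is used only to keep the example tidy
out_file <- file.path(tempdir(), "mtcars_small.pdf")

# Explicit plot, size and units instead of relying on the current device
ggsave(out_file, plot = p, width = 10, height = 8, units = "cm")

file.exists(out_file)
```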
type:section id:useful
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html
Base Graphic 1
base_graph_df
sample_num treatment control
1 1 0.02 0
2 2 1.80 20
3 3 17.50 40
4 4 55.00 60
5 5 75.70 80
6 6 80.00 100
Base Graphic 2
plot(base_graph_df$sample_num ,base_graph_df$treatment, type="o", col="blue", ylim=g_range,axes=FALSE, ann=FALSE)
axis(1, at=1:6, labels=base_graph_df$sample_num)
axis(2, las=1, at=seq(g_range[1],g_range[2],20))
box()
lines(base_graph_df$control, type="o", pch=0, lty=2, col="red")
title(main="Expression Data", col.main="red", font.main=4)
title(xlab="Samples", col.lab="purple")
title(ylab="Values", col.lab="purple")
legend(1, g_range[2], c("treatment","control"), cex=0.8, col=c("blue","red"), pch=1:0, lty=1:2)
ggplot2 - prepare the data.frame
# convert the data.frame into the long format that ggplot2 expects
# install.packages("reshape2")
library("reshape2")
base_graph_4gg<-melt(base_graph_df, id.vars="sample_num")
base_graph_4gg$variable<-relevel(base_graph_4gg$variable,ref="control")
head(base_graph_4gg,n=10)
sample_num variable value
1 1 treatment 0.02
2 2 treatment 1.80
3 3 treatment 17.50
4 4 treatment 55.00
5 5 treatment 75.70
6 6 treatment 80.00
7 1 control 0.00
8 2 control 20.00
9 3 control 40.00
10 4 control 60.00
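To make clear what melt() does here, the same long table can be built by hand in base R: the two measurement columns are stacked into one value column, and a variable column records where each value came from. This is only a sketch for understanding, not a replacement for the reshape2 call above; long_df is a name introduced for illustration. (reshape2 is a retired package; tidyr's pivot_longer() is a commonly used alternative.)

```r
base_graph_df <- data.frame(sample_num = 1:6,
                            treatment  = c(0.02, 1.8, 17.5, 55, 75.7, 80),
                            control    = c(0, 20, 40, 60, 80, 100))

n <- nrow(base_graph_df)
long_df <- data.frame(
  sample_num = rep(base_graph_df$sample_num, times = 2),
  # "control" first, mirroring the relevel() call above
  variable   = factor(rep(c("treatment", "control"), each = n),
                      levels = c("control", "treatment")),
  value      = c(base_graph_df$treatment, base_graph_df$control)
)

head(long_df, n = 10)
```

Each of the 6 samples now appears twice, once per variable, giving 12 rows.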
ggplot2 - plot the figure with default settings
library("ggplot2")
ggplot(base_graph_4gg,aes(x=sample_num,y=value,col=variable,group=variable)) +
geom_point(aes(shape=variable))+
geom_line(aes(linetype=variable))+
labs(title="Expression Data",x ="Sample", y = "Values")
ggplot2 - plot the figure that matches base graphics
ggplot(base_graph_4gg,aes(x=sample_num,y=value,col=variable,group=variable)) +
geom_point(aes(shape=variable))+
geom_line(aes(linetype=variable))+
scale_color_manual(values=c("red", "blue"))+
scale_shape_manual(values=c(0,1))+
scale_linetype_manual(values=c("dashed","solid"))+
labs(title="Expression Data",x ="Sample", y = "Values")+
theme_classic()+
theme(plot.title = element_text(colour = "red",face="bold.italic",hjust = 0.5),
axis.title = element_text(colour = "purple"))