Introduction to R, Session 2
2 Recap on what we have covered.
Session 1 covered an introduction to R data types and inputting data.
3 Recap (1/2)
R stores data in five main data types.
- Vector - Ordered collection of single data type (numeric/character/logical).
- Matrix - Table (ordered 2D collection) of single data type (numeric/character/logical).
- Factors - Ordered collection of ordinal or nominal categories.
- Data frame - Table (ordered 2D array) whose columns can be of different data types but must all be of the same length.
- List - Ordered collection of elements that can be of different data types and differing lengths.
4 Recap (2/2)
Data can be read into R as a table with the read.table() function and written to file with the write.table() function.
Table <- read.table("data/readThisTable.csv",sep=",",header=T,row.names=1)
Table[1:3,]
## Sample_1.hi Sample_2.hi Sample_3.hi Sample_4.low Sample_5.low
## Gene_a 4.570237 3.230467 3.351827 3.930877 4.098247
## Gene_b 3.561733 3.632285 3.587523 4.185287 1.380976
## Gene_c 3.797274 2.874462 4.016916 4.175772 1.988263
## Sample_1.low
## Gene_a 4.418726
## Gene_b 5.936990
## Gene_c 3.780917
write.table(Table,file="data/writeThisTable.csv", sep=",", row.names =F,col.names=T)
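A self-contained round trip can be sketched without the course's data folder. The small table below and the use of tempfile() are assumptions for illustration. It writes row names with row.names = TRUE and col.names = NA (a blank header over the row-name column), so read.table(..., header = TRUE, row.names = 1) can restore the gene names. Note that the write.table() call above uses row.names = F, which leaves the gene names out of the written file.

```r
# Hypothetical two-gene table (not the contents of readThisTable.csv)
exampleTable <- data.frame(Sample_1.hi = c(4.5, 3.5),
                           Sample_2.hi = c(3.2, 3.6),
                           row.names = c("Gene_a", "Gene_b"))

# Write to a temporary file so no course files are needed
tmpFile <- tempfile(fileext = ".csv")
write.table(exampleTable, file = tmpFile, sep = ",",
            row.names = TRUE, col.names = NA)

# Read it back, using the first column as row names
readBack <- read.table(tmpFile, sep = ",", header = TRUE, row.names = 1)
readBack
```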
5 Conditions and Loops
6 Conditions and Loops (1/21)
We have looked at using logical vectors as a way to index other data types.
x <- 1:10
x[x < 4]
## [1] 1 2 3
Logicals are also used in controlling how scripted procedures execute.
7 Conditions and Loops (2/21) - Two important control structures
- Conditional branching (if,else)
- Loops (for, while)
While I’m analysing data, if I need to execute complex statistical procedures on the data I will use R else I will use a calculator.
8 Conditions and Loops (3/21) - Conditional Branching.
Conditional Branching is the evaluation of a logical to determine whether a chunk of code is executed.
In R, we use the if statement with the logical to be evaluated in () and dependent code to be executed in {}.
x <- TRUE
if(x){
  message("x is true")
}
## x is true
x <- FALSE
if(x){
  message("x is true")
}
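if() expects a single TRUE or FALSE. When the logical you have is a vector (or might be NA), it can be collapsed first. This sketch uses the base functions any(), all() and isTRUE() on made-up values.

```r
values <- c(2, 7, 11)

# any(): TRUE if at least one element is TRUE
if(any(values > 10)){
  message("At least one value is greater than 10")
}

# all(): TRUE only if every element is TRUE
if(all(values > 1)){
  message("All values are greater than 1")
}

# isTRUE() returns FALSE for NA, so this if() does not fail
maybe <- NA
if(isTRUE(maybe)){
  message("maybe is TRUE")
}else{
  message("maybe is not TRUE")
}
```

Passing NA directly, as in if(NA), stops with the error "missing value where TRUE/FALSE needed", and in recent versions of R a condition longer than one element is also an error.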
9 Conditions and Loops (4/21) - Evaluating in if() statements
More often, we construct the logical value within () itself. This can be termed the condition.
x <- 10
y <- 4
if(x > y){
  message("The value of x is ",x," which is greater than ", y)
}
## The value of x is 10 which is greater than 4
Here the message is printed because x is greater than y.
y <- 20
if(x > y){
  message("The value of x is ",x," which is greater than ", y)
}
Here, x is no longer greater than y, so no message is printed.
We would still like a message telling us the result of the condition when it is false.
10 Conditions and Loops (5/21) - else following an if()
If we want to perform an operation when the condition is false, we can follow the if() statement with an else statement.
x <- 10
if(x < 5){
  message(x, " is less than 5")
}else{
  message(x," is greater than or equal to 5")
}
## 10 is greater than or equal to 5
With the addition of the else statement, when x is not less than 5 the code following the else statement is executed.
x <- 3
if(x < 5){
  message(x, " is less than 5")
}else{
  message(x," is greater than or equal to 5")
}
## 3 is less than 5
11 Conditions and Loops (6/21) - else if
We may wish to execute different procedures under multiple conditions. This can be controlled in R by adding one or more else if() statements after an initial if() statement.
x <- 5
if(x > 5){
  message(x," is greater than 5")
}else if(x == 5){
  message(x," is 5")
}else{
  message(x, " is less than 5")
}
## 5 is 5
12 Conditions and Loops (7/21) - ifelse()
A useful function to evaluate conditional statements over vectors is the ifelse() function.
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
The ifelse() function takes three arguments: the condition to evaluate, the value to return where the condition is true and the value to return where the condition is false.
ifelse(x <= 3,"lessOrEqual","more")
## [1] "lessOrEqual" "lessOrEqual" "lessOrEqual" "more" "more"
## [6] "more" "more" "more" "more" "more"
ifelse() calls can also be nested to test multiple conditions over a vector.
ifelse(x == 3,"same",
ifelse(x < 3,"less","more")
)
## [1] "less" "less" "same" "more" "more" "more" "more" "more" "more" "more"
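Unlike if(), ifelse() works element by element, so an NA in the condition simply gives NA in that position of the result instead of an error. A small sketch with a made-up vector:

```r
x <- c(1, NA, 5)
sizeLabel <- ifelse(x > 3, "more", "lessOrEqual")
sizeLabel   # "lessOrEqual", NA, "more"
```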
13 Conditions and Loops (8/21) - Loops
The two main generic methods of looping in R are while and for.
while - while loops repeat the execution of code while a condition evaluates as true.
for - for loops repeat the execution of code for a range of specified values.
14 Conditions and Loops (9/21) - While loops
While loops are most useful if you know the condition will be satisfied but are not sure when (e.g. looking for the point at which a number first occurs in a list).
x <- 1
while(x != 3){
  message("x is ",x," ")
  x <- x+1
}
## x is 1
## x is 2
message("Finally x is 3")
## Finally x is 3
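The use case mentioned above, finding the first position at which a number occurs, can be sketched with a while loop. The vector and target are made up; the length check stops the loop from running past the end of the vector if the target is absent.

```r
numbers <- c(4, 8, 15, 16, 23, 42)
target <- 16
position <- 1

# Keep stepping forward until we find the target or run out of elements
while(position <= length(numbers) && numbers[position] != target){
  position <- position + 1
}

if(position <= length(numbers)){
  message(target, " first occurs at position ", position)
}else{
  message(target, " was not found")
}
```

The base function match(target, numbers) returns the same first position in a single call.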
15 Conditions and Loops (10/21) - For loops
For loops allow the user to cycle through a range of values, applying an operation for every value.
Here we cycle through a numeric vector and print out each value.
x <- 1:5
for(i in x){
  message("Loop",i," ", appendLF = F)
}
## Loop1 Loop2 Loop3 Loop4 Loop5
Similarly, we can cycle through other vector types (or lists).
x <- toupper(letters[1:5])
for(i in x){
  message("Loop",i," ", appendLF = F)
}
## LoopA LoopB LoopC LoopD LoopE
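Looping over a list works the same way. In this sketch (the gene sets are hypothetical) we loop over names() so that each element can be looked up with [[ ]] and its size stored under the same name.

```r
geneSets <- list(setA = c("Ikzf1", "Myc"), setB = c("Igll1"))
setSizes <- integer(0)   # empty vector to collect the results

for(setName in names(geneSets)){
  setSizes[setName] <- length(geneSets[[setName]])
  message(setName, " contains ", setSizes[setName], " genes")
}
setSizes
```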
16 Conditions and Loops (11/21) - Looping through indices
We may wish to keep track of the position in x we are evaluating, to retrieve the same index in other variables. A common practice is to loop through all possible index positions of x using the expression 1:length(x).
geneName <- c("Ikzf1","Myc","Igll1")
expression <- c(10.4,4.3,6.5)
1:length(geneName)
## [1] 1 2 3
for(i in 1:length(geneName)){
message(geneName[i]," has an RPKM of ",expression[i])
}
## Ikzf1 has an RPKM of 10.4
## Myc has an RPKM of 4.3
## Igll1 has an RPKM of 6.5
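One caveat with 1:length(x): if x is empty, 1:length(x) is 1:0, which is the vector c(1, 0), so the loop body still runs twice. The base function seq_along() avoids this. The sketch reuses the genes above, with the RPKM vector renamed rpkm so it does not reuse the name of the base function expression().

```r
geneName <- c("Ikzf1","Myc","Igll1")
rpkm <- c(10.4,4.3,6.5)

for(i in seq_along(geneName)){
  message(geneName[i]," has an RPKM of ",rpkm[i])
}

emptyGenes <- character(0)
1:length(emptyGenes)    # 1 0: a loop over this would run twice
seq_along(emptyGenes)   # integer(0): a loop over this runs zero times
```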
17 Conditions and Loops (12/21) - Loops and conditionals
Loops can be combined with conditional statements to allow for complex control of their execution over R objects.
x <- 1:13

for(i in x){
  if(i > 10){
    message("Number ",i," is greater than 10")
  }else if(i == 10){
    message("Number ",i," is 10")
  }else{
    message("Number ",i," is less than 10")
  }
}
## Number 1 is less than 10
## Number 2 is less than 10
## Number 3 is less than 10
## Number 4 is less than 10
## Number 5 is less than 10
## Number 6 is less than 10
## Number 7 is less than 10
## Number 8 is less than 10
## Number 9 is less than 10
## Number 10 is 10
## Number 11 is greater than 10
## Number 12 is greater than 10
## Number 13 is greater than 10
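The same loop can store its results instead of printing them, which makes it easy to compare against a vectorised ifelse() version. A sketch, where the label strings are chosen for illustration:

```r
x <- 1:13
sizeLabels <- character(length(x))   # pre-allocate one label per value

for(i in seq_along(x)){
  if(x[i] > 10){
    sizeLabels[i] <- "greater"
  }else if(x[i] == 10){
    sizeLabels[i] <- "equal"
  }else{
    sizeLabels[i] <- "less"
  }
}

# The same classification without a loop
vectorised <- ifelse(x > 10, "greater", ifelse(x == 10, "equal", "less"))
identical(sizeLabels, vectorised)
```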
18 Conditions and Loops (13/21) - Breaking loops
We can use a conditional together with break to exit a loop once a condition is satisfied, much like a while loop.
x <- 1:13

for(i in x){
  if(i < 10){
    message("Number ",i," is less than 10")
  }else if(i == 10){
    message("Number ",i," is 10")
    break
  }else{
    message("Number ",i," is greater than 10")
  }
}
## Number 1 is less than 10
## Number 2 is less than 10
## Number 3 is less than 10
## Number 4 is less than 10
## Number 5 is less than 10
## Number 6 is less than 10
## Number 7 is less than 10
## Number 8 is less than 10
## Number 9 is less than 10
## Number 10 is 10
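Where break leaves the loop entirely, next skips the rest of the current iteration and moves on to the next value. A sketch that keeps only the even numbers:

```r
x <- 1:10
evens <- integer(0)

for(i in x){
  if(i %% 2 == 1){
    next   # skip odd numbers and go straight to the next iteration
  }
  evens <- c(evens, i)
}
evens
```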
19 Conditions and Loops (14/21) - Functions to loop over data types
There are functions which allow you to loop over a data type and apply a function to each subsection of that data.
apply - Apply function to rows or columns of a matrix/data frame and return results as a vector, matrix or list.
lapply - Apply function to every element of a vector or list and return results as a list.
sapply - Apply function to every element of a vector or list and return results as a vector, matrix or list.
20 Conditions and Loops (15/21) - apply()
The apply() function applies a function to the rows or columns of a matrix. The argument FUN specifies the function to apply and MARGIN specifies whether to apply it by rows, columns or both.
apply(X,MARGIN,FUN,...)
- X - A matrix (or something to be coerced into a matrix, such as a data frame)
- MARGIN - 1 for rows, 2 for columns, c(1,2) for cells
- FUN - The function to apply
- ... - Additional arguments passed on to FUN
21 Conditions and Loops (16/21) - apply() example
matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T)
matExample
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
Get the mean of rows
apply(matExample,1,mean)
## [1] 1.5 3.5
Get the mean of columns
apply(matExample,2,mean)
## [1] 2 3
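For row and column means specifically, base R also has rowMeans() and colMeans(), which give the same results as the apply() calls above. When the applied function returns more than one value, apply() returns a matrix with one column per row (or column) processed; this sketch uses range() to show that.

```r
matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=TRUE)

rowMeans(matExample)   # same as apply(matExample,1,mean)
colMeans(matExample)   # same as apply(matExample,2,mean)

# range() returns two values (min and max), so each row's result becomes a column
rowRanges <- apply(matExample, 1, range)
rowRanges
```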
22 Conditions and Loops (16/21) - Additional arguments to apply
Additional arguments to be used by the function in the apply loop can be specified after the function argument.
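Arguments may be given by position, in the order the function expects them. This isn't possible for paste(), however: its first argument is ..., which absorbs unnamed arguments, so collapse must be given by name.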
apply(matExample,1,paste,collapse=";")
## [1] "1;2" "3;4"
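Another option is to wrap the call in an anonymous function, which lets every argument be named explicitly:

```r
matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=TRUE)

# The anonymous function receives one row at a time
pasted <- apply(matExample, 1, function(row) paste(row, collapse = ";"))
pasted
## [1] "1;2" "3;4"
```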
23 Conditions and Loops (17/21) - lapply()
Similar to apply, lapply applies a function to every element of a vector or list.
lapply returns a list object containing the results of evaluating the function.
lapply(c(1,2),mean)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
As with apply() additional arguments can be supplied after the function name argument.
lapply(list(1,c(NA,1),2),mean, na.rm=T)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 2
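Because lapply() keeps the names of a named list, a list of results can be turned back into a named vector with unlist(). A sketch with made-up expression values:

```r
geneExpression <- list(Ikzf1 = c(10.4, 9.8), Myc = c(4.3, NA, 5.1))

meanExpression <- lapply(geneExpression, mean, na.rm = TRUE)
meanExpression           # a list, one element per gene
unlist(meanExpression)   # a named numeric vector
```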
24 Conditions and Loops (18/21) - sapply()
sapply (smart apply) acts as lapply but attempts to return the results as the most appropriate data type.
Here sapply returns a vector where lapply would return a list.
exampleVector <- c(1,2,3,4,5)
exampleList <- list(1,2,3,4,5)
sapply(exampleVector,mean,na.rm=T)
## [1] 1 2 3 4 5
sapply(exampleList,mean,na.rm=T)
## [1] 1 2 3 4 5
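The difference in output type can be checked with class(). Base R also provides vapply(), which works like sapply() but takes a FUN.VALUE template describing each result and stops with an error if a result does not match it. A sketch:

```r
exampleList <- list(1,2,3,4,5)

class(lapply(exampleList, mean))   # "list"
class(sapply(exampleList, mean))   # "numeric"

# numeric(1): each result must be a single number
vapply(exampleList, mean, FUN.VALUE = numeric(1))
```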
25 Conditions and Loops (19/21) - sapply() example
In this example lapply returns a list of vectors from the quantile function.
exampleList <- list(row1=1:5, row2=6:10, row3=11:15)
exampleList
## $row1
## [1] 1 2 3 4 5
##
## $row2
## [1] 6 7 8 9 10
##
## $row3
## [1] 11 12 13 14 15
lapply(exampleList,quantile)
## $row1
## 0% 25% 50% 75% 100%
## 1 2 3 4 5
##
## $row2
## 0% 25% 50% 75% 100%
## 6 7 8 9 10
##
## $row3
## 0% 25% 50% 75% 100%
## 11 12 13 14 15
26 Conditions and Loops (20/21) - sapply() example 2
Here is an example of sapply parsing a result from the quantile function in a smart way.
When a function always returns a vector of the same length, sapply will create a matrix with one column per list element.
sapply(exampleList,quantile)
## row1 row2 row3
## 0% 1 6 11
## 25% 2 7 12
## 50% 3 8 13
## 75% 4 9 14
## 100% 5 10 15
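If one row per list element is preferred, the matrix can be transposed with t(). A sketch using the same list:

```r
exampleList <- list(row1=1:5, row2=6:10, row3=11:15)
quantileMatrix <- sapply(exampleList, quantile)

dim(quantileMatrix)   # 5 rows (quantiles) by 3 columns (list elements)
t(quantileMatrix)     # 3 rows (list elements) by 5 columns (quantiles)
```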
27 Conditions and Loops (21/21) - sapply() example 4
When sapply cannot parse the result to a vector or matrix, a list will be returned.
exampleList <- list(df=data.frame(sample=paste0("patient",1:2), data=c(1,12)), vec=c(1,3,4,5))
sapply(exampleList,summary)
## $df
## sample data
## Length:2 Min. : 1.00
## Class :character 1st Qu.: 3.75
## Mode :character Median : 6.50
## Mean : 6.50
## 3rd Qu.: 9.25
## Max. :12.00
##
## $vec
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.50 3.50 3.25 4.25 5.00
28 Time for an exercise!
Exercise on loops and conditional branching can be found here
30 Functions
31 Functions (1/8) - Built in functions
As we have seen, a function is a command which takes zero or more arguments and returns a single R object.
This allows the user to perform complex calculations and procedures with one simple operation.
x <- rnorm(100,70,10)
y <- jitter(x,amount=1)+20
mean(x)
## [1] 71.08371
lmExample <- data.frame(X=x,Y=y)
lmResult <- lm(Y~X,data=lmExample)
plot(Y~X,data=lmExample,main="Line of best fit with lm()",
xlim=c(0,150),ylim=c(0,150))
abline(lmResult,col="red",lty=3,lwd=3)
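The fitted model object can be inspected further. This sketch repeats the example with set.seed() so the random numbers are reproducible (the seed value is arbitrary) and extracts the coefficients with coef(). Since y was built as x plus a little noise plus 20, the slope should come out close to 1 and the intercept close to 20.

```r
set.seed(1)   # arbitrary seed so rnorm() and jitter() give reproducible values
x <- rnorm(100,70,10)
y <- jitter(x,amount=1)+20

lmExample <- data.frame(X=x,Y=y)
lmResult <- lm(Y~X,data=lmExample)

coef(lmResult)                # intercept and slope
summary(lmResult)$r.squared   # proportion of variance explained
```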
32 Functions (2/8) - Functions can be defined in R
Although we have access to many built-in functions in R, there will be many complex tasks we wish to perform regularly which are particular to our own work and for which no suitable function exists.
For these tasks we can construct our own functions with function().
Function_Name <- function(Arguments){
Result <- Arguments
return(Result)
}
33 Functions (3/8) - Defining your own functions
To define a function with function() we need to decide:
- the argument names within ()
- the expression to be evaluated within {}
- the variable to which the function will be assigned with <-
- the output from the function using return()
Function_name <- function(Argument1,Argument2){ Expression}
myFirstFunction <- function(myArgument1,myArgument2){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
}
myFirstFunction(4,5)
## [1] 20
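Functions are most useful for small calculations we repeat often. As a hypothetical example (log2FoldChange is not a built-in R function), a fold change on the log2 scale:

```r
# Hypothetical helper: log2 fold change between treated and control values
log2FoldChange <- function(treated, control){
  result <- log2(treated / control)
  return(result)
}

log2FoldChange(8, 2)               # log2(4)
log2FoldChange(c(8, 1), c(2, 4))   # works element by element on vectors
```

Because log2() and / are vectorised, the function accepts vectors without any loop.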
34 Functions (4/8) - Default arguments
In functions, a default value for an argument may be used. This allows the function to provide a value for an argument when the user does not specify one.
Default arguments can be specified by assigning a value to the argument with the = operator.
mySecondFunction <- function(myArgument1,myArgument2=10){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
}
mySecondFunction(4,5)
## [1] 20
mySecondFunction(4)
## [1] 40
35 Functions (5/8) - Missing arguments
In some cases a function may need to handle a missing argument differently from simply setting a generic default for it. The missing() function can be used to test whether a value was supplied for an argument.
mySecondFunction <- function(myArgument1,myArgument2){
  if(missing(myArgument2)){
    message("Value for myArgument2 not provided so will square myArgument1")
    myResult <- myArgument1*myArgument1
  }else{
    myResult <- (myArgument1*myArgument2)
  }
  return(myResult)
}
mySecondFunction(4)
## Value for myArgument2 not provided so will square myArgument1
## [1] 16
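A common alternative to missing() is to give the argument a default of NULL and test it with is.null(). A sketch, where myThirdFunction is a hypothetical name:

```r
myThirdFunction <- function(myArgument1, myArgument2 = NULL){
  # NULL default marks "not supplied"; fall back to squaring myArgument1
  if(is.null(myArgument2)){
    myArgument2 <- myArgument1
  }
  return(myArgument1 * myArgument2)
}

myThirdFunction(4)      # squares 4
myThirdFunction(4, 5)   # multiplies 4 by 5
```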
36 Functions (6/8) - Returning objects from functions
We have seen that a function returns the value within the return() function. If no return() is specified, the value of the last expression evaluated in the function is returned.
myFourthFunction <- function(myArgument1,myArgument2=10){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
  print("I returned the result")
}
myFifthFunction <- function(myArgument1,myArgument2=10){
  (myArgument1*myArgument2)
}
myFourthFunction(4,5)
## [1] 20
myFifthFunction(4,5)
## [1] 20
Note that the print() statement after the return() is not evaluated in myFourthFunction.
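return() is often used to leave a function early, for example when an input cannot be handled. The hypothetical safeDivide() below returns NA early for a zero denominator and otherwise relies on the value of its last expression:

```r
safeDivide <- function(numerator, denominator){
  if(denominator == 0){
    message("Cannot divide by zero, returning NA")
    return(NA)   # leave the function here; the lines below are not run
  }
  numerator / denominator   # last expression evaluated is returned
}

safeDivide(10, 4)
safeDivide(1, 0)
```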
37 Functions (7/8) - Returning lists from functions
The return() function can only return one R object at a time. To return multiple data objects from one function call, a list can be used to contain other data objects.
mySixthFunction <- function(arg1,arg2){
  result1 <- arg1*arg2
  result2 <- date()
  return(list(Calculation=result1,DateRun=result2))
}
result <- mySixthFunction(10,10)
result
## $Calculation
## [1] 100
##
## $DateRun
## [1] "Tue Apr 19 15:03:51 2022"
38 Functions (8/8) - Scope
When arguments or variables are created within a function, they only exist within that function and disappear once the function is complete.
mySeventhFunction <- function(arg1,arg2){
  internalValue <- arg1*arg2
  return(internalValue)
}
result <- mySeventhFunction(10,10)
internalValue
## Error in eval(expr, envir, enclos): object 'internalValue' not found
arg1
## Error in eval(expr, envir, enclos): object 'arg1' not found
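Scope also works in the other direction: a function can read a variable defined outside it, but assigning to that name inside the function creates a new local copy and leaves the outside variable unchanged. A sketch with hypothetical names:

```r
counter <- 1

incrementCounter <- function(){
  # reads the outer counter, then creates a local counter holding the new value
  counter <- counter + 1
  return(counter)
}

incrementCounter()   # returns 2
counter              # still 1 outside the function
```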
39 Time for an exercise!
Exercise on functions can be found here
40 Answers to exercise.
Answers can be found here
41 Scripts
42 Saving scripts
Once we have got our functions together and know how we want to analyse our data, we can save our analysis as a script. By convention R scripts typically end in .r or .R.
To save a file in RStudio.
-> File -> Save as
To open a previous R script
->File -> Open File..
To save all the objects (workspace) with extension .RData
->Session -> Save workspace as
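The workspace can also be saved from code. The base functions save() and load() write and restore chosen objects (save.image() saves every object in the workspace). This sketch uses tempfile() so it does not overwrite any real .RData file.

```r
matExample <- matrix(1:4, nrow = 2)
workspaceFile <- tempfile(fileext = ".RData")

save(matExample, file = workspaceFile)   # write the object to disk
rm(matExample)                           # remove it from the session
load(workspaceFile)                      # restore it under its original name
matExample
```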
43 Sourcing scripts.
R scripts allow us to save and reuse custom functions we have written. To run the code from an R script we can use the source() function with the name of the R script as the argument.
The file dayOfWeek.r in the “scripts” directory contains a simple R script to tell you what day it is after your marathon R coding session.
#Contents of dayOfWeek.r
dayOfWeek <- function(){
return(gsub(" .*","",date()))
}
source("scripts/dayOfWeek.r")
dayOfWeek()
## [1] "Tue"
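The same pattern can be tried without the course's scripts folder. This sketch writes the dayOfWeek() function to a temporary file with writeLines() and then sources it; only the file location is an assumption, and the function body is the one above.

```r
scriptFile <- tempfile(fileext = ".R")
writeLines(c("dayOfWeek <- function(){",
             "  return(gsub(\" .*\",\"\",date()))",
             "}"),
           scriptFile)

source(scriptFile)   # defines dayOfWeek() in the current session
dayOfWeek()
```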
44 Rscript
R scripts can be run non-interactively from the command line with the Rscript command, usually with the option --vanilla to avoid saving or restoring workspaces. All messages/warnings/errors will be output to the console.
Rscript --vanilla myscript.r
An alternative to Rscript is R CMD BATCH. Here all messages/warnings/errors are directed to a file and the processing time appended.
R CMD BATCH myscript.r
45 Sending arguments to Rscript
To provide arguments to an R script at the command line, we can use the commandArgs() function within the script to retrieve them.
args <- commandArgs(TRUE)
myFirstArgument <- args[1]
myFirstArgument
## [1] "10"
as.numeric(myFirstArgument)
## [1] 10
commandArgs() returns a character vector, so all command line arguments arrive as strings and must be converted with as.numeric() if numbers are needed.
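Scripts are easier to use when they check their arguments. This sketch wraps the parsing in a hypothetical helper, parseFirstNumber(), whose default is commandArgs(trailingOnly = TRUE); passing a character vector directly lets it be tried interactively. Non-numeric text becomes NA in as.numeric(), which the function turns into an error.

```r
parseFirstNumber <- function(args = commandArgs(trailingOnly = TRUE)){
  if(length(args) < 1){
    stop("Please supply a number as the first argument")
  }
  # as.numeric() gives NA (with a warning) for text such as "abc"
  value <- suppressWarnings(as.numeric(args[1]))
  if(is.na(value)){
    stop("First argument must be numeric, got: ", args[1])
  }
  return(value)
}

# Trying it interactively with a simulated argument vector
parseFirstNumber(c("10"))
```

Inside a script run as Rscript --vanilla myscript.r 10, calling parseFirstNumber() with no arguments would read the 10 from the command line.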
46 Getting help
- Local friendly bioinformaticians and computational biologists.
- Stack Overflow
- R-help