Introduction to R, Session 2

2 Recap on what we have covered.

Session 1 covered an introduction to R data types and inputting data.

3 Recap (1/2)

R stores data in five main data types.

  • Vector - Ordered collection of single data type (numeric/character/logical).
  • Matrix - Table (ordered 2D collection) of single data type (numeric/character/logical).
  • Factor - Ordered collection of ordinal or nominal categories.
  • Data frame - Table (ordered 2D array) of multiple data types of same length.
  • List - Ordered collection of multiple data types of differing length.

4 Recap (2/2)

Data can be read into R as a table with the read.table() function and written to file with the write.table() function.

Table <- read.table("data/readThisTable.csv",sep=",",header=T,row.names=1)
Table[1:3,]
##        Sample_1.hi Sample_2.hi Sample_3.hi Sample_4.low Sample_5.low
## Gene_a    4.570237    3.230467    3.351827     3.930877     4.098247
## Gene_b    3.561733    3.632285    3.587523     4.185287     1.380976
## Gene_c    3.797274    2.874462    4.016916     4.175772     1.988263
##        Sample_1.low
## Gene_a     4.418726
## Gene_b     5.936990
## Gene_c     3.780917
write.table(Table,file="data/writeThisTable.csv", sep=",", row.names =F,col.names=T)
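
As a minimal, self-contained sketch of the same round trip, the following assumes a small made-up data frame and a temporary file (neither is part of the course data). The names exampleTable, tmpFile and roundTrip are illustrative only.

```r
# Hypothetical example data; not the course's readThisTable.csv
exampleTable <- data.frame(Sample_1 = c(4.5, 3.5),
                           Sample_2 = c(3.2, 3.6),
                           row.names = c("Gene_a", "Gene_b"))
tmpFile <- tempfile(fileext = ".csv")

# col.names = NA writes a blank header cell above the row names
write.table(exampleTable, file = tmpFile, sep = ",",
            row.names = TRUE, col.names = NA)

# row.names = 1 takes the row names from the first column
roundTrip <- read.table(tmpFile, sep = ",", header = TRUE, row.names = 1)
```

Writing the row names here (rather than row.names = F as above) means they can be recovered when the file is read back in.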

5 Conditions and Loops

6 Conditions and Loops (1/21)

We have looked at using logical vectors as a way to index other data types.

x <- 1:10
x[x < 4]
## [1] 1 2 3

Logicals are also used in controlling how scripted procedures execute.

7 Conditions and Loops (2/21) - Two important control structures

  • Conditional branching (if,else)
  • Loops (for, while)

While I’m analysing data, if I need to execute complex statistical procedures on the data I will use R else I will use a calculator.

8 Conditions and Loops (3/21) - Conditional Branching.

Conditional Branching is the evaluation of a logical to determine whether a chunk of code is executed.

In R, we use the if statement with the logical to be evaluated in () and dependent code to be executed in {}.

x <- TRUE
if(x){
  message("x is true")
}
## x is true
x <- FALSE
if(x){
  message("x is true")
}

9 Conditions and Loops (4/21) - Evaluating in if() statements

More often, we construct the logical value within () itself. This can be termed the condition.

x <- 10
y <- 4
if(x > y){
  message("The value of x is ",x," which is greater than ", y)
}
## The value of x is 10 which is greater than 4

Here the message is printed because x is greater than y.

y <- 20
if(x > y){
  message("The value of x is ",x," which is greater than ", y)
}

Here, x is no longer greater than y, so no message is printed.

We would still like a message telling us the result of the condition.

10 Conditions and Loops (5/21) - else following an if().

If we want to perform an operation when the condition is false we can follow the if() statement with an else statement.

x <- 10
if(x < 5){
  message(x, " is less than 5")
}else{
  message(x," is greater than or equal to 5")
}
## 10 is greater than or equal to 5

With the addition of the else statement, when x is not less than 5 the code following the else statement is executed.

x <- 3
if(x < 5){
  message(x, " is less than 5")
}else{
  message(x," is greater than or equal to 5")
}
## 3 is less than 5

11 Conditions and Loops (6/21) - else if

We may wish to execute different procedures under multiple conditions. This can be controlled in R using else if() following an initial if() statement.

x <- 5
if(x > 5){
  message(x," is greater than 5")
  }else if(x == 5){
    message(x," is 5")
  }else{
    message(x, " is less than 5")
  }
## 5 is 5

12 Conditions and Loops (7/21) - ifelse()

A useful function to evaluate conditional statements over vectors is the ifelse() function.

x <- 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

The ifelse() function takes as arguments the condition to evaluate, the value to return where the condition is true, and the value to return where the condition is false.

ifelse(x <= 3,"lessOrEqual","more") 
##  [1] "lessOrEqual" "lessOrEqual" "lessOrEqual" "more"        "more"       
##  [6] "more"        "more"        "more"        "more"        "more"

This allows for multiple nested ifelse functions to be applied to vectors.

ifelse(x == 3,"same",
       ifelse(x < 3,"less","more")
       ) 
##  [1] "less" "less" "same" "more" "more" "more" "more" "more" "more" "more"
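
Note that if() expects a single TRUE or FALSE, whereas ifelse() works element by element. As a small sketch (the variable names here are illustrative), any() and all() can reduce a logical vector to one value suitable for if().

```r
x <- 1:10
# any() is TRUE if at least one element satisfies the condition
anyAbove8 <- any(x > 8)
# all() is TRUE only if every element satisfies the condition
allAbove8 <- all(x > 8)
if(anyAbove8 && !allAbove8){
  message("Some, but not all, values of x are greater than 8")
}
# ifelse() instead returns one result per element
labels <- ifelse(x > 8, "high", "low")
```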

13 Conditions and Loops (8/21) - Loops

The two main generic methods of looping in R are while and for.

  • while - while loops repeat the execution of code while a condition evaluates as true.

  • for - for loops repeat the execution of code for a range of specified values.

14 Conditions and Loops (9/21) - While loops

While loops are most useful if you know the condition will be satisfied but are not sure when (e.g. looking for the point at which a number first occurs in a list).

x <- 1
while(x != 3){
  message("x is ",x," ")
  x <- x+1
}
## x is 1
## x is 2
message("Finally x is 3")
## Finally x is 3
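
The "first occurrence" case mentioned above could be sketched as follows. This assumes a made-up vector in which the value 7 is known to be present; if it were absent, the loop would run past the end of the vector and fail.

```r
# Hypothetical vector to search; the value 7 is known to be present
values <- c(4, 9, 7, 2, 7)
position <- 1
# Keep stepping forward until the element at 'position' is 7
while(values[position] != 7){
  position <- position + 1
}
message("7 first occurs at position ", position)
```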

15 Conditions and Loops (10/21) - For loops

For loops allow the user to cycle through a range of values applying an operation for every value.

Here we cycle through a numeric vector and print out each value.

x <- 1:5
for(i in x){
  message("Loop",i," ", appendLF = F)
}
## Loop1 Loop2 Loop3 Loop4 Loop5

Similarly, we can cycle through other vector types (or lists).

x <- toupper(letters[1:5])
for(i in x){
  message("Loop",i," ", appendLF = F)
}
## LoopA LoopB LoopC LoopD LoopE

16 Conditions and Loops (11/21) - Looping through indices

We may wish to keep track of the position in x we are evaluating, so that we can retrieve the same index in other variables. A common practice is to loop through all possible index positions of x using the expression 1:length(x).

geneName <- c("Ikzf1","Myc","Igll1")
expression <- c(10.4,4.3,6.5)
1:length(geneName)
## [1] 1 2 3
for(i in 1:length(geneName)){
  message(geneName[i]," has an RPKM of ",expression[i])
}
## Ikzf1 has an RPKM of 10.4
## Myc has an RPKM of 4.3
## Igll1 has an RPKM of 6.5
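
One caveat with 1:length(x): when x is empty, 1:0 evaluates to c(1, 0), so the loop body would still run twice. A common alternative is seq_along(), sketched here with an illustrative empty vector.

```r
emptyGenes <- character(0)
# 1:length() counts down from 1 to 0 for an empty vector
badIndices <- 1:length(emptyGenes)
# seq_along() returns a zero-length index, so a loop over it never runs
goodIndices <- seq_along(emptyGenes)
timesRun <- 0
for(i in goodIndices){
  timesRun <- timesRun + 1
}
```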

17 Conditions and Loops (12/21) - Loops and conditionals

Loops can be combined with conditional statements to allow for complex control of their execution over R objects.

x <- 1:13

for(i in x){
  if(i > 10){
    message("Number ",i," is greater than 10")
  }else if(i == 10){
    message("Number ",i," is  10") 
  }else{
    message("Number ",i," is less than  10") 
  }
}

## Number 1 is less than  10
## Number 2 is less than  10
## Number 3 is less than  10
## Number 4 is less than  10
## Number 5 is less than  10
## Number 6 is less than  10
## Number 7 is less than  10
## Number 8 is less than  10
## Number 9 is less than  10
## Number 10 is  10
## Number 11 is greater than 10
## Number 12 is greater than 10
## Number 13 is greater than 10

18 Conditions and Loops (13/21) - Breaking loops

We can use a conditional with break to exit a loop once a condition is satisfied, just like a while loop.

x <- 1:13

for(i in x){
  if(i < 10){
    message("Number ",i," is less than 10")
  }else if(i == 10){
    message("Number ",i," is  10")
    break
  }else{
    message("Number ",i," is greater than  10") 
  }
}

## Number 1 is less than 10
## Number 2 is less than 10
## Number 3 is less than 10
## Number 4 is less than 10
## Number 5 is less than 10
## Number 6 is less than 10
## Number 7 is less than 10
## Number 8 is less than 10
## Number 9 is less than 10
## Number 10 is  10

19 Conditions and Loops (14/21) - Functions to loop over data types

There are functions which allow you to loop over a data type and apply a function to each subsection of that data.

  • apply - Apply function to rows or columns of a matrix/data frame and return results as a vector,matrix or list.

  • lapply - Apply function to every element of a vector or list and return results as a list.

  • sapply - Apply function to every element of a vector or list and return results as a vector,matrix or list.
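
To relate these functions to the loops seen so far, here is a small sketch (with made-up data) of the same calculation written as a for loop, with lapply and with sapply.

```r
values <- list(a = 1:3, b = 4:6)

# Explicit for loop collecting results into a pre-allocated vector
loopResult <- numeric(length(values))
for(i in seq_along(values)){
  loopResult[i] <- sum(values[[i]])
}

# lapply returns a list, sapply simplifies to a named vector
listResult <- lapply(values, sum)
vectorResult <- sapply(values, sum)
```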

20 Conditions and Loops (15/21) - apply()

The apply() function applies a function to the rows or columns of a matrix. The FUN argument specifies the function to apply, and MARGIN specifies whether to apply it by rows, by columns, or both.

apply(DATA,MARGIN,FUN,...)
  • DATA - A matrix (or something to be coerced into a matrix)
  • MARGIN - 1 for rows, 2 for columns, c(1,2) for cells
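
The less common MARGIN = c(1,2) case can be sketched as below. The matrix m and the anonymous doubling function are illustrative only.

```r
m <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)
# MARGIN = c(1,2) calls the function once per cell,
# so the result keeps the shape of the input matrix
doubled <- apply(m, c(1, 2), function(cell) cell * 2)
```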

21 Conditions and Loops (16/21) - apply() example

matExample <- matrix(c(1:4),nrow=2,ncol=2,byrow=T)
matExample
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

Get the mean of rows

apply(matExample,1,mean)
## [1] 1.5 3.5

Get the mean of columns

apply(matExample,2,mean)
## [1] 2 3

22 Conditions and Loops (16/21) - Additional arguments to apply

Additional arguments to be used by the function in the apply loop can be specified after the function argument.

Arguments may be given in order, as if passed to the function directly. For the paste() function, however, this isn’t possible, so arguments such as collapse must be given by name.

apply(matExample,1,paste,collapse=";")
## [1] "1;2" "3;4"
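
As a sketch of the difference (the matrix here is made up): round() takes digits as its second argument, so it can be supplied by position, whereas paste() collects all unnamed arguments into ..., so collapse has to be named.

```r
m <- matrix(c(1.234, 2.345, 3.456, 4.567), nrow = 2, byrow = TRUE)
# The extra positional argument 1 is matched to 'digits' of round()
roundedRows <- apply(m, 1, round, 1)
# With paste(), an unnamed ";" would be pasted onto each element
# rather than used to join them, so collapse must be named
joinedRows <- apply(m, 1, paste, collapse = ";")
```

Note that when the function returns a vector for each row, apply() places each row's result in a column of the output.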

23 Conditions and Loops (17/21) - lapply()

Similar to apply, lapply applies a function to every element of a vector or list.

lapply returns a list object containing the results of evaluating the function.

lapply(c(1,2),mean)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2

As with apply() additional arguments can be supplied after the function name argument.

lapply(list(1,c(NA,1),2),mean, na.rm=T)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 2

24 Conditions and Loops (18/21) - sapply()

sapply (a "smart", simplifying version of lapply) acts as lapply but attempts to return the results as the most appropriate data type.

Here sapply returns a vector where lapply would return a list.

exampleVector <- c(1,2,3,4,5)
exampleList <- list(1,2,3,4,5)
sapply(exampleVector,mean,na.rm=T)
## [1] 1 2 3 4 5
sapply(exampleList,mean,na.rm=T)
## [1] 1 2 3 4 5

25 Conditions and Loops (19/21) - sapply() example

In this example lapply returns a list of vectors from the quantile function.

exampleList <- list(row1=1:5, row2=6:10, row3=11:15)
exampleList
## $row1
## [1] 1 2 3 4 5
## 
## $row2
## [1]  6  7  8  9 10
## 
## $row3
## [1] 11 12 13 14 15

lapply(exampleList,quantile)
## $row1
##   0%  25%  50%  75% 100% 
##    1    2    3    4    5 
## 
## $row2
##   0%  25%  50%  75% 100% 
##    6    7    8    9   10 
## 
## $row3
##   0%  25%  50%  75% 100% 
##   11   12   13   14   15

26 Conditions and Loops (20/21) - sapply() example 2

Here is an example of sapply parsing a result from the quantile function in a smart way.

When a function always returns a vector of the same length, sapply will create a matrix with elements by column.

sapply(exampleList,quantile)
##      row1 row2 row3
## 0%      1    6   11
## 25%     2    7   12
## 50%     3    8   13
## 75%     4    9   14
## 100%    5   10   15

27 Conditions and Loops (21/21) - sapply() example 3

When sapply cannot parse the result to a vector or matrix, a list will be returned.

exampleList <- list(df=data.frame(sample=paste0("patient",1:2), data=c(1,12)), vec=c(1,3,4,5))
sapply(exampleList,summary)
## $df
##     sample               data      
##  Length:2           Min.   : 1.00  
##  Class :character   1st Qu.: 3.75  
##  Mode  :character   Median : 6.50  
##                     Mean   : 6.50  
##                     3rd Qu.: 9.25  
##                     Max.   :12.00  
## 
## $vec
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.50    3.50    3.25    4.25    5.00

28 Time for an exercise!

Exercise on loops and conditional branching can be found here

29 Answers to exercise.

Answers can be found here

R code for answers can be found here

30 Functions

31 Functions (1/8) - Built in functions

As we have seen, a function is a command which usually takes one or more arguments and returns a single R object.

This allows the user to perform complex calculations and procedures with one simple operation.

x <- rnorm(100,70,10)
y <- jitter(x,amount=1)+20
mean(x)
## [1] 71.08371
lmExample <- data.frame(X=x,Y=y)
lmResult <- lm(Y~X,data=lmExample)

plot(Y~X,data=lmExample,main="Line of best fit with lm()",
     xlim=c(0,150),ylim=c(0,150))
abline(lmResult,col="red",lty=3,lwd=3)

32 Functions (2/8) - Functions can be defined in R

Although we have access to many built-in functions in R, there will be many complex tasks we wish to perform regularly which are particular to our own work and for which no suitable function exists.

For these tasks we can construct our own functions with function().

Function_Name <- function(Arguments){
  Result <- Arguments
  return(Result)
}

33 Functions (3/8) - Defining your own functions

To define a function with function() we need to decide:

  • the argument names within ()
  • the expression to be evaluated within {}
  • the variable to which the function will be assigned with <-
  • the output from the function using return()

Function_name <- function(Argument1,Argument2){ Expression}

myFirstFunction <- function(myArgument1,myArgument2){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
}
myFirstFunction(4,5)
## [1] 20

34 Functions (4/8) - Default arguments

In functions, a default value for an argument may be used. This allows the function to provide a value for an argument when the user does not specify one.

Default arguments can be specified by assigning a value to the argument with the = operator.

mySecondFunction <- function(myArgument1,myArgument2=10){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
}
mySecondFunction(4,5)
## [1] 20
mySecondFunction(4)
## [1] 40
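
Arguments can also be supplied by name, in which case their order no longer matters. A minimal sketch, using a hypothetical function scaleValue that is not part of the course material:

```r
scaleValue <- function(value, multiplier = 10){
  return(value * multiplier)
}
# Arguments given by name can appear in any order
byName <- scaleValue(multiplier = 2, value = 4)
# Omitting 'multiplier' falls back to its default of 10
byDefault <- scaleValue(4)
```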

35 Functions (5/8) - Missing Arguments

In some cases a function may need to deal with missing arguments in a different way to setting a generic default for the argument. The missing() function can be used to evaluate whether an argument has been supplied.

myThirdFunction <- function(myArgument1,myArgument2){
  if(missing(myArgument2)){
    message("Value for myArgument2 not provided so will square myArgument1")
    myResult <- myArgument1*myArgument1
  }else{
    myResult <- (myArgument1*myArgument2)
  }
  return(myResult)
}
myThirdFunction(4)
## Value for myArgument2 not provided so will square myArgument1
## [1] 16

36 Functions (6/8) - Returning objects from functions

We have seen that a function returns the value within the return() function. If no return() is specified, the result of the last expression evaluated in the function is returned.

myforthFunction <- function(myArgument1,myArgument2=10){
  myResult <- (myArgument1*myArgument2)
  return(myResult)
  print("I returned the result")
}
myfifthFunction <- function(myArgument1,myArgument2=10){
(myArgument1*myArgument2)
}

myforthFunction(4,5)
## [1] 20
myfifthFunction(4,5)
## [1] 20

Note that the print() statement after the return() is not evaluated in myforthFunction.

37 Functions (7/8) - Returning lists from functions

The return() function can only return one R object at a time. To return multiple data objects from one function call, a list can be used to contain other data objects.

mySixthFunction <- function(arg1,arg2){
  result1 <- arg1*arg2
  result2 <- date()
  return(list(Calculation=result1,DateRun=result2))
}
result <- mySixthFunction(10,10)
result
## $Calculation
## [1] 100
## 
## $DateRun
## [1] "Tue Apr 19 15:03:51 2022"
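
Once a list has been returned, its elements can be retrieved by name. A short sketch, assuming a hypothetical function summariseValues:

```r
summariseValues <- function(values){
  # Bundle several results into one named list
  return(list(Mean = mean(values), Range = range(values)))
}
summaryResult <- summariseValues(c(2, 4, 9))
# Elements are retrieved by name with $ or [[ ]]
meanValue <- summaryResult$Mean
rangeValue <- summaryResult[["Range"]]
```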

38 Functions (8/8) - Scope

When arguments or variables are created within a function, they only exist within that function and disappear once the function is complete.

mySeventhFunction <- function(arg1,arg2){
  internalValue <- arg1*arg2
  return(internalValue)
}
result <- mySeventhFunction(10,10)
internalValue
## Error in eval(expr, envir, enclos): object 'internalValue' not found
arg1
## Error in eval(expr, envir, enclos): object 'arg1' not found
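
A related consequence, sketched here with hypothetical names: a function can read a variable defined outside it, but assigning to that name with <- inside the function only creates a local variable and leaves the outside one unchanged.

```r
total <- 1
addToTotal <- function(amount){
  # This assignment creates a new local 'total'; the global one is untouched
  total <- total + amount
  return(total)
}
returned <- addToTotal(5)
```

After this runs, returned holds the new value while total outside the function is still 1.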

39 Time for an exercise!

Exercise on functions can be found here

40 Answers to exercise.

Answers can be found here

41 Scripts

42 Saving scripts

Once we have got our functions together and know how we want to analyse our data, we can save our analysis as a script. By convention R scripts typically end in .r or .R.

To save a file in RStudio:

-> File -> Save as

To open a previous R script:

-> File -> Open File...

To save all the objects (workspace) with extension .RData:

-> Session -> Save workspace as

43 Sourcing scripts.

R scripts allow us to save and reuse custom functions we have written. To run the code from an R script we can use the source() function with the name of the R script as the argument.

The file dayOfWeek.R in the “scripts” directory contains a simple R script to tell you what day it is after your marathon R coding session.

#Contents of dayOfWeek.R
dayOfWeek <- function(){
  return(gsub(" .*","",date()))
}
source("scripts/dayOfWeek.R")
dayOfWeek()
## [1] "Tue"
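
To try source() without the course's scripts directory, the following sketch writes a tiny script to a temporary file and then sources it. The function addOne and the temporary file are stand-ins, not part of the course material.

```r
# Write a small script to a temporary file (stand-in for a saved .R file)
scriptFile <- tempfile(fileext = ".R")
writeLines(c("addOne <- function(x){",
             "  return(x + 1)",
             "}"), scriptFile)

# source() runs the script, defining addOne in the global environment
source(scriptFile)
sourcedResult <- addOne(41)
```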

44 Rscript

R scripts can be run non-interactively from the command line with the Rscript command, usually with the option --vanilla to avoid saving or restoring workspaces. All messages/warnings/errors will be output to the console.

Rscript --vanilla myscript.r

An alternative to Rscript is R CMD BATCH. Here all messages/warnings/errors are directed to a file and the processing time appended.

R CMD BATCH myscript.r

45 Sending arguments to Rscript

To provide arguments to an R script at the command line, we must use the commandArgs() function within the script to retrieve them.

args <- commandArgs(TRUE)
myFirstArgument <- args[1]
myFirstArgument
## [1] "10"
as.numeric(myFirstArgument)
## [1] 10

Since vectors can only be one type, all command line arguments are strings and must be converted to numeric if needed with as.numeric().
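
A script may also be run with no arguments at all, in which case args[1] would be NA. One possible way of handling this is sketched below; parseFirstArgument and its default value are hypothetical, not part of the course scripts.

```r
# Hypothetical helper: convert the first argument to a number,
# falling back to a default when no argument was supplied
parseFirstArgument <- function(args, default = 1){
  if(length(args) < 1){
    return(default)
  }
  return(as.numeric(args[1]))
}

# In a script run as: Rscript --vanilla myscript.r 10
# trailingOnly = TRUE keeps only the arguments after the script name
args <- commandArgs(trailingOnly = TRUE)
myFirstNumber <- parseFirstArgument(args)
```

Keeping the conversion in a function means it can be tested interactively by passing in a character vector, without going through the command line.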

46 Getting help

47 The end

48 Questions?