Exercise 2 - Introduction to Probability theory

Set working directory

# Set working directory
setwd("/Volumes/bioinfomatics$/jurtasun/Courses/CBW2022/LMS_Statistics/course/exercises")
getwd()

## [1] "/Volumes/bioinfomatics$/jurtasun/Courses/CBW2022/LMS_Statistics/course/exercises"

rbinom(): generate binomially distributed random variables for a given number of trials (size) and success probability (prob)
rpois(): generate Poisson-distributed random variables for a given rate (lambda), which is both the mean and the variance
rnorm(): generate normally distributed random variables with a given mean and standard deviation (sd)
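
As a quick orientation, the sketch below draws from each of the three generators and compares the sample means with the theoretical ones. The sample size of 1000, the seed, and the parameter values are arbitrary choices for illustration, not part of the exercise.

```r
set.seed(42)  # arbitrary seed, only to make the draws reproducible

# Binomial: 'size' trials per draw, success probability 'prob'
x_binom <- rbinom(n = 1000, size = 10, prob = 0.5)

# Poisson: a single rate parameter 'lambda' (equal to both mean and variance)
x_pois <- rpois(n = 1000, lambda = 6)

# Normal: parameterised directly by 'mean' and 'sd'
x_norm <- rnorm(n = 1000, mean = 0, sd = 1)

# Sample means should land close to the theoretical means 5, 6 and 0
c(binom = mean(x_binom), pois = mean(x_pois), norm = mean(x_norm))
```

Each distribution also has a density function (d prefix), a cumulative probability function (p prefix), and a quantile function (q prefix) that take the same parameters.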

Exercise 1

Simulate 10 flips of a fair coin, and compute the probability of obtaining exactly 5 heads
Hint: use the rbinom() function
According to a survey, 72% of Americans prefer dogs to cats. If 8 people are chosen randomly, what is the probability that exactly 6 prefer dogs? And what is the probability that fewer than 6 do?
Hint: use the dbinom() and pbinom() functions
A weighted coin has a 42% chance of coming up heads. What is the expected number of heads in 5 tosses? Compute the mean and standard deviation.
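
One possible way to approach the three questions above is sketched here. It assumes a fair coin for the first question and reads "6 prefer dogs" as exactly 6. The variable names, the seed, and the number of simulated repetitions are illustrative choices.

```r
set.seed(123)  # arbitrary seed for a reproducible simulation

# (1) One experiment: 10 flips of a fair coin (1 = heads, 0 = tails)
flips <- rbinom(n = 10, size = 1, prob = 0.5)
sum(flips)  # number of heads in this particular run

# Repeat the 10-flip experiment many times and count how often exactly 5 heads occur
heads    <- rbinom(n = 100000, size = 10, prob = 0.5)
p5_sim   <- mean(heads == 5)                  # simulated estimate
p5_exact <- dbinom(5, size = 10, prob = 0.5)  # exact binomial probability

# (2) Survey: 8 people, each preferring dogs with probability 0.72
p_six       <- dbinom(6, size = 8, prob = 0.72)  # P(X = 6)
p_fewer_six <- pbinom(5, size = 8, prob = 0.72)  # P(X < 6) = P(X <= 5)

# (3) Weighted coin: mean and standard deviation of a Binomial(n, p)
n <- 5
p <- 0.42
mu    <- n * p                   # expected number of heads
sigma <- sqrt(n * p * (1 - p))   # standard deviation

c(p5_sim = p5_sim, p5_exact = p5_exact, p_six = p_six,
  p_fewer_six = p_fewer_six, mu = mu, sigma = sigma)
```

Note that pbinom() returns P(X <= q), so "fewer than 6" corresponds to q = 5, not q = 6.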

Exercise 2

Calls arrive at a customer service line at an average rate of 6 every 5 minutes. What is the probability of getting exactly 4 calls in 5 minutes? And at least 4?
Hint: use the dpois() and ppois() functions
Compute the probability of reporting 15 or fewer cancer patients in a given time interval, assuming the historical average is 12
Compute the probability of reporting 15 or more cancer patients in a given time interval, assuming the historical average is 12
Hint: use the ppois() function
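
A minimal sketch of the Poisson questions follows. It assumes the historical average of 12 refers to the same time interval as the question, and the variable names are illustrative.

```r
# Calls: rate of 6 per 5-minute window
lambda_calls <- 6
p_exactly4  <- dpois(4, lambda = lambda_calls)                      # P(X = 4)
p_at_least4 <- ppois(3, lambda = lambda_calls, lower.tail = FALSE)  # P(X >= 4) = P(X > 3)

# Cancer patients: historical average of 12 per interval (assumed same interval)
lambda_pat <- 12
p_le15 <- ppois(15, lambda = lambda_pat)                      # P(X <= 15)
p_ge15 <- ppois(14, lambda = lambda_pat, lower.tail = FALSE)  # P(X >= 15) = P(X > 14)

c(p_exactly4 = p_exactly4, p_at_least4 = p_at_least4,
  p_le15 = p_le15, p_ge15 = p_ge15)
```

Because the distribution is discrete, "15 or fewer" and "15 or more" both include the value 15, so the two probabilities add up to more than 1 (by exactly P(X = 15)). With lower.tail = FALSE, ppois(q, ...) gives P(X > q), which is why the threshold is shifted down by one for "at least" questions.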

Exercise 3

Compute the probability of a value being less than or equal to 2 for a normal distribution of mean 0 and standard deviation 1
Compute the probability of a value being greater than 2 for a normal distribution of mean 0 and standard deviation 1
Hint: use the pnorm() function
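
The two tail probabilities can be sketched as follows; the variable names are illustrative.

```r
# P(X <= 2) for a standard normal distribution
p_le2 <- pnorm(2, mean = 0, sd = 1)

# P(X > 2): either use the upper tail directly or take the complement
p_gt2 <- pnorm(2, mean = 0, sd = 1, lower.tail = FALSE)

c(p_le2 = p_le2, p_gt2 = p_gt2, total = p_le2 + p_gt2)
```

For a continuous distribution P(X = 2) is zero, so the two probabilities are complementary and sum to 1.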
Data visualization - generate a Gaussian distribution
Set mean and standard deviation to plot a normal distribution
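
One way to draw the curve with base R graphics is sketched below. The values of mu and sigma, the plotting range of four standard deviations either side of the mean, and the grid of 200 points are arbitrary choices.

```r
# Parameters of the normal distribution to plot (change these to explore)
mu    <- 0
sigma <- 1

# Evaluate the density on a grid covering mean +/- 4 standard deviations
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
y <- dnorm(x, mean = mu, sd = sigma)

# Draw the density curve and mark the mean with a dashed vertical line
plot(x, y, type = "l", lwd = 2,
     xlab = "x", ylab = "Density",
     main = paste0("Normal distribution (mean = ", mu, ", sd = ", sigma, ")"))
abline(v = mu, lty = 2)
```

Changing mu shifts the curve along the x axis, while changing sigma makes it wider and flatter or narrower and taller.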

Bonus question 1 (optional)

Load the gene expression matrix that has been created for this exercise from "data/gene_exp_matrix.RData"
Use the gene expression to draw a heatmap
Use the scale() function to perform the Z-score transformation, and use the code above to generate the scaled heatmap
Hint: read the help page of ?scale, and you might need to use the t() function as well

However, the heatmap.2() function has an argument called scale, which does the same thing for you.
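
The row-wise Z-score transformation can be sketched as below. Since the contents of the course .RData file are not shown here, the sketch uses a simulated matrix named expr as a hypothetical stand-in (genes in rows, samples in columns), and it draws the plot with base R heatmap() rather than heatmap.2() so that no extra package is needed.

```r
set.seed(1)  # arbitrary seed for a reproducible toy matrix

# Hypothetical stand-in for the course matrix: 20 genes (rows) x 6 samples (columns)
expr <- matrix(rnorm(120, mean = 8, sd = 2), nrow = 20,
               dimnames = list(paste0("Gene", 1:20), paste0("Sample", 1:6)))

# scale() standardises COLUMNS, so transpose to put genes in columns,
# scale, then transpose back to the original orientation
expr_z <- t(scale(t(expr)))

# Each gene (row) now has mean 0 and standard deviation 1
round(head(rowMeans(expr_z)), 10)
head(apply(expr_z, 1, sd))

# Plot the already-scaled values; scale = "none" avoids scaling a second time
heatmap(expr_z, scale = "none", main = "Z-scored expression (toy data)")
```

Passing the unscaled matrix with scale = "row" asks the plotting function to do the same row-wise standardisation internally.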

Bonus question 2 (optional)

Read in the file "categories_Expression.txt"
How many genes are in each category of the ofInterest and pathway sections?
Get the quantiles of overall Expression, and for the Glycolysis and TGFb genes
Find how many genes were both selected and in the Glycolysis pathway
Compute the probability of selecting a gene with at least the expression level of "Gene13", assuming normally distributed data
Perform a t-test to evaluate the difference of the Expression levels between genes in the Glycolysis pathway and genes in the TGFb pathway.
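
The layout of "categories_Expression.txt" is not shown here, so the following is only a sketch on simulated data. The data frame genes, the column name geneName, and the category labels "Selected" and "NotSelected" are hypothetical; the column names Expression, ofInterest and pathway are guessed from the wording of the questions. With the real file, the data frame would come from something like read.delim() instead.

```r
set.seed(7)  # arbitrary seed for a reproducible toy table

# Hypothetical stand-in for the course file (all names and values are assumptions)
genes <- data.frame(
  geneName   = paste0("Gene", 1:40),
  Expression = c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 7, sd = 2)),
  ofInterest = rep(c("Selected", "NotSelected"), times = 20),
  pathway    = rep(c("Glycolysis", "TGFb"), each = 20)
)

# Counts per category
table(genes$ofInterest)
table(genes$pathway)

# Quantiles overall and per pathway
quantile(genes$Expression)
tapply(genes$Expression, genes$pathway, quantile)

# Genes that are both selected and in the Glycolysis pathway
n_sel_gly <- sum(genes$ofInterest == "Selected" & genes$pathway == "Glycolysis")

# P(expression >= level of Gene13), using a normal model fitted to all genes
x13    <- genes$Expression[genes$geneName == "Gene13"]
p_ge13 <- pnorm(x13, mean = mean(genes$Expression),
                sd = sd(genes$Expression), lower.tail = FALSE)

# Welch two-sample t-test comparing the two pathways
tt <- t.test(Expression ~ pathway, data = genes)
tt$p.value
```

By default t.test() does not assume equal variances (Welch's test); setting var.equal = TRUE gives the classical Student's t-test instead.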