These exercises cover the sections of LMS_Statistics
# Set working directory
setwd("/Volumes/bioinfomatics$/jurtasun/Courses/CBW2022/LMS_Statistics/course/exercises")
getwd()
## [1] "/Volumes/bioinfomatics$/jurtasun/Courses/CBW2022/LMS_Statistics/course/exercises"
Exercise 1
Simulate 10 flips of a coin, and compute the probability of obtaining 5 heads
Hint: use the rbinom() function
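One possible approach is sketched below. It assumes a fair coin (probability of heads 0.5), which the question does not state explicitly. The seed and the number of repetitions are arbitrary choices, and all variable names are illustrative.

```r
set.seed(42)  # arbitrary seed, used only to make the simulation reproducible

# One experiment: 10 flips of a fair coin, coded 1 = heads, 0 = tails
flips <- rbinom(n = 10, size = 1, prob = 0.5)
n_heads <- sum(flips)

# Repeat the 10-flip experiment many times and count the heads in each run
n_rep <- 100000  # assumed number of repetitions
heads_per_rep <- rbinom(n = n_rep, size = 10, prob = 0.5)

# Simulated estimate: fraction of runs with exactly 5 heads
p_sim <- mean(heads_per_rep == 5)

# Exact binomial probability, for comparison with the simulation
p_exact <- dbinom(5, size = 10, prob = 0.5)

print(c(simulated = p_sim, exact = p_exact))
```

The simulated fraction should approach the exact value as n_rep grows.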
According to a survey, 72% of Americans prefer dogs to cats. If 8 people are chosen at random, what is the probability that exactly 6 prefer dogs? What is the probability that fewer than 6 do?
Hint: use the dbinom() and pbinom() functions
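A minimal sketch of how the two hinted functions fit this question, reading "6 prefer dogs" as exactly 6 and "less than 6" as at most 5. Variable names are illustrative.

```r
n <- 8      # people sampled
p <- 0.72   # probability that one person prefers dogs

# P(X = 6): dbinom() gives the probability mass at a single value
p_exactly_6 <- dbinom(6, size = n, prob = p)

# P(X < 6) = P(X <= 5): pbinom() gives the cumulative probability
p_fewer_than_6 <- pbinom(5, size = n, prob = p)

print(c(exactly_6 = p_exactly_6, fewer_than_6 = p_fewer_than_6))
```

Note that pbinom(q, ...) includes q itself, so "fewer than 6" needs q = 5.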
A weighted coin has a 42% chance of coming up heads. What is the expected number of heads in 5 tosses? Compute the mean and the standard deviation.
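For a binomial variable with n trials and success probability p, the mean is n*p and the standard deviation is sqrt(n*p*(1-p)). The sketch below applies these formulas and cross-checks them by summing over the probability mass function. Variable names are illustrative.

```r
n <- 5      # number of tosses
p <- 0.42   # probability of heads on each toss

# Closed-form mean and standard deviation of a binomial distribution
mean_heads <- n * p
sd_heads <- sqrt(n * p * (1 - p))

# Cross-check from the definition, using the probability of each outcome
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
mean_check <- sum(k * pmf)
sd_check <- sqrt(sum((k - mean_check)^2 * pmf))

print(c(mean = mean_heads, sd = sd_heads))
```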
Exercise 2
Calls arrive at a customer service line at an average rate of 6 every 5 minutes. What is the probability of getting exactly 4 calls in 5 minutes? And of getting at least 4?
Hint: use the dpois() and ppois() functions
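A minimal sketch for the call-centre question, assuming the number of calls per 5-minute window follows a Poisson distribution with rate 6. Variable names are illustrative.

```r
lambda <- 6  # average number of calls per 5-minute window

# P(X = 4): probability mass at exactly 4 calls
p_exactly_4 <- dpois(4, lambda = lambda)

# P(X >= 4) = 1 - P(X <= 3); lower.tail = FALSE returns P(X > 3) directly
p_at_least_4 <- ppois(3, lambda = lambda, lower.tail = FALSE)

print(c(exactly_4 = p_exactly_4, at_least_4 = p_at_least_4))
```

Using lower.tail = FALSE rather than 1 - ppois(...) avoids a small loss of precision when the tail probability is very close to zero.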
Compute the probability of reporting 15 or fewer cancer patients in a given time interval, assuming the historical average is 12
Compute the probability of reporting 15 or more cancer patients in a given time interval, assuming the historical average is 12
Hint: use the ppois() function; rpois() can be used to check the result by simulation
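The two tail probabilities can be sketched as follows, assuming the patient counts are Poisson distributed with mean 12. The simulation size and seed are arbitrary, and variable names are illustrative.

```r
lambda <- 12  # historical average number of patients per interval

# P(X <= 15): cumulative probability up to and including 15
p_at_most_15 <- ppois(15, lambda = lambda)

# P(X >= 15) = P(X > 14): upper tail starting at 15
p_at_least_15 <- ppois(14, lambda = lambda, lower.tail = FALSE)

# Optional check by simulation with rpois()
set.seed(1)                          # arbitrary seed for reproducibility
draws <- rpois(100000, lambda = lambda)
p_at_most_15_sim <- mean(draws <= 15)

print(c(at_most_15 = p_at_most_15, at_least_15 = p_at_least_15))
```

The two probabilities do not sum to 1, because the outcome X = 15 is counted in both.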
Exercise 3
Compute the probability of a value being less than or equal to 2 for a normal distribution of mean 0 and standard deviation 1
Compute the probability of a value being greater than 2 for a normal distribution of mean 0 and standard deviation 1
Hint: use the pnorm() function
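A short sketch of both tail probabilities for the standard normal distribution. Variable names are illustrative.

```r
# P(X <= 2) for a normal distribution with mean 0 and standard deviation 1
p_lower <- pnorm(2, mean = 0, sd = 1)

# P(X > 2): the upper tail, requested with lower.tail = FALSE
p_upper <- pnorm(2, mean = 0, sd = 1, lower.tail = FALSE)

print(c(lower = p_lower, upper = p_upper))
```

Because the distribution is continuous, the two values sum to 1 and it makes no difference whether the boundary value 2 is included.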
Data visualization - generate a Gaussian distribution
Set mean and standard deviation to plot a normal distribution
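One way to draw the curve with base R graphics is sketched below. The values of mu and sigma are example choices, and the plotting range of four standard deviations on either side of the mean is an assumption made for display only.

```r
mu <- 0      # example mean
sigma <- 1   # example standard deviation

# Evaluate the density on a grid covering mu +/- 4 standard deviations
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 201)
y <- dnorm(x, mean = mu, sd = sigma)

# Line plot of the density, with a dashed vertical line at the mean
plot(x, y, type = "l", lwd = 2,
     xlab = "x", ylab = "Density",
     main = paste0("Normal distribution (mean = ", mu, ", sd = ", sigma, ")"))
abline(v = mu, lty = 2)
```

Changing mu shifts the curve along the x axis, and changing sigma makes it wider or narrower.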
Bonus question 1 (optional)
Load the gene expression matrix that has been created for this exercise from "data/gene_exp_matrix.RData"
Use the gene expression to draw a heatmap
Use the scale() function to perform the Z-score transformation, and use the code above to generate the scaled heatmap
Hint: read the help page ?scale; you might need to use the t() function as well
However, the heatmap.2() function has an argument called scale, which does the same thing for you.
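The role of t() in the hint can be sketched on a small simulated matrix. The matrix expr below is a hypothetical stand-in, not the course data, and its layout (genes in rows, samples in columns) is an assumption. The base R heatmap() function is used here only to keep the example free of extra packages.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical stand-in for the course matrix: 6 genes (rows) x 4 samples (columns)
expr <- matrix(rnorm(24, mean = 8, sd = 2), nrow = 6,
               dimnames = list(paste0("Gene", 1:6), paste0("Sample", 1:4)))

# scale() standardizes columns, so transpose to put genes in columns,
# scale, then transpose back to obtain per-gene (row-wise) Z-scores
expr_z <- t(scale(t(expr)))

# Heatmap of the already scaled values; scale = "none" avoids scaling twice
heatmap(expr_z, scale = "none")
```

After the transformation each row has mean 0 and standard deviation 1, so the colours reflect relative differences between samples for each gene rather than absolute expression levels.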
Bonus question 2 (optional)
Read in the file "categories_Expression.txt"
How many genes are in the ofInterest and pathway sections?
Get the quantiles of the overall Expression, and of the Expression for the Glycolysis and TGFb genes
Find how many genes were selected and are in the Glycolysis pathway
Compute the probability of selecting a gene with at least the expression level of "Gene13", assuming normally distributed data
Perform a t-test to evaluate the difference in Expression levels between genes in the Glycolysis pathway and genes in the TGFb pathway.
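The shape of the t-test call can be sketched on simulated data. The data frame genes below is hypothetical: its column names (pathway, Expression) are inferred from the question text, and the layout of "categories_Expression.txt" may differ.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical data with assumed column names; not the course file
genes <- data.frame(
  pathway    = rep(c("Glycolysis", "TGFb"), each = 10),
  Expression = c(rnorm(10, mean = 10, sd = 1), rnorm(10, mean = 8, sd = 1))
)

# Split the expression values by pathway
glyc <- genes$Expression[genes$pathway == "Glycolysis"]
tgfb <- genes$Expression[genes$pathway == "TGFb"]

# Two-sample t-test; R's default is the Welch test (var.equal = FALSE)
res <- t.test(glyc, tgfb)
print(res)
```

The Welch test does not assume equal variances in the two groups. Passing var.equal = TRUE would give the classical Student's t-test instead.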