### R script for the hands-on examples
### Week 3

## Data Frames -------------------------------------------------------------

# Use the following code to import the file "read-counts.csv"
# (you have already downloaded it for the hands-on examples of week01).
# Name the imported data `expr_data`.

expr_data <- read.table(
  file = "../exos_data/read-counts.csv", # replace the path with your own
  header = TRUE, sep = ",", row.names = 1
)

## 1. Check the structure of `expr_data` using an appropriate R function.


## 2. How many unique expression values are in sample WT.2?
##    - Use `unique()` to get the unique values;
##    - then use `length()` to check the number of elements.


## 3. Extract expression levels for the gene "LOH1" in WT samples (WT.1, WT.2, ..., WT.10)
##    and SET1 samples (SET1.1, SET1.2, ..., SET1.10).
##    Store them as `expr_wt` and `expr_set1`.
##    Ensure they are vectors using `unlist()` (see help with `?unlist`).


## 4. Calculate the fold change and log2 fold change for "LOH1" between WT and SET1 groups.
##    Is the gene up- or down-regulated?


## 5. Use `wilcox.test()` to compare *LOH1* expression between WT and SET1.
##    At alpha = 0.05, what is your conclusion?


## 6. Create a new data frame for *LOH1* gene expression in WT and SET1 samples
##    with two columns:
##    - "expr_value": expression levels
##    - "group": WT or SET1


## 7. With the new data frame, draw a boxplot to compare expression
##    between groups using `boxplot()` (see `?boxplot`).


## Lists -------------------------------------------------------------

# Here's a toy list storing information about three samples:

my_list <- list(
  # sample information
  sample_info = data.frame(
    id = paste0("sample", 1:3),
    age = c(25, 27, 30),
    sex = c("F", "M", "F")
  ),
  # expression matrix
  count_expr = matrix(
    1:6,
    ncol = 2,
    dimnames = list(
      paste0("sample", 1:3),
      paste0("gene", c("A", "B"))
    )
  ),
  # measured genes
  gene_name = paste0("gene", c("A", "B")),
  # sequenced family members of each sample
  family_sequenced = list(
    sample1 = c("father", "mother"),
    sample2 = c("father", "mother", "sister"),
    sample3 = c("mother", "sister")
  )
)

my_list

## 1. Use `names()` to extract the names of the elements in the list.


## 2. Extract the `count_expr` matrix from the list.


## 3. From the matrix, find the expression value of geneA in sample2.


## 4. Calculate the total counts of each gene across all samples.


## 5. From the `sample_info` data frame, extract the `age` column.


## 6. Extract the 1st sequenced family member of sample3.


## 7. Add a new element to the list, "gene_description",
##    with the following values:
##    `c("geneA" = "housekeeping gene", "geneB" = "stress response gene")`
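

## Indexing patterns on made-up data -----------------------------------------

# A minimal sketch of the indexing patterns the exercises rely on, run on
# made-up data so that it does not give away the answers.
# Everything below is hypothetical and not part of the course data:
# the objects `toy_df`, `toy_long` and `toy_list`, the genes "geneX"/"geneY",
# the "ctl"/"trt" sample names, and all values.

toy_df <- data.frame(
  ctl.1 = c(10, 4), ctl.2 = c(12, 5), ctl.3 = c(14, 6),
  trt.1 = c(20, 4), trt.2 = c(24, 6), trt.3 = c(28, 5),
  row.names = c("geneX", "geneY")
)

# Selecting one row of a data frame returns a one-row data frame;
# `unlist()` turns it into a named numeric vector.
ctl_values <- unlist(toy_df["geneX", paste0("ctl.", 1:3)])
trt_values <- unlist(toy_df["geneX", paste0("trt.", 1:3)])

# Fold change taken here as mean(trt) / mean(ctl). The direction of the
# ratio is an assumption of this sketch; state yours explicitly.
fold_change <- mean(trt_values) / mean(ctl_values)
log2_fc <- log2(fold_change)

# Long format: one row per measurement, plus a column giving its group.
toy_long <- data.frame(
  expr_value = c(ctl_values, trt_values),
  group = rep(c("ctl", "trt"), each = 3)
)
boxplot(expr_value ~ group, data = toy_long)

# Lists: `$` and `[[` extract the element itself, whereas `[` would
# return a sub-list.
toy_list <- list(
  scores = matrix(
    1:4,
    ncol = 2,
    dimnames = list(c("r1", "r2"), c("c1", "c2"))
  ),
  tags = list(a = c("x", "y"), b = "z")
)

names(toy_list)                 # names of the top-level elements
toy_list$scores["r2", "c1"]     # one matrix cell, by row and column name
colSums(toy_list$scores)        # total of each column
toy_list[["tags"]]$a[2]         # second value of a nested element
toy_list$note <- c(c1 = "first column", c2 = "second column") # new element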