### R script for the hands-on examples
### Week 4

## Import Data -------------------------------------------------------------

# A gene-level differential expression (DE) analysis was performed
# to compare SET1 samples to WT samples using data from `read-counts.csv`.

# The analysis results are available via this link:
# https://raw.githubusercontent.com/InforBio/IOC/refs/heads/main/ioc_r/exos_data/toy_DEanalysis.csv

## - Please download the result file and upload it to your data folder.
## - Import the data using the `read.csv()` function.
## (See the documentation with `?read.csv`)
## Name the imported results `de_res`.
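
# A minimal sketch of the import pattern, assuming a small hypothetical CSV
# written to a temporary file. The file, its columns and its values are
# illustrative assumptions, not the real DE results.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("gene_name,log2FoldChange", "geneA,1.2", "geneB,-0.8"), toy_path)
toy_import <- read.csv(toy_path)  # header = TRUE is the default for read.csv()
head(toy_import)                  # quick look at the first rows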

de_res


## Exercises -------------------------------------------------------------

## 1. Check the structure of `de_res` using an appropriate R function.
##    What are the dimensions?
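
# A minimal sketch of inspecting a data frame, shown on a hypothetical toy
# data frame (`toy_struct` is an illustrative name, not the real `de_res`).
toy_struct <- data.frame(
  gene_name = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1)
)
str(toy_struct)  # column names, types and first values
dim(toy_struct)  # number of rows, then number of columns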


# The results contain the following columns:

# - `gene_name`: gene name
# - `baseMean`: mean of normalized counts across all samples
# - `log2FoldChange`: log2 fold change
# - `lfcSE`: standard error of the log2 fold change
# - `stat`: Wald statistic
# - `pvalue`: Wald test p-value
# - `padj`: adjusted p-value (Benjamini-Hochberg procedure)

## 2. Filter the rows where the gene has a log2 fold change (`log2FoldChange`)
##    greater than 0.5.

## 3. Filter the rows where the gene has a log2 fold change smaller than -0.5.

## 4. Filter the rows where the gene has a log2 fold change greater than 0.5 or
##    smaller than -0.5.

## 5. Filter the rows where the gene has a log2 fold change greater than 0.5 and
##    an adjusted p-value (`padj`) smaller than 0.05.
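
# A minimal sketch of row filtering with logical conditions, shown on a
# hypothetical toy data frame (`toy_filter` and its values are assumptions).
# Note: if a column contains NA, the comparison returns NA and `[` keeps
# an all-NA row; wrapping the condition in which() drops such rows.
toy_filter <- data.frame(
  gene_name = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(1.2, -0.8, 0.1, 0.9),
  padj = c(0.01, 0.20, 0.70, 0.30)
)
# One condition
toy_up <- toy_filter[toy_filter$log2FoldChange > 0.5, ]
# Either condition (logical OR, `|`)
toy_either <- toy_filter[toy_filter$log2FoldChange > 0.5 |
                           toy_filter$log2FoldChange < -0.5, ]
# Both conditions (logical AND, `&`)
toy_both <- toy_filter[toy_filter$log2FoldChange > 0.5 &
                         toy_filter$padj < 0.05, ]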


## 6. Extract results for these genes: RNR1, PIR3, SRP68.
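
# A minimal sketch of selecting rows by name with `%in%`, shown on a
# hypothetical toy data frame (gene names here are placeholders).
toy_genes <- data.frame(
  gene_name = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1)
)
toy_wanted <- c("geneA", "geneC")
# `%in%` returns TRUE for each row whose gene_name is in toy_wanted
toy_selected <- toy_genes[toy_genes$gene_name %in% toy_wanted, ]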


## 7. Use `ifelse()` to categorize genes.
##    Add a new column, `gene_category`, that assigns categories:
##    - "up" if `log2FoldChange > 0.5`.
##    - "down" if `log2FoldChange < -0.5`.
##    - "neutral" otherwise.
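
# A minimal sketch of a three-way split with nested `ifelse()` calls, shown
# on a hypothetical toy data frame (column names and thresholds are
# illustrative assumptions, unrelated to the exercise data).
toy_sizes <- data.frame(value = c(12, 3, 7))
toy_sizes$size_class <- ifelse(
  toy_sizes$value > 10, "large",
  ifelse(toy_sizes$value < 5, "small", "medium")  # evaluated when not "large"
)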


## 8. Use `table()` to count the occurrences of each gene category.
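
# A minimal sketch of counting categories with `table()`, shown on a
# hypothetical character vector (`toy_labels` is an illustrative name).
toy_labels <- c("large", "small", "medium", "small")
toy_counts <- table(toy_labels)  # one count per distinct value
toy_counts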


## Bonus Questions ------------------------------------------------------------

## 9. Write a function that filters `de_res` for genes
##    with a p-value less than or equal to a custom cutoff.
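
# A minimal sketch of a filtering function with a cutoff parameter.
# `toy_filter_by_cutoff`, its arguments and the toy data are hypothetical;
# this is one possible shape, not the expected solution.
toy_filter_by_cutoff <- function(data, cutoff = 0.05) {
  # Keep rows with a non-missing p-value at or below the cutoff
  keep <- !is.na(data$pvalue) & data$pvalue <= cutoff
  data[keep, ]
}
toy_pvals <- data.frame(
  gene_name = c("geneA", "geneB", "geneC", "geneD"),
  pvalue = c(0.001, 0.04, 0.5, NA)
)
toy_filter_by_cutoff(toy_pvals)                 # default cutoff of 0.05
toy_filter_by_cutoff(toy_pvals, cutoff = 0.01)  # stricter cutoff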


## 10. Modify the function from question 9 so that its output can be
##     ordered by any column of `de_res`.
##     Hint: you need one extra parameter to specify the ordering
##     column, in addition to the parameter that sets the cutoff.
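
# A minimal sketch of filtering and then ordering by a column chosen by name.
# `toy_filter_and_order`, its arguments and the toy data are hypothetical;
# this is one possible shape, not the expected solution.
toy_filter_and_order <- function(data, cutoff = 0.05, order_by = "pvalue") {
  stopifnot(order_by %in% names(data))  # fail early on an unknown column
  keep <- !is.na(data$pvalue) & data$pvalue <= cutoff
  kept <- data[keep, ]
  # `[[` extracts a column from its name stored in a variable
  kept[order(kept[[order_by]]), ]       # ascending order
}
toy_ordered_input <- data.frame(
  gene_name = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1),
  pvalue = c(0.04, 0.001, 0.5)
)
toy_filter_and_order(toy_ordered_input, order_by = "log2FoldChange")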


## 11. A yeast gene annotation file was obtained from the Ensembl database.
##     This file can be downloaded here:
##     https://raw.githubusercontent.com/InforBio/IOC/refs/heads/main/ioc_r/exos_data/yeast_gene_annot.csv
##     Import the data and add the annotation to the `de_res` data frame using the `merge()` function.
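
# A minimal sketch of joining two data frames with `merge()`, shown on
# hypothetical toy data. The shared key name `gene_name` and the
# `description` column are assumptions; check the real annotation file's
# column names and use `by.x` / `by.y` if the key columns differ.
toy_res <- data.frame(
  gene_name = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(1.2, -0.8, 0.1)
)
toy_annot <- data.frame(
  gene_name = c("geneA", "geneB"),
  description = c("hypothetical kinase", "hypothetical transporter")
)
# all.x = TRUE keeps every row of toy_res, filling missing annotation with NA
toy_merged <- merge(toy_res, toy_annot, by = "gene_name", all.x = TRUE)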