### R script for the hands-on examples
### Week 6

## "Fix the Code" Challenge -------------------------------------------------------------------

## 1. Fix the data frame error.
data <- data.frame(
  gene = c("BRCA1", "TP53", "MYC"),
  expression = c(10.5, 8.2, 12.7)
  condition = c("Tumor", "Normal", "Tumor")
)
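
## One possible fix (a sketch): the arguments of data.frame() must be separated
## by commas, and the comma after the `expression` vector was missing.
data <- data.frame(
  gene = c("BRCA1", "TP53", "MYC"),
  expression = c(10.5, 8.2, 12.7),  # comma added here
  condition = c("Tumor", "Normal", "Tumor")
)
str(data)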


## 2. Fix the error when using the `mean()` function.
counts <- data.frame(
  sample1 = c(1, 2, 3),
  sample2 = c(1, 1, 3),
  sample3 = c(0, 1, 2),
  row.names = paste0("gene", 1:3)
)
mean(counts["gene3", ])
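
## One possible fix (a sketch): `counts["gene3", ]` is still a one-row data
## frame, and mean() does not accept a data frame (it warns and returns NA).
## Turn the row into a numeric vector with unlist() first.
counts <- data.frame(
  sample1 = c(1, 2, 3),
  sample2 = c(1, 1, 3),
  sample3 = c(0, 1, 2),
  row.names = paste0("gene", 1:3)
)
mean(unlist(counts["gene3", ]))  # (3 + 3 + 2) / 3 = 2.666667
# Alternative: rowMeans() computes the mean of every row at once.
rowMeans(counts)["gene3"]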


## 3. Fix the error in the `if` condition.
x <- 10
if (x = 5) {
  print("x is 5")
}
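
## One possible fix (a sketch): `=` is an assignment operator, not a comparison,
## and R rejects `if (x = 5)` as a syntax error. Use `==` to test equality.
## The `else` branch is an optional addition that makes the result visible.
x <- 10
if (x == 5) {
  print("x is 5")
} else {
  print("x is not 5")  # this branch runs, because x is 10
}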


## 4. Fix the error in the ggplot2 code.
##    The goal is to show petal length with a boxplot for each species.

ggplot(iris, aes(x = Sepal.Length, y = Species))
geom_boxplot()
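
## One possible fix (a sketch): load ggplot2, map `Species` to x and
## `Petal.Length` (not `Sepal.Length`) to y, and join the two layers with `+`.
## Without `+`, geom_boxplot() runs as a separate command and is never added
## to the plot.
library(ggplot2)
p <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot()
p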


## 5. Fix the error in the ggplot2 code. The aim is to draw a boxplot for each group.

# simulate data for two groups of samples.
set.seed(1)
data <- data.frame(
  gp = rep(1:2, each = 20),
  value = c(rnorm(20), rnorm(20, mean = 5))
)
str(data)
# Draw a boxplot for each group.
ggplot(data, aes(x = gp, y = value)) + 
  geom_boxplot()
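
## One possible fix (a sketch): `gp` is stored as integers (1 and 2), so
## ggplot2 treats x as a continuous variable and draws a single box.
## Converting `gp` to a factor inside aes() gives one box per group.
library(ggplot2)
set.seed(1)
data <- data.frame(
  gp = rep(1:2, each = 20),
  value = c(rnorm(20), rnorm(20, mean = 5))
)
p <- ggplot(data, aes(x = factor(gp), y = value)) +
  geom_boxplot() +
  labs(x = "Group")
p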


## 6. Fix the data filtering code.
# Keep only the rows where `value` is smaller than -0.5.
data[data$value<-0.5, ]
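
## One possible fix (a sketch): R reads `data$value<-0.5` as the assignment
## `data$value <- 0.5`, which overwrites the whole `value` column with 0.5
## instead of comparing. A space between `<` and `-0.5` makes it a comparison.
## The data from step 5 is regenerated here, because running the line above
## has already overwritten `value`. `low_values` is just an illustrative name.
set.seed(1)
data <- data.frame(
  gp = rep(1:2, each = 20),
  value = c(rnorm(20), rnorm(20, mean = 5))
)
low_values <- data[data$value < -0.5, ]
low_values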


## Mini Data Project -------------------------------------------------------------------------

## A researcher has collected some gene expression data from 12 samples.
## However, some expression values are missing.
## Please help the researcher clean the data and perform some basic analyses.

# Simulated dataset with missing values
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)

data

## Tasks:
## 1. Find missing values.
## Which rows contain missing values?
## Hint: Use `is.na()`
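
## A possible answer for task 1 (a sketch): is.na() returns TRUE for every
## missing value, and which() converts those TRUE positions into row numbers.
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
is.na(data$expression)
which(is.na(data$expression))   # rows 3 and 4
data[is.na(data$expression), ]  # show the incomplete rows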

## 2. Remove rows with missing values.
## Create a new dataset, `data_clean`, without missing values.
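
## A possible answer for task 2 (a sketch): keep only the rows where
## `expression` is not missing. na.omit(data) keeps the same rows here,
## because `expression` is the only column with missing values.
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
nrow(data_clean)  # 10 rows remain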


## 3. Basic summary statistics
##    - What is the mean expression level (after removing missing values)?
##    - What are the maximum and minimum expression values?
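
## A possible answer for task 3 (a sketch).
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
mean(data_clean$expression)          # 12.87
mean(data$expression, na.rm = TRUE)  # same value, computed on the original data
max(data_clean$expression)           # 18.1
min(data_clean$expression)           # 8.9
summary(data_clean$expression)       # min, quartiles, mean and max together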


## 4. Find the average expression (`mean()`) and the standard deviation (`sd()`)
##    for each sample group (Control *vs.* Case)
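
## A possible answer for task 4 (a sketch): tapply() splits `expression` by
## `sample_group` and applies the function to each piece. The groups come
## back in alphabetical order (Case, then Control).
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
tapply(data_clean$expression, data_clean$sample_group, mean)  # Case 15.64, Control 10.1
tapply(data_clean$expression, data_clean$sample_group, sd)
# The same, one group at a time:
mean(data_clean$expression[data_clean$sample_group == "Control"])
sd(data_clean$expression[data_clean$sample_group == "Control"])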


## 5. Use `data_clean` to draw a graph you have already seen,
##    e.g.: box plots, scatter plots, etc.
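
## A possible answer for task 5 (a sketch): a boxplot of expression for each
## sample group, with the individual samples drawn as points on top.
library(ggplot2)
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
p <- ggplot(data_clean, aes(x = sample_group, y = expression)) +
  geom_boxplot() +
  geom_point() +
  labs(x = "Sample group", y = "Expression")
p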


## 6. To go further: let's visualise the average expression of each group using a bar plot,
##                   with the help of ChatGPT (or any other AI tool).

## 6a. Prepare the data for a bar plot with error bars.
##    We need to reorganize the data into a data frame with 2 rows and 3 columns:
##    - the column `group` contains the name of each group
##    - the column `mean` contains the mean expression of each group
##    - the column `sd` contains the standard deviation of each group
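
## A possible answer for task 6a (a sketch). The names `group_means`,
## `group_sds` and `group_stats` are just choices for this example. Both
## tapply() calls return the groups in the same order, so the rows stay aligned.
# Dataset rebuilt so this snippet also runs on its own.
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
group_means <- tapply(data_clean$expression, data_clean$sample_group, mean)
group_sds <- tapply(data_clean$expression, data_clean$sample_group, sd)
group_stats <- data.frame(
  group = names(group_means),
  mean = as.numeric(group_means),
  sd = as.numeric(group_sds)
)
group_stats  # 2 rows (Case, Control) x 3 columns (group, mean, sd)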


## 6b. Draw the bar plot:
##    - Plot bars for mean expression (`geom_bar(stat = "identity")`)
##    - Add error bars for standard deviation (`geom_errorbar()`)
##    - Change the aesthetic aspects as you want, *e.g.*: color, title, legend, *etc.*
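
## A possible answer for task 6b (a sketch). The data steps repeat task 6a so
## this snippet runs on its own; the colours, title and theme are arbitrary
## choices, and `group_stats` is just an illustrative name.
library(ggplot2)
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
group_means <- tapply(data_clean$expression, data_clean$sample_group, mean)
group_sds <- tapply(data_clean$expression, data_clean$sample_group, sd)
group_stats <- data.frame(
  group = names(group_means),
  mean = as.numeric(group_means),
  sd = as.numeric(group_sds)
)
p <- ggplot(group_stats, aes(x = group, y = mean, fill = group)) +
  geom_bar(stat = "identity", width = 0.6) +                          # bar height = mean
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) + # mean +/- one SD
  scale_fill_manual(values = c(Case = "tomato", Control = "steelblue")) +
  labs(title = "Mean expression by sample group",
       x = "Sample group", y = "Mean expression (+/- SD)") +
  theme_minimal()
p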


## 6c. What if we want to add the expression level of each sample to the bar plot?
## Hint: Add another layer for drawing points (`geom_point()`),
##       using the data frame that contains the individual data (`data_clean`).
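
## A possible answer for task 6c (a sketch). The bar and error-bar layers use
## `group_stats`, while geom_point() gets its own `data = data_clean` and its
## own x/y mapping. `fill` is set inside geom_bar() only, so the point layer
## does not inherit an aesthetic that `data_clean` lacks. The small horizontal
## jitter is an optional choice to keep points from overlapping.
library(ggplot2)
data <- data.frame(
  sample_id = paste0("sample", 1:12),
  expression = c(
    10.2, 15.2, NA, NA, 9.4, 18.1,
    8.9, 16.0, 10.5, 15.5, 11.5, 13.4
  ),
  sample_group = rep(c("Control", "Case"), times = 6)
)
data_clean <- data[!is.na(data$expression), ]
group_means <- tapply(data_clean$expression, data_clean$sample_group, mean)
group_sds <- tapply(data_clean$expression, data_clean$sample_group, sd)
group_stats <- data.frame(
  group = names(group_means),
  mean = as.numeric(group_means),
  sd = as.numeric(group_sds)
)
p <- ggplot(group_stats, aes(x = group, y = mean)) +
  geom_bar(aes(fill = group), stat = "identity", width = 0.6) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  geom_point(data = data_clean, aes(x = sample_group, y = expression),
             position = position_jitter(width = 0.1, height = 0)) +
  labs(x = "Sample group", y = "Expression")
p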