---
title: "Week 6"
output: html_document
date: '2022-05-05'
---
```{R}
library(ggpubr)
data <- read.delim("https://raw.githubusercontent.com/BarryDigby/youth-academy-semII/master/docs/source/worksheets/corr_dat.txt", sep="\t", header=T)
colnames(data) <- c("CXCL12", "PTEN", "C13orf25", "hsa-miR-125", "hsa-miR-455", "hsa-miR-75-5p")
```
# Measuring Correlation
To measure correlation (the strength of relationship and direction) between two continuous variables, we can use the `cor.test()` function:
```{R}
x <- c(2, 5, 4, 7, 8, 4, 7, 8, 9, 12, 20, 18, 15, 17)
y <- c(22, 14, 16, 15, 13, 11, 10, 9, 7, 5, 6, 5, 3, 1)
cor.test(x, y)
```
Note the negative value for cor: -0.8387998. This means there is a strong negative correlation.
Perhaps easier, we can visualise this trend and add the statistic to the plot:
```{R}
df <- data.frame(X=x, Y=y)
ggscatter(df, x="X", y="Y")
ggscatter(df, x="X", y="Y", add = "reg.line")
ggscatter(df, x="X", y="Y", add = "reg.line", cor.coef = TRUE, cor.method = "pearson")
```
# Worksheet
Your task is to use `cor.test()` or the `ggscatter()` plot to find out which miRNAs have a negative correlation with the three genes. This requires 9 calculations:
* hsa-miR-125 vs. CXCL12, PTEN, C12orf25
* hsa-miR-455 vs. CXCL12, PTEN, C12orf25
* hsa-miR-75-5p vs. CXCL12, PTEN, C12orf25
The first contrast has been done for you below. You can reuse this code and note the correlation value for each contrast:
```{R}
# use cor.test()
# place the gene first, the miRNA second
cor.test(data$CXCL12, data$`hsa-miR-455`)
# or use the plot
# place the gene on the x axis, place the miRNA on the y axis.
ggscatter(data, x="CXCL12", y="hsa-miR-455", add = "reg.line", cor.coef = T, cor.method = "pearson")
```