---
title: "Precept 3"
author: "Emily Nelson"
date: "February 16, 2016"
output: html_document
---

```{r setup}
set.seed(1990)
```

#Topics

- Functions
- Packages
- Subsetting
- Tidy Data
- `reshape2` -- `dcast` and `melt`
- `dplyr`

#Functions

How do I define a function? What is an argument?

Write a function that takes two numbers, squares them, and returns the square root of their sum.
```{r functions}
#code goes here
```

What happens if we give the function a character? A boolean?

What happens to variables declared inside the function? Can we access them outside? What if we declare a variable inside that has the same name as one outside? (What do we call this?) Can we access a variable from inside the function that was declared outside?

#Packages

What is a package? Why are they useful? How do I install a package? Update them?

How do I see what packages I have loaded?

What happens if I type `melt` into the console before and after I execute the following code?

```{r load_packages, message=FALSE}
library(data.table)
library(reshape2)
library(dplyr)
```

#Subsetting

1. Get the 4th row of `d`. Then get the 2nd value in this row.
2. Get the 3rd column of `d`.
3. Get the column of `d` called `b`.
4. Get whatever is stored in the 5th row of `d` in the 2nd column.
5. Get the indices of any row of `d` that has a TRUE in the column named `c`.
6. Get the indices of any row of `d` that has the letters `m`, `q`, or `w`.


```{r subsetting}
d <- data.frame(a = letters[2:23], b = rbinom(22, 5, 0.1), c = as.logical(rbinom(22, 1, 0.5)))

```


#Tidy Data

Why is tidy data useful? What are the drawbacks?

Here is some gene expression data:

```{r tidy_example}
#don't worry too much about how I made this
gene_expression <- data.table(expand.grid(1:100, c("A", "B"), c(1, 2)))
setnames(gene_expression, c("gene", "condition", "replicate"))

gene_expression <- gene_expression %>%
  mutate(hour_1 = rnorm(400, 10, 1)) %>%
  mutate(hour_2 = rnorm(400, hour_1, 1)) %>%
  mutate(hour_3 = rnorm(400, hour_2, 1))

print(gene_expression)
```

Is this tidy data? Why or why not?

How can we make it tidy?

#reshape2

`melt` makes data tidy, `dcast` makes it "un"-tidy. Why might we want to use `dcast`? (ie, in what situation is "un"-tidy data useful?)

Turn `gene_expression` back into untidy data with `dcast`.

```{r untidy}
#code goes here
```

#dplyr

1. `filter` -- extract just condition "A"
2. `arrange` -- sort by expression level at hour 3
3. `mutate` -- rename the conditions to glucose and ethanol (glucose=A) -- introduce that awesome thing called `ifelse`
4. `mutate` round 2 -- add a column called `sample` that has the condition and replicate information concatenated -- introduce `paste`
5. `group_by` and `summarize` -- find the mean expression level in each condition at each hour
6. `select` -- delete the replicate column -- 2 ways to do it

```{r dplyr_example}
#code goes here
```