---
title: "Precept 6"
author: "Wei"
date: "March 9, 2016"
output: html_document
---

# Precept plan

0. Touch briefly on `movie_data` from Project 1.
1. Distributions in R.
2. Using simulation to solve probability problems.


## `movie_data` from Project 1.

```{r proj1_wrapup, message=FALSE}
#knitr::opts_chunk$set(fig.show="hide", results="hide")

library(dplyr)
# I learned this load() trick from a student on Project 1
load(url("https://github.com/SML201/project1/raw/master/project_1_movie_data.RData"))
movie_data %>% select(-genre) %>%
  distinct() %>%
  select(title) %>%
  table %>%
  { which(.>1) } %>%
  names %>% tail
```


## Distributions in R

Functions:
* d is density
* p is distribution
* q is quantile
* r is random generator

### Discrete examples

1. Fair coin flips.
    a. What distribution?
    b. What happens when you try to compute the density or distribution of an invalid value?
    c. Plots.
    d. Compute probability for events.

2. How many people cross that skybridge near neuroscience in an hour if the average number of people to cross is 25?
    a. What distribution?
    b. Plots.
    d. Compute probability for events and tail probability.
    
### Continuous examples

1. Standard normal/Gaussian intervals.
    a. Compute probability for intervals.
    b. Plots.

## Simulations to compute probabilities in R.

General strategy:

0. `sample()`, `replicate()`, and vectorization can help with generating data.
1. Figure out how to simulate the sample space. *This is very important, because this is the key to successfully computing tricky conditional probabilities.*
2. Figure out how to check if something happens.
3. Count up the number of successes and the number of trials.

### Some relationships between distributions

1. Low occurence binomials leads to Poisson
```{r low_occurence_binomials}
set.seed(1234)

library(ggplot2)

p <- 0.0004
n <- 10000
B <- 100000 #trials

draws <- rbinom(n=B, size=n, prob=p)

data.frame(draws=draws) %>% 
  ggplot(aes(x=draws, y=..count..)) +
  geom_histogram()

data.frame(draws=draws) %>% 
  ggplot(aes(x=draws, y=..count..)) +
  geom_bar()

poisson_pdf <- data.frame(x=0:15, y=B*dpois(0:15, lambda=n*p))

data.frame(draws=draws) %>% 
  ggplot(aes(x=draws, y=..count..)) +
  geom_bar() + 
  geom_point(aes(x=x, y=y), data=poisson_pdf, color="firebrick", size=5)
```

2. Central Limit Theorem with binomials

```{r waiting_times}
p <- 0.33333
n <- 600
B <- 100000

draws <- rbinom(n=B, size=n, prob=p)
data.frame(draws=draws) %>% ggplot(aes(x=draws, y=..count..)) +
  geom_histogram(binwidth=4) #tune binwidth

x <- seq(150, 250, length.out=1000)
norm_density <- data.frame(x=x, y=4*B*dnorm(x, mean=200, sd=sqrt(n*p*(1-p))))
data.frame(draws=draws) %>% ggplot(aes(x=draws, y=..count..)) +
  geom_histogram(binwidth=4) +
  geom_line(aes(x=x, y=y), data=norm_density, color="firebrick")
```

### Neighbor's kids

Some new neighbors moved in next to you. You know they have two young kids. This morning you spoke to one of them who was a girl. What's the probability the other kid is also a girl?

```{r neighborskids}
sample_space <- matrix(
  c(0, 0,
    0, 1,
    1, 0,
    1, 1
  ), byrow=TRUE, ncol=2
)

B <- 1e7
draws <- sample(1:4, size=B, replace=TRUE)
simData <- sample_space[draws,]
processed <- rowSums(simData)
sum(processed==2) / sum(processed>0)
```

### Texas Hold'em

What's the probability you deal yourself pocket aces? In other words, what is the probability the top two cards are both aces?

```{r poker}
deck <- rep(1:13, 4)
B <- 1e6

draws <- replicate(B, sample(deck))

dim(draws)

first_card <- draws[1,] == 1
second_card <- draws[2,] == 1

pocket_aces <- first_card & second_card

sum(pocket_aces) / B
```

Does this probability change if you're playing at an 8-person table and you're being dealt first?

```{r poker2}
first_card <- draws[1,] == 1
second_card <- draws[9,] == 1

pocket_aces <- first_card & second_card

sum(pocket_aces) / B
```


### Coin flips

Let's suppose we flip a fair coin twice. What's the probability of heads then tails (HT)? What's the probability of HH?

```{r coinflip1}
p <- 0.5
n <- 1
B <- 1e6

first_flip <- rbinom(n=B, size=n, prob=p)
second_flip <- rbinom(n=B, size=n, prob=p)

sum( (first_flip==1) & (second_flip==0) ) / B
sum( (first_flip==1) & (second_flip==1) ) / B
```

Suppose we play a game where we flip a fair coin until we see the pattern HT (in which case I win) or the pattern HH (in which case you win). Who is more likely to win this game?

```{r coinflip2}
p <- 0.5
n <- 1
B <- 1e6

ME <- 0
YOU <- 0

for(i in 1:B){
  flip <- rbinom(n=1, size=n, prob=p)
  prev_flip <- rbinom(n=1, size=n, prob=p)
  while(1){
    if(flip==1 && prev_flip==1){
      YOU <- YOU+1
      break
    }
    if(flip==0 && prev_flip==1){
      ME <- ME+1
      break
    }
    prev_flip <- flip
    flip <- rbinom(n=1, size=n, prob=p)
  }
}

ME/B
YOU/B
```

Now let's change it so that we flip until we see HTT (I win) or HHT (You win). We just added a tail to the end of the patterns. Who is more likely to win?

```{r coinflip3}
p <- 0.5
n <- 1
B <- 1e6

ME <- 0
YOU <- 0

for(i in 1:B){
  flip <- rbinom(n=1, size=n, prob=p)
  prev_flip <- rbinom(n=1, size=n, prob=p)
  prev_flip2 <- rbinom(n=1, size=n, prob=p)
  while(1){
    if(flip==0 && prev_flip==1 && prev_flip2==1){
      YOU <- YOU+1
      break
    }
    if(flip==0 && prev_flip==0 && prev_flip2==1){
      ME <- ME+1
      break
    }
    prev_flip <- flip
    prev_flip2 <- prev_flip
    flip <- rbinom(n=1, size=n, prob=p)
  }
}

ME/B
YOU/B
```