---
title: "Sample Size"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(knitr)
library(Matrix)
library(tictoc)
library(mnormt)
set.seed(02192021)
```

### Sample Size Decisions  


\vfill


\vfill

For example, assume we are in a one-sample t-test framework.

1. Hypothesize an effect size: 0.5


2. Set the variance to 1 and the sample size to 25.

\vfill

3. Calculate the chance that the p-value is below the threshold.
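
The three steps can be sketched in R as follows. This is a minimal illustration rather than a definitive calculation: the threshold is not stated above, so a two-sided test with a threshold of 0.05 is assumed, and the names `effect_size`, `sd_data`, `n_power`, `alpha`, and `num_sims_power` are hypothetical. The simulated proportion of p-values below the threshold is compared with the analytic power from `power.t.test`.

```{r}
# Assumed settings: two-sided test with threshold alpha = 0.05 (not given in the notes)
effect_size <- 0.5      # step 1: hypothesized effect size
sd_data <- 1            # step 2: variance of 1, so the standard deviation is 1
n_power <- 25           # step 2: sample size
alpha <- 0.05
num_sims_power <- 10000

# Step 3: simulate data sets under the hypothesized effect and record each p-value
p_values <- replicate(num_sims_power,
                      t.test(rnorm(n_power, mean = effect_size, sd = sd_data), mu = 0)$p.value)
power_sim <- mean(p_values < alpha)

# Analytic power from the noncentral t distribution, for comparison
power_exact <- power.t.test(n = n_power, delta = effect_size, sd = sd_data,
                            sig.level = alpha, type = "one.sample")$power

c(simulated = power_sim, analytic = power_exact)
```

The two numbers should agree up to simulation error, which shrinks as `num_sims_power` grows.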

\vfill


\vfill


\vfill

For more details on power, see Christian Stratton and Jenny Green's wonderful Shiny app: [https://christianstratton.shinyapps.io/PowerApp/](https://christianstratton.shinyapps.io/PowerApp/)

\newpage

A few points about power:


\vfill


\vfill


\vfill


\vfill


\vfill


\newpage

#### Design Analysis with Simulation

More generally, we can use simulation to estimate parameters of interest, such as the standard error as a function of sample size.

\vfill

Let's reconsider the example we used for the power analysis: a one-sample t-test. More generally, the goal is to estimate the population mean.
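
For reference, here is the closed-form benchmark that a simulation of the sample mean can be checked against, assuming independent observations $y_1, \dots, y_n$ with common variance $\sigma^2$ (this notation is introduced here for illustration):

$$\mathrm{SE}(\bar{y}) = \sqrt{\mathrm{Var}(\bar{y})} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

With $\sigma^2 = 1$ and $n = 25$, as in the power example, this gives $1/\sqrt{25} = 0.2$.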

\vfill


\vfill

\vfill

##### Simulation part 1:

\vfill
~
```{r}
num_samples <- 10
var_data <- 1
num_sims <- 1000000

tic('replicate')
data_replicate <- replicate(num_sims, rnorm(num_samples, 0, sqrt(var_data))) # mean not important here
toc()

tic('loop')
data_loop <- matrix(0,num_sims, num_samples)
for (iter in 1:num_sims){
  data_loop[iter,] <- rnorm(num_samples, 0, sqrt(var_data))
}
toc()

tic('rmnorm')
data_mvnorm <- rmnorm(num_sims, rep(0,num_samples), var_data * diag(num_samples))
toc()

```
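
A fourth approach, not part of the comparison above and offered only as a sketch, is to draw every value with a single call to `rnorm` and reshape the result into a matrix. The names `sketch_sims`, `sketch_samples`, `sketch_var`, and `data_matrix` are hypothetical, the number of simulations is reduced so the chunk runs quickly, and base R's `system.time` stands in for `tic`/`toc` to keep the chunk self-contained.

```{r}
# Hypothetical names; fewer simulations than above so the sketch runs quickly
sketch_sims <- 10000
sketch_samples <- 10
sketch_var <- 1

# One vectorized draw, reshaped so each row is one simulated data set
timing <- system.time(
  data_matrix <- matrix(rnorm(sketch_sims * sketch_samples, 0, sqrt(sketch_var)),
                        nrow = sketch_sims, ncol = sketch_samples)
)
timing

dim(data_matrix) # rows = simulations, columns = observations
```

Note that `replicate` returns a `num_samples` by `num_sims` matrix (one data set per column), while the loop and `rmnorm` versions store one data set per row, so summaries should be taken over the matching dimension.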


\vfill

##### Simulation part 2:

\vfill
~
```{r}
num_replicates <- 1000
var_data <- 2
num_samples <- 10

simulate_mean_se <- function(num_replicates, var_data, num_samples){
  # function to simulate sample means for a given sample size and data variance;
  # the standard deviation of the returned means estimates the standard error
  # inputs:
  #       - num_replicates: number of data sets to simulate
  #       - var_data: variance of the data
  #       - num_samples: number of data points
  # output:
  #       - num_replicates sample means
  return(rowMeans(mnormt::rmnorm(num_replicates, 0, var_data * diag(num_samples))))
}

tibble(sims = c(simulate_mean_se(num_replicates, var_data, 10),
                simulate_mean_se(num_replicates, var_data, 30),
                simulate_mean_se(num_replicates, var_data, 100),
                simulate_mean_se(num_replicates, var_data, 1000)),
       # factor built from numeric values so facets are ordered 10, 30, 100, 1000
       n = factor(rep(c(10, 30, 100, 1000), each = num_replicates))) %>% 
  ggplot(aes(x = sims, fill = n)) + geom_histogram(bins = 100) + theme_bw() +
  facet_wrap(~n) + 
  # title reports the variance actually used in the simulation
  ggtitle(paste0("Sampling Distribution for Sample Mean of N(0,", var_data,
                 ") with varying sample size"))

```

\newpage

##### Simulation part 3:

\vfill
~
```{r}
num_replicates <- 10000
var_data <- 1
n_seq <- 5:200
numb_n <- length(n_seq)

estimated_se <- rep(0, numb_n)

for (iter in 1:numb_n){
  estimated_se[iter] <- sd(simulate_mean_se(num_replicates, var_data, n_seq[iter]))
}

# true standard error of the mean is sqrt(variance / n), not variance / sqrt(n)
true_se <- tibble(n_seq = n_seq, se = sqrt(var_data / n_seq))

tibble(n_seq = n_seq, estimated_se = estimated_se) %>% 
  ggplot(aes(y=estimated_se, x = n_seq)) + 
  geom_line() + 
  geom_line(aes(y = se, x = n_seq), inherit.aes = FALSE, data = true_se, color = 'red', linetype = 2) + 
  theme_bw() + ylab('standard error') + xlab('sample size')
```
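
One way to turn a standard error curve like this into a sample size decision is to pick a target standard error and solve for the smallest sample size that meets it. The sketch below assumes a hypothetical target of 0.1 and uses only base R; `target_se`, `sketch_var_data`, `n_needed`, and `check_sims` are illustrative names, not part of the analysis above.

```{r}
# Hypothetical target: the largest standard error we are willing to accept
target_se <- 0.1
sketch_var_data <- 1

# Solve sqrt(var / n) <= target_se for n and round up
n_needed <- ceiling(sketch_var_data / target_se^2)

# Check by simulation: each row is one simulated data set of size n_needed
check_sims <- 5000
sim_data <- matrix(rnorm(check_sims * n_needed, 0, sqrt(sketch_var_data)),
                   nrow = check_sims)
simulated_se <- sd(rowMeans(sim_data))

c(n_needed = n_needed, target_se = target_se, simulated_se = simulated_se)
```

The simulated standard error should land close to the target, up to simulation error.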


##### Simulation part 4:

Working with collaborators often requires answering many different questions, so an R Shiny application can be a good option.

\vfill

As an example, see: [https://andrewhoegh.shinyapps.io/Australian_Samples/](https://andrewhoegh.shinyapps.io/Australian_Samples/)