--- title: "Precept 7" author: "Wei" date: "March 24, 2016" output: html_document --- # Statistics 101 ```{r libraries} library(ggplot2) ``` ## Plan - Central Limit Theorem - Confidence Intervals - p-values and Hypothesis Testing ## Central Limit Theorem Let's consider the uniform distribution. Poisson was demonstrated in class and binomial was demonstrated in last precept. The statement of the CLT in the lectures is: $$\sqrt{n}(\bar{X} -\mu) \rightarrow \operatorname{Normal}(0, \sigma^2)$$ where $\operatorname{E}[X_i] = \mu$, $\operatorname{Var}(X_i)=\sigma^2$, and $\bar{X}$ is the sample mean. Suppose we have 50 observations from Uniform(0, 1). Obviously, the mean of $X_i$ is 0.5. Wikipedia tells us the variance is $\frac{1}{12}(b-a)^2=\frac{1}{12}(1-0)^2=\frac{1}{12}$. ```{r clt_uniform} set.seed(1234) num_reps <- 1e5 results <- rep(0, num_reps) for(i in 1:num_reps){ X <- runif(50, min=0, max=1) X_bar <- mean(X) results[i] <- sqrt(50)*(X_bar-0.5) } dat <- data.frame(samples=results, x=seq(-2, 2, length.out=1e5), y=dnorm(seq(-2, 2, length.out=1e5), mean=0, sd=sqrt(1/12))) ggplot(dat, aes(x=samples, y=..density..)) + geom_histogram(binwidth=0.1) + geom_line(aes(x=x, y=y), color="tomato", size=1, linetype=3) + theme_bw() ``` ## Confidence Intervals Consider the following experiment, where we have 25 samples from a Normal distribution with $\mu=1$ and $\sigma^2=2$. As an experimenter, let's pretend we know the variance but have to estimate the mean. Compute a 95% confidence interval. ```{r} mu <- 1 sigma <- sqrt(2) n <- 25 X <- rnorm(25, mean=mu, sd=sigma) sample_mu <- mean(X) sample_mu - (1.96*sigma/sqrt(n)) sample_mu + (1.96*sigma/sqrt(n)) sample_mu+qnorm(0.025, mean=0, sd=sigma/sqrt(n)) sample_mu+qnorm(0.975, mean=0, sd=sigma/sqrt(n)) ``` What's the probability we'll get the true mu in the range of this interval over repeated trials? ```{r} mu <- 1 sigma <- sqrt(2) n <- 25 B <- 1e6 success <- 0 for(i in 1:B){ X <- rnorm(25, mean=mu, sd=sigma) sample_mu <- mean(X) lower <- sample_mu+qnorm(0.025, mean=0, sd=sigma/sqrt(n)) upper <- sample_mu+qnorm(0.975, mean=0, sd=sigma/sqrt(n)) if(lowermu){ success <- success+1 } } success/B ``` So we're pretty close to 95%. We can visualize in a different way: ```{r} B <- 200 upper <- rep(0, 50) lower <- rep(0, 50) for(i in 1:B){ X <- rnorm(25, mean=mu, sd=sigma) sample_mu <- mean(X) lower[i] <- sample_mu+qnorm(0.025, mean=0, sd=sigma/sqrt(n)) upper[i] <- sample_mu+qnorm(0.975, mean=0, sd=sigma/sqrt(n)) } dat <- data.frame(upper=upper, lower=lower, id=1:B, outside=lower>mu | upper