--- title: "Basic statistical inference" --- ## Packages for this section ```{r inference-1-R-1} library(tidyverse) ``` ## Inference for means (from STAB57) Three kinds of inference for means of normally-distributed data: - **One-sample $t$**: a single sample from a population, estimate that population's mean - **Two-sample $t$**: one sample from each of 2 populations, estimate difference in population means - **Matched pairs $t$**: two paired measurements on same (or matched) individuals, estimate population mean difference Two forms of inference for a population parameter: - **Confidence interval**: "what is the population parameter?" - **Hypothesis test**: "could the population parameter be equal to this value?" ## Examples: - Blue jays attendances (one-sample) - Kids learning to read (two-sample) - Pain relief (matched pairs) ## Confidence interval - You have a sample from some population - Imagine repeated sampling from that population - Procedure that gives an interval containing the true parameter in 95% (or 90% or 99%) of all possible samples ## Hypothesis test - Null hypothesis gives value for population parameter - Alternative hypothesis says how you are trying to prove the null hypothesis wrong (not equal, greater, less). - Test statistic measures "distance" between data and null hypothesis - P-value gives probability of observing test statistic *as extreme or more extreme*, **if the null hypothesis is true**. - Reject null hypothesis if P-value small enough (eg smaller than 0.05). ## Why 0.05? This man. ::: columns ::: {.column width="40%"} ![](fisher.png) ::: ::: {.column width="60%"} - analysis of variance - Fisher information - Linear discriminant analysis - Fisher's $z$-transformation - Fisher-Yates shuffle - Behrens-Fisher problem Sir Ronald A. Fisher, 1890--1962. ::: ::: ## Why 0.05? (2) - From The Arrangement of Field Experiments (1926): ![](fisher1.png){width="100%"} - and ![](fisher2.png){width="100%"} ## $\alpha$ and errors - Hypothesis test ends with decision: - reject null hypothesis - do not reject null hypothesis. - but decision may be wrong: | | Decision | | |----------------|-------------------|-----------------| | **Truth** | **Do not reject** | **reject null** | | **Null true** | Correct | Type I error | | **Null false** | Type II error | Correct | - Either type of error is bad, but for now focus on controlling Type I error: write $\alpha$ = P(type I error), and devise test so that $\alpha$ small, typically 0.05. - That is, **if null hypothesis true**, have only small chance to reject it (which would be a mistake). - Worry about type II errors later (when we consider power of test). ## One sample: the Blue Jays attendances - The Toronto Blue Jays' average home attendance in part of 2015 season was 25,070 (up to May 27 2015, from baseball-reference.com). - Does that mean the attendance at every game was exactly 25,070? Certainly not. Actual attendance depends on many things, eg.: - how well the Jays are playing - the opposition - day of week - weather - random chance ## Reading the attendances ...as a `.csv` file: \footnotesize ```{r inference-1-R-2} my_url <- "http://ritsokiguess.site/datafiles/jays15-home.csv" jays <- read_csv(my_url) jays ``` \normalsize ## Another way - This is a "big" data set: only 25 observations, but a lot of *variables*. - To see the first few values in all the variables, can also use `glimpse`: \scriptsize ```{r inference-1-R-5} glimpse(jays) ``` \normalsize ## Attendance histogram ```{r inference-1-R-6, fig.height=3.8} ggplot(jays, aes(x = attendance)) + geom_histogram(bins = 6) ``` ## Comments - Attendances have substantial variability, ranging from just over 10,000 to around 50,000. - Distribution somewhat skewed to right (but no outliers). - These are a sample of "all possible games" (or maybe "all possible games played in April and May"). What can we say about mean attendance in all possible games based on this evidence? ## CI for mean attendance - `t.test` function does CI and test. Look at CI first: ```{r inference-1-R-7} t.test(jays$attendance) ``` - From 20,500 to 29,600. ## Or, 90% CI - by including a value for conf.level: ```{r inference-1-R-8} t.test(jays$attendance, conf.level = 0.90) ``` - From 21,300 to 28,800. (Shorter, as it should be.) ## Comments - Need to say "column attendance within data frame `jays`" using `$`. - 95% CI from about 20,000 to about 30,000. - Not estimating mean attendance well at all! - Generally want confidence interval to be shorter, which happens if: - SD smaller - sample size bigger - confidence level smaller - Last one is a cheat, really, since reducing confidence level increases chance that interval won't contain pop. mean at all! ## Another way to access data frame columns ```{r inference-1-R-9} with(jays, t.test(attendance)) ``` ## Hypothesis testing for Blue Jays attendances - Previous year's mean attendance was 29,327, so test to see whether the mean is different from that in any way (two-sided test): ```{r inference-1-R-10} t.test(jays$attendance, mu = 29327) ``` - See test statistic $-1.93$, P-value 0.065. - Do not reject null at $\alpha=0.05$: no evidence that mean attendance has changed. ## Another example: learning to read - You devised new method for teaching children to read. - Guess it will be more effective than current methods. - To support this guess, collect data. - Want to generalize to “all children in Canada”. - So take random sample of all children in Canada. - Or, argue that sample you actually have is “typical” of all children in Canada. - Randomization (1): whether or not a child in sample or not has nothing to do with anything else about that child. - Randomization (2): randomly choose whether each child gets new reading method (t) or standard one (c). ## Reading in data - File at . - Proper reading-in function is `read_delim` (check file to see) - Read in thus: ```{r inference-1-R-12} my_url <- "http://ritsokiguess.site/datafiles/drp.txt" kids <- read_delim(my_url," ") ``` ## The data (some) ```{r inference-1-R-13} kids ``` ## Boxplots ```{r inference-1-R-14, fig.height=3.7} ggplot(kids, aes(x = group, y = score)) + geom_boxplot() ``` ## Two kinds of two-sample t-test - Do the two groups have same spread (SD, variance)? - If yes (shaky assumption here), can use pooled t-test. - If not, use Welch-Satterthwaite t-test (safe). - Pooled test derived in STAB57 (easier to derive, but assumes equal variances). - Welch-Satterthwaite does not assume equality of variances. - Assess (approx) equality of spreads using boxplot. ## The (Welch-Satterthwaite) t-test - `c` (control) before `t` (treatment) alphabetically, so proper alternative is “less”. - R does Welch-Satterthwaite test by default - Answer to "does the new reading program really help?" - (in a moment) how to get R to do pooled test? ## Welch-Satterthwaite ```{r inference-1-R-15} t.test(score ~ group, data = kids, alternative = "less") ``` ## The pooled t-test ```{r inference-1-R-16} t.test(score ~ group, data = kids, alternative = "less", var.equal = TRUE) ``` ## Two-sided test; CI - To do 2-sided test, leave out `alternative`: ```{r inference-1-R-17} t.test(score ~ group, data = kids) ``` ## Comments: - P-values for pooled and Welch-Satterthwaite tests very similar (even though the pooled test seemed inferior): 0.013 vs.\ 0.014. - Two-sided test also gives CI: new reading program increases average scores by somewhere between about 1 and 19 points. - Confidence intervals inherently two-sided, so do 2-sided test to get them. ## Pain relief Some data: \centering{ \includegraphics[height=0.7\textheight]{Screenshot_2019-04-26_13-41-29} } ## Matched pairs data - Data are comparison of 2 drugs for effectiveness at reducing pain. - 12 subjects (cases) were arthritis sufferers - Response is #hours of pain relief from each drug. - In reading example, each child tried only one reading method. - But here, each subject tried out both drugs, giving us two measurements. - Possible because, if you wait long enough, one drug has no influence over effect of other. - Advantage: focused comparison of drugs. Compare one drug with another on same person, removes a lot of variability due to differences between people. - Matched pairs, requires different analysis. - Design: randomly choose 6 of 12 subjects to get drug A first, other 6 get drug B first. ## Paired t test: reading the data Values aligned in columns: ```{r inference-4-R-1} my_url <- "http://ritsokiguess.site/datafiles/analgesic.txt" pain <- read_table(my_url) ``` ## The data ```{r inference-4-R-2} pain ``` ## Paired *t*-test \small ```{r inference-4-R-3} with(pain, t.test(druga, drugb, paired = T)) ``` \normalsize - P-value is 0.053. - Not quite evidence of difference between drugs. ## t-testing the differences - Likewise, you can calculate the differences yourself and do a 1-sample t-test on them. - First calculate a column of differences: \footnotesize ```{r inference-4-R-4} (pain %>% mutate(diff=druga-drugb) -> pain) ``` \normalsize ## t-test on the differences - then throw them into t.test, testing that the mean is zero, with same result as before: ```{r inference-4-R-5} with(pain, t.test(diff, mu=0)) ```