---
title: 'STA130H1S - Class #5'
author: "Prof. A. Gibbs"
date: "February 5, 2018"
output:
  ioslides_presentation:
    smaller: no
    widescreen: yes
  slidy_presentation:
    font_adjustment: +1
  beamer_presentation: default
subtitle: 'Inferential Thinking Part 2: Testing hypotheses on two groups'
---
##
```{r, echo=FALSE, warning=FALSE, message=FALSE}
library(tidyverse)
```
### Another look at the "Economic Guide to Picking a Major" from second week:
- Is there evidence that salaries are, on average, higher for Engineering graduates than Arts graduates?
- Is there evidence that salaries are, on average, higher for Computers & Mathematics graduates than Business graduates?
```{r,echo=FALSE, fig.height=3.5}
library(fivethirtyeight)
somemajors <- college_recent_grads %>%
filter(major_category %in% c('Computers & Mathematics','Engineering','Arts','Business'))
ggplot(data=somemajors, aes(x=major_category, y=median)) + geom_boxplot() + theme_bw() +
labs(x="Major", y="Median salary")
```
## Today
Answering the question:
*if we see a difference between two groups, is it meaningful?
or could it just be due to chance?*
Examples for:
- proportions for two groups
- means for two groups
Can extend to comparing any statistic between two groups
Recommended reading:
Sections 2.1, 2.2, 2.3 (excluding 2.3.4) of [*Introductory Statistics with Randomization and Simulation* from OpenIntro](https://www.openintro.org/stat/textbook.php?stat_book=isrs)
(a free open-source textbook)
## Statistical Inference
- Imagine we have a "real world" where we observe data, and a "theoretical world" (a population or scientific model) that we want to make conclusions about.
- Inference connects what we have observed in the real world to what we can say about the theoretical world.
- *Last class:* made inferences about one proportion
- *Today:* our theoretical world models will be that two groups are the same in some way, and we'll test to see if our data are consistent with that
# Hypothesis Testing for Two Proportions: Comparing a proportion between two groups
## Gender Bias in Promotion
- 1972 study on "sex role stereotypes on personnel decisions".
- 48 male managers were asked to rate whether several candidates were suitable for promotion.
- Managers were randomly assigned to review the file of either a male or female candidate. The files were otherwise identical.
B. Rosen and T.H. Jerdee (1974). Influence of sex role stereotypes on personnel decisions. *Journal of Applied Psychology* **59**(1), 9-14.
## What they found
Observed results | Male | Female | Total
---------------- | ---- | ------ | -----
Promoted | 21 | 14 | 35
Not promoted | 3 | 10 | 13
| | |
Total | 24 | 24 | 48
- 21/24 = 87.5% of males were recommended for promotion
- 14/24 = 58.3% of females were recommended for promotion
## The data
Data are in the dataframe `bias` (which I created)
```{r, echo=F,message=FALSE, warning=FALSE}
# create the data frame
bias <- data_frame(gender=c(rep("male",24),rep("female",24)),
                   promoted=c(rep("yes",21),rep("no",3),rep("yes",14),rep("no",10)))
```
```{r}
glimpse(bias)
```
- How many variables are in the data frame?
- Are the variables numerical or categorical?
---
### Code to calculate the proportion of males and females promoted
```{r}
n_female <- bias %>% filter(gender=="female") %>% summarise(n())
n_male <- bias %>% filter(gender=="male") %>% summarise(n())
yes_female <- bias %>%
filter(promoted=="yes" & gender=="female") %>% # only promoted females
summarize(n()) # count
as.numeric(yes_female) # treat as a number (not a dataframe)
yes_male <- bias %>%
filter(promoted=="yes" & gender=="male") %>% # only promoted males
summarize(n()) # count
as.numeric(yes_male)
```
---
### Code to calculate the proportion of males and females promoted
```{r}
# calculate the difference in the proportion of people promoted by gender
p_diff <- yes_female/n_female - yes_male/n_male
as.numeric(p_diff)
```
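The filter-and-count code above can also be written more compactly with `group_by()` and `summarise()`. A sketch equivalent to the calculation above (it recreates the same `bias` data frame so it is self-contained):

```{r}
library(tidyverse)
# the bias data frame, as created earlier
bias <- data_frame(gender=c(rep("male",24),rep("female",24)),
                   promoted=c(rep("yes",21),rep("no",3),rep("yes",14),rep("no",10)))
# proportion promoted within each gender, in one pipeline
props <- bias %>%
  group_by(gender) %>%
  summarise(prop_promoted = mean(promoted == "yes"))
props  # female: 14/24 = 0.583, male: 21/24 = 0.875
# groups are in alphabetical order, so female comes first
p_diff <- props$prop_promoted[1] - props$prop_promoted[2]
p_diff  # -0.292, matching the result above
```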
## Is the difference between the proportion of males and females promoted meaningful?
- The difference in the proportions of people who were deemed suitable for promotion between the females and males is
$$\hat{p}_{female} - \hat{p}_{male} = 0.583-0.875 = -0.292$$
- This suggests that the males were more likely to be recommended for promotion.
- But the sample size is small. Could this difference just be due to chance?
- Repeat the experiment assuming it's just due to chance (using simulation), and see what happens
# Review: The Logic of Hypothesis Testing
## 1. The hypotheses
Two claims:
1. There is nothing going on. This is the **null hypothesis**, written $H_0$.
For the gender bias in promotion study: $H_0$: $p_{female} = p_{male}$, i.e. female and male candidates are equally likely to be recommended for promotion.
2. There is something going on. This is the **alternative hypothesis**, written $H_A$ (or $H_a$ or $H_1$).
The alternative is almost always what the researcher wants to find evidence for.
For the gender bias in promotion study: $H_A$: $p_{female} \neq p_{male}$.
## 2. The test statistic
The **test statistic** is a number, calculated from the data, that captures what we're interested in.
For the gender bias promotion example, what would be a useful test statistic?
---
*Is it possible that the value of the test statistic occurred just by chance and there was really no difference between genders in being recommended for promotion?*
To answer this, simulate possible values of the test statistic assuming there's no difference (i.e., the null hypothesis is true).
## 3. Simulate what $H_0$ predicts will happen
We want to simulate many many possible values of what the test statistic might have looked like if the null hypothesis were true to know the distribution of its possible values.
How can we do this?
If the null hypothesis is true, any observation (promoted or not promoted) is just as likely to be for one gender as the other gender. And all ways the observations could be arranged among the two genders are equally likely.
## 3. Simulate what $H_0$ predicts will happen
- shuffle the categorical variable that says to which gender each observation belongs
- calculate the difference in the proportions of people who were promoted in the new groups
- repeat lots of times
## How to shuffle
By default, the `sample()` command produces a random sample of the same length as the data, without replacement; in other words, it randomly shuffles (permutes) the values
```{r}
# illustration of sample
a_vector <- c(1,1,1,2,2)
a_vector
sample(a_vector)
sample(a_vector)
sample(a_vector)
```
## Before the shuffle
```{r, echo=F}
set.seed(130) # set the random number seed if you want to get the same answer every time
```
```{r}
bias$gender # the values of gender in the data
bias$promoted
```
## After the shuffle
```{r}
sim <- bias %>% mutate(gender = sample(gender)) # shuffle gender labels
sim$gender
sim$promoted
```
## After the shuffle
```{r}
yes_female <- sim %>% filter(promoted=="yes" & gender=="female") %>% # only promoted females
summarize(n()) # count
as.numeric(yes_female)
yes_male <- sim %>% filter(promoted=="yes" & gender=="male") %>% # only promoted males
summarize(n()) # count
as.numeric(yes_male)
# calculate the difference in the proportion of people promoted by gender
p_diff <- yes_female/n_female - yes_male/n_male
as.numeric(p_diff)
```
---
### Simulate the possible values of the difference in the proportions many times, assuming the null hypothesis is true
```{r, warning=F, message=F}
set.seed(130) # remove in practice
repetitions <- 1000 # "many times" will be 1000
# create a vector of missing values to store results
# rep() replicates a value; NA means a missing value
simulated_stats <- rep(NA, repetitions) # 1000 missing values
# initialize some values
n_female <- bias %>% filter(gender=="female") %>% summarise(n())
n_male <- bias %>% filter(gender=="male") %>% summarise(n())
```
---
```{r}
# calculate the test statistic
yes_female <- bias %>%
filter(promoted=="yes" & gender=="female") %>% # only promoted females
summarize(n()) # count
yes_male <- bias %>%
filter(promoted=="yes" & gender=="male") %>% # only promoted males
summarize(n()) # count
test_stat <- as.numeric(yes_female/n_female - yes_male/n_male) # treat result as a number
```
---
```{r}
for (i in 1:repetitions)
{
sim <- bias %>% mutate(gender = sample(gender)) # shuffle gender labels
yes_female <- sim %>%
filter(promoted=="yes" & gender=="female") %>% # only promoted females
summarize(n()) # count
yes_male <- sim %>%
filter(promoted=="yes" & gender=="male") %>% # only promoted males
summarize(n()) # count
# calculate the difference in the proportion of people promoted by gender in the simulation
p_diff <- yes_female/n_female - yes_male/n_male
# add the new simulated value to the ith entry in the vector of results
simulated_stats[i] <- as.numeric(p_diff) # treat result as a number
}
# turn results into a data frame for plotting
sim <- data_frame(p_diff=simulated_stats)
```
---
### Distribution of simulated values of $\hat{p}_{female}-\hat{p}_{male}$ assuming $H_0$ is true
```{r, warning=F, message=F, fig.height=3.5}
ggplot(sim, aes(x=p_diff)) + geom_histogram(binwidth=0.1)
```
Around what value is this distribution centred? Does this make sense?
## 4. The P-value
- Assuming that the null hypothesis is true, the **P-value** gives a measure of the probability of getting data that are at least as unusual as the sample data.
- What does "at least as unusual" mean?
Values that are as far away or even farther from the null hypothesis value than the test statistic.
For the gender bias example:
- the null hypothesis value is $p_1-p_2=0$
- the observed estimate from the data (the test statistic) is
$\hat{p}_1-\hat{p}_2=-0.292$
- values at least as unusual as the data include all values *greater than or equal to +0.292* and all values *less than or equal to -0.292*
- This is a **two-sided test** because it considers differences from the null hypothesis that are both larger and smaller than what you observed.
## Values more extreme than the test statistic
```{r promoteplotwithp, echo=T, eval=F, warning=F, message=F}
test_stat
ggplot(sim, aes(p_diff)) +
geom_histogram(binwidth=0.1) +
geom_vline(xintercept = test_stat, color="red") +
geom_vline(xintercept = -1*test_stat, color="red") +
labs(x = "Simulated difference in proportion promoted between female and male candidates")
```
---
```{r promoteplotwithp, echo=F, eval=T}
```
## Calculate P-value
```{r}
test_stat
extreme_count <- sim %>%
filter(p_diff >= abs(test_stat) | p_diff <= -1*abs(test_stat)) %>%
summarise(n())
as.numeric(extreme_count)
p_value <- as.numeric(extreme_count)/repetitions
as.numeric(p_value)
```
## 5. Make a conclusion
- A large P-value means the data are consistent with the null hypothesis.
- A small P-value means the data are inconsistent with the null hypothesis.
The P-value is 0.048 for our test that the proportion of people promoted is the same for females and males.
We conclude that there is moderate evidence of a difference between genders in being chosen for promotion.
# Hypothesis testing for comparing a characteristic of a numerical variable between two groups
## Example: Sleep and performance on a visual discrimination task
Stickgold, James and Hobson (2000). Visual discrimination learning requires sleep after training. *Nature Neuroscience* **3**(12), 1237-8
## Can you recover from an all-nighter after a couple of days of good sleep?
- Subjects: 21 student volunteers (ages 18 to 25)
- Subjects were trained on a visual discrimination task
- Subjects were then randomly assigned into two groups:
- *sleep deprived*: kept up all night after the training and then not allowed to sleep until 9pm the next day
- *unrestricted sleep*: no restrictions on their sleep
- 11 subjects were in the sleep deprived group and 10 subjects were in the unrestricted sleep group
- Subjects then were allowed unrestricted sleep for the next two nights
- Subjects were then retested on the visual discrimination task
## The visual discrimination task
- Subjects were shown "target screen" A or B for 17 milliseconds
- Then shown blank screen for a variable length of time, the "interstimulus interval" (ISI)
- Then shown "mask screen" with random pattern for 17 milliseconds
- Asked if target screen included an L or a T and whether the slashes were vertical or horizontal
- Score on the task for a subject was the minimum interstimulus interval (ISI) required for the subject to achieve accurate results
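These slides end before the analysis, but the shuffling approach used for proportions carries over directly to means: shuffle the group labels, recompute the difference in mean ISI score, and repeat many times. A minimal sketch, using randomly generated placeholder scores for illustration only (these are NOT the values from the Stickgold et al. study, and the names `sleep`, `group`, and `score` are assumptions):

```{r}
library(tidyverse)
set.seed(130)
# placeholder ISI scores, for illustration only (NOT the study data)
sleep <- data_frame(group = c(rep("deprived", 11), rep("unrestricted", 10)),
                    score = c(rnorm(11, mean = 20, sd = 12), rnorm(10, mean = 10, sd = 12)))
# observed test statistic: difference in mean score between the groups
means <- sleep %>% group_by(group) %>% summarise(mean_score = mean(score))
test_stat <- as.numeric(means$mean_score[1] - means$mean_score[2])
repetitions <- 1000
simulated_stats <- rep(NA, repetitions)
for (i in 1:repetitions)
{
  sim <- sleep %>% mutate(group = sample(group))  # shuffle group labels
  sim_means <- sim %>% group_by(group) %>% summarise(mean_score = mean(score))
  simulated_stats[i] <- as.numeric(sim_means$mean_score[1] - sim_means$mean_score[2])
}
# two-sided P-value: proportion of shuffled differences at least as extreme
p_value <- sum(abs(simulated_stats) >= abs(test_stat)) / repetitions
p_value
```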