---
title: "Worksheet for Lab 10"
author: "NAME"
date: "`r Sys.Date()`"
output: html_document
---

## Introduction

With this worksheet, you can work on and run the chunks of code you need for different exercises.  Chunks of code appear between the three backquote marks and have a different color, like this:

```{r chunk_name}
2 + 3
```

## Preliminaries

Run this chunk first!  This chunk of code loads the packages and datasets you need for this activity.

```{r setup}
library(tidyverse)
library(infer)

lullaby_paired <- read_csv("https://raw.githubusercontent.com/gregcox7/StatLabs/main/data/lullaby_wide.csv")
lullaby_twogroups <- read_csv("https://raw.githubusercontent.com/gregcox7/StatLabs/main/data/lullaby_twogroups.csv")
hiring_study <- read_csv("https://raw.githubusercontent.com/gregcox7/StatLabs/main/data/hiring_study.csv")
```

Within the braces at the beginning of each chunk, the letter `r` says that the code is written in the "R" language and the text after the little `r` is a helpful label for the chunk.

To run a chunk of code, press the green arrow button in the upper right of the chunk.  The result of running the chunk will appear immediately below it.  You can run a chunk more than once.  If you need to change anything or get an error, you can always edit your chunk and try again.

## Exercise 10.1

Run the following chunk of code to examine the first few rows of the *original* infant data from last time:

```{r}
head(lullaby_paired)
```

Now look at the first few rows of the data in which we *pretend* that the infants were two different groups:

```{r}
head(lullaby_twogroups)
```

In your own words, describe the differences in how the data are structured between the original paired data and our pretend version with two groups.  *Hint:* Would we be able to easily create a new `diff` variable with the two-group data like we did with the original paired data?

## Exercise 10.2

Last time, we tested the null hypothesis that the difference in mean looking times was zero.  Another way to state that null hypothesis is that looking time and hearing people sing are *independent*.  Now we will use our pretend two-group data to test the same null hypothesis and see whether our conclusions would be different.

Fill in the blanks in the following chunk of code to use *random permutation* to simulated 1000 datasets assuming the null hypothesis is true.  At the end of the chunk, we visualize the sampling distribution of the *difference in means* between groups across each of those simulations, along with a red line indicating where the observed difference falls.  For guidance, check out how we did this with Kobe's data in Lab 5; the only important difference is the `stat` in the `calculate` line (remember that we are looking at differences in means, not proportions).  Also, remember that with the pretend two-group data we have both a response variable (`Looking`) and an explanatory variable (`Group`).

```{r}
lullaby_twogroups_obs_diff <- lullaby_twogroups %>%
    specify(___ ~ ___) %>%
    calculate(stat = "___", order = c("After", "Before"))

lullaby_twogroups_null_dist <- lullaby_twogroups %>%
    specify(___ ~ ___) %>%
    hypothesize(null = "___") %>%
    generate(reps = 1000, type = "___") %>%
    calculate(stat = "___", order = c("After", "Before"))

lullaby_twogroups_null_dist %>%
    visualize() +
    shade_p_value(obs_stat = lullaby_twogroups_obs_diff, direction = "two-sided")
```

a. Based on the visualization you just made, what would you conclude?  Would you reject the null hypothesis or not and why?
b. Run the following chunk of code to perform the same test using the original paired data (last time, we did this with a mathematical model instead of simulation, but the outcome is the same).  Would you come to the same conclusion with the original paired data as with our pretend two-group data?

```{r}
lullaby_paired_obs_diff <- lullaby_paired %>%
    mutate(diff = After - Before) %>%
    specify(response = diff) %>%
    calculate(stat = "mean")

lullaby_paired_null_dist <- lullaby_paired %>%
    mutate(diff = After - Before) %>%
    specify(response = diff) %>%
    hypothesize(null = "point", mu = 0) %>%
    generate(reps = 1000, type = "bootstrap") %>%
    calculate(stat = "mean")

lullaby_paired_null_dist %>%
    visualize() +
    shade_p_value(obs_stat = lullaby_paired_obs_diff, direction = "two-sided")
```

c. Why do you think the observed difference of means (the red line) is in the same position on the X axis in both graphs?
d. Why do you think the sampling distribution is wider in part [a] (with the pretend two-group data) than part [b] (with the original paired data)?

## Exercise 10.4

Fill in the blanks in the code below with the appropriate variable names to visualize the distribution of ratings in the two groups.  *Hint:* Keep in mind what are the names of the explanatory variable and response variable.

```{r eval=FALSE}
hiring_study %>%
    ggplot(aes(x = ___)) +
    geom_histogram(binwidth = 1) +
    facet_wrap("___", ncol = 1)
```

a. Describe the shape of the distributions and whether it seems like there may be a relationship between the type of pitch and hiring ratings.
b. Fill in the blanks in the code below to use random permutation to simulate *one* way the data would look if the null hypothesis were true.  Describe how, if at all, the distributions differ from how they looked in the original data in part (a).

```{r eval=FALSE}
hiring_null_simulation <- hiring_study %>%
    specify(___ ~ ___) %>%
    hypothesize(null = "___") %>%
    generate(reps = 1, type = "___")

hiring_null_simulation %>%
    ggplot(aes(x = ___)) +
    geom_histogram(binwidth = 1) +
    facet_wrap("___", ncol=1)
```

## Exercise 10.5

Fill in the blanks in the code below using the appropriate variable names to obtain numerical summaries of each group's hiring ratings.

```{r}
hiring_study %>%
    group_by(___) %>%
    summarize(sample_mean = mean(___), sample_sd = sd(___), sample_size = n())
```

Use the resulting table to answer the following questions.

a. Do the two groups have similar variability?  (Be sure to refer to the "rule of thumb" for checking whether two samples have similar variability.)
b. Do the sample means suggest an advantage for either spoken ("audio") or written ("transcript") hiring pitches?

## Exercise 10.6

Fill in the blanks in the code below (using the appropriate variable names) to find the observed difference in mean hiring ratings between conditions, then compare this against the differences we would expect *if* the null hypothesis were true.  We will build the sampling distribution of differences in means using random permutation.  Be sure to make at least 1000 simulations.  For the final blank, the `direction` can be either `less`, `greater`, or `two-sided`; pick the one corresponding to the alternative hypothesis.

```{r eval=FALSE}
hiring_obs_diff <- hiring_study %>%
    specify(___ ~ ___) %>%
    calculate(stat = "___", order = c("audio", "transcript"))

hiring_null_simulation <- hiring_study %>%
    specify(___ ~ ___) %>%
    hypothesize(null = "___") %>%
    generate(reps = ___, type = "___") %>%
    calculate(stat = "___", order = c("audio", "transcript"))

hiring_null_simulation %>%
    visualize() +
    shade_p_value(obs_stat = hiring_obs_diff, direction = "___")
```

Based on the visualization, does it seems like the observed difference would be unusual if the null hypothesis were true?

## Exercise 10.7

Fill in the blanks in the code below (using the appropriate variable names) to calculate the **T score** associated with our sample:

```{r eval=FALSE}
hiring_obs_t <- hiring_study %>%
    specify(___ ~ ___) %>%
    calculate(stat = "t", order = c("audio", "transcript"))

hiring_obs_t
```

a. Does the T score indicate that the observed difference is relatively near or far from what we would expect if the null hypothesis were true?  Explain your reasoning.
b. Above, we used code with the line `calculate(stat = "diff in means", order = c("audio", "transcript"))` to get the observed difference in sample means.  How does that line differ from the `calculate` line in the code you used in this Exercise to find the T score?  Describe in your own words what the `stat` option seems to do.

## Exercise 10.8

Fill in the blanks in the code below to visualize where our sample falls on the corresponding T distribution.  As above, be sure to fill in the last blank according to the alternative hypothesis (either `two-sided`, `less`, or `greater`).

```{r}
hiring_null_math <- hiring_study %>%
    specify(___ ~ ___) %>%
    assume("t")

hiring_null_math %>%
    visualize() +
    shade_p_value(obs_stat = hiring_obs_t, direction = "___")
```

a. Compare the visualization you just made using the mathematical model with the visualization you made earlier using simulation.  Relative to each null distribution, is our observed data (the red vertical line) in a similar place?  Is a similar proportion of the null distribution more extreme than what was observed?

b. Fill in the blanks in the code below to get the $p$ value according to the mathematical model (again, the `direction` should correspond to the alternative hypothesis).  Would you reject the null hypothesis or not and why?  (Be sure to state your significance level!)

```{r}
hiring_null_math %>%
    get_p_value(obs_stat = hiring_obs_t, direction = "___")
```

c. Based on this hypothesis test (either the simulation or mathematical approach will do), give a one sentence summary of what we should conclude about the effectiveness of spoken vs. written pitches on hiring ratings.
d. Speculate about why these results turned out the way they did.  Do you expect this result would generalize to hiring decisions in real life?  Why or why not?