---
title: "Lab 07"
output: html_document
---

Start by reading in all of the R packages that we will need for today:

```{r, message=FALSE}
library(dplyr)
library(ggplot2)
library(tmodels)
library(readr)
library(readxl)
```

## Graduate pay information by major

Start by reading in the following dataset, which describes the salaries of recent graduates based on their major:

```{r, message=FALSE}
pay <- read_csv("https://raw.githubusercontent.com/statsmaths/stat_data/gh-pages/grad_info.csv")
```

Open the dataset with the data viewer and make sure that you understand what all of the variables mean (if something is unclear, just ask!). The unit of observation here is a major.

Run a T-test that predicts the median pay based on whether a major is in the sciences:

```{r}

```

Is the p-value significant? Which group has the higher median pay? How would you describe the results of this study?

Run a Mann-Whitney test for median pay; how does the p-value compare to that of the T-test?

```{r}

```

Now, run a T-test that predicts the unemployment rate based on whether a major is in the sciences:

```{r}

```

Is the p-value significant? Which group has a lower unemployment rate? How would you describe the results of this study?

## Robustness

The `tail` function shows the last few rows of your dataset:

```{r}
tail(pay)
```

Library science majors do not make very much money. What if someone accidentally wrote down the pay of 22k per year as 22 million per year? We can change this one value with the following R code:

```{r}
# Overwrite only the last row's median pay.
# Assumes Library Science is the final row, as shown by tail(pay) above.
pay$median_pay[nrow(pay)] <- 22000000
```

Now re-run the T-test predicting the median pay based on whether a major is in the sciences:

```{r}

```

Is the test still significant? Now, re-run the Mann-Whitney test:

```{r}

```

Is this test still significant? How much does the test statistic change compared to the original dataset? Which of the two tests is more robust to one bad data point?
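
If you would like to explore the same question outside of the `pay` dataset, the following is a minimal sketch on simulated data. It uses base R's `t.test` and `wilcox.test` (the latter runs the Mann-Whitney test) rather than the `tmodels` functions used in this lab. The variable names `group`, `value`, and `value_bad`, along with all of the numbers, are hypothetical and chosen only for illustration.

```{r}
# Simulated data (hypothetical; not the graduate pay dataset)
set.seed(1)
group <- rep(c("science", "other"), each = 30)
value <- c(rnorm(30, mean = 50, sd = 5), rnorm(30, mean = 40, sd = 5))

# Run both tests on the clean data
t_clean <- t.test(value ~ group)
w_clean <- wilcox.test(value ~ group)

# Corrupt a single observation in the "other" group
value_bad <- value
value_bad[60] <- 22000

# Run both tests again on the corrupted data
t_bad <- t.test(value_bad ~ group)
w_bad <- wilcox.test(value_bad ~ group)

# Collect the p-values side by side for comparison
data.frame(
  test = c("T-test", "Mann-Whitney"),
  p_clean = c(t_clean$p.value, w_clean$p.value),
  p_bad = c(t_bad$p.value, w_bad$p.value)
)
```

Compare the `p_clean` and `p_bad` columns for each test to see how much a single extreme value moves each p-value, then check whether the same pattern holds for the `pay` data.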