18.6 Chapter 13: Hypothesis tests
- Do male pirates have significantly longer beards than female pirates? Test this by conducting the appropriate test on the relevant data in the pirates dataset.
beard.sex.htest <- t.test(formula = beard.length ~ sex,
                          subset = sex %in% c("male", "female"),
                          data = pirates)
beard.sex.htest
##
## Welch Two Sample t-test
##
## data: beard.length by sex
## t = -71, df = 500, p-value <2e-16
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 95 percent confidence interval:
## -20 -18
## sample estimates:
## mean in group female mean in group male
## 0.4 19.4
Answer: Yes, men have significantly longer beards than women, mean difference = 19.02, t(499.82) = -70.89, p < 0.01 (2-tailed)
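The printed output rounds heavily, so the values quoted in an answer like this one are easier to take from the test object itself. The following is a minimal sketch of that idea. It assumes a simulated data frame called `toy` (a hypothetical stand-in, not the pirates data), so the numbers it produces will not match the ones above.

```r
# Hypothetical stand-in for the pirates data (all values are simulated)
set.seed(1)
toy <- data.frame(
  sex = rep(c("female", "male"), each = 50),
  beard.length = c(rnorm(50, mean = 1, sd = 1),
                   rnorm(50, mean = 19, sd = 3))
)

toy.htest <- t.test(formula = beard.length ~ sex, data = toy)

# An htest object is a list; names() shows which elements it holds
names(toy.htest)

toy.htest$statistic   # t value
toy.htest$parameter   # degrees of freedom
toy.htest$p.value     # p-value
toy.htest$estimate    # the two group means

# Mean difference (male minus female), computed from the group means
mean.diff <- unname(toy.htest$estimate[2] - toy.htest$estimate[1])
round(mean.diff, 2)
```

The same element names should apply to `beard.sex.htest`, since it is also the result of `t.test()`.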
- Are pirates whose favorite Pixar movie is Up more or less likely to wear an eye patch than those whose favorite Pixar movie is Inside Out? Test this by conducting the appropriate test on the relevant data in the pirates dataset.
df <- subset(pirates, fav.pixar %in% c("Up", "Inside Out"))
pixar.ep.table <- table(df$fav.pixar, df$eyepatch)
pixar.ep.htest <- chisq.test(pixar.ep.table)
pixar.ep.htest
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: pixar.ep.table
## X-squared = 89, df = 1, p-value <2e-16
Answer: Yes, pirates whose favorite Pixar movie is Inside Out are much more likely to wear an eye patch than those whose favorite Pixar movie is Up, X²(1, N = 422) = 88.96, p < 0.01 (2-tailed)
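The chi-squared test only says that the two variables are related; the direction of the effect comes from the cell proportions. Here is a small sketch of how one might check that, using a made-up table `toy.pixar.table` (hypothetical counts, not the real pirates data).

```r
# Hypothetical counts, NOT the real pirates data
toy.pixar.table <- matrix(c(30, 70,
                            80, 20),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(fav.pixar = c("Inside Out", "Up"),
                                          eyepatch = c("0", "1")))

# Row proportions: within each movie group, the share without (0)
# and with (1) an eye patch
toy.pixar.props <- prop.table(toy.pixar.table, margin = 1)
toy.pixar.props

toy.pixar.htest <- chisq.test(toy.pixar.table)
toy.pixar.htest$statistic   # X-squared
toy.pixar.htest$parameter   # degrees of freedom
sum(toy.pixar.table)        # N for the write-up
```

Applying `prop.table(pixar.ep.table, margin = 1)` to the real table would show which group wears eye patches more often.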
- Do longer movies have significantly higher budgets than shorter movies? Answer this question by conducting the appropriate test in the movies dataset.
budget.time.htest <- cor.test(formula = ~ budget + time,
                              data = movies)
budget.time.htest
##
## Pearson's product-moment correlation
##
## data: budget and time
## t = 14, df = 2313, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.24 0.32
## sample estimates:
## cor
## 0.28
Answer: Yes, longer movies tend to have higher budgets than shorter movies, r = 0.28, t(2313) = 14.09, p < 0.01 (2-tailed)
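For a Pearson correlation, the t statistic and the correlation are tied together by t = r * sqrt(df) / sqrt(1 - r^2), which is why both can be reported from a single test. The sketch below checks this relationship on simulated data; the data frame `toy.movies.cor` and its values are assumptions for illustration only.

```r
# Hypothetical stand-in for the movies data (all values are simulated)
set.seed(4)
toy.time <- rnorm(200, mean = 100, sd = 15)
toy.movies.cor <- data.frame(
  time = toy.time,
  budget = 0.5 * toy.time + rnorm(200, mean = 0, sd = 20)
)

toy.cor.htest <- cor.test(formula = ~ budget + time, data = toy.movies.cor)

r <- unname(toy.cor.htest$estimate)     # correlation coefficient
df <- unname(toy.cor.htest$parameter)   # degrees of freedom (n - 2)

# Recompute the t statistic by hand and compare with the test's value
t.by.hand <- r * sqrt(df) / sqrt(1 - r^2)
all.equal(t.by.hand, unname(toy.cor.htest$statistic))
```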
- Do R rated movies earn significantly more money than PG-13 movies? Test this by conducting the appropriate test on the relevant data in the movies dataset.
revenue.rating.htest <- t.test(formula = revenue.all ~ rating,
                               subset = rating %in% c("R", "PG-13"),
                               data = movies)
revenue.rating.htest
##
## Welch Two Sample t-test
##
## data: revenue.all by rating
## t = 11, df = 1779, p-value <2e-16
## alternative hypothesis: true difference in means between group PG-13 and group R is not equal to 0
## 95 percent confidence interval:
## 56 82
## sample estimates:
## mean in group PG-13 mean in group R
## 148 80
Answer: No, R rated movies do not earn significantly more than PG-13 movies. In fact, PG-13 movies earn significantly more than R rated movies.
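Because the question is directional, one could also state the direction in the test itself through the `alternative` argument of `t.test()`. The sketch below is only an illustration on simulated data; `toy.movies` and its values are hypothetical and are not the movies dataset.

```r
# Hypothetical stand-in for the movies data (all values are simulated)
set.seed(2)
toy.movies <- data.frame(
  rating = rep(c("PG-13", "R"), each = 100),
  revenue.all = c(rnorm(100, mean = 150, sd = 40),
                  rnorm(100, mean = 80, sd = 40))
)

# Groups are ordered PG-13, then R, so the tested difference is
# (mean of PG-13) minus (mean of R).
# alternative = "less" therefore asks: is the R mean larger?
toy.onesided <- t.test(formula = revenue.all ~ rating,
                       data = toy.movies,
                       alternative = "less")

toy.onesided$estimate
toy.onesided$p.value
```

In this simulated example the PG-13 mean is the larger one, so the one-sided p-value is large and gives no support for the idea that R rated movies earn more.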
- Are certain movie genres significantly more common than others in the movies dataset?
genre.table <- table(movies$genre)
genre.htest <- chisq.test(genre.table)
genre.htest
##
## Chi-squared test for given probabilities
##
## data: genre.table
## X-squared = 6409, df = 13, p-value <2e-16
Answer: Yes, some movie genres are more common than others, X²(13, N = 4682) = 6408.91, p < 0.01 (2-tailed)
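When `chisq.test()` receives a one-dimensional table and no `p` argument, it tests against equal probabilities for every category. The sketch below makes that null hypothesis visible and recomputes the statistic by hand. The counts in `toy.genres` are made up for illustration.

```r
# Hypothetical genre counts, NOT the real movies data
toy.genres <- c(Action = 50, Comedy = 120, Drama = 130)

toy.gof <- chisq.test(toy.genres)

# Expected counts under the null: N / k = 300 / 3 = 100 per genre
toy.gof$expected

# The statistic by hand: sum of (observed - expected)^2 / expected
x2.by.hand <- sum((toy.genres - toy.gof$expected)^2 / toy.gof$expected)
x2.by.hand

toy.gof$statistic   # same value, as computed by chisq.test()
toy.gof$parameter   # degrees of freedom: k - 1 = 2
```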
- Do sequels and non-sequels differ in their genres?
genre.sequel.table <- table(movies$genre, movies$sequel)
genre.sequel.htest <- chisq.test(genre.sequel.table)
## Warning in chisq.test(genre.sequel.table): Chi-squared approximation may be incorrect
Answer: Yes, sequels are more likely in some genres than others.
Note: The message “Chi-squared approximation may be incorrect” is a warning, not an error. It appears because some cells have very few or no entries (R issues it when any expected cell count is below 5), which can make the test statistic unreliable. One workaround is to add a value of 20 to every element in the table as follows:
genre.sequel.table <- table(movies$genre, movies$sequel)
# Add 20 to each cell to correct for empty cells
genre.sequel.table <- genre.sequel.table + 20
# Here is the result
genre.sequel.table
##
##                           0    1
##   Action                550  178
##   Adventure             384  141
##   Black Comedy           54   20
##   Comedy               1078  172
##   Concert/Performance    34   20
##   Documentary            83   20
##   Drama                1077   46
##   Horror                235  105
##   Multiple Genres        21   20
##   Musical                92   25
##   Reality                22   20
##   Romantic Comedy       265   23
##   Thriller/Suspense     425   41
##   Western                57   21
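To see where the warning comes from, it can help to look at the expected counts that `chisq.test()` computes. The sketch below does this on a small made-up table, `sparse.table` (hypothetical counts, not the movies data). It also shows the `simulate.p.value` argument of `chisq.test()`, a different option from the add-20 workaround above, which estimates the p-value by Monte Carlo simulation instead of relying on the chi-squared approximation.

```r
# Hypothetical sparse table, NOT the real movies data
sparse.table <- matrix(c(500, 150,
                           2,   0,
                          60,   1),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(genre = c("Action", "Reality", "Western"),
                                       sequel = c("0", "1")))

# Expected counts under independence.
# suppressWarnings() hides the same warning discussed above.
sparse.expected <- suppressWarnings(chisq.test(sparse.table))$expected
sparse.expected

# TRUE here means at least one expected count is below 5
any(sparse.expected < 5)

# Monte Carlo p-value based on B simulated tables
set.seed(3)
sparse.sim <- chisq.test(sparse.table, simulate.p.value = TRUE, B = 2000)
sparse.sim$p.value
```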