2.1. The tests for the overtake distance data were performed with two-sided alternatives, so two-sided areas were used to find the p-values. Suppose that the researchers expected that the average passing distance would be less (closer) for the commute clothing group than for the casual clothing group. Recalculate the permutation-based p-value for the one-sided test using either the full or the smaller sample data set. Hint: Your p-value should be just about half of what it was before and in the direction of the alternative.
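As a rough illustration of how a one-sided permutation p-value differs from a two-sided one, the following sketch uses simulated data rather than the overtake data. The group means, standard deviation, sample sizes, and the helper diff_means are all hypothetical, and base R's sample is used here to permute the group labels.

```r
# Hypothetical simulated data (NOT the overtake data): two groups of 30
set.seed(123)
n <- 30
group <- factor(rep(c("casual", "commute"), each = n))
distance <- c(rnorm(n, mean = 117, sd = 30),   # made-up "casual" values
              rnorm(n, mean = 110, sd = 30))   # made-up "commute" values

# Hypothetical helper: difference in means, commute minus casual
diff_means <- function(y, g) mean(y[g == "commute"]) - mean(y[g == "casual"])
Tobs <- diff_means(distance, group)

# Permutation distribution: shuffle the group labels B times
B <- 1000
Tstar <- replicate(B, diff_means(distance, sample(group)))

# One-sided p-value for H_A: mu_commute < mu_casual (lower tail only)
p_lower <- mean(Tstar <= Tobs)

# Two-sided p-value for comparison (both tails)
p_two <- mean(abs(Tstar) >= abs(Tobs))

c(Tobs = Tobs, p_lower = p_lower, p_two = p_two)
```

The only change between the two p-values is which permuted statistics count as "as or more extreme": the one-sided version looks in the single tail named by the alternative.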
Load the HELPrct data set from the mosaicData package (Pruim, Kaplan, and Horton 2018) (you need to install the mosaicData package once to be able to load it). The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomly assigned to receive either a multidisciplinary assessment and a brief motivational intervention or usual care, and various outcomes were observed. Two of the variables in the data set are sex, a factor with levels male and female, and daysanysub, which is the time (in days) to first use of any substance post-detox. We are interested in the difference in mean number of days to first use of any substance post-detox between males and females. There are some missing responses; the following code produces favstats with the missing values included and then applies the na.omit function to create a data set in which any observations with missing values are removed.
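library(mosaic) # Provides favstats
library(mosaicData)
data(HELPrct)
HELPrct2 <- HELPrct[, c("daysanysub", "sex")] # Just focus on two variables
HELPrct3 <- na.omit(HELPrct2) # Removes subjects with missing values
favstats(daysanysub ~ sex, data = HELPrct2) # Summaries with missing values counted
favstats(daysanysub ~ sex, data = HELPrct3) # Summaries after removing missing values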
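To see what na.omit does on a small scale, here is a minimal sketch with a made-up data frame (toy, days, and grp are hypothetical names, not part of HELPrct). It also shows one base R way to count missing responses per group.

```r
# Hypothetical toy data: two groups with one missing response each
toy <- data.frame(
  days = c(12, NA, 30, 45, NA, 7),
  grp  = factor(c("a", "a", "a", "b", "b", "b"))
)

# na.omit drops every row that contains at least one NA
toy_complete <- na.omit(toy)

# Count missing responses within each group
n_missing <- tapply(is.na(toy$days), toy$grp, sum)

nrow(toy)           # rows before removing missing values
nrow(toy_complete)  # rows after removing missing values
n_missing
```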
2.2. Based on the results provided, how many observations were missing for males and females? Missing values here likely mean that the subjects didn’t use any substances post-detox in the time of the study but might have at a later date – the study just didn’t run long enough. This is called censoring. What is the problem with the numerical summaries here if the missing responses were all something larger than the largest observation?
2.3. Make a pirate-plot and a boxplot of daysanysub ~ sex using the HELPrct3 data set created above. Compare the distributions, recommending parametric or nonparametric inferences.
2.4. Generate the permutation results and write out the 6+ steps of the hypothesis test.
2.5. Interpret the p-value for these results.
2.6. Generate the parametric test results using lm, report the test statistic and its distribution under the null hypothesis, and compare the p-value to the one obtained using the permutation approach.
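The pieces requested from lm can be pulled out of the model summary. The sketch below uses simulated data (the data frame sim, its variables, and the exponential distributions are hypothetical, not the HELPrct3 values) and checks that the reported p-value matches a two-sided area from a t-distribution with n1 + n2 - 2 degrees of freedom.

```r
# Hypothetical simulated data: 40 observations per group
set.seed(42)
n <- 40
sim <- data.frame(
  days = c(rexp(n, rate = 1 / 80), rexp(n, rate = 1 / 70)),  # made-up values
  sex  = factor(rep(c("female", "male"), each = n))
)

# Two-group linear model; "female" is the baseline level
lm1 <- lm(days ~ sex, data = sim)
coefs <- summary(lm1)$coefficients

# Test statistic and p-value for the difference in means (male - female)
t_stat <- coefs["sexmale", "t value"]
p_val  <- coefs["sexmale", "Pr(>|t|)"]

# Under the null hypothesis, t_stat follows a t-distribution with
# n1 + n2 - 2 degrees of freedom (assuming the model conditions hold)
df_resid <- lm1$df.residual

# Recompute the two-sided p-value directly from that t-distribution
p_check <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

c(t = t_stat, df = df_resid, p = p_val, p_check = p_check)
```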
2.7. Make and interpret a 95% bootstrap confidence interval for the difference in the means.
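One way to build a percentile-based bootstrap interval for a difference in means is sketched below with simulated data (the vectors female and male and their distributions are hypothetical). This version resamples with replacement within each group using base R's sample, which is an assumption of the sketch; resampling whole rows of the data set is another common choice.

```r
# Hypothetical simulated data: 40 observations per group
set.seed(2024)
n <- 40
female <- rexp(n, rate = 1 / 80)  # made-up values
male   <- rexp(n, rate = 1 / 70)  # made-up values

# Observed difference in means (male - female)
Tobs <- mean(male) - mean(female)

# Bootstrap distribution: resample each group with replacement B times
B <- 1000
Tstar <- replicate(
  B,
  mean(sample(male, replace = TRUE)) - mean(sample(female, replace = TRUE))
)

# 95% percentile interval: middle 95% of the bootstrap distribution
ci <- unname(quantile(Tstar, c(0.025, 0.975)))

c(Tobs = Tobs, lower = ci[1], upper = ci[2])
```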
Pruim, Randall, Daniel Kaplan, and Nicholas Horton. 2018. MosaicData: Project Mosaic Data Sets. https://CRAN.R-project.org/package=mosaicData.