2.13 Practice problems

2.1. The tests for the overtake distance data were performed with two-sided alternatives, so two-sided areas were used to find the p-values. Suppose that the researchers expected that the average passing distance would be less (closer) for the commute clothing than for the casual clothing group. Recompute the permutation-based p-value for this one-sided test using either the full or the smaller sample data set. Hint: Your p-value should be just about half of what it was before and in the direction of the alternative.
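The relationship between the one-sided and two-sided permutation p-values can be sketched in base R as follows. This is only an illustration under stated assumptions: the two small samples and the names dist_cm, outfit, and diff_means are hypothetical stand-ins, not the overtake data, and sample() is used to shuffle the group labels in place of any helper function you may have used earlier.

```r
set.seed(2024)  # make the shuffles reproducible

# Hypothetical passing distances (NOT the overtake data), 8 per group
dist_cm <- c(100, 110, 95, 120, 105, 115, 98, 108,    # "commute"
             125, 130, 118, 140, 122, 135, 128, 112)  # "casual"
outfit <- factor(rep(c("commute", "casual"), each = 8))

# Statistic: mean(commute) - mean(casual); negative values support H_A
diff_means <- function(y, g) mean(y[g == "commute"]) - mean(y[g == "casual"])
t_obs <- diff_means(dist_cm, outfit)

# Permutation distribution: shuffle the group labels B times
B <- 1000
t_star <- replicate(B, diff_means(dist_cm, sample(outfit)))

# One-sided p-value for H_A: mu_commute < mu_casual (lower tail only)
p_lower <- mean(t_star <= t_obs)

# Two-sided p-value for comparison (both tails)
p_two <- mean(abs(t_star) >= abs(t_obs))

c(observed = t_obs, one_sided = p_lower, two_sided = p_two)
```

The tail used for the one-sided p-value depends on the order of subtraction in the statistic, so check which group is subtracted from which before choosing the lower or upper tail.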

Load the HELPrct data set from the mosaicData package (Pruim, Kaplan, and Horton 2018) (you need to install the mosaicData package once to be able to load it). The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomly assigned to receive either a multidisciplinary assessment and a brief motivational intervention or usual care, and various outcomes were observed. Two of the variables in the data set are sex, a factor with levels male and female, and daysanysub, which is the time (in days) to first use of any substance post-detox. We are interested in the difference in the mean number of days to first use of any substance post-detox between males and females. There are some missing responses. The following code produces favstats output for the data with the missing values still present and then for a data set from which the na.omit function has removed any observations with missing values.

library(mosaic) #Provides favstats
library(mosaicData)
data(HELPrct)
HELPrct2 <- HELPrct[, c("daysanysub", "sex")] #Just focus on two variables
HELPrct3 <- na.omit(HELPrct2) #Removes subjects with missing values
favstats(daysanysub~sex, data=HELPrct2)
favstats(daysanysub~sex, data=HELPrct3)

2.2. Based on the results provided, how many observations were missing for males and for females? Missing values here likely mean that the subjects didn’t use any substances post-detox during the study but might have at a later date; the study just didn’t run long enough. This is called censoring. What is the problem with the numerical summaries here if the missing responses were all larger than the largest observation?

2.3. Make a pirate-plot and a boxplot of daysanysub ~ sex using the HELPrct3 data set created above. Compare the distributions and recommend either parametric or nonparametric inferences.

2.4. Generate the permutation results and write out the 6+ steps of the hypothesis test.

2.5. Interpret the p-value for these results.

2.6. Generate the parametric test results using lm. Report the test statistic and its distribution under the null hypothesis, and compare the p-value to the one obtained using the permutation approach.
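As a hedged sketch of where these quantities sit in lm output, the following uses a small hypothetical data set (the names days and grp and all values are made up, not HELPrct). With a two-level factor, the t-statistic for the slope coefficient has n - 2 degrees of freedom under the null hypothesis and matches the equal-variance two-sample t-test up to sign.

```r
# Hypothetical responses (NOT HELPrct), 8 per group
days <- c(100, 110, 95, 120, 105, 115, 98, 108,
          125, 130, 118, 140, 122, 135, 128, 112)
grp <- factor(rep(c("A", "B"), each = 8))

# Linear model with a two-level factor: the "grpB" coefficient
# estimates mean(B) - mean(A)
fit <- lm(days ~ grp)
coefs <- summary(fit)$coefficients

t_lm  <- coefs["grpB", "t value"]    # test statistic
p_lm  <- coefs["grpB", "Pr(>|t|)"]   # two-sided p-value
df_lm <- fit$df.residual             # n - 2 degrees of freedom

# Equal-variance two-sample t-test for comparison; it computes
# mean(A) - mean(B), so its statistic has the opposite sign
tt <- t.test(days ~ grp, var.equal = TRUE)

c(t_lm = t_lm, df = df_lm, p_lm = p_lm, p_ttest = tt$p.value)
```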

2.7. Make and interpret a 95% bootstrap confidence interval for the difference in the means.
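One way a percentile bootstrap interval for a difference in means might be computed in base R is sketched below. The data and the names days, grp, and boot_diff are hypothetical, not HELPrct, and resampling with replacement within each group is an assumption made for this illustration; resampling whole rows of the data set is another common choice.

```r
set.seed(406)  # make the resampling reproducible

# Hypothetical responses (NOT HELPrct), 8 per group
days <- c(100, 110, 95, 120, 105, 115, 98, 108,
          125, 130, 118, 140, 122, 135, 128, 112)
grp <- factor(rep(c("A", "B"), each = 8))

# One bootstrap replicate: resample each group with replacement,
# then compute mean(B) - mean(A)
boot_diff <- function() {
  a <- sample(days[grp == "A"], replace = TRUE)
  b <- sample(days[grp == "B"], replace = TRUE)
  mean(b) - mean(a)
}

B <- 1000
t_boot <- replicate(B, boot_diff())

# 95% percentile interval: middle 95% of the bootstrap distribution
ci <- quantile(t_boot, c(0.025, 0.975))
ci
```

The interval is read on the scale of the statistic, here mean(B) - mean(A), so the interpretation should name both groups and the direction of the difference.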

References

Pruim, Randall, Daniel Kaplan, and Nicholas Horton. 2018. MosaicData: Project Mosaic Data Sets. https://CRAN.R-project.org/package=mosaicData.