The original research goal for the treadmill data set used for practice problems in the last two chapters was to replace the costly treadmill oxygen test with a cheap, easy-to-obtain running time measurement. However, quite a few other variables were measured when the run time was recorded, so maybe we can replace the treadmill test result with a combined prediction built from a few variables using the MLR techniques. The following code will get us restarted in this situation.
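library(readr) # Provides read_csv
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")
# SLR from the previous chapters: treadmill oxygen predicted by run time only
tm1 <- lm(TreadMillOx ~ RunTime, data = treadmill)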
8.1. Fit the MLR that also includes the running pulse (RunPulse), the resting pulse (RestPulse), body weight (BodyWeight), and Age (Age) of the subjects. Report and interpret the \(R^2\) for this model.
8.2. Compare the \(R^2\) and the adjusted \(R^2\) to the results for the SLR model that just had RunTime in the model. What do these results suggest?
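To check the mechanics of 8.1 and 8.2 before working with the real data, a minimal sketch along the following lines may help. It uses simulated stand-in data, not the treadmill measurements: the sample size, the coefficients used to generate the response, and the helper name fit_stats are all assumptions made for illustration, and only the variable names mirror the text.

```r
# Simulated stand-in data (NOT the treadmill measurements); only the
# variable names mirror the text. Sample size and coefficients are made up.
set.seed(217)
n <- 40
sim <- data.frame(RunTime  = rnorm(n, mean = 10.6, sd = 1.4),
                  RunPulse = rnorm(n, mean = 170, sd = 10),
                  Age      = round(runif(n, min = 38, max = 57)))
sim$TreadMillOx <- 85 - 3 * sim$RunTime - 0.05 * sim$RunPulse + rnorm(n, sd = 2.5)

slr <- lm(TreadMillOx ~ RunTime, data = sim)
mlr <- lm(TreadMillOx ~ RunTime + RunPulse + Age, data = sim)

# Hypothetical helper: pull R-squared and adjusted R-squared from a fitted lm
fit_stats <- function(model) {
  s <- summary(model)
  c(R2 = s$r.squared, adjR2 = s$adj.r.squared)
}

# One row per model so the two measures can be compared side by side
comparison <- rbind(SLR = fit_stats(slr), MLR = fit_stats(mlr))
print(round(comparison, 3))
```

Because the SLR is nested in the MLR, the \(R^2\) cannot go down when predictors are added, whereas the adjusted \(R^2\) can move in either direction.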
8.3. Interpret the estimated RunTime slope coefficients from the SLR model and this MLR model. Explain the differences in the estimates.
8.4. Find the VIFs for this model and discuss whether there is an issue with multicollinearity noted in these results.
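As a reminder of what a VIF measures, the sketch below computes each one by hand as \(1/(1-R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. It uses simulated stand-in data in which RunPulse is deliberately built from RestPulse so that those two predictors share information; the data and the name vif_by_hand are assumptions for illustration. For a model with only quantitative predictors, the vif function from the car package should report the same values.

```r
# Simulated stand-in data (NOT the treadmill measurements).
# RunPulse is generated from RestPulse so the two are strongly correlated.
set.seed(1)
n <- 40
sim <- data.frame(RunTime   = rnorm(n, mean = 10.6, sd = 1.4),
                  RestPulse = rnorm(n, mean = 55, sd = 7))
sim$RunPulse <- 110 + sim$RestPulse + rnorm(n, sd = 3)

predictors <- c("RunTime", "RunPulse", "RestPulse")

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_by_hand <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = sim))$r.squared
  1 / (1 - r2)
})
print(round(vif_by_hand, 2))
```

The response variable does not enter the calculation at all, which is why VIFs describe the predictors rather than the quality of the fit.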
8.5. Report the value for the overall \(F\)-test for the MLR model and interpret the result.
8.6. Drop the variable with the largest p-value in the MLR model and re-fit it. Compare the resulting \(R^2\) and adjusted \(R^2\) values to the others found previously.
8.7. Use the dredge function as follows to consider some other potential reduced models and report the top two models according to adjusted \(R^2\) values. What model had the highest \(R^2\)? Also discuss and compare the model selection results provided by the delta AICs here.
library(MuMIn)
options(na.action = "na.fail") # Must run this code once to use dredge
# Replace MODELNAMEFORFULLMODEL with the name of your fitted full model
dredge(MODELNAMEFORFULLMODEL, rank = "AIC",
       extra = c("R^2", adjRsq = function(x) summary(x)$adj.r.squared))
8.8. For one of the models, interpret the Age slope coefficient. Remember that only male subjects between 38 and 57 participated in this study. Discuss how this might have impacted the results found as compared to a more general population that could have been sampled from.
8.9. The following code creates a new three-level variable grouping the ages into low, middle, and high for those observed. The scatterplot lets you explore whether the relationship between treadmill oxygen and run time might differ across the age groups.
treadmill$Ageb <- cut(treadmill$Age, breaks=c(37,44.5,50.5,58))
summary(treadmill$Ageb)
library(car)
scatterplot(TreadMillOx ~ RunTime | Ageb, data = treadmill, smooth = FALSE, lwd = 2)
Based on the plot, do the lines look approximately parallel or not?
8.10. Fit the MLR that contains a RunTime by Ageb interaction; do not include any other variables. Compare the \(R^2\) and adjusted \(R^2\) results to previous models.
8.11. Find and report the results for the \(F\)-test that assesses evidence relative to the need for different slope coefficients.
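One way to obtain this kind of test is to compare the additive and interaction models with anova. The sketch below shows that approach on simulated stand-in data. The data-generating values and the model names additive_fit and inter_fit are assumptions for illustration; the age breaks are the ones used in 8.9.

```r
# Simulated stand-in data (NOT the treadmill measurements).
set.seed(2)
n <- 60
sim <- data.frame(RunTime = rnorm(n, mean = 10.6, sd = 1.4),
                  Age     = runif(n, min = 38, max = 57))
sim$Ageb <- cut(sim$Age, breaks = c(37, 44.5, 50.5, 58))
sim$TreadMillOx <- 85 - 3 * sim$RunTime - 0.2 * sim$Age + rnorm(n, sd = 2.5)

# Common slope, different intercepts across the age groups
additive_fit <- lm(TreadMillOx ~ RunTime + Ageb, data = sim)
# Different slopes and different intercepts across the age groups
inter_fit <- lm(TreadMillOx ~ RunTime * Ageb, data = sim)

# Nested-model F-test: do the two extra slope terms improve the fit?
f_test <- anova(additive_fit, inter_fit)
print(f_test)
```

With three age groups, the interaction adds two slope-adjustment coefficients, so the test has 2 numerator degrees of freedom.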
8.12. Write out the overall estimated model. What level was R using as baseline? Write out the simplified model for two of the age levels.
8.13. Fit the additive model with RunTime and Ageb and predict the mean treadmill oxygen values for subjects with run times of 11 minutes in each of the three Ageb groups.
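The mechanics of predicting at one run time for every level of a factor can be sketched with predict and a small newdata data frame. The example below uses simulated stand-in data, so the predicted values themselves mean nothing for the treadmill study; the names additive_fit and new_subjects are assumptions for illustration.

```r
# Simulated stand-in data (NOT the treadmill measurements).
set.seed(3)
n <- 60
sim <- data.frame(RunTime = rnorm(n, mean = 10.6, sd = 1.4),
                  Age     = runif(n, min = 38, max = 57))
sim$Ageb <- cut(sim$Age, breaks = c(37, 44.5, 50.5, 58))
sim$TreadMillOx <- 85 - 3 * sim$RunTime - 0.2 * sim$Age + rnorm(n, sd = 2.5)

additive_fit <- lm(TreadMillOx ~ RunTime + Ageb, data = sim)

# One row per age group, all at a run time of 11 minutes.
# The factor must carry the same levels as the one used to fit the model.
new_subjects <- data.frame(
  RunTime = 11,
  Ageb    = factor(levels(sim$Ageb), levels = levels(sim$Ageb))
)
new_subjects$fit <- predict(additive_fit, newdata = new_subjects)
print(new_subjects)
```

In an additive model, the gap between any two of these predictions equals the difference in the corresponding Ageb coefficients, regardless of the run time chosen.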
8.14. Find the \(F\)-test results for the binned age variable in the additive model. Report and interpret those results.