These questions revisit the treadmill data set from Chapter 1. Researchers were interested in whether the run test variable could be used to replace the treadmill oxygen consumption variable, which is expensive to measure. The following code loads the data set and provides a scatterplot matrix using pairs.panels.
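library(readr)  # provides read_csv()
library(psych)  # provides pairs.panels()
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")
# Scatterplot matrix with pairwise correlations, without ellipses or smoothing lines
pairs.panels(treadmill, ellipses=FALSE, smooth=FALSE, col=0)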
6.1. First, we should get a sense of the strength of the correlation between the variable of primary interest, TreadMillOx, and the other variables and consider whether outliers or nonlinearity are going to be major issues here. Which variable is it most strongly correlated with? Which variables are next most strongly correlated with this variable?
6.2. Fit the SLR using RunTime as the explanatory variable for TreadMillOx. Report the estimated model.
6.3. Predict the treadmill oxygen value for a subject with a run time of 14 minutes. Repeat for a subject with a run time of 16 minutes. Is there something different about these two predictions?
6.4. Interpret the slope coefficient from the estimated model, remembering the units on the variables.
6.5. Report and interpret the \(y\)-intercept from the SLR.
6.6. Report and interpret the \(R^2\) value from the output. Show how you can find this value from the original correlation matrix result.
6.7. Produce the diagnostic plots and discuss any potential issues. What is the approximate leverage of the highest leverage observation and how large is its Cook’s D? What does that tell you about its potential influence in this model?
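As a starting point for questions 6.2 through 6.7, the following is a minimal sketch of the function calls that could be used, not a worked solution. It assumes the data load from the URL given above and that the columns are named TreadMillOx and RunTime as in the questions; the model name m1 is our own choice. It relies on the fact that in an SLR the \(R^2\) equals the squared correlation between the response and the explanatory variable, \(R^2 = r^2\).

```r
library(readr)

# Load the treadmill data from the URL given at the start of these problems
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")

# Fit the SLR with RunTime as the explanatory variable (m1 is a hypothetical name)
m1 <- lm(TreadMillOx ~ RunTime, data = treadmill)
summary(m1)  # coefficient table and R-squared

# Predicted TreadMillOx at run times of 14 and 16 minutes,
# along with the observed range of RunTime for comparison
predict(m1, newdata = data.frame(RunTime = c(14, 16)))
range(treadmill$RunTime)

# In SLR, R-squared is the squared correlation between response and predictor
cor(treadmill$TreadMillOx, treadmill$RunTime)^2
summary(m1)$r.squared

# Standard diagnostic plots in a 2 x 2 array
par(mfrow = c(2, 2))
plot(m1)

# Leverage (hat value) of the highest leverage observation and its Cook's D
max(hatvalues(m1))
cooks.distance(m1)[which.max(hatvalues(m1))]
```

The numerical results still need to be reported and interpreted in the context of the study; the code only shows where each quantity can be found.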