6.14 Practice problems

These questions revisit the treadmill data set from Chapter 1. Researchers were interested in whether the run test variable could replace the treadmill oxygen consumption variable, which is expensive to measure. The following code loads the data set and produces a scatterplot matrix using pairs.panels from the psych package.

library(readr)  # provides read_csv()
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")
library(psych)  # provides pairs.panels()
pairs.panels(treadmill, ellipses = FALSE, smooth = FALSE, col = 0)

6.1. First, get a sense of the strength of the correlations between the variable of primary interest, TreadMillOx, and the other variables, and consider whether outliers or nonlinearity are going to be major issues here. Which variable is TreadMillOx most strongly correlated with? Which variables are the next most strongly correlated with it?

6.2. Fit the SLR using RunTime as the explanatory variable for TreadMillOx. Report the estimated model.

6.3. Predict the treadmill oxygen value for a subject with a run time of 14 minutes. Repeat for a subject with a run time of 16 minutes. Is there something different about these two predictions?
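
As a starting point for 6.2 and 6.3, the following is a minimal sketch of one way to fit the model and obtain the two predictions. It assumes the same URL and the column names TreadMillOx and RunTime used in this section; the object names m1 and new_runs are illustrative choices, not names from the text. The last line prints the observed range of run times, which is worth comparing with the two requested values.

```r
library(readr)  # provides read_csv()

# Same data set and URL as in the code at the start of this section
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")

# SLR with RunTime as the explanatory variable (m1 is an illustrative name)
m1 <- lm(TreadMillOx ~ RunTime, data = treadmill)
summary(m1)

# Predicted treadmill oxygen values at run times of 14 and 16 minutes
new_runs <- data.frame(RunTime = c(14, 16))
predict(m1, newdata = new_runs)

# Observed range of run times in the data set
range(treadmill$RunTime)
```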

6.4. Interpret the slope coefficient from the estimated model, remembering the units on the variables.

6.5. Report and interpret the \(y\)-intercept from the SLR.

6.6. Report and interpret the \(R^2\) value from the output. Show how you can find this value from the original correlation matrix result.
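
For 6.6, it may help to recall the general relationship that holds in any simple linear regression with one explanatory variable. Writing \(r\) for the correlation between RunTime and TreadMillOx (this symbol is a notational assumption here), the coefficient of determination is the square of that correlation:

\[
R^2 = r^2
\]

The sign of \(r\) matches the sign of the estimated slope, so \(R^2\) alone does not reveal the direction of the relationship.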

6.7. Produce the diagnostic plots and discuss any potential issues. What is the approximate leverage of the highest leverage observation and how large is its Cook’s D? What does that tell you about its potential influence in this model?
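
A possible setup for 6.7 is sketched below, again assuming the URL and column names used in this section; m1, lev, cooks, and top are illustrative names. It draws the four default diagnostic plots for an lm fit and then extracts the leverage and Cook's D of the highest leverage observation, so the values read from the plots can be checked numerically.

```r
library(readr)  # provides read_csv()

# Same data set and URL as in the code at the start of this section
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")

# SLR from 6.2 (m1 is an illustrative name)
m1 <- lm(TreadMillOx ~ RunTime, data = treadmill)

# Four default diagnostic plots arranged in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(m1)

# Leverage (hat values) and Cook's distance for every observation
lev <- hatvalues(m1)
cooks <- cooks.distance(m1)

# Row with the highest leverage, shown with its leverage and Cook's D
top <- which.max(lev)
data.frame(row = top, leverage = lev[top], cooks_d = cooks[top])
```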