--- title: "Short Lab 5" author: "INSERT YOUR NAME HERE" date: "Due Date Here" output: html_document --- <!--- Begin styling code. ---> <style type="text/css"> /* Whole document: */ body{ font-family: "Palatino Linotype", "Book Antiqua", Palatino, serif; font-size: 12pt; } h1.title { font-size: 38px; text-align: center; } h4.author { font-size: 18px; text-align: center; } h4.date { font-size: 18px; text-align: center; } </style> <!--- End styling code. ---> ## Part 1. Training and Test Error Use the following code to generate data: ```{r, message = FALSE} library(ggpubr) # generate data set.seed(302) n <- 30 x <- sort(runif(n, -3, 3)) y <- 2*x + 2*rnorm(n) x_test <- sort(runif(n, -3, 3)) y_test <- 2*x_test + 2*rnorm(n) df_train <- data.frame("x" = x, "y" = y) df_test <- data.frame("x" = x_test, "y" = y_test) # store a theme my_theme <- theme_bw(base_size = 16) + theme(plot.title = element_text(hjust = 0.5, face = "bold"), plot.subtitle = element_text(hjust = 0.5)) # generate plots g_train <- ggplot(df_train, aes(x = x, y = y)) + geom_point() + xlim(-3, 3) + ylim(min(y, y_test), max(y, y_test)) + labs(title = "Training Data") + my_theme g_test <- ggplot(df_test, aes(x = x, y = y)) + geom_point() + xlim(-3, 3) + ylim(min(y, y_test), max(y, y_test)) + labs(title = "Test Data") + my_theme ggarrange(g_train, g_test) # from ggpubr, to put side-by-side ``` **1a.** For every k in between 1 and 10, fit a degree-k polynomial linear regression model with `y` as the response and `x` as the explanatory variable(s). (*Hint: Use *`poly()`*, as in the lecture slides.*) **1b.** For each model from (a), record the training error. Then predict `y_test` using `x_test` and also record the test error. **1c.** Present the 10 values for both training error and test error on a single table. Comment on what you notice about the relative magnitudes of training and test error, as well as the trends in both types of error as $k$ increases. **1d.** If you were going to choose a model based on training error, which would you choose? Plot the data, colored by split. Add a line to the plot representing your selection for model fit. Add a subtitle to this plot with the (rounded!) test error. (*Hint: You can use as much of my code as you want for this and part (e). See Lecture Slides 8!*) **1e.** If you were going to choose a model based on test error, which would you choose? Plot the data, colored by split. Add a line to the plot representing your selection for model fit. Add a subtitle to this plot with the (rounded!) test error. **1f.** What do you notice about the shape of the curves from part (d) and (e)? Which model do you think has lower bias? Lower variance? Why?