---
title: "Econ 425T Homework 2"
subtitle: "Due Feb 3, 2023 @ 11:59PM"
author: "YOUR NAME and UID"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format:
  html:
    theme: cosmo
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
engine: knitr
knitr:
  opts_chunk: 
    fig.align: 'center'
    # fig.width: 6
    # fig.height: 4
    message: FALSE
    cache: false
---

## Least squares is MLE (10pts)

Show that in the case of linear model with Gaussian errors, maximum likelihood and least squares are the same thing, and $C_p$ and AIC are equivalent.

## ISL Exercise 6.6.1 (10pts)

## ISL Exercise 6.6.3 (10pts)

## ISL Exercise 6.6.4 (10pts)

## ISL Exercise 6.6.5 (10pts)

## ISL Exercise 6.6.11 (30pts)

You must follow the [typical machine learning paradigm](https://ucla-econ-425t.github.io/2023winter/slides/06-modelselection/workflow_lasso.html) to compare _at least_ 3 methods: least squares, lasso, and ridge. Report final results as

| Method | CV RMSE | Test RMSE |
|:------:|:------:|:------:|:------:|
| LS | | | |
| Ridge | | | |
| Lasso | | | |
| ... | | | |

## ISL Exercise 5.4.2 (10pts)

## ISL Exercise 5.4.9 (20pts)

## Bonus question (20pts)

Consider a linear regression, fit by least squares to a set of training data $(x_1, y_1), \ldots, (x_N,  y_N)$ drawn at random from a population. Let $\hat \beta$ be the least squares estimate. Suppose we have some test data $(\tilde{x}_1, \tilde{y}_1), \ldots, (\tilde{x}_M, \tilde{y}_M)$ drawn at random from the same population as the training data. If $R_{\text{train}}(\beta) = \frac{1}{N} \sum_{i=1}^N (y_i - \beta^T x_i)^2$ and $R_{\text{test}}(\beta) = \frac{1}{M} \sum_{i=1}^M (\tilde{y}_i - \beta^T \tilde{x}_i)^2$. Show that
$$
\operatorname{E}[R_{\text{train}}(\hat{\beta})] < \operatorname{E}[R_{\text{test}}(\hat{\beta})].
$$