---
title: "Econ 425T Homework 1"
subtitle: "Due Jan 25, 2023 @ 11:59PM"
author: "YOUR NAME and UID"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format:
  html:
    theme: cosmo
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
engine: knitr
knitr:
  opts_chunk: 
    fig.align: 'center'
    # fig.width: 6
    # fig.height: 4
    message: FALSE
    cache: false
---

## Filling gaps in lecture notes (10pts)

Consider the regression model
$$
Y = f(X) + \epsilon,
$$
where $\operatorname{E}(\epsilon) = 0$. 

### Optimal regression function

Show that the choice
$$
f_{\text{opt}}(X) = \operatorname{E}(Y | X)
$$
minimizes the mean squared prediction error
$$
\operatorname{E}[Y - f(X)]^2,
$$
where the expectations averages over variations in both $X$ and $Y$. (Hint: condition on $X$.)

### Bias-variance trade-off

Given an estimate $\hat f$ of $f$, show that the test error at a $x_0$ can be decomposed as
$$
\operatorname{E}[y_0 - \hat f(x_0)]^2 = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}},
$$
where the expectation averages over the variability in $y_0$ and $\hat f$.

## ISL Exercise 2.4.3 (10pts)

## ISL Exercise 2.4.4 (10pts)

## ISL Exercise 2.4.10 (30pts)

Your can read in the `boston` data set directly from url <https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv>. A documentation of the `boston` data set is [here](https://www.rdocumentation.org/packages/ISLR2/versions/1.3-2/topics/Boston).

::: {.panel-tabset}

#### Python

```{python}
import pandas as pd
import io
import requests

url = "https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv"
s = requests.get(url).content
Boston = pd.read_csv(io.StringIO(s.decode('utf-8')), index_col = 0)
Boston
```

#### R

```{r}
library(tidyverse)

Boston <- read_csv("https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv", col_select = -1) %>% 
  print(width = Inf)
```

:::

## ISL Exercise 3.7.3 (12pts)

## ISL Exercise 3.7.15 (20pts)

## Bonus question (20pts)

For multiple linear regression, show that $R^2$ is equal to the correlation between the response vector $\mathbf{y} = (y_1, \ldots, y_n)^T$ and the fitted values $\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T$. That is
$$
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2.
$$