# Multiple linear regression

Let's now investigate **multiple linear regression** (MLR), where one or several outputs depend on more than one input variable:

$$
\begin{cases}
y_1 = w_1 \, x_1 + w_2 \, x_2 + b_1\\
\\
y_2 = w_3 \, x_1 + w_4 \, x_2 + b_2\\
\end{cases}
$$
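The system above can be written compactly in matrix form as $\mathbf{y} = W \mathbf{x} + \mathbf{b}$, where row $k$ of $W$ holds the weights of output $y_k$. The NumPy sketch below uses arbitrary, made-up weights (not fitted to any data) and checks that the matrix product reproduces the two scalar equations for a small batch of inputs.

```python
import numpy as np

# Illustrative (made-up) parameters: 2 inputs -> 2 outputs
W = np.array([[0.5, -1.0],   # w1, w2 (weights of y1)
              [2.0,  0.3]])  # w3, w4 (weights of y2)
b = np.array([0.1, -0.2])    # b1, b2

# A batch of 4 samples, one row per sample; columns are x1, x2
X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0],
              [2.0, 2.0]])

# Matrix form: each row of Y holds (y1, y2) for one sample
Y = X @ W.T + b

# The same outputs written equation by equation
y1 = W[0, 0] * X[:, 0] + W[0, 1] * X[:, 1] + b[0]
y2 = W[1, 0] * X[:, 0] + W[1, 1] * X[:, 1] + b[1]

print(Y)
print(np.allclose(Y[:, 0], y1), np.allclose(Y[:, 1], y2))
```

Storing samples as rows (shape `(N, d)`) matches the convention used by `scikit-learn`, which is why the product is written `X @ W.T`.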

The California Housing Dataset contains the median house price in various districts (census block groups) of California. Alongside the price for over 20,000 block groups, the dataset provides 8 features:

- `MedInc` : median income in block group
- `HouseAge` : median house age in block group
- `AveRooms` : average number of rooms per household
- `AveBedrms` : average number of bedrooms per household
- `Population` : block group population
- `AveOccup` : average number of household members
- `Latitude` : block group latitude
- `Longitude` : block group longitude

The California housing dataset can be directly downloaded from scikit-learn:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import fetch_california_housing

dataset = fetch_california_housing()

X = dataset.data
t = dataset.target

print(X.shape)
print(t.shape)

There are 20640 samples with 8 input features and one output (the median house price, expressed in hundreds of thousands of dollars). The following cell describes the dataset:

In [None]:
print(dataset.DESCR)

The following cell lets you visualize how each feature individually influences the price:

In [None]:
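plt.figure(figsize=(12, 15))

# One scatter plot per feature: feature value (x-axis) vs. price (y-axis)
for i in range(8):
    plt.subplot(4, 2, i + 1)
    plt.scatter(X[:, i], t)
    plt.title(dataset.feature_names[i])
    plt.ylabel("price")
plt.tight_layout()
plt.show()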

## Linear regression

**Q:** Apply MLR to the California data using the same `LinearRegression` method of `scikit-learn` as last time. Print the mean squared error (MSE), plot the predicted price as a function of each feature, and plot the prediction $y$ against the true value $t$ for each sample as in the last exercise. Does it work well?

You will also plot the weights of the model (`reg.coef_`) and draw conclusions about the relative importance of the different features: which feature has the strongest weight, and why?
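As a reminder of the `LinearRegression` interface, here is a minimal sketch on synthetic data (the two features, their scales and the true weights are invented for illustration; this is not the California data). It fits the model, computes the MSE both by hand and with `sklearn.metrics.mean_squared_error`, and reads the learned weights from `reg.coef_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
N = 500

# Two synthetic features on very different scales (illustrative assumption)
x_small = rng.uniform(0, 1, N)      # values in [0, 1]
x_large = rng.uniform(0, 1000, N)   # values in [0, 1000]
X = np.column_stack([x_small, x_large])

# Synthetic target: each feature contributes a range of about 0..2 to t
t = 2.0 * x_small + 0.002 * x_large + 0.1 * rng.normal(size=N)

reg = LinearRegression()
reg.fit(X, t)
y = reg.predict(X)

# MSE computed by hand and with scikit-learn (they should agree)
mse_manual = np.mean((t - y) ** 2)
mse_sklearn = mean_squared_error(t, y)

print("MSE:", mse_sklearn)
print("Weights:", reg.coef_, "Intercept:", reg.intercept_)
```

Although both features contribute equally to `t` here, the weight of `x_large` is about 1000 times smaller than that of `x_small`, purely because of its unit. Raw weight magnitudes therefore reflect the scale of each feature as much as its importance.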

A good practice in machine learning is to **normalize** the inputs, i.e. to make sure that the features have a mean of 0 and a standard deviation of 1. The formula is:

$$X^\text{normalized} = \dfrac{X - \mathbb{E}[X]}{\text{std}(X)}$$

i.e. you compute the mean and standard deviation of each column of `X` and apply the formula to each column.
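With NumPy broadcasting, the column-wise formula is a single expression. The sketch below applies it to a synthetic matrix (columns with arbitrary, made-up means and spreads) and checks the resulting statistics. `sklearn.preprocessing.StandardScaler` applies the same formula, using the same population standard deviation as `np.std`'s default.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: 1000 samples, 3 columns with very different means and spreads
X = rng.normal(loc=[5.0, -3.0, 1000.0], scale=[2.0, 0.5, 300.0], size=(1000, 3))

# Column-wise statistics: axis=0 gives one value per feature
mean = X.mean(axis=0)
std = X.std(axis=0)

# Broadcasting subtracts/divides each column by its own statistic
X_normalized = (X - mean) / std

print(X_normalized.mean(axis=0))  # close to 0 for every column
print(X_normalized.std(axis=0))   # close to 1 for every column
```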

**Q:** Normalize the dataset. Make sure that the new mean and std are correct.

*Tip:* `X.mean(axis=0)` and `X.std(axis=0)` should be useful.

**Q:** Apply MLR again on $X^\text{normalized}$, print the MSE and visualize the weights. What has changed?
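Normalization is an affine change of variables, so for unregularized least squares it should leave the predictions (and the MSE) unchanged and only re-express the weights. Substituting $x_j = \text{std}(x_j)\, z_j + \mathbb{E}[x_j]$ into $y = \sum_j w_j x_j + b$ shows that the weight on the normalized feature $z_j$ is $w_j \, \text{std}(x_j)$. The sketch below checks this on synthetic data (invented for illustration, not the California data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
N = 300
# Two synthetic features with different means and spreads
X = rng.normal(loc=[0.0, 50.0], scale=[1.0, 20.0], size=(N, 2))
t = 1.5 * X[:, 0] - 0.05 * X[:, 1] + 0.2 * rng.normal(size=N)

# Column-wise normalization
mean, std = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean) / std

reg_raw = LinearRegression().fit(X, t)
reg_norm = LinearRegression().fit(X_norm, t)

mse_raw = mean_squared_error(t, reg_raw.predict(X))
mse_norm = mean_squared_error(t, reg_norm.predict(X_norm))

print("MSE raw vs normalized:", mse_raw, mse_norm)
print("Raw weights * std:    ", reg_raw.coef_ * std)
print("Normalized weights:   ", reg_norm.coef_)
print("Normalized intercept: ", reg_norm.intercept_, "mean of t:", t.mean())
```

After normalization, each weight measures the effect of a one-standard-deviation change in its feature, which is what makes the magnitudes comparable across features. Because the normalized features are centered, the intercept becomes the mean of `t`.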

## Regularized regression

Now it is time to investigate **regularization**:
1. MLR with L2 regularization is called **Ridge regression**
2. MLR with L1 regularization is called **Lasso regression** 

Fortunately, `scikit-learn` provides these methods with a similar interface to `LinearRegression`. The `Ridge` and `Lasso` objects take an additional argument `alpha` which represents the regularization parameter:

```python
reg = Ridge(alpha=1.0)
reg = Lasso(alpha=1.0)
```
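According to the `scikit-learn` documentation, `Ridge` minimizes $\|t - Xw\|_2^2 + \alpha \|w\|_2^2$ while `Lasso` minimizes $\frac{1}{2N}\|t - Xw\|_2^2 + \alpha \|w\|_1$, so the same `alpha` value does not correspond to the same penalty strength in both models. Ridge also has a closed-form solution, $w = (X^\top X + \alpha I)^{-1} X^\top t$ on centered data. The sketch below compares this formula with `Ridge` on synthetic data, assuming the default `fit_intercept=True`, under which the intercept is not penalized and the data are centered internally.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
N, d = 200, 4
# Synthetic data with an intercept of 3 (illustrative values)
X = rng.normal(size=(N, d))
t = X @ np.array([1.0, -2.0, 0.0, 0.5]) + 3.0 + 0.3 * rng.normal(size=N)

alpha = 10.0

# Center inputs and target so that the intercept is left unpenalized
X_c = X - X.mean(axis=0)
t_c = t - t.mean()

# Closed-form ridge solution: w = (X^T X + alpha I)^{-1} X^T t
w_closed = np.linalg.solve(X_c.T @ X_c + alpha * np.eye(d), X_c.T @ t_c)
# Recover the intercept from the means
b_closed = t.mean() - X.mean(axis=0) @ w_closed

reg = Ridge(alpha=alpha).fit(X, t)
print("closed form:", w_closed, b_closed)
print("Ridge:      ", reg.coef_, reg.intercept_)
```

The $\alpha I$ term keeps $X^\top X + \alpha I$ invertible even when features are strongly correlated, and it pulls all weights towards zero as $\alpha$ grows. Lasso has no such closed form in general and is fitted iteratively (coordinate descent in `scikit-learn`).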

In [None]:
from sklearn.linear_model import Ridge, Lasso

**Q:** Apply Ridge and Lasso regression to the normalized data, vary the regularization parameter to understand its effect, and comment on the results. In particular, vary the regularization parameter for Lasso and identify the features that are most predictive of the price. Does it make sense?
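To see what an `alpha` sweep typically reveals, here is a sketch on synthetic, already normalized data in which only features 0 and 1 influence the target and features 2 to 4 are pure noise (an illustrative setup, not the California data). For each `alpha`, it records which Lasso weights remain nonzero and the norm of the Ridge weight vector.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # normalized inputs
# Only features 0 and 1 are informative; 2, 3 and 4 are noise
t = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=N)

alphas = [0.001, 0.1, 0.5, 1.0, 10.0]
nonzero = {}     # indices of nonzero Lasso weights for each alpha
ridge_norm = {}  # Euclidean norm of the Ridge weights for each alpha

for alpha in alphas:
    lasso = Lasso(alpha=alpha).fit(X, t)
    ridge = Ridge(alpha=alpha).fit(X, t)
    nonzero[alpha] = np.flatnonzero(lasso.coef_)
    ridge_norm[alpha] = np.linalg.norm(ridge.coef_)
    print(f"alpha={alpha:<6} Lasso nonzero features: {nonzero[alpha]}  "
          f"Ridge |w| = {ridge_norm[alpha]:.3f}")
```

Increasing `alpha` makes Ridge shrink all weights smoothly without setting them exactly to zero, whereas Lasso drops the uninformative features first and, for a large enough `alpha`, sets every weight to zero. The features that survive the longest are the most predictive ones, which is the behavior to look for on the California data.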