<center>
<img src="../../img/ods_stickers.jpg" />
    
## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course

Author: [Yury Kashnitskiy](https://yorko.github.io). Translated by Anna Larionova. This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose.

# <center> Topic 6. Regression</center>
## <center>Lasso and Ridge Regressions</center>

*This week's lecture syllabus differs from the article outline: topic 4 (linear models) is too large and important to cover at once, so we devote this week to regression.*

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

%config InlineBackend.figure_format = 'retina'
import seaborn as sns

sns.set()  # just to use the seaborn theme


from sklearn.datasets import load_boston  # removed in scikit-learn 1.2: run this notebook with an older version
from sklearn.linear_model import Lasso, LassoCV, Ridge, RidgeCV
from sklearn.model_selection import KFold, cross_val_score

**We will work with the Boston house prices dataset (UCI repository).**
**Load the data.**

In [None]:
boston = load_boston()
X, y = boston["data"], boston["target"]

**Let's read the dataset description:**

In [None]:
print(boston.DESCR)

In [None]:
boston.feature_names

**Let's look at the first two records.**

In [None]:
X[:2]

## Lasso Regression

Lasso regression minimizes the mean squared error with L1 regularization (written here in the scaling used by scikit-learn's `Lasso`):
$$\Large error(X, y, w) = \frac{1}{2\ell} \sum_{i=1}^\ell {(y_i - w^Tx_i)}^2 + \alpha \sum_{i=1}^d |w_i|$$

where $w^Tx$ is the hyperplane equation (the model's prediction) depending on the model parameters $w$, $\ell$ is the number of observations in the data $X$, $d$ is the number of features, $y_i$ are the target values and $\alpha$ is the regularization coefficient.
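To make the objective concrete, the sketch below evaluates it numerically. It assumes scikit-learn's scaling for `Lasso`, where the squared-error term is multiplied by $\frac{1}{2\ell}$ and the intercept is fitted but not penalized; the helper `lasso_objective` and the synthetic data are illustrative assumptions, not part of the original notebook. At the fitted weights, nudging any single coefficient should not lower the objective.

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_objective(X, y, w, b, alpha):
    """Hypothetical helper: Lasso objective in scikit-learn's scaling,
    (1 / (2 * n_samples)) * ||y - Xw - b||^2 + alpha * ||w||_1 (intercept b is not penalized)."""
    residuals = y - X @ w - b
    return residuals @ residuals / (2 * len(y)) + alpha * np.abs(w).sum()


# Synthetic data (illustrative): two informative features, two pure-noise features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200) + 1.0

alpha = 0.1
# A tight tolerance so the solver result is essentially the exact minimizer
model = Lasso(alpha=alpha, tol=1e-10, max_iter=100_000).fit(X, y)
best = lasso_objective(X, y, model.coef_, model.intercept_, alpha)

# Move one coefficient at a time away from the solution and re-evaluate
perturbed = []
for j in range(X.shape[1]):
    for step in (-0.05, 0.05):
        w = model.coef_.copy()
        w[j] += step
        perturbed.append(lasso_objective(X, y, w, model.intercept_, alpha))

print(best <= min(perturbed))  # True
```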

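**Let's fit Lasso regression with a small coefficient $\alpha$ (weak regularization). The coefficient of the NOX feature (nitric oxides concentration) becomes zero, which suggests that, given the other features, NOX contributes little to predicting median house prices in this region. Keep in mind that the features are not standardized, so a zero weight also partly reflects the small numeric range of NOX.**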

In [None]:
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
lasso.coef_
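A zero Lasso weight depends on a feature's numeric scale as well as on its relevance: the L1 penalty charges every unit of weight equally, so a feature with small values (NOX, for example, stays below 1) needs a large weight to have an effect. The sketch below illustrates this on synthetic data (an assumption, not the Boston set): two features contribute equally to the target, but one of them is measured on a 100 times smaller scale.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative): both features contribute equally to y,
# but the first one is measured on a 100x smaller scale
rng = np.random.default_rng(0)
n = 500
small = 0.01 * rng.normal(size=n)
large = rng.normal(size=n)
X = np.column_stack([small, large])
y = 100 * small + large + 0.1 * rng.normal(size=n)

# Raw features: the small-scale feature would need a weight of about 100,
# which the L1 penalty makes too costly, so its coefficient is zeroed
raw = Lasso(alpha=0.1).fit(X, y)

# Standardized features: both get comparable non-zero weights
X_std = StandardScaler().fit_transform(X)
scaled = Lasso(alpha=0.1).fit(X_std, y)

print(raw.coef_)     # first coefficient is zero
print(scaled.coef_)  # both coefficients are non-zero
```

Standardizing features before fitting Lasso is a common way to make the selection reflect relevance rather than units.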

**Now let's train Lasso regression with $\alpha=10$. All the coefficients become zero except those of ZN (proportion of residential land zoned for lots over 25,000 sq. ft.), TAX (full-value property-tax rate), B ($1000(Bk - 0.63)^2$, where $Bk$ is the proportion of Black residents by town) and LSTAT (% lower status of the population).**

In [None]:
lasso = Lasso(alpha=10)
lasso.fit(X, y)
lasso.coef_

**This means that Lasso regression can serve as a feature selection method.**
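In scikit-learn this idea is packaged in `SelectFromModel`. Below is a minimal sketch on synthetic data where, by construction, only the first two of five features affect the target; the data and the value `alpha=0.1` are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data (illustrative): only features 0 and 1 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

# For L1-penalized estimators such as Lasso, SelectFromModel's default
# threshold is 1e-5, so it keeps the features with non-zero weights
selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
print(selector.get_support())  # [ True  True False False False]

X_selected = selector.transform(X)
print(X_selected.shape)  # (200, 2)
```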

In [None]:
n_alphas = 200
alphas = np.linspace(0.1, 10, n_alphas)
model = Lasso()

coefs = []
for a in alphas:
    model.set_params(alpha=a)
    model.fit(X, y)
    coefs.append(model.coef_)

plt.rcParams["figure.figsize"] = (12, 8)

ax = plt.gca()

ax.plot(alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel("alpha")
plt.ylabel("weights")
plt.title("Lasso coefficients as a function of the regularization")
plt.axis("tight")
plt.show();
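The loop above refits `Lasso` from scratch for every $\alpha$. scikit-learn also provides the function `lasso_path`, which computes the whole path in one call and reuses each solution as the starting point for the next $\alpha$. A minimal sketch on synthetic data (an assumption); note that `lasso_path` fits no intercept, so $X$ and $y$ are centered first, which makes it solve the same problem as `Lasso` with its default `fit_intercept=True`.

```python
import numpy as np
from sklearn.linear_model import Lasso, lasso_path

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)

# lasso_path fits no intercept, so center the data first
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()

alphas = np.logspace(1, -2, 30)  # from strong to weak regularization
path_alphas, path_coefs, _ = lasso_path(X_centered, y_centered, alphas=alphas)
# path_coefs has shape (n_features, n_alphas): one column per alpha

# Each column should match an individual Lasso fit up to solver tolerance
matches = [
    np.allclose(path_coefs[:, i], Lasso(alpha=path_alphas[i]).fit(X, y).coef_, atol=1e-3)
    for i in (0, 15, 29)
]
print(path_coefs.shape, matches)
```

The returned `path_coefs` can be passed to `ax.plot(path_alphas, path_coefs.T)` to draw the same kind of plot as above.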

**Now let's find the best value of $\alpha$ with cross-validation.**

In [None]:
lasso_cv = LassoCV(alphas=alphas, cv=3, random_state=17)  # random_state only matters with selection="random"
lasso_cv.fit(X, y)

In [None]:
lasso_cv.coef_

In [None]:
lasso_cv.alpha_
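`LassoCV` is a specialized tool for this search. For comparison, the generic `GridSearchCV` performs the same kind of search by refitting `Lasso` for every $\alpha$ on every fold, and its `best_score_` is the best mean of the per-fold scores. A sketch on synthetic data (an assumption); for a regressor, `cv=3` means the same unshuffled 3-fold `KFold` split in both `GridSearchCV` and `cross_val_score`, so the numbers agree.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=150)

alphas = np.linspace(0.01, 1, 20)

# GridSearchCV refits Lasso for every alpha on every fold (no warm-start path)
search = GridSearchCV(Lasso(), {"alpha": alphas}, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_["alpha"], -search.best_score_)  # best alpha and its mean CV MSE

# The same numbers by hand: mean CV score for each alpha, then take the best
manual = [
    cross_val_score(Lasso(alpha=a), X, y, cv=3, scoring="neg_mean_squared_error").mean()
    for a in alphas
]
best_idx = int(np.argmax(manual))
print(alphas[best_idx], -manual[best_idx])
```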

**In Scikit-learn, scores are always *maximized*, so for MSE there's a workaround: the scorer `neg_mean_squared_error` returns the negated MSE, and maximizing it is equivalent to minimizing MSE. Not really convenient.**

In [None]:
cross_val_score(Lasso(lasso_cv.alpha_), X, y, cv=3, scoring="neg_mean_squared_error")

In [None]:
abs(
    cross_val_score(
        Lasso(lasso_cv.alpha_), X, y, cv=3, scoring="neg_mean_squared_error"
    ).mean()
)
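The sign convention can be checked by hand with `KFold`, which the notebook imports but does not use. A minimal sketch on synthetic data (an assumption): computing the per-fold MSE manually gives the same numbers as `cross_val_score` after flipping the sign.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=150)

model = Lasso(alpha=0.1)

# scikit-learn returns the *negated* MSE for each fold, so flip the sign
sk_mse = -cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")

# Manual equivalent: for a regressor, cv=3 means KFold(n_splits=3) without shuffling
manual_mse = []
for train_idx, test_idx in KFold(n_splits=3).split(X):
    model.fit(X[train_idx], y[train_idx])
    manual_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(np.allclose(sk_mse, manual_mse))  # True
```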

In [None]:
# For comparison: cross-validated MSE with much stronger regularization (alpha=9.95)
abs(np.mean(cross_val_score(Lasso(9.95), X, y, cv=3, scoring="neg_mean_squared_error")))

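**One more potentially confusing point: LassoCV sorts the candidate values of $\alpha$ in decreasing order (the regularization path is computed from strong to weak regularization, which eases optimization) and stores them in `alphas_`; `mse_path_` follows this sorted order. If you plot `mse_path_` against the original, unsorted `alphas`, it may seem that the $\alpha$ optimization works incorrectly.**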

In [None]:
lasso_cv.alphas[:10]

In [None]:
lasso_cv.alphas_[:10]

In [None]:
plt.plot(lasso_cv.alphas, lasso_cv.mse_path_.mean(1))  # misleading: rows of mse_path_ follow alphas_, not alphas
plt.axvline(lasso_cv.alpha_, c="g");

In [None]:
plt.plot(lasso_cv.alphas_, lasso_cv.mse_path_.mean(1))  # correct: alphas_ matches the row order of mse_path_
plt.axvline(lasso_cv.alpha_, c="g");
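The bookkeeping can be checked directly. A minimal sketch on synthetic data (an assumption): `alphas_` comes back in decreasing order, `alpha_` is the entry of `alphas_` with the lowest mean cross-validated MSE, and pairing `mse_path_` with the original increasing grid points at the wrong $\alpha$.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=150)

grid = np.linspace(0.01, 1, 50)  # increasing order, as in the notebook
lasso_cv = LassoCV(alphas=grid, cv=3).fit(X, y)

print(np.all(np.diff(lasso_cv.alphas_) < 0))  # True: stored in decreasing order

# mse_path_ has shape (n_alphas, n_folds) and follows alphas_, not the original grid
mean_mse = lasso_cv.mse_path_.mean(axis=1)
print(lasso_cv.alphas_[np.argmin(mean_mse)] == lasso_cv.alpha_)  # True
print(grid[np.argmin(mean_mse)] == lasso_cv.alpha_)  # False: wrong pairing
```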

## Ridge Regression

Ridge regression minimizes the squared error with L2 regularization (written here in the scaling used by scikit-learn's `Ridge`, which does not divide by the number of observations):
$$\Large error(X, y, w) = \sum_{i=1}^\ell {(y_i - w^Tx_i)}^2 + \alpha \sum_{i=1}^d w_i^2$$

where $w^Tx$ is the hyperplane equation (the model's prediction) depending on the model parameters $w$, $\ell$ is the number of observations in the data $X$, $d$ is the number of features, $y_i$ are the target values and $\alpha$ is the regularization coefficient.
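Unlike Lasso, Ridge has a closed-form solution. scikit-learn's `Ridge` minimizes $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$ and leaves the intercept unpenalized, so after centering $X$ and $y$ the weights are $w = (X_c^T X_c + \alpha I)^{-1} X_c^T y_c$. The sketch below (synthetic data, an assumption) checks this formula against `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.0]) + 0.3 * rng.normal(size=100) + 4.0

alpha = 10.0

# Center X and y: the intercept is not penalized, so it drops out after centering
X_c = X - X.mean(axis=0)
y_c = y - y.mean()

# Closed-form ridge solution for ||y - Xw||^2 + alpha * ||w||^2
w = np.linalg.solve(X_c.T @ X_c + alpha * np.eye(X.shape[1]), X_c.T @ y_c)
b = y.mean() - X.mean(axis=0) @ w  # recover the intercept

ridge = Ridge(alpha=alpha).fit(X, y)
print(np.allclose(w, ridge.coef_), np.isclose(b, ridge.intercept_))  # True True
```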

There is a special class [RidgeCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) for Ridge regression cross-validation.

In [None]:
n_alphas = 200
ridge_alphas = np.logspace(-2, 6, n_alphas)

In [None]:
ridge_cv = RidgeCV(alphas=ridge_alphas, scoring="neg_mean_squared_error", cv=3)
ridge_cv.fit(X, y)

In [None]:
ridge_cv.alpha_

**With Ridge regression, none of the coefficients shrinks to exactly zero: they can become small, but they remain non-zero.**

In [None]:
ridge_cv.coef_
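The contrast with Lasso is easy to see on data with irrelevant features. A minimal sketch on synthetic data where, by construction, only two of five features matter; the data and the values of $\alpha$ are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative): features 2, 3 and 4 are irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=100.0).fit(X, y)

print("exact zeros, Lasso:", np.sum(lasso.coef_ == 0))  # 3
print("exact zeros, Ridge:", np.sum(ridge.coef_ == 0))  # 0
print(ridge.coef_)  # irrelevant features get small but non-zero weights
```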

In [None]:
n_alphas = 200
ridge_alphas = np.logspace(-2, 6, n_alphas)
model = Ridge()

coefs = []
for a in ridge_alphas:
    model.set_params(alpha=a)
    model.fit(X, y)
    coefs.append(model.coef_)

ax = plt.gca()

ax.plot(ridge_alphas, coefs)
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel("alpha")
plt.ylabel("weights")
plt.title("Ridge coefficients as a function of the regularization")
plt.axis("tight")
plt.show()
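The loop above refits `Ridge` for each $\alpha$. Because the ridge solution has a closed form, the whole path can be computed from a single SVD of the centered feature matrix. This sketch (synthetic data, an assumption) also shows why Ridge coefficients shrink without becoming exactly zero: every direction is scaled by a factor $s_k / (s_k^2 + \alpha)$, which approaches zero as $\alpha$ grows but never reaches it.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)

# Center the data: scikit-learn's Ridge leaves the intercept unpenalized
X_c = X - X.mean(axis=0)
y_c = y - y.mean()

# One thin SVD X_c = U diag(s) Vt serves every alpha:
# w(alpha) = V diag(s / (s**2 + alpha)) U^T y_c
U, s, Vt = np.linalg.svd(X_c, full_matrices=False)
Uty = U.T @ y_c
alphas = np.logspace(-2, 6, 50)
shrink = s / (s**2 + alphas[:, None])  # shape (n_alphas, n_features)
coefs = (shrink * Uty) @ Vt  # row i holds w(alphas[i])

norms = np.linalg.norm(coefs, axis=1)
print(np.all(np.diff(norms) < 0))  # True: the norm falls as alpha grows
print(np.allclose(coefs[10], Ridge(alpha=alphas[10]).fit(X, y).coef_))  # True
```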

## References
- [Generalized Linear Models (GLM)](http://scikit-learn.org/stable/modules/linear_model.html) in Scikit-learn
- [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression), [Lasso](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso), [LassoCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV), [Ridge](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) and [RidgeCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) in Scikit-learn