{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"../../img/ods_stickers.jpg\" />\n",
    "    \n",
    "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
    "\n",
    "Author: [Yury Kashnitskiy](https://yorko.github.io). Translated by Anna Larionova. This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center> Topic 6. Regression</center>\n",
    "## <center>Lasso and Ridge Regressions</center>\n",
    "\n",
    "*Lecture syllabus differs this week from the article outline, because topic 4 (linear models) is too huge and important, so we cover regression this week.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "import seaborn as sns\n",
    "\n",
    "sns.set()  # just to use the seaborn theme\n",
    "\n",
    "\n",
    "from sklearn.datasets import load_boston\n",
    "from sklearn.linear_model import Lasso, LassoCV, Ridge, RidgeCV\n",
    "from sklearn.model_selection import KFold, cross_val_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**We will work with Boston house prices data (UCI repository).**\n",
    "**Download the data.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "boston = load_boston()\n",
    "X, y = boston[\"data\"], boston[\"target\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's read description of data:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(boston.DESCR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "boston.feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's look at the first two records.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X[:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lasso Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lasso regression minimizes mean squared error with L1 regularization:\n",
    "$$\\Large error(X, y, w) = \\frac{1}{2} \\sum_{i=1}^\\ell {(y_i - w^Tx_i)}^2 + \\alpha \\sum_{i=1}^d |w_i|$$\n",
    "\n",
    "where $y = w^Tx$ hyperplane equation depending on model parameters $w$, $\\ell$ is number of observations in data $X$, $d$ is number of features, $y$ target values, $\\alpha$ regularization coefficient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's fit Lasso Regression with the small $\\alpha$ coefficient (weak regularization). Coefficient related to NOX feature (nitric oxides concentration) will be zero. It means that this feature is the least important for median house prices prediction in this region.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso = Lasso(alpha=0.1)\n",
    "lasso.fit(X, y)\n",
    "lasso.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's train Lasso Regression with $\\alpha=10$. All of the coefficients are equal to zero except features ZN (proportion of residential land zoned for lots over 25,000 sq.ft.), TAX (full-value property-tax rate), B (proportion of blacks by town) and LSTAT (% of lower status of the population).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso = Lasso(alpha=10)\n",
    "lasso.fit(X, y)\n",
    "lasso.coef_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**It means that Lasso Regression may serve as a feature selection method.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_alphas = 200\n",
    "alphas = np.linspace(0.1, 10, n_alphas)\n",
    "model = Lasso()\n",
    "\n",
    "coefs = []\n",
    "for a in alphas:\n",
    "    model.set_params(alpha=a)\n",
    "    model.fit(X, y)\n",
    "    coefs.append(model.coef_)\n",
    "\n",
    "plt.rcParams[\"figure.figsize\"] = (12, 8)\n",
    "\n",
    "ax = plt.gca()\n",
    "# ax.set_color_cycle(['b', 'r', 'g', 'c', 'k', 'y', 'm'])\n",
    "\n",
    "ax.plot(alphas, coefs)\n",
    "ax.set_xscale(\"log\")\n",
    "ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis\n",
    "plt.xlabel(\"alpha\")\n",
    "plt.ylabel(\"weights\")\n",
    "plt.title(\"Lasso coefficients as a function of the regularization\")\n",
    "plt.axis(\"tight\")\n",
    "plt.show();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Now let's find the best value of $\\alpha$ during cross-validation.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso_cv = LassoCV(alphas=alphas, cv=3, random_state=17)\n",
    "lasso_cv.fit(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso_cv.coef_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso_cv.alpha_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**In Scikit-learn, metrics are typically *maximized*, so for MSE there's a workaround: `neg_mean_squared_error` is minimized instead. Not really convenient.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cross_val_score(Lasso(lasso_cv.alpha_), X, y, cv=3, scoring=\"neg_mean_squared_error\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "abs(\n",
    "    cross_val_score(\n",
    "        Lasso(lasso_cv.alpha_), X, y, cv=3, scoring=\"neg_mean_squared_error\"\n",
    "    ).mean()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "abs(np.mean(cross_val_score(Lasso(9.95), X, y, cv=3, scoring=\"neg_mean_squared_error\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**One more ambiguous point: LassoCV sorts values of the parameters in decreasing order to ease optimization. It may seem that $\\alpha$ optimization works incorrectly.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso_cv.alphas[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso_cv.alphas_[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(lasso_cv.alphas, lasso_cv.mse_path_.mean(1))  # incorrect\n",
    "plt.axvline(lasso_cv.alpha_, c=\"g\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(lasso_cv.alphas_, lasso_cv.mse_path_.mean(1))  # correct\n",
    "plt.axvline(lasso_cv.alpha_, c=\"g\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ridge Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ridge regression minimizes mean squared error with L2 regularization:\n",
    "$$\\Large error(X, y, w) = \\frac{1}{2} \\sum_{i=1}^\\ell {(y_i - w^Tx_i)}^2 + \\alpha \\sum_{i=1}^d w_i^2$$\n",
    "\n",
    "where $y = w^Tx$ hyperplane equation depending on model parameters $w$, $\\ell$ is number of observations in data $X$, $d$ is number of features, $y$ target values, $\\alpha$ regularization coefficient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is a special class [RidgeCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) for Ridge regression cross-validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_alphas = 200\n",
    "ridge_alphas = np.logspace(-2, 6, n_alphas)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ridge_cv = RidgeCV(alphas=ridge_alphas, scoring=\"neg_mean_squared_error\", cv=3)\n",
    "ridge_cv.fit(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ridge_cv.alpha_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**In case of Ridge Regression neither of the parameters are reducing to zero. It can be small value but non-zero.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ridge_cv.coef_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_alphas = 200\n",
    "ridge_alphas = np.logspace(-2, 6, n_alphas)\n",
    "model = Ridge()\n",
    "\n",
    "coefs = []\n",
    "for a in ridge_alphas:\n",
    "    model.set_params(alpha=a)\n",
    "    model.fit(X, y)\n",
    "    coefs.append(model.coef_)\n",
    "\n",
    "ax = plt.gca()\n",
    "# ax.set_color_cycle(['b', 'r', 'g', 'c', 'k', 'y', 'm'])\n",
    "\n",
    "ax.plot(ridge_alphas, coefs)\n",
    "ax.set_xscale(\"log\")\n",
    "ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis\n",
    "plt.xlabel(\"alpha\")\n",
    "plt.ylabel(\"weights\")\n",
    "plt.title(\"Ridge coefficients as a function of the regularization\")\n",
    "plt.axis(\"tight\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "- [Generalized linear models](http://scikit-learn.org/stable/modules/linear_model.html) (Generalized Linear Models, GLM) in Scikit-learn\n",
    "- [LinearRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression), [Lasso](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso), [LassoCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV), [Ridge](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) and [RidgeCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) in Scikit-learn"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "name": "lesson8_part1_kmeans.ipynb"
 },
 "nbformat": 4,
 "nbformat_minor": 1
}