{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"../../img/ods_stickers.jpg\" />\n",
    "    \n",
    "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
    "\n",
    "Author: [Yury Kashnitskiy](https://yorko.github.io). Translated by [Sergey Oreshkov](https://www.linkedin.com/in/sergeoreshkov/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# <center> Assignment #8 (demo)\n",
    "\n",
    "## <center> Implementation of online regressor\n",
    "    \n",
    "**Same assignment as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/a8-demo-implementing-online-regressor) + [solution](https://www.kaggle.com/kashnitsky/a8-demo-implementing-online-regressor-solution).**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we'll implement a regressor trained with stochastic gradient descent (SGD). Fill in the missing code. If you do evething right, you'll pass a simple embedded test."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <center>Linear regression and Stochastic Gradient Descent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.base import BaseEstimator\n",
    "from sklearn.metrics import log_loss, mean_squared_error, roc_auc_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "from tqdm import tqdm\n",
    "\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "from matplotlib import pyplot as plt\n",
    "from sklearn.preprocessing import StandardScaler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Implement class `SGDRegressor`. Specification:\n",
    "- class is inherited from `sklearn.base.BaseEstimator`\n",
    "- constructor takes parameters `eta` – gradient step ($10^{-3}$ by default) and `n_epochs` – dataset pass count (3 by default)\n",
    "- constructor also creates `mse_` and `weights_` lists in order to track mean squared error and weight vector during gradient descent iterations\n",
    "- Class has `fit` and `predict` methods\n",
    "- The `fit` method takes matrix `X` and vector `y` (`numpy.array` objects) as parameters, appends column of ones to  `X` on the left side, initializes weight vector `w` with **zeros** and then makes `n_epochs` iterations of weight updates (you may refer to this [article](https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-8-vowpal-wabbit-fast-learning-with-gigabytes-of-data-60f750086237) for details), and for every iteration logs mean squared error and weight vector `w` in corresponding lists we created in the constructor. \n",
    "- Additionally the `fit` method will create `w_` variable to store weights which produce minimal mean squared error\n",
    "- The `fit` method returns current instance of the `SGDRegressor` class, i.e. `self`\n",
    "- The `predict` method takes `X` matrix, adds column of ones to the left side and returns prediction vector, using weight vector `w_`, created by the `fit` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SGDRegressor(BaseEstimator):\n",
    "    # you code here\n",
    "    def __init__(self):\n",
    "        pass\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        pass\n",
    "\n",
    "    def predict(self, X):\n",
    "        pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's test out the algorithm on height/weight data. We will predict heights (in inches) based on weights (in lbs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_demo = pd.read_csv(\"../../data/weights_heights.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(data_demo[\"Weight\"], data_demo[\"Height\"])\n",
    "plt.xlabel(\"Weight (lbs)\")\n",
    "plt.ylabel(\"Height (Inch)\")\n",
    "plt.grid();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X, y = data_demo[\"Weight\"].values, data_demo[\"Height\"].values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perform train/test split and scale data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_valid, y_train, y_valid = train_test_split(\n",
    "    X, y, test_size=0.3, random_state=17\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train.reshape([-1, 1]))\n",
    "X_valid_scaled = scaler.transform(X_valid.reshape([-1, 1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train created `SGDRegressor` with `(X_train_scaled, y_train)` data. Leave default parameter values for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Draw a chart with training process  – dependency of mean squared error from the i-th SGD iteration number."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the minimal value of mean squared error and the best weights vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Draw chart of model weights ($w_0$ and $w_1$) behavior during training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a prediction for hold-out  set `(X_valid_scaled, y_valid)` and check MSE value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here\n",
    "sgd_holdout_mse = 10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Do the same thing for `LinearRegression` class from `sklearn.linear_model`. Evaluate MSE for hold-out set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you code here\n",
    "linreg_holdout_mse = 9"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    assert (sgd_holdout_mse - linreg_holdout_mse) < 1e-4\n",
    "    print(\"Correct!\")\n",
    "except AssertionError:\n",
    "    print(\n",
    "        \"Something's not good.\\n Linreg's holdout MSE: {}\"\n",
    "        \"\\n SGD's holdout MSE: {}\".format(linreg_holdout_mse, sgd_holdout_mse)\n",
    "    )"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}