{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 17: Binary classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last lecture, we have learned our goal in a binary classification problem. To sum up:\n", "* Data has features (attributes of a sample) and labels (which class this sample is in, $y=0$ or $1$).\n", "* Each weight corresponds to each feature.\n", "* Cross-entropy loss function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST\n", "Now let us look at the famous [MNIST dataset of handwritten digits](http://yann.lecun.com/exdb/mnist/), please download the `npz` file on Canvas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_train = np.load('mnist_binary_train.npz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What does the data look like?\n", "The first column `data_train[:,0]` is the label, and the rest 784 columns `data_train[:,1:]` represent the image. Let us try to visualize the first 20 rows of the training data, with their labels." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "X_train = data_train['X']\n", "y_train = data_train['y']\n", "\n", "fig, axes = plt.subplots(4,5, figsize=(12, 14))\n", "axes = axes.reshape(-1)\n", "\n", "for i in range(20):\n", " axes[i].axis('off')\n", " axes[i].imshow(X_train[i,:].reshape(28,28), cmap = 'gray')\n", " axes[i].set_title(str(int(y_train[i])), color= 'black', fontsize=25)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression\n", "\n", "----\n", "\n", "\n", "## Model function (hypothesis)\n", "\n", "Weights vector $\\mathbf{w}$, same shape with a sample's feature vector $\\mathbf{x}$, $h(\\mathbf{x})$ is our estimate of $ P(y=1|\\mathbf{x})$ and $1 - h(\\mathbf{x})$ is our estimate of $P(y=0|\\mathbf{x}) = 1 - P(y=1|\\mathbf{x})$.\n", "\n", "$$\n", "h(\\mathbf{x}) = h(\\mathbf{x};\\mathbf{w}) = \\frac{1}{1 + \\exp(-\\mathbf{w}^\\top \\mathbf{x})}\n", "=: \\sigma(\\mathbf{w}^\\top \\mathbf{x}) \n", "$$\n", "where $\\sigma(z)$ is the Sigmoid function $1/(1+e^{z})$\n", "or more compactly, because $y = 0$ or $1$:\n", "$$\n", "P(y|\\mathbf{x}) \\text{ is estimated by } h(\\mathbf{x})^y \\big(1 - h(\\mathbf{x}) \\big)^{1-y}.\n", "$$\n", "\n", "----\n", "\n", "## Loss function (cross-entropy)\n", "The cross entropy loss for two probability distribution is defined as, $K$ is the no. of classes, $\\hat {y}$ is the prediction from the model (try to estimate $y$)\n", "$$\n", "H(p,q)\\ =\\ -\\sum^{K}_{k=1}p_{k}\\log q_{k}\\ =\\ -y\\log {\\hat {y}}-(1-y)\\log(1-{\\hat {y}})\n", "$$\n", "Since we estimate $y$ using $h(\\mathbf{x})$,\n", "$$\n", "L (\\mathbf{w}; X, \\mathbf{y}) = - \\frac{1}{N}\\sum_{i=1}^N \n", "\\Bigl\\{y^{(i)} \\ln\\big( h(\\mathbf{x}^{(i)}; \\mathbf{w}) \\big) \n", "+ (1 - y^{(i)}) \\ln\\big( 1 - h(\\mathbf{x}^{(i)};\\mathbf{w}) \\big) \\Bigr\\}.\n", "\\tag{$\\star$}\n", "$$\n", "\n", "----\n", "\n", "## Training\n", "\n", "The gradient of the loss function $(\\star)$ with respect to the weights $\\mathbf{w}$ is:\n", "\n", "$$\n", "\\nabla_{\\mathbf{w}} \\big( L (\\mathbf{w}) \\big) \n", "=\\frac{1}{N}\\sum_{i=1}^N \\big( h(\\mathbf{x}^{(i)};\\mathbf{w}) - y^{(i)} \\big) \\mathbf{x}^{(i)} . 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient descent\n", "Now let us run gradient descent based on $(\\dagger)$, with code adapted from Lecture 12." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialization\n", "N = len(y_train)\n", "w = np.zeros(np.shape(X_train)[1])\n", "# zero initial guess; np.shape(X_train)[1] = 784 is the no. of pixels,\n", "# and we want the initial weights to be small" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# model h(X; w) = sigma(Xw)\n", "# w: weights, shape (784,)\n", "# X: training data, shape (12665, 784): 12665 samples (rows) by 784 features (columns)\n", "def h(w,X):\n", "    z = np.matmul(X,w)\n", "    return 1.0 / (1.0 + np.exp(-z))\n", "\n", "# loss function, averaged over N (size of training data); a vectorized implementation without a for loop\n", "def loss(w,X,y):\n", "    loss_components = np.log(h(w,X)) * y + (1.0 - y)* np.log(1 - h(w,X))\n", "    # above is a shape (12665,) array\n", "    return -np.mean(loss_components) # same as -loss_components.sum()/N\n", "\n", "def gradient_loss(w,X,y):\n", "    gradient_for_all_training_data = (h(w,X) - y).reshape(-1,1)*X\n", "    # we should return a (784,) array, averaging over all 12665 training samples\n", "    return np.mean(gradient_for_all_training_data, axis=0)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eta = 5e-5 # step size (learning rate)\n", "num_steps = 500\n", "\n", "# this block sets up the plot\n", "fig, axes = plt.subplots(2,5, figsize=(12, 5))\n", "axes = axes.reshape(-1)\n", "\n", "\n", "loss_at_eachstep = np.zeros(num_steps) # record the change of the loss function\n", "for i in range(num_steps):\n", "    loss_at_eachstep[i] = loss(w,X_train,y_train) # this step is optional\n", "    dw = gradient_loss(w,X_train,y_train)\n", "    w -= eta * dw\n", "    if i % 50 == 0: # plot weights and print loss every 50 steps\n", "        print(\"loss after\", i+1, \"iterations is: \", loss(w,X_train,y_train))\n", "        axes[i//50].axis('off')\n", "        axes[i//50].imshow(w.reshape(28,28), cmap = 'gray')\n", "        axes[i//50].set_title(\"%4i iterations\" % (i+1))\n", "        fig.canvas.draw()\n", "        fig.canvas.flush_events()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(range(num_steps), loss_at_eachstep)\n", "plt.xlabel('iteration')\n", "plt.ylabel('loss')\n", "plt.yscale('log')\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Cross-validation (Judgement day)\n", "Now let us use the test set to see whether the accuracy is good." ] },
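{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check before touching the test set (an extra step, not required for the evaluation below), we can compute the accuracy on the training data itself, using the same $0.5$ threshold on $h(\\mathbf{x})$ that we will apply to the test set. A model that cannot fit its own training data will not do well on unseen data either." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sanity check: accuracy on the training data,\n", "# predicting class 1 when the estimated probability exceeds 0.5\n", "train_pred = 1*(h(w,X_train) > 0.5)\n", "print(\"training accuracy: {:.5%}\".format(np.mean(train_pred == y_train)))" ] },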
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import the testing data and extract zeros and ones like we did before\n", "data_test = np.load('mnist_binary_test.npz')\n", "X_test = data_test['X']\n", "y_test = data_test['y']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the y_pred using the weights w and X_test\n", "probability_estimate = h(w,X_test)\n", "y_pred = 1*(probability_estimate > 0.5) # integer\n", "# if probability_estimate is > 0.5, it is the 2nd class (class 1), otherwise it is the first class (class 0)\n", "percentage_getting_label_correct = np.mean(y_pred == y_test)\n", "print(\"{:.5%}\".format(percentage_getting_label_correct))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In-class exercise:\n", "Read the manual of the [logistic regression class](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) in `scikit-learn`, follow the example there to redo the classification above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_binary_reg = LogisticRegression(solver= 'lbfgs',max_iter=1000, verbose=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_binary_reg.fit(X_train,y_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = mnist_binary_reg.predict(X_test)\n", "percentage_getting_label_correct = np.mean(y_pred == y_test)\n", "print(\"{:.5%}\".format(percentage_getting_label_correct))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": true, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }