{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 17: Binary classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last lecture, we have learned our goal in a binary classification problem. To sum up:\n", "* Data has features (attributes of a sample) and labels (which class this sample is in, $y=0$ or $1$).\n", "* Each weight corresponds to each feature.\n", "* Cross-entropy loss function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST\n", "Now let us look at the famous [MNIST dataset of handwritten digits](http://yann.lecun.com/exdb/mnist/), please download the `npz` file on Canvas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_train = np.load('mnist_binary_train.npz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What does the data look like?\n", "The first column `data_train[:,0]` is the label, and the rest 784 columns `data_train[:,1:]` represent the image. Let us try to visualize the first 20 rows of the training data, with their labels." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "X_train = data_train['X']\n", "y_train = data_train['y']\n", "\n", "fig, axes = plt.subplots(4,5, figsize=(12, 14))\n", "axes = axes.reshape(-1)\n", "\n", "for i in range(20):\n", " axes[i].axis('off')\n", " axes[i].imshow(X_train[i,:].reshape(28,28), cmap = 'gray')\n", " axes[i].set_title(str(int(y_train[i])), color= 'black', fontsize=25)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression\n", "\n", "----\n", "\n", "\n", "## Model function (hypothesis)\n", "\n", "Weights vector $\\mathbf{w}$, same shape with a sample's feature vector $\\mathbf{x}$, $h(\\mathbf{x})$ is our estimate of $ P(y=1|\\mathbf{x})$ and $1 - h(\\mathbf{x})$ is our estimate of $P(y=0|\\mathbf{x}) = 1 - P(y=1|\\mathbf{x})$.\n", "\n", "$$\n", "h(\\mathbf{x}) = h(\\mathbf{x};\\mathbf{w}) = \\frac{1}{1 + \\exp(-\\mathbf{w}^\\top \\mathbf{x})}\n", "=: \\sigma(\\mathbf{w}^\\top \\mathbf{x}) \n", "$$\n", "where $\\sigma(z)$ is the Sigmoid function $1/(1+e^{z})$\n", "or more compactly, because $y = 0$ or $1$:\n", "$$\n", "P(y|\\mathbf{x}) \\text{ is estimated by } h(\\mathbf{x})^y \\big(1 - h(\\mathbf{x}) \\big)^{1-y}.\n", "$$\n", "\n", "----\n", "\n", "## Loss function (cross-entropy)\n", "The cross entropy loss for two probability distribution is defined as, $K$ is the no. of classes, $\\hat {y}$ is the prediction from the model (try to estimate $y$)\n", "$$\n", "H(p,q)\\ =\\ -\\sum^{K}_{k=1}p_{k}\\log q_{k}\\ =\\ -y\\log {\\hat {y}}-(1-y)\\log(1-{\\hat {y}})\n", "$$\n", "Since we estimate $y$ using $h(\\mathbf{x})$,\n", "$$\n", "L (\\mathbf{w}; X, \\mathbf{y}) = - \\frac{1}{N}\\sum_{i=1}^N \n", "\\Bigl\\{y^{(i)} \\ln\\big( h(\\mathbf{x}^{(i)}; \\mathbf{w}) \\big) \n", "+ (1 - y^{(i)}) \\ln\\big( 1 - h(\\mathbf{x}^{(i)};\\mathbf{w}) \\big) \\Bigr\\}.\n", "\\tag{$\\star$}\n", "$$\n", "\n", "----\n", "\n", "## Training\n", "\n", "The gradient of the loss function $(\\star)$ with respect to the weights $\\mathbf{w}$ is:\n", "\n", "$$\n", "\\nabla_{\\mathbf{w}} \\big( L (\\mathbf{w}) \\big) \n", "=\\frac{1}{N}\\sum_{i=1}^N \\big( h(\\mathbf{x}^{(i)};\\mathbf{w}) - y^{(i)} \\big) \\mathbf{x}^{(i)} . 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient descent\n", "Now let us run gradient descent based on $(\\dagger)$, with code adapted from Lecture 12." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialization\n", "N = len(y_train)\n", "w = np.zeros(np.shape(X_train)[1])\n", "# zero initial guess; np.shape(X_train)[1] = 784 is the no. of pixels,\n", "# and we want the initial weights to be small" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# model h(X; w) = sigma(Xw)\n", "# w: weights, shape (784,)\n", "# X: training data, shape (12665, 784): 12665 samples (rows) by 784 features (columns)\n", "def h(w,X):\n", "    z = np.matmul(X,w)\n", "    return 1.0 / (1.0 + np.exp(-z))\n", "\n", "# loss function, averaged over N (size of training data); a vectorized implementation without a for loop\n", "def loss(w,X,y):\n", "    loss_components = np.log(h(w,X)) * y + (1.0 - y)* np.log(1 - h(w,X))\n", "    # above is a shape (12665,) array\n", "    return -np.mean(loss_components) # same as -loss_components.sum()/N\n", "\n", "def gradient_loss(w,X,y):\n", "    gradient_for_all_training_data = (h(w,X) - y).reshape(-1,1)*X\n", "    # we should return a (784,) array, averaging over all 12665 training samples\n", "    return np.mean(gradient_for_all_training_data, axis=0)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eta = 5e-5 # step size (learning rate)\n", "num_steps = 500\n", "\n", "# this block sets up the plot\n", "fig, axes = plt.subplots(2,5, figsize=(12, 5))\n", "axes = axes.reshape(-1)\n", "\n", "\n", "loss_at_eachstep = np.zeros(num_steps) # record the change of the loss function\n", "for i in range(num_steps):\n", "    loss_at_eachstep[i] = loss(w,X_train,y_train) # this step is optional\n", "    dw = gradient_loss(w,X_train,y_train)\n", "    w -= eta * dw\n", "    if i % 50 == 0: # plot weights and print loss every 50 steps\n", "        print(\"loss after\", i+1, \"iterations is: \", loss(w,X_train,y_train))\n", "        axes[i//50].axis('off')\n", "        axes[i//50].imshow(w.reshape(28,28), cmap = 'gray')\n", "        axes[i//50].set_title(\"%4i iterations\" % (i+1))\n", "        fig.canvas.draw()\n", "        fig.canvas.flush_events()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(range(num_steps), loss_at_eachstep)\n", "plt.xlabel('iteration')\n", "plt.ylabel('loss')\n", "plt.yscale('log')\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Cross-validation (Judgement day)\n", "Now let us use the test set to see whether the accuracy is good." ] },
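{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check before touching the test set (an extra step, not required for the evaluation below), we can compute the accuracy on the training data itself, using the same $0.5$ threshold on $h(\\mathbf{x})$ that we will apply to the test set. A model that cannot fit its own training data will not do well on unseen data either." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sanity check: accuracy on the training data,\n", "# predicting class 1 when the estimated probability exceeds 0.5\n", "train_pred = 1*(h(w,X_train) > 0.5)\n", "print(\"training accuracy: {:.5%}\".format(np.mean(train_pred == y_train)))" ] },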
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import the testing data and extract zeros and ones like we did before\n", "data_test = np.load('mnist_binary_test.npz')\n", "X_test = data_test['X']\n", "y_test = data_test['y']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the y_pred using the weights w and X_test\n", "probability_estimate = h(w,X_test)\n", "y_pred = 1*(probability_estimate > 0.5) # integer\n", "# if probability_estimate is > 0.5, it is the 2nd class (class 1), otherwise it is the first class (class 0)\n", "percentage_getting_label_correct = np.mean(y_pred == y_test)\n", "print(\"{:.5%}\".format(percentage_getting_label_correct))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In-class exercise:\n", "Read the manual of the [logistic regression class](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) in `scikit-learn`, follow the example there to redo the classification above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_binary_reg = LogisticRegression(solver= 'lbfgs',max_iter=1000, verbose=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_binary_reg.fit(X_train,y_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = mnist_binary_reg.predict(X_test)\n", "percentage_getting_label_correct = np.mean(y_pred == y_test)\n", "print(\"{:.5%}\".format(percentage_getting_label_correct))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": true, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }