{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Softmax classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, you will implement a softmax classifier for multi-class classification." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now import the `digits` dataset provided by `scikit-learn`:\n", "\n", "\n", "\n", "It contains 1797 small (8x8) black and white images of digits between 0 and 9. \n", "\n", "The two following cells load the data and visualize 16 images chosen randomly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", "digits = load_digits()\n", "\n", "N, w, h = digits.images.shape\n", "d = w * h # number of pixels\n", "c = len(digits.target_names) # number of classes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng()\n", "indices = rng.choice(N, 16)\n", "plt.figure(figsize=(16, 16))\n", "for i in range(16):\n", " plt.subplot(4, 4, i+1)\n", " plt.imshow(digits.images[indices[i], :], cmap=\"gray\")\n", " plt.title(\"Label: \"+ str(digits.target[indices[i]]))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digits are indeed to be recognized, the hope being that they are linearly separable and we can apply a softmax classifier directly on the pixels. \n", "\n", "The only problem is that each image is a 8x8 matrix, while we want vectors for our model. Fortunately, that is very easy with `reshape`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = digits.images.reshape((N, d))\n", "print(X.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's know have a look at the targets, i.e. the ground truth / labels of each digit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "labels = digits.target\n", "print(labels)\n", "print(labels.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each label is an integer between 0 and 9, while our softmax classifier expects a **one-hot-encoded** vector of 10 classes, with only one non-zero element, for example for digit 3:\n", "\n", "$$[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]$$\n", "\n", "To do the conversion, we can once again use a built-in method of `scikit-learn`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import OneHotEncoder\n", "\n", "t = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()\n", "\n", "print(t)\n", "print(t.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Split the data into a training set `X_train, t_train` and a test set `X_test, t_test` using `scikit-learn` (e.g. with a ratio 70/30)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Softmax linear classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's remember the structure of the softmax linear classifier: the input vector $\\mathbf{x}$ is transformed into a **logit score** vector $\\mathbf{z}$ using a weight matrix $W$ and a bias vector $\\mathbf{b}$:\n", "\n", "$$\n", " \\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}\n", "$$\n", "\n", "This logit score has one element per class, so the weight matrix must have a size $(c, d)$, where $c$ is the number of classes (10) and $d$ is the number of dimensions of the input space (64). The bias vector has 10 elements (one per class).\n", "\n", "The logit score is turned into probabilities using the **softmax** operator:\n", "\n", "$$\n", " y_j = P(\\text{class = j}) = \\frac{\\exp(z_j)}{\\sum_k \\exp(z_k)}\n", "$$\n", "\n", "The following Python function allows to turn any vector $\\mathbf{z}$ (numpy array) into softmax probabilities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def softmax(z):\n", " e = np.exp(z - z.max())\n", " return e/np.sum(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Experiment with the `softmax()` to understand its function. Pass it different numpy arrays (e.g. [-1, 0, 2]) and print or plot the corresponding probabilities. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The loss function to use is the **cross-entropy** or **negative log-likelihood**, defined for a single example as:\n", "\n", "$$\n", " \\mathcal{l}(W, \\mathbf{b}) = - \\mathbf{t} \\cdot \\log \\mathbf{y} = - \\log y_j \n", "$$\n", "\n", "where $\\mathbf{t}$ is a one-hot encoding of the class of the example and $j$ is the index of the corresponding class.\n", "\n", "After doing the derivations, we obtain the following learning rules for $W$ and $\\mathbf{b}$ to minimize the loss function:\n", "\n", "$$\n", " \\Delta W = \\eta \\, (\\mathbf{t} - \\mathbf{y}) \\, \\mathbf{x}^T\n", "$$\n", "\n", "$$\n", " \\Delta \\mathbf{b} = \\eta \\, (\\mathbf{t} - \\mathbf{y})\n", "$$\n", "\n", "Note that because $W$ is a $(c, d)$ matrix, $\\Delta W$ too. $(\\mathbf{t} - \\mathbf{y}) \\, \\mathbf{x}^T$ is therefore the **outer product** between the error vector $\\mathbf{t} - \\mathbf{y}$ ($c$ elements) and the input vector $\\mathbf{x}$ ($d$ elements)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "You will now modify your implementation of the online Perceptron algorithm from last week.\n", "\n", "Some things to keep in mind:\n", "\n", "* `W` must now be defined as a $(c, d)$ matrix (numpy array) and `b` as a vector with $c$ elements. Both can be initialized to 0.\n", "\n", "* When computing the logit score $\\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}$, remember that `W` is now a matrix, so its position will matter in the dot product `np.dot`.\n", "\n", "* Use the `softmax()` function defined above on the whole vector instead of `np.sign()` or `logistic` to get the prediction $\\mathbf{y}$.\n", "\n", "* For $\\Delta W$, you will need the **outer** product between the vectors $\\mathbf{t} - \\mathbf{y}_\\text{train}$ and $\\mathbf{x}_\\text{train}$. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "You will now modify your implementation of the online Perceptron algorithm from last week.\n", "\n", "Some things to keep in mind:\n", "\n", "* `W` must now be defined as a $(c, d)$ matrix (numpy array) and `b` as a vector with $c$ elements. Both can be initialized to 0.\n", "\n", "* When computing the logit score $\\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}$, remember that `W` is now a matrix, so the order of the arguments to `np.dot()` matters.\n", "\n", "* Use the `softmax()` function defined above on the whole logit vector, instead of `np.sign()` or the logistic function, to get the prediction $\\mathbf{y}$.\n", "\n", "* For $\\Delta W$, you will need the **outer** product between the vectors $\\mathbf{t} - \\mathbf{y}$ and $\\mathbf{x}$. Check the doc for `np.outer()`.\n", "\n", "* The one-hot encoding of the class of the example $i$ is now a vector with 10 elements `t_train[i, :]`. You can get the index of the corresponding class by looking at the position of its maximum with `t_train[i, :].argmax()`.\n", "\n", "* Similarly, the class predicted by the model can be identified as the one with the maximum probability: `y.argmax()`.\n", "\n", "* Do not forget to record and plot the evolution of the training error and loss. Compute the test error and loss at the end of learning.\n", "\n", "* Pick a suitable learning rate and number of epochs.\n", "\n", "**Q:** Let's go." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What are the final training error and loss of the model? After how many epochs do you get a perfect classification? Why do they evolve like this?\n", "\n", "*Hint:* you may need to avoid plotting the error/loss during the first 20 epochs or so to observe the effect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Compare the evolution of the training and test errors during training. What happens?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** The following cell collects up to 12 misclassified images from the test set and shows the predicted class together with the ground truth. What do you think?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Collect up to 12 misclassified test images, using the W and b learned above\n", "misclassified = []\n", "\n", "for i in range(len(X_test)):\n", "    pred = softmax(np.dot(W, X_test[i, :]) + b).argmax()\n", "    if pred != t_test[i, :].argmax():\n", "        misclassified.append([X_test[i, :].reshape((8, 8)), t_test[i, :].argmax(), pred])\n", "    if len(misclassified) == 12: break\n", "\n", "plt.figure(figsize=(16, 12))\n", "for i in range(12):\n", "    if i < len(misclassified):\n", "        img, label, pred = misclassified[i]\n", "        plt.subplot(3, 4, i+1)\n", "        plt.imshow(img, cmap=\"gray\")\n", "        plt.title(\"Label \" + str(label) + \" ; Prediction \" + str(pred))\n", "\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d" } } }, "nbformat": 4, "nbformat_minor": 4 }