{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Softmax classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, you will implement a softmax classifier for multi-class classification." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now import the `digits` dataset provided by `scikit-learn`:\n", "\n", "\n", "\n", "It contains 1797 small (8x8) black and white images of digits between 0 and 9. \n", "\n", "The two following cells load the data and visualize 16 images chosen randomly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", "digits = load_digits()\n", "\n", "N, w, h = digits.images.shape\n", "d = w * h # number of pixels\n", "c = len(digits.target_names) # number of classes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng()\n", "indices = rng.choice(N, 16)\n", "plt.figure(figsize=(16, 16))\n", "for i in range(16):\n", " plt.subplot(4, 4, i+1)\n", " plt.imshow(digits.images[indices[i], :], cmap=\"gray\")\n", " plt.title(\"Label: \"+ str(digits.target[indices[i]]))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digits are indeed to be recognized, the hope being that they are linearly separable and we can apply a softmax classifier directly on the pixels. \n", "\n", "The only problem is that each image is a 8x8 matrix, while we want vectors for our model. Fortunately, that is very easy with `reshape`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = digits.images.reshape((N, d))\n", "print(X.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's know have a look at the targets, i.e. the ground truth / labels of each digit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "labels = digits.target\n", "print(labels)\n", "print(labels.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each label is an integer between 0 and 9, while our softmax classifier expects a **one-hot-encoded** vector of 10 classes, with only one non-zero element, for example for digit 3:\n", "\n", "$$[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]$$\n", "\n", "To do the conversion, we can once again use a built-in method of `scikit-learn`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import OneHotEncoder\n", "\n", "t = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()\n", "\n", "print(t)\n", "print(t.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Split the data into a training set `X_train, t_train` and a test set `X_test, t_test` using `scikit-learn` (e.g. with a ratio 70/30)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Softmax linear classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's remember the structure of the softmax linear classifier: the input vector $\\mathbf{x}$ is transformed into a **logit score** vector $\\mathbf{z}$ using a weight matrix $W$ and a bias vector $\\mathbf{b}$:\n", "\n", "$$\n", " \\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}\n", "$$\n", "\n", "This logit score has one element per class, so the weight matrix must have a size $(c, d)$, where $c$ is the number of classes (10) and $d$ is the number of dimensions of the input space (64). The bias vector has 10 elements (one per class).\n", "\n", "The logit score is turned into probabilities using the **softmax** operator:\n", "\n", "$$\n", " y_j = P(\\text{class = j}) = \\frac{\\exp(z_j)}{\\sum_k \\exp(z_k)}\n", "$$\n", "\n", "The following Python function allows to turn any vector $\\mathbf{z}$ (numpy array) into softmax probabilities:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def softmax(z):\n", " e = np.exp(z - z.max())\n", " return e/np.sum(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Experiment with the `softmax()` to understand its function. Pass it different numpy arrays (e.g. [-1, 0, 2]) and print or plot the corresponding probabilities. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The loss function to use is the **cross-entropy** or **negative log-likelihood**, defined for a single example as:\n", "\n", "$$\n", " \\mathcal{l}(W, \\mathbf{b}) = - \\mathbf{t} \\cdot \\log \\mathbf{y} = - \\log y_j \n", "$$\n", "\n", "where $\\mathbf{t}$ is a one-hot encoding of the class of the example and $j$ is the index of the corresponding class.\n", "\n", "After doing the derivations, we obtain the following learning rules for $W$ and $\\mathbf{b}$ to minimize the loss function:\n", "\n", "$$\n", " \\Delta W = \\eta \\, (\\mathbf{t} - \\mathbf{y}) \\, \\mathbf{x}^T\n", "$$\n", "\n", "$$\n", " \\Delta \\mathbf{b} = \\eta \\, (\\mathbf{t} - \\mathbf{y})\n", "$$\n", "\n", "Note that because $W$ is a $(c, d)$ matrix, $\\Delta W$ too. $(\\mathbf{t} - \\mathbf{y}) \\, \\mathbf{x}^T$ is therefore the **outer product** between the error vector $\\mathbf{t} - \\mathbf{y}$ ($c$ elements) and the input vector $\\mathbf{x}$ ($d$ elements)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "You will now modify your implementation of the online Perceptron algorithm from last week.\n", "\n", "Some things to keep in mind:\n", "\n", "* `W` must now be defined as a $(c, d)$ matrix (numpy array) and `b` as a vector with $c$ elements. Both can be initialized to 0.\n", "\n", "* When computing the logit score $\\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}$, remember that `W` is now a matrix, so its position will matter in the dot product `np.dot`.\n", "\n", "* Use the `softmax()` function defined above on the whole vector instead of `np.sign()` or `logistic` to get the prediction $\\mathbf{y}$.\n", "\n", "* For $\\Delta W$, you will need the **outer** product between the vectors $\\mathbf{t} - \\mathbf{y}_\\text{train}$ and $\\mathbf{x}_\\text{train}$. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "You will now modify your implementation of the online Perceptron algorithm from last week.\n", "\n", "Some things to keep in mind:\n", "\n", "* `W` must now be defined as a $(c, d)$ matrix (numpy array) and `b` as a vector with $c$ elements. Both can be initialized to 0.\n", "\n", "* When computing the logit score $\\mathbf{z} = W \\times \\mathbf{x} + \\mathbf{b}$, remember that `W` is now a matrix, so the order of the arguments to `np.dot()` matters.\n", "\n", "* Use the `softmax()` function defined above on the whole logit vector, instead of `np.sign()` or the logistic function, to get the prediction $\\mathbf{y}$.\n", "\n", "* For $\\Delta W$, you will need the **outer** product between the vectors $\\mathbf{t} - \\mathbf{y}$ and $\\mathbf{x}$. Check the doc for `np.outer()`.\n", "\n", "* The one-hot encoding of the class of the example $i$ is now a vector with 10 elements `t_train[i, :]`. You can get the index of the corresponding class by looking at the position of its maximum with `t_train[i, :].argmax()`.\n", "\n", "* Similarly, the class predicted by the model can be identified as the one with the maximum probability: `y.argmax()`.\n", "\n", "* Do not forget to record and plot the evolution of the training error and loss. Compute the test error and loss at the end of learning.\n", "\n", "* Pick a suitable learning rate and number of epochs.\n", "\n", "**Q:** Let's go." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What are the final training error and loss of the model? After how many epochs do you get a perfect classification? Why do they evolve like this?\n", "\n", "*Hint:* you may need to avoid plotting the error/loss during the first 20 epochs or so to observe the effect." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Compare the evolution of the training and test errors during training. What happens?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** The following cell collects up to 12 misclassified images from the test set and shows the predicted class together with the ground truth. What do you think?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Collect up to 12 misclassified test images, using the W and b learned above\n", "misclassified = []\n", "\n", "for i in range(len(X_test)):\n", "    pred = softmax(np.dot(W, X_test[i, :]) + b).argmax()\n", "    if pred != t_test[i, :].argmax():\n", "        misclassified.append([X_test[i, :].reshape((8, 8)), t_test[i, :].argmax(), pred])\n", "    if len(misclassified) == 12: break\n", "\n", "plt.figure(figsize=(16, 12))\n", "for i in range(12):\n", "    if i < len(misclassified):\n", "        img, label, pred = misclassified[i]\n", "        plt.subplot(3, 4, i+1)\n", "        plt.imshow(img, cmap=\"gray\")\n", "        plt.title(\"Label \" + str(label) + \" ; Prediction \" + str(pred))\n", "\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d" } } }, "nbformat": 4, "nbformat_minor": 4 }