{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MHvuWIUHdIpt" }, "source": [ "Created by Tias Guns , first part based on:\n", "\n", "#### Handwritten Digit Recognition\n", "- Author = Amitrajit Bose\n", "- Dataset = MNIST\n", "- [Medium Article Link](https://medium.com/@amitrajit_bose/handwritten-digit-mnist-pytorch-977b5338e627)\n", "- Frameworks = PyTorch\n", "\n", "The first 3 boxes are identical to the p0 notebook; then it starts..." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "oGjRmijsaXJ3" }, "source": [ "### Necessary Imports\n", "\n", "Recommended installation instructions:\n", "https://pytorch.org/get-started/locally\n", "\n", "This typically involves installing python3, python3-numpy, python3-matplotlib through an installer (anaconda) or system manager (apt), then installing torch and torchvision from python through conda or pip." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "TOyGrPT5ASDc" }, "outputs": [], "source": [ "# Import necessary packages\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n", "\n", "import os\n", "import numpy as np\n", "import torch\n", "import torchvision\n", "import matplotlib.pyplot as plt\n", "from time import time\n", "\n", "from cpmpy import *" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "uLdtrS4zaeEs" }, "source": [ "### Download The Dataset & Define The Transforms\n", "\n", "It is best to do this before the practical; + it allows you to test whether your setup works!\n", "\n", "It will download the files into the __same directory as where you stored this notebook__.\n", "\n", "It will take a minute or two. It will only download it once, so you can rerun this cell over and over." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 119 }, "colab_type": "code", "id": "sZD2NGz2Ak6w", "outputId": "74eec0da-d867-406b-be2c-4013a2162bf1" }, "outputs": [], "source": [ "from torchvision import datasets, transforms\n", "\n", "# Define a transform to normalize the data (effect: all values between -1 and 1)\n", "transform = transforms.Compose([transforms.ToTensor(),\n", " transforms.Normalize((0.5,), (0.5,))])\n", "\n", "# Download and load the training data\n", "trainset = datasets.MNIST('.', download=True, train=True, transform=transform)\n", "testset = datasets.MNIST('.', download=True, train=False, transform=transform)\n", "trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)\n", "testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True)\n", "print(\"Data loaded.\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "GcAfrn2falkK" }, "source": [ "### Showing a grid" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "colab_type": "code", "id": "F9CppCcqDLtB", "outputId": "0f59838b-90b6-4370-e23a-0e4a0ba90c2a", "scrolled": true }, "outputs": [], "source": [ "def show_grid_img(images):\n", " dim = 9\n", " figure = plt.figure()\n", " num_of_images = dim*dim\n", " for index in range(num_of_images):\n", " plt.subplot(dim, dim, index+1)\n", " plt.axis('off')\n", " plt.imshow(images[index].numpy().squeeze(), cmap='gray_r')\n", "\n", "dataiter = iter(trainloader)\n", "images, labels = dataiter.next()\n", "show_grid_img(images)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "lGyau0mOaP2m" }, "source": [ "### Defining The Neural Network\n", "\n", "We will now define the __architecture__ of the neural network.\n", "\n", "We define two architectures: a standard multi-layer perceptron network (the classical MLP) and a famous vision-specific architecture which is known to do well on MNIST: the LeNet architecture.\n", "\n", "Building an architecture is done _declaratively_ in pytorch, very elegant as you will see.\n", "\n", "### MLP architecture:\n", "(this is the standard fully connected used-since-the-80's network)\n", "\n", "The number of layers and its size are rather arbitrarily chosen." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "n-NR96UtFSkB" }, "source": [ "![](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/image/mlp_mnist.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 153 }, "colab_type": "code", "id": "3WJXInzQGcAy", "outputId": "be31aea8-542d-46ab-9816-85bc418664f9" }, "outputs": [], "source": [ "from torch import nn\n", "\n", "# Build a feed-forward network, declaratively\n", "def MLP():\n", " return nn.Sequential(nn.Flatten(), # Flatten MNIST images into a 784 long vector\n", " nn.Linear(28*28, 120),\n", " nn.ReLU(),\n", " nn.Linear(120, 84), # the image shows the case of (128, 64), see LeNet below for why I changed it\n", " nn.ReLU(),\n", " nn.Linear(84, 10),\n", " nn.LogSoftmax(dim=1))\n", "print(MLP())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LeNet architecture\n", "\n", "We now built a LeNet architecture, introduced by Yann LeCun in 1998. 
"We now build the LeNet architecture, introduced by Yann LeCun in 1998. It consists of a convolutional layer followed by max-pooling, again a convolutional layer followed by max-pooling, and then two fully connected layers followed by the LogSoftmax() output layer.\n", "\n", "\n", "![](https://i.ibb.co/4tBDWxx/lenet.png)\n", "\n", "The image above shows a graphical representation of the network and, for an example input, what the output of each layer could be.\n", "\n", "_Good to know:_ the convolutions are copied a number of times; each copy creates its own 'channel' (you can think of it as an extra dimension), hence the multiple versions. The weights, which are modified during training, are implicit in each layer in this syntax.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch.nn.functional as F\n", "\n", "class LeNet(nn.Module):\n", "    def __init__(self):\n", "        super(LeNet, self).__init__()\n", "        self.conv1 = nn.Conv2d(1, 6, 5, 1, padding=2)\n", "        self.conv2 = nn.Conv2d(6, 16, 5, 1)\n", "        self.fc1 = nn.Linear(5*5*16, 120)\n", "        self.fc2 = nn.Linear(120, 84)\n", "        self.fc3 = nn.Linear(84, 10)\n", "\n", "    def forward(self, x):\n", "        x = F.relu(self.conv1(x))\n", "        x = F.max_pool2d(x, 2, 2)\n", "        x = F.relu(self.conv2(x))\n", "        x = F.max_pool2d(x, 2, 2)\n", "        x = x.view(-1, 5*5*16)  # flatten the 16 5x5 feature maps\n", "        x = F.relu(self.fc1(x))\n", "        x = F.relu(self.fc2(x))\n", "        x = self.fc3(x)\n", "        return F.log_softmax(x, dim=1)\n", "\n", "print(LeNet())" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "wstRGu4FaJBe" }, "source": [ "### Core Training Of The Neural Network\n", "\n", "Training is done with stochastic gradient descent, with a given learning rate (lr) and momentum. Together with the number of epochs (the number of passes over the data), these are the _hyperparameters_." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 306 }, "colab_type": "code", "id": "XCsoAdjdLjPb", "outputId": "6e7a5f80-f945-4e5c-c538-8ef445b6ad3e" }, "outputs": [], "source": [ "from torch import optim\n", "\n", "def train_model(model, optimizer, epochs):\n", "    # the stochastic gradient descent loop\n", "    criterion = nn.NLLLoss() # negative log likelihood as loss\n", "    time0 = time()\n", "    print(\"Training starts, using\", len(trainset), \"training instances...\")\n", "    for e in range(epochs):\n", "        running_loss = 0\n", "        for images, labels in trainloader:\n", "\n", "            # Training pass\n", "            optimizer.zero_grad()\n", "\n", "            output = model(images)\n", "            loss = criterion(output, labels)\n", "\n", "            # This is where the model learns by backpropagating\n", "            loss.backward()\n", "\n", "            # And optimizes its weights here\n", "            optimizer.step()\n", "\n", "            running_loss += loss.item()\n", "        print(\"Epoch {} - Training loss: {}\".format(e, running_loss/len(trainloader)))\n", "    print(\"\\nTraining Time (in minutes) =\", (time()-time0)/60)\n", "    # model is now trained\n", "\n", "\n", "# hyperparameters\n", "model = LeNet()\n", "optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)\n", "epochs = 1\n", "\n", "train_model(model, optimizer, epochs)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visualising the output\n", "\n", "We use a helper function and then show the input image, its output probability distribution, and the maximum likelihood prediction."
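, "\n", "\n", "As a reminder of what the network produces, the short cell below is a sketch (added for illustration, using one batch from the test loader): the LogSoftmax layer outputs log-probabilities, and taking the exponential gives a probability distribution over the 10 digits that sums to 1." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: the network outputs log-probabilities; exp() turns them into a distribution\n", "images, labels = next(iter(testloader))\n", "with torch.no_grad():          # no gradients needed for inference\n", "    logps = model(images[:1])  # a 'batch' containing a single test image\n", "ps = torch.exp(logps)          # probabilities over the 10 digit classes\n", "print(\"probabilities:\", ps.numpy().round(3))\n", "print(\"sum of probabilities:\", float(ps.sum()))  # should be (approximately) 1.0"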
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 261 }, "colab_type": "code", "id": "Ie9Fffl_Mqp6", "outputId": "2e93c6ea-0534-498d-e072-904f62591dfe" }, "outputs": [], "source": [ "def view_classify(img, ps):\n", " ''' Function for viewing an image and it's predicted classes.\n", " '''\n", " ps = ps.data.numpy().squeeze()\n", "\n", " fig, (ax1, ax2) = plt.subplots(figsize=(6,9), ncols=2)\n", " ax1.imshow(img.numpy().squeeze(), cmap='gray_r');\n", " ax1.axis('off')\n", " ax2.barh(np.arange(10), ps)\n", " ax2.set_aspect(0.1)\n", " ax2.set_yticks(np.arange(10))\n", " ax2.set_yticklabels(np.arange(10))\n", " ax2.set_title('Class Probability')\n", " ax2.set_xlim(0, 1.1)\n", " plt.tight_layout()\n", "\n", "def show_one_prediction(images, labels, i=0):\n", " # Turn off gradients to speed up this part\n", " img = images[i].unsqueeze(1) # classifier optimized for batches, make it a batch of 1\n", " with torch.no_grad():\n", " logps = model(img)\n", "\n", " # Output of the network are log-probabilities, need to take exponential for probabilities\n", " ps = torch.exp(logps)\n", " probab = list(ps.numpy()[0])\n", " view_classify(img, ps)\n", " print(\"Predicted Digit =\", probab.index(max(probab)), \"\\tActual =\",int(labels[i]))\n", "\n", "images, labels = next(iter(testloader))\n", "show_one_prediction(images, labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may notice a wrong prediction and varying class probabilities. That's because the network has _high training error_, measured by the training loss.\n", "\n", "__Task for you__: Adjust the hyperparameters (and your patience) to train a better model!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# you can use this box to train alternative model(s) with other hyperparameters and inspect the result\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "wAEvDtiaM6RQ" }, "source": [ "### Model Evaluation\n", "\n", "Manually and visually inspecting a few predictions is good practice, but to know how good your model is you need a proper evaluation.\n", "\n", "The _loss_ computes the negative loglikelihood of the predicted _probabilities_. 
"The _loss_ computes the negative log-likelihood of the predicted _probabilities_. The code below computes the _accuracy_ of the actually predicted class (not its probability):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "id": "5sBPmaBONPkT", "outputId": "25b032aa-737f-49d8-d1b5-76f6b9e6b7b6" }, "outputs": [], "source": [ "def eval_model(model, testloader):\n", "    # returns (all_count, correct_count, avg_loss)\n", "    correct_count, all_count = 0, 0\n", "    avgloss = 0.0\n", "\n", "    criterion = nn.NLLLoss() # negative log likelihood as loss\n", "    with torch.no_grad(): # Turn off gradients to speed up this part\n", "        for images, labels in testloader:\n", "            logprobs = model(images)\n", "            avgloss += criterion(logprobs, labels)\n", "            preds = torch.argmax(logprobs, dim=1)\n", "            correct_count += int(torch.sum(preds == labels))\n", "            all_count += len(labels)\n", "    return (all_count, correct_count, avgloss/len(testloader))\n", "\n", "(all_count, correct_count, avgloss) = eval_model(model, testloader)\n", "print(\"Number Of Images Tested =\", all_count)\n", "print(\"\\tAvg test loss =\", float(avgloss))\n", "print(\"\\tTest accuracy =\", (correct_count/all_count)*100, \"%\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# another box for you to play with hyperparameters / evaluation" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visual sudoku time!\n", "\n", "We will use your trained classifier to predict the digits of a 'visual sudoku'. You will then use those predictions to solve the sudoku.\n", "\n", "For __solving the sudokus__, we will use the CP modeling environment CPMpy.\n", "\n", "First, data preparation:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a sudoku, from http://hakank.org/minizinc/sudoku_problems2/index.html\n", "\n", "sudoku_p0 = torch.IntTensor([[0,0,0, 2,0,5, 0,0,0],\n", "                             [0,9,0, 0,0,0, 7,3,0],\n", "                             [0,0,2, 0,0,9, 0,6,0],\n", "                             [2,0,0, 0,0,0, 4,0,9],\n", "                             [0,0,0, 0,7,0, 0,0,0],\n", "                             [6,0,9, 0,0,0, 0,0,1],\n", "                             [0,8,0, 4,0,0, 1,0,0],\n", "                             [0,6,3, 0,0,0, 0,8,0],\n", "                             [0,0,0, 6,0,8, 0,0,0]])\n", "sudoku_p0" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sample a dataset index with that value/label\n", "def sample_by_label(labels, value):\n", "    # primitive but it works...\n", "    idxs = torch.randperm(len(labels))\n", "    for idx in idxs:\n", "        if labels[idx] == value:\n", "            return idx\n", "\n", "# sample a dataset image for each non-zero number\n", "def sample_visual_sudoku(sudoku_p, loader):\n", "    for (images, labels) in loader: # sample one batch\n", "        nonzero = sudoku_p > 0\n", "        vizsudoku = torch.zeros((9,9,1,28,28), dtype=images.dtype)\n", "        idxs = torch.LongTensor([sample_by_label(labels, value) for value in sudoku_p[nonzero]])\n", "        vizsudoku[nonzero] = images[idxs]\n", "        return vizsudoku\n", "\n", "vizsudoku = sample_visual_sudoku(sudoku_p0, testloader)\n", "show_grid_img(vizsudoku.reshape(-1,28,28)) # list of 81 images" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "With the data prepared, we can now make the predictions!"
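, "\n", "\n", "Before predicting, a small sketch (for illustration only; the prediction function in the next cell recomputes this itself): given cells contain a normalized MNIST image, while empty cells are all-zero tensors, so summing the pixels tells the two apart." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: given cells hold an MNIST image, empty cells are all-zero tensors\n", "print(\"vizsudoku shape:\", tuple(vizsudoku.shape))  # (9, 9, 1, 28, 28)\n", "given = (vizsudoku.reshape(-1, 28, 28).sum(dim=(1, 2)) != 0).reshape(9, 9)\n", "print(\"number of given cells:\", int(given.sum()))\n", "print(\"same cells as sudoku_p0?\", torch.equal(given, sudoku_p0 > 0))"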
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def predict_sudoku(model, vizsudoku):\n", " nonzero = (vizsudoku.reshape([-1,28,28]).sum(dim=(1,2)) != 0).reshape(9,9)\n", " predsudoku = torch.zeros((9,9), dtype=torch.long)\n", " with torch.no_grad():\n", " images = vizsudoku[nonzero]\n", " logprobs = model(images)\n", " preds = torch.argmax(logprobs, dim=1)\n", " predsudoku[nonzero] = preds\n", " return predsudoku\n", "\n", "preds = predict_sudoku(model, vizsudoku)\n", "preds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see if we made errors..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def vizsudoku_errors(sudoku_p, preds):\n", " has_error = (sudoku_p != preds)\n", " return (torch.sum(has_error).numpy(), sudoku_p*has_error, preds*has_error)\n", "\n", "(cnt, e_real, e_pred) = vizsudoku_errors(sudoku_p0, preds)\n", "print(\"Nr errors:\", int(cnt), \" out of \", int(torch.sum(sudoku_p0 > 0)))\n", "print(\"Labels that were wrongly predicted:\")\n", "e_real" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solving time\n", "\n", "__Task: up to you to compute the sudoku solution given the predictions!__\n", "\n", "Here is an example sudoku model that you can work on:\n", "https://github.com/CPMpy/cpmpy/blob/master/examples/sudoku.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def solve_sudoku(grid, empty_val=0):\n", " \"\"\"\n", " Solve sudoku with given grid 'grid', where 'empty_val' is used to denote empty cells.\n", " returns (Bool, Array) with:\n", " - Bool: whether a satisfying solution was found\n", " - Array: if Bool = True, the filled-in grid.\n", " \"\"\"\n", " return (False, None)\n", "\n", "# try solving the 'true' sudoku, which has a unique solution\n", "solve_sudoku(sudoku_p0.numpy())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# now let's try solving the predicted sudoku:\n", "solve_sudoku(preds.numpy())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing\n", "\n", "Let's see how it behaves on different sets of images (for the same given sudoku)\n", "\n", "__Task: Try sampling 10 different image-sets (using `sample_visual_sudoku`) and compute there prediction accuracy as well as whether the predictions could be solved__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(10):\n", " acc = 0.0\n", " solvable = False\n", " print(f\"Prediction accuracy [{i}]: {acc}, solvable: {solvable}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Saving classifier\n", "\n", "You might want to save your classifier for later, by running the following line: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "torch.save(model.state_dict(), \"myNet\")" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "handwritten_digit_recognition_CPU.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }