{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST with SciKit-Learn and skorch\n", "\n", "This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with SciKit-Learn.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: If you are running this in [a Colab notebook](https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/MNIST.ipynb), we recommend you enable a free GPU by navigating to:\n", "\n", "> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**\n", "\n", "If you are running in Colab, you should install the dependencies by running the following cell:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "! [ ! -z \"$COLAB_GPU\" ] && pip install torch scikit-learn==0.20.* skorch" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import fetch_openml\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Data\n", "Using SciKit-Learn's ```fetch_openml``` to load the MNIST data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "mnist = fetch_openml('mnist_784', cache=False)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 784)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist.data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing Data\n", "\n", "Each image of the MNIST dataset is encoded as a 784-dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to its grey value.
\n", "The above ```fetch_openml``` call returns ```data``` and ```target``` as ```uint8```, which we convert to ```float32``` and ```int64``` respectively." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "X = mnist.data.astype('float32')\n", "y = mnist.target.astype('int64')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will use ReLU as the activation function in combination with softmax over the output layer, we need to scale `X` down. A commonly used range is [0, 1]." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "X /= 255.0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 1.0)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.min(), X.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the data is only scaled to [0, 1]; it is not normalized to zero mean and unit variance." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "assert(X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 784), (52500,))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.shape, y_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build Neural Network with PyTorch\n", "A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer 98 neurons (= 784 / 8) and the output layer 10 neurons, representing the digits 0 - 9."
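, "\n", "\n", "As a quick sanity check (an illustrative aside, not part of the original notebook), the number of trainable parameters of this architecture can be computed by hand, since each fully connected layer stores a weight matrix plus a bias vector:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# weights + biases of the two fully connected layers\n", "hidden_params = 784 * 98 + 98   # input -> hidden\n", "output_params = 98 * 10 + 10    # hidden -> output\n", "hidden_params + output_params   # 77920 trainable parameters in total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fewer than 80,000 parameters is a small model, which is consistent with the short epoch times in the training log below."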
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "torch.manual_seed(0);\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "mnist_dim = X.shape[1]\n", "hidden_dim = int(mnist_dim/8)\n", "output_dim = len(np.unique(mnist.target))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(784, 98, 10)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist_dim, hidden_dim, output_dim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The neural network, defined as a standard PyTorch module:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "class ClassifierModule(nn.Module):\n", "    def __init__(\n", "            self,\n", "            input_dim=mnist_dim,\n", "            hidden_dim=hidden_dim,\n", "            output_dim=output_dim,\n", "            dropout=0.5,\n", "    ):\n", "        super(ClassifierModule, self).__init__()\n", "        self.dropout = nn.Dropout(dropout)\n", "\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        X = F.relu(self.hidden(X))\n", "        X = self.dropout(X)\n", "        X = F.softmax(self.output(X), dim=-1)\n", "        return X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "skorch allows us to use PyTorch's networks in the SciKit-Learn setting."
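, "\n", "\n", "For instance, because ```NeuralNetClassifier``` follows the SciKit-Learn estimator API, it can be dropped into tools such as ```Pipeline``` or ```GridSearchCV```. The following cell is a small self-contained sketch added for illustration; ```TinyClassifier``` is a hypothetical stand-in with the same layout as ```ClassifierModule``` above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "import torch.nn.functional as F\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import MinMaxScaler\n", "from skorch import NeuralNetClassifier\n", "\n", "class TinyClassifier(nn.Module):\n", "    # illustrative stand-in for ClassifierModule\n", "    def __init__(self, input_dim=784, hidden_dim=98, output_dim=10):\n", "        super(TinyClassifier, self).__init__()\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        return F.softmax(self.output(F.relu(self.hidden(X))), dim=-1)\n", "\n", "pipe = Pipeline([\n", "    ('scale', MinMaxScaler()),\n", "    ('net', NeuralNetClassifier(TinyClassifier, max_epochs=5, lr=0.1)),\n", "])\n", "\n", "# hyperparameters are reachable via the usual double-underscore syntax,\n", "# e.g. GridSearchCV(pipe, {'net__lr': [0.05, 0.1, 0.2]})\n", "pipe.get_params()['net__lr']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The PyTorch module itself stays untouched; only the wrapper makes it compatible with the SciKit-Learn tooling."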
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from skorch import NeuralNetClassifier" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "net = NeuralNetClassifier(\n", " ClassifierModule,\n", " max_epochs=20,\n", " lr=0.1,\n", " device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ ------\n", " 1 \u001b[36m0.8321\u001b[0m \u001b[32m0.8828\u001b[0m \u001b[35m0.4077\u001b[0m 0.7626\n", " 2 \u001b[36m0.4306\u001b[0m \u001b[32m0.9110\u001b[0m \u001b[35m0.3121\u001b[0m 0.4984\n", " 3 \u001b[36m0.3623\u001b[0m \u001b[32m0.9221\u001b[0m \u001b[35m0.2649\u001b[0m 0.5147\n", " 4 \u001b[36m0.3241\u001b[0m \u001b[32m0.9298\u001b[0m \u001b[35m0.2457\u001b[0m 0.5040\n", " 5 \u001b[36m0.2942\u001b[0m \u001b[32m0.9373\u001b[0m \u001b[35m0.2129\u001b[0m 0.5629\n", " 6 \u001b[36m0.2707\u001b[0m \u001b[32m0.9411\u001b[0m \u001b[35m0.1974\u001b[0m 0.5093\n", " 7 \u001b[36m0.2554\u001b[0m \u001b[32m0.9439\u001b[0m \u001b[35m0.1836\u001b[0m 0.5055\n", " 8 \u001b[36m0.2487\u001b[0m \u001b[32m0.9480\u001b[0m \u001b[35m0.1754\u001b[0m 0.5102\n", " 9 \u001b[36m0.2276\u001b[0m 0.9473 \u001b[35m0.1730\u001b[0m 0.5055\n", " 10 \u001b[36m0.2229\u001b[0m \u001b[32m0.9524\u001b[0m \u001b[35m0.1612\u001b[0m 0.4966\n", " 11 \u001b[36m0.2158\u001b[0m 0.9511 \u001b[35m0.1600\u001b[0m 0.5048\n", " 12 \u001b[36m0.2059\u001b[0m \u001b[32m0.9556\u001b[0m \u001b[35m0.1501\u001b[0m 0.4979\n", " 13 \u001b[36m0.1988\u001b[0m \u001b[32m0.9572\u001b[0m \u001b[35m0.1429\u001b[0m 0.4973\n", " 14 \u001b[36m0.1934\u001b[0m 0.9563 0.1460 0.4981\n", " 15 \u001b[36m0.1915\u001b[0m \u001b[32m0.9595\u001b[0m \u001b[35m0.1355\u001b[0m 0.5030\n", " 16 \u001b[36m0.1881\u001b[0m \u001b[32m0.9607\u001b[0m \u001b[35m0.1325\u001b[0m 
0.5013\n", "     17        \u001b[36m0.1816\u001b[0m       0.9602        \u001b[35m0.1302\u001b[0m  0.5003\n", "     18        \u001b[36m0.1796\u001b[0m       0.9601        \u001b[35m0.1285\u001b[0m  0.4977\n", "     19        \u001b[36m0.1767\u001b[0m       \u001b[32m0.9624\u001b[0m        \u001b[35m0.1248\u001b[0m  0.5056\n", "     20        \u001b[36m0.1716\u001b[0m       \u001b[32m0.9628\u001b[0m        \u001b[35m0.1236\u001b[0m  0.5080\n" ] } ], "source": [ "net.fit(X_train, y_train);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "predicted = net.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.962" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(predicted == y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of nearly 96% for a network with only one hidden layer is not too bad." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Network\n", "PyTorch expects a 4-dimensional tensor as input for its 2D convolution layers. The dimensions represent:\n", "* Batch size\n", "* Number of channels\n", "* Height\n", "* Width\n", "\n", "The batch dimension holds the number of examples. MNIST data has only one channel. As stated above, each MNIST vector represents a 28x28 pixel image. Hence, the resulting shape of the PyTorch tensor needs to be (x, 1, 28, 28).
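\n", "\n", "As an aside (a worked example added for illustration): the ```1600``` input features of the first fully connected layer in the CNN below follow from these shapes. Each 3x3 convolution shrinks height and width by 2, and each 2x2 max-pool halves them, rounding down:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "size = 28\n", "size = (size - 2) // 2   # conv1 (3x3) + max-pool: 28 -> 26 -> 13\n", "size = (size - 2) // 2   # conv2 (3x3) + max-pool: 13 -> 11 -> 5\n", "64 * size * size         # 64 channels * 5 * 5 = 1600" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that, we reshape the data: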
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "XCnn = X.reshape(-1, 1, 28, 28)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 1, 28, 28)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn.shape" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 1, 28, 28), (52500,))" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn_train.shape, y_train.shape" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "class Cnn(nn.Module):\n", " def __init__(self):\n", " super(Cnn, self).__init__()\n", " self.conv1 = nn.Conv2d(1, 32, kernel_size=3)\n", " self.conv2 = nn.Conv2d(32, 64, kernel_size=3)\n", " self.conv2_drop = nn.Dropout2d()\n", " self.fc1 = nn.Linear(1600, 128) # 1600 = number channels * width * height\n", " self.fc2 = nn.Linear(128, 10)\n", "\n", " def forward(self, x):\n", " x = F.relu(F.max_pool2d(self.conv1(x), 2))\n", " x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n", " x = x.view(-1, x.size(1) * x.size(2) * x.size(3)) # flatten over channel, height and width = 1600\n", " x = F.relu(self.fc1(x))\n", " x = F.dropout(x, training=self.training)\n", " x = self.fc2(x)\n", " x = F.softmax(x, dim=-1)\n", " return x" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cnn = NeuralNetClassifier(\n", " Cnn,\n", " max_epochs=15,\n", " lr=1,\n", " optimizer=torch.optim.Adadelta,\n", " device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "  epoch    train_loss    valid_acc    valid_loss     dur\n", "-------  ------------  -----------  ------------  ------\n", "      1        \u001b[36m0.4136\u001b[0m       \u001b[32m0.9711\u001b[0m        \u001b[35m0.0949\u001b[0m  1.7914\n", "      2        \u001b[36m0.1402\u001b[0m       \u001b[32m0.9798\u001b[0m        \u001b[35m0.0636\u001b[0m  1.0294\n", "      3        \u001b[36m0.1129\u001b[0m       \u001b[32m0.9811\u001b[0m        \u001b[35m0.0628\u001b[0m  1.0192\n", "      4        \u001b[36m0.0961\u001b[0m       \u001b[32m0.9851\u001b[0m        \u001b[35m0.0482\u001b[0m  1.0338\n", "      5        \u001b[36m0.0847\u001b[0m       0.9846        0.0517  1.0152\n", "      6        \u001b[36m0.0772\u001b[0m       \u001b[32m0.9864\u001b[0m        \u001b[35m0.0446\u001b[0m  1.0351\n", "      7        \u001b[36m0.0669\u001b[0m       \u001b[32m0.9871\u001b[0m        \u001b[35m0.0442\u001b[0m  1.0360\n", "      8        \u001b[36m0.0638\u001b[0m       0.9871        \u001b[35m0.0426\u001b[0m  1.0318\n", "      9        \u001b[36m0.0612\u001b[0m       \u001b[32m0.9886\u001b[0m        \u001b[35m0.0394\u001b[0m  1.0215\n", "     10        \u001b[36m0.0582\u001b[0m       0.9882        0.0410  1.0182\n", "     11        \u001b[36m0.0541\u001b[0m       \u001b[32m0.9887\u001b[0m        \u001b[35m0.0367\u001b[0m  1.0259\n", "     12        \u001b[36m0.0513\u001b[0m       \u001b[32m0.9894\u001b[0m        0.0378  1.0252\n", "     13        \u001b[36m0.0481\u001b[0m       \u001b[32m0.9898\u001b[0m        \u001b[35m0.0360\u001b[0m  1.0383\n", "     14        \u001b[36m0.0478\u001b[0m       0.9898        0.0362  1.0299\n", "     15        \u001b[36m0.0466\u001b[0m       \u001b[32m0.9902\u001b[0m        \u001b[35m0.0352\u001b[0m  1.0203\n" ] } ], "source": [ "cnn.fit(XCnn_train, y_train);" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "cnn_pred = cnn.predict(XCnn_test)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9891428571428571" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(cnn_pred == y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of almost 99% should suffice for this example!"
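 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since ```predict``` returns plain NumPy arrays, any SciKit-Learn metric can be applied directly. A small self-contained sketch (with toy label arrays so it runs standalone; in the notebook you would pass ```y_test``` and ```cnn_pred```):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import accuracy_score, confusion_matrix\n", "import numpy as np\n", "\n", "y_true = np.array([0, 1, 2, 2, 1])\n", "y_pred = np.array([0, 1, 2, 1, 1])\n", "\n", "# confusion_matrix(y_true, y_pred) would give the per-class breakdown\n", "accuracy_score(y_true, y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same holds for ```classification_report``` or any other metric that consumes label arrays."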
] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }