{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST with SciKit-Learn and skorch\n", "\n", "This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with SciKit-Learn." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import fetch_mldata\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Data\n", "Using SciKit-Learn's ```fetch_mldata``` to load the MNIST data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "mnist = fetch_mldata('MNIST original', data_home='../datasets/')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'DESCR': 'mldata.org dataset: mnist-original',\n", " 'COL_NAMES': ['label', 'data'],\n", " 'target': array([0., 0., 0., ..., 9., 9., 9.]),\n", " 'data': array([[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ...,\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 784)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist.data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing Data\n", "\n", "Each image of the MNIST dataset is encoded as a 784-dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to the grey value of that pixel.\n",
"The ```fetch_mldata``` method used above to load MNIST returns ```data``` and ```target``` as ```uint8```, which we convert to ```float32``` and ```int64``` respectively." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "X = mnist.data.astype('float32')\n", "y = mnist.target.astype('int64')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will use ReLU as the activation in combination with softmax over the output layer, we need to scale `X` down. An often-used range is [0, 1]." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "X /= 255.0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 1.0)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.min(), X.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the data is scaled but not normalized." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "assert(X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 784), (52500,))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.shape, y_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build Neural Network with PyTorch\n", "A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer 98 (= 784 / 8) and the output layer 10 neurons, representing the digits 0 - 9." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "torch.manual_seed(0);\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "mnist_dim = X.shape[1]\n", "hidden_dim = int(mnist_dim/8)\n", "output_dim = len(np.unique(mnist.target))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(784, 98, 10)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist_dim, hidden_dim, output_dim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A neural network defined as a PyTorch ```nn.Module```." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "class ClassifierModule(nn.Module):\n", "    def __init__(\n", "            self,\n", "            input_dim=mnist_dim,\n", "            hidden_dim=hidden_dim,\n", "            output_dim=output_dim,\n", "            dropout=0.5,\n", "    ):\n", "        super(ClassifierModule, self).__init__()\n", "        self.dropout = nn.Dropout(dropout)\n", "\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        X = F.relu(self.hidden(X))\n", "        X = self.dropout(X)\n", "        X = F.softmax(self.output(X), dim=-1)\n", "        return X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "skorch allows us to use PyTorch's networks in the SciKit-Learn setting."
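, "\n", "\n", "For illustration, this means the ```net``` defined in the following cells can be plugged into standard SciKit-Learn utilities such as ```GridSearchCV```. The snippet below is a minimal, non-executed sketch; the searched values for ```lr``` and ```max_epochs``` are arbitrary examples:\n", "\n", "```python\n", "from sklearn.model_selection import GridSearchCV\n", "\n", "# Sketch only: grid-search over skorch hyperparameters, assuming `net`,\n", "# `X_train` and `y_train` are defined as in the surrounding cells.\n", "params = {\n", "    'lr': [0.05, 0.1],\n", "    'max_epochs': [10, 20],\n", "}\n", "gs = GridSearchCV(net, params, cv=3, scoring='accuracy', refit=False)\n", "gs.fit(X_train, y_train)\n", "print(gs.best_params_)\n", "```"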
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from skorch import NeuralNetClassifier" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "net = NeuralNetClassifier(\n", "    ClassifierModule,\n", "    max_epochs=20,\n", "    lr=0.1,\n", "    device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ ------\n", " 1 \u001b[36m0.8284\u001b[0m \u001b[32m0.9010\u001b[0m \u001b[35m0.3771\u001b[0m 5.4626\n", " 2 \u001b[36m0.4376\u001b[0m \u001b[32m0.9199\u001b[0m \u001b[35m0.2879\u001b[0m 5.6434\n", " 3 \u001b[36m0.3664\u001b[0m \u001b[32m0.9308\u001b[0m \u001b[35m0.2458\u001b[0m 5.1058\n", " 4 \u001b[36m0.3239\u001b[0m \u001b[32m0.9385\u001b[0m \u001b[35m0.2150\u001b[0m 4.9935\n", " 5 \u001b[36m0.2971\u001b[0m \u001b[32m0.9448\u001b[0m \u001b[35m0.1947\u001b[0m 6.2501\n", " 6 \u001b[36m0.2755\u001b[0m \u001b[32m0.9474\u001b[0m \u001b[35m0.1823\u001b[0m 5.3603\n", " 7 \u001b[36m0.2643\u001b[0m \u001b[32m0.9514\u001b[0m \u001b[35m0.1712\u001b[0m 5.3241\n", " 8 \u001b[36m0.2443\u001b[0m \u001b[32m0.9541\u001b[0m \u001b[35m0.1585\u001b[0m 5.5232\n", " 9 \u001b[36m0.2346\u001b[0m \u001b[32m0.9557\u001b[0m \u001b[35m0.1500\u001b[0m 4.8883\n", " 10 \u001b[36m0.2257\u001b[0m \u001b[32m0.9577\u001b[0m \u001b[35m0.1447\u001b[0m 4.6561\n", " 11 \u001b[36m0.2165\u001b[0m \u001b[32m0.9594\u001b[0m \u001b[35m0.1394\u001b[0m 6.2466\n", " 12 \u001b[36m0.2093\u001b[0m \u001b[32m0.9600\u001b[0m \u001b[35m0.1338\u001b[0m 5.2418\n", " 13 \u001b[36m0.2045\u001b[0m \u001b[32m0.9610\u001b[0m \u001b[35m0.1297\u001b[0m 5.6362\n", " 14 \u001b[36m0.1969\u001b[0m \u001b[32m0.9620\u001b[0m \u001b[35m0.1263\u001b[0m 5.5742\n", " 15 \u001b[36m0.1931\u001b[0m \u001b[32m0.9629\u001b[0m \u001b[35m0.1223\u001b[0m 5.1409\n", " 16 \u001b[36m0.1893\u001b[0m \u001b[32m0.9647\u001b[0m \u001b[35m0.1191\u001b[0m 6.2617\n", " 17 \u001b[36m0.1849\u001b[0m \u001b[32m0.9651\u001b[0m \u001b[35m0.1185\u001b[0m 6.5456\n", " 18 \u001b[36m0.1803\u001b[0m \u001b[32m0.9657\u001b[0m \u001b[35m0.1155\u001b[0m 6.8025\n", " 19 \u001b[36m0.1765\u001b[0m \u001b[32m0.9665\u001b[0m \u001b[35m0.1136\u001b[0m 5.0039\n", " 20 \u001b[36m0.1721\u001b[0m \u001b[32m0.9667\u001b[0m \u001b[35m0.1103\u001b[0m 5.0675\n" ] } ], "source": [ "net.fit(X_train, y_train);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "predicted = net.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9653142857142857" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(predicted == y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of nearly 96% for a network with only one hidden layer is not too bad." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Network\n", "PyTorch expects a 4-dimensional tensor as input to its 2D convolution layers. The dimensions represent:\n", "* Batch size\n", "* Number of channels\n", "* Height\n", "* Width\n", "\n", "For the first dimension (the batch size), the total number of examples is provided when reshaping the whole dataset. MNIST data has only one channel. As stated above, each MNIST vector represents a 28x28 pixel image. Hence, the resulting shape of the PyTorch tensor needs to be (x, 1, 28, 28)."
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "XCnn = X.reshape(-1, 1, 28, 28)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 1, 28, 28)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn.shape" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 1, 28, 28), (52500,))" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn_train.shape, y_train.shape" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "class Cnn(nn.Module):\n", "    def __init__(self):\n", "        super(Cnn, self).__init__()\n", "        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)\n", "        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)\n", "        self.conv2_drop = nn.Dropout2d()\n", "        self.fc1 = nn.Linear(1600, 128)  # 1600 = 64 channels * 5 * 5 spatial size after the two conv/pool blocks\n", "        self.fc2 = nn.Linear(128, 10)\n", "\n", "    def forward(self, x):\n", "        x = F.relu(F.max_pool2d(self.conv1(x), 2))\n", "        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n", "        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))  # flatten over channel, height and width = 1600\n", "        x = F.relu(self.fc1(x))\n", "        x = F.dropout(x, training=self.training)\n", "        x = self.fc2(x)\n", "        x = F.softmax(x, dim=-1)\n", "        return x" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cnn = NeuralNetClassifier(\n", "    Cnn,\n", "    max_epochs=15,\n", "    lr=1,\n", "    optimizer=torch.optim.Adadelta,\n", "    device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ ------\n", " 1 \u001b[36m0.4692\u001b[0m \u001b[32m0.9730\u001b[0m \u001b[35m0.0873\u001b[0m 7.4336\n", " 2 \u001b[36m0.1503\u001b[0m \u001b[32m0.9818\u001b[0m \u001b[35m0.0601\u001b[0m 6.6657\n", " 3 \u001b[36m0.1177\u001b[0m \u001b[32m0.9834\u001b[0m \u001b[35m0.0525\u001b[0m 6.5910\n", " 4 \u001b[36m0.1037\u001b[0m \u001b[32m0.9846\u001b[0m \u001b[35m0.0476\u001b[0m 7.9510\n", " 5 \u001b[36m0.0889\u001b[0m \u001b[32m0.9847\u001b[0m \u001b[35m0.0446\u001b[0m 6.5556\n", " 6 \u001b[36m0.0808\u001b[0m \u001b[32m0.9873\u001b[0m \u001b[35m0.0407\u001b[0m 6.4084\n", " 7 \u001b[36m0.0724\u001b[0m \u001b[32m0.9878\u001b[0m \u001b[35m0.0384\u001b[0m 6.1549\n", " 8 \u001b[36m0.0680\u001b[0m 0.9875 \u001b[35m0.0379\u001b[0m 5.7811\n", " 9 \u001b[36m0.0646\u001b[0m \u001b[32m0.9885\u001b[0m \u001b[35m0.0376\u001b[0m 6.2944\n", " 10 \u001b[36m0.0582\u001b[0m 0.9883 \u001b[35m0.0370\u001b[0m 5.6687\n", " 11 \u001b[36m0.0578\u001b[0m 0.9879 \u001b[35m0.0350\u001b[0m 5.7188\n", " 12 \u001b[36m0.0542\u001b[0m 0.9879 0.0380 6.4705\n", " 13 \u001b[36m0.0523\u001b[0m \u001b[32m0.9904\u001b[0m \u001b[35m0.0326\u001b[0m 6.1535\n", " 14 \u001b[36m0.0493\u001b[0m 0.9884 0.0343 6.5948\n", " 15 0.0498 0.9900 \u001b[35m0.0316\u001b[0m 5.5662\n" ] } ], "source": [ "cnn.fit(XCnn_train, y_train);" ] }, { "cell_type": "code", "execution_count": 28, "metadata":
{}, "outputs": [], "source": [ "cnn_pred = cnn.predict(XCnn_test)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9912571428571428" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(cnn_pred == y_test)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "An accuracy of 99.1% should suffice for this example!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }