{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST with SciKit-Learn and skorch\n", "\n", "This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with SciKit-Learn.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: If you are running this in [a Colab notebook](https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/MNIST.ipynb), we recommend you enable a free GPU by navigating to:\n", "\n", "> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**\n", "\n", "If you are running in Colab, you should install the dependencies by running the following cell:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "! [ ! -z \"$COLAB_GPU\" ] && pip install torch scikit-learn==0.20.* skorch" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import fetch_openml\n", "from sklearn.model_selection import train_test_split\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Data\n", "Using SciKit-Learn's ```fetch_openml``` to load the MNIST data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "mnist = fetch_openml('mnist_784', cache=False)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 784)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist.data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing Data\n", "\n", "Each image of the MNIST dataset is encoded as a 784-dimensional vector, representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to its grey value.
\n", "The above ```fetch_openml``` call returns ```data``` and ```target``` as ```uint8```, which we convert to ```float32``` and ```int64``` respectively." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "X = mnist.data.astype('float32')\n", "y = mnist.target.astype('int64')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will use ReLU as the activation function in combination with softmax over the output layer, we need to scale `X` down. A commonly used range is [0, 1]." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "X /= 255.0" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 1.0)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.min(), X.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the data is only scaled to [0, 1]; it is not normalized to zero mean and unit variance." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "assert(X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 784), (52500,))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train.shape, y_train.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build Neural Network with PyTorch\n", "A simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer 98 neurons (= 784 / 8) and the output layer 10 neurons, representing the digits 0 - 9."
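, "\n", "\n", "As a quick sanity check (an illustrative aside, not part of the original notebook), the number of trainable parameters of this architecture can be computed by hand, since each fully connected layer stores a weight matrix plus a bias vector:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# weights + biases of the two fully connected layers\n", "hidden_params = 784 * 98 + 98   # input -> hidden\n", "output_params = 98 * 10 + 10    # hidden -> output\n", "hidden_params + output_params   # 77920 trainable parameters in total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fewer than 80,000 parameters is a small model, which is consistent with the short epoch times in the training log below."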
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "torch.manual_seed(0);\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "mnist_dim = X.shape[1]\n", "hidden_dim = int(mnist_dim/8)\n", "output_dim = len(np.unique(mnist.target))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(784, 98, 10)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mnist_dim, hidden_dim, output_dim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The neural network, defined as a standard PyTorch module:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "class ClassifierModule(nn.Module):\n", "    def __init__(\n", "            self,\n", "            input_dim=mnist_dim,\n", "            hidden_dim=hidden_dim,\n", "            output_dim=output_dim,\n", "            dropout=0.5,\n", "    ):\n", "        super(ClassifierModule, self).__init__()\n", "        self.dropout = nn.Dropout(dropout)\n", "\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        X = F.relu(self.hidden(X))\n", "        X = self.dropout(X)\n", "        X = F.softmax(self.output(X), dim=-1)\n", "        return X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "skorch allows us to use PyTorch's networks in the SciKit-Learn setting."
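, "\n", "\n", "For instance, because ```NeuralNetClassifier``` follows the SciKit-Learn estimator API, it can be dropped into tools such as ```Pipeline``` or ```GridSearchCV```. The following cell is a small self-contained sketch added for illustration; ```TinyClassifier``` is a hypothetical stand-in with the same layout as ```ClassifierModule``` above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch import nn\n", "import torch.nn.functional as F\n", "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import MinMaxScaler\n", "from skorch import NeuralNetClassifier\n", "\n", "class TinyClassifier(nn.Module):\n", "    # illustrative stand-in for ClassifierModule\n", "    def __init__(self, input_dim=784, hidden_dim=98, output_dim=10):\n", "        super(TinyClassifier, self).__init__()\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        return F.softmax(self.output(F.relu(self.hidden(X))), dim=-1)\n", "\n", "pipe = Pipeline([\n", "    ('scale', MinMaxScaler()),\n", "    ('net', NeuralNetClassifier(TinyClassifier, max_epochs=5, lr=0.1)),\n", "])\n", "\n", "# hyperparameters are reachable via the usual double-underscore syntax,\n", "# e.g. GridSearchCV(pipe, {'net__lr': [0.05, 0.1, 0.2]})\n", "pipe.get_params()['net__lr']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The PyTorch module itself stays untouched; only the wrapper makes it compatible with the SciKit-Learn tooling."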
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from skorch import NeuralNetClassifier" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "net = NeuralNetClassifier(\n", " ClassifierModule,\n", " max_epochs=20,\n", " lr=0.1,\n", " device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ ------\n", " 1 \u001b[36m0.8321\u001b[0m \u001b[32m0.8828\u001b[0m \u001b[35m0.4077\u001b[0m 0.7626\n", " 2 \u001b[36m0.4306\u001b[0m \u001b[32m0.9110\u001b[0m \u001b[35m0.3121\u001b[0m 0.4984\n", " 3 \u001b[36m0.3623\u001b[0m \u001b[32m0.9221\u001b[0m \u001b[35m0.2649\u001b[0m 0.5147\n", " 4 \u001b[36m0.3241\u001b[0m \u001b[32m0.9298\u001b[0m \u001b[35m0.2457\u001b[0m 0.5040\n", " 5 \u001b[36m0.2942\u001b[0m \u001b[32m0.9373\u001b[0m \u001b[35m0.2129\u001b[0m 0.5629\n", " 6 \u001b[36m0.2707\u001b[0m \u001b[32m0.9411\u001b[0m \u001b[35m0.1974\u001b[0m 0.5093\n", " 7 \u001b[36m0.2554\u001b[0m \u001b[32m0.9439\u001b[0m \u001b[35m0.1836\u001b[0m 0.5055\n", " 8 \u001b[36m0.2487\u001b[0m \u001b[32m0.9480\u001b[0m \u001b[35m0.1754\u001b[0m 0.5102\n", " 9 \u001b[36m0.2276\u001b[0m 0.9473 \u001b[35m0.1730\u001b[0m 0.5055\n", " 10 \u001b[36m0.2229\u001b[0m \u001b[32m0.9524\u001b[0m \u001b[35m0.1612\u001b[0m 0.4966\n", " 11 \u001b[36m0.2158\u001b[0m 0.9511 \u001b[35m0.1600\u001b[0m 0.5048\n", " 12 \u001b[36m0.2059\u001b[0m \u001b[32m0.9556\u001b[0m \u001b[35m0.1501\u001b[0m 0.4979\n", " 13 \u001b[36m0.1988\u001b[0m \u001b[32m0.9572\u001b[0m \u001b[35m0.1429\u001b[0m 0.4973\n", " 14 \u001b[36m0.1934\u001b[0m 0.9563 0.1460 0.4981\n", " 15 \u001b[36m0.1915\u001b[0m \u001b[32m0.9595\u001b[0m \u001b[35m0.1355\u001b[0m 0.5030\n", " 16 \u001b[36m0.1881\u001b[0m \u001b[32m0.9607\u001b[0m \u001b[35m0.1325\u001b[0m 
0.5013\n", "     17        \u001b[36m0.1816\u001b[0m       0.9602        \u001b[35m0.1302\u001b[0m  0.5003\n", "     18        \u001b[36m0.1796\u001b[0m       0.9601        \u001b[35m0.1285\u001b[0m  0.4977\n", "     19        \u001b[36m0.1767\u001b[0m       \u001b[32m0.9624\u001b[0m        \u001b[35m0.1248\u001b[0m  0.5056\n", "     20        \u001b[36m0.1716\u001b[0m       \u001b[32m0.9628\u001b[0m        \u001b[35m0.1236\u001b[0m  0.5080\n" ] } ], "source": [ "net.fit(X_train, y_train);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "predicted = net.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.962" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(predicted == y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of nearly 96% for a network with only one hidden layer is not too bad." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Network\n", "PyTorch expects a 4-dimensional tensor as input for its 2D convolution layers. The dimensions represent:\n", "* Batch size\n", "* Number of channels\n", "* Height\n", "* Width\n", "\n", "The batch dimension holds the number of examples. MNIST data has only one channel. As stated above, each MNIST vector represents a 28x28 pixel image. Hence, the resulting shape of the PyTorch tensor needs to be (x, 1, 28, 28).
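\n", "\n", "As an aside (a worked example added for illustration): the ```1600``` input features of the first fully connected layer in the CNN below follow from these shapes. Each 3x3 convolution shrinks height and width by 2, and each 2x2 max-pool halves them, rounding down:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "size = 28\n", "size = (size - 2) // 2   # conv1 (3x3) + max-pool: 28 -> 26 -> 13\n", "size = (size - 2) // 2   # conv2 (3x3) + max-pool: 13 -> 11 -> 5\n", "64 * size * size         # 64 channels * 5 * 5 = 1600" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that, we reshape the data: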
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "XCnn = X.reshape(-1, 1, 28, 28)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(70000, 1, 28, 28)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn.shape" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((52500, 1, 28, 28), (52500,))" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "XCnn_train.shape, y_train.shape" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "class Cnn(nn.Module):\n", " def __init__(self):\n", " super(Cnn, self).__init__()\n", " self.conv1 = nn.Conv2d(1, 32, kernel_size=3)\n", " self.conv2 = nn.Conv2d(32, 64, kernel_size=3)\n", " self.conv2_drop = nn.Dropout2d()\n", " self.fc1 = nn.Linear(1600, 128) # 1600 = number channels * width * height\n", " self.fc2 = nn.Linear(128, 10)\n", "\n", " def forward(self, x):\n", " x = F.relu(F.max_pool2d(self.conv1(x), 2))\n", " x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n", " x = x.view(-1, x.size(1) * x.size(2) * x.size(3)) # flatten over channel, height and width = 1600\n", " x = F.relu(self.fc1(x))\n", " x = F.dropout(x, training=self.training)\n", " x = self.fc2(x)\n", " x = F.softmax(x, dim=-1)\n", " return x" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "cnn = NeuralNetClassifier(\n", " Cnn,\n", " max_epochs=15,\n", " lr=1,\n", " optimizer=torch.optim.Adadelta,\n", " device=device,\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "  epoch    train_loss    valid_acc    valid_loss     dur\n", "-------  ------------  -----------  ------------  ------\n", "      1        \u001b[36m0.4136\u001b[0m       \u001b[32m0.9711\u001b[0m        \u001b[35m0.0949\u001b[0m  1.7914\n", "      2        \u001b[36m0.1402\u001b[0m       \u001b[32m0.9798\u001b[0m        \u001b[35m0.0636\u001b[0m  1.0294\n", "      3        \u001b[36m0.1129\u001b[0m       \u001b[32m0.9811\u001b[0m        \u001b[35m0.0628\u001b[0m  1.0192\n", "      4        \u001b[36m0.0961\u001b[0m       \u001b[32m0.9851\u001b[0m        \u001b[35m0.0482\u001b[0m  1.0338\n", "      5        \u001b[36m0.0847\u001b[0m       0.9846        0.0517  1.0152\n", "      6        \u001b[36m0.0772\u001b[0m       \u001b[32m0.9864\u001b[0m        \u001b[35m0.0446\u001b[0m  1.0351\n", "      7        \u001b[36m0.0669\u001b[0m       \u001b[32m0.9871\u001b[0m        \u001b[35m0.0442\u001b[0m  1.0360\n", "      8        \u001b[36m0.0638\u001b[0m       0.9871        \u001b[35m0.0426\u001b[0m  1.0318\n", "      9        \u001b[36m0.0612\u001b[0m       \u001b[32m0.9886\u001b[0m        \u001b[35m0.0394\u001b[0m  1.0215\n", "     10        \u001b[36m0.0582\u001b[0m       0.9882        0.0410  1.0182\n", "     11        \u001b[36m0.0541\u001b[0m       \u001b[32m0.9887\u001b[0m        \u001b[35m0.0367\u001b[0m  1.0259\n", "     12        \u001b[36m0.0513\u001b[0m       \u001b[32m0.9894\u001b[0m        0.0378  1.0252\n", "     13        \u001b[36m0.0481\u001b[0m       \u001b[32m0.9898\u001b[0m        \u001b[35m0.0360\u001b[0m  1.0383\n", "     14        \u001b[36m0.0478\u001b[0m       0.9898        0.0362  1.0299\n", "     15        \u001b[36m0.0466\u001b[0m       \u001b[32m0.9902\u001b[0m        \u001b[35m0.0352\u001b[0m  1.0203\n" ] } ], "source": [ "cnn.fit(XCnn_train, y_train);" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "cnn_pred = cnn.predict(XCnn_test)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9891428571428571" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(cnn_pred == y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of almost 99% should suffice for this example!"
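 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since ```predict``` returns plain NumPy arrays, any SciKit-Learn metric can be applied directly. A small self-contained sketch (with toy label arrays so it runs standalone; in the notebook you would pass ```y_test``` and ```cnn_pred```):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import accuracy_score, confusion_matrix\n", "import numpy as np\n", "\n", "y_true = np.array([0, 1, 2, 2, 1])\n", "y_pred = np.array([0, 1, 2, 1, 1])\n", "\n", "# confusion_matrix(y_true, y_pred) would give the per-class breakdown\n", "accuracy_score(y_true, y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same holds for ```classification_report``` or any other metric that consumes label arrays."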
] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }