# MNIST with SciKit-Learn and skorch

This notebook shows how to define and train a simple neural network with PyTorch and use it via skorch with SciKit-Learn.

<table align="left"><td>
<a target="_blank" href="https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/MNIST.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>  
</td><td>
<a target="_blank" href="https://github.com/dnouri/skorch/blob/master/notebooks/MNIST.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a></td></table>

**Note**: If you are running this in [a colab notebook](https://colab.research.google.com/github/dnouri/skorch/blob/master/notebooks/MNIST.ipynb), we recommend you enable a free GPU by going to:

> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**

If you are running in colab, you should install the dependencies by running the following cell:

In [1]:
! [ ! -z "$COLAB_GPU" ] && pip install torch scikit-learn==0.20.* skorch

In [2]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np

## Loading Data
Using SciKit-Learn's ```fetch_openml``` to load MNIST data.

In [3]:
mnist = fetch_openml('mnist_784', cache=False)

In [4]:
mnist.data.shape

(70000, 784)

## Preprocessing Data

Each image of the MNIST dataset is encoded as a 784-dimensional vector representing a 28 x 28 pixel image. Each pixel has a value between 0 and 255, corresponding to its grey value.<br />
The ```fetch_openml``` call above returns ```data``` and ```target```, which we convert to ```float32``` and ```int64``` respectively.

In [5]:
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')
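
To make the flat encoding described above concrete, here is a minimal sketch that reshapes a synthetic 784-element vector (not a real MNIST digit) into a 28 x 28 grid. The names `flat` and `image` are illustrative only, and the sketch assumes the pixels are stored row by row.

```python
import numpy as np

# Synthetic stand-in for one flattened image: values 0..783 (not real pixel data).
flat = np.arange(784, dtype='float32')

# Reshape into 28 rows of 28 pixels each, assuming row-major pixel order.
image = flat.reshape(28, 28)

# Pixel (row r, column c) of the grid sits at index r * 28 + c of the flat vector.
r, c = 3, 5
print(image.shape)
print(image[r, c] == flat[r * 28 + c])
```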

As we will use ReLU as the activation in combination with softmax over the output layer, we need to scale `X` down. A commonly used range is [0, 1].

In [6]:
X /= 255.0

In [7]:
X.min(), X.max()

(0.0, 1.0)

Note: the data is scaled to [0, 1] but not normalized to zero mean and unit variance.
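
To illustrate the distinction made in the note above, here is a small sketch on synthetic grey values (not MNIST). Dividing by 255 maps the values into [0, 1], whereas standardization, shown only for comparison, shifts and rescales them to zero mean and unit variance. The array `pixels` and the other names are assumptions made for illustration.

```python
import numpy as np

# Synthetic grey values in the 0..255 range (stand-in for real pixels).
pixels = np.array([0, 51, 102, 204, 255], dtype='float32')

# Scaling as used in this notebook: values end up in [0, 1].
scaled = pixels / 255.0

# Standardization (not done in this notebook): zero mean, unit variance.
standardized = (pixels - pixels.mean()) / pixels.std()

print(scaled.min(), scaled.max())
print(standardized.mean(), standardized.std())
```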

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [9]:
assert(X_train.shape[0] + X_test.shape[0] == mnist.data.shape[0])

In [10]:
X_train.shape, y_train.shape

((52500, 784), (52500,))
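
The shapes above follow directly from `test_size=0.25`. As a quick arithmetic check, the sketch below assumes the test share is rounded up, which makes no difference here because 70000 * 0.25 is a whole number.

```python
import math

n_samples = 70000   # rows in mnist.data
test_size = 0.25    # fraction passed to train_test_split

# Number of held-out test rows, and what remains for training.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

print(n_train, n_test)
```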

## Build Neural Network with Torch
Simple, fully connected neural network with one hidden layer. The input layer has 784 dimensions (28x28), the hidden layer has 98 (= 784 / 8), and the output layer has 10 neurons, representing the digits 0 - 9.

In [11]:
import torch
from torch import nn
import torch.nn.functional as F

In [12]:
torch.manual_seed(0);
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [13]:
mnist_dim = X.shape[1]
hidden_dim = int(mnist_dim/8)
output_dim = len(np.unique(mnist.target))

In [14]:
mnist_dim, hidden_dim, output_dim

(784, 98, 10)
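
With these dimensions, the number of trainable parameters of the network described above can be estimated by hand. This sketch assumes each fully connected layer has a weight matrix plus one bias per output unit, which is the default for `nn.Linear`; the helper `linear_params` is hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

mnist_dim, hidden_dim, output_dim = 784, 98, 10

hidden_params = linear_params(mnist_dim, hidden_dim)   # input -> hidden
output_params = linear_params(hidden_dim, output_dim)  # hidden -> output
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```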

The neural network, defined as a PyTorch module:

In [15]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            input_dim=mnist_dim,
            hidden_dim=hidden_dim,
            output_dim=output_dim,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.dropout = nn.Dropout(dropout)

        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
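
The final `F.softmax` turns the raw scores of the output layer into class probabilities. As a sketch of what that means numerically, the NumPy function below (a stand-in written for illustration, not the PyTorch implementation) applies softmax to made-up scores for three classes.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up raw scores for two samples and three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax(scores)

print(probs.sum(axis=-1))     # each row sums to 1 (up to rounding)
print(probs.argmax(axis=-1))  # index of the most likely class per sample
```

The index with the highest probability is the natural choice for the predicted class.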

skorch allows PyTorch networks to be used in the SciKit-Learn setting.

In [16]:
from skorch import NeuralNetClassifier

In [17]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device=device,
)

In [18]:
net.fit(X_train, y_train);

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.8321       0.8828        0.4077  0.7626
      2        0.4306       0.9110        0.3121  0.4984
      3        0.3623       0.9221        0.2649  0.5147
      4        0.3241       0.9298        0.2457  0.5040
      5        0.2942       0.9373        0.2129  0.5629
      6        0.2707       0.9411        0.1974  0.5093
      7        0.2554       0.9439        0.1836  0.5055
      8        0.2487       0.9480        0.1754  0.5102
      9        0.2276       0.9473        0.1730  0.5055
     10        0.2229       0.9524        0.1612  0.4966
     11        0.2158       0.9511

## Prediction

In [19]:
predicted = net.predict(X_test)

In [20]:
np.mean(predicted == y_test)

0.962

An accuracy of about 96% for a network with only one hidden layer is not too bad.
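
The expression `np.mean(predicted == y_test)` works because the comparison yields a boolean array, and the mean of booleans is the fraction of `True` entries. A tiny sketch with made-up labels (the names `y_true` and `y_pred` are illustrative):

```python
import numpy as np

# Made-up labels for illustration only.
y_true = np.array([3, 1, 4, 1, 5])
y_pred = np.array([3, 1, 4, 0, 5])

matches = y_pred == y_true   # boolean array, True where the prediction is right
accuracy = np.mean(matches)  # True counts as 1, False as 0

print(accuracy)
```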

# Convolutional Network
PyTorch expects a 4 dimensional tensor as input for its 2D convolution layer. The dimensions represent:
* Batch size
* Number of channels
* Height
* Width

The batch dimension holds the number of examples. MNIST data has only one channel. As stated above, each MNIST vector represents a 28x28 pixel image. Hence, the resulting shape for the PyTorch tensor needs to be (x, 1, 28, 28), where x is the number of examples.

In [21]:
XCnn = X.reshape(-1, 1, 28, 28)

In [22]:
XCnn.shape

(70000, 1, 28, 28)
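
The `-1` in the reshape lets NumPy infer the batch dimension from the number of rows. A minimal sketch with a small synthetic batch (five zero-filled rows standing in for images; `batch` and `batch_cnn` are illustrative names):

```python
import numpy as np

# Five synthetic flattened "images" of 784 values each.
batch = np.zeros((5, 784), dtype='float32')

# -1 is inferred as 5; the single channel gets its own axis.
batch_cnn = batch.reshape(-1, 1, 28, 28)

print(batch_cnn.shape)
```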

In [23]:
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)

In [24]:
XCnn_train.shape, y_train.shape

((52500, 1, 28, 28), (52500,))

In [25]:
class Cnn(nn.Module):
    def __init__(self):
        super(Cnn, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(1600, 128) # 1600 = number of channels * height * width = 64 * 5 * 5
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3)) # flatten over channel, height and width = 1600
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.softmax(x, dim=-1)
        return x
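
The value 1600 used for `fc1` can be derived by tracking the spatial size through the layers. The sketch below assumes the PyTorch defaults as used in `Cnn`: stride 1 and no padding for `nn.Conv2d`, and `F.max_pool2d` with kernel size 2 using a stride equal to the kernel size and rounding odd sizes down. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel_size):
    """Output size of a convolution with stride 1 and no padding."""
    return size - kernel_size + 1

def pool_out(size, kernel_size):
    """Output size of max pooling with stride equal to the kernel size."""
    return size // kernel_size

size = 28                              # input height and width
size = pool_out(conv_out(size, 3), 2)  # conv1 + pooling: 28 -> 26 -> 13
size = pool_out(conv_out(size, 3), 2)  # conv2 + pooling: 13 -> 11 -> 5

channels = 64                          # output channels of conv2
flat_features = channels * size * size

print(size, flat_features)
```

If the kernel sizes or the input resolution change, the input size of `fc1` has to be recomputed in the same way.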

In [26]:
cnn = NeuralNetClassifier(
    Cnn,
    max_epochs=15,
    lr=1,
    optimizer=torch.optim.Adadelta,
    device=device,
)

In [27]:
cnn.fit(XCnn_train, y_train);

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.4136       0.9711        0.0949  1.7914
      2        0.1402       0.9798        0.0636  1.0294
      3        0.1129       0.9811        0.0628  1.0192
      4        0.0961       0.9851        0.0482  1.0338
      5        0.0847       0.9846        0.0517  1.0152
      6        0.0772       0.9864        0.0446  1.0351
      7        0.0669       0.9871        0.0442  1.0360
      8        0.0638       0.9871        0.0426  1.0318
      9        0.0612       0.9886        0.0394  1.0215
     10        0.0582       0.9882        0.0410  1.0182
     11        0.0541       0.9887        0.0367  1.0259
     12

In [28]:
cnn_pred = cnn.predict(XCnn_test)

In [29]:
np.mean(cnn_pred == y_test)

0.9891428571428571

An accuracy of about 98.9% should suffice for this example!