{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "4rU_kfN6q-0v" }, "source": [ "# MNIST with torchvision and skorch\n", "\n", "This notebooks shows how to define and train a simple Neural-Network with PyTorch and use it via skorch with the help of torchvision.\n", "\n", "
\n", "\n", " Run in Google Colab \n", "\n", "View source on GitHub
" ] }, { "cell_type": "markdown", "metadata": { "id": "7OYtKLj-q-03" }, "source": [ "**Note**: If you are running this in [a colab notebook](https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/MNIST-torchvision.ipynb), we recommend you enable a free GPU by going:\n", "\n", "> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**\n", "\n", "If you are running in colab, you should install the dependencies and download the dataset by running the following cell:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "fJW3IR6Mq-06" }, "outputs": [], "source": [ "import subprocess\n", "\n", "# Installation on Google Colab\n", "try:\n", " import google.colab\n", " subprocess.run(['python', '-m', 'pip', 'install', 'skorch' , 'torch', 'torchvision'])\n", "except ImportError:\n", " pass" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "nCr4NPnIq-09" }, "outputs": [], "source": [ "from itertools import islice\n", "\n", "from sklearn.model_selection import train_test_split\n", "import torch\n", "import torchvision\n", "from torchvision.datasets import MNIST\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "35jlO3R5q-1A" }, "outputs": [], "source": [ "USE_TENSORBOARD = True # whether to use TensorBoard\n", "DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n", "MNIST_FLAT_DIM = 28 * 28" ] }, { "cell_type": "markdown", "metadata": { "id": "cCEm6xAWq-1B" }, "source": [ "## Loading Data\n", "\n", "Use torchvision's data repository to provide MNIST data in form of a torch `Dataset`. Originally, the `MNIST` dataset provides 28x28 `PIL` images. To use them with PyTorch, we convert those to tensors by adding the `ToTensor` transform." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "y8MuBRW5q-1C" }, "outputs": [], "source": [ "mnist_train = MNIST('datasets', train=True, download=True, transform=torchvision.transforms.Compose([\n", " torchvision.transforms.ToTensor(),\n", "]))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "XzcCY_4bq-1D" }, "outputs": [], "source": [ "mnist_test = MNIST('datasets', train=False, download=True, transform=torchvision.transforms.Compose([\n", " torchvision.transforms.ToTensor(),\n", "]))" ] }, { "cell_type": "markdown", "metadata": { "id": "1Ne8gcaSq-1G" }, "source": [ "## Taking a look at the data\n", "\n", "Each entry in the `mnist_train` and `mnist_test` Dataset instances consists of a 28 x 28 images and the corresponding label (numbers between 0 and 9). The image data is already normalized to the range [0; 1]. Let's take a look at the first 5 images of the training set:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "OYMmCHJ9q-1I" }, "outputs": [], "source": [ "X_example, y_example = zip(*islice(iter(mnist_train), 5))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "f6KVmZm1q-1K", "outputId": "be31b0b7-c7d5-4be0-9cc6-4552e75541fc" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(tensor(0.), tensor(1.))" ] }, "metadata": {}, "execution_count": 7 } ], "source": [ "\n", "X_example[0].min(), X_example[0].max()" ] }, { "cell_type": "markdown", "metadata": { "id": "wgjNeDTLq-1L" }, "source": [ "### Print a selection of training images and their labels" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "RF6Ki7_aq-1L" }, "outputs": [], "source": [ "def plot_example(X, y, n=5):\n", " \"\"\"Plot the images in X and their labels in rows of `n` elements.\"\"\"\n", " fig = plt.figure()\n", " rows = len(X) // n + 1\n", " for i, (img, y) in enumerate(zip(X, y)):\n", " ax = 
fig.add_subplot(rows, n, i + 1)\n", " ax.imshow(img.reshape(28, 28))\n", " ax.set_xticks([])\n", " ax.set_yticks([])\n", " ax.set_title(y)\n", " plt.tight_layout()\n", " return fig" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 120 }, "id": "LRhu6GbVq-1M", "outputId": "f2d91ae4-d1d4-411c-c477-4605f85e1f47" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} } ], "source": [ "plot_example(torch.stack(X_example), y_example, n=5);" ] }, { "cell_type": "markdown", "metadata": { "id": "r5OH322Rq-1N" }, "source": [ "### Preparing a validation split\n", "\n", "skorch can split the data for us automatically but since we are using `Dataset`s for their lazy-loading property there is no way skorch can do a stratified split automatically without exploring the data completely first (which it doesn't). \n", "\n", "If we want skorch to do a validation split for us we need to retrieve the `y` values from the dataset and pass these values to `net.fit` later on:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "PZ2clJzBq-1N" }, "outputs": [], "source": [ "y_train = np.array([y for x, y in iter(mnist_train)])" ] }, { "cell_type": "markdown", "metadata": { "id": "dQhh6yaAq-1O" }, "source": [ "## Build Neural Network with PyTorch\n", "\n", "Simple, fully connected neural network with one hidden layer. Input layer has 784 dimensions (28x28), hidden layer has 98 (= 784 / 8) and output layer 10 neurons, representing digits 0 - 9." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "tojAhRAoq-1P" }, "outputs": [], "source": [ "from torch import nn\n", "import torch.nn.functional as F" ] }, { "cell_type": "markdown", "metadata": { "id": "YHfKnJz_q-1P" }, "source": [ "A simple neural network classifier with linear layers and a final softmax in PyTorch:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "I6veVp04q-1Q" }, "outputs": [], "source": [ "class ClassifierModule(nn.Module):\n", "    def __init__(\n", "        self,\n", "        input_dim=MNIST_FLAT_DIM,\n", "        hidden_dim=98,\n", "        output_dim=10,\n", "        dropout=0.5,\n", "    ):\n", "        super().__init__()\n", "        self.dropout = nn.Dropout(dropout)\n", "\n", "        self.hidden = nn.Linear(input_dim, hidden_dim)\n", "        self.output = nn.Linear(hidden_dim, output_dim)\n", "\n", "    def forward(self, X, **kwargs):\n", "        # flatten each image to a vector of length input_dim\n", "        X = X.reshape(-1, self.hidden.in_features)\n", "        X = F.relu(self.hidden(X))\n", "        X = self.dropout(X)\n", "        X = F.softmax(self.output(X), dim=-1)\n", "        return X" ] }, { "cell_type": "markdown", "metadata": { "id": "rty5y1hDq-1Q" }, "source": [ "skorch lets us use PyTorch through an sklearn-compatible API. We will train the classifier using the classic sklearn `.fit()`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "KkjJEufpq-1R" }, "outputs": [], "source": [ "from skorch import NeuralNetClassifier" ] }, { "cell_type": "markdown", "metadata": { "id": "4QtvtZMKq-1S" }, "source": [ "We can also add tensorboard logging. For that, skorch offers the `TensorBoard` callback, which automatically logs useful information to tensorboard.\n", "\n", "**Note**: Using tensorboard requires installing the following Python packages: `tensorboard`, `future`, `pillow`\n", "\n", "After this, to start tensorboard, run:\n", "\n", "`$ tensorboard --logdir runs`\n", "\n", "in the directory you are running this notebook in." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "KeRIksQHq-1S" }, "outputs": [], "source": [ "callbacks = []\n", "if USE_TENSORBOARD:\n", " from torch.utils.tensorboard import SummaryWriter\n", " from skorch.callbacks import TensorBoard\n", " writer = SummaryWriter()\n", " callbacks.append(TensorBoard(writer))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "zSZKfehwq-1T" }, "outputs": [], "source": [ "torch.manual_seed(0)\n", "\n", "net = NeuralNetClassifier(\n", " ClassifierModule,\n", " max_epochs=10,\n", " iterator_train__num_workers=2,\n", " iterator_valid__num_workers=2,\n", " lr=0.1,\n", " device=DEVICE,\n", " callbacks=callbacks,\n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vNr_KEP3q-1T", "outputId": "a56a5865-66ae-4d0b-fbff-267d305b1466" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ -------\n", " 1 \u001b[36m0.7973\u001b[0m \u001b[32m0.8988\u001b[0m \u001b[35m0.3627\u001b[0m 13.3156\n", " 2 \u001b[36m0.4256\u001b[0m \u001b[32m0.9191\u001b[0m \u001b[35m0.2840\u001b[0m 7.8078\n", " 3 \u001b[36m0.3589\u001b[0m \u001b[32m0.9297\u001b[0m \u001b[35m0.2434\u001b[0m 6.6143\n", " 4 \u001b[36m0.3169\u001b[0m \u001b[32m0.9374\u001b[0m \u001b[35m0.2165\u001b[0m 6.6950\n", " 5 \u001b[36m0.2919\u001b[0m \u001b[32m0.9405\u001b[0m \u001b[35m0.2016\u001b[0m 6.6883\n", " 6 \u001b[36m0.2680\u001b[0m \u001b[32m0.9474\u001b[0m \u001b[35m0.1831\u001b[0m 6.5840\n", " 7 \u001b[36m0.2542\u001b[0m \u001b[32m0.9496\u001b[0m \u001b[35m0.1718\u001b[0m 6.6463\n", " 8 \u001b[36m0.2432\u001b[0m \u001b[32m0.9522\u001b[0m \u001b[35m0.1629\u001b[0m 6.6640\n", " 9 \u001b[36m0.2291\u001b[0m \u001b[32m0.9548\u001b[0m \u001b[35m0.1536\u001b[0m 6.6585\n", " 10 \u001b[36m0.2232\u001b[0m \u001b[32m0.9563\u001b[0m \u001b[35m0.1479\u001b[0m 6.6471\n" 
] } ], "source": [ "net.fit(mnist_train, y=y_train);" ] }, { "cell_type": "markdown", "metadata": { "id": "efHqyfCLq-1U" }, "source": [ "## Prediction" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "63f18IAZq-1U" }, "outputs": [], "source": [ "from sklearn.metrics import accuracy_score" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "id": "LONeNZs8q-1U" }, "outputs": [], "source": [ "y_pred = net.predict(mnist_test)\n", "y_test = np.array([y for x, y in iter(mnist_test)])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "l1iY1uS8q-1V", "outputId": "7629d989-0013-4987-ef75-6f3c66d7a6e1" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.9578" ] }, "metadata": {}, "execution_count": 19 } ], "source": [ "accuracy_score(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "K9V7S_Urq-1W" }, "source": [ "An accuracy of about 96% for a network with only one hidden layer is not too bad.\n", "\n", "Let's take a look at some predictions that went wrong.\n", "\n", "We compute the index of elements that are misclassified and plot a few of those to get an idea\n", "of what went wrong." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "7YuhO4G_q-1X" }, "outputs": [], "source": [ "error_mask = y_pred != y_test" ] }, { "cell_type": "markdown", "metadata": { "id": "2zJqiCP9q-1Y" }, "source": [ "Now that we have the mask we need a way to access the images from the `mnist_test` dataset. 
Luckily, skorch provides a helper class, `SliceDataset`, that lets us slice arbitrary `Dataset` objects:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "EhXWuo_3q-1Y" }, "outputs": [], "source": [ "from skorch.helper import SliceDataset" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "F8rGoDl-q-1Z" }, "outputs": [], "source": [ "mnist_test_sliceable = SliceDataset(mnist_test)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "U_dWUitgq-1a" }, "outputs": [], "source": [ "X_pred = torch.stack(list(mnist_test_sliceable[error_mask]))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 120 }, "id": "vGW6BB6Gq-1a", "outputId": "30ac782a-1ad2-4550-e1ca-aa0e638429a4" }, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} } ], "source": [ "plot_example(X_pred[:5], y_pred[error_mask][:5]);" ] }, { "cell_type": "markdown", "metadata": { "id": "gzUcw21Fq-1b" }, "source": [ "If tensorboard was enabled, here is how the metrics could look like:" ] }, { "cell_type": "markdown", "metadata": { "id": "uOh1oAq7q-1c" }, "source": [ "![tensorboard scalars](https://github.com/sawradip/skorch/blob/master/assets/tensorboard_scalars.png?raw=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "YXQnnf78q-1c" }, "source": [ "# Convolutional Network\n", "\n", "Next we want to turn it up a notch and use a convolutional neural network which is far better\n", "suited for images than simple densely connected layers.\n", "\n", "PyTorch expects a 4 dimensional tensor as input for its 2D convolution layer. The dimensions represent:\n", "\n", "* Batch size\n", "* Number of channels\n", "* Height\n", "* Width\n", "\n", "MNIST data only has one channel since there is no color information. As stated above, each MNIST vector represents a 28x28 pixel image. 
Hence, the input tensor needs to have the shape `(x, 1, 28, 28)`, where `x` is the batch size, which the data loader provides automatically.\n", "\n", "Luckily, our data is already formatted that way (each sample has shape `(1, 28, 28)`; the data loader adds the batch dimension):" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZKiPtuCsq-1d", "outputId": "1dc788f0-9414-450a-b2c5-dae079e6bb34" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "torch.Size([1, 28, 28])" ] }, "metadata": {}, "execution_count": 25 } ], "source": [ "X_example[0].shape" ] }, { "cell_type": "markdown", "metadata": { "id": "KyGRbkU7q-1d" }, "source": [ "Now let us define the convolutional neural network module using PyTorch:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "id": "KyDI-ZoPq-1e" }, "outputs": [], "source": [ "class Cnn(nn.Module):\n", "    def __init__(self, dropout=0.5):\n", "        super().__init__()\n", "        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)\n", "        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)\n", "        self.conv2_drop = nn.Dropout2d(p=dropout)\n", "        # 1600 = 64 channels * 5 * 5 (28 -> 26 -> 13 -> 11 -> 5 after conv/pool)\n", "        self.fc1 = nn.Linear(1600, 100)\n", "        self.fc2 = nn.Linear(100, 10)\n", "        self.fc1_drop = nn.Dropout(p=dropout)\n", "\n", "    def forward(self, x):\n", "        x = torch.relu(F.max_pool2d(self.conv1(x), 2))\n", "        x = torch.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n", "\n", "        # flatten over channel, height and width = 1600\n", "        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))\n", "\n", "        x = torch.relu(self.fc1_drop(self.fc1(x)))\n", "        x = torch.softmax(self.fc2(x), dim=-1)\n", "        return x" ] }, { "cell_type": "markdown", "metadata": { "id": "s6aWg-11q-1e" }, "source": [ "We also want to extend tensorboard logging by two more features:\n", "\n", "1. 
Add the predictions for the misclassified images to tensorboard.\n", "\n", "   To do this, we subclass the `TensorBoard` callback and call `self.writer.add_figure` with the figures we produce. When subclassing, don't forget to call `super()`, or the other logged metrics won't show.\n", "\n", "\n", "2. Add a graph of the module.\n", "\n", "   To do this, we use the summary writer's ability to add a traced graph of our module to tensorboard by calling `add_graph`. We make sure to call this only on the very first batch by inspecting the `self.first_batch_` attribute on `TensorBoard`." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "nEjtVJXgq-1f" }, "outputs": [], "source": [ "callbacks = []\n", "if USE_TENSORBOARD:\n", "    from torch.utils.tensorboard import SummaryWriter\n", "    from skorch.callbacks import TensorBoard\n", "    writer = SummaryWriter()\n", "\n", "    class MyTensorBoard(TensorBoard):\n", "        def __init__(self, *args, X, **kwargs):\n", "            self.X = X\n", "            super().__init__(*args, **kwargs)\n", "\n", "        def add_graph(self, module, X):\n", "            \"\"\"Add a graph to tensorboard.\n", "\n", "            This requires running the module with a sample from the\n", "            dataset.\n", "\n", "            \"\"\"\n", "            self.writer.add_graph(module, X.to(DEVICE))\n", "\n", "        def on_batch_begin(self, net, batch, **kwargs):\n", "            X, y = batch\n", "            if self.first_batch_:\n", "                # only add graph on very first batch\n", "                self.add_graph(net.module_, X)\n", "\n", "        def add_figure(self, net):\n", "            # show how difficult images were classified\n", "            epoch = net.history[-1, 'epoch']\n", "            y_pred = net.predict(self.X)\n", "            fig = plot_example(self.X, y_pred)\n", "            self.writer.add_figure('difficult images', fig, global_step=epoch)\n", "\n", "        def on_epoch_end(self, net, **kwargs):\n", "            self.add_figure(net)\n", "            super().on_epoch_end(net, **kwargs)  # call super last\n", "\n", "    X_difficult = torch.stack(list(mnist_test_sliceable[error_mask][:15]))\n", "    callbacks.append(MyTensorBoard(writer, 
X=X_difficult))" ] }, { "cell_type": "markdown", "metadata": { "id": "v9ISDtlLq-1f" }, "source": [ "As before we can wrap skorch's `NeuralNetClassifier` around our module and start training it like every other sklearn model using `.fit`:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "id": "xpzP_ct6q-1g" }, "outputs": [], "source": [ "torch.manual_seed(0)\n", "\n", "cnn = NeuralNetClassifier(\n", " Cnn,\n", " max_epochs=10,\n", " lr=0.0002,\n", " optimizer=torch.optim.Adam,\n", " device=DEVICE,\n", " iterator_train__num_workers=2,\n", " iterator_valid__num_workers=2,\n", " callbacks=callbacks,\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VeT1tzEHq-1g", "outputId": "ad4806ae-00d6-4fbd-fa33-f33251ea5bd0" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ " epoch train_loss valid_acc valid_loss dur\n", "------- ------------ ----------- ------------ ------\n", " 1 \u001b[36m0.9490\u001b[0m \u001b[32m0.9257\u001b[0m \u001b[35m0.2535\u001b[0m 9.3405\n", " 2 \u001b[36m0.3174\u001b[0m \u001b[32m0.9561\u001b[0m \u001b[35m0.1501\u001b[0m 7.4618\n", " 3 \u001b[36m0.2211\u001b[0m \u001b[32m0.9656\u001b[0m \u001b[35m0.1177\u001b[0m 7.4806\n", " 4 \u001b[36m0.1819\u001b[0m \u001b[32m0.9712\u001b[0m \u001b[35m0.1002\u001b[0m 7.4573\n", " 5 \u001b[36m0.1578\u001b[0m \u001b[32m0.9735\u001b[0m \u001b[35m0.0880\u001b[0m 8.0541\n", " 6 \u001b[36m0.1435\u001b[0m \u001b[32m0.9768\u001b[0m \u001b[35m0.0775\u001b[0m 8.4212\n", " 7 \u001b[36m0.1316\u001b[0m \u001b[32m0.9781\u001b[0m \u001b[35m0.0728\u001b[0m 7.4816\n", " 8 \u001b[36m0.1197\u001b[0m \u001b[32m0.9795\u001b[0m \u001b[35m0.0690\u001b[0m 9.6584\n", " 9 \u001b[36m0.1100\u001b[0m \u001b[32m0.9808\u001b[0m \u001b[35m0.0651\u001b[0m 7.4606\n", " 10 \u001b[36m0.1051\u001b[0m \u001b[32m0.9822\u001b[0m \u001b[35m0.0620\u001b[0m 7.4750\n" ] } ], "source": [ "cnn.fit(mnist_train, y=y_train);" ] }, { "cell_type": 
"code", "execution_count": 30, "metadata": { "id": "ENbErtU7q-1h" }, "outputs": [], "source": [ "y_pred_cnn = cnn.predict(mnist_test)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wAWCV2ycq-1i", "outputId": "f75bafc9-2a77-4adc-a848-ebe298fec6b0" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.9848" ] }, "metadata": {}, "execution_count": 31 } ], "source": [ "accuracy_score(y_test, y_pred_cnn)" ] }, { "cell_type": "markdown", "metadata": { "id": "sWsPxuspq-1j" }, "source": [ "An accuracy of >98% should suffice for this example!\n", "\n", "Let's see how we fare on the examples that went wrong before:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Hgj840CSq-1j", "outputId": "273eda81-5330-4f11-b342-2d6d3486defd" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "0.7132701421800948" ] }, "metadata": {}, "execution_count": 32 } ], "source": [ "accuracy_score(y_test[error_mask], y_pred_cnn[error_mask])" ] }, { "cell_type": "markdown", "metadata": { "id": "_AGqI1qtq-1k" }, "source": [ "Great success! The majority of the previously misclassified images are now correctly identified." ] }, { "cell_type": "markdown", "metadata": { "id": "wFxdFxLpq-1l" }, "source": [ "On tensorboard, in the \"IMAGES\" section, we can see how well the CNN classified the difficult images, and how that changed over the epochs:" ] }, { "cell_type": "markdown", "metadata": { "id": "fLGQyk8Yq-1l" }, "source": [ "\"tensorboard" ] }, { "cell_type": "markdown", "metadata": { "id": "bapzRW6mq-1m" }, "source": [ "In the \"GRAPHS\" section, we can see the graph of our module." 
] }, { "cell_type": "markdown", "metadata": { "id": "-Kte4hXwq-1n" }, "source": [ "\"tensorboard" ] }, { "cell_type": "markdown", "metadata": { "id": "Av8L8UiKq-1o" }, "source": [ "# Grid searching parameter configurations\n", "\n", "Finally we want to show an example of how to use sklearn grid search when using torch `Dataset` instances.\n", "\n", "When doing k-fold validation grid search we have the same problem as before that sklearn is only able to do (stratified) splits when the data is sliceable. While skorch knows how to deal with PyTorch `Dataset` objects and only needs `y` to be known beforehand, sklearn doesn't know how to deal with `Dataset`s and needs a wrapper that makes them sliceable.\n", "\n", "Fortunately, we already know that skorch provides such a helper: `SliceDataset`.\n", "\n", "What is left to do is to define our parameter search space and run the grid search with a sliceable instance of `mnist_train`:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "id": "eBgNhs3Dq-1o" }, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "k0a5yr7cq-1p", "outputId": "53c240b7-79b2-4cf1-981d-411985f50029" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "[initialized](\n", " module_=Cnn(\n", " (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))\n", " (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n", " (conv2_drop): Dropout2d(p=0.5, inplace=False)\n", " (fc1): Linear(in_features=1600, out_features=100, bias=True)\n", " (fc2): Linear(in_features=100, out_features=10, bias=True)\n", " (fc1_drop): Dropout(p=0.5, inplace=False)\n", " ),\n", ")" ] }, "metadata": {}, "execution_count": 34 } ], "source": [ "cnn.set_params(max_epochs=2, verbose=False, train_split=False, callbacks=[])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "id": 
"8mopSVXPq-1q" }, "outputs": [], "source": [ "params = {\n", " 'module__dropout': [0, 0.5, 0.8],\n", "}" ] }, { "cell_type": "markdown", "metadata": { "id": "hvrV6wjdq-1q" }, "source": [ "The parameter we are interested in here is the dropout rate. We want to see which of the values (no dropout, 50%, 80%) is the best choice for our network.\n", "\n", "Additionally:\n", "\n", "- We use only two epochs (`max_epochs=2`) for each `.fit` (only to reduce execution time, normally we wouldn't change this and possibly add an `EarlyStopping` callback).\n", "- Disable the network print output (`verbose=False`)\n", "- Disable the internal train/validation split (`train_split=False`) since the grid search will do k-fold validation anyway\n", "- Turn off tensorboard logging (`callbacks=[]`)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "id": "LuZ0Ug68q-1r" }, "outputs": [], "source": [ "cnn.initialize();" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "XgLmSp3sq-1r" }, "outputs": [], "source": [ "gs = GridSearchCV(cnn, param_grid=params, scoring='accuracy', verbose=1, cv=3)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "id": "yBAcSDdpq-1r" }, "outputs": [], "source": [ "mnist_train_sliceable = SliceDataset(mnist_train)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "i9hcSRgkq-1s", "outputId": "63b1c573-49d4-423a-e465-3c0d2715f42c", "scrolled": false }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Fitting 3 folds for each of 3 candidates, totalling 9 fits\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "GridSearchCV(cv=3,\n", " estimator=[initialized](\n", " module_=Cnn(\n", " (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))\n", " (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n", " (conv2_drop): Dropout2d(p=0.5, inplace=False)\n", " (fc1): Linear(in_features=1600, 
out_features=100, bias=True)\n", " (fc2): Linear(in_features=100, out_features=10, bias=True)\n", " (fc1_drop): Dropout(p=0.5, inplace=False)\n", " ),\n", "),\n", " param_grid={'module__dropout': [0, 0.5, 0.8]}, scoring='accuracy',\n", " verbose=1)" ] }, "metadata": {}, "execution_count": 39 } ], "source": [ "gs.fit(mnist_train_sliceable, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "ZUI3B1t-q-1s" }, "source": [ "After running the grid search we now know the best configuration in our search space:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YlKmRJoYq-1t", "outputId": "765c7546-788f-4626-8a91-63f399e2d51b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'module__dropout': 0}" ] }, "metadata": {}, "execution_count": 40 } ], "source": [ "gs.best_params_" ] } ], "metadata": { "accelerator": "GPU", "colab": { "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.13 (default, Mar 28 2022, 08:03:21) [MSC v.1916 64 bit (AMD64)]" }, "vscode": { "interpreter": { "hash": "bd97b8bffa4d3737e84826bc3d37be3046061822757ce35137ab82ad4c5a2016" } } }, "nbformat": 4, "nbformat_minor": 0 }