{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "waK78bpJHqYr"
   },
   "source": [
    "**Name:** \\_\\_\\_\\_\\_\n",
    "\n",
    "**EID:** \\_\\_\\_\\_\\_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KzrZ4qYfHqYt"
   },
   "source": [
    "# Tutorial 7: Build a Fashion-MNIST CNN in PyTorch\n",
    "\n",
    "In this tutorial, we will use [PyTorch](https://pytorch.org/) to build a convolutional neural network (CNN) for Fashion-MNIST classification. This tutorial is by courtesy of Michael Li."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MnxcsjlrHqYt"
   },
   "source": [
    "## Installing PyTorch\n",
    "PyTorch is a Python-based scientific computing package target to use the power of GPUs and to provide maximum flexibility and speed. Thus, we default that you have installed Python (version > 3.0) with [Anaconda](https://www.anaconda.com/). Here, we only introduce how to install PyTorch with CPU.\n",
    "\n",
    "### Windows\n",
    "It's straightforward to install it in Windows. You can choose to use a virtual environment or install it directly with root access. Type this command in the terminal\n",
    "\n",
    "> pip3 install --upgrade torch==1.3.0+cpu torchvision==0.4.1+cpu -f https://download.pytorch.org/whl/torch_stable.html\n",
    "\n",
    "### Mac\n",
    "Type this command in the terminal\n",
    ">  pip3 install --upgrade torch torchvison"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n7rXj6oTHqYt"
   },
   "source": [
    "## Import\n",
    "First, let’s import the necessary modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 10249,
     "status": "ok",
     "timestamp": 1698825573763,
     "user": {
      "displayName": "Kede Ma",
      "userId": "03984196065012193056"
     },
     "user_tz": -480
    },
    "id": "M0fJM0zJHqYu",
    "outputId": "e733db15-46d8-4072-c2ec-aa998baf378c",
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch.autograd.grad_mode.set_grad_enabled at 0x2b1a6cd1c10>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# import standard PyTorch modules\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torch.optim as optim\n",
    "from torch.utils.tensorboard import SummaryWriter # TensorBoard support\n",
    "\n",
    "# import torchvision module to handle image manipulation\n",
    "import torchvision\n",
    "import torchvision.transforms as transforms\n",
    "\n",
    "# calculate train time, writing train data to files etc.\n",
    "import time\n",
    "import pandas as pd\n",
    "import json\n",
    "from IPython.display import clear_output\n",
    "\n",
    "torch.set_printoptions(linewidth=120)\n",
    "torch.set_grad_enabled(True)     # On by default, leave it here for clarity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_ON0gicZHqYv"
   },
   "source": [
    "PyTorch modules are quite straight forward."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iBlVbGlMHqYv"
   },
   "source": [
    "### `torch`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VKEvM9T_HqYv",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "`torch` is the main module that holds all the things you need for Tensor computation. You can build a fully functional neural network using Tensor computation alone, but this is not what this tutorial is about. We’ll make use of the more powerful and convenient `torch.nn`, `torch.optim` and `torchvision` classes to quickly build our CNN."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L-_4DbvTHqYv",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### `torch.nn` and `torch.nn.functional`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oFagqjEqHqYv"
   },
   "source": [
    "The `torch.nn` module provides many classes and functions to build neural networks. You can think of it as the fundamental building blocks of neural networks: models, all kinds of layers, activation functions, parameter classes, etc. It allows us to build the model like putting some LEGO set together."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_HPKBmU6HqYv",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### `torch.optim`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RXMwg1OwHqYw"
   },
   "source": [
    "`torch.optim` offers all the optimizers like SGD, ADAM, etc., so you don’t have to write it from scratch."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BvjQB2kPHqYw"
   },
   "source": [
    "### `torchvision`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Mj8trRRfHqYw"
   },
   "source": [
    "`torchvision` contains a lot of popular datasets, model architectures, and common image transformations for computer vision. We get our Fashion MNIST dataset from it and also use its transforms.\n",
    "\n",
    "`torchvision` already has the Fashion MNIST dataset. If you’re not familiar with Fashion MNIST dataset:\n",
    "> Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of $60,000$ examples and a test set of $10,000$ examples. Each example is a $28\\times28$ grayscale image, associated with a label from $10$ classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original [MNIST dataset](http://yann.lecun.com/exdb/mnist/) for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. — [From Github](https://github.com/zalandoresearch/fashion-mnist)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AMK6RrhEHqYw"
   },
   "source": [
    "![./1.png](1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MRNgig4EHqYw",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 5565,
     "status": "ok",
     "timestamp": 1698825595342,
     "user": {
      "displayName": "Kede Ma",
      "userId": "03984196065012193056"
     },
     "user_tz": -480
    },
    "id": "zAN7m64GHqYw",
    "outputId": "e2c0d22b-bd31-4341-8231-d9af9c59e236"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\\train-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2daff0ee3f564601bf365889fa53da0a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/26421880 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting ./data/FashionMNIST\\FashionMNIST\\raw\\train-images-idx3-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\n",
      "\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\\train-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "eddea013a2bf4f0697a77982d4af4c1d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/29515 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting ./data/FashionMNIST\\FashionMNIST\\raw\\train-labels-idx1-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\n",
      "\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\\t10k-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b5b02f22dd2940c5b6711860eaf6ff01",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/4422102 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting ./data/FashionMNIST\\FashionMNIST\\raw\\t10k-images-idx3-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\n",
      "\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz\n",
      "Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\\t10k-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "71fda8511251401b9c159a01343aa674",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/5148 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting ./data/FashionMNIST\\FashionMNIST\\raw\\t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST\\FashionMNIST\\raw\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Use standard FashionMNIST dataset\n",
    "train_set = torchvision.datasets.FashionMNIST(\n",
    "    root = './data/FashionMNIST',\n",
    "    train = True,\n",
    "    download = True,\n",
    "    transform = transforms.Compose([\n",
    "        transforms.ToTensor()\n",
    "    ])\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "swbI24_JHqYw"
   },
   "source": [
    "This doesn’t need much explanation. We specified the root directory to store the dataset, snatch the training data, allow it to be downloaded if not present at the local machine, and then apply the `transforms.ToTensor` to turn images into Tensor so we can directly use it with our network. The dataset is stored in the dataset class named `train_set`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Vwgs7qFNHqYw"
   },
   "source": [
    "## Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HAZLorFHHqYw",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "The Fashion MNIST is only $28\\times28$ pixels in size, so we actually don’t need a complicated network. We can just build a simple CNN like this:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RVmq89oVHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "![./2.png](2.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UcW1Tt1VHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "We have two convolution layers, each with $5\\times5$ kernels. After each convolution layer, we have a max-pooling layer with a stride of $2$. This allows us to extract the necessary features from the images. Then we flatten the tensors and put them into a dense layer, pass through a Multi-Layer Perceptron (MLP) to carry out the task of classification of our $10$ categories.\n",
    "\n",
    "Now that we are clear about the structure of the network, let’s see how we can use PyTorch to build it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "dWFPG6yBHqYx"
   },
   "outputs": [],
   "source": [
    "## INSERT YOUR CODE HERE\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "BPX1k-mfHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Build the neural network, expand on top of nn.Module\n",
    "class Network(nn.Module):\n",
    "  def __init__(self):\n",
    "    super().__init__()\n",
    "\n",
    "    # define layers\n",
    "    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)\n",
    "    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)\n",
    "\n",
    "    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)\n",
    "    self.fc2 = nn.Linear(in_features=120, out_features=60)\n",
    "    self.out = nn.Linear(in_features=60, out_features=10)\n",
    "\n",
    "  # define forward function\n",
    "  def forward(self, t):\n",
    "    # conv 1\n",
    "    t = self.conv1(t)\n",
    "    t = F.relu(t)\n",
    "    t = F.max_pool2d(t, kernel_size=2, stride=2)\n",
    "\n",
    "    # conv 2\n",
    "    t = self.conv2(t)\n",
    "    t = F.relu(t)\n",
    "    t = F.max_pool2d(t, kernel_size=2, stride=2)\n",
    "\n",
    "    # fc1\n",
    "    t = t.reshape(-1, 12*4*4)\n",
    "    t = self.fc1(t)\n",
    "    t = F.relu(t)\n",
    "\n",
    "    # fc2\n",
    "    t = self.fc2(t)\n",
    "    t = F.relu(t)\n",
    "\n",
    "    # output\n",
    "    t = self.out(t)\n",
    "    # don't need softmax here since we'll use cross-entropy as activation.\n",
    "\n",
    "    return t"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jFCciFihHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "Once the layer is defined, we can then use the layer itself to compute the forward results of each layer, coupled with the activation function (ReLU) and Max Pooling operations, we can easily write the forward function of our network as above. Notice that on fc1 (Fully Connect layer 1), we used PyTorch’s tensor operation `t.reshape` to flatten the tensor so it can be passed to the dense layer afterward. Also, we didn’t add the softmax activation function at the output layer since PyTorch’s **CrossEntropy** function will take care of that for us."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_HQrr9peHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "## Hyperparameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GlyKGqyXHqYx",
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "Normally, we can just handpick one set of hyperparameters and do some experiments with them. In this example, we want to do a bit more by introducing some structuring. We’ll build a system to generate different hyperparameter combinations and use them to carry out training ‘runs’. Each ‘run’ uses one set of hyperparameter combinations. Export the training data/results of each run to Tensor Board so we can directly compare and see which hyperparameters set performs the best.\n",
    "\n",
    "We store all our hyperparameters in an [OrderedDict](https://www.geeksforgeeks.org/ordereddict-in-python/):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "CCObGd48HqYx"
   },
   "outputs": [],
   "source": [
    "# put all hyper params into a OrderedDict, easily expandable\n",
    "from collections import OrderedDict\n",
    "params = OrderedDict(\n",
    "    lr = [.01, .001],\n",
    "    batch_size = [100, 1000],\n",
    "    shuffle = [True, False]\n",
    ")\n",
    "epochs = 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FQGxrK5eHqYx"
   },
   "source": [
    "`lr`: Learning Rate. We want to try 0.01 and 0.001 for our models.\n",
    "\n",
    "`batch_size`: Batch Size to speed up the training process. We’ll use 100 and 1000.\n",
    "\n",
    "`shuffle`: Shuffle toggle, whether we shuffle the batch before training.\n",
    "\n",
    "Once the parameters are down, we use two helper classes: `RunBuilder` and `RunManager` to manage our hyperparameters and training process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cb5XbZcYHqYx"
   },
   "source": [
    "### RunBuilder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qaVoES-FHqYy"
   },
   "source": [
    "The main purpose of the class `RunBuilder` is to offer a static method `get_runs`. It takes the `OrderedDict` (with all hyperparameters stored in it) as a parameter and generates a named tuple `Run`, each element of run represents one possible combination of the hyperparameters. This named tuple is later consumed by the training loop. The code is easy to understand.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "VBx06xs5HqYy"
   },
   "outputs": [],
   "source": [
    "# import modules to build RunBuilder and RunManager helper classes\n",
    "from collections import namedtuple\n",
    "from itertools import product\n",
    "\n",
    "# Read in the hyper-parameters and return a Run namedtuple containing all the\n",
    "# combinations of hyper-parameters\n",
    "class RunBuilder():\n",
    "  @staticmethod\n",
    "  def get_runs(params):\n",
    "\n",
    "    Run = namedtuple('Run', params.keys())\n",
    "\n",
    "    runs = []\n",
    "    for v in product(*params.values()):\n",
    "      runs.append(Run(*v))\n",
    "\n",
    "    return runs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3aiQ7cawHqYy"
   },
   "source": [
    "### RunManager"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfV_IA8iHqYy"
   },
   "source": [
    "There are four main purposes of the RunManager class.\n",
    "\n",
    "* Calculate and record the duration of each epoch and run.\n",
    "* Calculate the training loss and accuracy of each epoch and run.\n",
    "* Record the training data (e.g. loss, accuracy, weights, gradients, computational graph, etc.) for each epoch and run, then export them into Tensor Board for further analysis.\n",
    "* Save all training results in csv and json for future reference or API extraction.\n",
    "\n",
    "As you can see, it helps us take care of the logistics which is also important for our success in training the model. Let’s look at the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "E8GkTbMGHqYy"
   },
   "outputs": [],
   "source": [
    "# Helper class, help track loss, accuracy, epoch time, run time,\n",
    "# hyper-parameters etc. Also record to TensorBoard and write into csv, json\n",
    "class RunManager():\n",
    "  def __init__(self):\n",
    "\n",
    "    # tracking every epoch count, loss, accuracy, time\n",
    "    self.epoch_count = 0\n",
    "    self.epoch_loss = 0\n",
    "    self.epoch_num_correct = 0\n",
    "    self.epoch_start_time = None\n",
    "\n",
    "    # tracking every run count, run data, hyper-params used, time\n",
    "    self.run_params = None\n",
    "    self.run_count = 0\n",
    "    self.run_data = []\n",
    "    self.run_start_time = None\n",
    "\n",
    "    # record model, loader and TensorBoard\n",
    "    self.network = None\n",
    "    self.loader = None\n",
    "    self.tb = None\n",
    "\n",
    "  # record the count, hyper-param, model, loader of each run\n",
    "  # record sample images and network graph to TensorBoard\n",
    "  def begin_run(self, run, network, loader):\n",
    "\n",
    "    self.run_start_time = time.time()\n",
    "\n",
    "    self.run_params = run\n",
    "    self.run_count += 1\n",
    "\n",
    "    self.network = network\n",
    "    self.loader = loader\n",
    "    self.tb = SummaryWriter(comment=f'-{run}')\n",
    "\n",
    "    images, labels = next(iter(self.loader))\n",
    "    grid = torchvision.utils.make_grid(images)\n",
    "\n",
    "    self.tb.add_image('images', grid)\n",
    "    self.tb.add_graph(self.network, images)\n",
    "\n",
    "  # when run ends, close TensorBoard, zero epoch count\n",
    "  def end_run(self):\n",
    "    self.tb.close()\n",
    "    self.epoch_count = 0\n",
    "\n",
    "  # zero epoch count, loss, accuracy,\n",
    "  def begin_epoch(self):\n",
    "    self.epoch_start_time = time.time()\n",
    "\n",
    "    self.epoch_count += 1\n",
    "    self.epoch_loss = 0\n",
    "    self.epoch_num_correct = 0\n",
    "\n",
    "  #\n",
    "  def end_epoch(self):\n",
    "    # calculate epoch duration and run duration(accumulate)\n",
    "    epoch_duration = time.time() - self.epoch_start_time\n",
    "    run_duration = time.time() - self.run_start_time\n",
    "\n",
    "    # record epoch loss and accuracy\n",
    "    loss = self.epoch_loss / len(self.loader.dataset)\n",
    "    accuracy = self.epoch_num_correct / len(self.loader.dataset)\n",
    "\n",
    "    # Record epoch loss and accuracy to TensorBoard\n",
    "    self.tb.add_scalar('Loss', loss, self.epoch_count)\n",
    "    self.tb.add_scalar('Accuracy', accuracy, self.epoch_count)\n",
    "\n",
    "    # Record params to TensorBoard\n",
    "    for name, param in self.network.named_parameters():\n",
    "      self.tb.add_histogram(name, param, self.epoch_count)\n",
    "      self.tb.add_histogram(f'{name}.grad', param.grad, self.epoch_count)\n",
    "\n",
    "    # Write into 'results' (OrderedDict) for all run related data\n",
    "    results = OrderedDict()\n",
    "    results[\"run\"] = self.run_count\n",
    "    results[\"epoch\"] = self.epoch_count\n",
    "    results[\"loss\"] = loss\n",
    "    results[\"accuracy\"] = accuracy\n",
    "    results[\"epoch duration\"] = epoch_duration\n",
    "    results[\"run duration\"] = run_duration\n",
    "\n",
    "    # Record hyper-params into 'results'\n",
    "    for k,v in self.run_params._asdict().items(): results[k] = v\n",
    "    self.run_data.append(results)\n",
    "    df = pd.DataFrame.from_dict(self.run_data, orient = 'columns')\n",
    "\n",
    "    # display epoch information and show progress\n",
    "    clear_output(wait=True)\n",
    "    display(df)\n",
    "\n",
    "  # accumulate loss of batch into entire epoch loss\n",
    "  def track_loss(self, loss):\n",
    "    # multiply batch size so variety of batch sizes can be compared\n",
    "    self.epoch_loss += loss.item() * self.loader.batch_size\n",
    "\n",
    "  # accumulate number of corrects of batch into entire epoch num_correct\n",
    "  def track_num_correct(self, preds, labels):\n",
    "    self.epoch_num_correct += self._get_num_correct(preds, labels)\n",
    "\n",
    "  @torch.no_grad()\n",
    "  def _get_num_correct(self, preds, labels):\n",
    "    return preds.argmax(dim=1).eq(labels).sum().item()\n",
    "\n",
    "  # save end results of all runs into csv, json for further analysis\n",
    "  def save(self, fileName):\n",
    "\n",
    "    pd.DataFrame.from_dict(\n",
    "        self.run_data,\n",
    "        orient = 'columns',\n",
    "    ).to_csv(f'{fileName}.csv')\n",
    "\n",
    "    with open(f'{fileName}.json', 'w', encoding='utf-8') as f:\n",
    "      json.dump(self.run_data, f, ensure_ascii=False, indent=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FGqHRtmnHqYy"
   },
   "source": [
    "**`__init__`**: Initialize necessary attributes like count, loss, number of correct predictions, start time, etc.\n",
    "\n",
    "**`begin_run`**: Record run start time so when a run is finished, the duration of the run can be calculated. Create a SummaryWriter object to store everything we want to export into Tensor Board during the run. Write the network graph and sample images into the `SummaryWriter` object.\n",
    "\n",
    "**`end_run`**: When run is finished, close the `SummaryWriter` object and reset the epoch count to 0 (getting ready for next run).\n",
    "\n",
    "**`begin_epoch`**: Record epoch start time so epoch duration can be calculated when epoch ends. Reset epoch_loss and epoch_num_correct.\n",
    "\n",
    "**`end_epoch`**: This function is where most things happen. When an epoch ends, we’ll calculate the epoch duration and the run duration(up to this epoch, not the final run duration unless for the last epoch of the run). We’ll calculate the total loss and accuracy for this epoch, then export the loss, accuracy, weights/biases, gradients we recorded into Tensor Board. For ease of tracking within the Jupyter Notebook, we also created an OrderedDict object results and put all our run data(loss, accuracy, run count, epoch count, run duration, epoch duration, all hyperparameters) into it. Then we’ll use Pandas to read it in and display it in a neat table format.\n",
    "\n",
    "**`track_loss`**, **`track_num_correct`**, **`_get_num_correct`**: These are utility functions to accumulate the loss, number of correct predictions of each batch so the epoch loss and accuracy can be calculated later.\n",
    "\n",
    "**`save`**: Save all run data (a list of results OrderedDict objects for all runs) into csv and json format for further analysis or API access.\n",
    "\n",
    "There is a lot to take in for this `RunManager` class. Congrats on coming to this far! The hardest part is already behind you. From now on everything will start to come together and make sense."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3aiZcoOXHqYy"
   },
   "source": [
    "### Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sZ-9zKzGHqYz"
   },
   "source": [
    "Finally, we are ready to do some training! With the help of our `RunBuilder` and `RunManager` classes, the training process is a breeze:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "C34G2LBbHqYz"
   },
   "outputs": [],
   "source": [
    "## INSERT YOUR CODE HERE\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 802
    },
    "executionInfo": {
     "elapsed": 457018,
     "status": "ok",
     "timestamp": 1698826151097,
     "user": {
      "displayName": "Kede Ma",
      "userId": "03984196065012193056"
     },
     "user_tz": -480
    },
    "id": "RrtJLHG3HqY5",
    "outputId": "8efef6bb-9aeb-4640-cdac-fa65696ecb7b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>run</th>\n",
       "      <th>epoch</th>\n",
       "      <th>loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>epoch duration</th>\n",
       "      <th>run duration</th>\n",
       "      <th>lr</th>\n",
       "      <th>batch_size</th>\n",
       "      <th>shuffle</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.604385</td>\n",
       "      <td>0.769683</td>\n",
       "      <td>9.439022</td>\n",
       "      <td>12.665576</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0.416335</td>\n",
       "      <td>0.845150</td>\n",
       "      <td>9.945075</td>\n",
       "      <td>22.667464</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0.375695</td>\n",
       "      <td>0.858867</td>\n",
       "      <td>9.718419</td>\n",
       "      <td>32.441826</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0.553525</td>\n",
       "      <td>0.793367</td>\n",
       "      <td>10.073782</td>\n",
       "      <td>10.180495</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.383937</td>\n",
       "      <td>0.856150</td>\n",
       "      <td>9.965945</td>\n",
       "      <td>20.210270</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0.355384</td>\n",
       "      <td>0.867350</td>\n",
       "      <td>10.178794</td>\n",
       "      <td>30.451102</td>\n",
       "      <td>0.010</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0.966071</td>\n",
       "      <td>0.632417</td>\n",
       "      <td>9.016396</td>\n",
       "      <td>9.595426</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.504912</td>\n",
       "      <td>0.808150</td>\n",
       "      <td>8.770962</td>\n",
       "      <td>18.430220</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0.413616</td>\n",
       "      <td>0.846700</td>\n",
       "      <td>8.654358</td>\n",
       "      <td>27.143420</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0.988811</td>\n",
       "      <td>0.621150</td>\n",
       "      <td>8.913599</td>\n",
       "      <td>9.486570</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>0.494245</td>\n",
       "      <td>0.811683</td>\n",
       "      <td>8.798919</td>\n",
       "      <td>18.357534</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>0.411937</td>\n",
       "      <td>0.849200</td>\n",
       "      <td>8.633890</td>\n",
       "      <td>27.049403</td>\n",
       "      <td>0.010</td>\n",
       "      <td>1000</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0.777393</td>\n",
       "      <td>0.701650</td>\n",
       "      <td>9.726935</td>\n",
       "      <td>9.820893</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>0.494440</td>\n",
       "      <td>0.814150</td>\n",
       "      <td>9.838894</td>\n",
       "      <td>19.717787</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>0.428938</td>\n",
       "      <td>0.841617</td>\n",
       "      <td>11.178187</td>\n",
       "      <td>30.959978</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0.774533</td>\n",
       "      <td>0.704033</td>\n",
       "      <td>10.195840</td>\n",
       "      <td>10.304769</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>0.505617</td>\n",
       "      <td>0.809717</td>\n",
       "      <td>9.790914</td>\n",
       "      <td>20.157723</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>0.439204</td>\n",
       "      <td>0.839917</td>\n",
       "      <td>9.723019</td>\n",
       "      <td>29.938666</td>\n",
       "      <td>0.001</td>\n",
       "      <td>100</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>1.484513</td>\n",
       "      <td>0.514383</td>\n",
       "      <td>9.324070</td>\n",
       "      <td>9.879909</td>\n",
       "      <td>0.001</td>\n",
       "      <td>1000</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    run  epoch      loss  accuracy  epoch duration  run duration     lr  \\\n",
       "0     1      1  0.604385  0.769683        9.439022     12.665576  0.010   \n",
       "1     1      2  0.416335  0.845150        9.945075     22.667464  0.010   \n",
       "2     1      3  0.375695  0.858867        9.718419     32.441826  0.010   \n",
       "3     2      1  0.553525  0.793367       10.073782     10.180495  0.010   \n",
       "4     2      2  0.383937  0.856150        9.965945     20.210270  0.010   \n",
       "5     2      3  0.355384  0.867350       10.178794     30.451102  0.010   \n",
       "6     3      1  0.966071  0.632417        9.016396      9.595426  0.010   \n",
       "7     3      2  0.504912  0.808150        8.770962     18.430220  0.010   \n",
       "8     3      3  0.413616  0.846700        8.654358     27.143420  0.010   \n",
       "9     4      1  0.988811  0.621150        8.913599      9.486570  0.010   \n",
       "10    4      2  0.494245  0.811683        8.798919     18.357534  0.010   \n",
       "11    4      3  0.411937  0.849200        8.633890     27.049403  0.010   \n",
       "12    5      1  0.777393  0.701650        9.726935      9.820893  0.001   \n",
       "13    5      2  0.494440  0.814150        9.838894     19.717787  0.001   \n",
       "14    5      3  0.428938  0.841617       11.178187     30.959978  0.001   \n",
       "15    6      1  0.774533  0.704033       10.195840     10.304769  0.001   \n",
       "16    6      2  0.505617  0.809717        9.790914     20.157723  0.001   \n",
       "17    6      3  0.439204  0.839917        9.723019     29.938666  0.001   \n",
       "18    7      1  1.484513  0.514383        9.324070      9.879909  0.001   \n",
       "\n",
       "    batch_size  shuffle  \n",
       "0          100     True  \n",
       "1          100     True  \n",
       "2          100     True  \n",
       "3          100    False  \n",
       "4          100    False  \n",
       "5          100    False  \n",
       "6         1000     True  \n",
       "7         1000     True  \n",
       "8         1000     True  \n",
       "9         1000    False  \n",
       "10        1000    False  \n",
       "11        1000    False  \n",
       "12         100     True  \n",
       "13         100     True  \n",
       "14         100     True  \n",
       "15         100    False  \n",
       "16         100    False  \n",
       "17         100    False  \n",
       "18        1000     True  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "m = RunManager()\n",
    "\n",
    "# get all runs from params using RunBuilder class\n",
    "for run in RunBuilder.get_runs(params):\n",
    "\n",
    "    # if params changes, following line of code should reflect the changes too\n",
    "    network = Network()\n",
    "    loader = torch.utils.data.DataLoader(train_set, batch_size = run.batch_size)\n",
    "    optimizer = optim.Adam(network.parameters(), lr=run.lr)\n",
    "\n",
    "    m.begin_run(run, network, loader)\n",
    "    for epoch in range(epochs):\n",
    "\n",
    "      m.begin_epoch()\n",
    "      for batch in loader:\n",
    "\n",
    "        images = batch[0]\n",
    "        labels = batch[1]\n",
    "        preds = network(images)\n",
    "        loss = F.cross_entropy(preds, labels)\n",
    "\n",
    "        optimizer.zero_grad()\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        m.track_loss(loss)\n",
    "        m.track_num_correct(preds, labels)\n",
    "\n",
    "      m.end_epoch()\n",
    "    m.end_run()\n",
    "\n",
    "# when all runs are done, save results to files\n",
    "m.save('results')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Gi02QZFLHqY6"
   },
   "source": [
    "The above code is where real training happens. We read in the images and labels from the batch, use network class to do the forward propagation (remember the forward method above?) and get the predictions. With predictions, we can calculate the loss of this batch using `cross_entropy` function. Once the loss is calculated, we reset the gradients (otherwise PyTorch will accumulate the gradients which is not what we want) with `.zero_grad()`, do one back propagation use `loss.backward()` method to calculate all the gradients of the weights/biases. Then, we use the optimizer defined above to update the weights/biases. Now that the network is updated for the current batch, we’ll calculate the loss and number of correct predictions and accumulate/track them using `track_loss` and `track_num_correct` methods of our `RunManager` class.\n",
    "\n",
    "Once all is finished, we’ll save the results in files using `m.save('results')`.\n",
    "\n",
    "The output of the runs in the notebook looks like this:\n",
    "![./3.png](3.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ly3IdhQJHqY6"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}