{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "$$\n",
    "\\newcommand{\\mat}[1]{\\boldsymbol {#1}}\n",
    "\\newcommand{\\mattr}[1]{\\boldsymbol {#1}^\\top}\n",
    "\\newcommand{\\matinv}[1]{\\boldsymbol {#1}^{-1}}\n",
    "\\newcommand{\\vec}[1]{\\boldsymbol {#1}}\n",
    "\\newcommand{\\vectr}[1]{\\boldsymbol {#1}^\\top}\n",
    "\\newcommand{\\rvar}[1]{\\mathrm {#1}}\n",
    "\\newcommand{\\rvec}[1]{\\boldsymbol{\\mathrm{#1}}}\n",
    "\\newcommand{\\diag}{\\mathop{\\mathrm {diag}}}\n",
    "\\newcommand{\\set}[1]{\\mathbb {#1}}\n",
    "\\newcommand{\\norm}[1]{\\left\\lVert#1\\right\\rVert}\n",
    "\\newcommand{\\pderiv}[2]{\\frac{\\partial #1}{\\partial #2}}\n",
    "\\newcommand{\\bb}[1]{\\boldsymbol{#1}}\n",
    "$$\n",
    "\n",
    "\n",
    "# CS236781: Deep Learning\n",
    "# Tutorial 3: Convolutional Neural Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Introduction\n",
    "\n",
    "In this tutorial, we will cover:\n",
    "\n",
    "- Convolutional layers\n",
    "- Pooling layers\n",
    "- Network architecture\n",
    "- Spatial classification with fully-convolutional nets\n",
    "- Residual nets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:40.304945Z",
     "iopub.status.busy": "2022-03-24T07:23:40.300065Z",
     "iopub.status.idle": "2022-03-24T07:23:41.606026Z",
     "shell.execute_reply": "2022-03-24T07:23:41.605724Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Setup\n",
    "%matplotlib inline\n",
    "import os\n",
    "import sys\n",
    "import torch\n",
    "import torchvision\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:41.608025Z",
     "iopub.status.busy": "2022-03-24T07:23:41.607915Z",
     "iopub.status.idle": "2022-03-24T07:23:41.621298Z",
     "shell.execute_reply": "2022-03-24T07:23:41.621040Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "plt.rcParams['font.size'] = 20\n",
    "data_dir = os.path.expanduser('~/.pytorch-datasets')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Theory Reminders"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Multilayer Perceptron (MLP)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Model\n",
    "\n",
    "<center><img src=\"img/mlp.png\" width=1000 /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Composed of multiple **layers**.\n",
    "\n",
    "Each layer $j$ consists of $n_j$ regular perceptrons (\"neurons\") which calculate:\n",
    "$$\n",
    "\\vec{y}_j = \\varphi\\left( \\mat{W}_j \\vec{y}_{j-1} + \\vec{b}_j \\right),~\n",
    "\\mat{W}_j\\in\\set{R}^{n_{j}\\times n_{j-1}},~ \\vec{b}_j\\in\\set{R}^{n_j}.\n",
    "$$\n",
    "\n",
    "- Note that both input and output are **vectors**. We can think of the above equation as describing a layer of **multiple perceptrons**.\n",
    "- We'll henceforth refer to such layers as **fully-connected** or FC layers.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Given an input sample $\\vec{x}^i$, the computed function of an $L$-layer MLP is:\n",
    "$$\n",
    "\\vec{y}_L^i= \\varphi \\left(\n",
    "\\mat{W}_L \\varphi \\left( \\cdots\n",
    "\\varphi \\left( \\mat{W}_1 \\vec{x}^i + \\vec{b}_1 \\right)\n",
    "\\cdots \\right)\n",
    "+ \\vec{b}_L \\right)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Potent hypothesis class**: An MLP with $L>1$ layers can approximate virtually any continuous function on a compact domain, given enough parameters (Cybenko, 1989)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Limitations of MLPs for image classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The number of parameters grows with the number of input pixels (quadratically in the image side length), because every neuron is connected to every pixel.\n",
    "- 28x28 MNIST image: 784 weights per neuron in the first layer\n",
    "- 1000x1000x3 color image: 3M weights **per neuron**\n",
    "    \n",
    "<center><img src=\"img/vanilla_dnn_scale.png\" width=\"700\" alt=\"scale\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Not enough compute\n",
    "\n",
    "* Overfitting\n"
   ]
  },
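  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To make the scaling concrete, here is a minimal sketch that counts the parameters of a single fully-connected layer for the two image sizes above. The hidden width of 100 neurons and the helper name `fc_num_params` are assumptions chosen only for illustration; they are not part of the original tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "\n",
    "def fc_num_params(n_in, n_out):\n",
    "    # Weight matrix (n_out x n_in) plus one bias per output neuron.\n",
    "    return n_out * n_in + n_out\n",
    "\n",
    "n_hidden = 100  # hypothetical layer width, chosen only for illustration\n",
    "\n",
    "# MNIST: small enough to instantiate the layer and count directly.\n",
    "fc_mnist = nn.Linear(28 * 28, n_hidden)\n",
    "mnist_params = sum(p.numel() for p in fc_mnist.parameters())\n",
    "assert mnist_params == fc_num_params(28 * 28, n_hidden)\n",
    "print(f'28x28 MNIST image:       {mnist_params:,} parameters')\n",
    "\n",
    "# 1000x1000x3 image: counted analytically to avoid allocating the layer.\n",
    "large_params = fc_num_params(1000 * 1000 * 3, n_hidden)\n",
    "print(f'1000x1000x3 color image: {large_params:,} parameters')"
   ]
  },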
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Fully-connected layers are highly sensitive to translation, while image features are inherently translation-invariant.\n",
    "\n",
    "<center><img src=\"img/cat_translation.png\" width=\"900\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Despite all these limitations we still want to use deep neural nets because they allow us to learn hierarchical,\n",
    "non-linear transformations of the input."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Convolutional Layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We'll explain how convolutional layers work using three different \"views\", from the least formal to the most formal."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Structural view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Just for intuition, a convolutional layer **can be viewed** as a composition of neurons (as in an FC layer), but with three important distinctions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "1. The neurons can be thought of as stacked in a **3D** grid (instead of 1D).\n",
    "1. Neurons that are at the same depth in the grid **share the same weights** (parameters $\\mat{W},~\\vec{b}$) (represented by color).\n",
    "1. Each neuron is only **connected to a small region** of the previous layer's output (represented by location).\n",
    "\n",
    "\n",
    "   <center><img src=\"img/cnn_layers.jpeg\" width=\"800\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Crucially, each neuron is spatially local, but operates on the **full depth** dimension of its input layer.\n",
    "\n",
    "   <center><img src=\"img/depthcol.jpeg\" width=\"500\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Filter-based view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Since each neuron in a given depth-slice operates on a small region of the input layer, we can think of the combined **output of that depth-slice** as a **filtered version of the input volume**.\n",
    "\n",
    "<center><img src=\"img/cnn_filters.png\" width=\"900\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Imagine sliding the filter along the input and computing an inner product at each point.\n",
    "\n",
    "<center><img src=\"img/filter_resp.png\" width=\"700\" /></center>\n",
    "\n",
    "Since we have multiple depth-slices per convolutional layer, the layer computes multiple convolutions of the same input with different kernels (filters).\n",
    "\n",
    "Each 2D slice of an input or output volume is known as a **feature map** or a **channel**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Formal definitions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Given an input tensor $\\vec{x}$ of shape $(C_{\\text{in}}, H_{\\text{in}}, W_{\\text{in}})$,\n",
    "a convolutional layer produces an output tensor $\\vec{y}$ of shape $(C_{\\text{out}}, H_{\\text{out}}, W_{\\text{out}})$,\n",
    "such that:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "$$\n",
    "\\vec{y}^j = \\sum_{i=1}^{C_\\text{in}} \\vec{w}^{ij}\\ast\\vec{x}^i+b^j;\\ j=1,2,\\dots,C_\\text{out}\n",
    "$$\n",
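    "is the $j$-th feature map (or channel) of the output tensor $\\vec{y}$, $\\ast$ denotes convolution, $\\vec{x}^i$ is the $i$-th input feature map, $\\vec{w}^{ij}$ is the kernel connecting input channel $i$ to output channel $j$, and $b^j$ is the bias of output channel $j$."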
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Recall the definition of the convolution operator:\n",
    "$$\n",
    "\\left\\{\\vec{g}\\ast\\vec{f}\\right\\}_j = \\sum_{i} g_{j-i} f_{i}.\n",
    "$$\n",
    "\n",
    "<center><img src=\"img/conv.gif\" width=\"800\" /></center>\n",
    "\n",
    "Note that in practice, cross-correlation is used instead of convolution, as there's no need to \"flip\" a learned filter."
   ]
  },
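  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "As a minimal sketch of the difference between the two operations, the cell below applies a small 1D kernel with `torch.nn.functional.conv1d`, which computes cross-correlation, and then applies the same function with the kernel flipped, which gives the convolution defined above (restricted to positions where the kernel fully overlaps the signal). The signal and kernel values are arbitrary and chosen only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "f = torch.tensor([1., 2., 3., 4., 5.])  # signal (arbitrary values)\n",
    "g = torch.tensor([1., 0., -1.])         # kernel (arbitrary values)\n",
    "\n",
    "# conv1d expects (batch, channels, length) tensors.\n",
    "f3 = f.view(1, 1, -1)\n",
    "\n",
    "# What conv layers compute: cross-correlation, i.e. no kernel flip.\n",
    "corr = F.conv1d(f3, g.view(1, 1, -1)).flatten()\n",
    "\n",
    "# Flipping the kernel first gives the convolution from the definition.\n",
    "conv = F.conv1d(f3, g.flip(0).view(1, 1, -1)).flatten()\n",
    "\n",
    "print('cross-correlation:', corr.tolist())\n",
    "print('convolution:      ', conv.tolist())"
   ]
  },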
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Convolution is a **linear** and **shift-equivariant** operator."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Linear means it can be represented simply as a matrix multiplication.\n",
    "\n",
    "Shift-equivariance means that a shifted input will result in an output shifted by the same amount.\n",
    "Due to this property, the matrix representing a 1D convolution is always a **Toeplitz** matrix (for 2D convolutions it is a doubly block-Toeplitz matrix).\n",
    "\n",
    "<center><img src=\"img/toeplitz.png\" width=\"500\" /></center>"
   ]
  },
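  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The matrix view can be checked numerically. The sketch below builds the Toeplitz matrix of a 3-tap kernel by hand and verifies that multiplying by it matches `torch.nn.functional.conv1d` (no padding, stride 1, and no kernel flip, as in conv layers). The kernel, the signal and the variable names are assumptions made only for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn.functional as F\n",
    "\n",
    "x = torch.arange(1., 7.)         # signal of length 6: [1, 2, ..., 6]\n",
    "w = torch.tensor([1., 2., 3.])   # kernel of length 3\n",
    "n, k = x.numel(), w.numel()\n",
    "\n",
    "# Each row holds the kernel shifted one step to the right,\n",
    "# so every diagonal of T is constant (Toeplitz structure).\n",
    "T = torch.zeros(n - k + 1, n)\n",
    "for r in range(n - k + 1):\n",
    "    T[r, r:r + k] = w\n",
    "\n",
    "y_mat = T @ x\n",
    "y_conv = F.conv1d(x.view(1, 1, -1), w.view(1, 1, -1)).flatten()\n",
    "\n",
    "print(T)\n",
    "print('matrix product:', y_mat.tolist())\n",
    "print('conv1d:        ', y_conv.tolist())\n",
    "assert torch.allclose(y_mat, y_conv)"
   ]
  },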
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Hyperparameters & dimensions\n",
    "\n",
    "Assume an input volume of shape $(C_{\\mathrm{in}}, H_{\\mathrm{in}}, W_{\\mathrm{in}})$, i.e. channels, height, width.\n",
    "Define,\n",
    "\n",
    "1. Number of kernels, $K \\geq 1$.\n",
    "2. Spatial extent (size) of each kernel, $F \\geq 1$. \n",
    "3. Stride $S\\geq 1$: spatial distance between consecutive applications of a kernel.\n",
    "4. Padding $P\\geq 0$: Number of \"pixels\" to zero-pad around each input feature map.\n",
    "5. Dilation $D \\geq 1$: Spacing between kernel elements when applying to input."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "In the following animations, **blue** maps are inputs,\n",
    "**green** maps are outputs and\n",
    "the **shaded** area is the kernel with $F=3$.\n",
    "\n",
    "| $P=0,~S=1,~D=1$ | $P=1,~S=1,~D=1$ | $P=1,~S=2,~D=1$ | $P=0,~S=1,~D=2$ |\n",
    "|-----------------|-----------------|-----------------| --------------- |\n",
    "|<img src=\"https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/no_padding_no_strides.gif\" width=\"250\"/>| <img src=\"https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/same_padding_no_strides.gif\" width=\"250\"/> | <img src=\"https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/padding_strides.gif\" width=\"250\"/> | <img src=\"https://raw.githubusercontent.com/vdumoulin/conv_arithmetic/master/gif/dilation.gif\" width=\"250\"/> |\n",
    "\n",
    "\n",
    "We can see that the second combination, $F=3,~P=1,~S=1,~D=1$, leads to identical sizes of input and output feature maps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A 3D view\n",
    "\n",
    "| $P=0,~S=1,~D=1$ | $P=1,~S=1,~D=1$ | $P=1,~S=2,~D=1$ |\n",
    "|-----------------|-----------------|-----------------|\n",
    "|<img src=\"https://animatedai.github.io/media/convolution-animation-3x3-kernel.gif\" width=\"400\"/>| <img src=\"https://animatedai.github.io/media/convolution-animation-3x3-kernel-same-padding.gif\" width=\"400\"/> | <img src=\"https://animatedai.github.io/media/convolution-animation-3x3-kernel-stride-2-same-padding.gif\" width=\"400\"/> | \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Then, given a set of hyperparameters,\n",
    "\n",
    "- Each convolution kernel will (usually) be a tensor of shape $(C_{\\mathrm{in}}, F, F)$.\n",
    "- The output volume dimensions will be:\n",
    "\n",
    "  $$\\begin{align}\n",
    "  H_{\\mathrm{out}} &= \\left\\lfloor \\frac{H_{\\mathrm{in}} + 2P - D\\cdot(F-1) -1}{S} \\right\\rfloor + 1\\\\\n",
    "  W_{\\mathrm{out}} &= \\left\\lfloor \\frac{W_{\\mathrm{in}} + 2P - D\\cdot(F-1) -1}{S} \\right\\rfloor + 1\\\\\n",
    "  C_{\\mathrm{out}} &= K\\\\\n",
    "  \\end{align}$$"
   ]
  },
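  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The output-size formula can be sanity-checked against PyTorch. The helper `conv_out_size` below is a hypothetical function written for this sketch (it is not a PyTorch API); it implements the formula above and compares it with the shape produced by `nn.Conv2d` for the four hyperparameter combinations shown in the animations. The $7\\times 7$ input size is an arbitrary choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "def conv_out_size(n_in, kernel, stride=1, padding=0, dilation=1):\n",
    "    # Floor division implements the floor in the formula.\n",
    "    return (n_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1\n",
    "\n",
    "n_in = 7  # arbitrary spatial size for the check\n",
    "# (kernel, stride, padding, dilation), matching the animations above.\n",
    "configs = [(3, 1, 0, 1), (3, 1, 1, 1), (3, 2, 1, 1), (3, 1, 0, 2)]\n",
    "\n",
    "for kernel, stride, padding, dilation in configs:\n",
    "    layer = nn.Conv2d(1, 1, kernel_size=kernel, stride=stride,\n",
    "                      padding=padding, dilation=dilation)\n",
    "    out = layer(torch.zeros(1, 1, n_in, n_in))\n",
    "    expected = conv_out_size(n_in, kernel, stride, padding, dilation)\n",
    "    assert out.shape[-2:] == (expected, expected)\n",
    "    print(f'F={kernel}, S={stride}, P={padding}, D={dilation}: {n_in} -> {expected}')"
   ]
  },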
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "- The number of parameters in a convolutional **layer** will be:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "$$\n",
    "\\underbrace{K}_{\\mathrm{kernels}} \\cdot \\left(\n",
    "\\underbrace{C_{\\mathrm{in}} \\cdot F^2}_{\\mathrm{kernel\\ parameters}} + \\underbrace{1}_{\\mathrm{bias\\ term}}\n",
    "\\right)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Example**: Input image is 1000x1000x3, and the first conv layer has $10$ kernels of size 5x5.\n",
    "The number of parameters in the first layer will be: $ 10 \\cdot 3 \\cdot 5^2 + 10 = 760 $.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### PyTorch `Conv2d` layer example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:41.624486Z",
     "iopub.status.busy": "2022-03-24T07:23:41.624382Z",
     "iopub.status.idle": "2022-03-24T07:23:42.462133Z",
     "shell.execute_reply": "2022-03-24T07:23:42.461799Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n"
     ]
    }
   ],
   "source": [
    "import torchvision.transforms as tvtf\n",
    "\n",
    "tf = tvtf.Compose([tvtf.ToTensor()])\n",
    "ds_cifar10 = torchvision.datasets.CIFAR10(data_dir, download=True, train=True, transform=tf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.464224Z",
     "iopub.status.busy": "2022-03-24T07:23:42.464117Z",
     "iopub.status.idle": "2022-03-24T07:23:42.479634Z",
     "shell.execute_reply": "2022-03-24T07:23:42.479339Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x0 shape with batch dim: torch.Size([1, 3, 32, 32])\n"
     ]
    }
   ],
   "source": [
    "# Load first CIFAR10 image\n",
    "x0,y0 = ds_cifar10[0]\n",
    "\n",
    "# add batch dimension\n",
    "x0 = x0.unsqueeze(0)\n",
    "\n",
    "# Note: channels come before spatial extent\n",
    "print('x0 shape with batch dim:', x0.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.481747Z",
     "iopub.status.busy": "2022-03-24T07:23:42.481629Z",
     "iopub.status.idle": "2022-03-24T07:23:42.494972Z",
     "shell.execute_reply": "2022-03-24T07:23:42.494679Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# A function to count the number of parameters in an nn.Module.\n",
    "def num_params(layer):\n",
    "    return sum([p.numel() for p in layer.parameters()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Let's create our first conv layer with PyTorch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.496857Z",
     "iopub.status.busy": "2022-03-24T07:23:42.496774Z",
     "iopub.status.idle": "2022-03-24T07:23:42.511150Z",
     "shell.execute_reply": "2022-03-24T07:23:42.510875Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1: 760 parameters\n"
     ]
    }
   ],
   "source": [
    "import torch.nn as nn\n",
    "\n",
    "# First conv layer: works on input image volume\n",
    "conv1 = nn.Conv2d(in_channels=x0.shape[1], out_channels=10, padding=1, kernel_size=5, stride=1, dilation=1)\n",
    "\n",
    "print(f'conv1: {num_params(conv1)} parameters')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Number of parameters: $10\\cdot(3\\cdot5^2+1)=760$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.512891Z",
     "iopub.status.busy": "2022-03-24T07:23:42.512788Z",
     "iopub.status.idle": "2022-03-24T07:23:42.532553Z",
     "shell.execute_reply": "2022-03-24T07:23:42.532226Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Input image shape:       torch.Size([1, 3, 32, 32])\n",
      "After first conv layer:  torch.Size([1, 10, 30, 30])\n"
     ]
    }
   ],
   "source": [
    "# Apply the layer to an input\n",
    "print(f'{\"Input image shape:\":25s}{x0.shape}')\n",
    "\n",
    "y1 = conv1(x0)\n",
    "print(f'{\"After first conv layer:\":25s}{y1.shape}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.534579Z",
     "iopub.status.busy": "2022-03-24T07:23:42.534466Z",
     "iopub.status.idle": "2022-03-24T07:23:42.553679Z",
     "shell.execute_reply": "2022-03-24T07:23:42.553246Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv2: 9820 parameters\n",
      "After second conv layer: torch.Size([1, 20, 12, 12])\n"
     ]
    }
   ],
   "source": [
    "# Second conv layer: works on output volume of first layer\n",
    "conv2 = nn.Conv2d(in_channels=10, out_channels=20, padding=0, kernel_size=7, stride=2)\n",
    "print(f'conv2: {num_params(conv2)} parameters')\n",
    "\n",
    "y2 = conv2(conv1(x0))\n",
    "print(f'{\"After second conv layer:\":25s}{y2.shape}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "New spatial extent (the input to the second layer is the $30\\times 30$ output of the first layer, and $F=7,~S=2,~P=0,~D=1$):\n",
    "\n",
    "$$\n",
    "H_{\\mathrm{out}} = \\left\\lfloor \\frac{H_{\\mathrm{in}} + 2P -F}{S} \\right\\rfloor + 1\n",
    "=\n",
    "\\left\\lfloor \\frac{30 + 2\\cdot 0 -7}{2} \\right\\rfloor + 1\n",
    "=\n",
    "12\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Note**: observe that the width and height dimensions of the input image were never specified!\n",
    "More on the significance of that later."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pooling layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "In addition to strides, another way to reduce the size of feature maps between convolutional layers\n",
    "is to add **pooling** layers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A pooling layer has the following hyperparameters (but **no trainable parameters**):\n",
    "\n",
    "1. Spatial extent (size) of each pooling kernel, $F \\geq 2$. \n",
    "1. Stride $S\\geq 1$ (typically $S=F$, so that pooling windows do not overlap): spatial distance between consecutive applications.\n",
    "1. Operation (e.g. max, average, $p$-norm)\n",
    "\n",
    "**Example**: $\\max$-pooling with $F=2,~S=2$ performing a factor-2 downsample:\n",
    "\n",
    "<center><img src=\"img/maxpool.png\" width=\"600\" /></center>"
   ]
  },
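  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of the same idea in code: max-pooling with $F=2,~S=2$ applied to a small $4\\times 4$ feature map. The input values are arbitrary and chosen only so that the maximum of each $2\\times 2$ window is easy to read off."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "x = torch.tensor([[ 1.,  2.,  5.,  6.],\n",
    "                  [ 3.,  4.,  7.,  8.],\n",
    "                  [ 9., 10., 13., 14.],\n",
    "                  [11., 12., 15., 16.]])\n",
    "\n",
    "pool = nn.MaxPool2d(kernel_size=2, stride=2)\n",
    "\n",
    "# MaxPool2d expects (batch, channels, height, width).\n",
    "pooled = pool(x.view(1, 1, 4, 4))\n",
    "print(pooled.squeeze())\n",
    "\n",
    "# Pooling has hyperparameters but no trainable parameters.\n",
    "n_pool_params = sum(p.numel() for p in pool.parameters())\n",
    "print('trainable parameters:', n_pool_params)"
   ]
  },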
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Why pool feature maps after convolutions?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "One reason is to more rapidly increase the **receptive field** of each layer.\n",
    "\n",
    "<center><img src=\"img/receptive_field2.png\" width=\"500\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "- Receptive field size increases more rapidly if we add pooling, strides or dilation.\n",
    "- We want successive conv layers to be affected by increasingly larger parts of the input image.\n",
    "- This allows us to learn a hierarchy of visual features.\n",
    "\n",
    "<center><img src=\"img/feature_hierarchy.png\" width=\"700\" /></center>"
   ]
  },
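  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To see how much faster the receptive field grows with pooling, the sketch below uses the standard recurrence for a stack of layers without dilation: each layer with kernel size $k$ enlarges the receptive field by $(k-1)$ times the product of all preceding strides. The function name `receptive_field` and the two example layer stacks are assumptions made for illustration; they are not taken from the tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def receptive_field(layers):\n",
    "    # layers: list of (kernel_size, stride) pairs, ordered from input to output.\n",
    "    r = 1     # receptive field size, in input pixels\n",
    "    jump = 1  # distance in input pixels between adjacent outputs\n",
    "    for kernel, stride in layers:\n",
    "        r += (kernel - 1) * jump\n",
    "        jump *= stride\n",
    "    return r\n",
    "\n",
    "# Three 3x3 convolutions with stride 1.\n",
    "convs_only = [(3, 1), (3, 1), (3, 1)]\n",
    "# The same convolutions with a 2x2, stride-2 pooling layer after the first two.\n",
    "convs_with_pool = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]\n",
    "\n",
    "rf_plain = receptive_field(convs_only)\n",
    "rf_pooled = receptive_field(convs_with_pool)\n",
    "print(f'conv only:       {rf_plain}x{rf_plain}')\n",
    "print(f'conv + pooling:  {rf_pooled}x{rf_pooled}')"
   ]
  },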
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Another reason is to add **invariance** to changes in the input.\n",
    "\n",
    "- Pooling within feature maps: introduces invariance to small translations\n",
    "  <center><img src=\"img/maxpool.png\" width=\"600\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "- Pooling across feature maps: introduces invariance to learned transformations\n",
    "  <center><img src=\"img/pooling_invariance.png\" width=\"600\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### PyTorch `MaxPool2d` layer example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.556622Z",
     "iopub.status.busy": "2022-03-24T07:23:42.556478Z",
     "iopub.status.idle": "2022-03-24T07:23:42.574020Z",
     "shell.execute_reply": "2022-03-24T07:23:42.573654Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "After second conv layer: torch.Size([1, 20, 12, 12])\n",
      "After max-pool:          torch.Size([1, 20, 6, 6])\n"
     ]
    }
   ],
   "source": [
    "pool = nn.MaxPool2d(kernel_size=2, stride=2)\n",
    "\n",
    "print(f'{\"After second conv layer:\":25s}{conv2(conv1(x0)).shape}')\n",
    "print(f'{\"After max-pool:\":25s}{pool(conv2(conv1(x0))).shape}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Network Architecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The basic way to build a deep convolutional neural net is to repeat groups of **conv-relu** layers, optionally add **pooling** in between, and end with an **FC-softmax** combination.\n",
    "\n",
    "\n",
    "Why does such a scheme make sense, e.g. for image classification?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<center><img src=\"img/arch.png\" width=\"700\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "In the above image,\n",
    "\n",
    "- All the **conv** blocks shown are actually **conv-relu** (or some other nonlinearity).\n",
    "- The repeating **conv-conv-...-pool** blocks are learned, non-linear feature extractors: they learn to detect specific features in an image (e.g. lines at different orientations).\n",
    "- The pooling controls the receptive field increase, so that more high-level features can be generated by each conv group (e.g. shapes composed from multiple simple lines).\n",
    "- The **FC-softmax** at the end is just an MLP that uses the extracted features for classification.\n",
    "- Training end-to-end learns the classifier together with the features!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- The rightmost architecture is called VGG; it used to be a popular architecture for ImageNet classification.\n",
    "- Other types of layers, such as normalization layers, are usually also added."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "There are many other things to consider as part of the architecture:\n",
    "- Size of conv kernels\n",
    "- Number of consecutive convolutions\n",
    "- Use of batch normalization to speed up training\n",
    "- Dropout for improved generalization\n",
    "- Not using FC layers (we'll see later)\n",
    "- Skip connections (we'll see later)\n",
    "\n",
    "All of these could be hyperparameters to cross-validate over!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Many different network architectures exist, made famous mainly by repeated improvements on the ImageNet classification challenge since 2012.\n",
    "\n",
    "<center><img src=\"img/net_archs.png\" width=\"1100\" /></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Notable ImageNet-winning architectures:\n",
    "\n",
    "- AlexNet, 8 layers (5 conv + 3 FC) (2012): Based on LeNet, deeper, with ReLU, trained with GPUs\n",
    "- Inception/GoogLeNet, 22 layers (2014): Multiple (small) kernel sizes at same depth\n",
    "- ResNet, 152 (!) layers (2015): Skip connections"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### What filters are deep CNNs learning?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "CNNs capture hierarchical features, with deeper layers capturing higher-level, class-specific features\n",
    "(Zeiler & Fergus, 2013).\n",
    "\n",
    "<center><img src=\"img/zf1.png\" width=\"1000\"/></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "This visualization shows patterns that maximally activate kernels at various layers of a conv net."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### PyTorch network architecture example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Let's implement **LeNet**, arguably the first successful CNN model for MNIST (LeCun, 1998).\n",
    "\n",
    "<center><img src=\"https://cdn-images-1.medium.com/max/1600/1*1TI1aGBZ4dybR6__DI9dzA.png\" width=\"1100\" /></center>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.576853Z",
     "iopub.status.busy": "2022-03-24T07:23:42.576761Z",
     "iopub.status.idle": "2022-03-24T07:23:42.593373Z",
     "shell.execute_reply": "2022-03-24T07:23:42.593067Z"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "class LeNet(nn.Module):\n",
    "    def __init__(self, in_channels=3):\n",
    "        super().__init__()\n",
    "        self.feature_extractor = nn.Sequential(\n",
    "            nn.Conv2d(in_channels, out_channels=6, kernel_size=5),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),\n",
    "            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),\n",
    "        )\n",
    "        self.classifier = nn.Sequential(\n",
    "            nn.Linear(16*5*5, 120),  # Why 16*5*5 ?\n",
    "            nn.ReLU(), \n",
    "            nn.Linear(120, 84), # (N, 120) -> (N, 84)\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(84, 10)   # (N, 84)  -> (N, 10)\n",
    "        )\n",
    "    def forward(self, x):\n",
    "        features = self.feature_extractor(x)\n",
    "        features = features.view(features.size(0), -1)\n",
    "        class_scores = self.classifier(features)\n",
    "        return class_scores"
   ]
  },
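  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "One way to answer the \"Why 16\\*5\\*5?\" question in the code comment is to trace the tensor shape through the feature extractor. The sketch below rebuilds the same conv-relu-pool sequence as a standalone `nn.Sequential` (so that the cell is self-contained) and feeds it a dummy $32\\times 32$ RGB input, which is an assumption matching the CIFAR-10 image size used earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "# Same layer sequence as the LeNet feature extractor above.\n",
    "feature_extractor = nn.Sequential(\n",
    "    nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),\n",
    "    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),\n",
    ")\n",
    "\n",
    "x = torch.zeros(1, 3, 32, 32)  # dummy input with CIFAR-10 dimensions\n",
    "shapes = []\n",
    "for layer in feature_extractor:\n",
    "    x = layer(x)\n",
    "    shapes.append(tuple(x.shape))\n",
    "    print(f'{layer.__class__.__name__:10s} -> {tuple(x.shape)}')\n",
    "\n",
    "# Spatial size: 32 -> 28 (5x5 conv) -> 14 (pool) -> 10 (5x5 conv) -> 5 (pool),\n",
    "# so flattening the 16 channels gives 16*5*5 = 400 features per sample.\n",
    "n_features = x.flatten(1).shape[1]\n",
    "print('flattened features:', n_features)"
   ]
  },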
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.595148Z",
     "iopub.status.busy": "2022-03-24T07:23:42.595066Z",
     "iopub.status.idle": "2022-03-24T07:23:42.608738Z",
     "shell.execute_reply": "2022-03-24T07:23:42.608473Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LeNet(\n",
      "  (feature_extractor): Sequential(\n",
      "    (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (4): ReLU()\n",
      "    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (classifier): Sequential(\n",
      "    (0): Linear(in_features=400, out_features=120, bias=True)\n",
      "    (1): ReLU()\n",
      "    (2): Linear(in_features=120, out_features=84, bias=True)\n",
      "    (3): ReLU()\n",
      "    (4): Linear(in_features=84, out_features=10, bias=True)\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net = LeNet()\n",
    "print(net)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.610411Z",
     "iopub.status.busy": "2022-03-24T07:23:42.610309Z",
     "iopub.status.idle": "2022-03-24T07:23:42.661448Z",
     "shell.execute_reply": "2022-03-24T07:23:42.661183Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x0 shape= torch.Size([1, 3, 32, 32])\n",
      "\n",
      "LeNet(x0)= tensor([[-0.0388,  0.0337, -0.0120, -0.0205, -0.0326,  0.0651, -0.0826, -0.0595,\n",
      "          0.0831,  0.1228]], grad_fn=<AddmmBackward0>)\n",
      "\n",
      "shape= torch.Size([1, 10])\n"
     ]
    }
   ],
   "source": [
    "# Test forward pass\n",
    "print('x0 shape=', x0.shape, end='\\n\\n')\n",
    "print('LeNet(x0)=', net(x0), end='\\n\\n')\n",
    "print('shape=', net(x0).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Fully-convolutional Networks\n",
    "Optional reading: this part is not required for the homework."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Notice how we never actually specified the input image size when implementing the network.\n",
    "\n",
    "**Does this mean we can use the network on images of any size?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "**No**, because of the FC layers at the end: the first one expects exactly `16*5*5 = 400` input features.\n",
    "\n",
    "Let's try it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.663315Z",
     "iopub.status.busy": "2022-03-24T07:23:42.663210Z",
     "iopub.status.idle": "2022-03-24T07:23:42.698988Z",
     "shell.execute_reply": "2022-03-24T07:23:42.698684Z"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "mat1 and mat2 shapes cannot be multiplied (1x2704 and 400x120)\n"
     ]
    }
   ],
   "source": [
    "large_image = torch.randn(1,3,32*2,32*2)\n",
    "try:\n",
    "    net(large_image)\n",
    "except RuntimeError as e:\n",
    "    print(e, file=sys.stderr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
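    "However, only the FC layers at the end depend on the exact image size. The convolution and pooling layers accept any sufficiently large input; for the 64x64 image they simply produce `16*13*13 = 2704` features instead of 400."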
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "We can replace them with... more convolutions, of course!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "What would we get from:\n",
    "\n",
    "- Kernels of size 1x1?\n",
    "- Kernels of size HxW (full spatial extent)?\n",
    "\n",
    "\n",
    "<center><img src=\"img/1x1_conv.png\" width=\"600\" /></center>"
   ]
  },
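  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "A kernel that covers the full spatial extent computes the same thing as an FC layer on the flattened features, and a 1x1 kernel applies one FC layer independently at every spatial location. The following is a minimal sketch of the first claim, assuming a `(16, 5, 5)` feature volume; the names `fc`, `conv` and `features` are made up for this check.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "torch.manual_seed(0)\n",
    "fc = nn.Linear(16 * 5 * 5, 120)\n",
    "conv = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5)\n",
    "\n",
    "# Copy the FC parameters into the convolution:\n",
    "# each row of the FC weight matrix becomes one (16, 5, 5) kernel\n",
    "with torch.no_grad():\n",
    "    conv.weight.copy_(fc.weight.view(120, 16, 5, 5))\n",
    "    conv.bias.copy_(fc.bias)\n",
    "\n",
    "features = torch.randn(2, 16, 5, 5)\n",
    "out_fc = fc(features.view(2, -1))   # shape (2, 120)\n",
    "out_conv = conv(features)           # shape (2, 120, 1, 1)\n",
    "same = torch.allclose(out_fc, out_conv.view(2, -1), atol=1e-5)\n",
    "print(tuple(out_conv.shape), same)\n",
    "```\n",
    "\n",
    "On a larger feature map the same 5x5 kernel simply slides, producing one such FC output per location."
   ]
  },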
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Let's create a fully-convolutional LeNet:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.700836Z",
     "iopub.status.busy": "2022-03-24T07:23:42.700733Z",
     "iopub.status.idle": "2022-03-24T07:23:42.715096Z",
     "shell.execute_reply": "2022-03-24T07:23:42.714801Z"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "class LeNetFullyConv(LeNet):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        # Remember: the last feature map volume has shape (16,5,5) for the original image size\n",
    "        # Override the classifier with 5x5 then 1x1 convolutions\n",
    "        # Try to figure out the output shape after each of the following convolutions:\n",
    "        self.classifier = nn.Sequential(\n",
    "            nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5), # no padding or strides!\n",
    "            nn.ReLU(),\n",
    "            nn.Conv2d(in_channels=120, out_channels=84, kernel_size=1), # 1x1 conv\n",
    "            nn.ReLU(),\n",
    "            nn.Conv2d(in_channels=84, out_channels=10, kernel_size=1),  # 1x1 conv\n",
    "        )\n",
    "        \n",
    "    def forward(self, x):\n",
    "        # Using feature extractor block from the base model\n",
    "        features = self.feature_extractor(x)\n",
    "        # note: no need to reshape the features now\n",
    "        class_scores = self.classifier(features)\n",
    "        return class_scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.716758Z",
     "iopub.status.busy": "2022-03-24T07:23:42.716679Z",
     "iopub.status.idle": "2022-03-24T07:23:42.731167Z",
     "shell.execute_reply": "2022-03-24T07:23:42.730846Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LeNetFullyConv(\n",
      "  (feature_extractor): Sequential(\n",
      "    (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (4): ReLU()\n",
      "    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
      "  )\n",
      "  (classifier): Sequential(\n",
      "    (0): Conv2d(16, 120, kernel_size=(5, 5), stride=(1, 1))\n",
      "    (1): ReLU()\n",
      "    (2): Conv2d(120, 84, kernel_size=(1, 1), stride=(1, 1))\n",
      "    (3): ReLU()\n",
      "    (4): Conv2d(84, 10, kernel_size=(1, 1), stride=(1, 1))\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "net_fully_conv = LeNetFullyConv()\n",
    "print(net_fully_conv)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Let's forward the original-sized image and the larger image through the network and observe the output shapes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-03-24T07:23:42.733071Z",
     "iopub.status.busy": "2022-03-24T07:23:42.732968Z",
     "iopub.status.idle": "2022-03-24T07:23:42.763894Z",
     "shell.execute_reply": "2022-03-24T07:23:42.763569Z"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "regular image output shape: torch.Size([1, 10, 1, 1])\n",
      "large   image output shape: torch.Size([1, 10, 9, 9])\n"
     ]
    }
   ],
   "source": [
    "print('regular image output shape:', net_fully_conv(x0).shape)\n",
    "print('large   image output shape:', net_fully_conv(large_image).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "**What's the meaning of the output after conversion to fully convolutional?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
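    "It's now a **spatial classification map**: each output location holds the class scores for one 32x32 window of the input, and neighboring windows are 4 pixels apart.\n",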
    "\n",
    "<center><img src=\"img/fully_conv.png\" width=\"1000\" /></center>\n"
   ]
  },
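  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "The 9x9 output for the large image can be reproduced with receptive-field arithmetic. This is a rough sketch, assuming the layer list of the fully-convolutional LeNet (the 1x1 convolutions are omitted since they do not change the spatial extent) and image sizes for which the pooling divides evenly; `window`, `step` and `num_windows` are illustrative names.\n",
    "\n",
    "```python\n",
    "# (kernel, stride) of each layer: conv, pool, conv, pool, 5x5 classifier conv\n",
    "layers = [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1)]\n",
    "\n",
    "# Receptive field of one output location and distance between\n",
    "# neighboring receptive fields, both measured in input pixels\n",
    "window, step = 1, 1\n",
    "for kernel, stride in layers:\n",
    "    window += (kernel - 1) * step\n",
    "    step *= stride\n",
    "\n",
    "def num_windows(image_size):\n",
    "    # Number of window positions along one image dimension\n",
    "    return (image_size - window) // step + 1\n",
    "\n",
    "print(window, step, num_windows(32), num_windows(64))\n",
    "```\n",
    "\n",
    "Each output location therefore sees a 32x32 window, the windows are 4 pixels apart, and a 64x64 image contains 9 window positions along each dimension."
   ]
  },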
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Residual Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<center><img src=\"img/deeper_meme.jpeg\"/></center>\n",
    "\n",
    "For image-related tasks it seems that **deeper is better**: deeper networks can learn more complex features.\n",
    "\n",
    "How deep can we go? Should more depth always improve results?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "In theory, adding an additional layer should provide **at least** the same accuracy as before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Extra layers could always be just identity maps.\n",
    "\n",
    "In practice, there are two major problems with adding depth:\n",
    "1. More difficult convergence: vanishing gradients\n",
    "1. More difficult optimization: parameter space increases"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center><img src=\"img/resnet_plain_deep_error.png\" width=\"800\"/></center>\n",
    "\n",
    "That is, even though a solution that is at least as good exists, SGD-based optimization fails to find it: the **optimization error** increases with depth."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "ResNets attempt to address these issues by building a network architecture composed of convolutional blocks with added **shortcut-connections**:\n",
    "\n",
    "<center><img src=\"img/resnet_block2.png\" width=\"900\"/></center>\n",
    "\n",
    "(Left: basic block; right: bottleneck block).\n",
    "\n",
    "Here the weight layers are `3x3` or `1x1` convolutions followed by batch-normalization.\n",
    "\n",
    "**Why do these shortcut-connections help?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "These shortcuts create two key advantages:\n",
    "- Allow gradients to \"flow\" freely backwards\n",
    "- Each block only learns the \"residual mapping\", i.e., some delta from the identity map, which is easier to optimize."
   ]
  },
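  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "One way to see the first point, as a sketch: write a block as $\\vec{y} = \\vec{x} + F(\\vec{x})$, where $F$ stands for the stacked weight layers (a symbol introduced here for illustration) and the final ReLU after the addition is ignored. For a loss $L$, the chain rule gives\n",
    "\n",
    "$$\n",
    "\\pderiv{L}{\\vec{x}} = \\pderiv{L}{\\vec{y}} \\left( \\mat{I} + \\pderiv{F}{\\vec{x}} \\right).\n",
    "$$\n",
    "\n",
    "The identity term passes the upstream gradient through unchanged, even when $\\pderiv{F}{\\vec{x}}$ is small."
   ]
  },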
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Implementation: In the homeworks :)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Thanks!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "**Credits**\n",
    "\n",
    "This tutorial was written by [Aviv A. Rosenberg](https://avivr.net) and [Moshe Kimhi](https://mkimhi.github.io/).<br>\n",
    "To re-use, please provide attribution and link to the original.\n",
    "\n",
    "Some images in this tutorial were taken and/or adapted from the following sources:\n",
    "\n",
    "- Sebastian Raschka, https://sebastianraschka.com/\n",
    "- Deep Learning, Goodfellow, Bengio and Courville, MIT Press, 2016\n",
    "- Fundamentals of Deep Learning, Nikhil Buduma, O'Reilly, 2017\n",
    "- Deep Learning with Python, François Chollet, Manning, 2018\n",
    "- Stanford cs231n course notes by Andrej Karpathy\n",
    "- https://github.com/vdumoulin/conv_arithmetic\n",
    "- Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition.\n",
    "- Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications.\n",
    "- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition.\n",
    "- A Comprehensive Introduction to Different Types of Convolutions in Deep Learning, Kunlun Bai\n",
    "- https://animatedai.github.io/"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  },
  "rise": {
   "scroll": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}