{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\\newcommand{\\mat}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\mattr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\matinv}[1]{\\boldsymbol {#1}^{-1}}\n", "\\newcommand{\\vec}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\vectr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\rvar}[1]{\\mathrm {#1}}\n", "\\newcommand{\\rvec}[1]{\\boldsymbol{\\mathrm{#1}}}\n", "\\newcommand{\\diag}{\\mathop{\\mathrm {diag}}}\n", "\\newcommand{\\set}[1]{\\mathbb {#1}}\n", "\\newcommand{\\norm}[1]{\\left\\lVert#1\\right\\rVert}\n", "\\newcommand{\\pderiv}[2]{\\frac{\\partial #1}{\\partial #2}}\n", "\\newcommand{\\bb}[1]{\\boldsymbol{#1}}$$\n", "\n", "# CS236781: Deep Learning\n", "# Tutorial 2: Multilayer Perceptron" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Introduction\n", "\n", "In this tutorial, we will cover:\n", "\n", "* Linear (fully connected) layers\n", "* Activation functions\n", "* 2-Layer MLP implementation from scratch (self study)\n", "* N-layer MLP with PyTorch's `autograd` and `optim` modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-05-06T06:05:49.016538Z", "iopub.status.busy": "2021-05-06T06:05:49.016034Z", "iopub.status.idle": "2021-05-06T06:05:50.095393Z", "shell.execute_reply": "2021-05-06T06:05:50.095954Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Setup\n", "%matplotlib inline\n", "import os\n", "import numpy as np\n", "import sklearn\n", "import torch\n", "import matplotlib.pyplot as plt\n", "from typing import Sequence" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-05-06T06:05:50.099377Z", "iopub.status.busy": "2021-05-06T06:05:50.098809Z", "iopub.status.idle": "2021-05-06T06:05:50.126349Z", "shell.execute_reply": 
"2021-05-06T06:05:50.126947Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "plt.rcParams['font.size'] = 20" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Reminder: Perceptrons and linear models" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The following hypothesis class\n", "$$\n", "\\mathcal{H} =\n", "\\left\\{ h: \\mathcal{X}\\rightarrow\\mathcal{Y}\n", "~\\vert~\n", "h(\\vec{x}) = \\varphi(\\vectr{w}\\vec{x}+b); \\vec{w}\\in\\set{R}^D,~b\\in\\set{R}\\right\\}\n", "$$\n", "where $\\varphi(\\cdot)$ is some nonlinear function, consists of the functions representing the **perceptron** model.\n", "\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Last tutorial: we trained a **logistic regression** model by using\n", "$$\\varphi(\\vec{z})=\\sigma(\\vec{z})=\\frac{1}{1+\\exp(-\\vec{z})}\\in(0,1).$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Limitation**: logistic regression is still a linear classifier. In what sense is it linear, though?\n", "\n", "$$\\hat{y} = \\sigma(\\vectr{w}\\vec{x}+b)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Linear** in the sense that the output depends on the input $\\vec{x}$ only through the linear combination $\\vectr{w}\\vec{x}+b$.\n", "\n", "Decision boundaries are therefore straight lines (hyperplanes in higher dimensions):\n", "\n", "