{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\n", "\\newcommand{\\mat}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\mattr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\matinv}[1]{\\boldsymbol {#1}^{-1}}\n", "\\newcommand{\\vec}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\vectr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\rvar}[1]{\\mathrm {#1}}\n", "\\newcommand{\\rvec}[1]{\\boldsymbol{\\mathrm{#1}}}\n", "\\newcommand{\\diag}{\\mathop{\\mathrm {diag}}}\n", "\\newcommand{\\set}[1]{\\mathbb {#1}}\n", "\\newcommand{\\norm}[1]{\\left\\lVert#1\\right\\rVert}\n", "\\newcommand{\\pderiv}[2]{\\frac{\\partial #1}{\\partial #2}}\n", "\\newcommand{\\bb}[1]{\\boldsymbol{#1}}\n", "$$\n", "\n", "\n", "# CS236781: Deep Learning\n", "# Tutorial 8: Sequence Models" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introduction\n", "\n", "In this tutorial, we will cover:\n", "\n", "- What RNNs are and how they work\n", "- Implementing basic RNNs models\n", "- Application example: sentiment analysis of movie reviews\n", "- Bonus: Stock prediction\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Setup\n", "%matplotlib inline\n", "import os\n", "import sys\n", "import time\n", "import torch\n", "import matplotlib.pyplot as plt\n", "import warnings\n", "warnings.simplefilter(\"ignore\")\n", "\n", "# pytorch\n", "import torch\n", "import torch.nn as nn\n", "import torchtext\n", "import torchtext.data as data\n", "import torchtext.datasets as datasets\n", "import torch.nn.functional as f" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.rcParams['font.size'] = 20\n", "data_dir = os.path.expanduser('~/.pytorch-datasets')\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Theory Reminders" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Thus far, our models have been composed of fully connected (linear) layers or convolutional layers." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Fully connected layers\n", " - Each layer $l$ operates on the output of the previous layer ($\\vec{y}_{l-1}$) and calculates,\n", " $$\n", " \\vec{y}_l = \\varphi\\left( \\mat{W}_l \\vec{y}_{l-1} + \\vec{b}_l \\right),~\n", " \\mat{W}_l\\in\\set{R}^{n_{l}\\times n_{l-1}},~ \\vec{b}_l\\in\\set{R}^{n_l}.\n", " $$\n", " - FC's have completely pre-fixed input and output dimensions.\n", " \n", "