{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2016-12-02T17:30:29.181539", "start_time": "2016-12-02T17:30:29.172204" }, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true'}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reveal.js\n", "from notebook.services.config import ConfigManager\n", "cm = ConfigManager()\n", "cm.update('livereveal', {\n", " 'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true',\n", "})" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "# %cd ..\n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "util.execute_notebook('language_models.ipynb')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.display import Image\n", "import random" ] }, { "cell_type": "markdown", "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Contextualised Word Representations\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What makes a good word representation? ##\n", "\n", "1. Representations are **distinct**\n", "2. **Similar** words have **similar** representations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What does this mean? 
##\n", "\n", "\n", "* \"Yesterday I saw a bass ...\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_1.jpg?0.37587358775262225\" width=\"300\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_1.jpg'+'?'+str(random.random()), width=300)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_2.svg?0.8377721543924447\" width=\"100\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_2.svg'+'?'+str(random.random()), width=100)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Contextualised Representations #\n", "\n", "* Static embeddings (e.g., [word2vec](dl-representations_simple.ipynb)) have one representation per word *type*, regardless of context\n", "\n", "* Contextualised representations use the context surrounding the word *token*\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Contextualised Representations Example ##\n", "\n", "\n", "* a) \"Yesterday I saw a bass swimming in the lake\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_1.jpg?0.2514734494969232\" width=\"300\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_1.jpg'+'?'+str(random.random()), width=300)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* b) \"Yesterday I saw a bass in the music shop\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_2.svg?0.12294740404377913\" width=\"100\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_2.svg'+'?'+str(random.random()), width=100)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Contextualised Representations Example ##\n", "\n", "\n", "* a) <span style=\"color:red\">\"Yesterday I saw a bass swimming in the lake\"</span>.\n", "* b) <span style=\"color:green\">\"Yesterday I saw a bass in the music shop\"</span>." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_visualisation.jpg?0.0005807542302558311\" width=\"500\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_visualisation.jpg'+'?'+str(random.random()), width=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What makes a good representation? ##\n", "\n", "1. Representations are **distinct**\n", "2. 
**Similar** words have **similar** representations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Additional criterion:\n", "\n", "3. Representations take **context** into account" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## How to train contextualised representations ##\n", "\n", "Basically like word2vec: predict a word from its context (or vice versa).\n", "\n", "We cannot just use a lookup table (i.e., an embedding matrix) any more.\n", "\n", "Train a network with the sequence as input! Does this remind you of anything?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "<center><img src=\"../img/rnnlm.png\"></center>\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The hidden state of an RNN LM is a contextualised word representation!" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_1.png?0.4888163857481026\" width=\"800\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_1.png'+'?'+str(random.random()), width=800)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\"Let's stick to improvisation in this skit\"\n", "\n", "Image credit: http://jalammar.github.io/illustrated-bert/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Bidirectional RNN LM ##\n", "\n", "An RNN (or LSTM) LM only considers preceding context.\n", "\n", "ELMo (Embeddings from Language Models) is based on a biLM: *bidirectional language model* ([Peters et al., 2018](https://www.aclweb.org/anthology/N18-1202/))." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_2.png?0.9144382673436326\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_2.png'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_3.png?0.1076383293975175\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_3.png'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<center><img src=\"../img/quiz_time.png\"></center>\n", "\n", "# [ucph.page.link/bilm](https://ucph.page.link/bilm)\n", "([Responses](https://docs.google.com/forms/d/1BimPo-S12XWt1qOJLXBTIGjRpt-bVW8H7hmT3j0iRRQ/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Solution\n", "\n", "To prevent a word from being used to predict itself, while still allowing the model to consider both preceding and following words."
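] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The biLM idea can be sketched with an off-the-shelf bidirectional LSTM. This is a simplification, not ELMo itself: ELMo builds its inputs with character convolutions, trains the forward and backward LMs with separate parameters, and mixes several layers. The sizes below are arbitrary toy values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "# Toy sizes, chosen only for illustration.\n", "vocab_size, emb_dim, hidden_dim = 100, 16, 32\n", "\n", "embedding = nn.Embedding(vocab_size, emb_dim)\n", "bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)\n", "\n", "tokens = torch.tensor([[5, 12, 7, 42]])  # one sentence of four token ids\n", "states, _ = bilstm(embedding(tokens))    # shape: (1, 4, 2 * hidden_dim)\n", "\n", "# Each row concatenates forward and backward hidden states:\n", "# a contextualised representation of one token.\n", "print(states.shape)  # torch.Size([1, 4, 64])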
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Problem: Long-Term Dependencies ##\n", "\n", "LSTMs have *longer-term* memory, but they still forget.\n", "\n", "Solution: *transformers*! ([Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* In 2022, all state-of-the-art LMs are transformers.\n", " * Yes, also GPT-3" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/transformers.png?0.20799997509039903\" width=\"400\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/transformers.png'+'?'+str(random.random()), width=400)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Summary #\n", "\n", "* Static word embeddings do not differ depending on context\n", "* Contextualised representations are dynamic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Additional Reading #\n", "\n", "+ [Jurafsky & Martin Chapter 11](https://web.stanford.edu/~jurafsky/slp3/11.pdf)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 1 }