{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2016-12-02T17:30:29.181539", "start_time": "2016-12-02T17:30:29.172204" }, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true'}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reveal.js\n", "from notebook.services.config import ConfigManager\n", "cm = ConfigManager()\n", "cm.update('livereveal', {\n", " 'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true',\n", "})" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "# %cd ..\n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "util.execute_notebook('language_models.ipynb')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.display import Image\n", "import random" ] }, { "cell_type": "markdown", "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Contextualised Word Representations\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What makes a good word representation? ##\n", "\n", "1. Representations are **distinct**\n", "2. **Similar** words have **similar** representations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What does this mean? 
##\n", "\n", "\n", "* \"Yesterday I saw a bass ...\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_1.jpg?0.37587358775262225\" width=\"300\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_1.jpg'+'?'+str(random.random()), width=300)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_2.svg?0.8377721543924447\" width=\"100\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_2.svg'+'?'+str(random.random()), width=100)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Contextualised Representations #\n", "\n", "* Static embeddings (e.g., [word2vec](dl-representations_simple.ipynb)) have one representation per word *type*, regardless of context\n", "\n", "* Contextualised representations use the context surrounding the word *token*\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Contextualised Representations Example ##\n", "\n", "\n", "* a) \"Yesterday I saw a bass swimming in the lake\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_1.jpg?0.2514734494969232\" width=\"300\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_1.jpg'+'?'+str(random.random()), width=300)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* b) \"Yesterday I saw a bass in the music shop\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_2.svg?0.12294740404377913\" width=\"100\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_2.svg'+'?'+str(random.random()), width=100)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Contextualised Representations Example ##\n", "\n", "\n", "* a) <span style=\"color:red\">\"Yesterday I saw a bass swimming in the lake\"</span>.\n", "* b) <span style=\"color:green\">\"Yesterday I saw a bass in the music shop\"</span>." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/bass_visualisation.jpg?0.0005807542302558311\" width=\"500\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/bass_visualisation.jpg'+'?'+str(random.random()), width=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## What makes a good representation? ##\n", "\n", "1. Representations are **distinct**\n", "2. 
**Similar** words have **similar** representations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Additional criterion:\n", "\n", "3. Representations take **context** into account" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## How to train contextualised representations ##\n", "\n", "Basically like word2vec: predict a word from its context (or vice versa).\n", "\n", "We cannot just use a lookup table (i.e., an embedding matrix) any more.\n", "\n", "Train a network with the sequence as input! Does this remind you of anything?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "<center><img src=\"../img/rnnlm.png\"></center>\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The hidden state of an RNN LM is a contextualised word representation!" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_1.png?0.4888163857481026\" width=\"800\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_1.png'+'?'+str(random.random()), width=800)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "\"Let's stick to improvisation in this skit\"\n", "\n", "Image credit: http://jalammar.github.io/illustrated-bert/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Bidirectional RNN LM ##\n", "\n", "An RNN (or LSTM) LM only considers preceding context.\n", "\n", "ELMo (Embeddings from Language Models) is based on a biLM: *bidirectional language model* ([Peters et al., 2018](https://www.aclweb.org/anthology/N18-1202/))." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_2.png?0.9144382673436326\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_2.png'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/elmo_3.png?0.1076383293975175\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/elmo_3.png'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<center><img src=\"../img/quiz_time.png\"></center>\n", "\n", "# [ucph.page.link/bilm](https://ucph.page.link/bilm)\n", "([Responses](https://docs.google.com/forms/d/1BimPo-S12XWt1qOJLXBTIGjRpt-bVW8H7hmT3j0iRRQ/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Solution\n", "\n", "To prevent a word from being used to predict itself, while still allowing the model to consider both preceding and following words."
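] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The biLM idea can be sketched with an off-the-shelf bidirectional LSTM. This is a simplification, not ELMo itself: ELMo builds its inputs with character convolutions, trains the forward and backward LMs with separate parameters, and mixes several layers. The sizes below are arbitrary toy values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "# Toy sizes, chosen only for illustration.\n", "vocab_size, emb_dim, hidden_dim = 100, 16, 32\n", "\n", "embedding = nn.Embedding(vocab_size, emb_dim)\n", "bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)\n", "\n", "tokens = torch.tensor([[5, 12, 7, 42]])  # one sentence of four token ids\n", "states, _ = bilstm(embedding(tokens))    # shape: (1, 4, 2 * hidden_dim)\n", "\n", "# Each row concatenates forward and backward hidden states:\n", "# a contextualised representation of one token.\n", "print(states.shape)  # torch.Size([1, 4, 64])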
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Problem: Long-Term Dependencies ##\n", "\n", "LSTMs have *longer-term* memory, but they still forget.\n", "\n", "Solution: *transformers*! ([Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* In 2022, all state-of-the-art LMs are transformers.\n", " * Yes, also GPT-3" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/transformers.png?0.20799997509039903\" width=\"400\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/transformers.png'+'?'+str(random.random()), width=400)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Summary #\n", "\n", "* Static word embeddings do not differ depending on context\n", "* Contextualised representations are dynamic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Additional Reading #\n", "\n", "+ [Jurafsky & Martin Chapter 11](https://web.stanford.edu/~jurafsky/slp3/11.pdf)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 1 }