{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2016-12-02T17:30:29.181539", "start_time": "2016-12-02T17:30:29.172204" }, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true'}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reveal.js\n", "from notebook.services.config import ConfigManager\n", "cm = ConfigManager()\n", "cm.update('livereveal', {\n", " 'theme': 'white',\n", " 'transition': 'none',\n", " 'controls': 'false',\n", " 'progress': 'true',\n", "})" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form 
action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import random\n", "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": { "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Introduction to Representation Learning\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2016-12-02T16:54:38.546624", "start_time": "2016-12-02T16:54:38.540051" }, "cell_style": "center", "slideshow": { "slide_type": "fragment" } }, "source": [ "* Representations of Words\n", " * Motivation\n", " * Sparse Binary Representations\n", " * Dense Continuous Representations\n", "* Unsupervised Learning of Word Representations\n", " * Motivation\n", " * Sparse Co-occurrence Representations\n", " * Neural Word Representations\n", "* From Word Representations to Sentences and Documents" ] }, { "cell_type": "markdown", "metadata": { "hide_input": false, "slideshow": { "slide_type": "skip" } }, "source": [ "# Feed-forward Neural Networks\n", "<center><img src=\"../img/mlp.svg\"></center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Word representations ##" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/word_representations.svg?0.09272290851904197\" width=\"1000\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/word_representations.svg'+'?'+str(random.random()), width=1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why talk about 
representations? ##\n", "\n", "* In Machine Learning, features are representations of the input data\n", " * Language is special\n", "* Better representations, better performance\n", "* Representation learning (neural networks, \"deep learning\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What makes a good representation? ##\n", "\n", "1. Representations are **distinct**\n", "2. **Similar** words have **similar** representations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Formal Task ##\n", "\n", "* Words: $w$\n", "* Vocabulary: $\\mathbb{V} (\\forall_{i} w_{i} \\in \\mathbb{V})$\n", "* Find representation function: $f(w_{i}) = r_{i}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Sparse Binary Representations ##\n", "\n", "* Map words to unique positive integers\n", " * $f_{id}(w) \\mapsto \\mathbb{N}^{*}$\n", " * $g(w, i) = {\\left\\lbrace\n", " \\begin{array}{ll}\n", " 1 & \\textrm{if }~i = f_{id}(w) \\\\\n", " 0 & \\textrm{otherwise} \\\\\n", " \\end{array}\\right.}$\n", "* Word representations as \"one-hot\" vectors\n", " * $f_{sb}(w) = (g(w, 1), \\ldots, g(w, |V|))$\n", " * $f_{sb}(w) \\mapsto \\{0,1\\}^{|V|}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example ###\n", "\n", "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n", "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n", "* $f_{sb}(\\textrm{apple}) = (1, 0, 0)$\n", "* $f_{sb}(\\textrm{orange}) = (0, 1, 0)$\n", "* $f_{sb}(\\textrm{rabbit}) = (0, 0, 1)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sparse Binary Visualised ##\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, 
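"source": [ "The one-hot mapping above can be sketched in a few lines of NumPy (a minimal illustration, not the lecture's code; `vocab` and `f_sb` are hypothetical names, and Python indices start at 0 rather than 1):\n", "\n", "```python\n", "import numpy as np\n", "\n", "vocab = {'apple': 0, 'orange': 1, 'rabbit': 2}  # f_id, 0-based\n", "\n", "def f_sb(word):\n", "    # one-hot vector: 1 at the word's index, 0 elsewhere\n", "    v = np.zeros(len(vocab))\n", "    v[vocab[word]] = 1.0\n", "    return v\n", "\n", "f_sb('orange')  # array([0., 1., 0.])\n", "```\n", "\n", "Distinct one-hot vectors never share a non-zero position, so any two of them are orthogonal." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } },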
"source": [ "## Cosine Similarity ##\n", "\n", "* $cos(u, v) = \\frac{u \\cdot v}{||u|| \\cdot ||v||}$\n", "* $cos(u, v) \\mapsto [-1, 1]$\n", "* $cos(u, v) = 1$; identical\n", "* $cos(u, v) = -1$; opposites\n", "* $cos(u, v) = 0$; orthogonal" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Note the different formulation in SciPy:\n", "$$cos(u, v) = 1 - \\frac{u \\cdot v}{||u|| \\cdot ||v||}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Cosine Similarity Visualised ##\n", "\n", "<center><img src=\"http://blog.christianperone.com/wp-content/uploads/2013/09/cosinesimilarityfq1.png\" width=\"110%\"></center>\n", "\n", "http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/quiz_time.png?0.06805595211107918\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/quiz_time.png'+'?'+str(random.random()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## [ucph.page.link/cos](https://ucph.page.link/cos)\n", "([Responses](https://docs.google.com/forms/d/1mbVCSbDXpA9qY5DkfR-fnX3c3MFXPiLAqeTrd2P0U90/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Solution\n", "\n", "$$\n", "cos(f_{sb}(\\textrm{apple}), f_{sb}(\\textrm{orange})) = 0 \\\\\n", "cos(f_{sb}(\\textrm{apple}), f_{sb}(\\textrm{rabbit})) = 0 \\\\\n", "cos(f_{sb}(\\textrm{orange}), f_{sb}(\\textrm{rabbit})) = 0\n", "$$\n", "\n", "* Apple and orange are more similar, because they are fruit, but the distance metric does not distinguish between them and rabbit. 
An apple is more like an orange than a rabbit according to human intuition." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Dense Continuous Representations ##\n", "\n", "* $f_{id}(w) \\mapsto \\mathbb{N}^{*}$\n", "* \"Embed\" words as matrix rows\n", "* Dimensionality: $d$ (hyperparameter)\n", "* $W \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$\n", "* $f_{dc}(w) = W_{f_{id}(w), :}$\n", "* $f_{dc}(w) \\mapsto \\mathbb{R}^{d}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example ###\n", "\n", "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n", "* $d = 2$\n", "* $W \\in \\mathbb{R}^{3 \\times 2}$\n", "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n", "* $f_{dc}(\\textrm{apple}) = (1.0, 1.0)$\n", "* $f_{dc}(\\textrm{orange}) = (0.9, 1.0)$\n", "* $f_{dc}(\\textrm{rabbit}) = (0.1, 0.5)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Dense Continuous Visualised ##\n", "\n", "<center><img src=\"../img/dense_continuous.svg\" width=\"80%\"></center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Dense Continuous Similarities ##\n", "\n", "* $cos(f_{dc}(\\textrm{apple}),f_{dc}(\\textrm{rabbit})) \\approx 0.83$\n", "* $cos(f_{dc}(\\textrm{apple}),f_{dc}(\\textrm{orange})) \\approx 1.0$\n", "* $cos(f_{dc}(\\textrm{orange}),f_{dc}(\\textrm{rabbit})) \\approx 0.86$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Unsupervised Learning of Word Representations #" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Why not supervised? 
##\n", "\n", "<center><img src=\"../img/annotated_vs_unannotated_data.svg\" width=\"40%\"></center>\n", "\n", "\n", "* Current gains from large data sets / more compute\n", " * Lack of comparison" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Linguistic Inspirations ##\n", "\n", "* \"Oculist and eye-doctor … occur in almost the same environments. … If $A$ and $B$ have almost identical environments we say that they are synonyms.\" – Zellig Harris (1954)\n", "* \"You shall know a word by the company it keeps.\" – John Rupert Firth (1957)\n", "* Akin to \"meaning is use\" – Wittgenstein (1953)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sparse Co-occurrence Representations ##" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Co-occurrences ##\n", "\n", "* Collected from a large corpus of *raw* text\n", "* E.g. Wikipedia, crawled news data, tweets, ...\n", "\n", "1. \"…comparing an **apple** to an **orange**…\"\n", "2. \"…an **apple** and **orange** from Florida…\"\n", "3. 
\"…my **rabbit** is not shaped like an **orange**…\" (yes, there is always **noise** in the data)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sparse Co-occurrence Representations ##\n", "\n", "* The number of times words co-occur in a text collection\n", "* $C \\in \\mathbb{N}^{|V| \\times |V|}$\n", "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n", "* $C = \\begin{pmatrix}\n", " 2 & 2 & 0 \\\\\n", " 2 & 3 & 1 \\\\\n", " 0 & 1 & 1 \\\\\n", " \\end{pmatrix}$\n", "* $f_{cs}(w) = C_{f_{id}(w), :}$\n", "* $f_{cs}(w) \\mapsto \\mathbb{N}^{|V|}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example ###\n", "\n", "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n", "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n", "* $f_{cs}(\\textrm{apple}) = (2, 2, 0)$\n", "* $f_{cs}(\\textrm{orange}) = (2, 3, 1)$\n", "* $f_{cs}(\\textrm{rabbit}) = (0, 1, 1)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sparse Co-occurrence Similarities ##\n", "\n", "* $cos(f_{cs}(\\textrm{apple}), f_{cs}(\\textrm{rabbit})) \\approx 0.50$\n", "* $cos(f_{cs}(\\textrm{apple}), f_{cs}(\\textrm{orange})) \\approx 0.94$\n", "* $cos(f_{cs}(\\textrm{orange}), f_{cs}(\\textrm{rabbit})) \\approx 0.76$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Neural Word Representations #" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Learning by Slot Filling ##\n", "\n", "* \"…I had some **_____** for breakfast today…\"\n", "* Good: *cereal*\n", "* Bad: *airplanes*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<center><img width=1000 src=\"../img/cbow_sg2.png\"></center>\n", "\n", 
"<div style=\"text-align: right;\">\n", " (word2vec: <a href=\"https://arxiv.org/abs/1301.3781\">Mikolov et al., 2013</a>)\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Creating Positive Training Instances ##\n", "\n", "\n", "\"I had some cereal for breakfast today\"\n", "\n", "* **I** had some cereal for breakfast today -> (**I**, had); (**I**, some)\n", "* I **had** some cereal for breakfast today -> (**had**, I); (**had**, some); (**had**, cereal)\n", "* I had **some** cereal for breakfast today -> (**some**, I); (**some**, had); (**some**, cereal); (**some**, for)\n", "* I had some **cereal** for breakfast today -> (**cereal**, had); (**cereal**, some); (**cereal**, for); (**cereal**, breakfast)\n", "* ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Training instance: target word $w \\in \\mathbb{V}$; context word $c \\in \\mathbb{V}$\n", "\n", "$D = ((c, w),\\ldots)$; observed co-occurrences\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sampling Negative Training Instances ##\n", "\n", "\n", "\"I had some cereal for breakfast today\"\n", "\n", "* **Lecture** had some cereal for breakfast today -> (**Lecture**, had); (**Lecture**, some)\n", "* I **computer** some cereal for breakfast today -> (**computer**, I); (**computer**, some); (**computer**, cereal)\n", "* I had **word** cereal for breakfast today -> (**word**, I); (**word**, had); (**word**, cereal); (**word**, for)\n", "* I had some **books** for breakfast today -> (**books**, had); (**books**, some); (**books**, for); (**books**, breakfast)\n", "* ..." 
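, "\n", "Both procedures can be sketched as follows (a minimal illustration: the window size of 2 and the `noise_words` list are assumptions, not the lecture's exact setup):\n", "\n", "```python\n", "import random\n", "\n", "def positive_pairs(tokens, window=2):\n", "    # (target, context) pairs from a +/- window neighbourhood\n", "    pairs = []\n", "    for i, w in enumerate(tokens):\n", "        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):\n", "            if i != j:\n", "                pairs.append((w, tokens[j]))\n", "    return pairs\n", "\n", "def negative_pairs(tokens, noise_words, window=2):\n", "    # keep each context word, replace the target with a random noise word\n", "    return [(random.choice(noise_words), c)\n", "            for _, c in positive_pairs(tokens, window)]\n", "\n", "sent = 'I had some cereal for breakfast today'.split()\n", "positive_pairs(sent)[:2]  # [('I', 'had'), ('I', 'some')]\n", "```"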
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "Create $D' = ((c, w),\\ldots)$; \"noise samples\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Unsupervised Loss Function ##\n", "\n", "* $w \\in \\mathbb{V}$; $c \\in \\mathbb{V}$\n", "* $D = ((c, w),\\ldots)$; observed co-occurrences\n", "* $D' = ((c, w),\\ldots)$; \"noise samples\"\n", "* $\\textrm{max}~p((c, w) \\in D | W) - p((c, w) \\in D' | W)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Idea: maximise difference between the score for true instances and negative samples\n", "* How do we get $p(c, w)$?\n", " * using a neural network" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Neural Skip-Gram Model ##\n", "\n", "<center><img width=650 src=\"../img/skip-gram_architecture.png\"></center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* $W^{w} \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$; $W^{c} \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$\n", "* $D = ((c, w),\\ldots)$; $D' = ((c, w),\\ldots)$\n", "* $\\sigma(x) = \\frac{1}{1 + \\textrm{exp}(-x)}$\n", "* $p((c, w) \\in D | W^{w}, W^{c}) = \\sigma(W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:})$\n", "* $\\arg\\max\\limits_{W^{w},W^{c}} \\sum\\limits_{(w,c) \\in D} \\log \\sigma(W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:}) + \\sum\\limits_{(w,c) \\in D'} \\log \\sigma(-W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:})$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Neural Representation ##\n", "\n", "* Learned using [word2vec](https://code.google.com/p/word2vec/)\n", "* Google News data, $\\sim 1,000,000,000$ words\n", "* $|\\mathbb{V}| = 3,000,000$\n", "* $d = 300$" ] }, { "cell_type": "markdown", "metadata": 
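{ "slideshow": { "slide_type": "skip" } }, "source": [ "The objective above can be evaluated on a toy example as follows (the matrices are random stand-ins for trained embeddings, and the index pairs in `D` and `D_neg` are made up for illustration):\n", "\n", "```python\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(0)\n", "V, d = 5, 3                    # toy vocabulary size and dimensionality\n", "W_w = rng.normal(size=(V, d))  # target-word embeddings\n", "W_c = rng.normal(size=(V, d))  # context-word embeddings\n", "\n", "def sigmoid(x):\n", "    return 1.0 / (1.0 + np.exp(-x))\n", "\n", "def log_p(w, c, positive=True):\n", "    # per-pair term: log sigma(score) for D, log sigma(-score) for D'\n", "    score = W_c[c] @ W_w[w]\n", "    return np.log(sigmoid(score if positive else -score))\n", "\n", "D, D_neg = [(0, 1), (0, 2)], [(3, 4)]\n", "objective = (sum(log_p(w, c) for w, c in D)\n", "             + sum(log_p(w, c, positive=False) for w, c in D_neg))\n", "```\n", "\n", "Each term is a log-sigmoid and therefore negative, so training pushes the objective towards 0 from below." ] }, { "cell_type": "markdown", "metadata": 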
{ "slideshow": { "slide_type": "subslide" } }, "source": [ "## Neural Representation Example ##\n", "\n", "* $f_{n}(\\textrm{apple}) = (-0.06, -0.16, \\ldots, 0.34)$\n", "* $f_{n}(\\textrm{orange}) = (-0.10, -0.18, \\ldots, 0.08)$\n", "* $f_{n}(\\textrm{rabbit}) = (0.02, 0.11, \\ldots, 0.11)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## Neural Representation Similarities ##\n", "\n", "* $cos(f_{n}(\\textrm{apple}), f_{n}(\\textrm{rabbit})) \\approx 0.34$\n", "* $cos(f_{n}(\\textrm{apple}), f_{n}(\\textrm{orange})) \\approx 0.39$\n", "* $cos(f_{n}(\\textrm{orange}), f_{n}(\\textrm{rabbit})) \\approx 0.20$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Neural Representations Visualised ##" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/word_representations.svg?0.14263726697964574\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/word_representations.svg'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Dimensionality reduction using [t-SNE](https://lvdmaaten.github.io/tsne/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Neural Representations Visualised (zoomed) ##" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/word_representations_zoom.svg?0.0936390600659478\" width=\"1200\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "Image(url='../img/word_representations_zoom.svg'+'?'+str(random.random()), width=1200)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Dimensionality reduction using [t-SNE](https://lvdmaaten.github.io/tsne/)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/quiz_time.png?0.8984209371543637\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/quiz_time.png'+'?'+str(random.random()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## [ucph.page.link/wordrep](https://ucph.page.link/wordrep)\n", "([Responses](https://docs.google.com/forms/d/1eJJKaSUc8MfxtAvwbLArg6cHneV4m_WS6xqc1CmNSvY/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Word Representation Algebra ##\n", "\n", "* $f_{n}(\\textrm{king}) - f_{n}(\\textrm{man}) + f_{n}(\\textrm{woman}) \\approx f_{n}(\\textrm{queen})$\n", "* $f_{n}(\\textrm{Paris}) - f_{n}(\\textrm{France}) + f_{n}(\\textrm{Italy}) \\approx f_{n}(\\textrm{Rome})$" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/regularities.png?0.24745425222352702\" width=\"1000\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/regularities.png'+'?'+str(random.random()), width=1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "(It's not really that simple... see [Drozd et al. 
(2016)](https://aclanthology.org/C16-1332.pdf))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Granularities of representations #\n", "\n", "## Output\n", "* Word representations\n", "* Sentence representations\n", "* Document representations\n", "\n", "## Input\n", "* Words\n", "* Characters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# From Words to Sentences to Documents\n", "\n", "* Standard pooling approaches of word embeddings\n", " * Sum, Mean, Max" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/embedding_sum.jpg?0.3045411286595874\" width=\"500\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/embedding_sum.jpg'+'?'+str(random.random()), width=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# From Words to Sentences to Documents\n", "\n", "* Standard pooling approaches of word embeddings\n", " * Sum, Mean, Max\n", " * TF-IDF weighting" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/embedding_weighted.jpg?0.7298754432804081\" width=\"500\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/embedding_weighted.jpg'+'?'+str(random.random()), width=500)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Building Sentence and Document Representations\n", "\n", "* Sentence representations from scratch (e.g. 
with RNNs in the next lecture)\n", "\n", "* Doc2vec" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Doc2vec\n", "\n", "\n", "* Simple extension of word2vec (cbow)\n", "* Paragraph vector" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/doc2vec_0.png?0.18292233610209685\" width=\"1000\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/doc2vec_0.png'+'?'+str(random.random()), width=1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Source: https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Doc2vec\n", "\n", "\n", "* Simple extension of word2vec (cbow)\n", "* Paragraph vector" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<img src=\"../img/doc2vec_1.png?0.8293094887551172\" width=\"1000\"/>" ], "text/plain": [ "<IPython.core.display.Image object>" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(url='../img/doc2vec_1.png'+'?'+str(random.random()), width=1000)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Source: https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Summary #\n", "\n", "* Moving from features to \"representations\"\n", "* Representations limit what we can learn and our generalisation power\n", "* Many ways to learn representations 
(there are many more than what we covered)\n", "* Neural representations most widely used\n", "* Different linguistic granularities - words, sentences, documents" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Additional Reading #\n", "\n", "* [\"Word Representations: A Simple and General Method for Semi-Supervised Learning\"](http://www.aclweb.org/anthology/P/P10/P10-1040.pdf) by Turian et al. (2010)\n", "* [\"Representation Learning: A Review and New Perspectives\"](https://arxiv.org/abs/1206.5538) by Bengio et al. (2012)\n", "* [\"Linguistic Regularities in Continuous Space Word Representations\"](http://www.aclweb.org/anthology/N/N13/N13-1090.pdf) by Mikolov et al. (2013a) ([video](http://techtalks.tv/talks/linguistic-regularities-in-continuous-space-word-representations/58471/))\n", "* [\"Distributed Representations of Words and Phrases and their Compositionality\"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality) by Mikolov et al. 
(2013b)\n", "* [\"word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method\"](https://arxiv.org/abs/1402.3722) by Goldberg and Levy (2014)\n", "* [\"Neural Word Embedding as Implicit Matrix Factorization\"](http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization) by Levy and Goldberg (2014)\n", "* [Demystifying Neural Network in Skip-Gram Language Modeling](https://aegis4048.github.io/demystifying_neural_network_in_skip_gram_language_modeling)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Questions" ] } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 1 }