{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2016-12-02T17:30:29.181539",
     "start_time": "2016-12-02T17:30:29.172204"
    },
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'theme': 'white',\n",
       " 'transition': 'none',\n",
       " 'controls': 'false',\n",
       " 'progress': 'true'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reveal.js\n",
    "from notebook.services.config import ConfigManager\n",
    "cm = ConfigManager()\n",
    "cm.update('livereveal', {\n",
    "        'theme': 'white',\n",
    "        'transition': 'none',\n",
    "        'controls': 'false',\n",
    "        'progress': 'true',\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "  function code_toggle() {\n",
       "    if (code_shown){\n",
       "      $('div.input').hide('500');\n",
       "      $('#toggleButton').val('Show Code')\n",
       "    } else {\n",
       "      $('div.input').show('500');\n",
       "      $('#toggleButton').val('Hide Code')\n",
       "    }\n",
       "    code_shown = !code_shown\n",
       "  }\n",
       "\n",
       "  $( document ).ready(function(){\n",
       "    code_shown=false;\n",
       "    $('div.input').hide()\n",
       "  });\n",
       "</script>\n",
       "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<script>\n",
    "  function code_toggle() {\n",
    "    if (code_shown){\n",
    "      $('div.input').hide('500');\n",
    "      $('#toggleButton').val('Show Code')\n",
    "    } else {\n",
    "      $('div.input').show('500');\n",
    "      $('#toggleButton').val('Hide Code')\n",
    "    }\n",
    "    code_shown = !code_shown\n",
    "  }\n",
    "\n",
    "  $( document ).ready(function(){\n",
    "    code_shown=false;\n",
    "    $('div.input').hide()\n",
    "  });\n",
    "</script>\n",
    "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import random\n",
    "from IPython.display import Image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Introduction to Representation Learning\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2016-12-02T16:54:38.546624",
     "start_time": "2016-12-02T16:54:38.540051"
    },
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Representations of Words\n",
    "    * Motivation\n",
    "    * Sparse Binary Representations\n",
    "    * Dense Continuous Representations\n",
    "* Unsupervised Learning of Word Representations\n",
    "    * Motivation\n",
    "    * Sparse Co-occurence Representations\n",
    "    * Neural Word Representations\n",
    "* From Word Representations to Sentences and Documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hide_input": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "# Feed-forward Neural Networks\n",
    "<center><img src=\"../img/mlp.svg\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Word representations ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/word_representations.svg?0.09272290851904197\" width=\"1000\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/word_representations.svg'+'?'+str(random.random()), width=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Why talk about representations? ##\n",
    "\n",
    "* Machine Learning, features are representations of the input data\n",
    "    * Language is special\n",
    "* Better representations, better performance\n",
    "* Representation learning (neural networks, \"deep learning\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## What makes a good representation? ##\n",
    "\n",
    "1. Representations are **distinct**\n",
    "2. **Similar** words have **similar** representations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Formal Task ##\n",
    "\n",
    "* Words: $w$\n",
    "* Vocabulary: $\\mathbb{V} (\\forall_{i} w_{i} \\in \\mathbb{V})$\n",
    "* Find representation function: $f(w_{i}) = r_{i}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Sparse Binary Representations ##\n",
    "\n",
    "* Map words to unique positive non-zero integers\n",
    "    * $f_{id}(w) \\mapsto \\mathbb{N^{*}}$\n",
    "    * $g(w, i) = {\\left\\lbrace\n",
    "    \\begin{array}{ll}\n",
    "        1 & \\textrm{if }~i = f_{id}(w) \\\\\n",
    "        0 & \\textrm{otherwise} \\\\\n",
    "    \\end{array}\\right.}$\n",
    "* Word representations as \"one-hot\" vectors\n",
    "    * $f_{sb}(w) = (g(w, 1), \\ldots, g(w, |V|))$\n",
    "    * $f_{sb}(w) \\mapsto \\{0,1\\}^{|V|}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example ###\n",
    "\n",
    "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n",
    "* $f_{id}(\\textrm{apple}) = 1, \\ldots$, $f_{id}(\\textrm{rabbit}) = 3$\n",
    "* $f_{sb}(\\textrm{apple}) = (1, 0, 0)$\n",
    "* $f_{sb}(\\textrm{orange}) = (0, 1, 0)$\n",
    "* $f_{sb}(\\textrm{rabbit}) = (0, 0, 1)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sparse Binary Visualised ##\n",
    "\n",
    "![Sparse binary representations visualised](../img/sparse_binary.svg)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Cosine Similarity ##\n",
    "\n",
    "* $cos(u, v) = \\frac{u \\cdot v}{||u|| \\cdot ||v||}$\n",
    "* $cos(u, v) \\mapsto [-1, 1]$\n",
    "* $cos(u, v) = 1$; identical\n",
    "* $cos(u, v) = -1$; opposites\n",
    "* $cos(u, v) = 0$; orthogonal"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "Note the different formulation in SciPy:\n",
    "$$cos(u, v) = 1 - \\frac{u \\cdot v}{||u|| \\cdot||v||}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Cosine Similarity Visualised ##\n",
    "\n",
    "<center><img src=\"http://blog.christianperone.com/wp-content/uploads/2013/09/cosinesimilarityfq1.png\" width=\"110%\"></center>\n",
    "\n",
    "http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/quiz_time.png?0.06805595211107918\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/quiz_time.png'+'?'+str(random.random()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## [ucph.page.link/cos](https://ucph.page.link/cos)\n",
    "([Responses](https://docs.google.com/forms/d/1mbVCSbDXpA9qY5DkfR-fnX3c3MFXPiLAqeTrd2P0U90/edit#responses))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "### Solution\n",
    "\n",
    "$$\n",
    "cos(f_sb(apple), f_sb(orange) = 0 \\\\\n",
    "cos(f_sb(apple), f_sb(rabbit) = 0 \\\\\n",
    "cos(f_sb(orange), f_sb(rabbit) = 0\n",
    "$$\n",
    "\n",
    "* Apple and orange are more similar, because they are fruit, but the distance metric does not distinguish between them and rabbit. An apple is more like an orange than a rabbit according to human intuition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Dense Continuous Representations ##\n",
    "\n",
    "* $f_{id}(w) \\mapsto \\mathbb{N}^{*}$\n",
    "* \"Embed\" words as matrix rows\n",
    "* Dimensionality: $d$ (hyperparameter)\n",
    "* $W \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$\n",
    "* $f_{dc}(w) = W_{f_{id}(w), :}$\n",
    "* $f_{dc}(w) \\mapsto \\mathbb{R}^{d}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example ###\n",
    "\n",
    "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n",
    "* $d = 2$\n",
    "* $W \\in \\mathbb{R}^{3 \\times 2}$\n",
    "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n",
    "* $f_{dc}(\\textrm{apple}) = (1.0, 1.0)$\n",
    "* $f_{dc}(\\textrm{orange}) = (0.9, 1.0)$\n",
    "* $f_{dc}(\\textrm{rabbit}) = (0.1, 0.5)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Dense Continuous Visualised ##\n",
    "\n",
    "<center><img src=\"../img/dense_continuous.svg\" width=\"80%\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Dense Continuous Similarities ##\n",
    "\n",
    "* $cos(f_{dc}(\\textrm{apple}),f_{dc}(\\textrm{rabbit})) \\approx 0.83$\n",
    "* $cos(f_{dc}(\\textrm{apple}),f_{dc}(\\textrm{orange})) \\approx 1.0$\n",
    "* $cos(f_{dc}(\\textrm{orange}),f_{dc}(\\textrm{rabbit})) \\approx 0.86$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Unsupervised Learning of Word Representations #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Why not supervised? ##\n",
    "\n",
    "<center><img src=\"../img/annotated_vs_unannotated_data.svg\" width=\"40%\"></center>\n",
    "\n",
    "\n",
    "* Current gains from large data sets / more compute\n",
    "    * Lack of comparison"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Linguistic Inspirations ##\n",
    "\n",
    "* \"Oculist and eye-doctor … occur in almost the same environments. … If $A$ and $B$ have almost identical environments we say that they are synonyms.\" – Zellig Harris (1954)\n",
    "* \"You shall know a word by the company it keeps.\" – John Rupert Firth (1957)\n",
    "* Akin to \"meaning is use\" – Wittgenstein (1953)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sparse Co-occurence Representations ##"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Co-occurences ##\n",
    "\n",
    "* Collected from a large collection of *raw* text\n",
    "* E.g. Wikipedia, crawled news data, tweets, ...\n",
    "\n",
    "1. \"…comparing an **apple** to an **orange**…\"\n",
    "2. \"…an **apple** and **orange** from Florida…\"\n",
    "3. \"…my **rabbit** is not shaped like an **orange**…\" (yes, there is always **noise** in the data)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sparse Co-occurence Representations ##\n",
    "\n",
    "* The number of times words co-occur in a text collection\n",
    "* $C \\in \\mathbb{N}^{|V| \\times |V|}$\n",
    "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n",
    "* $C = \\begin{pmatrix}\n",
    "        2 & 2 & 0 \\\\\n",
    "        2 & 3 & 1 \\\\\n",
    "        0 & 1 & 1 \\\\\n",
    "    \\end{pmatrix}$\n",
    "* $f_{cs}(w) = C_{f_{id}(w), :}$\n",
    "* $f_{cs}(w) \\mapsto \\mathbb{N}^{|V|}$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example ###\n",
    "\n",
    "* $\\mathbb{V} = \\{\\textrm{apple}, \\textrm{orange}, \\textrm{rabbit}\\}$\n",
    "* $f_{id}(\\textrm{apple}) = 1, \\ldots, f_{id}(\\textrm{rabbit}) = 3$\n",
    "* $f_{cs}(\\textrm{apple}) = (2, 2, 0)$\n",
    "* $f_{cs}(\\textrm{orange}) = (2, 3, 1)$\n",
    "* $f_{cs}(\\textrm{rabbit}) = (0, 1, 1)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sparse Co-occurence Similarities ##\n",
    "\n",
    "* $cos(f_{cs}(\\textrm{apple}), f_{cs}(\\textrm{rabbit})) \\approx 0.50$\n",
    "* $cos(f_{cs}(\\textrm{apple}), f_{cs}(\\textrm{orange})) \\approx 0.94$\n",
    "* $cos(f_{cs}(\\textrm{orange}), f_{cs}(\\textrm{rabbit})) \\approx 0.76$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Neural Word Representations #"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Learning by Slot Filling ##\n",
    "\n",
    "* \"…I had some **_____** for breakfast today…\"\n",
    "* Good: *cereal*\n",
    "* Bad: *airplanes*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center><img width=1000 src=\"../img/cbow_sg2.png\"></center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (word2vec: <a href=\"https://arxiv.org/abs/1301.3781\">Mikolov et al., 2013</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Creating Positive Training Instances ##\n",
    "\n",
    "\n",
    "\"I had some cereal for breakfast today\"\n",
    "\n",
    "* **I** had some cereal for breakfast today -> (**I**, had); (**I**, some)\n",
    "* I **had** some cereal for breakfast today -> (**had**, I); (**had**, some); (**had**, cereal)\n",
    "* I had **some** cereal for breakfast today -> (**some**, I); (**some**, had); (**some**, cereal); (**some**, for)\n",
    "* I had some **cereal** for breakfast today -> (**cereal**, had); (**cereal**, some); (**cereal**, for); (**cereal**, breakfast)\n",
    "* ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Training instance: target word $w \\in \\mathbb{V}$; context word $c \\in \\mathbb{V}$\n",
    "\n",
    "$D = ((c, w),\\ldots)$; observed co-occurences\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sampling Negative Training Instances ##\n",
    "\n",
    "\n",
    "\"I had some cereal for breakfast today\"\n",
    "\n",
    "* **Lecture** had some cereal for breakfast today -> (**Lecture**, had); (**Lecture**, some)\n",
    "* I **computer** some cereal for breakfast today -> (**computer**, I); (**computer**, some); (**computer**, cereal)\n",
    "* I had **word** cereal for breakfast today -> (**word**, I); (**word**, had); (**word**, cereal); (**word**, for)\n",
    "* I had some **books** for breakfast today -> (**books**, had); (**books**, some); (**books**, for); (**books**, breakfast)\n",
    "* ..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\n",
    "Create $D' = ((c, w),\\ldots)$; \"noise samples\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Unsupervised Loss Function ##\n",
    "\n",
    "* $w \\in \\mathbb{V}$; $c \\in \\mathbb{V}$\n",
    "* $D = ((c, w),\\ldots)$; observed co-occurences\n",
    "* $D' = ((c, w),\\ldots)$; \"noise samples\"\n",
    "* $\\textrm{max}~p((c, w) \\in D | W) - p((c, w) \\in D' | W)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Idea: maximise difference between the score for true instances and negative samples\n",
    "* How do we get $p(c, w)$?\n",
    "   * using a neural network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Neural Skip-Gram Model ##\n",
    "\n",
    "<center><img width=650 src=\"../img/skip-gram_architecture.png\"></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* $W^{w} \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$; $W^{c} \\in \\mathbb{R}^{|\\mathbb{V}| \\times d}$\n",
    "* $D = ((c, w),\\ldots)$; $D' = ((c, w),\\ldots)$\n",
    "* $\\sigma(x) = \\frac{1}{1 + \\textrm{exp}(-x)}$\n",
    "* $p((c, w) \\in D | W^{w}, W^{c}) = \\sigma(W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:})$\n",
    "* $\\arg\\max\\limits_{W^{w},W^{c}} \\sum\\limits_{(w,c) \\in D} \\log \\sigma(W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:})  + \\sum\\limits_{(w,c) \\in D'} \\log \\sigma(-W^{c}_{f_{id}(c),:} \\cdot W^{w}_{f_{id}(w),:})$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Neural Representation ##\n",
    "\n",
    "* Learned using [word2vec](https://code.google.com/p/word2vec/)\n",
    "* Google News data, $~1,000,000,000$ words\n",
    "* $|\\mathbb{V}| = 3,000,000$\n",
    "* $d = 300$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Neural Representation Example ##\n",
    "\n",
    "* $f_{n}(\\textrm{apple}) = (-0.06, -0.16, \\ldots, 0.34)$\n",
    "* $f_{n}(\\textrm{orange}) = (-0.10, -0.18, \\ldots, 0.08)$\n",
    "* $f_{n}(\\textrm{rabbit}) = (0.02, 0.11, \\ldots, 0.11)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## Neural Representation Similarities ##\n",
    "\n",
    "* $cos(f_{n}(\\textrm{apple}), f_{n}(\\textrm{rabbit})) \\approx 0.34$\n",
    "* $cos(f_{n}(\\textrm{apple}), f_{n}(\\textrm{orange})) \\approx 0.39$\n",
    "* $cos(f_{n}(\\textrm{orange}), f_{n}(\\textrm{rabbit})) \\approx 0.20$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Neural Representations Visualised ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/word_representations.svg?0.14263726697964574\" width=\"1200\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/word_representations.svg'+'?'+str(random.random()), width=1200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Dimensionality reduction using [t-SNE](https://lvdmaaten.github.io/tsne/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Neural Representations Visualised (zoomed) ##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/word_representations_zoom.svg?0.0936390600659478\" width=\"1200\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/word_representations_zoom.svg'+'?'+str(random.random()), width=1200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Dimensionality reduction using [t-SNE](https://lvdmaaten.github.io/tsne/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/quiz_time.png?0.8984209371543637\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/quiz_time.png'+'?'+str(random.random()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## [ucph.page.link/wordrep](https://ucph.page.link/wordrep)\n",
    "([Responses](https://docs.google.com/forms/d/1eJJKaSUc8MfxtAvwbLArg6cHneV4m_WS6xqc1CmNSvY/edit#responses))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Word Representation Algebra ##\n",
    "\n",
    "* $f_{n}(\\textrm{king}) - f_{n}(\\textrm{man}) + f_{n}(\\textrm{woman}) \\approx f_{n}(\\textrm{queen})$\n",
    "* $f_{n}(\\textrm{Paris}) - f_{n}(\\textrm{France}) + f_{n}(\\textrm{Italy}) \\approx f_{n}(\\textrm{Rome})$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/regularities.png?0.24745425222352702\" width=\"1000\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/regularities.png'+'?'+str(random.random()), width=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "(It's not really that simple... see [Drozd et al. (2016)](https://aclanthology.org/C16-1332.pdf))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Granularities of representations #\n",
    "\n",
    "## Output\n",
    "* Word representations\n",
    "* Sentence representations\n",
    "* Document representations\n",
    "\n",
    "## Input\n",
    "* Words\n",
    "* Characters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# From Words to Sentences to Documents\n",
    "\n",
    "* Standard pooling approaches of word embeddings\n",
    "    * Sum, Mean, Max"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/embedding_sum.jpg?0.3045411286595874\" width=\"500\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/embedding_sum.jpg'+'?'+str(random.random()), width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# From Words to Sentences to Documents\n",
    "\n",
    "* Standard pooling approaches of word embeddings\n",
    "    * Sum, Mean, Max\n",
    "    * TF-IDF weighting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/embedding_weighted.jpg?0.7298754432804081\" width=\"500\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/embedding_weighted.jpg'+'?'+str(random.random()), width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# Building Sentence and Document Representations\n",
    "\n",
    "* Sentence representations from scratch (e.g. with RNNs in next lecture)\n",
    "\n",
    "* Doc2vec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# Doc2vec\n",
    "\n",
    "\n",
    "* Simple extension of word2vec (cbow)\n",
    "* Paragraph vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/doc2vec_0.png?0.18292233610209685\" width=\"1000\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/doc2vec_0.png'+'?'+str(random.random()), width=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Source: https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "# Doc2vec\n",
    "\n",
    "\n",
    "* Simple extension of word2vec (cbow)\n",
    "* Paragraph vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"../img/doc2vec_1.png?0.8293094887551172\" width=\"1000\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Image(url='../img/doc2vec_1.png'+'?'+str(random.random()), width=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Source: https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Summary #\n",
    "\n",
    "* Moving from features to \"representations\"\n",
    "* Representations limits what we can learn and our generalisation power\n",
    "* Many ways to learn representations (there are many more than what we covered)\n",
    "* Neural representations most widely used\n",
    "* Different linguistic granularities - words, sentences, documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Additional Reading #\n",
    "\n",
    "* [\"Word Representations: A Simple and General Method for Semi-Supervised Learning\"](http://www.aclweb.org/anthology/P/P10/P10-1040.pdf) by Turian et al. (2010)\n",
    "* [\"Representation Learning: A Review and New Perspectives\"](https://arxiv.org/abs/1206.5538) by Bengio et al. (2012)\n",
    "* [\"Linguistic Regularities in Continuous Space Word Representations\"](http://www.aclweb.org/anthology/N/N13/N13-1090.pdf) by Mikolov et al. (2013a) ([video](http://techtalks.tv/talks/linguistic-regularities-in-continuous-space-word-representations/58471/))\n",
    "* [\"Distributed Representations of Words and Phrases and their Compositionality\"](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality) by Mikolov et al. (2013b)\n",
    "* [\"word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method\"](https://arxiv.org/abs/1402.3722) by Goldberg and Levy (2014)\n",
    "* [\"Neural Word Embedding as Implicit Matrix Factorization\"](http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization) by Levy and Goldberg (2014)\n",
    "* [Demystifying Neural Network in Skip-Gram Language Modeling](https://aegis4048.github.io/demystifying_neural_network_in_skip_gram_language_modeling)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Questions"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}