{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "  function code_toggle() {\n",
       "    if (code_shown){\n",
       "      $('div.input').hide('500');\n",
       "      $('#toggleButton').val('Show Code')\n",
       "    } else {\n",
       "      $('div.input').show('500');\n",
       "      $('#toggleButton').val('Hide Code')\n",
       "    }\n",
       "    code_shown = !code_shown\n",
       "  }\n",
       "\n",
       "  $( document ).ready(function(){\n",
       "    code_shown=false;\n",
       "    $('div.input').hide()\n",
       "  });\n",
       "</script>\n",
       "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
       "<style>\n",
       ".rendered_html td {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       ".rendered_html th {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<script>\n",
    "  function code_toggle() {\n",
    "    if (code_shown){\n",
    "      $('div.input').hide('500');\n",
    "      $('#toggleButton').val('Show Code')\n",
    "    } else {\n",
    "      $('div.input').show('500');\n",
    "      $('#toggleButton').val('Hide Code')\n",
    "    }\n",
    "    code_shown = !code_shown\n",
    "  }\n",
    "\n",
    "  $( document ).ready(function(){\n",
    "    code_shown=false;\n",
    "    $('div.input').hide()\n",
    "  });\n",
    "</script>\n",
    "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
    "<style>\n",
    ".rendered_html td {\n",
    "    font-size: xx-large;\n",
    "    text-align: left; !important\n",
    "}\n",
    ".rendered_html th {\n",
    "    font-size: xx-large;\n",
    "    text-align: left; !important\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "import sys\n",
    "sys.path.append(\"..\")\n",
    "import statnlpbook.util as util\n",
    "import matplotlib\n",
    "matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "<!---\n",
    "Latex Macros\n",
    "-->\n",
    "$$\n",
    "\\newcommand{\\Xs}{\\mathcal{X}}\n",
    "\\newcommand{\\Ys}{\\mathcal{Y}}\n",
    "\\newcommand{\\y}{\\mathbf{y}}\n",
    "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n",
    "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n",
    "\\newcommand{\\aligns}{\\mathbf{a}}\n",
    "\\newcommand{\\align}{a}\n",
    "\\newcommand{\\source}{\\mathbf{s}}\n",
    "\\newcommand{\\target}{\\mathbf{t}}\n",
    "\\newcommand{\\ssource}{s}\n",
    "\\newcommand{\\starget}{t}\n",
    "\\newcommand{\\repr}{\\mathbf{f}}\n",
    "\\newcommand{\\repry}{\\mathbf{g}}\n",
    "\\newcommand{\\x}{\\mathbf{x}}\n",
    "\\newcommand{\\prob}{p}\n",
    "\\newcommand{\\bar}{\\,|\\,}\n",
    "\\newcommand{\\vocab}{V}\n",
    "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n",
    "\\newcommand{\\param}{\\theta}\n",
    "\\DeclareMathOperator{\\perplexity}{PP}\n",
    "\\DeclareMathOperator{\\argmax}{argmax}\n",
    "\\DeclareMathOperator{\\argmin}{argmin}\n",
    "\\newcommand{\\train}{\\mathcal{D}}\n",
    "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n",
    "\\newcommand{\\length}[1]{\\text{length}(#1) }\n",
    "\\newcommand{\\indi}{\\mathbb{I}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The tikzmagic extension is already loaded. To reload it, use:\n",
      "  %reload_ext tikzmagic\n"
     ]
    }
   ],
   "source": [
    "%load_ext tikzmagic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Please complete the course evaluation\n",
    "\n",
    "### [evaluering.ku.dk](https://evaluering.ku.dk)\n",
    "\n",
    "Deadline: November 6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# What languages do you speak?\n",
    "\n",
    "### [ucph.page.link/lang](https://ucph.page.link/lang)\n",
    "\n",
    "([Responses](https://www.mentimeter.com/s/389360b38fa508b4ffd4e40bf47003e4/f43e4a2212e8/edit))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Machine Translation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "+ Challenges\n",
    "+ History\n",
    "+ Statistical MT\n",
    "+ Neural MT\n",
    "+ Evaluation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Languages are hard (even for humans)!\n",
    "\n",
    "<center style=\"padding-top:3em;\">\n",
    "  <img src=\"mt_figures/whatever.jpg\" />\n",
    "    \n",
    "  <span style=\"font-size:50%;\">(Source: <a href=\"https://www.flickr.com/photos/98991392@N00/8729849093/sizes/z/in/pool-47169589@N00/\">Flickr</a>)</span>\n",
    "</center>\n",
    "\n",
    "[随便](https://translate.google.com/#view=home&op=translate&sl=zh-CN&tl=en&text=%E9%9A%8F%E4%BE%BF)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Automatic machine translation is hard!\n",
    "\n",
    "<center style=\"padding-top:3em;\">\n",
    "<img src=\"../chapters/mt_figures/avocado.png\" width=\"100%\"/>\n",
    "</center>\n",
    "\n",
    "[J'ai besoin d'un avocat pour mon procés de guacamole.](https://translate.google.com/?sl=fr&tl=en&text=J%27ai%20besoin%20d%27un%20avocat%20pour%20mon%20proc%C3%A9s%20de%20guacamole.&op=translate)\n",
    "\n",
    "[guacamole lawsuit](https://www.latimes.com/archives/la-xpm-2006-dec-10-fi-letters10.2-story.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Many things could go wrong."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<center><img src=\"../img/quiz_time.png\"></center>\n",
    "\n",
    "### [ucph.page.link/mt](https://ucph.page.link/mt)\n",
    "\n",
    "([Responses](https://docs.google.com/forms/d/10UUpQqlduvYuz7-SXSvzMRDAs5qxdliKJmgnkWiHNK8/edit#responses), [Solution](https://docs.google.com/document/d/1l_NQNz6ME1mm2ra5nKMYErwI98d3TJ3aC_bXct0PNhg/edit?usp=sharing))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Challenges\n",
    "\n",
    "Divergences between languages include:\n",
    "\n",
    "* Word order\n",
    "* Grammatical (morphological) marking (e.g., gender)\n",
    "* Division of concept space (e.g., English \"wall\" vs. German \"Mauer\"/\"Wand\")\n",
    "\n",
    "Addressing them requires resolving ambiguities:\n",
    "\n",
    "* Word sense (e.g., \"bass\")\n",
    "* Attributes with grammatical marking (e.g., formality in Japanese)\n",
    "* Reference in pro-drop contexts (e.g., in Mandarin Chinese)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## History\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/l6Av1jJRcgtfL5n9vEvrspxpEJJalwHAusRx\" width=\"100%\" />\n",
    "</center>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Direct Machine Translation\n",
    "\n",
    "Just use a **dictionary** and translate word-by-word.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/gfF1OTGY1E6QQBDW6l6kh1Daa4iingQmt86V\" width=\"100%\" />\n",
    "</center>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Transfer-based Machine Translation\n",
    "\n",
    "Add **rules** to inflect/join/split words based on source and target syntax.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/AfcjFhnoMFmb8koHf42YQeE6WfKdFNjqpkGL\" width=\"100%\" />\n",
    "</center>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Interlingua\n",
    "\n",
    "The ideal: transform to and from a language-independent representation.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/vEPJYMmjDV0LLXIy07LksIiOsecXdlHcs8nI\" width=\"80%\" />\n",
    "</center>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)\n",
    "\n",
    "Too hard to achieve with rules!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Example-based Machine Translation (EBMT)\n",
    "\n",
    "Retrieve a similar example from a translation database, and make adjustments as necessary.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/H-VtzpHN02SMIhwYjqmn04Uyd-nGWLwLmBwW\" width=\"100%\" />\n",
    "</center>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Statistical Machine Translation (SMT)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## IBM Translation Models\n",
    "\n",
    "In the late 80s and early 90s, IBM researchers revolutionised MT using statistical approaches instead of rules.\n",
    "\n",
    "<img src=\"https://cdn-media-1.freecodecamp.org/images/1MJctJSHzUaYpUNIhvQdQtz4RKFq06nN7FJ9\"/>\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### IBM Model 1\n",
    "\n",
    "Simple word-by-word translation, but with **statistical** dictionaries.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/4dBTKxFLuXkmeALMuxILitzmBd0zVOQVrhnP\" width=\"100%\" />\n",
    "</center>\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/uF96He1UZaLMkC1TEuBGzoHAhw3PAvF8mgam\" width=\"60%\" />\n",
    "</center>\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### IBM Model 2\n",
    "Statistical translation and **reordering** model.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/AWurqrK2Zag9dIZSgYVGZCFxklsrZot7WLZ2\" width=\"90%\" />\n",
    "</center>\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### IBM Model 3\n",
    "Allows inserting new words.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/aPxpW2ssFd2wDio9C51Zbb0sIdXLBAV8DoYP\" width=\"70%\" />\n",
    "</center>\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## MT as Structured Prediction\n",
    "\n",
    "**Model** \\\\(p(\\target,\\source)\\\\): how likely the target \\\\(\\target\\\\) is to be a translation of source $\\source$.\n",
    "\n",
    "$$p(\\textrm{I like music}, \\textrm{音楽 が 好き}) \\gg p(\\textrm{I like persimmons}, \\textrm{音楽 が 好き})$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* How is the scoring function defined (**modeling**)?\n",
    "$$\\prob(\\target,\\source)\\approx s_\\params(\\target,\\source)$$\n",
    "\n",
    "* How are the parameters \\\\(\\params\\\\) learned (**training**)?\n",
    "\n",
    "$$\n",
    "\\argmax_\\params \\prod_{(\\target,\\source) \\in \\train} s_\\params(\\target, \\source)\n",
    "$$\n",
    "\n",
    "* How is translation \\\\(\\argmax\\\\) found (**decoding**)?\n",
    "\n",
    "$$\\argmax_\\target s_\\params(\\target,\\source)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Training\n",
    "Learn the parameters \\\\(\\params\\\\) from data.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/5qBpooShbY6xVngSotA6KINHyHP7NKeTJryb\" width=\"100%\" />\n",
    "</center>\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Generative Models \n",
    "Estimate $\\prob(\\target,\\source)$: how is the $(\\target,\\source)$ data **generated**?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Noisy Channel\n",
    "\n",
    "* Imagine a message $\\target$ is sent through a noisy channel and $\\source$ is received at the end.\n",
    "* Can we recover what was $\\target$?\n",
    "* **Language model** $\\prob(\\target)$: does the target $\\target$ look like real language?\n",
    "* **Translation model**: $\\prob(\\source|\\target)$: does the source $\\source$ match the target $\\target$?\n",
    "\n",
    "This defines a **joint** distribution\n",
    "\n",
    "$$\\prob(\\target,\\source) = \\prob(\\target) \\prob(\\source|\\target)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Word-based SMT\n",
    "\n",
    "Decompose source and target to words, with statistical **alignment**.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/jG95Sgc2W4VJbwi4LFlJeMHnjLZbdGydCCzI\" width=\"100%\" />\n",
    "</center>\n",
    "\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Phrase-based SMT\n",
    "\n",
    "Decompose source and target to **phrases** and look them up in phrase tables.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/lGJNqYGZOJMjs23F8-xf6E4buXptvm2IBzjg\" width=\"100%\" />\n",
    "</center>\n",
    "\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Neural Machine Translation (NMT)\n",
    "\n",
    "Model $s_\\params(\\target,\\source)$ directly using a neural network.\n",
    "\n",
    "<center>\n",
    "   <img src=\"https://cdn-media-1.freecodecamp.org/images/DD6GvRmtxZGhC9toN1CHXUBWLhUXLpJSiJF5\" width=\"100%\" />\n",
    "</center>\n",
    "\n",
    "\n",
    "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Sequence-to-sequence models (seq2seq)\n",
    "\n",
    "Encoder–decoder architecture first \"read\" the input sequence and then generate an output sequence\n",
    "([Sutskever et al., 2014](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf), [Cho et al., 2014](https://arxiv.org/abs/1406.1078)).\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec.svg\" width=\"70%\" />\n",
    "</center>\n",
    "\n",
    "(Examples are Basque–English)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### We can use RNNs for that!\n",
    "\n",
    "Example architecture:\n",
    "* Encoder: word embedding layer + Bi-LSTM to capture contextual information\n",
    "* Decoder: Uni-directional LSTM (because we need to *decode* word by word) + softmax layer on top\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec_rnn1.svg\" width=\"70%\" />\n",
    "</center>\n",
    "\n",
    "The end-of-sequence symbol `</S>` is necessary to know when to start and stop generating."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### But something's missing (again)...\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec_rnn1.svg\" width=\"70%\" />\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Output words **depend on each other**!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Autoregressive MT\n",
    "\n",
    "At each step, feed the predicted word to the decoder as input for predicting the next word.\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec_rnn2.svg\" width=\"70%\" />\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Training\n",
    "\n",
    "+ Loss function: negative log-likelihood\n",
    "+ **Teacher forcing:**  always feed the ground truth into the decoder.\n",
    "\n",
    "*Alternative:*\n",
    "\n",
    "+ **Scheduled sampling:** with a certain probability, use model predictions instead.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Decoding\n",
    "\n",
    "+ Greedy decoding:\n",
    "    + Always pick the **most likely word** (according to the softmax)\n",
    "    + Continue generating more words **until the `</S>` symbol is predicted**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "+ Beam search:\n",
    "    * In each step chooses **best next source word to translate**\n",
    "    * **Append a target word** based on source word\n",
    "    * Maintains a list of top-$k$ hypotheses in a **beam**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## More problems with our approach\n",
    "\n",
    "&nbsp;\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec_rnn3.svg\" width=\"70%\" />\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Word alignment\n",
    "\n",
    "In rule-based and statistical MT, word alignments are crucial.\n",
    "\n",
    "<center style=\"padding: 1em 0;\">\n",
    "    <img src=\"mt_figures/align.svg\" width=\"20%\" />\n",
    "</center>\n",
    "\n",
    "Can we use them in neural MT?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Attention mechanism\n",
    "\n",
    "Jointly learning to align and to translate.\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/encdec_att.svg\" width=\"40%\" />\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Attention matrix\n",
    "\n",
    "<center>\n",
    "    <img src=\"mt_figures/att_matrix.png\" width=\"40%\" />\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/abs/1409.0473\">Bahdanau et al., 2014</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Transformers\n",
    "\n",
    "Replace LSTMs by self-attention. Attend to encoded input *and* to partial output (autoregressive).\n",
    "\n",
    "<center>\n",
    "    <img src=\"http://jalammar.github.io/images/xlnet/transformer-encoder-decoder.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"http://jalammar.github.io/illustrated-gpt2/\">The Illustrated GPT-2</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# MT evaluation\n",
    "\n",
    "We're training the model with *negative log-likelihood*, but that's not the best way to *evaluate* it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Consider:\n",
    "    \n",
    "+ After lunch, he went to the gym.\n",
    "+ After he had lunch, he went to the gym.\n",
    "+ He went to the gym after lunch.\n",
    "+ He went to the gym after lunchtime.\n",
    "\n",
    "In machine translation, there are often **several acceptable variations!**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Human evaluation\n",
    "\n",
    "* **Faithfulness** (or meaning preservation) to evaluate the \"translation model\"\n",
    "* **Fluency** to evaluate the \"target language model\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "In general, manual evaluation is the best way, but it is not scalable. **Automatic** metrics are therefore often used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## BLEU score\n",
    "\n",
    "A widely used *reference-based* metric ([Papineni et al., 2002](https://aclanthology.org/P02-1040/)):\n",
    "\n",
    "+ Compare the prediction to one or more reference translations.\n",
    "+ Count the number of matching $n$-grams between them.\n",
    "  - It is common to consider $1 \\le n \\le 4$\n",
    "+ Divide by total number of $n$-grams.\n",
    "\n",
    "The BLEU score will range between 0 (*no match at all*) and 1.0 (*perfect match*, 100%)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Recommended library: [sacreBLEU](https://github.com/mjpost/sacrebleu)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### BLEU score examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting sacrebleu\n",
      "  Downloading sacrebleu-2.3.1-py3-none-any.whl (118 kB)\n",
      "\u001b[K     |████████████████████████████████| 118 kB 4.7 MB/s eta 0:00:01\n",
      "\u001b[?25hCollecting tabulate>=0.8.9\n",
      "  Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)\n",
      "Requirement already satisfied: regex in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (2020.7.14)\n",
      "Requirement already satisfied: numpy>=1.17 in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (1.19.1)\n",
      "Requirement already satisfied: colorama in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (0.4.5)\n",
      "Collecting portalocker\n",
      "  Downloading portalocker-2.6.0-py2.py3-none-any.whl (15 kB)\n",
      "Collecting lxml\n",
      "  Downloading lxml-4.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.9 MB)\n",
      "\u001b[K     |████████████████████████████████| 6.9 MB 31.1 MB/s eta 0:00:01\n",
      "\u001b[?25hInstalling collected packages: tabulate, portalocker, lxml, sacrebleu\n",
      "Successfully installed lxml-4.9.1 portalocker-2.6.0 sacrebleu-2.3.1 tabulate-0.9.0\n"
     ]
    }
   ],
   "source": [
    "!pip install sacrebleu"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100.00000000000004"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sacrebleu.metrics import BLEU\n",
    "bleu = BLEU()\n",
    "\n",
    "refs = [[\"After lunch, he went to the gym.\",\n",
    "        \"He went to the gym after lunch.\"]]\n",
    "bleu.corpus_score([\"After lunch, he went to the gym.\"], refs).score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30.509752160562883"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bleu.corpus_score([\"Turtles are great animals to the gym.\"], refs).score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "69.89307622784945"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bleu.corpus_score([\"After he had lunch, he went to the gym.\"], refs).score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "86.33400213704509"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bleu.corpus_score([\"Before lunch, he went to the gym.\"], refs).score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "+ BLEU is very simplistic\n",
    "+ Many alternatives have been proposed, including chrF ([Popović, 2015](https://aclanthology.org/W15-3049/)) and BERTScore ([Zhang et al., 2020](https://arxiv.org/abs/1904.09675))\n",
    "+ ...but BLEU still remains very popular"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Improving efficiency and quality in MT\n",
    "\n",
    "+ More data\n",
    "+ Bigger models\n",
    "+ Better neural network architectures\n",
    "+ Semi-supervised learning\n",
    "+ **Transfer learning**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Further reading\n",
    "\n",
    "* Non-neural machine translation:\n",
    "  + Ilya Pestov's article [A history of machine translation from the Cold War to deep learning](https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/)\n",
    "  + [Slides on SMT from this repo](word_mt_slides.ipynb)\n",
    "  + Mike Collins's [Lecture notes on IBM Model 1 and 2](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf)\n",
    "\n",
    "* Sequence-to-sequence models:\n",
    "  + Graham Neubig, [Neural Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/abs/1703.01619)\n",
    "\n",
    "* And beyond...\n",
    "  + Philipp Koehn, [Neural Machine Translation, §13.6–13.8](https://arxiv.org/abs/1709.07809) gives a great overview of further refinements and challenges"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}