{ "cells": [ { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n", "<style>\n", ".rendered_html td {\n", " font-size: xx-large;\n", " text-align: left; !important\n", "}\n", ".rendered_html th {\n", " font-size: xx-large;\n", " text-align: left; !important\n", "}\n", "</style>\n" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "<script>\n", " function code_toggle() {\n", " if (code_shown){\n", " $('div.input').hide('500');\n", " $('#toggleButton').val('Show Code')\n", " } else {\n", " $('div.input').show('500');\n", " $('#toggleButton').val('Hide Code')\n", " }\n", " code_shown = !code_shown\n", " }\n", "\n", " $( document ).ready(function(){\n", " code_shown=false;\n", " $('div.input').hide()\n", " });\n", "</script>\n", "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n", "<style>\n", ".rendered_html td {\n", " font-size: xx-large;\n", " text-align: left; !important\n", "}\n", ".rendered_html th {\n", " font-size: xx-large;\n", " text-align: left; !important\n", "}\n", "</style>" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "import sys\n", 
"sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "import matplotlib\n", "matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "<!---\n", "Latex Macros\n", "-->\n", "$$\n", "\\newcommand{\\Xs}{\\mathcal{X}}\n", "\\newcommand{\\Ys}{\\mathcal{Y}}\n", "\\newcommand{\\y}{\\mathbf{y}}\n", "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\newcommand{\\aligns}{\\mathbf{a}}\n", "\\newcommand{\\align}{a}\n", "\\newcommand{\\source}{\\mathbf{s}}\n", "\\newcommand{\\target}{\\mathbf{t}}\n", "\\newcommand{\\ssource}{s}\n", "\\newcommand{\\starget}{t}\n", "\\newcommand{\\repr}{\\mathbf{f}}\n", "\\newcommand{\\repry}{\\mathbf{g}}\n", "\\newcommand{\\x}{\\mathbf{x}}\n", "\\newcommand{\\prob}{p}\n", "\\newcommand{\\bar}{\\,|\\,}\n", "\\newcommand{\\vocab}{V}\n", "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n", "\\newcommand{\\param}{\\theta}\n", "\\DeclareMathOperator{\\perplexity}{PP}\n", "\\DeclareMathOperator{\\argmax}{argmax}\n", "\\DeclareMathOperator{\\argmin}{argmin}\n", "\\newcommand{\\train}{\\mathcal{D}}\n", "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n", "\\newcommand{\\length}[1]{\\text{length}(#1) }\n", "\\newcommand{\\indi}{\\mathbb{I}}\n", "$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The tikzmagic extension is already loaded. 
To reload it, use:\n", " %reload_ext tikzmagic\n" ] } ], "source": [ "%load_ext tikzmagic" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Please complete the course evaluation\n", "\n", "### [evaluering.ku.dk](https://evaluering.ku.dk)\n", "\n", "Deadline: November 6" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What languages do you speak?\n", "\n", "### [ucph.page.link/lang](https://ucph.page.link/lang)\n", "\n", "([Responses](https://www.mentimeter.com/s/389360b38fa508b4ffd4e40bf47003e4/f43e4a2212e8/edit))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Machine Translation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "+ Challenges\n", "+ History\n", "+ Statistical MT\n", "+ Neural MT\n", "+ Evaluation\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Languages are hard (even for humans)!\n", "\n", "<center style=\"padding-top:3em;\">\n", " <img src=\"mt_figures/whatever.jpg\" />\n", " \n", " <span style=\"font-size:50%;\">(Source: <a href=\"https://www.flickr.com/photos/98991392@N00/8729849093/sizes/z/in/pool-47169589@N00/\">Flickr</a>)</span>\n", "</center>\n", "\n", "[随便](https://translate.google.com/#view=home&op=translate&sl=zh-CN&tl=en&text=%E9%9A%8F%E4%BE%BF)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Automatic machine translation is hard!\n", "\n", "<center style=\"padding-top:3em;\">\n", "<img src=\"../chapters/mt_figures/avocado.png\" width=\"100%\"/>\n", "</center>\n", "\n", "[J'ai besoin d'un avocat pour mon procés de guacamole.](https://translate.google.com/?sl=fr&tl=en&text=J%27ai%20besoin%20d%27un%20avocat%20pour%20mon%20proc%C3%A9s%20de%20guacamole.&op=translate)\n", "\n", "[guacamole 
lawsuit](https://www.latimes.com/archives/la-xpm-2006-dec-10-fi-letters10.2-story.html)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Many things could go wrong." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "<center><img src=\"../img/quiz_time.png\"></center>\n", "\n", "### [ucph.page.link/mt](https://ucph.page.link/mt)\n", "\n", "([Responses](https://docs.google.com/forms/d/10UUpQqlduvYuz7-SXSvzMRDAs5qxdliKJmgnkWiHNK8/edit#responses), [Solution](https://docs.google.com/document/d/1l_NQNz6ME1mm2ra5nKMYErwI98d3TJ3aC_bXct0PNhg/edit?usp=sharing))\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Challenges\n", "\n", "Divergences between languages include:\n", "\n", "* Word order\n", "* Grammatical (morphological) marking (e.g., gender)\n", "* Division of concept space (e.g., English \"wall\" vs. German \"Mauer\"/\"Wand\")\n", "\n", "Addressing them requires resolving ambiguities:\n", "\n", "* Word sense (e.g., \"bass\")\n", "* Attributes with grammatical marking (e.g., formality in Japanese)\n", "* Reference in pro-drop contexts (e.g., in Mandarin Chinese)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## History\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/l6Av1jJRcgtfL5n9vEvrspxpEJJalwHAusRx\" width=\"100%\" />\n", "</center>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Direct Machine Translation\n", "\n", "Just use a **dictionary** and translate word-by-word.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/gfF1OTGY1E6QQBDW6l6kh1Daa4iingQmt86V\" 
width=\"100%\" />\n", "</center>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Transfer-based Machine Translation\n", "\n", "Add **rules** to inflect/join/split words based on source and target syntax.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/AfcjFhnoMFmb8koHf42YQeE6WfKdFNjqpkGL\" width=\"100%\" />\n", "</center>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Interlingua\n", "\n", "The ideal: transform to and from a language-independent representation.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/vEPJYMmjDV0LLXIy07LksIiOsecXdlHcs8nI\" width=\"80%\" />\n", "</center>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)\n", "\n", "Too hard to achieve with rules!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example-based Machine Translation (EBMT)\n", "\n", "Retrieve a similar example from a translation database, and make adjustments as necessary.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/H-VtzpHN02SMIhwYjqmn04Uyd-nGWLwLmBwW\" width=\"100%\" />\n", "</center>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Statistical Machine Translation (SMT)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## IBM Translation Models\n", "\n", "In the late 80s and early 90s, IBM researchers revolutionised MT using statistical approaches instead of rules.\n", "\n", "<img src=\"https://cdn-media-1.freecodecamp.org/images/1MJctJSHzUaYpUNIhvQdQtz4RKFq06nN7FJ9\"/>\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### IBM Model 1\n", "\n", "Simple word-by-word translation, but with **statistical** dictionaries.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/4dBTKxFLuXkmeALMuxILitzmBd0zVOQVrhnP\" width=\"100%\" />\n", "</center>\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/uF96He1UZaLMkC1TEuBGzoHAhw3PAvF8mgam\" width=\"60%\" />\n", "</center>\n", "\n", 
"(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### IBM Model 2\n", "Statistical translation and **reordering** model.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/AWurqrK2Zag9dIZSgYVGZCFxklsrZot7WLZ2\" width=\"90%\" />\n", "</center>\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### IBM Model 3\n", "Allows inserting new words.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/aPxpW2ssFd2wDio9C51Zbb0sIdXLBAV8DoYP\" width=\"70%\" />\n", "</center>\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## MT as Structured Prediction\n", "\n", "**Model** \\\\(p(\\target,\\source)\\\\): how likely the target \\\\(\\target\\\\) is to be a translation of source $\\source$.\n", "\n", "$$p(\\textrm{I like music}, \\textrm{音楽 が 好き}) \\gg p(\\textrm{I like persimmons}, \\textrm{音楽 が 好き})$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* How is the scoring function defined (**modeling**)?\n", "$$\\prob(\\target,\\source)\\approx s_\\params(\\target,\\source)$$\n", "\n", "* How are the parameters \\\\(\\params\\\\) learned (**training**)?\n", "\n", "$$\n", "\\argmax_\\params \\prod_{(\\target,\\source) \\in \\train} s_\\params(\\target, \\source)\n", "$$\n", "\n", "* How is translation \\\\(\\argmax\\\\) found 
(**decoding**)?\n", "\n", "$$\\argmax_\\target s_\\params(\\target,\\source)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Training\n", "Learn the parameters \\\\(\\params\\\\) from data.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/5qBpooShbY6xVngSotA6KINHyHP7NKeTJryb\" width=\"100%\" />\n", "</center>\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Generative Models \n", "Estimate $\\prob(\\target,\\source)$: how is the $(\\target,\\source)$ data **generated**?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Noisy Channel\n", "\n", "* Imagine a message $\\target$ is sent through a noisy channel and $\\source$ is received at the end.\n", "* Can we recover what $\\target$ was?\n", "* **Language model** $\\prob(\\target)$: does the target $\\target$ look like real language?\n", "* **Translation model** $\\prob(\\source|\\target)$: does the source $\\source$ match the target $\\target$?\n", "\n", "This defines a **joint** distribution\n", "\n", "$$\\prob(\\target,\\source) = \\prob(\\target) \\prob(\\source|\\target)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Word-based SMT\n", "\n", "Decompose source and target into words, with statistical **alignment**.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/jG95Sgc2W4VJbwi4LFlJeMHnjLZbdGydCCzI\" width=\"100%\" />\n", "</center>\n", "\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "subslide" } }, "source": [ "## Phrase-based SMT\n", "\n", "Decompose source and target into **phrases** and look them up in phrase tables.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/lGJNqYGZOJMjs23F8-xf6E4buXptvm2IBzjg\" width=\"100%\" />\n", "</center>\n", "\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Neural Machine Translation (NMT)\n", "\n", "Model $s_\\params(\\target,\\source)$ directly using a neural network.\n", "\n", "<center>\n", " <img src=\"https://cdn-media-1.freecodecamp.org/images/DD6GvRmtxZGhC9toN1CHXUBWLhUXLpJSiJF5\" width=\"100%\" />\n", "</center>\n", "\n", "\n", "(Source: <a href=\"https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/\">freeCodeCamp</a>)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sequence-to-sequence models (seq2seq)\n", "\n", "Encoder–decoder architectures first \"read\" the input sequence and then generate an output sequence\n", "([Sutskever et al., 2014](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf), [Cho et al., 2014](https://arxiv.org/abs/1406.1078)).\n", "\n", " \n", "\n", "<center>\n", " <img src=\"mt_figures/encdec.svg\" width=\"70%\" />\n", "</center>\n", "\n", "(Examples are Basque–English)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### We can use RNNs for that!\n", "\n", "Example architecture:\n", "* Encoder: word embedding layer + Bi-LSTM to capture contextual information\n", "* Decoder: Uni-directional LSTM (because we need to *decode* word by word) + softmax layer on top\n", "\n", "<center>\n", " <img 
src=\"mt_figures/encdec_rnn1.svg\" width=\"70%\" />\n", "</center>\n", "\n", "The end-of-sequence symbol `</S>` is necessary to know when to start and stop generating." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### But something's missing (again)...\n", "\n", " \n", "\n", "<center>\n", " <img src=\"mt_figures/encdec_rnn1.svg\" width=\"70%\" />\n", "</center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Output words **depend on each other**!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Autoregressive MT\n", "\n", "At each step, feed the predicted word to the decoder as input for predicting the next word.\n", "\n", "<center>\n", " <img src=\"mt_figures/encdec_rnn2.svg\" width=\"70%\" />\n", "</center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Training\n", "\n", "+ Loss function: negative log-likelihood\n", "+ **Teacher forcing:** always feed the ground truth into the decoder.\n", "\n", "*Alternative:*\n", "\n", "+ **Scheduled sampling:** with a certain probability, use model predictions instead.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Decoding\n", "\n", "+ Greedy decoding:\n", " + Always pick the **most likely word** (according to the softmax)\n", " + Continue generating more words **until the `</S>` symbol is predicted**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "+ Beam search:\n", " * Maintain a list of the top-$k$ partial translations (hypotheses) in a **beam**\n", " * In each step, **extend each hypothesis** with candidate next target words\n", " * Keep only the $k$ highest-scoring extended hypotheses" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, 
"source": [ "## More problems with our 
approach\n", "\n", " \n", "\n", "<center>\n", " <img src=\"mt_figures/encdec_rnn3.svg\" width=\"70%\" />\n", "</center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Word alignment\n", "\n", "In rule-based and statistical MT, word alignments are crucial.\n", "\n", "<center style=\"padding: 1em 0;\">\n", " <img src=\"mt_figures/align.svg\" width=\"20%\" />\n", "</center>\n", "\n", "Can we use them in neural MT?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Attention mechanism\n", "\n", "Jointly learning to align and to translate.\n", "\n", "<center>\n", " <img src=\"mt_figures/encdec_att.svg\" width=\"40%\" />\n", "</center>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Attention matrix\n", "\n", "<center>\n", " <img src=\"mt_figures/att_matrix.png\" width=\"40%\" />\n", "</center>\n", "\n", "<div style=\"text-align: right;\">\n", " (from <a href=\"https://arxiv.org/abs/1409.0473\">Bahdanau et al., 2014</a>)\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Transformers\n", "\n", "Replace LSTMs by self-attention. Attend to encoded input *and* to partial output (autoregressive).\n", "\n", "<center>\n", " <img src=\"http://jalammar.github.io/images/xlnet/transformer-encoder-decoder.png\" width=80%/>\n", "</center>\n", "\n", "<div style=\"text-align: right;\">\n", " (from <a href=\"http://jalammar.github.io/illustrated-gpt2/\">The Illustrated GPT-2</a>)\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# MT evaluation\n", "\n", "We're training the model with *negative log-likelihood*, but that's not the best way to *evaluate* it." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Consider:\n", " \n", "+ After lunch, he went to the gym.\n", "+ After he had lunch, he went to the gym.\n", "+ He went to the gym after lunch.\n", "+ He went to the gym after lunchtime.\n", "\n", "In machine translation, there are often **several acceptable variations!**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Human evaluation\n", "\n", "* **Faithfulness** (or meaning preservation) to evaluate the \"translation model\"\n", "* **Fluency** to evaluate the \"target language model\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Manual evaluation is generally the most reliable, but it does not scale. **Automatic** metrics are therefore often used." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## BLEU score\n", "\n", "A widely used *reference-based* metric ([Papineni et al., 2002](https://aclanthology.org/P02-1040/)):\n", "\n", "+ Compare the prediction to one or more reference translations.\n", "+ Count the number of matching $n$-grams between them.\n", " - It is common to consider $1 \\le n \\le 4$\n", "+ Divide by the total number of $n$-grams in the prediction.\n", "\n", "The BLEU score ranges between 0 (*no match at all*) and 1.0 (*perfect match*); sacreBLEU reports it as a percentage (0 to 100)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Recommended library: [sacreBLEU](https://github.com/mjpost/sacrebleu)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### BLEU score examples" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting sacrebleu\n", 
" Downloading sacrebleu-2.3.1-py3-none-any.whl (118 
kB)\n", "\u001b[K |████████████████████████████████| 118 kB 4.7 MB/s eta 0:00:01\n", "\u001b[?25hCollecting tabulate>=0.8.9\n", " Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)\n", "Requirement already satisfied: regex in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (2020.7.14)\n", "Requirement already satisfied: numpy>=1.17 in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (1.19.1)\n", "Requirement already satisfied: colorama in /home/daniel/anaconda3/envs/stat-nlp-book/lib/python3.8/site-packages (from sacrebleu) (0.4.5)\n", "Collecting portalocker\n", " Downloading portalocker-2.6.0-py2.py3-none-any.whl (15 kB)\n", "Collecting lxml\n", " Downloading lxml-4.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.9 MB)\n", "\u001b[K |████████████████████████████████| 6.9 MB 31.1 MB/s eta 0:00:01\n", "\u001b[?25hInstalling collected packages: tabulate, portalocker, lxml, sacrebleu\n", "Successfully installed lxml-4.9.1 portalocker-2.6.0 sacrebleu-2.3.1 tabulate-0.9.0\n" ] } ], "source": [ "!pip install sacrebleu" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "100.00000000000004" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sacrebleu.metrics import BLEU\n", "bleu = BLEU()\n", "\n", "refs = [[\"After lunch, he went to the gym.\",\n", " \"He went to the gym after lunch.\"]]\n", "bleu.corpus_score([\"After lunch, he went to the gym.\"], refs).score" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "30.509752160562883" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bleu.corpus_score([\"Turtles are great animals to the gym.\"], refs).score" ] }, { "cell_type": 
"code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "69.89307622784945" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bleu.corpus_score([\"After he had lunch, he went to the gym.\"], refs).score" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "86.33400213704509" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bleu.corpus_score([\"Before lunch, he went to the gym.\"], refs).score" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "+ BLEU is very simplistic\n", "+ Many alternatives have been proposed, including chrF ([Popović, 2015](https://aclanthology.org/W15-3049/)) and BERTScore ([Zhang et al., 2020](https://arxiv.org/abs/1904.09675))\n", "+ ...but BLEU still remains very popular" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Improving efficiency and quality in MT\n", "\n", "+ More data\n", "+ Bigger models\n", "+ Better neural network architectures\n", "+ Semi-supervised learning\n", "+ **Transfer learning**" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Further reading\n", "\n", "* Non-neural machine translation:\n", " + Ilya Pestov's article [A history of machine translation from the Cold War to deep learning](https://www.freecodecamp.org/news/a-history-of-machine-translation-from-the-cold-war-to-deep-learning-f1d335ce8b5/)\n", " + [Slides on SMT from this repo](word_mt_slides.ipynb)\n", " + Mike Collins's [Lecture notes on IBM Model 1 and 2](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf)\n", "\n", "* Sequence-to-sequence models:\n", " + Graham Neubig, [Neural 
Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/abs/1703.01619)\n", "\n", "* And beyond...\n", " + Philipp Koehn, [Neural Machine Translation, §13.6–13.8](https://arxiv.org/abs/1709.07809) gives a great overview of further refinements and challenges" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 1 }