{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "  function code_toggle() {\n",
       "    if (code_shown){\n",
       "      $('div.input').hide('500');\n",
       "      $('#toggleButton').val('Show Code')\n",
       "    } else {\n",
       "      $('div.input').show('500');\n",
       "      $('#toggleButton').val('Hide Code')\n",
       "    }\n",
       "    code_shown = !code_shown\n",
       "  }\n",
       "\n",
       "  $( document ).ready(function(){\n",
       "    code_shown=false;\n",
       "    $('div.input').hide()\n",
       "  });\n",
       "</script>\n",
       "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
       "<style>\n",
       ".rendered_html td {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       ".rendered_html th {\n",
       "    font-size: xx-large;\n",
       "    text-align: left; !important\n",
       "}\n",
       "</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<script>\n",
    "  function code_toggle() {\n",
    "    if (code_shown){\n",
    "      $('div.input').hide('500');\n",
    "      $('#toggleButton').val('Show Code')\n",
    "    } else {\n",
    "      $('div.input').show('500');\n",
    "      $('#toggleButton').val('Hide Code')\n",
    "    }\n",
    "    code_shown = !code_shown\n",
    "  }\n",
    "\n",
    "  $( document ).ready(function(){\n",
    "    code_shown=false;\n",
    "    $('div.input').hide()\n",
    "  });\n",
    "</script>\n",
    "<form action=\"javascript:code_toggle()\"><input type=\"submit\" id=\"toggleButton\" value=\"Show Code\"></form>\n",
    "<style>\n",
    ".rendered_html td {\n",
    "    font-size: xx-large;\n",
    "    text-align: left !important;\n",
    "}\n",
    ".rendered_html th {\n",
    "    font-size: xx-large;\n",
    "    text-align: left !important;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "import sys\n",
    "sys.path.append(\"..\")\n",
    "from statnlpbook.util import execute_notebook\n",
    "import statnlpbook.parsing as parsing\n",
    "from statnlpbook.transition import *\n",
    "from statnlpbook.dep import *\n",
    "import pandas as pd\n",
    "from io import StringIO\n",
    "from IPython.display import display, HTML\n",
    "\n",
    "execute_notebook('transition-based_dependency_parsing.ipynb')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "<!---\n",
    "Latex Macros\n",
    "-->\n",
    "$$\n",
    "\\newcommand{\\Xs}{\\mathcal{X}}\n",
    "\\newcommand{\\Ys}{\\mathcal{Y}}\n",
    "\\newcommand{\\y}{\\mathbf{y}}\n",
    "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n",
    "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n",
    "\\newcommand{\\aligns}{\\mathbf{a}}\n",
    "\\newcommand{\\align}{a}\n",
    "\\newcommand{\\source}{\\mathbf{s}}\n",
    "\\newcommand{\\target}{\\mathbf{t}}\n",
    "\\newcommand{\\ssource}{s}\n",
    "\\newcommand{\\starget}{t}\n",
    "\\newcommand{\\repr}{\\mathbf{f}}\n",
    "\\newcommand{\\repry}{\\mathbf{g}}\n",
    "\\newcommand{\\x}{\\mathbf{x}}\n",
    "\\newcommand{\\prob}{p}\n",
    "\\newcommand{\\a}{\\alpha}\n",
    "\\newcommand{\\b}{\\beta}\n",
    "\\newcommand{\\vocab}{V}\n",
    "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n",
    "\\newcommand{\\param}{\\theta}\n",
    "\\DeclareMathOperator{\\perplexity}{PP}\n",
    "\\DeclareMathOperator{\\argmax}{argmax}\n",
    "\\DeclareMathOperator{\\argmin}{argmin}\n",
    "\\newcommand{\\train}{\\mathcal{D}}\n",
    "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n",
    "\\newcommand{\\length}[1]{\\text{length}(#1) }\n",
    "\\newcommand{\\indi}{\\mathbb{I}}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The tikzmagic extension is already loaded. To reload it, use:\n",
      "  %reload_ext tikzmagic\n"
     ]
    }
   ],
   "source": [
    "%load_ext tikzmagic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Parsing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Schedule\n",
    "\n",
    "+ Parsing motivation\n",
    "\n",
    "+ Background: parsing (10 min.)\n",
    "\n",
    "+ Exercise: multi-word expressions (10 min.)\n",
    "\n",
    "+ Background: Universal Dependencies (5 min.)\n",
    "\n",
    "+ Background: transition-based parsing (10 min.)\n",
    "\n",
    "+ Break (10 min.)\n",
    "\n",
    "+ Example: transition-based parsing (5 min.)\n",
    "\n",
    "+ Motivation: natural language understanding (5 min.)\n",
    "\n",
    "+ Background: learning to parse (10 min.)\n",
    "\n",
    "+ Math: dependency parsing evaluation (5 min.)\n",
    "\n",
    "+ Examples: dependency parsers (5 min.)\n",
    "\n",
    "+ Background: semantic parsing (15 min.)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Motivation: information extraction\n",
    "\n",
    "> <font color=\"blue\">Dechra Pharmaceuticals</font>, which has just made its second acquisition, had previously purchased <font color=\"green\">Genitrix</font>.\n",
    "\n",
    "> <font color=\"blue\">Trinity Mirror plc</font>, the largest British newspaper, purchased <font color=\"green\">Local World</font>, its rival.\n",
    "\n",
    "> <font color=\"blue\">Kraft</font>, owner of <font color=\"blue\">Milka</font>, purchased <font color=\"green\">Cadbury Dairy Milk</font> and is now gearing up for a roll-out of its new brand.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Check out [UDPipe](https://lindat.mff.cuni.cz/services/udpipe/run.php?model=english-ewt-ud-2.6-200830&data=Kraft,%20owner%20of%20Milka,%20purchased%20Cadbury%20Dairy%20Milk%20and%20is%20now%20gearing%20up%20for%20a%20roll-out%20of%20its%20new%20brand.) and [Stanza](http://stanza.run/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Motivation: question answering by reading comprehension\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/05dd7254b632376973f3a1b4d39485da17814df5/6-Figure4-1.png\" width=100%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from [Rajpurkar et al., 2016](https://aclanthology.org/D16-1264))\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Motivation: question answering from knowledge bases\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/faee0c81a1170402b149500f1b91c51ccaf24027/2-Figure1-1.png\" width=50%>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from [Reddy et al., 2017](https://aclanthology.org/D17-1009/))\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Parsing is the process of **constructing these graphs**:\n",
    "\n",
    "* very important for downstream applications\n",
    "* researched in academia and [industry](https://ai.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "How is this done?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Syntactic Dependencies\n",
    "\n",
    "* **Lexical Elements**: words\n",
    "* **Syntactic Relations**: subject, direct object, nominal modifier, etc. \n",
    "\n",
    "Task: determine the syntactic relations between words"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Grammatical Relations\n",
    "> <font color=\"blue\">Kraft</font>, owner of <font color=\"blue\">Milka</font>, purchased <font color=\"green\">Cadbury Dairy Milk</font> and is now gearing up for a roll-out of its new brand.\n",
    "\n",
    "* *Subject* of **purchased**: Kraft\n",
    "* *Object* of **purchased**: Cadbury"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy44' style=\"overflow: scroll; width: 1200px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy44',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 3, \"label\": \"appos\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 9, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 1, \"end\": 7, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 3, \"end\": 5, \"label\": \"nmod\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 10, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 6, \"end\": 7, \"label\": \"punct\", \"dir\": \"left\"}, {\"start\": 1, \"end\": 2, \"label\": \"punct\", \"dir\": \"right\"}, {\"start\": 4, \"end\": 5, \"label\": \"case\", \"dir\": \"left\"}, {\"start\": 7, \"end\": 8, \"label\": \"dobj\", \"dir\": \"right\"}, {\"start\": 0, \"end\": 7, \"label\": \"root\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Kraft\"}, {\"text\": \",\"}, {\"text\": \"owner\"}, {\"text\": \"of\"}, {\"text\": \"Milka\"}, {\"text\": \",\"}, {\"text\": \"purchased\"}, {\"text\": \"Cadbury\"}, {\"text\": \"Dairy\"}, {\"text\": \"Milk\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy44'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tKraft\tKraft\tNOUN\tNN\t_\t7\tnsubj\t_\t_\n",
    "2\t,\t,\tPUNCT\t,\t_\t1\tpunct\t_\t_\n",
    "3\towner\towner\tNOUN\tNN\t_\t1\tappos\t_\t_\n",
    "4\tof\tof\tADP\tIN\t_\t5\tcase\t_\t_\n",
    "5\tMilka\tMilka\tPROPN\tNNP\t_\t3\tnmod\t_\t_\n",
    "6\t,\t,\tPUNCT\t,\t_\t7\tpunct\t_\t_\n",
    "7\tpurchased\tpurchase\tVERB\tVBD\t_\t0\troot\t_\t_\n",
    "8\tCadbury\tCadbury\tPROPN\tNNP\t_\t7\tdobj\t_\t_\n",
    "9\tDairy\tDairy\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "10\tMilk\tmilk\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "\"\"\"\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"1200px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Anatomy of a Dependency Tree\n",
    "\n",
    "* Nodes (vertices):\n",
    "    * Words of the sentence (+ punctuation tokens)\n",
    "    * A ROOT node\n",
    "* Arcs (edges):\n",
    "    * Directed from syntactic **head** to **dependent**\n",
    "    * Each **non-ROOT** token has **exactly one head**\n",
    "        * the word that controls its syntactic function, or\n",
    "        * the word \"it depends on\"\n",
    "* ROOT **has no head**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy45' style=\"overflow: scroll; width: 1200px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy45',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 3, \"label\": \"appos\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 9, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 1, \"end\": 7, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 3, \"end\": 5, \"label\": \"nmod\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 10, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 6, \"end\": 7, \"label\": \"punct\", \"dir\": \"left\"}, {\"start\": 1, \"end\": 2, \"label\": \"punct\", \"dir\": \"right\"}, {\"start\": 4, \"end\": 5, \"label\": \"case\", \"dir\": \"left\"}, {\"start\": 7, \"end\": 8, \"label\": \"dobj\", \"dir\": \"right\"}, {\"start\": 0, \"end\": 7, \"label\": \"root\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Kraft\"}, {\"text\": \",\"}, {\"text\": \"owner\"}, {\"text\": \"of\"}, {\"text\": \"Milka\"}, {\"text\": \",\"}, {\"text\": \"purchased\"}, {\"text\": \"Cadbury\"}, {\"text\": \"Dairy\"}, {\"text\": \"Milk\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy45'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tKraft\tKraft\tNOUN\tNN\t_\t7\tnsubj\t_\t_\n",
    "2\t,\t,\tPUNCT\t,\t_\t1\tpunct\t_\t_\n",
    "3\towner\towner\tNOUN\tNN\t_\t1\tappos\t_\t_\n",
    "4\tof\tof\tADP\tIN\t_\t5\tcase\t_\t_\n",
    "5\tMilka\tMilka\tPROPN\tNNP\t_\t3\tnmod\t_\t_\n",
    "6\t,\t,\tPUNCT\t,\t_\t7\tpunct\t_\t_\n",
    "7\tpurchased\tpurchase\tVERB\tVBD\t_\t0\troot\t_\t_\n",
    "8\tCadbury\tCadbury\tPROPN\tNNP\t_\t7\tdobj\t_\t_\n",
    "9\tDairy\tDairy\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "10\tMilk\tmilk\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "\"\"\"\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"1200px\")"
   ]
  },
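  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The constraints listed under *Anatomy of a Dependency Tree* can be checked mechanically. The following is a minimal sketch, not part of `statnlpbook`: the names `tokens`, `heads`, and `is_dependency_tree` are hypothetical, and the head indices are copied from the HEAD column of the Kraft example, with index 0 standing for ROOT.\n",
    "\n",
    "```python\n",
    "# Illustrative sketch; these names are assumptions, not statnlpbook APIs.\n",
    "# Index 0 is the artificial ROOT node; heads[i] is the head of token i.\n",
    "tokens = ['ROOT', 'Kraft', ',', 'owner', 'of', 'Milka', ',',\n",
    "          'purchased', 'Cadbury', 'Dairy', 'Milk']\n",
    "heads = [None, 7, 1, 1, 5, 3, 7, 0, 7, 8, 8]\n",
    "assert len(tokens) == len(heads)\n",
    "\n",
    "\n",
    "def is_dependency_tree(heads):\n",
    "    # ROOT must have no head.\n",
    "    if heads[0] is not None:\n",
    "        return False\n",
    "    # Every other token must reach ROOT by following head links.\n",
    "    for start in range(1, len(heads)):\n",
    "        seen = set()\n",
    "        node = start\n",
    "        while node != 0:\n",
    "            if node in seen:  # revisiting a node means there is a cycle\n",
    "                return False\n",
    "            seen.add(node)\n",
    "            node = heads[node]\n",
    "            # A missing or out-of-range head breaks the structure.\n",
    "            if node is None or not 0 <= node < len(heads):\n",
    "                return False\n",
    "    return True\n",
    "\n",
    "\n",
    "print(is_dependency_tree(heads))         # prints True\n",
    "print(is_dependency_tree([None, 2, 1]))  # prints False (tokens 1 and 2 form a cycle)\n",
    "```\n",
    "\n",
    "Storing one head index per token makes the exactly-one-head constraint hold by construction, so the function only has to verify that ROOT has no head and that no token is caught in a cycle."
   ]
  },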
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example\n",
    "\n",
    "(in [CoNLL-U Format](https://universaldependencies.org/format.html))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>saw</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>Bob</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>dobj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy46' style=\"overflow: scroll; width: 900px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy46',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy46'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tAlice\t_\t_\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tsaw\t_\t_\t_\t_\t0\troot\t_\t_\n",
    "3\tBob\t_\t_\t_\t_\t2\tdobj\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"900px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### https://ucph.padlet.org/dh/mw"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Need for Universal Syntax\n",
    "\n",
    "### https://cl.lingfil.uu.se/~nivre/docs/NivreCLIN2020.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Universal Syntax\n",
    "\n",
    "English and Danish are syntactically similar, while other languages are more distant:\n",
    "![similarities](https://www.mitpressjournals.org/na101/home/literatum/publisher/mit/journals/content/coli/2019/coli.2019.45.issue-2/coli_a_00351/20190614/images/large/00351f03c.jpeg)\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    Left: clustering based on syntactic dependencies; right: genetic tree\n",
    "    (from <a href=\"https://www.mitpressjournals.org/doi/full/10.1162/coli_a_00351\">Bjerva et al., 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Danish Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>Alice</td>\n",
       "      <td>NOUN</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>så</td>\n",
       "      <td>se</td>\n",
       "      <td>VERB</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>Bob</td>\n",
       "      <td>Bob</td>\n",
       "      <td>PROPN</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>obj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy47' style=\"overflow: scroll; width: 900px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy47',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 2, \"end\": 3, \"label\": \"obj\", \"dir\": \"right\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"s\\u00e5\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy47'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tAlice\tAlice\tNOUN\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tså\tse\tVERB\t_\t_\t0\troot\t_\t_\n",
    "3\tBob\tBob\tPROPN\t_\t_\t2\tobj\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"900px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Korean Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>앨리스는</td>\n",
       "      <td>앨리스+는</td>\n",
       "      <td>NOUN</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>3</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>밥을</td>\n",
       "      <td>밥+을</td>\n",
       "      <td>NOUN</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>3</td>\n",
       "      <td>obj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>보았다</td>\n",
       "      <td>보+았+다</td>\n",
       "      <td>VERB</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy48' style=\"overflow: scroll; width: 900px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy48',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 3, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 2, \"end\": 3, \"label\": \"obj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 3, \"label\": \"root\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"\\uc568\\ub9ac\\uc2a4\\ub294\"}, {\"text\": \"\\ubc25\\uc744\"}, {\"text\": \"\\ubcf4\\uc558\\ub2e4\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy48'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\t앨리스는\t앨리스+는\tNOUN\t_\t_\t3\tnsubj\t_\t_\n",
    "2\t밥을\t밥+을\tNOUN\t_\t_\t3\tobj\t_\t_\n",
    "3\t보았다\t보+았+다\tVERB\t_\t_\t0\troot\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"900px\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Longer English Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>Kraft</td>\n",
       "      <td>Kraft</td>\n",
       "      <td>NOUN</td>\n",
       "      <td>NN</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>,</td>\n",
       "      <td>,</td>\n",
       "      <td>PUNCT</td>\n",
       "      <td>,</td>\n",
       "      <td>_</td>\n",
       "      <td>1</td>\n",
       "      <td>punct</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>owner</td>\n",
       "      <td>owner</td>\n",
       "      <td>NOUN</td>\n",
       "      <td>NN</td>\n",
       "      <td>_</td>\n",
       "      <td>1</td>\n",
       "      <td>appos</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>of</td>\n",
       "      <td>of</td>\n",
       "      <td>ADP</td>\n",
       "      <td>IN</td>\n",
       "      <td>_</td>\n",
       "      <td>5</td>\n",
       "      <td>case</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>Milka</td>\n",
       "      <td>Milka</td>\n",
       "      <td>PROPN</td>\n",
       "      <td>NNP</td>\n",
       "      <td>_</td>\n",
       "      <td>3</td>\n",
       "      <td>nmod</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>,</td>\n",
       "      <td>,</td>\n",
       "      <td>PUNCT</td>\n",
       "      <td>,</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>punct</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>purchased</td>\n",
       "      <td>purchase</td>\n",
       "      <td>VERB</td>\n",
       "      <td>VBD</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>Cadbury</td>\n",
       "      <td>Cadbury</td>\n",
       "      <td>PROPN</td>\n",
       "      <td>NNP</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>dobj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>Dairy</td>\n",
       "      <td>Dairy</td>\n",
       "      <td>PROPN</td>\n",
       "      <td>NNP</td>\n",
       "      <td>_</td>\n",
       "      <td>8</td>\n",
       "      <td>flat</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>Milk</td>\n",
       "      <td>milk</td>\n",
       "      <td>PROPN</td>\n",
       "      <td>NNP</td>\n",
       "      <td>_</td>\n",
       "      <td>8</td>\n",
       "      <td>flat</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy49' style=\"overflow: scroll; width: 1200px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy49',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 3, \"label\": \"appos\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 9, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 1, \"end\": 7, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 3, \"end\": 5, \"label\": \"nmod\", \"dir\": \"right\"}, {\"start\": 8, \"end\": 10, \"label\": \"flat\", \"dir\": \"right\"}, {\"start\": 6, \"end\": 7, \"label\": \"punct\", \"dir\": \"left\"}, {\"start\": 1, \"end\": 2, \"label\": \"punct\", \"dir\": \"right\"}, {\"start\": 4, \"end\": 5, \"label\": \"case\", \"dir\": \"left\"}, {\"start\": 7, \"end\": 8, \"label\": \"dobj\", \"dir\": \"right\"}, {\"start\": 0, \"end\": 7, \"label\": \"root\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Kraft\"}, {\"text\": \",\"}, {\"text\": \"owner\"}, {\"text\": \"of\"}, {\"text\": \"Milka\"}, {\"text\": \",\"}, {\"text\": \"purchased\"}, {\"text\": \"Cadbury\"}, {\"text\": \"Dairy\"}, {\"text\": \"Milk\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy49'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tKraft\tKraft\tNOUN\tNN\t_\t7\tnsubj\t_\t_\n",
    "2\t,\t,\tPUNCT\t,\t_\t1\tpunct\t_\t_\n",
    "3\towner\towner\tNOUN\tNN\t_\t1\tappos\t_\t_\n",
    "4\tof\tof\tADP\tIN\t_\t5\tcase\t_\t_\n",
    "5\tMilka\tMilka\tPROPN\tNNP\t_\t3\tnmod\t_\t_\n",
    "6\t,\t,\tPUNCT\t,\t_\t7\tpunct\t_\t_\n",
    "7\tpurchased\tpurchase\tVERB\tVBD\t_\t0\troot\t_\t_\n",
    "8\tCadbury\tCadbury\tPROPN\tNNP\t_\t7\tdobj\t_\t_\n",
    "9\tDairy\tDairy\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "10\tMilk\tmilk\tPROPN\tNNP\t_\t8\tflat\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"1200px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Universal Dependencies \n",
    "\n",
    "* Annotation framework featuring [37 syntactic relations](https://universaldependencies.org/u/dep/all.html)\n",
    "* [Treebanks](http://universaldependencies.org/) in over 90 languages\n",
    "* Large project with over 200 contributors\n",
    "* Linguistically universal [annotation guidelines](https://universaldependencies.org/guidelines.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### UD Dependency Relations\n",
    "\n",
    "<table border=\"1\">\n",
    "  <tr style=\"background-color:cornflowerblue; font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"> </td>\n",
    "      <td style=\"text-align: left;\"> Nominals </td>\n",
    "      <td style=\"text-align: left;\"> Clauses </td>\n",
    "      <td style=\"text-align: left;\"> Modifier words </td>\n",
    "      <td style=\"text-align: left;\"> Function Words </td>\n",
    "  </tr>\n",
    "  <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"background-color:darkseagreen\">\n",
    "\tCore arguments\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/nsubj.html\" title=\"u-dep nsubj\">nsubj</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/obj.html\" title=\"u-dep obj\">obj</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/iobj.html\" title=\"u-dep iobj\">iobj</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/csubj.html\" title=\"u-dep csubj\">csubj</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/ccomp.html\" title=\"u-dep ccomp\">ccomp</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/xcomp.html\" title=\"u-dep xcomp\">xcomp</a>\n",
    "      </td>\n",
    "\t  <td style=\"text-align: left;\"></td><td style=\"text-align: left;\"></td>\n",
    "  </tr>\n",
    "  <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"background-color:darkseagreen;\">\n",
    "\tNon-core dependents\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/obl.html\" title=\"u-dep obl\">obl</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/vocative.html\" title=\"u-dep vocative\">vocative</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/expl.html\" title=\"u-dep expl\">expl</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/dislocated.html\" title=\"u-dep dislocated\">dislocated</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/advcl.html\" title=\"u-dep advcl\">advcl</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/advmod.html\" title=\"u-dep advmod\">advmod</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/discourse.html\" title=\"u-dep discourse\">discourse</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/aux_.html\" title=\"u-dep aux\">aux</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/cop.html\" title=\"u-dep cop\">cop</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/mark.html\" title=\"u-dep mark\">mark</a>\n",
    "      </td>\n",
    "  </tr>\n",
    "  <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"background-color:darkseagreen\">\n",
    "\tNominal dependents\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/nmod.html\" title=\"u-dep nmod\">nmod</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/appos.html\" title=\"u-dep appos\">appos</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/nummod.html\" title=\"u-dep nummod\">nummod</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/acl.html\" title=\"u-dep acl\">acl</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/amod.html\" title=\"u-dep amod\">amod</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/det.html\" title=\"u-dep det\">det</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/clf.html\" title=\"u-dep clf\">clf</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/case.html\" title=\"u-dep case\">case</a>\n",
    "      </td>\n",
    "  </tr style=\"font-size: x-large; text-align: left;\">\n",
    "  <tr style=\"background-color:cornflowerblue; font-size: x-large; text-align: left;\">\t\n",
    "      <td style=\"text-align: left;\"> Coordination </td>\n",
    "      <td style=\"text-align: left;\"> MWE </td>\n",
    "      <td style=\"text-align: left;\"> Loose </td>\n",
    "      <td style=\"text-align: left;\"> Special </td>\n",
    "      <td style=\"text-align: left;\"> Other </td>\n",
    "  </tr>\n",
    "  <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\">\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/conj.html\" title=\"u-dep conj\">conj</a><br>\n",
    "\t    <a href=\"https://universaldependencies.org/u/dep/cc.html\" title=\"u-dep cc\">cc</a>\n",
    "      </td>\n",
    "      <td style=\"text-align: left;\">\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/fixed.html\" title=\"u-dep fixed\">fixed</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/flat.html\" title=\"u-dep flat\">flat</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/compound.html\" title=\"u-dep compound\">compound</a>\n",
    "    </td>\n",
    "    <td style=\"text-align: left;\">\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/list.html\" title=\"u-dep list\">list</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/parataxis.html\" title=\"u-dep parataxis\">parataxis</a>\n",
    "    </td>\n",
    "    <td style=\"text-align: left;\">\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/orphan.html\" title=\"u-dep orphan\">orphan</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/goeswith.html\" title=\"u-dep goeswith\">goeswith</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/reparandum.html\" title=\"u-dep reparandum\">reparandum</a>\n",
    "    </td>\n",
    "    <td style=\"text-align: left;\">\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/punct.html\" title=\"u-dep punct\">punct</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/root.html\" title=\"u-dep root\">root</a><br>\n",
    "\t  <a href=\"https://universaldependencies.org/u/dep/dep.html\" title=\"u-dep dep\">dep</a>\n",
    "    </td>\n",
    "  </tr>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Universal POS Tags (UPOS)\n",
    "\n",
    "UPOS tags are shared across all languages, as opposed to language-specific POS tags (XPOS).\n",
    "\n",
    "<table class=\"typeindex\">\n",
    "  <thead>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <th>Open class words</th>\n",
    "      <th>Closed class words</th>\n",
    "      <th>Other</th>\n",
    "    </tr>\n",
    "  </thead>\n",
    "  <tbody>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/ADJ.html\" class=\"doclink doclabel\" title=\"u-pos ADJ\">ADJ</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/ADP.html\" class=\"doclink doclabel\" title=\"u-pos ADP\">ADP</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/PUNCT.html\" class=\"doclink doclabel\" title=\"u-pos PUNCT\">PUNCT</a></td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/ADV.html\" class=\"doclink doclabel\" title=\"u-pos ADV\">ADV</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/AUX_.html\" class=\"doclink doclabel\" title=\"u-pos AUX\">AUX</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/SYM.html\" class=\"doclink doclabel\" title=\"u-pos SYM\">SYM</a></td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/INTJ.html\" class=\"doclink doclabel\" title=\"u-pos INTJ\">INTJ</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/CCONJ.html\" class=\"doclink doclabel\" title=\"u-pos CCONJ\">CCONJ</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/X.html\" class=\"doclink doclabel\" title=\"u-pos X\">X</a></td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/NOUN.html\" class=\"doclink doclabel\" title=\"u-pos NOUN\">NOUN</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/DET.html\" class=\"doclink doclabel\" title=\"u-pos DET\">DET</a></td>\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/PROPN.html\" class=\"doclink doclabel\" title=\"u-pos PROPN\">PROPN</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/NUM.html\" class=\"doclink doclabel\" title=\"u-pos NUM\">NUM</a></td>\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/VERB.html\" class=\"doclink doclabel\" title=\"u-pos VERB\">VERB</a></td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/PART.html\" class=\"doclink doclabel\" title=\"u-pos PART\">PART</a></td>\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/PRON.html\" class=\"doclink doclabel\" title=\"u-pos PRON\">PRON</a></td>\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "    </tr>\n",
    "    <tr style=\"font-size: x-large; text-align: left;\">\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "      <td style=\"text-align: left;\"><a href=\"https://universaldependencies.org/u/pos/SCONJ.html\" class=\"doclink doclabel\" title=\"u-pos SCONJ\">SCONJ</a></td>\n",
    "      <td style=\"text-align: left;\">&nbsp;</td>\n",
    "    </tr>\n",
    "  </tbody>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dependency Parsing\n",
    "\n",
    "* Predict **head** and **relation** for each word.\n",
    "* Structured prediction, just like POS tagging.\n",
    "* Or is it?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>Alice</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>saw</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>Bob</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>dobj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tAlice\t_\t_\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tsaw\t_\t_\t_\t_\t0\troot\t_\t_\n",
    "3\tBob\t_\t_\t_\t_\t2\tdobj\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))"
   ]
  },
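  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The HEAD and DEPREL columns above can be read as a set of labelled arcs. A minimal sketch, assuming a hypothetical helper `to_arcs` and plain Python lists (none of this is part of `statnlpbook`), with index 0 reserved for ROOT:\n",
    "\n",
    "```python\n",
    "# Index 0 is the artificial ROOT node; words are numbered from 1 as in the ID column.\n",
    "words = ['ROOT', 'Alice', 'saw', 'Bob']\n",
    "heads = [None, 2, 0, 2]  # HEAD column\n",
    "relations = [None, 'nsubj', 'root', 'dobj']  # DEPREL column\n",
    "\n",
    "\n",
    "def to_arcs(heads, relations):\n",
    "    # One (head, dependent, label) triple per word; ROOT itself has no head.\n",
    "    return {(heads[d], d, relations[d]) for d in range(1, len(heads))}\n",
    "\n",
    "\n",
    "arcs = to_arcs(heads, relations)\n",
    "for head, dependent, label in sorted(arcs):\n",
    "    print(words[head], '->', words[dependent], label)\n",
    "```\n",
    "\n",
    "Predicting a parse then amounts to filling in `heads` and `relations` for every word."
   ]
  },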
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Dependency Parsing Approaches\n",
    "\n",
    "* Graph-based: score all possible parts (e.g. word pairs), find best combination (e.g. maximum spanning tree)\n",
    "* Transition-based: incrementally build the tree, one arc at a time, by applying a sequence of actions"
   ]
  },
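  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of the graph-based idea, assuming made-up arc scores and hypothetical helpers `is_tree` and `best_tree`. It enumerates every head assignment by brute force, which is only feasible for tiny sentences; practical parsers would use a maximum spanning tree algorithm instead:\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "\n",
    "def is_tree(heads):\n",
    "    # heads[0] is unused (ROOT); every word must reach ROOT without a cycle.\n",
    "    for word in range(1, len(heads)):\n",
    "        seen, node = set(), word\n",
    "        while node != 0:\n",
    "            if node in seen:\n",
    "                return False\n",
    "            seen.add(node)\n",
    "            node = heads[node]\n",
    "    return True\n",
    "\n",
    "\n",
    "def best_tree(scores):\n",
    "    # scores[h][d] is the score of an arc from head h to dependent d.\n",
    "    n = len(scores)\n",
    "    best_heads, best_score = None, float('-inf')\n",
    "    for assignment in product(range(n), repeat=n - 1):\n",
    "        heads = [None] + list(assignment)\n",
    "        if not is_tree(heads):\n",
    "            continue\n",
    "        total = sum(scores[heads[d]][d] for d in range(1, n))\n",
    "        if total > best_score:\n",
    "            best_heads, best_score = heads, total\n",
    "    return best_heads, best_score\n",
    "\n",
    "\n",
    "# Illustrative scores for ROOT, Alice, saw, Bob (rows are heads, columns are dependents).\n",
    "scores = [\n",
    "    [0, 1, 9, 1],\n",
    "    [0, 0, 2, 1],\n",
    "    [0, 8, 0, 7],\n",
    "    [0, 1, 3, 0],\n",
    "]\n",
    "heads, total = best_tree(scores)\n",
    "print(heads, total)\n",
    "```"
   ]
  },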
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Transition-Based Parsing\n",
    "\n",
    "* Learn to perform the right action (transition) at each step of a bottom-up, left-to-right parser\n",
    "* Train classifiers $p(y|\\x)$ where $y$ is an action, and $\\x$ is the **parser state**\n",
    "* Many possible transition systems; shown here: **arc-standard** ([Nivre, 2004](https://www.aclweb.org/anthology/W04-0308))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Configuration (Parser State)\n",
    "\n",
    "A configuration consists of a buffer, a stack, and the set of arcs created so far."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Buffer\n",
    "of tokens waiting for processing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'>Alice saw Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy6' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy6',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy6'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'></td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7aaa2898>"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[0:1], tokenized_sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Stack\n",
    "of tokens currently being processed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT Alice saw</td><td style='font-size: x-large;'>Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy7' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy7',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy7'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a820898>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[2:3],tokenized_sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Parse (set of arcs)\n",
    "tree built so far"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy50' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy50',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy50'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>rightArc-root</td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a7f3f60>"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[6:7], tokenized_sentence)"
   ]
  },
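  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of a configuration as a data structure, assuming a hypothetical `Configuration` class (not the one used by `statnlpbook`) and arcs stored as (head, dependent, label) triples:\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass, field\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Configuration:\n",
    "    stack: list  # tokens currently being processed, top of the stack last\n",
    "    buffer: list  # tokens waiting for processing, next token first\n",
    "    arcs: set = field(default_factory=set)  # (head, dependent, label) triples\n",
    "\n",
    "\n",
    "def initial_configuration(tokens):\n",
    "    # Parsing starts with only ROOT on the stack and all tokens in the buffer.\n",
    "    return Configuration(stack=['ROOT'], buffer=list(tokens))\n",
    "\n",
    "\n",
    "def is_terminal(config):\n",
    "    # Parsing ends when the buffer is empty and only ROOT is left on the stack.\n",
    "    return config.stack == ['ROOT'] and not config.buffer\n",
    "\n",
    "\n",
    "config = initial_configuration(['Alice', 'saw', 'Bob'])\n",
    "print(config.stack, config.buffer, config.arcs, is_terminal(config))\n",
    "```"
   ]
  },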
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Actions\n",
    "\n",
    "The parser moves between configurations using the following actions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Shift\n",
    "\n",
    "Move the word at the front of the buffer onto the top of the stack.\n",
    "\n",
    "$$\n",
    "(S, i|B, A)\\rightarrow(S|i, B, A)\n",
    "$$"
   ]
  },
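  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A minimal sketch of this transition, assuming a hypothetical `shift` function that takes and returns the stack, buffer, and arcs as plain Python values:\n",
    "\n",
    "```python\n",
    "def shift(stack, buffer, arcs):\n",
    "    # (S, i|B, A) -> (S|i, B, A): move the first buffer token onto the stack.\n",
    "    if not buffer:\n",
    "        raise ValueError('shift requires a non-empty buffer')\n",
    "    return stack + [buffer[0]], buffer[1:], arcs\n",
    "\n",
    "\n",
    "stack, buffer, arcs = shift(['ROOT'], ['Alice', 'saw', 'Bob'], set())\n",
    "print(stack, buffer, arcs)\n",
    "```\n",
    "\n",
    "The arc set is passed through unchanged, since shifting never creates an arc."
   ]
  },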
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'>Alice saw Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy9' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy9',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy9'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'></td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT Alice</td><td style='font-size: x-large;'>saw Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy10' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy10',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy10'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a820cf8>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[0:2], tokenized_sentence)"
   ]
  },
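  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "As a rough illustration, the shift transition can be sketched in plain Python. The tuple representation of a configuration and the function name `shift` are assumptions made for this sketch; they are not taken from the `statnlpbook` code that renders the table above.\n",
    "\n",
    "```python\n",
    "# Assumed representation: a configuration is a (stack, buffer, arcs) tuple,\n",
    "# with words stored as plain strings and the buffer read from the left.\n",
    "def shift(config):\n",
    "    stack, buffer, arcs = config\n",
    "    if not buffer:\n",
    "        raise ValueError('shift needs a non-empty buffer')\n",
    "    # (S, i|B, A) -> (S|i, B, A): the first buffer word moves onto the stack.\n",
    "    return stack + [buffer[0]], buffer[1:], arcs\n",
    "\n",
    "\n",
    "initial = (['ROOT'], ['Alice', 'saw', 'Bob'], [])\n",
    "print(shift(initial))  # (['ROOT', 'Alice'], ['saw', 'Bob'], [])\n",
    "```\n",
    "\n",
    "Returning a new tuple instead of mutating the input keeps every intermediate configuration available, which is convenient when a whole transition sequence is displayed step by step."
   ]
  },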
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### rightArc-[label]\n",
    "\n",
    "Add a labeled arc, with label \\\\(l\\\\), from the second-topmost node of the stack, \\\\(i\\\\), to the top of the stack, \\\\(j\\\\). Then pop \\\\(j\\\\) from the stack.\n",
    "\n",
    "$$\n",
    "(S|i|j, B, A) \\rightarrow (S|i, B, A\\cup\\{(i,j,l)\\})\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw Bob</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy51' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy51',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy51'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy52' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy52',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy52'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>rightArc-dobj</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy53' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy53',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy53'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>rightArc-root</td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a7f30f0>"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[4:7], tokenized_sentence)"
   ]
  },
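  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The rightArc transition can be sketched in the same spirit. The function name `right_arc`, the (stack, buffer, arcs) tuple, and the use of word strings as nodes are assumptions of this sketch rather than the notebook's own implementation.\n",
    "\n",
    "```python\n",
    "# Assumed representation: a configuration is a (stack, buffer, arcs) tuple\n",
    "# and every arc is a (head, dependent, label) triple.\n",
    "def right_arc(config, label):\n",
    "    stack, buffer, arcs = config\n",
    "    if len(stack) < 2:\n",
    "        raise ValueError('rightArc needs at least two nodes on the stack')\n",
    "    i, j = stack[-2], stack[-1]\n",
    "    # (S|i|j, B, A) -> (S|i, B, A + {(i, j, l)}): i becomes the head of j,\n",
    "    # and j is popped from the stack.\n",
    "    return stack[:-1], buffer, arcs + [(i, j, label)]\n",
    "\n",
    "\n",
    "config = (['ROOT', 'saw', 'Bob'], [], [('saw', 'Alice', 'nsubj')])\n",
    "print(right_arc(config, 'dobj'))\n",
    "```\n",
    "\n",
    "Applied to the configuration with stack `ROOT saw Bob`, this adds the arc from `saw` to `Bob` and leaves `ROOT saw` on the stack, matching the second row of the table above."
   ]
  },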
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### leftArc-[label] \n",
    "\n",
    "Add a labeled arc, with label \\\\(l\\\\), from the top of the stack, \\\\(j\\\\), to the second-topmost node of the stack, \\\\(i\\\\). Then remove \\\\(i\\\\) from the stack.\n",
    "\n",
    "$$\n",
    "(S|i|j, B, A) \\rightarrow (S|j, B, A\\cup\\{(j,i,l)\\})\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT Alice saw</td><td style='font-size: x-large;'>Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy54' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy54',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy54'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw</td><td style='font-size: x-large;'>Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy55' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy55',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy55'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>leftArc-nsubj</td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a7f35c0>"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[2:4], tokenized_sentence)"
   ]
  },
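  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A matching sketch for leftArc is given here. As before, the function name `left_arc` and the tuple-based configuration are assumptions made for illustration only.\n",
    "\n",
    "```python\n",
    "# Assumed representation: a configuration is a (stack, buffer, arcs) tuple\n",
    "# and every arc is a (head, dependent, label) triple.\n",
    "def left_arc(config, label):\n",
    "    stack, buffer, arcs = config\n",
    "    if len(stack) < 2:\n",
    "        raise ValueError('leftArc needs at least two nodes on the stack')\n",
    "    i, j = stack[-2], stack[-1]\n",
    "    # (S|i|j, B, A) -> (S|j, B, A + {(j, i, l)}): j becomes the head of i,\n",
    "    # and i is removed from below the top of the stack.\n",
    "    return stack[:-2] + [j], buffer, arcs + [(j, i, label)]\n",
    "\n",
    "\n",
    "config = (['ROOT', 'Alice', 'saw'], ['Bob'], [])\n",
    "print(left_arc(config, 'nsubj'))\n",
    "```\n",
    "\n",
    "The only differences from rightArc are the direction of the new arc and which of the two topmost nodes leaves the stack."
   ]
  },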
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Full Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table><tr><td style='font-size: x-large;'>stack</td><td style='font-size: x-large;'>buffer</td><td style='font-size: x-large;'>parse</td><td style='font-size: x-large;'>action</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'>Alice saw Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy56' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy56',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy56'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'></td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT Alice</td><td style='font-size: x-large;'>saw Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy57' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy57',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy57'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT Alice saw</td><td style='font-size: x-large;'>Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy58' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy58',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy58'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw</td><td style='font-size: x-large;'>Bob</td><td style='font-size: x-large;'>\n",
       "    <div id='displacy59' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy59',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy59'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>leftArc-nsubj</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw Bob</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy60' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy60',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy60'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>shift</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT saw</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy61' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy61',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy61'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>rightArc-dobj</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy62' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy62',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy62'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'>rightArc-root</td></tr>\n",
       "<tr><td style='font-size: x-large;'>ROOT</td><td style='font-size: x-large;'></td><td style='font-size: x-large;'>\n",
       "    <div id='displacy63' style=\"overflow: scroll; width: 500px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy63',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 3, \"label\": \"dobj\", \"dir\": \"right\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"Alice\"}, {\"text\": \"saw\"}, {\"text\": \"Bob\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy63'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script></td><td style='font-size: x-large;'></td></tr></table>"
      ],
      "text/plain": [
       "<statnlpbook.transition.render_transitions_displacy.<locals>.Output at 0x7fbd7a7f33c8>"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "render_transitions_displacy(transitions[:], tokenized_sentence)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"parsing_figures/tb_example.png\" width=100%/>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Summary: Configuration\n",
    "\n",
    "**Configuration**:\n",
    "- Stack \\\\(S\\\\): a last-in, first-out memory to keep track of words to process later\n",
    "- Buffer \\\\(B\\\\): words not processed so far\n",
    "- Arcs \\\\(A\\\\): the dependency edges predicted so far\n",
    "\n",
    "We further define two special configurations:\n",
    "- initial: buffer is initialised to the words in the sentence, stack contains root, and arcs are empty\n",
    "- terminal: buffer is empty, stack contains only root"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Summary: Actions\n",
    "\n",
    "- shift: Push the word at the front of the buffer onto the stack: \\\\((S, i|B, A)\\rightarrow(S|i, B, A)\\\\)\n",
    "- rightArc-label: Add a labeled arc from the second-topmost node of the stack, \\\\(i\\\\), to the top of the stack, \\\\(j\\\\), then pop \\\\(j\\\\): \\\\((S|i|j, B, A) \\rightarrow (S|i, B, A\\cup\\{(i,j,l)\\})\\\\)\n",
    "- leftArc-label: Add a labeled arc from the top of the stack, \\\\(j\\\\), to the second-topmost node of the stack, \\\\(i\\\\), then remove \\\\(i\\\\): \\\\((S|i|j, B, A) \\rightarrow (S|j, B, A\\cup\\{(j,i,l)\\})\\\\)"
   ]
  },
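  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Putting the configuration and the three actions together, the sketch below replays a given action sequence from the initial configuration and checks that it ends in a terminal one. The helper names `apply_action` and `replay` are hypothetical, and nodes are plain word strings, which only works when no word occurs twice in the sentence. No classifier is involved: the actions are supplied by hand.\n",
    "\n",
    "```python\n",
    "def apply_action(config, action):\n",
    "    stack, buffer, arcs = config\n",
    "    # 'rightArc-dobj' -> name 'rightArc', label 'dobj'; 'shift' has no label.\n",
    "    name, _, label = action.partition('-')\n",
    "    if name == 'shift':\n",
    "        if not buffer:\n",
    "            raise ValueError('shift needs a non-empty buffer')\n",
    "        return stack + [buffer[0]], buffer[1:], arcs\n",
    "    if name not in ('rightArc', 'leftArc'):\n",
    "        raise ValueError('unknown action: ' + action)\n",
    "    if len(stack) < 2:\n",
    "        raise ValueError(action + ' needs at least two nodes on the stack')\n",
    "    i, j = stack[-2], stack[-1]\n",
    "    if name == 'rightArc':\n",
    "        # Head i, dependent j; j is popped.\n",
    "        return stack[:-1], buffer, arcs | {(i, j, label)}\n",
    "    # leftArc: head j, dependent i; i is removed.\n",
    "    return stack[:-2] + [j], buffer, arcs | {(j, i, label)}\n",
    "\n",
    "\n",
    "def replay(words, actions):\n",
    "    # Initial configuration: ROOT on the stack, all words in the buffer, no arcs.\n",
    "    config = (['ROOT'], list(words), set())\n",
    "    for action in actions:\n",
    "        config = apply_action(config, action)\n",
    "    stack, buffer, arcs = config\n",
    "    # Terminal configuration: empty buffer and only ROOT left on the stack.\n",
    "    if buffer or stack != ['ROOT']:\n",
    "        raise ValueError('the actions did not reach a terminal configuration')\n",
    "    return arcs\n",
    "\n",
    "\n",
    "actions = ['shift', 'shift', 'leftArc-nsubj', 'shift', 'rightArc-dobj', 'rightArc-root']\n",
    "for arc in sorted(replay(['Alice', 'saw', 'Bob'], actions)):\n",
    "    print(arc)\n",
    "```\n",
    "\n",
    "For the action sequence of the full example, this prints the three (head, dependent, label) arcs of the final parse: `root`, `nsubj`, and `dobj`."
   ]
  },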
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Syntactic Ambiguity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>I</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>saw</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>the</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>4</td>\n",
       "      <td>det</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>star</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>dobj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>with</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>case</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>the</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>det</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>telescope</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>obl</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy42' style=\"overflow: scroll; width: 900px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy42',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 6, \"end\": 7, \"label\": \"det\", \"dir\": \"left\"}, {\"start\": 2, \"end\": 7, \"label\": \"obl\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 4, \"label\": \"dobj\", \"dir\": \"right\"}, {\"start\": 3, \"end\": 4, \"label\": \"det\", \"dir\": \"left\"}, {\"start\": 5, \"end\": 7, \"label\": \"case\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"I\"}, {\"text\": \"saw\"}, {\"text\": \"the\"}, {\"text\": \"star\"}, {\"text\": \"with\"}, {\"text\": \"the\"}, {\"text\": \"telescope\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy42'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tI\t_\t_\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tsaw\t_\t_\t_\t_\t0\troot\t_\t_\n",
    "3\tthe\t_\t_\t_\t_\t4\tdet\t_\t_\n",
    "4\tstar\t_\t_\t_\t_\t2\tdobj\t_\t_\n",
    "5\twith\t_\t_\t_\t_\t7\tcase\t_\t_\n",
    "6\tthe\t_\t_\t_\t_\t7\tdet\t_\t_\n",
    "7\ttelescope\t_\t_\t_\t_\t2\tobl\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"900px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"parsing_figures/telescope1.jpeg\" width=30%/>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Syntactic Ambiguity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th># ID</th>\n",
       "      <th>FORM</th>\n",
       "      <th>LEMMA</th>\n",
       "      <th>UPOS</th>\n",
       "      <th>XPOS</th>\n",
       "      <th>FEATS</th>\n",
       "      <th>HEAD</th>\n",
       "      <th>DEPREL</th>\n",
       "      <th>DEPS</th>\n",
       "      <th>MISC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>I</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>nsubj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>saw</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>0</td>\n",
       "      <td>root</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>the</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>4</td>\n",
       "      <td>det</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>star</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>2</td>\n",
       "      <td>dobj</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>with</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>case</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>the</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>7</td>\n",
       "      <td>det</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>telescope</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "      <td>4</td>\n",
       "      <td>nmod</td>\n",
       "      <td>_</td>\n",
       "      <td>_</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <div id='displacy43' style=\"overflow: scroll; width: 900px;\"></div>\n",
       "    <script>\n",
       "    $(function() {\n",
       "    requirejs.config({\n",
       "        paths: {\n",
       "            'displaCy': ['/files/node_modules/displacy/displacy'],\n",
       "                                                  // strip .js ^, require adds it back\n",
       "        },\n",
       "    });\n",
       "    require(['displaCy'], function() {\n",
       "        console.log(\"Loaded :)\");\n",
       "        const displacy = new displaCy('http://localhost:8000', {\n",
       "            container: '#displacy43',\n",
       "            format: 'spacy',\n",
       "            distance: 150,\n",
       "            offsetX: 0,\n",
       "            wordSpacing: 20,\n",
       "            arrowSpacing: 3,\n",
       "\n",
       "        });\n",
       "        const parse = {\n",
       "            arcs: [{\"start\": 1, \"end\": 2, \"label\": \"nsubj\", \"dir\": \"left\"}, {\"start\": 0, \"end\": 2, \"label\": \"root\", \"dir\": \"right\"}, {\"start\": 6, \"end\": 7, \"label\": \"det\", \"dir\": \"left\"}, {\"start\": 4, \"end\": 7, \"label\": \"nmod\", \"dir\": \"right\"}, {\"start\": 2, \"end\": 4, \"label\": \"dobj\", \"dir\": \"right\"}, {\"start\": 3, \"end\": 4, \"label\": \"det\", \"dir\": \"left\"}, {\"start\": 5, \"end\": 7, \"label\": \"case\", \"dir\": \"left\"}],\n",
       "            words: [{\"text\": \"ROOT\"}, {\"text\": \"I\"}, {\"text\": \"saw\"}, {\"text\": \"the\"}, {\"text\": \"star\"}, {\"text\": \"with\"}, {\"text\": \"the\"}, {\"text\": \"telescope\"}]\n",
       "        };\n",
       "\n",
       "        displacy.render(parse, {\n",
       "            uniqueId: 'render_displacy43'\n",
       "            //color: '#ff0000'\n",
       "        });\n",
       "        return {};\n",
       "    });\n",
       "    });\n",
       "    </script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conllu = \"\"\"\n",
    "# ID\tFORM\tLEMMA\tUPOS\tXPOS\tFEATS\tHEAD\tDEPREL\tDEPS\tMISC\n",
    "1\tI\t_\t_\t_\t_\t2\tnsubj\t_\t_\n",
    "2\tsaw\t_\t_\t_\t_\t0\troot\t_\t_\n",
    "3\tthe\t_\t_\t_\t_\t4\tdet\t_\t_\n",
    "4\tstar\t_\t_\t_\t_\t2\tdobj\t_\t_\n",
    "5\twith\t_\t_\t_\t_\t7\tcase\t_\t_\n",
    "6\tthe\t_\t_\t_\t_\t7\tdet\t_\t_\n",
    "7\ttelescope\t_\t_\t_\t_\t4\tnmod\t_\t_\n",
    "\"\"\"\n",
    "display(HTML(pd.read_csv(StringIO(conllu), sep=\"\\t\").to_html(index=False)))\n",
    "arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))\n",
    "render_displacy(arcs, tokens,\"900px\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"parsing_figures/telescope2.jpg\" width=30%/>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Learning a Transition-Based Parser\n",
    "\n",
    "* Decompose parse tree into a sequence of **actions**\n",
    "* Learn to score individual actions\n",
    "    * Structured prediction problem!\n",
    "    * Sequence labeling? Sequence-to-sequence?\n",
    "\n",
    "<center>\n",
    "    <img src=\"parsing_figures/tb1.png\" width=60%/>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "How do we decide which action to take?\n",
    "\n",
    "* Learn a discriminative classifier $p(y | \\x)$ where \n",
    "   * $\\x$ is a representation of buffer, stack and parse\n",
    "   * $y$ is the action to choose\n",
    "* Current state-of-the-art systems use neural networks as classifiers (Bi-LSTMs, Transformers, BERT)\n",
    "* Use **greedy search** or **beam search** to find the highest scoring sequence of steps"
   ]
  },
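  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Greedy search can be sketched as a loop that applies the highest-scoring valid action at each step. In this hedged example `toy_score` is a hypothetical stand-in for a learned classifier, arcs are unlabeled `(head, dependent)` pairs, and all function names are assumptions made for illustration. Beam search, which keeps several partial action sequences instead of one, is not shown.\n",
    "\n",
    "```python\n",
    "# Greedy decoding sketch: always take the highest-scoring valid action.\n",
    "# toy_score is a hypothetical placeholder for a trained classifier.\n",
    "\n",
    "def valid_actions(stack, buffer):\n",
    "    actions = []\n",
    "    if buffer:\n",
    "        actions.append('shift')\n",
    "    if len(stack) >= 2:\n",
    "        actions.append('rightArc')\n",
    "        if stack[-2] != 0:  # ROOT (token 0) can never be a dependent\n",
    "            actions.append('leftArc')\n",
    "    return actions\n",
    "\n",
    "def apply_action(action, stack, buffer, arcs):\n",
    "    if action == 'shift':\n",
    "        return stack + [buffer[0]], buffer[1:], arcs\n",
    "    i, j = stack[-2], stack[-1]\n",
    "    if action == 'leftArc':\n",
    "        return stack[:-2] + [j], buffer, arcs + [(j, i)]\n",
    "    return stack[:-1], buffer, arcs + [(i, j)]  # rightArc\n",
    "\n",
    "def greedy_parse(n_tokens, score):\n",
    "    stack, buffer, arcs = [0], list(range(1, n_tokens + 1)), []\n",
    "    while buffer or len(stack) > 1:\n",
    "        best = max(valid_actions(stack, buffer),\n",
    "                   key=lambda a: score(stack, buffer, a))\n",
    "        stack, buffer, arcs = apply_action(best, stack, buffer, arcs)\n",
    "    return arcs\n",
    "\n",
    "def toy_score(stack, buffer, action):\n",
    "    # Fixed, made-up preferences; a real model would score the configuration.\n",
    "    return {'shift': 2.0, 'rightArc': 1.0, 'leftArc': 0.5}[action]\n",
    "\n",
    "arcs = greedy_parse(3, toy_score)\n",
    "```\n",
    "\n",
    "With these fixed scores the parser shifts every token first and then attaches them right to left, which yields a right-branching chain. A trained model would instead condition its scores on the stack, buffer and partial parse."
   ]
  },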
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"parsing_figures/tb2.png\" width=30%/>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/8292a74aba4eab2ca864b457c17b02634fef4ddd/5-Figure7-1.png\" width=30%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/K18-2010.pdf\">Hershcovich et al., 2018</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Oracle\n",
    "\n",
    "How do we get training data for the classifier?\n",
    "\n",
    "* Training data: whole trees labeled as correct\n",
    "* We need to design an **oracle**\n",
    "    * a function that, given a sentence and its dependency tree, recovers the sequence of actions used to construct it\n",
    "    * can also be thought of as reverse-engineering a tree into a sequence of actions\n",
    "* An oracle does this for every possible parse tree\n",
    "* The oracle can also be thought of as a human demonstrator teaching the parser"
   ]
  },
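  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "One possible oracle for arc-standard transitions is sketched below. It is an illustration under stated assumptions, not a definitive implementation: the function name `oracle` and the `heads` list are hypothetical, labels are omitted for brevity, and the gold tree is assumed to be projective.\n",
    "\n",
    "```python\n",
    "# Static oracle sketch for arc-standard parsing (illustrative only).\n",
    "# heads[d] is the gold head of token d; token 0 is ROOT.\n",
    "# Assumes a projective gold tree.\n",
    "\n",
    "def oracle(heads):\n",
    "    n = len(heads) - 1\n",
    "    stack, buffer, actions = [0], list(range(1, n + 1)), []\n",
    "    # Number of dependents each token still has to collect.\n",
    "    pending = [0] * (n + 1)\n",
    "    for d in range(1, n + 1):\n",
    "        pending[heads[d]] += 1\n",
    "    while buffer or len(stack) > 1:\n",
    "        if len(stack) >= 2:\n",
    "            i, j = stack[-2], stack[-1]\n",
    "            if i != 0 and heads[i] == j:\n",
    "                # Gold arc j -> i: attach i and remove it from the stack.\n",
    "                actions.append('leftArc')\n",
    "                stack.pop(-2)\n",
    "                pending[j] -= 1\n",
    "                continue\n",
    "            if heads[j] == i and pending[j] == 0:\n",
    "                # Gold arc i -> j, and j has already collected its dependents.\n",
    "                actions.append('rightArc')\n",
    "                stack.pop()\n",
    "                pending[i] -= 1\n",
    "                continue\n",
    "        actions.append('shift')\n",
    "        stack.append(buffer.pop(0))\n",
    "    return actions\n",
    "\n",
    "# Gold heads for 'I saw the star' (index 0 is a placeholder for ROOT).\n",
    "heads = [None, 2, 0, 4, 2]\n",
    "actions = oracle(heads)\n",
    "```\n",
    "\n",
    "Each recovered action, paired with the configuration it was taken in, becomes one training example for the classifier. The `pending` check delays a rightArc until the dependent has collected all of its own dependents, since a popped node can no longer receive arcs."
   ]
  },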
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dependency Parsing Evaluation\n",
    "\n",
    "* Unlabeled Attachment Score (**UAS**): % of words with correct head\n",
    "* Labeled Attachment Score (**LAS**): % of words with correct head and label\n",
    "\n",
    "Every word with a correct head and label also has a correct head, so always $0 \\leq \\mathrm{LAS} \\leq \\mathrm{UAS} \\leq 100\\%$."
   ]
  },
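  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Both scores are simple to compute from per-token `(head, label)` pairs. The sketch below uses a hypothetical helper `attachment_scores` and, purely for illustration, treats the first analysis of the telescope sentence as gold and the second as the prediction.\n",
    "\n",
    "```python\n",
    "# Sketch: UAS and LAS from gold and predicted (head, label) pairs per token.\n",
    "\n",
    "def attachment_scores(gold, pred):\n",
    "    assert len(gold) == len(pred)\n",
    "    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))\n",
    "    correct_both = sum(g == p for g, p in zip(gold, pred))\n",
    "    return 100 * correct_heads / len(gold), 100 * correct_both / len(gold)\n",
    "\n",
    "# 'I saw the star with the telescope'\n",
    "# gold: telescope attaches to saw (obl); pred: telescope attaches to star (nmod)\n",
    "gold = [(2, 'nsubj'), (0, 'root'), (4, 'det'), (2, 'dobj'),\n",
    "        (7, 'case'), (7, 'det'), (2, 'obl')]\n",
    "pred = [(2, 'nsubj'), (0, 'root'), (4, 'det'), (2, 'dobj'),\n",
    "        (7, 'case'), (7, 'det'), (4, 'nmod')]\n",
    "uas, las = attachment_scores(gold, pred)\n",
    "```\n",
    "\n",
    "Only the attachment of the last token differs, so six of the seven heads match and both scores come out at 6/7, roughly 86%."
   ]
  },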
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example: LAS and UAS\n",
    "\n",
    "<center>\n",
    "    <img src=\"parsing_figures/as.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<center>\n",
    "    $\\mathrm{UAS}=\\frac{8}{12}\\approx 67\\%$\n",
    "</center>\n",
    "\n",
    "<center>\n",
    "    $\\mathrm{LAS}=\\frac{7}{12}\\approx 58\\%$\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### State-of-the-Art in Dependency Parsing\n",
    "\n",
    "* [CoNLL 2018 Shared Task](https://universaldependencies.org/conll18/results-las.html)\n",
    "* [IWPT 2020 Shared Task](http://pauillac.inria.fr/~seddah/coarse_IWPT_SharedTask_unofficial_results.html)\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/d524efd5fe910c0f03c67cd3ba5335d95a5ee4fa/5-Figure1-1.png\" width=60%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://universaldependencies.org/conll18/proceedings/pdf/K18-2005.pdf\">Che et al., 2018</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### NN Parsers\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/a14045a751f5d8ed387c8630a86a3a2861b90643/4-Figure2-1.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/D14-1082.pdf\">Chen and Manning, 2014</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Stack LSTMs\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/396b7932beac62a72288eaea047981cc9a21379a/4-Figure2-1.png\" width=80%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/P15-1033.pdf\">Dyer et al., 2015</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Transition-Based Neural Networks\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/6d671e0e26d239bd6a0b8f67d5fc49a76d733f29/4-Figure3-1.png\" width=100%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://arxiv.org/pdf/1703.04474.pdf\">Kong et al., 2017</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### mBERT for zero-shot cross-lingual parsing\n",
    "\n",
    "<center>\n",
    "    <img src=\"https://d3i71xaburhd42.cloudfront.net/31c872514c28a172f7f0221c8596aa5bfcdb9e98/1-Figure1-1.png\" width=30%/>\n",
    "</center>\n",
    "\n",
    "<div style=\"text-align: right;\">\n",
    "    (from <a href=\"https://www.aclweb.org/anthology/D19-1279.pdf\">Kondratyuk and Straka, 2019</a>)\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Beyond Dependency Parsing: Meaning Representations\n",
    "\n",
    "### https://danielhers.github.io/dikubits_20200218.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Summary\n",
    "\n",
    "* **Dependency parsing** predicts word-to-word dependencies\n",
    "* Simple annotations in many languages, thanks to **UD**\n",
    "* Fast parsing, e.g. **transition-based**\n",
    "* Sufficient for most **downstream applications**\n",
    "* More sophisticated **meaning representations** are more informative but harder to parse"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Background Material\n",
    "\n",
    "* Arc-standard transition-based parsing system ([Nivre, 2004](https://www.aclweb.org/anthology/W04-0308))\n",
    "* [EACL 2014 tutorial](http://stp.lingfil.uu.se/~nivre/eacl14.html)\n",
    "* Jurafsky & Martin, [Speech and Language Processing (Third Edition)](https://web.stanford.edu/~jurafsky/slp3/15.pdf): Chapter 15, Dependency Parsing."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}