{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "import statnlpbook.parsing as parsing\n", "util.execute_notebook('parsing.ipynb')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "\n", "$$\n", "\\newcommand{\\Xs}{\\mathcal{X}}\n", "\\newcommand{\\Ys}{\\mathcal{Y}}\n", "\\newcommand{\\y}{\\mathbf{y}}\n", "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\newcommand{\\aligns}{\\mathbf{a}}\n", "\\newcommand{\\align}{a}\n", "\\newcommand{\\source}{\\mathbf{s}}\n", "\\newcommand{\\target}{\\mathbf{t}}\n", "\\newcommand{\\ssource}{s}\n", "\\newcommand{\\starget}{t}\n", "\\newcommand{\\repr}{\\mathbf{f}}\n", "\\newcommand{\\repry}{\\mathbf{g}}\n", "\\newcommand{\\x}{\\mathbf{x}}\n", "\\newcommand{\\prob}{p}\n", "\\newcommand{\\a}{\\alpha}\n", "\\newcommand{\\b}{\\beta}\n", "\\newcommand{\\vocab}{V}\n", "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n", "\\newcommand{\\param}{\\theta}\n", "\\DeclareMathOperator{\\perplexity}{PP}\n", "\\DeclareMathOperator{\\argmax}{argmax}\n", "\\DeclareMathOperator{\\argmin}{argmin}\n", "\\newcommand{\\train}{\\mathcal{D}}\n", "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n", "\\newcommand{\\length}[1]{\\text{length}(#1) }\n", "\\newcommand{\\indi}{\\mathbb{I}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Parsing" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%HTML\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } 
}, "source": [ "## Motivation \n", "\n", "Say we want to automatically build a database of this form:\n", "\n", "| Brand | Parent |\n", "|---------|-----------|\n", "| KitKat | Nestle |\n", "| Lipton | Unilever | \n", "| ... | ... | \n", "\n", "or this [graph](http://geekologie.com/image.php?path=/2012/04/25/parent-companies-large.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Say you find positive textual mentions in this form:\n", "\n", "> Dechra Pharmaceuticals has made its second acquisition after purchasing Genitrix." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "> Trinity Mirror plc is the largest British newspaper after purchasing rival Local World." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Can you find a pattern? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How about this sentence?\n", "\n", "> Kraft is gearing up for a roll-out of its Milka brand after purchasing Cadbury Dairy Milk.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Wouldn't it be great if we knew that\n", "\n", "* Kraft is the **subject** of the phrase **purchasing Cadbury Dairy Milk** " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Check out the [Enju parser](http://www.nactem.ac.uk/enju/demo.html#2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Parsing is the process of **finding these trees**:\n", "\n", "* very important for downstream applications\n", "* the \"celebrity\" sub-field of NLP \n", " * partly because it marries linguistics and NLP\n", "* researched bigly in academia and 
[industry](http://www.telegraph.co.uk/technology/2016/05/17/has-googles-parsey-mcparseface-just-solved-one-of-the-worlds-big/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How is this done?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Syntax\n", "from the Greek *syntaxis* (arrangement):\n", "\n", "* **Constituency**: groups of words act as single units.\n", "* **Grammatical Relations**: subject, direct object, indirect object, etc.\n", "* **Subcategorization**: restrictions on the type of phrases that go with certain words.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Constituency\n", "\n", "* **Noun Phrase** (NP)\n", " * a roll-out of its Milka brand\n", " * Cadbury Dairy Milk\n", " * a roll-out\n", "* **Verb Phrase** (VP) \n", " * is gearing up\n", " * purchasing Cadbury Dairy Milk \n", "* **Prepositional Phrase** (PP)\n", " * of its Milka brand\n", " * after purchasing Cadbury Dairy Milk" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Grammatical Relations\n", "> Kraft is gearing up for a roll-out of its Milka brand after purchasing Cadbury Dairy Milk.\n", "\n", "* *Subject* of purchasing: **Kraft**\n", "* *Object* of purchasing: **Cadbury Dairy Milk**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Subcategorization\n", "\n", "Verbs (and other types of words) fall into finer-grained subcategories according to the phrases they combine with:\n", "\n", "* Intransitive Verbs: must not have objects\n", " * the student works\n", "* Transitive Verbs: must have exactly one object\n", " * Kraft purchased Cadbury Dairy Milk\n", "* Ditransitive Verbs: must have two objects\n", " * Give me a break! 
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Context Free Grammars \n", "\n", "Formalise syntax by describing the hierarchical structure of sentences\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A **Context Free Grammar** (CFG) is a 4-tuple \\\\(G=(N,\\Sigma,R,S)\\\\) where\n", "\n", " * \\\\(N\\\\) is a finite set of _non-terminal symbols_.\n", " * \\\\(\\Sigma\\\\) is a finite set of _terminal symbols_, disjoint from \\\\(N\\\\).\n", " * \\\\(R\\\\) is a finite set of _rules_ \\\\(X \\rightarrow Y_1 Y_2\\ldots Y_n\\\\) where \\\\(X \\in N\\\\) and \\\\(Y_i \\in N \\cup \\Sigma\\\\). \n", " * \\\\(S \\in N\\\\) is the _start symbol_. \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A simple example grammar:\n", "* NP_p : plural Noun Phrase\n", "* NP_s : singular Noun Phrase\n", "* VP_p / VP_s : plural / singular Verb Phrase\n", "* ADJ : adjective" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
SNP_p VP_p
SNP_s VP_s
NP_pMatko raps
VP_pare ADJ
NP_sMatko
VP_sraps in StatNLP
ADJsilly
" ], "text/plain": [ "<__main__.CFG at 0x7fc1755f5860>" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cfg = CFG.from_rules([('S', ['NP_p','VP_p']),('S',['NP_s','VP_s']), \n", " ('NP_p', ['Matko', 'raps']),\n", " ('VP_p', ['are', 'ADJ']),\n", " ('NP_s', ['Matko']),\n", " ('VP_s', ['raps', 'in', 'StatNLP']),\n", " ('ADJ', ['silly'])\n", " ])\n", "cfg" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## (Left-most) Derivation\n", "The structure of a sentence with respect to a grammar can be described by its **derivation** (if one exists)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A sequence of symbol sequences \\\\(s_1 \\ldots s_n\\\\) such that \n", "\n", "* \\\\(s_1 = S\\\\)\n", " * the first sequence is the start symbol\n", "* \\\\(s_n \\in \\Sigma^*\\\\)\n", " * the last sequence consists only of terminals\n", "* \\\\(s_i\\\\) for \\\\(i > 1\\\\)\n", " * is obtained by replacing the left-most non-terminal \\\\(\\alpha\\\\) in $s_{i-1}$ with the right-hand side of some rule $\\alpha\\rightarrow \\beta_1\\ldots\\beta_m \\in R$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
S
NP_p VP_p
Matko raps VP_p
Matko raps are ADJ
Matko raps are silly
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "util.Table(generate_deriv(cfg, [cfg.s]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Parse Trees\n", "Represent derivations as trees" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "4\n", "\n", "silly\n", "\n", "\n", "5\n", "\n", "VP_p\n", "\n", "\n", "5->3\n", "\n", "\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "S\n", "\n", "\n", "6->2\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree = ('S', [('NP_p',['Matko','raps']), ('VP_p',['are','silly'])])\n", "parsing.render_tree(tree)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "NP_s\n", "\n", "\n", "1->0\n", "\n", "\n", "\n", "\n", "2\n", "\n", "raps\n", "\n", "\n", "3\n", "\n", "in\n", "\n", "\n", "4\n", "\n", "StatNLP\n", "\n", "\n", "5\n", "\n", "VP_s\n", "\n", "\n", "5->2\n", "\n", "\n", "\n", "\n", "5->3\n", "\n", "\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "S\n", "\n", "\n", "6->1\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "parsing.render_tree(generate_tree(cfg,'S')) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Parsing\n", "The inverse problem: given a sentence \n", "\n", "> Matko raps in StatNLP\n", "\n", "What's the derivation for it? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There are several approaches to finding a legal parse tree given a sentence and a grammar:\n", "\n", "* **Top-Down**: Start with the start symbol and generate trees\n", " * backtrack if they do not match the observed sentence\n", "* **Bottom-Up**: Start with the sentence, find rules that generate parts of it\n", " * backtrack if you can't reach the start symbol\n", "* **Dynamic Programming**: Explore several trees in parallel and re-use computations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Bottom-Up Parsing with Backtracking\n", "\n", "Incrementally build up a tree **left-to-right**, and maintain ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "a **buffer** of remaining words" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
Matko raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "\n", "Init
" ], "text/plain": [ ".Output at 0x7fc14d1f3320>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[0:1])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "a **stack** of trees built so far" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "\n", "Shift
" ], "text/plain": [ ".Output at 0x7fc14d189240>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[13:14])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Perform three types of **actions**:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Shift\n", "Move the first word from the buffer onto the stack (as a singleton tree)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "
Matko raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "\n", "Init
raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "\n", "Shift
" ], "text/plain": [ ".Output at 0x7fc1755f51d0>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[0:2])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Reduce\n", "For a rule $X \\rightarrow Y \\: Z$ where the top of the stack is $Y \\: Z$, replace $Y \\: Z$ with a new tree headed by $X$" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "\n", "Shift
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc1755f56a0>" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[11:13])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Backtrack\n", "If no rule can be applied, the buffer is empty, and the stack is not a single tree headed by $S$, go back to the last decision point" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "\n", "Backtrack
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "\n", "Shift
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc1755f52b0>" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[10:13])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
SNP_p VP_p
SNP_s VP_s
NP_pMatko raps
VP_pare ADJ
NP_sMatko
VP_sraps in StatNLP
ADJsilly
" ], "text/plain": [ "<__main__.CFG at 0x7fc1755f5860>" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sentence = ['Matko', 'raps', 'are', 'silly']\n", "transitions = bottom_up_parse(cfg, sentence)\n", "cfg" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "\n", "Backtrack
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "\n", "Shift
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "\n", "Reduce
silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "\n", "Shift
" ], "text/plain": [ ".Output at 0x7fc14d199978>" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[10:14])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "4\n", "\n", "silly\n", "\n", "\n", "5\n", "\n", "ADJ\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "VP_p\n", "\n", "\n", "6->3\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "7\n", "\n", "S\n", "\n", "\n", "7->2\n", "\n", "\n", "\n", "\n", "7->6\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_forest(transitions[-1][0].stack)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Dynamic Programming for Parsing\n", "Bottom-up parser repeats the same work several times" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "NP_s\n", "\n", "\n", "1->0\n", "\n", "\n", "\n", "\n", "2\n", "\n", "raps\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "4\n", "\n", "silly\n", "\n", "\n", "5\n", "\n", "ADJ\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "VP_p\n", "\n", "\n", "6->3\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc14d1dee80>" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[7:8]) " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
raps are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "\n", "Backtrack
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "\n", "Shift
are silly\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc1755f5ba8>" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[10:13])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "4\n", "\n", "silly\n", "\n", "\n", "5\n", "\n", "ADJ\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "VP_p\n", "\n", "\n", "6->3\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc175ecaf98>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[-2:-1])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Fortunately, we can **cache** these computations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Chomsky Normal Form\n", "The caching algorithm requires the grammar to be in **Chomsky Normal Form** (CNF)\n", "\n", "Every rule has one of two forms:\n", "\n", "* \\\\(\\alpha \\rightarrow \\beta \\gamma\\\\) where \\\\(\\beta,\\gamma \\in N\\\\) and \\\\(\\beta,\\gamma \\neq S\\\\)\n", " * rule with exactly two non-terminals on the RHS\n", "* \\\\(\\alpha \\rightarrow t\\\\) where \\\\(t \\in \\Sigma\\\\)\n", " * rule that expands to a single terminal" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Conversion to CNF\n", "We can convert every CFG into an equivalent CFG in CNF (generating the same language, excluding the empty string)\n", "\n", "Replace the rules on the left of $\\Rightarrow$ with those on the right:\n", "\n", "* $\\alpha \\rightarrow \\beta \\gamma \\delta \\Rightarrow \\alpha \\rightarrow \\beta\\alpha', \\alpha' \\rightarrow \\gamma \\delta$\n", "* $\\alpha \\rightarrow \\beta t \\Rightarrow \\alpha \\rightarrow \\beta \\alpha', \\alpha' \\rightarrow t$ where $t \\in \\Sigma$ (analogously for $\\alpha \\rightarrow t \\beta$)\n", "* $\\alpha \\rightarrow \\beta, \\beta \\rightarrow \\gamma \\delta \\Rightarrow \\alpha \\rightarrow \\gamma \\delta, \\beta \\rightarrow \\gamma \\delta$ \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Example\n", "\n", "$S \\rightarrow NP \\: VP \\: PP$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "becomes $S \\rightarrow NP \\: S'$ and $S' \\rightarrow VP \\: PP$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$VP \\rightarrow \\text{are} \\: ADJ$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "becomes $VP \\rightarrow X \\: ADJ$ and $X \\rightarrow \\text{are}$" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
SNP_p VP_p
SNP_s VP_s
NP_p_0Matko
NP_p_1raps
NP_pNP_p_0 NP_p_1
VP_p_2are
VP_pVP_p_2 ADJ
NP_sMatko
VP_s_4raps
VP_sVP_s_4 VP_s_3
VP_s_3_5in
VP_s_3_6StatNLP
VP_s_3VP_s_3_5 VP_s_3_6
ADJsilly
" ], "text/plain": [ "<__main__.CFG at 0x7fc14d0499e8>" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cnf_cfg = to_cnf(cfg)\n", "cnf_cfg" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Cocke–Younger–Kasami (CYK) Algorithm\n", "\n", "**Incrementally** build all parse trees for **spans of increasing length**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Like the ones for \"are silly\" and \"Matko raps\":" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "Matko\n", "\n", "\n", "1\n", "\n", "raps\n", "\n", "\n", "2\n", "\n", "NP_p\n", "\n", "\n", "2->0\n", "\n", "\n", "\n", "\n", "2->1\n", "\n", "\n", "\n", "\n", "3\n", "\n", "are\n", "\n", "\n", "4\n", "\n", "silly\n", "\n", "\n", "5\n", "\n", "ADJ\n", "\n", "\n", "5->4\n", "\n", "\n", "\n", "\n", "6\n", "\n", "VP_p\n", "\n", "\n", "6->3\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "\n", "Reduce
" ], "text/plain": [ ".Output at 0x7fc14d049d68>" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_transitions(transitions[16:17]) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### CYK Algorithm\n", "Populate a chart with the non-terminals $l$ that can derive each span $(i,j)$ \n", "\n", "if $j=i$\n", "* Add label $l$ if $l \\rightarrow x_i \\in R$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "if $j>i$\n", "* Consider all *middle* indices $m$ with $i \\leq m < j$\n", "* **combine trees** of span $(i,m)$ with label $l_1$ and span $(m+1,j)$ with label $l_2$\n", "* if there is a rule $l \\rightarrow l_1 \\: l_2 \\in R$, add label $l$ to span $(i,j)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is best done in a **chart** that stores \n", "* the legal non-terminals per span \n", "* and back-pointers to child spans" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_s, NP_p_0NP_p
1: rapsVP_s_4, NP_p_1
2: are
3: silly
" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chart = parsing.Chart(sentence)\n", "chart.append_label(0,0,'NP_s')\n", "chart.append_label(0,0,'NP_p_0')\n", "chart.append_label(1,1,'VP_s_4')\n", "chart.append_label(1,1,'NP_p_1')\n", "chart.append_label(0,1,'NP_p', [(0,0,'NP_p_0'),(1,1,'NP_p_1')]) \n", "chart.mark(0, 1, 'NP_p')\n", "chart.mark_target(0,1)\n", "chart" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
SNP_p VP_p
SNP_s VP_s
NP_p_0Matko
NP_p_1raps
NP_pNP_p_0 NP_p_1
VP_p_2are
VP_pVP_p_2 ADJ
NP_sMatko
VP_s_4raps
VP_sVP_s_4 VP_s_3
VP_s_3_5in
VP_s_3_6StatNLP
VP_s_3VP_s_3_5 VP_s_3_6
ADJsilly
" ], "text/plain": [ "<__main__.CFG at 0x7fc14d0499e8>" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cnf_cfg" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Previous\n", "  \n", " Next\n", "
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: Matko
1: raps
2: are
3: silly
1 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: Matko
1: raps
2: are
3: silly
2 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0
1: raps
2: are
3: silly
3 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: raps
2: are
3: silly
4 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: raps
2: are
3: silly
5 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1
2: are
3: silly
6 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: are
3: silly
7 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: are
3: silly
8 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: silly
9 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: silly
10 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
11 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
12 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_s
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
13 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
14 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
15 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
16 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
17 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
18 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
19 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
20 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
21 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
22 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
23 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
24 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
25 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
26 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
27 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
28 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
29 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
30 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
31 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
32 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
33 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_pS
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
34 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_pS
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
35 / 36
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_pS
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
36 / 36
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trace = cyk(cnf_cfg, sentence)\n", "util.Carousel(trace)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The chart can be **traversed backwards** to get all trees" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_pS
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
1 / 4
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_pS
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
2 / 4
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2VP_p
3: sillyADJ
3 / 4
\n", "
\n", "\n", "\n", "\n", "
0: Matko1: raps2: are3: silly
0: MatkoNP_p_0, NP_sNP_p
1: rapsNP_p_1, VP_s_4
2: areVP_p_2
3: sillyADJ
4 / 4
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "util.Carousel([trace[i] for i in [35,33,22,13]])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: parse tree (S (NP_p (NP_p_0 Matko) (NP_p_1 raps)) (VP_p (VP_p_2 are) (ADJ silly)))]" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_result = trace[-1].derive_trees()[0]\n", "parsing.render_tree(parse_result)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Collapse the helper **CNF non-terminals** to recover a tree over the original grammar" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: parse tree with CNF non-terminals collapsed (S (NP_p Matko raps) (VP_p are (ADJ silly)))]" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parsing.render_tree(parsing.filter_non_terminals(parse_result, cfg.n))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Ambiguity\n", "For real-world grammars, many phrases have **several legal parse trees**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Consider the following grammar and sentence:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hidePrompt": true, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "amb_cfg = CFG.from_rules([\n", " ('S', ['Subj','VP']),\n", " ('Subj', ['He']),\n", " ('Verb', ['shot']),\n", " ('VP', ['Verb', 'Obj']), ('VP', ['Verb', 'Obj', 'PP']),\n", " ('PP', ['in','his','pyjamas']),\n", " ('Obj', ['the','elephant']), ('Obj', ['the','elephant','PP'])\n", " ])\n", "amb_cnf_cfg = to_cnf(amb_cfg)\n", "amb_sentence = [\"He\", \"shot\", \"the\", \"elephant\", \"in\", \"his\", \"pyjamas\"]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
S → Subj VP
Subj → He
Verb → shot
VP → Verb Obj
VP → Verb Obj PP
PP → in his pyjamas
Obj → the elephant
Obj → the elephant PP
" ], "text/plain": [ "<__main__.CFG at 0x7fc14d19a518>" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "amb_cfg" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "'He shot the elephant in his pyjamas'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\" \".join(amb_sentence)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "amb_trace = cyk(amb_cnf_cfg, amb_sentence)\n", "amb_parse_results = amb_trace[-1].derive_trees()\n", "def ambiguous_tree(num):\n", " return parsing.render_tree(parsing.filter_non_terminals(amb_parse_results[num],amb_cfg.n)) # num selects which parse to render" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: parse tree (S (Subj He) (VP (Verb shot) (Obj the elephant) (PP in his pyjamas)))]" ], "text/plain": [ "" ] },
"execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ambiguous_tree(1) # try ambiguous_tree(0) for the other reading" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prepositional phrase attachment ambiguity**: \"in his pyjamas\" can attach to\n", "\n", "* the verb phrase (he was in his pyjamas when shooting)\n", "* the noun phrase (the elephant was in his pyjamas)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Both readings are grammatical, but one is **more probable**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Probabilistic Context Free Grammars\n", "[Probabilistic Context Free Grammars](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf) (PCFGs) are Context Free Grammars in which each rule has a probability. A PCFG is defined by\n", "\n", "* A Context Free Grammar \\\\(G(N,\\Sigma,R,S)\\\\)\n", "* A parameter \\\\(\\param(\\alpha \\rightarrow \\beta) \\in [0,1]\\\\) for each rule \\\\(\\alpha \\rightarrow \\beta \\in R\\\\) \n", "* A normalisation constraint: for each left-hand side \\\\(\\alpha \\in N\\\\), \\\\(\\sum_{\\beta: \\alpha \\rightarrow \\beta \\in R} \\param(\\alpha \\rightarrow \\beta) = 1\\\\)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A PCFG defines the probability of a parse tree \\\\(\\mathbf{t}\\\\) built from the rules \\\\(\\alpha_1 \\rightarrow \\beta_1, \\ldots, \\alpha_n \\rightarrow \\beta_n\\\\) as the product of their parameters:\n", "$$\n", " \\newcommand{\\parse}{\\mathbf{t}}\n", " p_{\\param}(\\parse) = \\prod_{i=1}^n \\param(\\alpha_i \\rightarrow \\beta_i) \n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Example PCFG:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
S → Subj VP (1.0)
Subj → He (1.0)
Verb → shot (1.0)
VP → Verb Obj (0.3)
VP → Verb Obj PP (0.7)
PP → in his pyjamas (1.0)
Obj → the elephant (0.5)
Obj → the elephant PP (0.5)
" ], "text/plain": [ "<__main__.PCFG at 0x7fc14d161b38>" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pcfg = PCFG.from_rules([\n", " ('S', 1.0, ['Subj','VP']),\n", " ('Subj', 1.0, ['He']),\n", " ('Verb', 1.0, ['shot']),\n", " ('VP', 0.3, ['Verb', 'Obj']), ('VP', 0.7, ['Verb', 'Obj', 'PP']),\n", " ('PP', 1.0, ['in','his','pyjamas']),\n", " ('Obj', 0.5, ['the','elephant']), ('Obj', 0.5, ['the','elephant','PP'])\n", " ])\n", "pcfg" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Parsing\n", "\n", "For a given sentence $\\x$, let $\\Ys(\\x,G)$ be the set of all parse trees $\\mathbf{t}$ of $G$ whose terminals are $\\x$. Parsing means finding the most probable of these trees:\n", "\n", "$$\n", "\\argmax_{\\mathbf{t} \\in \\Ys(\\x,G)} \\prob_\\params(\\mathbf{t}) \n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## CYK for PCFGs\n", "We can use a variant of the CYK algorithm to solve this prediction problem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Populate the chart with non-terminals $l$ for spans $(i,j)$, **each with a score $s$**\n", "\n", "if $j=i$\n", "* Add label $l$ **with score $\\theta(l \\rightarrow x_i )$** if $l \\rightarrow x_i \\in R$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "if $j>i$\n", "* Consider all *middle* indices $m$ with $i \\le m < j$\n", "* For each rule $l \\rightarrow l_1 \\: l_2 \\in R$, combine trees of spans $(i,m)$ and $(m+1,j)$ with labels $l_1$ and $l_2$ and scores $s_1$ and $s_2$\n", " * into a tree labelled $l$ **with score $\\theta(l \\rightarrow l_1 \\: l_2) \\times s_1 \\times s_2$**\n", "* For each label $l$, keep only the **highest-scoring** tree for the span\n", "\n", "The chart trace below reports scores in log space, where products become sums (e.g. $\\log 0.5 \\approx -0.69$)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hideCode": true, "hidePrompt": true, "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
S → Subj VP (1.0)
Subj → He (1.0)
Verb → shot (1.0)
VP → Verb Obj (0.3)
VP → Verb VP_0 (0.7)
VP_0 → Obj PP (1.0)
PP_2 → in (1.0)
PP → PP_2 PP_1 (1.0)
PP_1_3 → his (1.0)
PP_1_4 → pyjamas (1.0)
PP_1 → PP_1_3 PP_1_4 (1.0)
Obj_5 → the (1.0)
Obj_6 → elephant (1.0)
Obj → Obj_5 Obj_6 (0.5)
Obj_8 → the (1.0)
Obj → Obj_8 Obj_7 (0.5)
Obj_7_9 → elephant (1.0)
Obj_7 → Obj_7_9 PP (1.0)
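The product formula for tree probabilities can be checked by hand on the two complete readings of He shot the elephant in his pyjamas. The following is a minimal sketch, assuming trees are nested tuples (label, child, ...) with words as plain strings. THETA, rules_of and tree_prob are hypothetical names, and THETA holds the rule probabilities of the original (non-CNF) example PCFG from this notebook. The sketch also checks the normalisation constraint for every left-hand side.

```python
import math
from collections import defaultdict

# Rule probabilities of the example PCFG in this notebook, keyed by
# (lhs, rhs) with the right-hand side as a tuple of symbols.
THETA = {
    ('S', ('Subj', 'VP')): 1.0,
    ('Subj', ('He',)): 1.0,
    ('Verb', ('shot',)): 1.0,
    ('VP', ('Verb', 'Obj')): 0.3,
    ('VP', ('Verb', 'Obj', 'PP')): 0.7,
    ('PP', ('in', 'his', 'pyjamas')): 1.0,
    ('Obj', ('the', 'elephant')): 0.5,
    ('Obj', ('the', 'elephant', 'PP')): 0.5,
}

# Normalisation: the parameters sharing a left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, _), p in THETA.items():
    totals[lhs] += p
assert all(math.isclose(t, 1.0) for t in totals.values())


def rules_of(tree):
    # A tree is a tuple (label, child, ...); a leaf is a plain string (a word).
    if isinstance(tree, str):
        return []
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    used = [(label, rhs)]
    for child in children:
        used += rules_of(child)
    return used


def tree_prob(tree):
    # p(t) is the product of theta over all rules used in t.
    return math.prod(THETA[rule] for rule in rules_of(tree))


PP = ('PP', 'in', 'his', 'pyjamas')
# The PP attaches to the verb phrase: he was in his pyjamas when shooting.
verb_attach = ('S', ('Subj', 'He'),
               ('VP', ('Verb', 'shot'), ('Obj', 'the', 'elephant'), PP))
# The PP attaches to the object: the elephant was in his pyjamas.
noun_attach = ('S', ('Subj', 'He'),
               ('VP', ('Verb', 'shot'), ('Obj', 'the', 'elephant', PP)))

print(round(tree_prob(verb_attach), 2), round(tree_prob(noun_attach), 2))  # 0.35 0.15
```

Every other rule has probability 1, so the verb-phrase attachment scores 0.7 × 0.5 = 0.35 and the noun-phrase attachment scores 0.3 × 0.5 = 0.15. The CNF grammar listed here preserves these products, because all of its helper rules have probability 1.0.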
" ], "text/plain": [ "<__main__.PCFG at 0x7fc14d0bd4e0>" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cnf_pcfg" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "hideCode": true, "hidePrompt": true, "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
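The following is a minimal sketch of the Viterbi-style CYK variant described earlier. It works in log space, like the chart trace shown here, so the product of scores becomes a sum. The CNF PCFG listed above is hand-encoded into the hypothetical BINARY and LEXICAL lists. The names viterbi_cyk and best_tree are illustrative and do not refer to the notebook's own cyk implementation.

```python
import math

# Hypothetical hand-encoding of the CNF PCFG listed above.
BINARY = [  # (lhs, left child, right child, theta)
    ('S', 'Subj', 'VP', 1.0), ('VP', 'Verb', 'Obj', 0.3),
    ('VP', 'Verb', 'VP_0', 0.7), ('VP_0', 'Obj', 'PP', 1.0),
    ('PP', 'PP_2', 'PP_1', 1.0), ('PP_1', 'PP_1_3', 'PP_1_4', 1.0),
    ('Obj', 'Obj_5', 'Obj_6', 0.5), ('Obj', 'Obj_8', 'Obj_7', 0.5),
    ('Obj_7', 'Obj_7_9', 'PP', 1.0),
]
LEXICAL = [  # (lhs, word, theta)
    ('Subj', 'He', 1.0), ('Verb', 'shot', 1.0), ('PP_2', 'in', 1.0),
    ('PP_1_3', 'his', 1.0), ('PP_1_4', 'pyjamas', 1.0), ('Obj_5', 'the', 1.0),
    ('Obj_6', 'elephant', 1.0), ('Obj_8', 'the', 1.0), ('Obj_7_9', 'elephant', 1.0),
]


def viterbi_cyk(words):
    # chart[(i, j)][label] = (best log score, backpointer) for the inclusive span (i, j).
    n = len(words)
    chart = {(i, j): {} for i in range(n) for j in range(i, n)}
    for i, w in enumerate(words):
        for lhs, word, theta in LEXICAL:
            if word == w:
                chart[(i, i)][lhs] = (math.log(theta), w)
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            cell = chart[(i, j)]
            for m in range(i, j):
                left, right = chart[(i, m)], chart[(m + 1, j)]
                for lhs, l1, l2, theta in BINARY:
                    if l1 in left and l2 in right:
                        # log(theta * s1 * s2) = log theta + log s1 + log s2
                        score = math.log(theta) + left[l1][0] + right[l2][0]
                        if lhs not in cell or score > cell[lhs][0]:
                            cell[lhs] = (score, (m, l1, l2))  # keep only the best
    return chart


def best_tree(chart, label, i, j):
    # Follow the stored backpointers to rebuild the highest-scoring tree.
    _, bp = chart[(i, j)][label]
    if isinstance(bp, str):
        return (label, bp)
    m, l1, l2 = bp
    return (label, best_tree(chart, l1, i, m), best_tree(chart, l2, m + 1, j))


words = 'He shot the elephant in his pyjamas'.split()
chart = viterbi_cyk(words)
print(round(chart[(1, 3)]['VP'][0], 2))           # -1.9, i.e. log(0.3 * 0.5)
print(round(math.exp(chart[(0, 6)]['S'][0]), 2))  # 0.35
print(best_tree(chart, 'S', 0, 6)[2][2][0])       # VP_0: the verb attachment
```

The first value matches the VP: -1.90 entry for shot the elephant in the chart trace. For the full sentence, the verb-phrase attachment wins with probability 0.35.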
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: He
1: shot
2: the
3: elephant
4: in
5: his
6: pyjamas
1 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: He
1: shot
2: the
3: elephant
4: in
5: his
6: pyjamas
2 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shot
2: the
3: elephant
4: in
5: his
6: pyjamas
3 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shot
2: the
3: elephant
4: in
5: his
6: pyjamas
4 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: the
3: elephant
4: in
5: his
6: pyjamas
5 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: the
3: elephant
4: in
5: his
6: pyjamas
6 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00
3: elephant
4: in
5: his
6: pyjamas
7 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephant
4: in
5: his
6: pyjamas
8 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephant
4: in
5: his
6: pyjamas
9 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00
4: in
5: his
6: pyjamas
10 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: in
5: his
6: pyjamas
11 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: in
5: his
6: pyjamas
12 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: his
6: pyjamas
13 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: his
6: pyjamas
14 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamas
15 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamas
16 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
17 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
18 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
19 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
20 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
21 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
22 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
23 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
24 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
25 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
26 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
27 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
28 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
29 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
30 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
31 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
32 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
33 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
34 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00
6: pyjamasPP_1_4: 0.00
35 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
36 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
37 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
38 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
39 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
40 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
41 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
42 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
43 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
44 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
45 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
46 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
47 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
48 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
49 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
50 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
51 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
52 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
53 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
54 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
55 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
56 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
57 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
58 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
59 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
60 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
61 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
62 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
63 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
64 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
65 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
66 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
67 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
68 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
69 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
70 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69Obj: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
82 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69Obj: -0.69, VP_0: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
85 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90VP: -1.90
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69Obj: -0.69, VP_0: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
95 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90
1: shotVerb: 0.00VP: -1.90VP: -1.05
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69Obj: -0.69, VP_0: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
97 / 112
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0: He1: shot2: the3: elephant4: in5: his6: pyjamas
0: HeSubj: 0.00S: -1.90S: -1.05
1: shotVerb: 0.00VP: -1.90VP: -1.05
2: theObj_5: 0.00, Obj_8: 0.00Obj: -0.69Obj: -0.69, VP_0: -0.69
3: elephantObj_6: 0.00, Obj_7_9: 0.00Obj_7: 0.00
4: inPP_2: 0.00PP: 0.00
5: hisPP_1_3: 0.00PP_1: 0.00
6: pyjamasPP_1_4: 0.00
105 / 112
\n", "
\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "util.Carousel(pcyk_trace)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Runtime with respect to sentence length? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Resolve parse by going backwards ... " ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "hideCode": true, "hidePrompt": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "0\n", "\n", "He\n", "\n", "\n", "1\n", "\n", "Subj\n", "\n", "\n", "1->0\n", "\n", "\n", "\n", "\n", "2\n", "\n", "shot\n", "\n", "\n", "3\n", "\n", "Verb\n", "\n", "\n", "3->2\n", "\n", "\n", "\n", "\n", "4\n", "\n", "the\n", "\n", "\n", "5\n", "\n", "elephant\n", "\n", "\n", "6\n", "\n", "Obj\n", "\n", "\n", "6->4\n", "\n", "\n", "\n", "\n", "6->5\n", "\n", "\n", "\n", "\n", "7\n", "\n", "in\n", "\n", "\n", "8\n", "\n", "his\n", "\n", "\n", "9\n", "\n", "pyjamas\n", "\n", "\n", "10\n", "\n", "PP\n", "\n", "\n", "10->7\n", "\n", "\n", "\n", "\n", "10->8\n", "\n", "\n", "\n", "\n", "10->9\n", "\n", "\n", "\n", "\n", "11\n", "\n", "VP\n", "\n", "\n", "11->3\n", "\n", "\n", "\n", "\n", "11->6\n", "\n", "\n", "\n", "\n", "11->10\n", "\n", "\n", "\n", "\n", "12\n", "\n", "S\n", "\n", "\n", "12->1\n", "\n", "\n", "\n", "\n", "12->11\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pcyk_trace = pcyk(cnf_pcfg, amb_sentence)\n", "parsing.render_tree(parsing.filter_non_terminals(pcyk_trace[-1].derive_trees()[0],pcfg.cfg.n))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Learning\n", "\n", "Learning for PCFGs :\n", "\n", "1. 
What should the rules in the grammar be?\n", "2. What should the probabilities associated with these rules be?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Need a corpus of parse trees $\\train=(\\parse_1, \\ldots, \\parse_n)$ \n", "\n", "* English: [Penn Treebank Project](https://www.cis.upenn.edu/~treebank/) parses for the 1989 Wall Street Journal (among other sources). \n", "* Other languages: e.g. [Chinese](https://catalog.ldc.upenn.edu/LDC2013T21)\n", "* Other domains: e.g. [Biomedical Papers](http://www.nactem.ac.uk/aNT/genia.html)\n", "\n", "Annotation is expensive and requires experts: a major bottleneck in parsing research. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To learn the parameters $\\params$ of the model we can again use the maximum-likelihood criterion:\n", "\n", "$$\n", "\\params^* = \\argmax_\\params \\sum_{\\parse \\in \\train} \\log \\prob_\\params(\\parse)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Amounts to **counting**\n", "\n", "$$\n", " \\param(\\alpha \\rightarrow \\beta) = \\frac{\\counts{\\train}{\\alpha \\rightarrow \\beta}}{\\counts{\\train}{\\alpha}}\n", "$$\n", "\n", "where $\\counts{\\train}{\\alpha \\rightarrow \\beta}$ is the number of times the rule $\\alpha \\rightarrow \\beta$ is used in the training trees, and $\\counts{\\train}{\\alpha}$ the number of times $\\alpha$ is expanded.\n", "\n", "Details omitted here, as you have seen this before." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Advanced: Parent Annotation\n", "\n", "In practice \n", "\n", "* Let $X^Y$ be a non-terminal $X$ with parent $Y$\n", "* **Grandparents** matter: how a non-terminal expands depends on its parent\n", " * $NP^{VP} \\rightarrow NP \\: PP$ vs \n", " * $NP^{PP} \\rightarrow NP \\: PP$ \n", "* Can be captured by labelling nodes in training trees with their parent\n", " * Same machinery, just more non-terminals" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Advanced: Head-Driven PCFG\n", "\n", "In practice \n", "\n", "* **VP NP** is not necessarily less or more likely than **VP NP PP**\n", "* But **elephant** in **pyjamas** is very unlikely\n", "* PCFGs must model relations between important words (\"heads\")\n", " * $PP^{NP(\\text{elephant})} \\rightarrow IN \\: NP(\\text{pyjamas})$ vs\n", " * $PP^{VP(\\text{shot})} \\rightarrow IN \\: NP(\\text{pyjamas})$\n", "* Needs more complex models and search algorithms" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background Material\n", "\n", "* [Mike Collins' PCFG lecture](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf)\n", "* Jurafsky & Martin, Chapter 12, Statistical Parsing" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }