{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parsing Natural Language in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(C) 2018-2024 by [Damir Cavar](http://damir.cavar.me/)**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CA BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a tutorial related to the discussion of parsing with Probabilistic Context Free Grammars (PCFG) in the class *Advanced Natural Language Processing* taught at Indiana University in Fall 2018 and 2023."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code and tutorial is based on different summer school courses that I taught and tutorials that I gave at different occasions in Europe and the US. This particular example will use code from the **TDAParser.py** and other scripts developed since 2002. Most of this material was used in general introduction courses to algorithms in Natural Language Processing that I taught at Indiana University, University of Konstanz, University of Zadar, University of Nova Gorica."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import sys"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Grammar Class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us assume that our Phrase Structure Grammar consists of rules that contain one symbol in the Left-Hand Side, followed by a production symbol, an arrow, and by a list of at least one terminal and symbol. Comments can be introduced using the *#* symbol. Every rule has to be contained in one line."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"grammarText = \"\"\"\n",
"# PSG1\n",
"# small English grammar\n",
"# (C) 2005 by Damir Cavar, Indiana University\n",
"\n",
"# Grammar:\n",
"S -> NP VP\n",
"\n",
"NP -> N\n",
"NP -> Adj N\n",
"NP -> Art Adj N\n",
"NP -> Art N\n",
"NP -> Art N PP\n",
"#NP -> Art N NP\n",
"\n",
"VP -> V\n",
"VP -> V NP\n",
"VP -> Adv V NP\n",
"VP -> V PP\n",
"VP -> V NP PP\n",
"\n",
"PP -> P NP\n",
"\n",
"\n",
"# Lexicon:\n",
"N -> John\n",
"N -> Mary\n",
"N -> bench\n",
"N -> cat\n",
"N -> mouse\n",
"\n",
"Art -> the\n",
"Art -> a\n",
"\n",
"Adj -> green\n",
"Adj -> yellow\n",
"Adj -> big\n",
"Adj -> small\n",
"\n",
"Adv -> often\n",
"Adv -> yesterday\n",
"\n",
"V -> kissed\n",
"V -> loves\n",
"V -> sees\n",
"V -> meets\n",
"V -> chases\n",
"\n",
"P -> on\n",
"P -> in\n",
"P -> beside\n",
"P -> under\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can parse this grammar into a representation that allows us to fetch the left- and the right-hand side of a rule for top- or bottom-up parsing."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"class PSG:\n",
" def __init__(self, grammar):\n",
" self.LHS = {}\n",
" self.RHS = {}\n",
" self.__read__(grammar)\n",
"\n",
" def __str__(self):\n",
" text = \"\"\n",
" for i in self.LHS.keys():\n",
" if len(text) > 0:\n",
" text += \"\\n\"\n",
" for x in self.LHS[i]:\n",
" text += i + \" -> \" + \" \".join(x) + \"\\n\"\n",
" return text\n",
"\n",
" def __read__(self, g):\n",
" for i in g.split(\"\\n\"):\n",
" i = i.split(\"#\")[0].strip() # cut off comment string and strip\n",
" if len(i) == 0: continue\n",
" tokens = i.split(\"->\")\n",
" if len(tokens) != 2: continue\n",
" lhs = tokens[0].split()\n",
" if len(lhs) != 1: continue\n",
" rhs = tuple(tokens[1].split())\n",
" value = self.LHS.get(lhs[0], [])\n",
" if rhs not in value: value.append(rhs)\n",
" self.LHS[lhs[0]] = value\n",
" value = self.RHS.get(rhs, [])\n",
" if lhs[0] not in value: value.append(lhs[0])\n",
" self.RHS[rhs] = value\n",
"\n",
" def getRHS(self, left):\n",
" return self.LHS.get(left, [])\n",
"\n",
" def getLHS(self, right):\n",
" return self.RHS.get(right, [])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grammar file:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Top-Down Parser"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining some parameters:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"LIFO = -1\n",
"FIFO = 0\n",
"strategy = FIFO"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def tdparse(inp, goal, grammar, agenda):\n",
" print(\"Got : %s\\tinput: %s\" % (goal, inp))\n",
" if goal == inp == []: print(\"Success\")\n",
" elif goal == [] or inp == []:\n",
" if agenda == []: print(\"Fail: Agenda empty!\")\n",
" else:\n",
" entry = agenda.pop(strategy)\n",
" print(\"Backing up to: %s with %s\" % (entry[0], entry[1]))\n",
" tdparse(entry[1], entry[0], grammar, agenda)\n",
" else: # there is something in goal and input\n",
" if goal[0] == inp[0]: # if initial symbols match, reduce lists, parse\n",
" tdparse(inp[1:], goal[1:], grammar, agenda)\n",
" else:\n",
" for i in grammar.LHS.get(goal[0], []):\n",
" if [list(i) + goal[1:], inp] not in agenda:\n",
" agenda.append([list(i) + goal[1:], inp])\n",
" if len(agenda) > 0:\n",
" entry = agenda.pop(strategy)\n",
" tdparse(entry[1], entry[0], grammar, agenda)\n",
" else: print(\"Fail: Agenda empty!\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"S -> NP VP\n",
"\n",
"NP -> N\n",
"NP -> Adj N\n",
"NP -> Art Adj N\n",
"NP -> Art N\n",
"NP -> Art N PP\n",
"\n",
"VP -> V\n",
"VP -> V NP\n",
"VP -> Adv V NP\n",
"VP -> V PP\n",
"VP -> V NP PP\n",
"\n",
"PP -> P NP\n",
"\n",
"N -> John\n",
"N -> Mary\n",
"N -> bench\n",
"N -> cat\n",
"N -> mouse\n",
"\n",
"Art -> the\n",
"Art -> a\n",
"\n",
"Adj -> green\n",
"Adj -> yellow\n",
"Adj -> big\n",
"Adj -> small\n",
"\n",
"Adv -> often\n",
"Adv -> yesterday\n",
"\n",
"V -> kissed\n",
"V -> loves\n",
"V -> sees\n",
"V -> meets\n",
"V -> chases\n",
"\n",
"P -> on\n",
"P -> in\n",
"P -> beside\n",
"P -> under\n",
"\n",
"Got : ['S']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['NP', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['Art', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['Art', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['Art', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['John', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['VP']\tinput: ('loves', 'Mary')\n",
"Got : ['Mary', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['bench', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['cat', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['mouse', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['green', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['yellow', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['big', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['small', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['the', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['a', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['the', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['a', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['the', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['a', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n",
"Got : ['V']\tinput: ('loves', 'Mary')\n",
"Got : ['V', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['Adv', 'V', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['V', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['V', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['kissed']\tinput: ('loves', 'Mary')\n",
"Got : ['loves']\tinput: ('loves', 'Mary')\n",
"Got : []\tinput: ('Mary',)\n",
"Backing up to: ['sees'] with ('loves', 'Mary')\n",
"Got : ['sees']\tinput: ('loves', 'Mary')\n",
"Got : ['meets']\tinput: ('loves', 'Mary')\n",
"Got : ['chases']\tinput: ('loves', 'Mary')\n",
"Got : ['kissed', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['loves', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['NP']\tinput: ('Mary',)\n",
"Got : ['sees', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['meets', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['chases', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['often', 'V', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['yesterday', 'V', 'NP']\tinput: ('loves', 'Mary')\n",
"Got : ['kissed', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['loves', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['PP']\tinput: ('Mary',)\n",
"Got : ['sees', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['meets', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['chases', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['kissed', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['loves', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['NP', 'PP']\tinput: ('Mary',)\n",
"Got : ['sees', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['meets', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['chases', 'NP', 'PP']\tinput: ('loves', 'Mary')\n",
"Got : ['N']\tinput: ('Mary',)\n",
"Got : ['Adj', 'N']\tinput: ('Mary',)\n",
"Got : ['Art', 'Adj', 'N']\tinput: ('Mary',)\n",
"Got : ['Art', 'N']\tinput: ('Mary',)\n",
"Got : ['Art', 'N', 'PP']\tinput: ('Mary',)\n",
"Got : ['P', 'NP']\tinput: ('Mary',)\n",
"Got : ['N', 'PP']\tinput: ('Mary',)\n",
"Got : ['Adj', 'N', 'PP']\tinput: ('Mary',)\n",
"Got : ['Art', 'Adj', 'N', 'PP']\tinput: ('Mary',)\n",
"Got : ['Art', 'N', 'PP', 'PP']\tinput: ('Mary',)\n",
"Got : ['John']\tinput: ('Mary',)\n",
"Got : ['Mary']\tinput: ('Mary',)\n",
"Got : []\tinput: ()\n",
"Backing up to: ['bench'] with ('Mary',)\n",
"Got : ['bench']\tinput: ('Mary',)\n",
"Got : ['cat']\tinput: ('Mary',)\n",
"Got : ['mouse']\tinput: ('Mary',)\n",
"Got : ['green', 'N']\tinput: ('Mary',)\n",
"Got : ['yellow', 'N']\tinput: ('Mary',)\n",
"Got : ['big', 'N']\tinput: ('Mary',)\n",
"Got : ['small', 'N']\tinput: ('Mary',)\n",
"Got : ['the', 'Adj', 'N']\tinput: ('Mary',)\n",
"Got : ['a', 'Adj', 'N']\tinput: ('Mary',)\n",
"Got : ['the', 'N']\tinput: ('Mary',)\n",
"Got : ['a', 'N']\tinput: ('Mary',)\n",
"Got : ['the', 'N', 'PP']\tinput: ('Mary',)\n",
"Got : ['a', 'N', 'PP']\tinput: ('Mary',)\n",
"Got : ['on', 'NP']\tinput: ('Mary',)\n",
"Got : ['in', 'NP']\tinput: ('Mary',)\n",
"Got : ['beside', 'NP']\tinput: ('Mary',)\n",
"Got : ['under', 'NP']\tinput: ('Mary',)\n",
"Got : ['John', 'PP']\tinput: ('Mary',)\n",
"Got : ['Mary', 'PP']\tinput: ('Mary',)\n",
"Got : ['PP']\tinput: ()\n"
]
},
{
"ename": "IndexError",
"evalue": "tuple index out of range",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m3\n\u001b[0;32m 1\u001b[0m myGrammar \u001b[39m=\u001b[39m PSG(grammarText)\n\u001b[0;32m 2\u001b[0m \u001b[39mprint\u001b[39m(myGrammar)\n\u001b[1;32m----> 3\u001b[0m tdparse( (\u001b[39m'\u001b[39;49m\u001b[39mJohn\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mloves\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mMary\u001b[39;49m\u001b[39m'\u001b[39;49m) , [\u001b[39m\"\u001b[39;49m\u001b[39mS\u001b[39;49m\u001b[39m\"\u001b[39;49m], myGrammar, [])\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (5 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (20 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m9\n\u001b[0;32m 7\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[0;32m 8\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mBacking up to: \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m with \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m%\u001b[39m (entry[\u001b[39m0\u001b[39m], entry[\u001b[39m1\u001b[39m]))\n\u001b[1;32m----> 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (2 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (7 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (19 times), tdparse at line 12 (1 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m9\n\u001b[0;32m 7\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[0;32m 8\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mBacking up to: \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m with \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m%\u001b[39m (entry[\u001b[39m0\u001b[39m], entry[\u001b[39m1\u001b[39m]))\n\u001b[1;32m----> 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
" \u001b[1;31m[... skipping similar frames: tdparse at line 19 (15 times)]\u001b[0m\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n",
"\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39m], entry[\u001b[39m0\u001b[39m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[1;32m---> 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39;49m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[0;32m 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39m:], goal[\u001b[39m1\u001b[39m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n",
"\u001b[1;31mIndexError\u001b[0m: tuple index out of range"
]
}
],
"source": [
"myGrammar = PSG(grammarText)\n",
"print(myGrammar)\n",
"tdparse( ('John', 'loves', 'Mary') , [\"S\"], myGrammar, [])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": false,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}