{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Parsing Natural Language in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**(C) 2018-2024 by [Damir Cavar](http://damir.cavar.me/)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CA BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a tutorial related to the discussion of parsing with Probabilistic Context Free Grammars (PCFG) in the class *Advanced Natural Language Processing* taught at Indiana University in Fall 2018 and 2023." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code and tutorial is based on different summer school courses that I taught and tutorials that I gave at different occasions in Europe and the US. This particular example will use code from the **TDAParser.py** and other scripts developed since 2002. Most of this material was used in general introduction courses to algorithms in Natural Language Processing that I taught at Indiana University, University of Konstanz, University of Zadar, University of Nova Gorica." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import sys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Grammar Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us assume that our Phrase Structure Grammar consists of rules that contain one symbol in the Left-Hand Side, followed by a production symbol, an arrow, and by a list of at least one terminal and symbol. Comments can be introduced using the *#* symbol. Every rule has to be contained in one line." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "grammarText = \"\"\"\n", "# PSG1\n", "# small English grammar\n", "# (C) 2005 by Damir Cavar, Indiana University\n", "\n", "# Grammar:\n", "S -> NP VP\n", "\n", "NP -> N\n", "NP -> Adj N\n", "NP -> Art Adj N\n", "NP -> Art N\n", "NP -> Art N PP\n", "#NP -> Art N NP\n", "\n", "VP -> V\n", "VP -> V NP\n", "VP -> Adv V NP\n", "VP -> V PP\n", "VP -> V NP PP\n", "\n", "PP -> P NP\n", "\n", "\n", "# Lexicon:\n", "N -> John\n", "N -> Mary\n", "N -> bench\n", "N -> cat\n", "N -> mouse\n", "\n", "Art -> the\n", "Art -> a\n", "\n", "Adj -> green\n", "Adj -> yellow\n", "Adj -> big\n", "Adj -> small\n", "\n", "Adv -> often\n", "Adv -> yesterday\n", "\n", "V -> kissed\n", "V -> loves\n", "V -> sees\n", "V -> meets\n", "V -> chases\n", "\n", "P -> on\n", "P -> in\n", "P -> beside\n", "P -> under\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can parse this grammar into a representation that allows us to fetch the left- and the right-hand side of a rule for top- or bottom-up parsing." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "class PSG:\n", " def __init__(self, grammar):\n", " self.LHS = {}\n", " self.RHS = {}\n", " self.__read__(grammar)\n", "\n", " def __str__(self):\n", " text = \"\"\n", " for i in self.LHS.keys():\n", " if len(text) > 0:\n", " text += \"\\n\"\n", " for x in self.LHS[i]:\n", " text += i + \" -> \" + \" \".join(x) + \"\\n\"\n", " return text\n", "\n", " def __read__(self, g):\n", " for i in g.split(\"\\n\"):\n", " i = i.split(\"#\")[0].strip() # cut off comment string and strip\n", " if len(i) == 0: continue\n", " tokens = i.split(\"->\")\n", " if len(tokens) != 2: continue\n", " lhs = tokens[0].split()\n", " if len(lhs) != 1: continue\n", " rhs = tuple(tokens[1].split())\n", " value = self.LHS.get(lhs[0], [])\n", " if rhs not in value: value.append(rhs)\n", " self.LHS[lhs[0]] = value\n", " value = self.RHS.get(rhs, [])\n", " if lhs[0] not in value: value.append(lhs[0])\n", " self.RHS[rhs] = value\n", "\n", " def getRHS(self, left):\n", " return self.LHS.get(left, [])\n", "\n", " def getLHS(self, right):\n", " return self.RHS.get(right, [])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The grammar file:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Top-Down Parser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Defining some parameters:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "LIFO = -1\n", "FIFO = 0\n", "strategy = FIFO" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def tdparse(inp, goal, grammar, agenda):\n", " print(\"Got : %s\\tinput: %s\" % (goal, inp))\n", " if goal == inp == []: print(\"Success\")\n", " elif goal == [] or inp == []:\n", " if agenda == []: print(\"Fail: Agenda empty!\")\n", " else:\n", " entry = agenda.pop(strategy)\n", " print(\"Backing up to: %s with %s\" % (entry[0], entry[1]))\n", " tdparse(entry[1], entry[0], grammar, agenda)\n", " else: # there is something in goal and input\n", " if goal[0] == inp[0]: # if initial symbols match, reduce lists, parse\n", " tdparse(inp[1:], goal[1:], grammar, agenda)\n", " else:\n", " for i in grammar.LHS.get(goal[0], []):\n", " if [list(i) + goal[1:], inp] not in agenda:\n", " agenda.append([list(i) + goal[1:], inp])\n", " if len(agenda) > 0:\n", " entry = agenda.pop(strategy)\n", " tdparse(entry[1], entry[0], grammar, agenda)\n", " else: print(\"Fail: Agenda empty!\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S -> NP VP\n", "\n", "NP -> N\n", "NP -> Adj N\n", "NP -> Art Adj N\n", "NP -> Art N\n", "NP -> Art N PP\n", "\n", "VP -> V\n", "VP -> V NP\n", "VP -> Adv V NP\n", "VP -> V PP\n", "VP -> V NP PP\n", "\n", "PP -> P NP\n", "\n", "N -> John\n", "N -> Mary\n", "N -> bench\n", "N -> cat\n", "N -> mouse\n", "\n", "Art -> the\n", "Art -> a\n", "\n", "Adj -> green\n", "Adj -> yellow\n", "Adj -> big\n", "Adj -> small\n", "\n", "Adv -> often\n", "Adv -> yesterday\n", "\n", "V -> kissed\n", "V -> loves\n", "V -> sees\n", "V -> meets\n", "V -> chases\n", "\n", "P -> on\n", "P -> in\n", "P -> beside\n", "P -> under\n", "\n", "Got : ['S']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['NP', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['Art', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['Art', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['Art', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['John', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['VP']\tinput: ('loves', 'Mary')\n", "Got : ['Mary', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['bench', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['cat', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['mouse', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['green', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['yellow', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['big', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['small', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['the', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['a', 'Adj', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['the', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['a', 'N', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['the', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['a', 'N', 'PP', 'VP']\tinput: ('John', 'loves', 'Mary')\n", "Got : ['V']\tinput: ('loves', 'Mary')\n", "Got : ['V', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['Adv', 'V', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['V', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['V', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['kissed']\tinput: ('loves', 'Mary')\n", "Got : ['loves']\tinput: ('loves', 'Mary')\n", "Got : []\tinput: ('Mary',)\n", "Backing up to: ['sees'] with ('loves', 'Mary')\n", "Got : ['sees']\tinput: ('loves', 'Mary')\n", "Got : ['meets']\tinput: ('loves', 'Mary')\n", "Got : ['chases']\tinput: ('loves', 'Mary')\n", "Got : ['kissed', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['loves', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['NP']\tinput: ('Mary',)\n", "Got : ['sees', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['meets', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['chases', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['often', 'V', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['yesterday', 'V', 'NP']\tinput: ('loves', 'Mary')\n", "Got : ['kissed', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['loves', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['PP']\tinput: ('Mary',)\n", "Got : ['sees', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['meets', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['chases', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['kissed', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['loves', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['NP', 'PP']\tinput: ('Mary',)\n", "Got : ['sees', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['meets', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['chases', 'NP', 'PP']\tinput: ('loves', 'Mary')\n", "Got : ['N']\tinput: ('Mary',)\n", "Got : ['Adj', 'N']\tinput: ('Mary',)\n", "Got : ['Art', 'Adj', 'N']\tinput: ('Mary',)\n", "Got : ['Art', 'N']\tinput: ('Mary',)\n", "Got : ['Art', 'N', 'PP']\tinput: ('Mary',)\n", "Got : ['P', 'NP']\tinput: ('Mary',)\n", "Got : ['N', 'PP']\tinput: ('Mary',)\n", "Got : ['Adj', 'N', 'PP']\tinput: ('Mary',)\n", "Got : ['Art', 'Adj', 'N', 'PP']\tinput: ('Mary',)\n", "Got : ['Art', 'N', 'PP', 'PP']\tinput: ('Mary',)\n", "Got : ['John']\tinput: ('Mary',)\n", "Got : ['Mary']\tinput: ('Mary',)\n", "Got : []\tinput: ()\n", "Backing up to: ['bench'] with ('Mary',)\n", "Got : ['bench']\tinput: ('Mary',)\n", "Got : ['cat']\tinput: ('Mary',)\n", "Got : ['mouse']\tinput: ('Mary',)\n", "Got : ['green', 'N']\tinput: ('Mary',)\n", "Got : ['yellow', 'N']\tinput: ('Mary',)\n", "Got : ['big', 'N']\tinput: ('Mary',)\n", "Got : ['small', 'N']\tinput: ('Mary',)\n", "Got : ['the', 'Adj', 'N']\tinput: ('Mary',)\n", "Got : ['a', 'Adj', 'N']\tinput: ('Mary',)\n", "Got : ['the', 'N']\tinput: ('Mary',)\n", "Got : ['a', 'N']\tinput: ('Mary',)\n", "Got : ['the', 'N', 'PP']\tinput: ('Mary',)\n", "Got : ['a', 'N', 'PP']\tinput: ('Mary',)\n", "Got : ['on', 'NP']\tinput: ('Mary',)\n", "Got : ['in', 'NP']\tinput: ('Mary',)\n", "Got : ['beside', 'NP']\tinput: ('Mary',)\n", "Got : ['under', 'NP']\tinput: ('Mary',)\n", "Got : ['John', 'PP']\tinput: ('Mary',)\n", "Got : ['Mary', 'PP']\tinput: ('Mary',)\n", "Got : ['PP']\tinput: ()\n" ] }, { "ename": "IndexError", "evalue": "tuple index out of range", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m3\n\u001b[0;32m 1\u001b[0m myGrammar \u001b[39m=\u001b[39m PSG(grammarText)\n\u001b[0;32m 2\u001b[0m \u001b[39mprint\u001b[39m(myGrammar)\n\u001b[1;32m----> 3\u001b[0m tdparse( (\u001b[39m'\u001b[39;49m\u001b[39mJohn\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mloves\u001b[39;49m\u001b[39m'\u001b[39;49m, \u001b[39m'\u001b[39;49m\u001b[39mMary\u001b[39;49m\u001b[39m'\u001b[39;49m) , [\u001b[39m\"\u001b[39;49m\u001b[39mS\u001b[39;49m\u001b[39m\"\u001b[39;49m], myGrammar, [])\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (5 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (20 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m9\n\u001b[0;32m 7\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[0;32m 8\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mBacking up to: \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m with \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m%\u001b[39m (entry[\u001b[39m0\u001b[39m], entry[\u001b[39m1\u001b[39m]))\n\u001b[1;32m----> 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (2 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (7 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (19 times), tdparse at line 12 (1 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m9\n\u001b[0;32m 7\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[0;32m 8\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mBacking up to: \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m with \u001b[39m\u001b[39m%s\u001b[39;00m\u001b[39m\"\u001b[39m \u001b[39m%\u001b[39m (entry[\u001b[39m0\u001b[39m], entry[\u001b[39m1\u001b[39m]))\n\u001b[1;32m----> 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", " \u001b[1;31m[... skipping similar frames: tdparse at line 19 (15 times)]\u001b[0m\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 17\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(agenda) \u001b[39m>\u001b[39m \u001b[39m0\u001b[39m:\n\u001b[0;32m 18\u001b[0m entry \u001b[39m=\u001b[39m agenda\u001b[39m.\u001b[39mpop(strategy)\n\u001b[1;32m---> 19\u001b[0m tdparse(entry[\u001b[39m1\u001b[39;49m], entry[\u001b[39m0\u001b[39;49m], grammar, agenda)\n\u001b[0;32m 20\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39mprint\u001b[39m(\u001b[39m\"\u001b[39m\u001b[39mFail: Agenda empty!\u001b[39m\u001b[39m\"\u001b[39m)\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[0;32m 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[1;32m---> 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39;49m:], goal[\u001b[39m1\u001b[39;49m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 14\u001b[0m \u001b[39mfor\u001b[39;00m i \u001b[39min\u001b[39;00m grammar\u001b[39m.\u001b[39mLHS\u001b[39m.\u001b[39mget(goal[\u001b[39m0\u001b[39m], []):\n", "\u001b[1;32mc:\\Users\\damir\\Dropbox\\Develop\\python-tutorial-notebooks\\notebooks\\Parsing Natural Language in Python.ipynb Cell 17\u001b[0m line \u001b[0;36m1\n\u001b[0;32m 9\u001b[0m tdparse(entry[\u001b[39m1\u001b[39m], entry[\u001b[39m0\u001b[39m], grammar, agenda)\n\u001b[0;32m 10\u001b[0m \u001b[39melse\u001b[39;00m: \u001b[39m# there is something in goal and input\u001b[39;00m\n\u001b[1;32m---> 11\u001b[0m \u001b[39mif\u001b[39;00m goal[\u001b[39m0\u001b[39m] \u001b[39m==\u001b[39m inp[\u001b[39m0\u001b[39;49m]: \u001b[39m# if initial symbols match, reduce lists, parse\u001b[39;00m\n\u001b[0;32m 12\u001b[0m tdparse(inp[\u001b[39m1\u001b[39m:], goal[\u001b[39m1\u001b[39m:], grammar, agenda)\n\u001b[0;32m 13\u001b[0m \u001b[39melse\u001b[39;00m:\n", "\u001b[1;31mIndexError\u001b[0m: tuple index out of range" ] } ], "source": [ "myGrammar = PSG(grammarText)\n", "print(myGrammar)\n", "tdparse( ('John', 'loves', 'Mary') , [\"S\"], myGrammar, [])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": false, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }