{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Constituent Parsing Exercises\n", "\n", "In the lecture we took a look at a simple tokenizer and sentence segmenter. In this exercise we will expand our understanding of the problem by asking a few important questions and looking at the problem from different perspectives." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup 1: Load Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "# %cd ..\n", "import sys\n", "sys.path.append(\"..\")\n", "import math\n", "import statnlpbook.util as util\n", "import statnlpbook.parsing as parsing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1: Understanding parsing\n", "\n", "Be sure you understand [grammatical categories and structures](http://webdelprofesor.ula.ve/humanidades/azapata/materias/english_4/grammatical_categories_structures_and_syntactical_functions.pdf) and brush up on your [grammar skills](http://www.ucl.ac.uk/internet-grammar/intro/intro.htm).\n", "\n", "Then revisit the [Enju online parser](http://www.nactem.ac.uk/enju/demo.html) and parse the following sentences.\n", "\n", "What is wrong with the parses of the following sentences? Are they correct?\n", "- Fat people eat accumulates.\n", "- The fat that people eat accumulates in their bodies.\n", "- The fat that people eat is accumulating in their bodies.\n", "\n", "What about these? Is the problem in the parser or in the sentence?\n", "- The old man the boat.\n", "- The old people man the boat.\n", "\n", "These are examples of garden path sentences; find out what that means.\n", "\n", "What about these sentences? Are their parses correct?\n", "- Time flies like an arrow; fruit flies like a banana.\n", "- We saw her duck." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2: Parent Annotation\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall the lecture notes on parsing and the parent annotation mentioned there. (Grand)*parents matter: knowing a node's parent in a tree gives a bit of context information, which can later help us with smoothing probabilities and with approaching context-dependent parsing.\n", "\n", "In that case, each non-terminal node should know its parent. We'll do this exercise on a single tree, just to play around a bit with trees and their labeling.\n", "\n", "\n", "Given the following tree:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])])\n", "parsing.render_tree(x)" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Construct an `annotate_parents` function which takes that tree and annotates each non-terminal node with the label of its parent, written as `label^parent`. The root has no parent, so it is annotated with `?`. The final annotation result should look like this:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "y = ('S^?', [('Subj^S', ['He']), ('VP^S', [('Verb^VP', ['shot']), ('Obj^VP', ['the', 'elephant']), ('PP^VP', ['in', 'his', 'pyjamas'])])])\n", "parsing.render_tree(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solutions\n", "\n", "You can find the solutions to these exercises [here](parsing_solutions.ipynb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" 
} }, "nbformat": 4, "nbformat_minor": 1 }