{ "metadata": { "name": "", "signature": "sha256:19774a4338ed2431fa55b59ae0d61ac71b1bf3af4d34e6d9a8299fd0f60d9666" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This file illustrates a general template for making decisions in NLG:\n", "- Choose the **head word**, given the target meaning and target syntactic category. The [head][1] is the key word that determines the syntax and semantics of a phrase.\n", "- Choose **arguments** to express, assign them to **semantic roles**, and figure out how those semantic roles will be realized through **grammatical relationships**. An [argument][2] is an element that will be interpreted as completing the meaning of the head; a [semantic role][3] (also called thematic role) indicates a particular underlying relationship between an argument and the head, and [grammatical relationships][4] indicate a particular way to realize that relationship in word order and hierarchical structure.\n", "- Generate constituents for the arguments **recursively**\n", "- **Inflect** the head for agreement\n", "- **Linearize** the resulting structure\n", "\n", "[1]: http://en.wikipedia.org/wiki/Head_(linguistics)\n", "[2]: http://en.wikipedia.org/wiki/Argument_(linguistics)\n", "[3]: http://en.wikipedia.org/wiki/Thematic_relation\n", "[4]: http://en.wikipedia.org/wiki/Grammatical_relation\n", "\n", "It also includes a simple python implementation of **feature structures**\n", "- A [feature structure][5] is a recursive object representing the structure or meaning of a complex linguistic object\n", "- The basic items are dictionaries\n", "- Basically, dictionaries pair **features** with **values**, which are strings used to represent symbolic information\n", "- Recursively, dictionaries can pair **features** with **other feature structures**\n", "\n", "[5]: http://en.wikipedia.org/wiki/Feature_structure\n", "\n", "The process of NLG records results and decisions in feature structures. Feature structures are mutable objects that are shared across NLG decision making. So changes made in one part of the derivation become visible in other parts of NLG. This information sharing makes processes of agreement, linearization and the like easy.\n", "\n", "This file illustrates everything with a classic \"locative alternation\", also called the \"spray/load\" alternation, because of the verbs it occurs with. [This link][6] gives you some resources to learn more about this variability in English verbs.\n", "\n", "[6]: http://allthingslinguistic.com/post/82327954516/list-of-verbs-grouped-by-their-syntactic-processes\n", "\n", "Here is a simple feature structure that we will start with. It initializes a feature structure with a semantic description, characterizing a change of location event in which one worker acts as the agent performing the action, four machines are moved, and their destination is the place on two trucks. This particular representation is due to [Ray Jackendoff][7], but unfortunately there doesn't seem to be a nice informal introduction to it online. \n", "\n", "[7]: http://en.wikipedia.org/wiki/Ray_Jackendoff\n", "\n", "This demo was inspired by [a blog post by Pablo Duboue][8].\n", "\n", "[8]:http://duboue.net/blog5.html" ] }, { "cell_type": "code", "collapsed": true, "input": [ "global m1 \n", "m1 = {}\n", "def reset():\n", " global m1\n", " m1 = {\"semantics\": {\"event\": \"change-of-location\",\n", " \"agent\": {\"category\": \"worker\",\n", " \"number\": 1},\n", " \"moved\": {\"category\": \"machine\",\n", " \"number\": 4},\n", " \"place\": {\"relation\": \"on\",\n", " \"landmark\": {\"category\": \"truck\",\n", " \"number\": 2}\n", " }}}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a function that applies one template for the word `load`. It checks to make sure that what we have is a loading event, and that we know everything about the event that we need to know to describe this event as `(X) loading Z (with Y)`. If we have this information, we pick the word load, and assign underlying grammatical relationships: `X` is the underlying subject, `Z` is the underlying object, and `Y` is an underlying `with` prepositional phrase." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def apply_load_with(fs) :\n", " features = fs[\"semantics\"] \n", " if \"event\" not in features:\n", " return False\n", " if features[\"event\"] != \"change-of-location\":\n", " return False\n", " if \"place\" not in features:\n", " return False\n", " if \"relation\" not in features[\"place\"] or \"landmark\" not in features[\"place\"]:\n", " return False\n", " if features[\"place\"][\"relation\"] != \"on\":\n", " return False\n", " fs[\"verb-stem\"] = \"load\"\n", " fs[\"u-obj\"] = features[\"place\"][\"landmark\"]\n", " if \"agent\" in features:\n", " fs[\"u-subj\"] = features[\"agent\"]\n", " if \"moved\" in features:\n", " fs[\"u-pp-obj\"] = features[\"moved\"]\n", " fs[\"u-pp-obj\"][\"role-marker\"] = \"with\"\n", " return True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function that applies the other template for the word `load`. It checks to make sure that what we have is a loading event, and that we know everything about the event that we need to know to describe this event as `(X) loading Y (on Z)`. If we have this information, we pick the word load, and assign underlying grammatical relationships: `X` is the underlying subject, `Y` is the underlying object, and `Z` is an underlying `on` prepositional phrase." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def apply_load_on(fs) :\n", " features = fs[\"semantics\"] \n", " if \"event\" not in features:\n", " return False\n", " if features[\"event\"] != \"change-of-location\":\n", " return False\n", " if \"moved\" not in features:\n", " return False\n", " fs[\"verb-stem\"] = \"load\"\n", " fs[\"u-obj\"] = features[\"moved\"]\n", " if \"agent\" in features:\n", " fs[\"u-subj\"] = features[\"agent\"]\n", " if \"place\" in features and \"landmark\" in features[\"place\"] and \"relation\" in features[\"place\"]:\n", " fs[\"u-pp-obj\"] = features[\"place\"][\"landmark\"]\n", " fs[\"u-pp-obj\"][\"role-marker\"] = features[\"place\"][\"relation\"]\n", " return True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next functions translate the underlying semantic roles to relationships in surface syntax, and realzie the arguments recursively. There are two options. In an active sentence, the underlying subject is the surface subject; the underlying object is the surface object, and the underlying prepositional phrases are realized as modifiers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def realize_active(fs) :\n", " if \"u-subj\" not in fs:\n", " return False\n", " fs[\"voice\"] = \"active\"\n", " realize_np(fs[\"u-subj\"])\n", " fs[\"subj\"] = fs[\"u-subj\"]\n", " if \"u-obj\" in fs:\n", " realize_np(fs[\"u-obj\"])\n", " fs[\"dobj\"] = fs[\"u-obj\"]\n", " if \"u-pp-obj\" in fs:\n", " realize_np(fs[\"u-pp-obj\"])\n", " fs[\"mod\"] = [fs[\"u-pp-obj\"]] \n", " return True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a passive sentence, the underlying subject is the surface object; the underlying subject is realized with a `by` prepositional phrase, and the underlying prepositional phrases are realized as modifiers:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "def realize_passive(fs) :\n", " if \"u-obj\" not in fs:\n", " return False\n", " fs[\"voice\"] = \"passive\"\n", " realize_np(fs[\"u-obj\"])\n", " fs[\"subj\"] = fs[\"u-obj\"]\n", " fs[\"mod\"] = []\n", " if \"u-subj\" in fs :\n", " realize_np(fs[\"u-subj\"])\n", " fs[\"u-subj\"][\"role-marker\"] = \"by\"\n", " fs[\"mod\"].append(fs[\"u-subj\"])\n", " if \"u-pp-obj\" in fs:\n", " realize_np(fs[\"u-pp-obj\"])\n", " fs[\"mod\"].append(fs[\"u-pp-obj\"])\n", " return True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a simple way to realize noun phrases. You could write this recursively, because in general noun phrases have complicated structures, but this is a start..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def realize_np(fs):\n", " if \"number\" not in fs or fs[\"number\"] == 1:\n", " fs[\"g-number\"] = \"singular\"\n", " fs[\"string\"] = \"the \" + fs[\"category\"] \n", " return True\n", " fs[\"g-number\"] = \"plural\"\n", " fs[\"string\"] = \"the \" + str(fs[\"number\"]) + \" \" + fs[\"category\"] + \"s\"\n", " return True" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is another sketch: handling agreement. What you should really do is distinguish regular verbs and irregular verbs, whose forms are listed. In the rule for regular verbs, you need to keep track of whether the verb forms the present singular with `-s` or `-es` and whether it forms the past with `-d` or `-ed`. But making that table is easy -- it's a big list. The hard part is making sure the information is available to make the right choice. This function shows how a feature structure puts all the information together to make it work..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def inflect_verb(fs):\n", " stem = fs[\"verb-stem\"]\n", " voice = fs[\"voice\"]\n", " number = fs[\"subj\"][\"g-number\"]\n", " if voice == \"active\" :\n", " if number == \"singular\" :\n", " fs[\"verb-form\"] = stem + \"s\"\n", " else:\n", " fs[\"verb-form\"] = stem\n", " else:\n", " if number == \"singular\" :\n", " fs[\"verb-form\"] = \"is \" + stem + \"ed\"\n", " else:\n", " fs[\"verb-form\"] = \"are \" + stem + \"ed\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Linearizing is turning a structure into a string. Here we just join all the constituents in order, separated by spaces." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def linearize(fs) :\n", " items = [ fs[\"subj\"][\"string\"], fs[\"verb-form\"] ]\n", " if \"dobj\" in fs :\n", " items.append(fs[\"dobj\"][\"string\"])\n", " if \"mod\" in fs :\n", " for x in fs[\"mod\"]:\n", " items.append(x[\"role-marker\"] + \" \" + x[\"string\"])\n", " fs[\"string\"] = ' '.join(items)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function puts it all together. You give it a feature structure, and the option to prefer `with` or `on` to describe movement and `active` or `passive` voice. The function tries multiple alternatives (since not all the information may necessarily be available) but prefers what you've specified as input." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def describe_loading(fs, role='with', voice='active') :\n", " if role=='with':\n", " pattern1, pattern2 = apply_load_with, apply_load_on\n", " else:\n", " pattern1, pattern2 = apply_load_on, apply_load_with\n", " if not pattern1(fs) and not pattern2(fs) :\n", " return None\n", " if voice=='active':\n", " voice1, voice2 = realize_active, realize_passive\n", " else:\n", " voice1, voice2 = realize_passive, realize_active\n", " if not voice1(fs) and not voice2(fs) :\n", " return None\n", " inflect_verb(fs)\n", " linearize(fs)\n", " return fs[\"string\"]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The initial semantics" ] }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); print m1" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "{'semantics': {'place': {'landmark': {'category': 'truck', 'number': 2}, 'relation': 'on'}, 'moved': {'category': 'machine', 'number': 4}, 'event': 'change-of-location', 'agent': {'category': 'worker', 'number': 1}}}\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Four different grammatical realizations, combining the locative alternation and the active/passive alternation." ] }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "'the worker loads the 2 trucks with the 4 machines'" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); describe_loading(m1, role=\"on\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "'the worker loads the 4 machines on the 2 trucks'" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); describe_loading(m1, voice=\"passive\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "'the 2 trucks are loaded by the worker with the 4 machines'" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); describe_loading(m1, role=\"on\", voice=\"passive\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "'the 4 machines are loaded by the worker on the 2 trucks'" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preferences are overridden when we have only partial information about the event. Perhaps this leads to ambiguity..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"agent\"]; describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "'the 2 trucks are loaded with the 4 machines'" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"agent\"]; describe_loading(m1, role=\"on\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "'the 4 machines are loaded on the 2 trucks'" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"place\"]; describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "'the worker loads the 4 machines'" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"moved\"]; describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "'the worker loads the 2 trucks'" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"moved\"]; del m1[\"semantics\"][\"agent\"]; \n", "describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "'the 2 trucks are loaded'" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "reset(); del m1[\"semantics\"][\"place\"]; del m1[\"semantics\"][\"agent\"]; \n", "describe_loading(m1)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "'the 4 machines are loaded'" ] } ], "prompt_number": 20 } ], "metadata": {} } ] }