In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail." ], "text/plain": [ "
{doc_text}\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the code below, we instruct Watson Natural Language Understanding to perform five different kinds of analysis on the example document:\n", "* entities (with sentiment)\n", "* keywords (with sentiment and emotion)\n", "* relations\n", "* semantic_roles\n", "* syntax (with sentences, tokens, and part of speech)\n", "\n", "See [the Watson NLU documentation](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#text-analytics-features) for a full description of the types of analysis that NLU can perform." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Make the request\n", "response = natural_language_understanding.analyze(\n", " text=doc_text,\n", " # TODO: Use this URL once we've pushed the shortened document to Github\n", " #url=\"https://raw.githubusercontent.com/CODAIT/text-extensions-for-pandas/master/resources/holy_grail_short.txt\",\n", " return_analyzed_text=True,\n", " features=nlu.Features(\n", " entities=nlu.EntitiesOptions(sentiment=True, mentions=True),\n", " keywords=nlu.KeywordsOptions(sentiment=True, emotion=True),\n", " relations=nlu.RelationsOptions(),\n", " semantic_roles=nlu.SemanticRolesOptions(),\n", " syntax=nlu.SyntaxOptions(sentences=True, \n", " tokens=nlu.SyntaxOptionsTokens(lemma=True, part_of_speech=True))\n", " )).get_result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The response from the `analyze()` method is a Python dictionary. The dictionary contains an entry \n", "for each pass of analysis requested, plus some additional entries with metadata about the API request\n", "itself. 
Here's a list of the keys in `response`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['usage', 'syntax', 'semantic_roles', 'relations', 'language', 'keywords', 'entities', 'analyzed_text'])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Perform an Example Task\n", "\n", "Let's use the information that Watson Natural Language Understanding has extracted from our example document to perform an example task: *Find all the pronouns in each sentence, broken down by sentence.*\n", "\n", "This task could serve as a first step to a number of more complex tasks, such as \n", "resolving anaphora (for example, associating \"King Arthur\" with \"his\" in the phrase \"King Arthur and his squire, Patsy\") or analyzing the relationship between sentiment and the gender of pronouns.\n", "\n", "We'll start by doing this task using plain Python code that operates directly over the output of Watson NLU's `analyze()` method. Then we'll redo the task using Pandas DataFrames and Text Extensions for Pandas. This exercise will show how Pandas DataFrames can represent the intermediate data structures of an NLP application in a way that is both easier to understand and easier to manipulate with less code.\n", "\n", "Let's begin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform the Task Without Using Pandas\n", "\n", "All the information that we need to perform our task is in the \"syntax\" section of the response \n", "we captured above from Watson NLU's `analyze()` method. Syntax analysis captures a large amount\n", "of information, so the \"syntax\" section of the response is very verbose. 
\n", "\n", "For reference, here's the text of our example document again:\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Document Text:
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail." ], "text/plain": [ "
{doc_text}\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's the output of Watson NLU's syntax analysis, converted to a string:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'tokens': [{'text': 'In',\n", " 'part_of_speech': 'ADP',\n", " 'location': [0, 2],\n", " 'lemma': 'in'},\n", " {'text': 'AD', 'part_of_speech': 'PROPN', 'location': [3, 5], 'lemma': 'Ad'},\n", " {'text': '932', 'part_of_speech': 'NUM', 'location': [6, 9]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [9, 10]},\n", " {'text': 'King',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [11, 15],\n", " 'lemma': 'King'},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [16, 22]},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [23, 26],\n", " 'lemma': 'and'},\n", " {'text': 'his',\n", " 'part_of_speech': 'PRON',\n", " 'location': [27, 30],\n", " 'lemma': 'his'},\n", " {'text': 'squire',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [31, 37],\n", " 'lemma': 'squire'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [37, 38]},\n", " {'text': 'Patsy',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [39, 44],\n", " 'lemma': 'Patsy'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [44, 45]},\n", " {'text': 'travel',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [46, 52],\n", " 'lemma': 'travel'},\n", " {'text': 'throughout',\n", " 'part_of_speech': 'ADP',\n", " 'location': [53, 63],\n", " 'lemma': 'throughout'},\n", " {'text': 'Britain', 'part_of_speech': 'PROPN', 'location': [64, 71]},\n", " {'text': 'searching',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [72, 81],\n", " 'lemma': 'searching'},\n", " {'text': 'for',\n", " 'part_of_speech': 'ADP',\n", " 'location': [82, 85],\n", " 'lemma': 'for'},\n", " {'text': 'men',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [86, 89],\n", " 'lemma': 'man'},\n", 
" {'text': 'to',\n", " 'part_of_speech': 'PART',\n", " 'location': [90, 92],\n", " 'lemma': 'to'},\n", " {'text': 'join',\n", " 'part_of_speech': 'VERB',\n", " 'location': [93, 97],\n", " 'lemma': 'join'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [98, 101],\n", " 'lemma': 'the'},\n", " {'text': 'Knights',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [102, 109],\n", " 'lemma': 'Knight'},\n", " {'text': 'of',\n", " 'part_of_speech': 'ADP',\n", " 'location': [110, 112],\n", " 'lemma': 'of'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [113, 116],\n", " 'lemma': 'the'},\n", " {'text': 'Round',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [117, 122],\n", " 'lemma': 'round'},\n", " {'text': 'Table',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [123, 128],\n", " 'lemma': 'table'},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [128, 129]},\n", " {'text': 'Along',\n", " 'part_of_speech': 'ADP',\n", " 'location': [130, 135],\n", " 'lemma': 'along'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [136, 139],\n", " 'lemma': 'the'},\n", " {'text': 'way',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [140, 143],\n", " 'lemma': 'way'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [143, 144]},\n", " {'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [145, 147],\n", " 'lemma': 'he'},\n", " {'text': 'recruits',\n", " 'part_of_speech': 'VERB',\n", " 'location': [148, 156],\n", " 'lemma': 'recruit'},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [157, 160],\n", " 'lemma': 'Sir'},\n", " {'text': 'Bedevere', 'part_of_speech': 'PROPN', 'location': [161, 169]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [170, 173],\n", " 'lemma': 'the'},\n", " {'text': 'Wise',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [174, 178],\n", " 'lemma': 'Wise'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 
'location': [178, 179]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [180, 183],\n", " 'lemma': 'Sir'},\n", " {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [184, 192]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [193, 196],\n", " 'lemma': 'the'},\n", " {'text': 'Brave',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [197, 202],\n", " 'lemma': 'Brave'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [202, 203]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [204, 207],\n", " 'lemma': 'Sir'},\n", " {'text': 'Galahad', 'part_of_speech': 'PROPN', 'location': [208, 215]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [216, 219],\n", " 'lemma': 'the'},\n", " {'text': 'Pure', 'part_of_speech': 'PROPN', 'location': [220, 224]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [224, 225]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [226, 229],\n", " 'lemma': 'Sir'},\n", " {'text': 'Robin',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [230, 235],\n", " 'lemma': 'Robin'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [236, 239],\n", " 'lemma': 'the'},\n", " {'text': 'Not', 'part_of_speech': 'PROPN', 'location': [240, 243]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [243, 244]},\n", " {'text': 'Quite', 'part_of_speech': 'PROPN', 'location': [244, 249]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [249, 250]},\n", " {'text': 'So',\n", " 'part_of_speech': 'ADV',\n", " 'location': [250, 252],\n", " 'lemma': 'so'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [252, 253]},\n", " {'text': 'Brave',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [253, 258],\n", " 'lemma': 'Brave'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [258, 259]},\n", " {'text': 'as',\n", " 'part_of_speech': 'ADP',\n", " 'location': [259, 
261],\n", " 'lemma': 'as'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [261, 262]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [262, 265],\n", " 'lemma': 'Sir'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [265, 266]},\n", " {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [266, 274]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [274, 275]},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [276, 279],\n", " 'lemma': 'and'},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [280, 283],\n", " 'lemma': 'Sir'},\n", " {'text': 'Not',\n", " 'part_of_speech': 'ADV',\n", " 'location': [284, 287],\n", " 'lemma': 'not'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [287, 288]},\n", " {'text': 'Appearing', 'part_of_speech': 'PROPN', 'location': [288, 297]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [297, 298]},\n", " {'text': 'in',\n", " 'part_of_speech': 'ADP',\n", " 'location': [298, 300],\n", " 'lemma': 'in'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [300, 301]},\n", " {'text': 'this',\n", " 'part_of_speech': 'PRON',\n", " 'location': [301, 305],\n", " 'lemma': 'this'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [305, 306]},\n", " {'text': 'Film',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [306, 310],\n", " 'lemma': 'Film'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [310, 311]},\n", " {'text': 'along',\n", " 'part_of_speech': 'ADP',\n", " 'location': [312, 317],\n", " 'lemma': 'along'},\n", " {'text': 'with',\n", " 'part_of_speech': 'ADP',\n", " 'location': [318, 322],\n", " 'lemma': 'with'},\n", " {'text': 'their',\n", " 'part_of_speech': 'PRON',\n", " 'location': [323, 328],\n", " 'lemma': 'their'},\n", " {'text': 'squires',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [329, 336],\n", " 'lemma': 'squire'},\n", " {'text': 'and',\n", " 
'part_of_speech': 'CCONJ',\n", " 'location': [337, 340],\n", " 'lemma': 'and'},\n", " {'text': 'Robin',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [341, 346],\n", " 'lemma': 'Robin'},\n", " {'text': \"'s\",\n", " 'part_of_speech': 'PART',\n", " 'location': [346, 348],\n", " 'lemma': \"'s\"},\n", " {'text': 'troubadours',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [349, 360],\n", " 'lemma': 'troubadour'},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [360, 361]},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [362, 368]},\n", " {'text': 'leads',\n", " 'part_of_speech': 'VERB',\n", " 'location': [369, 374],\n", " 'lemma': 'lead'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [375, 378],\n", " 'lemma': 'the'},\n", " {'text': 'men',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [379, 382],\n", " 'lemma': 'man'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [383, 385],\n", " 'lemma': 'to'},\n", " {'text': 'Camelot', 'part_of_speech': 'PROPN', 'location': [386, 393]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [393, 394]},\n", " {'text': 'but',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [395, 398],\n", " 'lemma': 'but'},\n", " {'text': 'upon',\n", " 'part_of_speech': 'ADP',\n", " 'location': [399, 403],\n", " 'lemma': 'upon'},\n", " {'text': 'further',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [404, 411],\n", " 'lemma': 'far'},\n", " {'text': 'consideration',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [412, 425],\n", " 'lemma': 'consideration'},\n", " {'text': '(', 'part_of_speech': 'PUNCT', 'location': [426, 427]},\n", " {'text': 'thanks',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [427, 433],\n", " 'lemma': 'thanks'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [434, 436],\n", " 'lemma': 'to'},\n", " {'text': 'a', 'part_of_speech': 'DET', 'location': [437, 438], 'lemma': 'a'},\n", " {'text': 
'musical',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [439, 446],\n", " 'lemma': 'musical'},\n", " {'text': 'number',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [447, 453],\n", " 'lemma': 'number'},\n", " {'text': ')', 'part_of_speech': 'PUNCT', 'location': [453, 454]},\n", " {'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [455, 457],\n", " 'lemma': 'he'},\n", " {'text': 'decides',\n", " 'part_of_speech': 'VERB',\n", " 'location': [458, 465],\n", " 'lemma': 'decide'},\n", " {'text': 'not',\n", " 'part_of_speech': 'PART',\n", " 'location': [466, 469],\n", " 'lemma': 'not'},\n", " {'text': 'to',\n", " 'part_of_speech': 'PART',\n", " 'location': [470, 472],\n", " 'lemma': 'to'},\n", " {'text': 'go',\n", " 'part_of_speech': 'VERB',\n", " 'location': [473, 475],\n", " 'lemma': 'go'},\n", " {'text': 'there',\n", " 'part_of_speech': 'ADV',\n", " 'location': [476, 481],\n", " 'lemma': 'there'},\n", " {'text': 'because',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [482, 489],\n", " 'lemma': 'because'},\n", " {'text': 'it',\n", " 'part_of_speech': 'PRON',\n", " 'location': [490, 492],\n", " 'lemma': 'it'},\n", " {'text': 'is',\n", " 'part_of_speech': 'AUX',\n", " 'location': [493, 495],\n", " 'lemma': 'be'},\n", " {'text': '\"', 'part_of_speech': 'PUNCT', 'location': [496, 497]},\n", " {'text': 'a', 'part_of_speech': 'DET', 'location': [497, 498], 'lemma': 'a'},\n", " {'text': 'silly',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [499, 504],\n", " 'lemma': 'silly'},\n", " {'text': 'place',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [505, 510],\n", " 'lemma': 'place'},\n", " {'text': '\"', 'part_of_speech': 'PUNCT', 'location': [510, 511]},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [511, 512]},\n", " {'text': 'As',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [513, 515],\n", " 'lemma': 'as'},\n", " {'text': 'they',\n", " 'part_of_speech': 'PRON',\n", " 'location': [516, 520],\n", " 'lemma': 'they'},\n", " 
{'text': 'turn',\n", " 'part_of_speech': 'VERB',\n", " 'location': [521, 525],\n", " 'lemma': 'turn'},\n", " {'text': 'away', 'part_of_speech': 'ADP', 'location': [526, 530]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [530, 531]},\n", " {'text': 'God',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [532, 535],\n", " 'lemma': 'God'},\n", " {'text': '(', 'part_of_speech': 'PUNCT', 'location': [536, 537]},\n", " {'text': 'an',\n", " 'part_of_speech': 'DET',\n", " 'location': [537, 539],\n", " 'lemma': 'a'},\n", " {'text': 'image',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [540, 545],\n", " 'lemma': 'image'},\n", " {'text': 'of',\n", " 'part_of_speech': 'ADP',\n", " 'location': [546, 548],\n", " 'lemma': 'of'},\n", " {'text': 'W.', 'part_of_speech': 'PROPN', 'location': [549, 551]},\n", " {'text': 'G.', 'part_of_speech': 'PROPN', 'location': [552, 554]},\n", " {'text': 'Grace',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [555, 560],\n", " 'lemma': 'Grace'},\n", " {'text': ')', 'part_of_speech': 'PUNCT', 'location': [560, 561]},\n", " {'text': 'speaks',\n", " 'part_of_speech': 'VERB',\n", " 'location': [562, 568],\n", " 'lemma': 'speak'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [569, 571],\n", " 'lemma': 'to'},\n", " {'text': 'them',\n", " 'part_of_speech': 'PRON',\n", " 'location': [572, 576],\n", " 'lemma': 'they'},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [577, 580],\n", " 'lemma': 'and'},\n", " {'text': 'gives',\n", " 'part_of_speech': 'VERB',\n", " 'location': [581, 586],\n", " 'lemma': 'give'},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [587, 593]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [594, 597],\n", " 'lemma': 'the'},\n", " {'text': 'task',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [598, 602],\n", " 'lemma': 'task'},\n", " {'text': 'of',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [603, 605],\n", 
" 'lemma': 'of'},\n", " {'text': 'finding',\n", " 'part_of_speech': 'VERB',\n", " 'location': [606, 613],\n", " 'lemma': 'find'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [614, 617],\n", " 'lemma': 'the'},\n", " {'text': 'Holy', 'part_of_speech': 'PROPN', 'location': [618, 622]},\n", " {'text': 'Grail', 'part_of_speech': 'PROPN', 'location': [623, 628]},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [628, 629]}],\n", " 'sentences': [{'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.',\n", " 'location': [0, 129]},\n", " {'text': \"Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.\",\n", " 'location': [130, 361]},\n", " {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\".',\n", " 'location': [362, 512]},\n", " {'text': 'As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.',\n", " 'location': [513, 629]}]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response[\"syntax\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Buried in the above data structure is all the information we need to perform our example task:\n", "* The location of every token in the document.\n", "* The part of speech of every token in the document.\n", "* The location of every sentence in the document.\n", "\n", "The Python code in the next cell uses this information to construct a list of pronouns\n", "in each sentence in the document." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[{'sentence': {'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.',\n", " 'location': [0, 129]},\n", " 'pronouns': [{'text': 'his',\n", " 'part_of_speech': 'PRON',\n", " 'location': [27, 30],\n", " 'lemma': 'his'}]},\n", " {'sentence': {'text': \"Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.\",\n", " 'location': [130, 361]},\n", " 'pronouns': [{'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [145, 147],\n", " 'lemma': 'he'},\n", " {'text': 'this',\n", " 'part_of_speech': 'PRON',\n", " 'location': [301, 305],\n", " 'lemma': 'this'},\n", " {'text': 'their',\n", " 'part_of_speech': 'PRON',\n", " 'location': [323, 328],\n", " 'lemma': 'their'}]},\n", " {'sentence': {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\".',\n", " 'location': [362, 512]},\n", " 'pronouns': [{'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [455, 457],\n", " 'lemma': 'he'},\n", " {'text': 'it',\n", " 'part_of_speech': 'PRON',\n", " 'location': [490, 492],\n", " 'lemma': 'it'}]},\n", " {'sentence': {'text': 'As they turn away, God (an image of W. G. 
Grace) speaks to them and gives Arthur the task of finding the Holy Grail.',\n", " 'location': [513, 629]},\n", " 'pronouns': [{'text': 'they',\n", " 'part_of_speech': 'PRON',\n", " 'location': [516, 520],\n", " 'lemma': 'they'},\n", " {'text': 'them',\n", " 'part_of_speech': 'PRON',\n", " 'location': [572, 576],\n", " 'lemma': 'they'}]}]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections\n", "from typing import Any, Dict\n", "\n", "# Create a data structure to hold a mapping from sentence identifier\n", "# to a list of pronouns. This step requires defining sentence ids.\n", "def sentence_id(sentence_record: Dict[str, Any]):\n", "    return tuple(sentence_record[\"location\"])\n", "\n", "pronouns_by_sentence_id = collections.defaultdict(list)\n", "\n", "# Pass 1: Use nested for loops to identify pronouns and match them with \n", "# their containing sentences.\n", "# Running time: O(num_tokens * num_sentences), i.e. O(document_size^2)\n", "for t in response[\"syntax\"][\"tokens\"]:\n", "    pos_str = t[\"part_of_speech\"]  # Part-of-speech tag, already a string\n", "    if pos_str == \"PRON\":\n", "        found_sentence = False\n", "        for s in response[\"syntax\"][\"sentences\"]:\n", "            if (t[\"location\"][0] >= s[\"location\"][0] \n", "                    and t[\"location\"][1] <= s[\"location\"][1]):\n", "                found_sentence = True\n", "                pronouns_by_sentence_id[sentence_id(s)].append(t)\n", "        if not found_sentence:\n", "            raise ValueError(f\"Token {t} is not in any sentence\")\n", "        pass  # Make JupyterLab syntax highlighting happy\n", "\n", "# Pass 2: Translate sentence identifiers to full sentence metadata.\n", "sentence_id_to_sentence = {sentence_id(s): s \n", "                           for s in response[\"syntax\"][\"sentences\"]}\n", "result = [\n", "    {\n", "        \"sentence\": sentence_id_to_sentence[key],\n", "        \"pronouns\": pronouns\n", "    }\n", "    for key, pronouns in pronouns_by_sentence_id.items()\n", "]\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code above is quite complex 
given the simplicity of the task. You may need to study the previous cell for a few minutes to convince yourself that the algorithm is correct. This implementation also has scalability issues: the worst-case running time of the nested `for` loops is proportional to the number of tokens times the number of sentences, which grows as the square of the document length.\n", "\n", "We can do better." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Repeat the Example Task Using Pandas\n", "\n", "Let's revisit the example task we just performed in the previous cell. Again, the task is: *Find all the pronouns in each sentence, broken down by sentence.* This time around, let's perform this task using Pandas.\n", "\n", "Text Extensions for Pandas includes a function `parse_response()` that turns the output of Watson NLU's `analyze()` method into a dictionary of Pandas DataFrames. Let's run our response object through that conversion." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['syntax', 'entities', 'entity_mentions', 'keywords', 'relations', 'semantic_roles'])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = tp.io.watson.nlu.parse_response(response)\n", "dfs.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of each analysis pass that Watson NLU performed is now a DataFrame. \n", "Let's look at the DataFrame for the \"syntax\" pass:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | span | part_of_speech | lemma | sentence |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
3 | [9, 10): ',' | PUNCT | None | [0, 129): 'In AD 932, King Arthur and his squi... |
4 | [11, 15): 'King' | PROPN | King | [0, 129): 'In AD 932, King Arthur and his squi... |
... | ... | ... | ... | ... |
142 | [606, 613): 'finding' | VERB | find | [513, 629): 'As they turn away, God (an image ... |
143 | [614, 617): 'the' | DET | the | [513, 629): 'As they turn away, God (an image ... |
144 | [618, 622): 'Holy' | PROPN | None | [513, 629): 'As they turn away, God (an image ... |
145 | [623, 628): 'Grail' | PROPN | None | [513, 629): 'As they turn away, God (an image ... |
146 | [628, 629): '.' | PUNCT | None | [513, 629): 'As they turn away, God (an image ... |
147 rows × 4 columns
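With the tokens in a table like the one above, the pronoun task reduces to a row filter followed by a column projection. The following is a minimal sketch in plain Pandas, not the notebook's own cell: `tokens` is a hypothetical stand-in table whose `span` and `sentence` columns hold plain strings rather than the span objects that Text Extensions for Pandas produces, and only the column names are taken from the table above.

```python
import pandas as pd

# Hypothetical stand-in for the per-token table; plain strings replace span objects.
tokens = pd.DataFrame({
    "span": ["King", "Arthur", "and", "his", "squire", "he", "recruits"],
    "part_of_speech": ["PROPN", "PROPN", "CCONJ", "PRON", "NOUN", "PRON", "VERB"],
    "sentence": ["s1", "s1", "s1", "s1", "s1", "s2", "s2"],
})

# Keep only the rows tagged as pronouns, then keep the two columns of interest.
pronouns = tokens[tokens["part_of_speech"] == "PRON"][["sentence", "span"]]
print(pronouns)
```

The filter keeps the original row index, which is why a result built this way is indexed by token position rather than renumbered from zero.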
 | sentence | span |
---|---|---|
7 | [0, 129): 'In AD 932, King Arthur and his squi... | [27, 30): 'his' |
31 | [130, 361): 'Along the way, he recruits Sir Be... | [145, 147): 'he' |
73 | [130, 361): 'Along the way, he recruits Sir Be... | [301, 305): 'this' |
79 | [130, 361): 'Along the way, he recruits Sir Be... | [323, 328): 'their' |
104 | [362, 512): 'Arthur leads the men to Camelot, ... | [455, 457): 'he' |
111 | [362, 512): 'Arthur leads the men to Camelot, ... | [490, 492): 'it' |
120 | [513, 629): 'As they turn away, God (an image ... | [516, 520): 'they' |
135 | [513, 629): 'As they turn away, God (an image ... | [572, 576): 'them' |
 | span | part_of_speech | lemma | sentence |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
[In] [AD] [932][,] [King] [Arthur] [and] [his] [squire][,] Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
 | span | part_of_speech | lemma | sentence |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.
Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.
Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place".
As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
 | sentence |
---|---|
0 | [0, 129): 'In AD 932, King Arthur and his squi... |
27 | [130, 361): 'Along the way, he recruits Sir Be... |
86 | [362, 512): 'Arthur leads the men to Camelot, ... |
119 | [513, 629): 'As they turn away, God (an image ... |
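One way a table of distinct sentences like the one above could be derived from a per-token table is `DataFrame.drop_duplicates()`, which keeps the first row for each distinct value, so each surviving index is the position of a sentence's first token. This is a minimal sketch under that assumption, using a hypothetical toy table `tokens` with plain strings in place of span objects; it is not necessarily how the table above was produced.

```python
import pandas as pd

# Hypothetical per-token table: one row per token, labelled with its sentence.
tokens = pd.DataFrame({
    "span": ["In", "AD", "932", "Along", "the", "way", "Arthur", "leads"],
    "sentence": ["s1", "s1", "s1", "s2", "s2", "s2", "s3", "s3"],
})

# Project to the sentence column and keep the first row of each distinct sentence.
sentences = tokens[["sentence"]].drop_duplicates()
print(sentences)
```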
 | sentence | span |
---|---|---|
7 | [0, 129): 'In AD 932, King Arthur and his squi... | [27, 30): 'his' |
31 | [130, 361): 'Along the way, he recruits Sir Be... | [145, 147): 'he' |
73 | [130, 361): 'Along the way, he recruits Sir Be... | [301, 305): 'this' |
79 | [130, 361): 'Along the way, he recruits Sir Be... | [323, 328): 'their' |
104 | [362, 512): 'Arthur leads the men to Camelot, ... | [455, 457): 'he' |
111 | [362, 512): 'Arthur leads the men to Camelot, ... | [490, 492): 'it' |
120 | [513, 629): 'As they turn away, God (an image ... | [516, 520): 'they' |
135 | [513, 629): 'As they turn away, God (an image ... | [572, 576): 'them' |
In AD 932, King Arthur and [his] squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) [he] decides not to go there because [it] is "a silly place". As [they] turn away, God (an image of W. G. Grace) speaks to [them] and gives Arthur the task of finding the Holy Grail.
| | arthur_span | pronoun_span | sentence |
|---|---|---|---|
| 0 | [16, 22): 'Arthur' | [27, 30): 'his' | [0, 129): 'In AD 932, King Arthur and his squi... |
| 1 | [362, 368): 'Arthur' | [455, 457): 'he' | [362, 512): 'Arthur leads the men to Camelot, ... |
| 2 | [362, 368): 'Arthur' | [490, 492): 'it' | [362, 512): 'Arthur leads the men to Camelot, ... |
| 3 | [587, 593): 'Arthur' | [516, 520): 'they' | [513, 629): 'As they turn away, God (an image ... |
| 4 | [587, 593): 'Arthur' | [572, 576): 'them' | [513, 629): 'As they turn away, God (an image ... |
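
The table above pairs each mention of Arthur with the pronouns that occur in the same sentence. The following is a minimal sketch of how such a pairing could be computed. It assumes plain pandas (1.2 or later, for `how="cross"`) and integer `begin`/`end` character offsets copied from the tables above, rather than the span columns the notebook displays. The DataFrame names, column names, and the `attach_sentence` helper are illustrative and do not come from the notebook.

```python
import pandas as pd

# Sentence boundaries as [begin, end) character offsets, copied from the sentence table.
sentences = pd.DataFrame({
    "sent_begin": [0, 130, 362, 513],
    "sent_end": [129, 361, 512, 629],
})

# Mentions of "Arthur" and all pronouns, copied from the tables above.
arthur = pd.DataFrame({"arthur_begin": [16, 362, 587], "arthur_end": [22, 368, 593]})
pronouns = pd.DataFrame({
    "pronoun": ["his", "he", "this", "their", "he", "it", "they", "them"],
    "pronoun_begin": [27, 145, 301, 323, 455, 490, 516, 572],
    "pronoun_end": [30, 147, 305, 328, 457, 492, 520, 576],
})


def attach_sentence(df, begin_col, end_col):
    """Hypothetical helper: cross join with sentences, keep rows where the span is contained."""
    joined = df.merge(sentences, how="cross")
    contained = (joined[begin_col] >= joined["sent_begin"]) & (
        joined[end_col] <= joined["sent_end"]
    )
    return joined[contained]


arthur_sent = attach_sentence(arthur, "arthur_begin", "arthur_end")
pronoun_sent = attach_sentence(pronouns, "pronoun_begin", "pronoun_end")

# Pair every Arthur mention with every pronoun that shares its sentence.
pairs = arthur_sent.merge(pronoun_sent, on=["sent_begin", "sent_end"]).reset_index(drop=True)
print(pairs[["arthur_begin", "pronoun", "sent_begin"]])
```

The pronouns in the second sentence ('he', 'this', 'their') drop out because that sentence contains no mention matched as 'Arthur'. Sharing a sentence is only a rough proxy for coreference, which is why 'it' (referring to Camelot) still appears in the result.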
| | type | text | sentiment.label | sentiment.score | relevance | count | confidence | disambiguation.subtype | disambiguation.name | disambiguation.dbpedia_resource |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Person | Sir Bedevere | positive | 0.835873 | 0.950560 | 1 | 0.982315 | None | None | None |
| 1 | Person | King Arthur | neutral | 0.000000 | 0.720381 | 1 | 0.924937 | None | None | None |
| 2 | Person | Patsy | neutral | 0.000000 | 0.679300 | 1 | 0.830596 | None | None | None |
| 3 | Person | Sir Lancelot | positive | 0.835873 | 0.662902 | 1 | 0.956371 | [MusicalArtist, TVActor] | Sir_Lancelot_%28singer%29 | http://dbpedia.org/resource/Sir_Lancelot_%28si... |
| 4 | Person | Sir Galahad | positive | 0.835873 | 0.654170 | 1 | 0.948409 | None | None | None |
| | type | text | span | confidence |
|---|---|---|---|---|
| 0 | Person | Sir Bedevere | [157, 169): 'Sir Bedevere' | 0.982315 |
| 1 | Person | King Arthur | [11, 22): 'King Arthur' | 0.924937 |
| 2 | Person | Patsy | [39, 44): 'Patsy' | 0.830596 |
| 3 | Person | Sir Lancelot | [180, 192): 'Sir Lancelot' | 0.956371 |
| 4 | Person | Sir Galahad | [204, 215): 'Sir Galahad' | 0.948409 |
| | type | text | span | confidence |
|---|---|---|---|---|
| 10 | Person | Arthur | [362, 368): 'Arthur' | 0.996876 |
| 11 | Person | Arthur | [587, 593): 'Arthur' | 0.973795 |
| | type | text | span | confidence_mention | sentiment.label | sentiment.score | relevance | count | confidence_entity | disambiguation.subtype | disambiguation.name | disambiguation.dbpedia_resource |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Person | Arthur | [362, 368): 'Arthur' | 0.996876 | positive | 0.721919 | 0.311653 | 2 | 0.999918 | None | None | None |
| 1 | Person | Arthur | [587, 593): 'Arthur' | 0.973795 | positive | 0.721919 | 0.311653 | 2 | 0.999918 | None | None | None |
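
The `confidence_mention` and `confidence_entity` columns in the table above are consistent with a pandas merge that applies suffixes to a `confidence` column present in both inputs. The following is a minimal sketch under that assumption, joining a mention-level table to an entity-level table on `type` and `text`. The variable names and the reduced column set are illustrative, with plain `begin`/`end` integers standing in for the span column. The values are copied from the tables above.

```python
import pandas as pd

# One row per mention of the entity in the text.
mentions = pd.DataFrame({
    "type": ["Person", "Person"],
    "text": ["Arthur", "Arthur"],
    "begin": [362, 587],
    "end": [368, 593],
    "confidence": [0.996876, 0.973795],
})

# One row per entity, with document-level attributes.
entities = pd.DataFrame({
    "type": ["Person"],
    "text": ["Arthur"],
    "sentiment.label": ["positive"],
    "sentiment.score": [0.721919],
    "relevance": [0.311653],
    "count": [2],
    "confidence": [0.999918],
})

# Both inputs carry a "confidence" column, so the suffixes disambiguate them.
merged = mentions.merge(entities, on=["type", "text"], suffixes=("_mention", "_entity"))
print(merged)
```

Each mention row picks up the document-level sentiment, relevance, and count of its entity, which is why those values repeat across both rows.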
| | text | sentiment.label | sentiment.score | relevance | emotion.sadness | emotion.joy | emotion.fear | emotion.disgust | emotion.anger | count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sir Bedevere | positive | 0.835873 | 0.884359 | 0.031301 | 0.496318 | 0.135650 | 0.015545 | 0.022961 | 1 |
| 1 | King Arthur | neutral | 0.000000 | 0.850874 | 0.441230 | 0.330559 | 0.043714 | 0.020016 | 0.025905 | 1 |
| 2 | Sir Lancelot | positive | 0.835873 | 0.823645 | 0.031301 | 0.496318 | 0.135650 | 0.015545 | 0.022961 | 1 |
| 3 | image of W. G. Grace | positive | 0.721919 | 0.722026 | 0.044130 | 0.901205 | 0.039773 | 0.012838 | 0.027599 | 1 |
| 4 | musical number | neutral | 0.000000 | 0.621432 | 0.312246 | 0.174343 | 0.032726 | 0.077707 | 0.045592 | 1 |
| | type | sentence_span | score | arguments.0.span | arguments.1.span | arguments.0.entities.type | arguments.1.entities.type | arguments.0.entities.text | arguments.1.entities.text |
|---|---|---|---|---|---|---|---|---|---|
| 0 | partOfMany | [130, 361): 'Along the way, he recruits Sir Be... | 0.610221 | [208, 215): 'Galahad' | [323, 328): 'their' | Person | Person | Galahad | their |
| 1 | partOfMany | [130, 361): 'Along the way, he recruits Sir Be... | 0.710112 | [266, 274): 'Lancelot' | [323, 328): 'their' | Person | Person | Lancelot | their |
| 2 | parentOf | [130, 361): 'Along the way, he recruits Sir Be... | 0.382100 | [323, 328): 'their' | [329, 336): 'squires' | Person | Person | their | squires |
| 3 | residesIn | [362, 512): 'Arthur leads the men to Camelot, ... | 0.492869 | [362, 368): 'Arthur' | [386, 393): 'Camelot' | Person | GeopoliticalEntity | King Arthur | Camelot |
| 4 | locatedAt | [362, 512): 'Arthur leads the men to Camelot, ... | 0.339446 | [379, 382): 'men' | [386, 393): 'Camelot' | Person | GeopoliticalEntity | men | Camelot |
| | subject.text | sentence | object.text | action.verb.text | action.verb.tense | action.text | action.normalized |
|---|---|---|---|---|---|---|---|
| 0 | for men | In AD 932, King Arthur and his squire, Patsy, ... | the Knights of the Round Table | join | infinitive | join | join |
| 1 | he | Along the way, he recruits Sir Bedevere the Wi... | Sir Bedevere the Wise, Sir Lancelot the Brave,... | recruit | present | recruits | recruit |
| 2 | Arthur | Arthur leads the men to Camelot, but upon furt... | the men to Camelot | lead | present | leads | lead |
| 3 | he | Arthur leads the men to Camelot, but upon furt... | not to go there because it is "a silly place" | decide | present | decides | decide |
| 4 | he | Arthur leads the men to Camelot, but upon furt... | None | go | infinitive | go | go |