{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyze_Text.ipynb: Analyze Text with Pandas and Watson Natural Language Understanding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "This notebook shows how the open source library [Text Extensions for Pandas](https://github.com/CODAIT/text-extensions-for-pandas) lets you use [Pandas](https://pandas.pydata.org/) DataFrames and the [Watson Natural Language Understanding](https://www.ibm.com/cloud/watson-natural-language-understanding) service to analyze natural language text. \n", "\n", "We start out with an excerpt from the [plot synopsis from the Wikipedia page\n", "for *Monty Python and the Holy Grail*](https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail#Plot). \n", "We pass this example document to the Watson Natural Language \n", "Understanding (NLU) service. Then we use Text Extensions for Pandas to convert the output of the \n", "Watson NLU service to Pandas DataFrames. Next, we perform an example analysis task both with \n", "and without Pandas to show how Pandas makes analyzing NLP information easier. Finally, we \n", "walk through all the different DataFrames that Text Extensions for Pandas can extract from \n", "the output of Watson Natural Language Understanding."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Environment Setup\n", "\n", "This notebook requires a Python 3.7 or later environment with the following packages:\n", "* The dependencies listed in the [\"requirements.txt\" file for Text Extensions for Pandas](https://github.com/CODAIT/text-extensions-for-pandas/blob/master/requirements.txt)\n", "* The \"[ibm-watson](https://pypi.org/project/ibm-watson/)\" package, available via `pip install ibm-watson`\n", "* `text_extensions_for_pandas`\n", "\n", "You can satisfy the dependency on `text_extensions_for_pandas` in either of two ways:\n", "\n", "* Run `pip install text_extensions_for_pandas` before running this notebook. This command adds the library to your Python environment.\n", "* Run this notebook out of your local copy of the Text Extensions for Pandas project's [source tree](https://github.com/CODAIT/text-extensions-for-pandas). In this case, the notebook will use the version of Text Extensions for Pandas in your local source tree **if the package is not installed in your Python environment**." 
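, "\n", "For example, assuming a fresh Python environment (exact package versions are whatever `pip` resolves), the dependencies can be installed and smoke-tested from a shell before launching the notebook:\n", "``` console\n", "pip install pandas ibm-watson text_extensions_for_pandas\n", "python -c \"import ibm_watson; import text_extensions_for_pandas\"\n", "```"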
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Core Python libraries\n", "import json\n", "import os\n", "import sys\n", "import pandas as pd\n", "from typing import *\n", "\n", "# IBM Watson libraries\n", "import ibm_watson\n", "import ibm_watson.natural_language_understanding_v1 as nlu\n", "import ibm_cloud_sdk_core\n", "\n", "# And of course we need the text_extensions_for_pandas library itself.\n", "try:\n", "    import text_extensions_for_pandas as tp\n", "except ModuleNotFoundError as e:\n", "    # If we're running from within the project source tree and the parent Python\n", "    # environment doesn't have the text_extensions_for_pandas package, use the\n", "    # version in the local source tree.\n", "    if not os.getcwd().endswith(\"notebooks\"):\n", "        raise e\n", "    if \"..\" not in sys.path:\n", "        sys.path.insert(0, \"..\")\n", "    import text_extensions_for_pandas as tp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Set up the Watson Natural Language Understanding Service\n", "\n", "In this part of the notebook, we will use the Watson Natural Language Understanding (NLU) service to extract key features from our example document.\n", "\n", "You can create an instance of Watson NLU on the IBM Cloud for free by navigating to [this page](https://www.ibm.com/cloud/watson-natural-language-understanding) and clicking on the button marked \"Get started free\". You can also install your own instance of Watson NLU on [OpenShift](https://www.openshift.com/) by using [IBM Watson Natural Language Understanding for IBM Cloud Pak for Data](https://catalog.redhat.com/software/operators/detail/5e9873e13f398525a0ceafe5).\n", "\n", "You'll need two pieces of information to access your instance of Watson NLU: an **API key** and a **service URL**. If you're using Watson NLU on the IBM Cloud, you can find your API key and service URL in the IBM Cloud web UI. 
Navigate to the [resource list](https://cloud.ibm.com/resources) and click on your instance of Natural Language Understanding to open the management UI for your service. Then click on the \"Manage\" tab to show a page with your API key and service URL.\n", "\n", "The cell that follows assumes that you are using the environment variables `IBM_API_KEY` and `IBM_SERVICE_URL` to store your credentials. If you're running this notebook in Jupyter on your laptop, you can set these environment variables while starting up `jupyter notebook` or `jupyter lab`. For example:\n", "``` console\n", "IBM_API_KEY='<your API key>' \\\n", "IBM_SERVICE_URL='<your service URL>' \\\n", "  jupyter lab\n", "```\n", "\n", "Alternatively, you can uncomment the first two lines of code below to set the `IBM_API_KEY` and `IBM_SERVICE_URL` environment variables directly.\n", "**Be careful not to store your API key in any publicly accessible location!**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# If you need to embed your credentials inline, uncomment the following two lines and\n", "# paste your credentials in the indicated locations.\n", "# os.environ[\"IBM_API_KEY\"] = \"<your API key>\"\n", "# os.environ[\"IBM_SERVICE_URL\"] = \"<your service URL>\"\n", "\n", "# Retrieve the API key for your Watson NLU service instance\n", "if \"IBM_API_KEY\" not in os.environ:\n", "    raise ValueError(\"Expected Watson NLU API key in the environment variable 'IBM_API_KEY'\")\n", "api_key = os.environ.get(\"IBM_API_KEY\")\n", "\n", "# Retrieve the service URL for your Watson NLU service instance\n", "if \"IBM_SERVICE_URL\" not in os.environ:\n", "    raise ValueError(\"Expected Watson NLU service URL in the environment variable 'IBM_SERVICE_URL'\")\n", "service_url = os.environ.get(\"IBM_SERVICE_URL\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Connect to the Watson Natural Language Understanding Python API\n", "\n", "This notebook uses the IBM Watson Python SDK to perform authentication on the IBM Cloud 
via the \n", "`IAMAuthenticator` class. See [the IBM Watson Python SDK documentation](https://github.com/watson-developer-cloud/python-sdk#iam) for more information. \n", "\n", "We start by using the API key and service URL from the previous cell to create an instance of the\n", "Python API for Watson NLU." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "natural_language_understanding = ibm_watson.NaturalLanguageUnderstandingV1(\n", "    version=\"2019-07-12\",\n", "    authenticator=ibm_cloud_sdk_core.authenticators.IAMAuthenticator(api_key)\n", ")\n", "natural_language_understanding.set_service_url(service_url)\n", "natural_language_understanding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pass a Document through the Watson NLU Service\n", "\n", "Once you've opened a connection to the Watson NLU service, you can pass documents through \n", "the service by invoking the [`analyze()` method](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#analyze).\n", "\n", "The [example document](https://raw.githubusercontent.com/CODAIT/text-extensions-for-pandas/master/resources/holy_grail_short.txt) that we use here is an excerpt from\n", "the plot summary for *Monty Python and the Holy Grail*, drawn from the [Wikipedia entry](https://en.wikipedia.org/wiki/Monty_Python_and_the_Holy_Grail) for that movie.\n", "\n", "Let's show what the raw text looks like:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<b>Document Text:</b> 
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, HTML\n", "\n", "# Read the example document from the local copy of the project's resources.\n", "doc_file = \"../resources/holy_grail_short.txt\"\n", "with open(doc_file, \"r\") as f:\n", "    doc_text = f.read()\n", "\n", "display(HTML(f\"<b>Document Text:</b> {doc_text}
\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the code below, we instruct Watson Natural Language Understanding to perform five different kinds of analysis on the example document:\n", "* entities (with sentiment)\n", "* keywords (with sentiment and emotion)\n", "* relations\n", "* semantic_roles\n", "* syntax (with sentences, tokens, and part of speech)\n", "\n", "See [the Watson NLU documentation](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#text-analytics-features) for a full description of the types of analysis that NLU can perform." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Make the request\n", "response = natural_language_understanding.analyze(\n", "    text=doc_text,\n", "    # TODO: Use this URL once we've pushed the shortened document to GitHub\n", "    # url=\"https://raw.githubusercontent.com/CODAIT/text-extensions-for-pandas/master/resources/holy_grail_short.txt\",\n", "    return_analyzed_text=True,\n", "    features=nlu.Features(\n", "        entities=nlu.EntitiesOptions(sentiment=True, mentions=True),\n", "        keywords=nlu.KeywordsOptions(sentiment=True, emotion=True),\n", "        relations=nlu.RelationsOptions(),\n", "        semantic_roles=nlu.SemanticRolesOptions(),\n", "        syntax=nlu.SyntaxOptions(sentences=True,\n", "                                 tokens=nlu.SyntaxOptionsTokens(lemma=True, part_of_speech=True))\n", "    )).get_result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The response from the `analyze()` method is a Python dictionary. The dictionary contains an entry \n", "for each pass of analysis requested, plus some additional entries with metadata about the API request\n", "itself. 
Here's a list of the keys in `response`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['usage', 'syntax', 'semantic_roles', 'relations', 'language', 'keywords', 'entities', 'analyzed_text'])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Perform an Example Task\n", "\n", "Let's use the information that Watson Natural Language Understanding has extracted from our example document to perform an example task: *Find all the pronouns in each sentence, broken down by sentence.*\n", "\n", "This task could serve as a first step toward a number of more complex tasks, such as \n", "resolving anaphora (for example, associating \"King Arthur\" with \"his\" in the phrase \"King Arthur and his squire, Patsy\") or analyzing the relationship between sentiment and the gender of pronouns.\n", "\n", "We'll start by doing this task using straight Python code that operates directly over the output of Watson NLU's `analyze()` method. Then we'll redo the task using Pandas DataFrames and Text Extensions for Pandas. This exercise will show how Pandas DataFrames can represent the intermediate data structures of an NLP application in a way that is both easier to understand and easier to manipulate with less code.\n", "\n", "Let's begin." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform the Task Without Using Pandas\n", "\n", "All the information that we need to perform our task is in the \"syntax\" section of the response \n", "we captured above from Watson NLU's `analyze()` method. Syntax analysis captures a large amount\n", "of information, so the \"syntax\" section of the response is very verbose. 
\n", "\n", "For reference, here's the text of our example document again:\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<b>Document Text:</b> 
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(f\"<b>Document Text:</b> {doc_text}
\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's the output of Watson NLU's syntax analysis, converted to a string:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'tokens': [{'text': 'In',\n", " 'part_of_speech': 'ADP',\n", " 'location': [0, 2],\n", " 'lemma': 'in'},\n", " {'text': 'AD', 'part_of_speech': 'PROPN', 'location': [3, 5], 'lemma': 'Ad'},\n", " {'text': '932', 'part_of_speech': 'NUM', 'location': [6, 9]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [9, 10]},\n", " {'text': 'King',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [11, 15],\n", " 'lemma': 'King'},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [16, 22]},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [23, 26],\n", " 'lemma': 'and'},\n", " {'text': 'his',\n", " 'part_of_speech': 'PRON',\n", " 'location': [27, 30],\n", " 'lemma': 'his'},\n", " {'text': 'squire',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [31, 37],\n", " 'lemma': 'squire'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [37, 38]},\n", " {'text': 'Patsy',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [39, 44],\n", " 'lemma': 'Patsy'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [44, 45]},\n", " {'text': 'travel',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [46, 52],\n", " 'lemma': 'travel'},\n", " {'text': 'throughout',\n", " 'part_of_speech': 'ADP',\n", " 'location': [53, 63],\n", " 'lemma': 'throughout'},\n", " {'text': 'Britain', 'part_of_speech': 'PROPN', 'location': [64, 71]},\n", " {'text': 'searching',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [72, 81],\n", " 'lemma': 'searching'},\n", " {'text': 'for',\n", " 'part_of_speech': 'ADP',\n", " 'location': [82, 85],\n", " 'lemma': 'for'},\n", " {'text': 'men',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [86, 89],\n", " 'lemma': 'man'},\n", " {'text': 
'to',\n", " 'part_of_speech': 'PART',\n", " 'location': [90, 92],\n", " 'lemma': 'to'},\n", " {'text': 'join',\n", " 'part_of_speech': 'VERB',\n", " 'location': [93, 97],\n", " 'lemma': 'join'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [98, 101],\n", " 'lemma': 'the'},\n", " {'text': 'Knights',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [102, 109],\n", " 'lemma': 'Knight'},\n", " {'text': 'of',\n", " 'part_of_speech': 'ADP',\n", " 'location': [110, 112],\n", " 'lemma': 'of'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [113, 116],\n", " 'lemma': 'the'},\n", " {'text': 'Round',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [117, 122],\n", " 'lemma': 'round'},\n", " {'text': 'Table',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [123, 128],\n", " 'lemma': 'table'},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [128, 129]},\n", " {'text': 'Along',\n", " 'part_of_speech': 'ADP',\n", " 'location': [130, 135],\n", " 'lemma': 'along'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [136, 139],\n", " 'lemma': 'the'},\n", " {'text': 'way',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [140, 143],\n", " 'lemma': 'way'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [143, 144]},\n", " {'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [145, 147],\n", " 'lemma': 'he'},\n", " {'text': 'recruits',\n", " 'part_of_speech': 'VERB',\n", " 'location': [148, 156],\n", " 'lemma': 'recruit'},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [157, 160],\n", " 'lemma': 'Sir'},\n", " {'text': 'Bedevere', 'part_of_speech': 'PROPN', 'location': [161, 169]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [170, 173],\n", " 'lemma': 'the'},\n", " {'text': 'Wise',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [174, 178],\n", " 'lemma': 'Wise'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': 
[178, 179]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [180, 183],\n", " 'lemma': 'Sir'},\n", " {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [184, 192]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [193, 196],\n", " 'lemma': 'the'},\n", " {'text': 'Brave',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [197, 202],\n", " 'lemma': 'Brave'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [202, 203]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [204, 207],\n", " 'lemma': 'Sir'},\n", " {'text': 'Galahad', 'part_of_speech': 'PROPN', 'location': [208, 215]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [216, 219],\n", " 'lemma': 'the'},\n", " {'text': 'Pure', 'part_of_speech': 'PROPN', 'location': [220, 224]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [224, 225]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [226, 229],\n", " 'lemma': 'Sir'},\n", " {'text': 'Robin',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [230, 235],\n", " 'lemma': 'Robin'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [236, 239],\n", " 'lemma': 'the'},\n", " {'text': 'Not', 'part_of_speech': 'PROPN', 'location': [240, 243]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [243, 244]},\n", " {'text': 'Quite', 'part_of_speech': 'PROPN', 'location': [244, 249]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [249, 250]},\n", " {'text': 'So',\n", " 'part_of_speech': 'ADV',\n", " 'location': [250, 252],\n", " 'lemma': 'so'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [252, 253]},\n", " {'text': 'Brave',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [253, 258],\n", " 'lemma': 'Brave'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [258, 259]},\n", " {'text': 'as',\n", " 'part_of_speech': 'ADP',\n", " 'location': [259, 261],\n", " 
'lemma': 'as'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [261, 262]},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [262, 265],\n", " 'lemma': 'Sir'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [265, 266]},\n", " {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [266, 274]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [274, 275]},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [276, 279],\n", " 'lemma': 'and'},\n", " {'text': 'Sir',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [280, 283],\n", " 'lemma': 'Sir'},\n", " {'text': 'Not',\n", " 'part_of_speech': 'ADV',\n", " 'location': [284, 287],\n", " 'lemma': 'not'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [287, 288]},\n", " {'text': 'Appearing', 'part_of_speech': 'PROPN', 'location': [288, 297]},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [297, 298]},\n", " {'text': 'in',\n", " 'part_of_speech': 'ADP',\n", " 'location': [298, 300],\n", " 'lemma': 'in'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [300, 301]},\n", " {'text': 'this',\n", " 'part_of_speech': 'PRON',\n", " 'location': [301, 305],\n", " 'lemma': 'this'},\n", " {'text': '-', 'part_of_speech': 'PUNCT', 'location': [305, 306]},\n", " {'text': 'Film',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [306, 310],\n", " 'lemma': 'Film'},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [310, 311]},\n", " {'text': 'along',\n", " 'part_of_speech': 'ADP',\n", " 'location': [312, 317],\n", " 'lemma': 'along'},\n", " {'text': 'with',\n", " 'part_of_speech': 'ADP',\n", " 'location': [318, 322],\n", " 'lemma': 'with'},\n", " {'text': 'their',\n", " 'part_of_speech': 'PRON',\n", " 'location': [323, 328],\n", " 'lemma': 'their'},\n", " {'text': 'squires',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [329, 336],\n", " 'lemma': 'squire'},\n", " {'text': 'and',\n", " 'part_of_speech': 
'CCONJ',\n", " 'location': [337, 340],\n", " 'lemma': 'and'},\n", " {'text': 'Robin',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [341, 346],\n", " 'lemma': 'Robin'},\n", " {'text': \"'s\",\n", " 'part_of_speech': 'PART',\n", " 'location': [346, 348],\n", " 'lemma': \"'s\"},\n", " {'text': 'troubadours',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [349, 360],\n", " 'lemma': 'troubadour'},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [360, 361]},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [362, 368]},\n", " {'text': 'leads',\n", " 'part_of_speech': 'VERB',\n", " 'location': [369, 374],\n", " 'lemma': 'lead'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [375, 378],\n", " 'lemma': 'the'},\n", " {'text': 'men',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [379, 382],\n", " 'lemma': 'man'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [383, 385],\n", " 'lemma': 'to'},\n", " {'text': 'Camelot', 'part_of_speech': 'PROPN', 'location': [386, 393]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [393, 394]},\n", " {'text': 'but',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [395, 398],\n", " 'lemma': 'but'},\n", " {'text': 'upon',\n", " 'part_of_speech': 'ADP',\n", " 'location': [399, 403],\n", " 'lemma': 'upon'},\n", " {'text': 'further',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [404, 411],\n", " 'lemma': 'far'},\n", " {'text': 'consideration',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [412, 425],\n", " 'lemma': 'consideration'},\n", " {'text': '(', 'part_of_speech': 'PUNCT', 'location': [426, 427]},\n", " {'text': 'thanks',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [427, 433],\n", " 'lemma': 'thanks'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [434, 436],\n", " 'lemma': 'to'},\n", " {'text': 'a', 'part_of_speech': 'DET', 'location': [437, 438], 'lemma': 'a'},\n", " {'text': 'musical',\n", " 
'part_of_speech': 'ADJ',\n", " 'location': [439, 446],\n", " 'lemma': 'musical'},\n", " {'text': 'number',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [447, 453],\n", " 'lemma': 'number'},\n", " {'text': ')', 'part_of_speech': 'PUNCT', 'location': [453, 454]},\n", " {'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [455, 457],\n", " 'lemma': 'he'},\n", " {'text': 'decides',\n", " 'part_of_speech': 'VERB',\n", " 'location': [458, 465],\n", " 'lemma': 'decide'},\n", " {'text': 'not',\n", " 'part_of_speech': 'PART',\n", " 'location': [466, 469],\n", " 'lemma': 'not'},\n", " {'text': 'to',\n", " 'part_of_speech': 'PART',\n", " 'location': [470, 472],\n", " 'lemma': 'to'},\n", " {'text': 'go',\n", " 'part_of_speech': 'VERB',\n", " 'location': [473, 475],\n", " 'lemma': 'go'},\n", " {'text': 'there',\n", " 'part_of_speech': 'ADV',\n", " 'location': [476, 481],\n", " 'lemma': 'there'},\n", " {'text': 'because',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [482, 489],\n", " 'lemma': 'because'},\n", " {'text': 'it',\n", " 'part_of_speech': 'PRON',\n", " 'location': [490, 492],\n", " 'lemma': 'it'},\n", " {'text': 'is',\n", " 'part_of_speech': 'AUX',\n", " 'location': [493, 495],\n", " 'lemma': 'be'},\n", " {'text': '\"', 'part_of_speech': 'PUNCT', 'location': [496, 497]},\n", " {'text': 'a', 'part_of_speech': 'DET', 'location': [497, 498], 'lemma': 'a'},\n", " {'text': 'silly',\n", " 'part_of_speech': 'ADJ',\n", " 'location': [499, 504],\n", " 'lemma': 'silly'},\n", " {'text': 'place',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [505, 510],\n", " 'lemma': 'place'},\n", " {'text': '\"', 'part_of_speech': 'PUNCT', 'location': [510, 511]},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [511, 512]},\n", " {'text': 'As',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [513, 515],\n", " 'lemma': 'as'},\n", " {'text': 'they',\n", " 'part_of_speech': 'PRON',\n", " 'location': [516, 520],\n", " 'lemma': 'they'},\n", " {'text': 
'turn',\n", " 'part_of_speech': 'VERB',\n", " 'location': [521, 525],\n", " 'lemma': 'turn'},\n", " {'text': 'away', 'part_of_speech': 'ADP', 'location': [526, 530]},\n", " {'text': ',', 'part_of_speech': 'PUNCT', 'location': [530, 531]},\n", " {'text': 'God',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [532, 535],\n", " 'lemma': 'God'},\n", " {'text': '(', 'part_of_speech': 'PUNCT', 'location': [536, 537]},\n", " {'text': 'an',\n", " 'part_of_speech': 'DET',\n", " 'location': [537, 539],\n", " 'lemma': 'a'},\n", " {'text': 'image',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [540, 545],\n", " 'lemma': 'image'},\n", " {'text': 'of',\n", " 'part_of_speech': 'ADP',\n", " 'location': [546, 548],\n", " 'lemma': 'of'},\n", " {'text': 'W.', 'part_of_speech': 'PROPN', 'location': [549, 551]},\n", " {'text': 'G.', 'part_of_speech': 'PROPN', 'location': [552, 554]},\n", " {'text': 'Grace',\n", " 'part_of_speech': 'PROPN',\n", " 'location': [555, 560],\n", " 'lemma': 'Grace'},\n", " {'text': ')', 'part_of_speech': 'PUNCT', 'location': [560, 561]},\n", " {'text': 'speaks',\n", " 'part_of_speech': 'VERB',\n", " 'location': [562, 568],\n", " 'lemma': 'speak'},\n", " {'text': 'to',\n", " 'part_of_speech': 'ADP',\n", " 'location': [569, 571],\n", " 'lemma': 'to'},\n", " {'text': 'them',\n", " 'part_of_speech': 'PRON',\n", " 'location': [572, 576],\n", " 'lemma': 'they'},\n", " {'text': 'and',\n", " 'part_of_speech': 'CCONJ',\n", " 'location': [577, 580],\n", " 'lemma': 'and'},\n", " {'text': 'gives',\n", " 'part_of_speech': 'VERB',\n", " 'location': [581, 586],\n", " 'lemma': 'give'},\n", " {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [587, 593]},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [594, 597],\n", " 'lemma': 'the'},\n", " {'text': 'task',\n", " 'part_of_speech': 'NOUN',\n", " 'location': [598, 602],\n", " 'lemma': 'task'},\n", " {'text': 'of',\n", " 'part_of_speech': 'SCONJ',\n", " 'location': [603, 605],\n", " 
'lemma': 'of'},\n", " {'text': 'finding',\n", " 'part_of_speech': 'VERB',\n", " 'location': [606, 613],\n", " 'lemma': 'find'},\n", " {'text': 'the',\n", " 'part_of_speech': 'DET',\n", " 'location': [614, 617],\n", " 'lemma': 'the'},\n", " {'text': 'Holy', 'part_of_speech': 'PROPN', 'location': [618, 622]},\n", " {'text': 'Grail', 'part_of_speech': 'PROPN', 'location': [623, 628]},\n", " {'text': '.', 'part_of_speech': 'PUNCT', 'location': [628, 629]}],\n", " 'sentences': [{'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.',\n", " 'location': [0, 129]},\n", " {'text': \"Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.\",\n", " 'location': [130, 361]},\n", " {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\".',\n", " 'location': [362, 512]},\n", " {'text': 'As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.',\n", " 'location': [513, 629]}]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response[\"syntax\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Buried in the above data structure is all the information we need to perform our example task:\n", "* The location of every token in the document.\n", "* The part of speech of every token in the document.\n", "* The location of every sentence in the document.\n", "\n", "The Python code in the next cell uses this information to construct a list of pronouns\n", "in each sentence in the document." 
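, "\n", "The containment test at the heart of that code can be checked by hand on the first pronoun. The following standalone sketch uses locations copied from the syntax output above (begin and end character offsets):\n", "```python\n", "token_loc = [27, 30]     # \"his\", the first PRON token\n", "sentence_loc = [0, 129]  # the first sentence\n", "print(token_loc[0] >= sentence_loc[0] and token_loc[1] <= sentence_loc[1])  # True\n", "```"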
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[{'sentence': {'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.',\n", " 'location': [0, 129]},\n", " 'pronouns': [{'text': 'his',\n", " 'part_of_speech': 'PRON',\n", " 'location': [27, 30],\n", " 'lemma': 'his'}]},\n", " {'sentence': {'text': \"Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.\",\n", " 'location': [130, 361]},\n", " 'pronouns': [{'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [145, 147],\n", " 'lemma': 'he'},\n", " {'text': 'this',\n", " 'part_of_speech': 'PRON',\n", " 'location': [301, 305],\n", " 'lemma': 'this'},\n", " {'text': 'their',\n", " 'part_of_speech': 'PRON',\n", " 'location': [323, 328],\n", " 'lemma': 'their'}]},\n", " {'sentence': {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is \"a silly place\".',\n", " 'location': [362, 512]},\n", " 'pronouns': [{'text': 'he',\n", " 'part_of_speech': 'PRON',\n", " 'location': [455, 457],\n", " 'lemma': 'he'},\n", " {'text': 'it',\n", " 'part_of_speech': 'PRON',\n", " 'location': [490, 492],\n", " 'lemma': 'it'}]},\n", " {'sentence': {'text': 'As they turn away, God (an image of W. G. 
Grace) speaks to them and gives Arthur the task of finding the Holy Grail.',\n", " 'location': [513, 629]},\n", " 'pronouns': [{'text': 'they',\n", " 'part_of_speech': 'PRON',\n", " 'location': [516, 520],\n", " 'lemma': 'they'},\n", " {'text': 'them',\n", " 'part_of_speech': 'PRON',\n", " 'location': [572, 576],\n", " 'lemma': 'they'}]}]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections\n", "\n", "# Create a data structure to hold a mapping from sentence identifier\n", "# to a list of pronouns. This step requires defining sentence ids.\n", "def sentence_id(sentence_record: Dict[str, Any]) -> Tuple[int, int]:\n", "    return tuple(sentence_record[\"location\"])\n", "\n", "pronouns_by_sentence_id = collections.defaultdict(list)\n", "\n", "# Pass 1: Use nested for loops to identify pronouns and match them with\n", "# their containing sentences.\n", "# Running time: O(num_tokens * num_sentences), i.e. O(document_size^2)\n", "for t in response[\"syntax\"][\"tokens\"]:\n", "    pos_str = t[\"part_of_speech\"]  # Part-of-speech tag, as a string\n", "    if pos_str == \"PRON\":\n", "        found_sentence = False\n", "        for s in response[\"syntax\"][\"sentences\"]:\n", "            if (t[\"location\"][0] >= s[\"location\"][0]\n", "                    and t[\"location\"][1] <= s[\"location\"][1]):\n", "                found_sentence = True\n", "                pronouns_by_sentence_id[sentence_id(s)].append(t)\n", "        if not found_sentence:\n", "            raise ValueError(f\"Token {t} is not in any sentence\")\n", "    pass  # Make JupyterLab syntax highlighting happy\n", "\n", "# Pass 2: Translate sentence identifiers to full sentence metadata.\n", "sentence_id_to_sentence = {sentence_id(s): s\n", "                           for s in response[\"syntax\"][\"sentences\"]}\n", "result = [\n", "    {\n", "        \"sentence\": sentence_id_to_sentence[key],\n", "        \"pronouns\": pronouns\n", "    }\n", "    for key, pronouns in pronouns_by_sentence_id.items()\n", "]\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code above is quite complex 
given the simplicity of the task. You would need to stare at the previous cell for a few minutes to convince yourself that the algorithm is correct. This implementation also has scalability issues: The worst-case running time of the nested for loops section is proportional to the square of the document length.\n", "\n", "We can do better." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Repeat the Example Task Using Pandas\n", "\n", "Let's revisit the example task we just performed in the previous cell. Again, the task is: *Find all the pronouns in each sentence, broken down by sentence.* This time around, let's perform this task using Pandas.\n", "\n", "Text Extensions for Pandas includes a function `parse_response()` that turns the output of Watson NLU's `analyze()` function into a dictionary of Pandas DataFrames. Let's run our response object through that conversion." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['syntax', 'entities', 'entity_mentions', 'keywords', 'relations', 'semantic_roles'])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = tp.io.watson.nlu.parse_response(response)\n", "dfs.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of each analysis pass that Watson NLU performed is now a DataFrame. \n", "Let's look at the DataFrame for the \"syntax\" pass:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
spanpart_of_speechlemmasentence
0[0, 2): 'In'ADPin[0, 129): 'In AD 932, King Arthur and his squi...
1[3, 5): 'AD'PROPNAd[0, 129): 'In AD 932, King Arthur and his squi...
2[6, 9): '932'NUMNone[0, 129): 'In AD 932, King Arthur and his squi...
3[9, 10): ','PUNCTNone[0, 129): 'In AD 932, King Arthur and his squi...
4[11, 15): 'King'PROPNKing[0, 129): 'In AD 932, King Arthur and his squi...
...............
142[606, 613): 'finding'VERBfind[513, 629): 'As they turn away, God (an image ...
143[614, 617): 'the'DETthe[513, 629): 'As they turn away, God (an image ...
144[618, 622): 'Holy'PROPNNone[513, 629): 'As they turn away, God (an image ...
145[623, 628): 'Grail'PROPNNone[513, 629): 'As they turn away, God (an image ...
146[628, 629): '.'PUNCTNone[513, 629): 'As they turn away, God (an image ...
\n", "

147 rows × 4 columns

\n", "
" ], "text/plain": [ " span part_of_speech lemma \\\n", "0 [0, 2): 'In' ADP in \n", "1 [3, 5): 'AD' PROPN Ad \n", "2 [6, 9): '932' NUM None \n", "3 [9, 10): ',' PUNCT None \n", "4 [11, 15): 'King' PROPN King \n", ".. ... ... ... \n", "142 [606, 613): 'finding' VERB find \n", "143 [614, 617): 'the' DET the \n", "144 [618, 622): 'Holy' PROPN None \n", "145 [623, 628): 'Grail' PROPN None \n", "146 [628, 629): '.' PUNCT None \n", "\n", " sentence \n", "0 [0, 129): 'In AD 932, King Arthur and his squi... \n", "1 [0, 129): 'In AD 932, King Arthur and his squi... \n", "2 [0, 129): 'In AD 932, King Arthur and his squi... \n", "3 [0, 129): 'In AD 932, King Arthur and his squi... \n", "4 [0, 129): 'In AD 932, King Arthur and his squi... \n", ".. ... \n", "142 [513, 629): 'As they turn away, God (an image ... \n", "143 [513, 629): 'As they turn away, God (an image ... \n", "144 [513, 629): 'As they turn away, God (an image ... \n", "145 [513, 629): 'As they turn away, God (an image ... \n", "146 [513, 629): 'As they turn away, God (an image ... \n", "\n", "[147 rows x 4 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df = dfs[\"syntax\"]\n", "syntax_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame has one row for every token in the document. Each row has information on\n", "the span of the token, its part of speech, its lemmatized form, and the span of the \n", "containing sentence.\n", "\n", "Let's use this DataFrame to perform our example task a second time." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentencespan
7[0, 129): 'In AD 932, King Arthur and his squi...[27, 30): 'his'
31[130, 361): 'Along the way, he recruits Sir Be...[145, 147): 'he'
73[130, 361): 'Along the way, he recruits Sir Be...[301, 305): 'this'
79[130, 361): 'Along the way, he recruits Sir Be...[323, 328): 'their'
104[362, 512): 'Arthur leads the men to Camelot, ...[455, 457): 'he'
111[362, 512): 'Arthur leads the men to Camelot, ...[490, 492): 'it'
120[513, 629): 'As they turn away, God (an image ...[516, 520): 'they'
135[513, 629): 'As they turn away, God (an image ...[572, 576): 'them'
\n", "
" ], "text/plain": [ " sentence span\n", "7 [0, 129): 'In AD 932, King Arthur and his squi... [27, 30): 'his'\n", "31 [130, 361): 'Along the way, he recruits Sir Be... [145, 147): 'he'\n", "73 [130, 361): 'Along the way, he recruits Sir Be... [301, 305): 'this'\n", "79 [130, 361): 'Along the way, he recruits Sir Be... [323, 328): 'their'\n", "104 [362, 512): 'Arthur leads the men to Camelot, ... [455, 457): 'he'\n", "111 [362, 512): 'Arthur leads the men to Camelot, ... [490, 492): 'it'\n", "120 [513, 629): 'As they turn away, God (an image ... [516, 520): 'they'\n", "135 [513, 629): 'As they turn away, God (an image ... [572, 576): 'them'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pronouns_by_sentence = syntax_df[syntax_df[\"part_of_speech\"] == \"PRON\"][[\"sentence\", \"span\"]]\n", "pronouns_by_sentence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it. With the DataFrame version of this data, we can perform our example task with **one line of code**.\n", "\n", "Specifically, we use a Pandas selection condition to filter out the tokens that aren't pronouns, and then we \n", "project down to the columns containing sentence and token spans. The result is another DataFrame that \n", "we can display directly in our Jupyter notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How it Works\n", "\n", "\n", "Let's take a moment to drill into the internals of the DataFrames we just used.\n", "For reference, here are the first three rows of the syntax analysis DataFrame:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
spanpart_of_speechlemmasentence
0[0, 2): 'In'ADPin[0, 129): 'In AD 932, King Arthur and his squi...
1[3, 5): 'AD'PROPNAd[0, 129): 'In AD 932, King Arthur and his squi...
2[6, 9): '932'NUMNone[0, 129): 'In AD 932, King Arthur and his squi...
\n", "
" ], "text/plain": [ " span part_of_speech lemma \\\n", "0 [0, 2): 'In' ADP in \n", "1 [3, 5): 'AD' PROPN Ad \n", "2 [6, 9): '932' NUM None \n", "\n", " sentence \n", "0 [0, 129): 'In AD 932, King Arthur and his squi... \n", "1 [0, 129): 'In AD 932, King Arthur and his squi... \n", "2 [0, 129): 'In AD 932, King Arthur and his squi... " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here is that DataFrame's data type information:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "span SpanDtype\n", "part_of_speech object\n", "lemma object\n", "sentence TokenSpanDtype\n", "dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two of the columns in this DataFrame — \"span\" and \"sentence\" — contain\n", "extension types from the Text Extensions for Pandas library. Let's look first at the \"span\"\n", "column. 
\n", "\n", "The \"span\" column is stored internally using the class `SpanArray` from \n", "Text Extensions for Pandas.\n", "`SpanArray` is a subclass of \n", "[`ExtensionArray`](\n", " https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray.html), \n", "the base class for custom 1-D array types in Pandas.\n", "\n", "You can use the property [`pandas.Series.array`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.array.html) to access the `ExtensionArray` behind any Pandas extension type:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "[ [0, 2): 'In', [3, 5): 'AD', [6, 9): '932',\n", " [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur',\n", " [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire',\n", " [37, 38): ',',\n", " ...\n", " [581, 586): 'gives', [587, 593): 'Arthur', [594, 597): 'the',\n", " [598, 602): 'task', [603, 605): 'of', [606, 613): 'finding',\n", " [614, 617): 'the', [618, 622): 'Holy', [623, 628): 'Grail',\n", " [628, 629): '.']\n", "Length: 147, dtype: SpanDtype\n" ] } ], "source": [ "print(syntax_df[\"span\"].array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, a `SpanArray` is stored as Numpy arrays of begin and end offsets, plus a Python string \n", "containing the target text. 
You can access this internal data as properties if your application needs that\n", "information:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 0, 3, 6, 9, 11, 16, 23, 27, 31, 37]),\n", " array([ 2, 5, 9, 10, 15, 22, 26, 30, 37, 38]))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df[\"span\"].array.begin[:10], syntax_df[\"span\"].array.end[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also convert an individual element of the array into a Python object of type `Span`:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"[0, 2): 'In'\" is an object of type \n" ] } ], "source": [ "span_obj = syntax_df[\"span\"].array[0]\n", "print(f\"\\\"{span_obj}\\\" is an object of type {type(span_obj)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can convert the entire array (or a slice of it) into Python objects, one object per span:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0, 2): 'In', [3, 5): 'AD', [6, 9): '932', [9, 10): ',',\n", " [11, 15): 'King', [16, 22): 'Arthur', [23, 26): 'and',\n", " [27, 30): 'his', [31, 37): 'squire', [37, 38): ','], dtype=object)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df[\"span\"].iloc[:10].to_numpy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `SpanArray` can also render itself using [Jupyter Notebook callbacks](https://ipython.readthedocs.io/en/stable/config/integrating.html). 
To\n", "see the HTML representation of the `SpanArray`, pass the array object\n", "to Jupyter's [`display()`](https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.display)\n", "function, or make that object the last line of the cell, as in the following example:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", "
beginendcontext
002In
135AD
269932
3910,
41115King
51622Arthur
62326and
72730his
83137squire
93738,
\n", "

\n", "\n", "\n", "\n", " In\n", "\n", "\n", "\n", " AD\n", "\n", "\n", "\n", " 932\n", "\n", "\n", "\n", " ,\n", "\n", "\n", "\n", " King\n", "\n", "\n", "\n", " Arthur\n", "\n", "\n", "\n", " and\n", "\n", "\n", "\n", " his\n", "\n", "\n", "\n", " squire\n", "\n", "\n", "\n", " ,\n", " Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.\n", "

\n", "
\n", "\n", " Your notebook viewer does not support Javascript execution. The above rendering will not be interactive.\n", "
\n", "\n", "\n" ], "text/plain": [ "\n", "[ [0, 2): 'In', [3, 5): 'AD', [6, 9): '932',\n", " [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur',\n", " [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire',\n", " [37, 38): ',']\n", "Length: 10, dtype: SpanDtype" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show the first 10 tokens in context\n", "syntax_df[\"span\"].iloc[:10].array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take another look at our DataFrame of syntax information:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
spanpart_of_speechlemmasentence
0[0, 2): 'In'ADPin[0, 129): 'In AD 932, King Arthur and his squi...
1[3, 5): 'AD'PROPNAd[0, 129): 'In AD 932, King Arthur and his squi...
2[6, 9): '932'NUMNone[0, 129): 'In AD 932, King Arthur and his squi...
\n", "
" ], "text/plain": [ " span part_of_speech lemma \\\n", "0 [0, 2): 'In' ADP in \n", "1 [3, 5): 'AD' PROPN Ad \n", "2 [6, 9): '932' NUM None \n", "\n", " sentence \n", "0 [0, 129): 'In AD 932, King Arthur and his squi... \n", "1 [0, 129): 'In AD 932, King Arthur and his squi... \n", "2 [0, 129): 'In AD 932, King Arthur and his squi... " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"sentence\" column is backed by an object of type `TokenSpanArray`.\n", "`TokenSpanArray`, another extension type from Text Extensions for Pandas,\n", "is a version of `SpanArray` for representing a set of spans that are \n", "constrained to begin and end on token boundaries. In addition to all the\n", "functionality of a `SpanArray`, a `TokenSpanArray` encodes additional \n", "information about the relationships between its spans and a tokenization\n", "of the document.\n", "\n", "Here are the distinct elements of the \"sentence\" column rendered as HTML:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", "
beginendbegin tokenend tokencontext
00129027In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.
11303612786Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.
236251286119Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place".
3513629119147As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
\n", "

\n", "\n", "\n", "\n", " In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.\n", "\n", "\n", "\n", " Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.\n", "\n", "\n", "\n", " Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place".\n", "\n", "\n", "\n", " As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.\n", "\n", "

\n", "
\n", "\n", " Your notebook viewer does not support Javascript execution. The above rendering will not be interactive.\n", "
\n", "\n", "\n" ], "text/plain": [ "\n", "[ [0, 129): 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain [...]',\n", " [130, 361): 'Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, [...]',\n", " [362, 512): 'Arthur leads the men to Camelot, but upon further consideration (thanks to [...]',\n", " [513, 629): 'As they turn away, God (an image of W. G. Grace) speaks to them and gives [...]']\n", "Length: 4, dtype: TokenSpanDtype" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df[\"sentence\"].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the table in the previous cell's output shows, each span in the `TokenSpanArray` has begin and end offsets in terms \n", "of both characters and tokens. Internally, the `TokenSpanArray` is stored as follows:\n", "* A Numpy array of begin offsets, measured in tokens\n", "* A Numpy array of end offsets, also measured in tokens\n", "* A reference to a `SpanArray` of spans representing the tokens\n", "\n", "The `TokenSpanArray` object computes the character offsets and covered text of its spans on demand.\n", "\n", "Applications can access the internals of a `TokenSpanArray` via the properties `begin_token`, `end_token`, and `document_tokens`:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Offset information (stored in the TokenSpanArray):\n", "`begin_token` property: [ 0 27 86 119]\n", " `end_token` property: [ 27 86 119 147]\n", " \n", "Token information (`document_tokens` property, shared among multiple TokenSpanArrays):\n", "\n", "[ [0, 2): 'In', [3, 5): 'AD', [6, 9): '932',\n", " [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur',\n", " [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire',\n", " [37, 38): ',',\n", " ...\n", " [581, 586): 'gives', [587, 593): 'Arthur', [594, 597): 'the',\n", " [598, 602): 'task', [603, 605): 'of', [606, 613): 'finding',\n", " [614, 617): 'the', [618, 622): 'Holy', [623, 628): 'Grail',\n", " [628, 629): '.']\n", "Length: 147, dtype: SpanDtype\n", "\n" ] } ], "source": [ "token_span_array = syntax_df[\"sentence\"].unique()\n", "print(f\"\"\"\n", "Offset information (stored in the TokenSpanArray):\n", "`begin_token` property: {token_span_array.begin_token}\n", " `end_token` property: {token_span_array.end_token}\n", " \n", "Token information (`document_tokens` property, shared among multiple TokenSpanArrays):\n", "{token_span_array.document_tokens}\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The extension types in Text Extensions for Pandas support the full set of Pandas array operations. For example, we can build up a DataFrame of the spans of all sentences in the document by applying `pandas.DataFrame.drop_duplicates()` to the `sentence` column:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentence
0[0, 129): 'In AD 932, King Arthur and his squi...
27[130, 361): 'Along the way, he recruits Sir Be...
86[362, 512): 'Arthur leads the men to Camelot, ...
119[513, 629): 'As they turn away, God (an image ...
\n", "
" ], "text/plain": [ " sentence\n", "0 [0, 129): 'In AD 932, King Arthur and his squi...\n", "27 [130, 361): 'Along the way, he recruits Sir Be...\n", "86 [362, 512): 'Arthur leads the men to Camelot, ...\n", "119 [513, 629): 'As they turn away, God (an image ..." ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "syntax_df[[\"sentence\"]].drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A More Complex Example\n", "\n", "Now that we've had an introduction to the Text Extensions for Pandas span types, let's take another\n", "look at the DataFrame that our \"find pronouns by sentence\" code produced:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentencespan
7[0, 129): 'In AD 932, King Arthur and his squi...[27, 30): 'his'
31[130, 361): 'Along the way, he recruits Sir Be...[145, 147): 'he'
73[130, 361): 'Along the way, he recruits Sir Be...[301, 305): 'this'
79[130, 361): 'Along the way, he recruits Sir Be...[323, 328): 'their'
104[362, 512): 'Arthur leads the men to Camelot, ...[455, 457): 'he'
111[362, 512): 'Arthur leads the men to Camelot, ...[490, 492): 'it'
120[513, 629): 'As they turn away, God (an image ...[516, 520): 'they'
135[513, 629): 'As they turn away, God (an image ...[572, 576): 'them'
\n", "
" ], "text/plain": [ " sentence span\n", "7 [0, 129): 'In AD 932, King Arthur and his squi... [27, 30): 'his'\n", "31 [130, 361): 'Along the way, he recruits Sir Be... [145, 147): 'he'\n", "73 [130, 361): 'Along the way, he recruits Sir Be... [301, 305): 'this'\n", "79 [130, 361): 'Along the way, he recruits Sir Be... [323, 328): 'their'\n", "104 [362, 512): 'Arthur leads the men to Camelot, ... [455, 457): 'he'\n", "111 [362, 512): 'Arthur leads the men to Camelot, ... [490, 492): 'it'\n", "120 [513, 629): 'As they turn away, God (an image ... [516, 520): 'they'\n", "135 [513, 629): 'As they turn away, God (an image ... [572, 576): 'them'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pronouns_by_sentence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This DataFrame contains two columns backed by Text Extensions for Pandas span types:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sentence TokenSpanDtype\n", "span SpanDtype\n", "dtype: object" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pronouns_by_sentence.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That means that we can use the full power of Pandas' high-level operations on this DataFrame. \n", "Let's use the output of our earlier task to build up a more complex task: \n", "*Highlight all pronouns in sentences containing the word \"Arthur\"*" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", " \n", "
beginendcontext
02730his
1455457he
2490492it
3516520they
4572576them
\n", "

\n", "\n", " In AD 932, King Arthur and \n", "\n", " his\n", "\n", " squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) \n", "\n", " he\n", "\n", " decides not to go there because \n", "\n", " it\n", "\n", " is "a silly place". As \n", "\n", " they\n", "\n", " turn away, God (an image of W. G. Grace) speaks to \n", "\n", " them\n", " and gives Arthur the task of finding the Holy Grail.\n", "

\n", "
\n", "\n", " Your notebook viewer does not support Javascript execution. The above rendering will not be interactive.\n", "
\n", "\n", "\n" ], "text/plain": [ "\n", "[ [27, 30): 'his', [455, 457): 'he', [490, 492): 'it',\n", " [516, 520): 'they', [572, 576): 'them']\n", "Length: 5, dtype: SpanDtype" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = pronouns_by_sentence[\"sentence\"].map(lambda s: s.covered_text).str.contains(\"Arthur\")\n", "pronouns_by_sentence[\"span\"][mask].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's another variation: *Pair each instance of the word \"Arthur\" with the pronouns that occur in the same sentence.*" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
arthur_spanpronoun_spansentence
0[16, 22): 'Arthur'[27, 30): 'his'[0, 129): 'In AD 932, King Arthur and his squi...
1[362, 368): 'Arthur'[455, 457): 'he'[362, 512): 'Arthur leads the men to Camelot, ...
2[362, 368): 'Arthur'[490, 492): 'it'[362, 512): 'Arthur leads the men to Camelot, ...
3[587, 593): 'Arthur'[516, 520): 'they'[513, 629): 'As they turn away, God (an image ...
4[587, 593): 'Arthur'[572, 576): 'them'[513, 629): 'As they turn away, God (an image ...
\n", "
" ], "text/plain": [ " arthur_span pronoun_span \\\n", "0 [16, 22): 'Arthur' [27, 30): 'his' \n", "1 [362, 368): 'Arthur' [455, 457): 'he' \n", "2 [362, 368): 'Arthur' [490, 492): 'it' \n", "3 [587, 593): 'Arthur' [516, 520): 'they' \n", "4 [587, 593): 'Arthur' [572, 576): 'them' \n", "\n", " sentence \n", "0 [0, 129): 'In AD 932, King Arthur and his squi... \n", "1 [362, 512): 'Arthur leads the men to Camelot, ... \n", "2 [362, 512): 'Arthur leads the men to Camelot, ... \n", "3 [513, 629): 'As they turn away, God (an image ... \n", "4 [513, 629): 'As they turn away, God (an image ... " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " syntax_df[syntax_df[\"span\"].array.covered_text == \"Arthur\"] # Find instances of \"Arthur\"\n", " .merge(pronouns_by_sentence, on=\"sentence\") # Match with pronouns in the same sentence\n", " .rename(columns={\"span_x\": \"arthur_span\", \"span_y\": \"pronoun_span\"})\n", " [[\"arthur_span\", \"pronoun_span\", \"sentence\"]] # Reorder columns\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Other Outputs of Watson NLU as DataFrames \n", "\n", "The examples so far have used the DataFrame representation of Watson Natural Language Understanding's syntax analysis.\n", "In addition to syntax analysis, Watson NLU can perform several other types of analysis. Let's take a look at the \n", "DataFrames that Text Extensions for Pandas can produce from the output of Watson NLU.\n", "\n", "We'll start by revisiting the results of our earlier code that ran \n", "```python\n", "dfs = tp.io.watson.nlu.parse_response(response)\n", "```\n", "over the `response` object that the Watson NLU's Python API returned. `dfs` is a dictionary of DataFrames." 
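, "\n",
"\n",
"As a quick sanity check on that dictionary, you can print each DataFrame's size and columns. The sketch below runs on `dfs_demo`, a small hypothetical stand-in with the same shape as `dfs` (a dictionary mapping analysis names to DataFrames); with the real `dfs`, the same loop applies unchanged:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the dictionary that parse_response() returns.\n",
"dfs_demo = {\n",
"    \"syntax\": pd.DataFrame({\"part_of_speech\": [\"ADP\", \"PRON\"]}),\n",
"    \"entities\": pd.DataFrame({\"type\": [\"Person\"], \"text\": [\"King Arthur\"]}),\n",
"}\n",
"\n",
"# Print one summary line per analysis output.\n",
"for name, df in dfs_demo.items():\n",
"    print(f\"{name}: {len(df)} rows, columns={list(df.columns)}\")\n",
"```\n"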
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['syntax', 'entities', 'entity_mentions', 'keywords', 'relations', 'semantic_roles'])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"syntax\" element of `dfs` contains the syntax analysis DataFrame that we showed earlier.\n", "Let's take a look at the other elements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"entities\" element of `dfs` contains the named entities that Watson Natural Language \n", "Understanding found in the document." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typetextsentiment.labelsentiment.scorerelevancecountconfidencedisambiguation.subtypedisambiguation.namedisambiguation.dbpedia_resource
0PersonSir Bedeverepositive0.8358730.95056010.982315NoneNoneNone
1PersonKing Arthurneutral0.0000000.72038110.924937NoneNoneNone
2PersonPatsyneutral0.0000000.67930010.830596NoneNoneNone
3PersonSir Lancelotpositive0.8358730.66290210.956371[MusicalArtist, TVActor]Sir_Lancelot_%28singer%29http://dbpedia.org/resource/Sir_Lancelot_%28si...
4PersonSir Galahadpositive0.8358730.65417010.948409NoneNoneNone
\n", "
" ], "text/plain": [ " type text sentiment.label sentiment.score relevance count \\\n", "0 Person Sir Bedevere positive 0.835873 0.950560 1 \n", "1 Person King Arthur neutral 0.000000 0.720381 1 \n", "2 Person Patsy neutral 0.000000 0.679300 1 \n", "3 Person Sir Lancelot positive 0.835873 0.662902 1 \n", "4 Person Sir Galahad positive 0.835873 0.654170 1 \n", "\n", " confidence disambiguation.subtype disambiguation.name \\\n", "0 0.982315 None None \n", "1 0.924937 None None \n", "2 0.830596 None None \n", "3 0.956371 [MusicalArtist, TVActor] Sir_Lancelot_%28singer%29 \n", "4 0.948409 None None \n", "\n", " disambiguation.dbpedia_resource \n", "0 None \n", "1 None \n", "2 None \n", "3 http://dbpedia.org/resource/Sir_Lancelot_%28si... \n", "4 None " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs[\"entities\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"entity_mentions\" element of `dfs` contains the locations of individual mentions of\n", "entities from the \"entities\" DataFrame. " ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typetextspanconfidence
0PersonSir Bedevere[157, 169): 'Sir Bedevere'0.982315
1PersonKing Arthur[11, 22): 'King Arthur'0.924937
2PersonPatsy[39, 44): 'Patsy'0.830596
3PersonSir Lancelot[180, 192): 'Sir Lancelot'0.956371
4PersonSir Galahad[204, 215): 'Sir Galahad'0.948409
\n", "
" ], "text/plain": [ " type text span confidence\n", "0 Person Sir Bedevere [157, 169): 'Sir Bedevere' 0.982315\n", "1 Person King Arthur [11, 22): 'King Arthur' 0.924937\n", "2 Person Patsy [39, 44): 'Patsy' 0.830596\n", "3 Person Sir Lancelot [180, 192): 'Sir Lancelot' 0.956371\n", "4 Person Sir Galahad [204, 215): 'Sir Galahad' 0.948409" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs[\"entity_mentions\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the DataFrame under \"entitiy_mentions\" may contain multiple mentions of the same\n", "name:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typetextspanconfidence
10PersonArthur[362, 368): 'Arthur'0.996876
11PersonArthur[587, 593): 'Arthur'0.973795
\n", "
" ], "text/plain": [ " type text span confidence\n", "10 Person Arthur [362, 368): 'Arthur' 0.996876\n", "11 Person Arthur [587, 593): 'Arthur' 0.973795" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arthur_mentions = dfs[\"entity_mentions\"][dfs[\"entity_mentions\"][\"text\"] == \"Arthur\"]\n", "arthur_mentions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"type\" and \"text\" columns of the \"entity_mentions\" DataFrame refer back to the \n", "\"entities\" DataFrame columns of the same names.\n", "You can combine the global and local information about entities into a single DataFrame\n", "using Pandas' `DataFrame.merge()` method:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typetextspanconfidence_mentionsentiment.labelsentiment.scorerelevancecountconfidence_entitydisambiguation.subtypedisambiguation.namedisambiguation.dbpedia_resource
0PersonArthur[362, 368): 'Arthur'0.996876positive0.7219190.31165320.999918NoneNoneNone
1PersonArthur[587, 593): 'Arthur'0.973795positive0.7219190.31165320.999918NoneNoneNone
\n", "
" ], "text/plain": [ " type text span confidence_mention sentiment.label \\\n", "0 Person Arthur [362, 368): 'Arthur' 0.996876 positive \n", "1 Person Arthur [587, 593): 'Arthur' 0.973795 positive \n", "\n", " sentiment.score relevance count confidence_entity \\\n", "0 0.721919 0.311653 2 0.999918 \n", "1 0.721919 0.311653 2 0.999918 \n", "\n", " disambiguation.subtype disambiguation.name disambiguation.dbpedia_resource \n", "0 None None None \n", "1 None None None " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arthur_mentions.merge(dfs[\"entities\"], on=[\"type\", \"text\"], suffixes=[\"_mention\", \"_entity\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Watson Natural Language Understanding has several other models besides the `entities` and `syntax` models. Text Extensions for Pandas can also convert these other outputs. Here's the output of the `keywords` model on our example document:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
textsentiment.labelsentiment.scorerelevanceemotion.sadnessemotion.joyemotion.fearemotion.disgustemotion.angercount
0Sir Bedeverepositive0.8358730.8843590.0313010.4963180.1356500.0155450.0229611
1King Arthurneutral0.0000000.8508740.4412300.3305590.0437140.0200160.0259051
2Sir Lancelotpositive0.8358730.8236450.0313010.4963180.1356500.0155450.0229611
3image of W. G. Gracepositive0.7219190.7220260.0441300.9012050.0397730.0128380.0275991
4musical numberneutral0.0000000.6214320.3122460.1743430.0327260.0777070.0455921
\n", "
" ], "text/plain": [ " text sentiment.label sentiment.score relevance \\\n", "0 Sir Bedevere positive 0.835873 0.884359 \n", "1 King Arthur neutral 0.000000 0.850874 \n", "2 Sir Lancelot positive 0.835873 0.823645 \n", "3 image of W. G. Grace positive 0.721919 0.722026 \n", "4 musical number neutral 0.000000 0.621432 \n", "\n", " emotion.sadness emotion.joy emotion.fear emotion.disgust emotion.anger \\\n", "0 0.031301 0.496318 0.135650 0.015545 0.022961 \n", "1 0.441230 0.330559 0.043714 0.020016 0.025905 \n", "2 0.031301 0.496318 0.135650 0.015545 0.022961 \n", "3 0.044130 0.901205 0.039773 0.012838 0.027599 \n", "4 0.312246 0.174343 0.032726 0.077707 0.045592 \n", "\n", " count \n", "0 1 \n", "1 1 \n", "2 1 \n", "3 1 \n", "4 1 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs[\"keywords\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at the notebook [Sentiment_Analysis.ipynb](./Sentiment_Analysis.ipynb) for more information on the `keywords` model and its sentiment-related outputs.\n", "\n", "Watson Natural Language Understanding also has a `relations` model that finds relationships between pairs of nouns:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typesentence_spanscorearguments.0.spanarguments.1.spanarguments.0.entities.typearguments.1.entities.typearguments.0.entities.textarguments.1.entities.text
0partOfMany[130, 361): 'Along the way, he recruits Sir Be...0.610221[208, 215): 'Galahad'[323, 328): 'their'PersonPersonGalahadtheir
1partOfMany[130, 361): 'Along the way, he recruits Sir Be...0.710112[266, 274): 'Lancelot'[323, 328): 'their'PersonPersonLancelottheir
2parentOf[130, 361): 'Along the way, he recruits Sir Be...0.382100[323, 328): 'their'[329, 336): 'squires'PersonPersontheirsquires
3residesIn[362, 512): 'Arthur leads the men to Camelot, ...0.492869[362, 368): 'Arthur'[386, 393): 'Camelot'PersonGeopoliticalEntityKing ArthurCamelot
4locatedAt[362, 512): 'Arthur leads the men to Camelot, ...0.339446[379, 382): 'men'[386, 393): 'Camelot'PersonGeopoliticalEntitymenCamelot
\n", "
" ], "text/plain": [ " type sentence_span score \\\n", "0 partOfMany [130, 361): 'Along the way, he recruits Sir Be... 0.610221 \n", "1 partOfMany [130, 361): 'Along the way, he recruits Sir Be... 0.710112 \n", "2 parentOf [130, 361): 'Along the way, he recruits Sir Be... 0.382100 \n", "3 residesIn [362, 512): 'Arthur leads the men to Camelot, ... 0.492869 \n", "4 locatedAt [362, 512): 'Arthur leads the men to Camelot, ... 0.339446 \n", "\n", " arguments.0.span arguments.1.span arguments.0.entities.type \\\n", "0 [208, 215): 'Galahad' [323, 328): 'their' Person \n", "1 [266, 274): 'Lancelot' [323, 328): 'their' Person \n", "2 [323, 328): 'their' [329, 336): 'squires' Person \n", "3 [362, 368): 'Arthur' [386, 393): 'Camelot' Person \n", "4 [379, 382): 'men' [386, 393): 'Camelot' Person \n", "\n", " arguments.1.entities.type arguments.0.entities.text \\\n", "0 Person Galahad \n", "1 Person Lancelot \n", "2 Person their \n", "3 GeopoliticalEntity King Arthur \n", "4 GeopoliticalEntity men \n", "\n", " arguments.1.entities.text \n", "0 their \n", "1 their \n", "2 squires \n", "3 Camelot \n", "4 Camelot " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs[\"relations\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `semantic_roles` model identifies places where the document describes events and extracts a subject-verb-object triple for each such event: " ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subject.textsentenceobject.textaction.verb.textaction.verb.tenseaction.textaction.normalized
0for menIn AD 932, King Arthur and his squire, Patsy, ...the Knights of the Round Tablejoininfinitivejoinjoin
1heAlong the way, he recruits Sir Bedevere the Wi...Sir Bedevere the Wise, Sir Lancelot the Brave,...recruitpresentrecruitsrecruit
2ArthurArthur leads the men to Camelot, but upon furt...the men to Camelotleadpresentleadslead
3heArthur leads the men to Camelot, but upon furt...not to go there because it is \"a silly place\"decidepresentdecidesdecide
4heArthur leads the men to Camelot, but upon furt...Nonegoinfinitivegogo
\n", "
" ], "text/plain": [ " subject.text sentence \\\n", "0 for men In AD 932, King Arthur and his squire, Patsy, ... \n", "1 he Along the way, he recruits Sir Bedevere the Wi... \n", "2 Arthur Arthur leads the men to Camelot, but upon furt... \n", "3 he Arthur leads the men to Camelot, but upon furt... \n", "4 he Arthur leads the men to Camelot, but upon furt... \n", "\n", " object.text action.verb.text \\\n", "0 the Knights of the Round Table join \n", "1 Sir Bedevere the Wise, Sir Lancelot the Brave,... recruit \n", "2 the men to Camelot lead \n", "3 not to go there because it is \"a silly place\" decide \n", "4 None go \n", "\n", " action.verb.tense action.text action.normalized \n", "0 infinitive join join \n", "1 present recruits recruit \n", "2 present leads lead \n", "3 present decides decide \n", "4 infinitive go go " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs[\"semantic_roles\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at our [market intelligence tutorial](../tutorials/market/Market_Intelligence_Part1.ipynb) to learn more about the `semantic_roles` model." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.17" } }, "nbformat": 4, "nbformat_minor": 4 }