{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"All the IPython Notebooks in **[Python Natural Language Processing](https://github.com/milaan9/Python_Python_Natural_Language_Processing)** lecture series by **[Dr. Milaan Parmar](https://www.linkedin.com/in/milaanparmar/)** are available @ **[GitHub](https://github.com/milaan9)**\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C5SsYvaIAjXd"
},
"source": [
"# 06 Named Entity Recognition (NER)\n",
"(also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes\n",
"\n",
"spaCy has an **'ner'** pipeline component that identifies token spans fitting a predetermined set of named entities. These are available as the **`ents`** property of a **`Doc`** object.\n",
"\n",
"https://spacy.io/usage/training#ner"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "oMxIwzuVAjXe"
},
"outputs": [],
"source": [
"# Perform standard imports\n",
"import spacy\n",
"nlp = spacy.load('en_core_web_sm')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "5egxj7m2AjXj"
},
"outputs": [],
"source": [
"# Write a function to display basic entity info:\n",
"def show_ents(doc):\n",
" if doc.ents:\n",
" for ent in doc.ents:\n",
" print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))\n",
" else:\n",
" print('No named entities found.')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "PU_SegPsPbNm",
"outputId": "71fb6119-64f8-4e91-823b-96ea9ab81937"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Milaan Parmar CS - ORG - Companies, agencies, institutions, etc.\n"
]
}
],
"source": [
"doc = nlp(u'Hi, everyone welcome to Milaan Parmar CS tutorial on NPL')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "hOE3UPZ5AjXo",
"outputId": "89e61213-11e1-4ba3-8144-fa101ddc4053",
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"England - GPE - Countries, cities, states\n",
"Canada - GPE - Countries, cities, states\n",
"next month - DATE - Absolute or relative dates or periods\n"
]
}
],
"source": [
"doc = nlp(u'May I go to England or Canada, next month to see the virus report?')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YYSnge4iAjXx"
},
"source": [
"## Entity annotations\n",
"`Doc.ents` are token spans with their own set of annotations.\n",
"
\n",
"| `ent.text` | The original entity text |
\n",
"| `ent.label` | The entity type's hash value |
\n",
"| `ent.label_` | The entity type's string description |
\n",
"| `ent.start` | The token span's *start* index position in the Doc |
\n",
"| `ent.end` | The token span's *stop* index position in the Doc |
\n",
"| `ent.start_char` | The entity text's *start* index position in the Doc |
\n",
"| `ent.end_char` | The entity text's *stop* index position in the Doc |
\n",
"
\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CU0mEsoPAjXx",
"outputId": "cb31c06a-fba3-4ad2-f376-42afa0bee2fd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"500 dollars 4 6 20 31 MONEY\n",
"Blake 7 8 37 42 PERSON\n",
"Microsoft 11 12 55 64 ORG\n"
]
}
],
"source": [
"doc = nlp(u'Can I please borrow 500 dollars from Blake to buy some Microsoft stock?')\n",
"\n",
"for ent in doc.ents:\n",
" print(ent.text, ent.start, ent.end, ent.start_char, ent.end_char, ent.label_)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cvNQ5nFGAjX1"
},
"source": [
"## NER Tags\n",
"Tags are accessible through the `.label_` property of an entity.\n",
"\n",
"| TYPE | DESCRIPTION | EXAMPLE |
\n",
"| `PERSON` | People, including fictional. | *Fred Flintstone* |
\n",
"| `NORP` | Nationalities or religious or political groups. | *The Republican Party* |
\n",
"| `FAC` | Buildings, airports, highways, bridges, etc. | *Logan International Airport, The Golden Gate* |
\n",
"| `ORG` | Companies, agencies, institutions, etc. | *Microsoft, FBI, MIT* |
\n",
"| `GPE` | Countries, cities, states. | *France, UAR, Chicago, Idaho* |
\n",
"| `LOC` | Non-GPE locations, mountain ranges, bodies of water. | *Europe, Nile River, Midwest* |
\n",
"| `PRODUCT` | Objects, vehicles, foods, etc. (Not services.) | *Formula 1* |
\n",
"| `EVENT` | Named hurricanes, battles, wars, sports events, etc. | *Olympic Games* |
\n",
"| `WORK_OF_ART` | Titles of books, songs, etc. | *The Mona Lisa* |
\n",
"| `LAW` | Named documents made into laws. | *Roe v. Wade* |
\n",
"| `LANGUAGE` | Any named language. | *English* |
\n",
"| `DATE` | Absolute or relative dates or periods. | *20 July 1969* |
\n",
"| `TIME` | Times smaller than a day. | *Four hours* |
\n",
"| `PERCENT` | Percentage, including \"%\". | *Eighty percent* |
\n",
"| `MONEY` | Monetary values, including unit. | *Twenty Cents* |
\n",
"| `QUANTITY` | Measurements, as of weight or distance. | *Several kilometers, 55kg* |
\n",
"| `ORDINAL` | \"first\", \"second\", etc. | *9th, Ninth* |
\n",
"| `CARDINAL` | Numerals that do not fall under another type. | *2, Two, Fifty-two* |
\n",
"
"
]
},
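{
"cell_type": "markdown",
"metadata": {},
"source": [
"The descriptions in the table above are also available programmatically via `spacy.explain()`. A quick sketch (any tag string from the table works):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Look up a tag's description without memorizing the table:\n",
"for tag in ['PERSON', 'GPE', 'WORK_OF_ART']:\n",
"    print(tag, '-', spacy.explain(tag))"
]
},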
{
"cell_type": "markdown",
"metadata": {
"id": "Gvqo94OAAjX2"
},
"source": [
"___\n",
"### **Adding a Named Entity to a Span**\n",
"Normally we would have spaCy build a library of named entities by training it on several samples of text.
In this case, we only want to add one value:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6QFBkrcqAjX2",
"outputId": "1f9083c2-ad2a-474c-90a1-de6ef81219e3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Arthur - ORG - Companies, agencies, institutions, etc.\n",
"U.K. - GPE - Countries, cities, states\n",
"$6 million - MONEY - Monetary values, including unit\n"
]
}
],
"source": [
"doc = nlp(u'Arthur to build a U.K. factory for $6 million')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Etm6Z9CWGeRA"
},
"source": [
"Add **Milaan** as **PERSON**"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 248
},
"id": "G_8tbV5rAjX7",
"outputId": "a19effa8-2ae6-4e38-c4cd-a15c0e92168d"
},
"outputs": [
{
"ename": "ValueError",
"evalue": "ignored",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;31m# Add the entity to the existing Doc object\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m \u001b[0mdoc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0ments\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdoc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0ments\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mnew_ent\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32mdoc.pyx\u001b[0m in \u001b[0;36mspacy.tokens.doc.Doc.ents.__set__\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: [E103] Trying to set conflicting doc.ents: '(0, 1, 'ORG')' and '(0, 1, 'PERSON')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap."
]
}
],
"source": [
"from spacy.tokens import Span\n",
"\n",
"# Get the hash value of the ORG entity label\n",
"ORG = doc.vocab.strings[u'PERSON'] \n",
"\n",
"# Create a Span for the new entity\n",
"new_ent = Span(doc, 0, 1, label=ORG)\n",
"\n",
"# Add the entity to the existing Doc object\n",
"doc.ents = list(doc.ents) + [new_ent]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "unAVoOAxAjX_"
},
"source": [
"In the code above, the arguments passed to `Span()` are:\n",
"- `doc` - the name of the Doc object\n",
"- `0` - the *start* index position of the span\n",
"- `1` - the *stop* index position (exclusive)\n",
"- `label=PERSON` - the label assigned to our entity"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JtoYKMknAjYA",
"outputId": "aaf8d3ee-2655-432b-d70a-dc79208af5a0",
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Arthur - ORG - Companies, agencies, institutions, etc.\n",
"U.K. - GPE - Countries, cities, states\n",
"$6 million - MONEY - Monetary values, including unit\n"
]
}
],
"source": [
"show_ents(doc)"
]
},
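{
"cell_type": "markdown",
"metadata": {},
"source": [
"As expected, 'Arthur' is still `ORG`. A minimal sketch of the fix, reusing the same `doc` and `new_ent` from above (the overlap filter is our addition, not part of the original lesson):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Drop any existing entity that overlaps the new span, then add the span.\n",
"# This avoids the E103 conflict raised above.\n",
"doc.ents = [e for e in doc.ents if e.end <= new_ent.start or e.start >= new_ent.end] + [new_ent]\n",
"\n",
"show_ents(doc)"
]
},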
{
"cell_type": "markdown",
"metadata": {
"id": "nPNZAomgAjYD"
},
"source": [
"___\n",
"## Adding Named Entities to All Matching Spans\n",
"What if we want to tag *all* occurrences of \"WORDS\"? WE NEED TO use the PhraseMatcher to identify a series of spans in the Doc:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eocm1Oq9AjYE",
"outputId": "2a4a0f1d-a6f9-4256-df81-54d90528ddcc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"first - ORDINAL - \"first\", \"second\", etc.\n"
]
}
],
"source": [
"doc = nlp(u'Our company plans to introduce a new vacuum cleaner. '\n",
" u'If successful, the vacuum cleaner will be our first product.')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "x_ZfN0WJAjYH"
},
"outputs": [],
"source": [
"# Import PhraseMatcher and create a matcher object:\n",
"from spacy.matcher import PhraseMatcher\n",
"matcher = PhraseMatcher(nlp.vocab)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "WwTZbId2AjYK"
},
"outputs": [],
"source": [
"# Create the desired phrase patterns:\n",
"phrase_list = ['vacuum cleaner', 'vacuum-cleaner']\n",
"phrase_patterns = [nlp(text) for text in phrase_list]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zdvup9bOAjYN",
"outputId": "c4eff7bc-0341-4460-98a5-2a333ea26820"
},
"outputs": [
{
"data": {
"text/plain": [
"[(2689272359382549672, 7, 9), (2689272359382549672, 14, 16)]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Apply the patterns to our matcher object:\n",
"matcher.add('newproduct', None, *phrase_patterns)\n",
"\n",
"# Apply the matcher to our Doc object:\n",
"matches = matcher(doc)\n",
"\n",
"# See what matches occur:\n",
"matches"
]
},
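{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each match is a `(match_id, start, end)` tuple: the hash of the pattern name plus the token-span boundaries, which is why the next cell slices with `match[1]` and `match[2]`. (Note: `matcher.add(name, None, *patterns)` is the spaCy 2.x signature this notebook uses; spaCy 3.x expects `matcher.add(name, patterns)`.) A quick sketch to make the tuples readable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Decode each match: recover the pattern name from its hash,\n",
"# and the matched text by slicing the Doc with the token indices:\n",
"for match_id, start, end in matches:\n",
"    print(nlp.vocab.strings[match_id], '->', doc[start:end].text)"
]
},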
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "mcRv7ezzAjYS"
},
"outputs": [],
"source": [
"# Here we create Spans from each match, and create named entities from them:\n",
"from spacy.tokens import Span\n",
"\n",
"PROD = doc.vocab.strings[u'PRODUCT']\n",
"\n",
"new_ents = [Span(doc, match[1],match[2],label=PROD) for match in matches]\n",
"\n",
"doc.ents = list(doc.ents) + new_ents"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Hn77OkeNAjYW",
"outputId": "0f5c8750-5c9b-404b-fc87-313dfda6edcc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"vacuum cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)\n",
"vacuum cleaner - PRODUCT - Objects, vehicles, foods, etc. (not services)\n",
"first - ORDINAL - \"first\", \"second\", etc.\n"
]
}
],
"source": [
"show_ents(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6WLXgXDYAjYa"
},
"source": [
"___\n",
"## Counting Entities\n",
"While spaCy may not have a built-in tool for counting entities, we can pass a conditional statement into a list comprehension:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Y4AQj2yfAjYb",
"outputId": "0df969e7-a578-44e9-e888-23aee6d168fb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"29.50 - MONEY - Monetary values, including unit\n",
"five dollars - MONEY - Monetary values, including unit\n"
]
}
],
"source": [
"doc = nlp(u'Originally priced at $29.50, the sweater was marked down to five dollars.')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NPr5Q-eWAjYi",
"outputId": "ffe19b4c-3f43-4bc2-e498-50eb6f6b7df6"
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len([ent for ent in doc.ents if ent.label_=='MONEY'])"
]
},
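{
"cell_type": "markdown",
"metadata": {},
"source": [
"To tally every label at once rather than one label at a time, a `collections.Counter` over the entity labels is a handy sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Count entities by label across the whole Doc:\n",
"Counter(ent.label_ for ent in doc.ents)"
]
},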
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "xw0U1DC5AjYp",
"outputId": "180a517b-e326-4cac-f735-095072785cb8"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'2.2.4'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spacy.__version__"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MYzkCgmfAjYu",
"outputId": "4ed899d6-9e4c-403a-cf13-4a649309b4e0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"29.50 - MONEY - Monetary values, including unit\n",
"five dollars - MONEY - Monetary values, including unit\n"
]
}
],
"source": [
"\n",
"doc = nlp(u'Originally priced at $29.50,\\nthe sweater was marked down to five dollars.')\n",
"\n",
"show_ents(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8E0XnrraAjYy"
},
"source": [
"### However, there is a simple fix that can be added to the nlp pipeline:\n",
"\n",
"https://spacy.io/usage/processing-pipelines"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"id": "jkNfED2IAjYz"
},
"outputs": [],
"source": [
"# Quick function to remove ents formed on whitespace:\n",
"def remove_whitespace_entities(doc):\n",
" doc.ents = [e for e in doc.ents if not e.text.isspace()]\n",
" return doc\n",
"\n",
"# Insert this into the pipeline AFTER the ner component:\n",
"nlp.add_pipe(remove_whitespace_entities, after='ner')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KyDsaxhFAjY2",
"outputId": "2fc120a8-161d-47e0-e118-d7578e55f2ba"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"29.50 - MONEY - Monetary values, including unit\n",
"five dollars - MONEY - Monetary values, including unit\n"
]
}
],
"source": [
"# Rerun nlp on the text above, and show ents:\n",
"doc = nlp(u'Originally priced at $29.50,\\nthe sweater was marked down to five dollars.')\n",
"\n",
"show_ents(doc)"
]
},
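{
"cell_type": "markdown",
"metadata": {},
"source": [
"Passing a bare function to `nlp.add_pipe()` only works in spaCy 2.x, the version this notebook runs (`2.2.4`, as shown above). On spaCy 3.x the component must be registered by name first; a minimal sketch of the equivalent, left commented out so it does not break a 2.x run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# spaCy 3.x equivalent of the pipeline fix above:\n",
"# from spacy.language import Language\n",
"#\n",
"# @Language.component('remove_whitespace_entities')\n",
"# def remove_whitespace_entities(doc):\n",
"#     doc.ents = [e for e in doc.ents if not e.text.isspace()]\n",
"#     return doc\n",
"#\n",
"# nlp.add_pipe('remove_whitespace_entities', after='ner')"
]
},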
{
"cell_type": "markdown",
"metadata": {
"id": "5nr_1GyZAjY7"
},
"source": [
"For more on **Named Entity Recognition** visit https://spacy.io/usage/linguistic-features#101"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_dcHMWK9AjY8"
},
"source": [
"___\n",
"## Noun Chunks\n",
"`Doc.noun_chunks` are *base noun phrases*: token spans that include the noun and words describing the noun. Noun chunks cannot be nested, cannot overlap, and do not involve prepositional phrases or relative clauses.
\n",
"Where `Doc.ents` rely on the **ner** pipeline component, `Doc.noun_chunks` are provided by the **parser**."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YUtcqSTZAjY8"
},
"source": [
"### `noun_chunks` components:\n",
"\n",
"| `.text` | The original noun chunk text. |
\n",
"| `.root.text` | The original text of the word connecting the noun chunk to the rest of the parse. |
\n",
"| `.root.dep_` | Dependency relation connecting the root to its head. |
\n",
"| `.root.head.text` | The text of the root token's head. |
\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "v6cVi0G5AjY9",
"outputId": "f9355e51-e6a4-44c4-a997-ae6dac031c37"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Autonomous cars - cars - nsubj - shift\n",
"insurance liability - liability - dobj - shift\n",
"manufacturers - manufacturers - pobj - toward\n"
]
}
],
"source": [
"doc = nlp(u\"Autonomous cars shift insurance liability toward manufacturers.\")\n",
"\n",
"for chunk in doc.noun_chunks:\n",
" print(chunk.text+' - '+chunk.root.text+' - '+chunk.root.dep_+' - '+chunk.root.head.text)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F7BZ5FqMAjZA"
},
"source": [
"### `Doc.noun_chunks` is a generator function\n",
"Previously we mentioned that `Doc` objects do not retain a list of sentences, but they're available through the `Doc.sents` generator.
It's the same with `Doc.noun_chunks` - lists can be created if needed:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 163
},
"id": "0vZ5biA4AjZA",
"outputId": "156aac33-eff2-45bf-8420-dacb2bfb9b6d"
},
"outputs": [
{
"ename": "TypeError",
"evalue": "ignored",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdoc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnoun_chunks\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: object of type 'generator' has no len()"
]
}
],
"source": [
"len(doc.noun_chunks)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Wb-4fGA4AjZF",
"outputId": "82adf8ab-b87f-4c00-8b52-ae3fb3ce7b5a"
},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(list(doc.noun_chunks))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-HUINE9GAjZK"
},
"source": [
"For more on **noun_chunks** visit https://spacy.io/usage/linguistic-features#noun-chunks"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "6_Named_Entity_Recognition.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
}