{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "db5f4f9a-7776-42b3-8758-85624d4c15ea", "metadata": { "id": "db5f4f9a-7776-42b3-8758-85624d4c15ea" }, "source": [ "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "21e9eafb", "metadata": { "id": "21e9eafb" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/80.1.Legal_Subpoenas_NER.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "gk3kZHmNj51v", "metadata": { "collapsed": false, "id": "gk3kZHmNj51v" }, "source": [ "# 🎬 Installation" ] }, { "cell_type": "code", "execution_count": null, "id": "_914itZsj51v", "metadata": { "id": "_914itZsj51v", "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "! pip install -q johnsnowlabs" ] }, { "attachments": {}, "cell_type": "markdown", "id": "YPsbAnNoPt0Z", "metadata": { "id": "YPsbAnNoPt0Z" }, "source": [ "## 🔗 Automatic Installation\n", "Using my.johnsnowlabs.com SSO" ] }, { "cell_type": "code", "execution_count": null, "id": "fY0lcShkj51w", "metadata": { "id": "fY0lcShkj51w", "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "from johnsnowlabs import nlp, finance, legal\n", "\n", "nlp.install(refresh_install=True, visual=True, force_browser = True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "hsJvn_WWM2GL", "metadata": { "id": "hsJvn_WWM2GL" }, "source": [ "## 🔗 Manual downloading\n", "If you are not registered at my.johnsnowlabs.com, received your license by email, or are using Safari, you may need to install the license manually.\n", "\n", "- Go to my.johnsnowlabs.com\n", "- Download your license\n", "- Upload it using the following command" ] }, { "cell_type": "code", "execution_count": null, "id": "i57QV3-_P2sQ", "metadata": { "id": "i57QV3-_P2sQ" }, "outputs": [], 
"source": [ "from google.colab import files\n", "print('Please Upload your John Snow Labs License using the button below')\n", "license_keys = files.upload()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "xGgNdFzZP_hQ", "metadata": { "id": "xGgNdFzZP_hQ" }, "source": [ "- Install it" ] }, { "cell_type": "code", "execution_count": null, "id": "OfmmPqknP4rR", "metadata": { "id": "OfmmPqknP4rR" }, "outputs": [], "source": [ "nlp.install()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "DCl5ErZkNNLk", "metadata": { "id": "DCl5ErZkNNLk" }, "source": [ "# 📌 Starting" ] }, { "cell_type": "code", "execution_count": null, "id": "wRXTnNl3j51w", "metadata": { "id": "wRXTnNl3j51w" }, "outputs": [], "source": [ "spark = nlp.start()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "FmR3AkNEMjEO", "metadata": { "id": "FmR3AkNEMjEO" }, "source": [ "## 🔎 **Legal Subpoenas NER (small)**\n", "\n", "\n", "✍️Explanation:\n", "\n", " Legal Subpoenas NER (small) is a pretrained named entity recognition (NER) model designed for legal text processing. \n", "\n", "- The `legner_subpoena` model is trained specifically to recognize and extract information related to subpoenas in legal documents. A subpoena is a legal document issued by a court that commands an individual or organization to provide specific documents, testimony, or evidence relevant to a legal case. 
Recognizing and extracting subpoena-related information from large volumes of legal texts can be a time-consuming task, and the `legner_subpoena` model is designed to automate this process.\n", "\n", "📚Entities:\n", "\n", "- `ADDRESS`, `MATTER_VS`, `APPOINTMENT_HOUR`, `DOCUMENT_TOPIC`, `DOCUMENT_PERSON`, `COURT_ADDRESS`, `APPOINTMENT_DATE`, `COUNTY`, `CASE`, `SIGNER`, `COURT`, `DOCUMENT_DATE_TO`, `DOCUMENT_TYPE`, `STATE`, `DOCUMENT_DATE_FROM`, `RECEIVER`, `MATTER`, `SUBPOENA_DATE`, `DOCUMENT_DATE_YEAR`\n" ] }, { "cell_type": "code", "execution_count": 5, "id": "YfObKaqZY4kC", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YfObKaqZY4kC", "outputId": "7d9e7456-a5e6-4fc1-b18b-c33f05c73494" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "roberta_embeddings_legal_roberta_base download started this may take some time.\n", "Approximate size to download 447.2 MB\n", "[OK!]\n", "legner_subpoena download started this may take some time.\n", "[OK!]\n" ] } ], "source": [ "document = nlp.DocumentAssembler()\\\n", "    .setInputCol(\"text\")\\\n", "    .setOutputCol(\"document\")\n", "\n", "textSplitter = legal.TextSplitter()\\\n", "    .setInputCols(['document'])\\\n", "    .setOutputCol('sentence')\n", "\n", "token = nlp.Tokenizer()\\\n", "    .setInputCols(['sentence'])\\\n", "    .setOutputCol('token')\n", "\n", "roberta_embeddings = nlp.RoBertaEmbeddings.pretrained(\"roberta_embeddings_legal_roberta_base\",\"en\") \\\n", "    .setInputCols([\"sentence\", \"token\"]) \\\n", "    .setOutputCol(\"embeddings\") \\\n", "    .setMaxSentenceLength(512)\n", "\n", "loaded_ner_model = legal.NerModel.pretrained('legner_subpoena','en','legal/models')\\\n", "    .setInputCols([\"sentence\", \"token\", \"embeddings\"])\\\n", "    .setOutputCol(\"ner\")\n", "\n", "converter = nlp.NerConverter()\\\n", "    .setInputCols([\"document\", \"token\", \"ner\"])\\\n", "    .setOutputCol(\"ner_span\")\n", "\n", "ner_prediction_pipeline = nlp.Pipeline(stages = [\n", "    document,\n", 
textSplitter,\n", " token,\n", " roberta_embeddings,\n", " loaded_ner_model,\n", " converter\n", " ])\n", "\n", "empty_data = spark.createDataFrame([['']]).toDF(\"text\")\n", "\n", "prediction_model = ner_prediction_pipeline.fit(empty_data)\n", "\n", "text = \"\"\"SUBPOENA TO PRODUCE DOCUMENTS, INFORMATION, OR OBJECTS OR TO PERMIT INSPECTION OF PREMISES IN A CIVIL ACTION\n", "\n", "UNITED STATES DISTRICT COURT\n", "DISTRICT OF NEW YORK\n", "\n", "Plaintiff: Chang Lee\n", "v.\n", "Defendant: Jie Chen\n", "\n", "To: Kim Nguyen\n", "789 Elm Street\n", "New York, NY 10003\n", "\n", "You are hereby commanded to produce at the time, date, and place set forth below the following documents, electronically stored information, or tangible things:\n", "\n", "All financial records, including bank statements, credit card statements, and tax returns for Jie Chen from January 1, 2017 to present;\n", "All emails and other correspondence between Jie Chen and any business partners, associates or employees related to the above financial records from January 1, 2017 to present;\n", "All contracts and agreements entered into by Jie Chen, including any non-disclosure agreements, from January 1, 2017 to present.\n", "The production shall occur at the following time and location:\n", "\n", "Date: August 15, 2023\n", "Time: 10:00 a.m.\n", "Location: Law Office of Lee & Associates, 456 Broadway, Suite 800, New York, NY 10003.\n", "\n", "You are further commanded to preserve and protect the confidentiality of any documents, electronically stored information, or tangible things produced or inspected, in accordance with the applicable law or agreement.\n", "\n", "You are not required to produce or permit inspection of any privileged or protected documents or information.\n", "\n", "This subpoena is issued by the court at the request of the Plaintiff's attorney, and you are hereby ordered to comply with this subpoena as provided by the Federal Rules of Civil Procedure.\n", "\n", "You must 
comply with this subpoena under the penalty of law.\n", "\n", "Dated: May 4, 2023\n", "\n", "[Signature of Clerk of Court]\n", "By: Sarah Johnson\n", "Deputy Clerk\"\"\"\n", "\n", "sample_data = spark.createDataFrame([[text]]).toDF(\"text\")\n", "\n", "result = prediction_model.transform(sample_data)" ] }, { "cell_type": "code", "execution_count": 6, "id": "UsjsWeSzKrjv", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UsjsWeSzKrjv", "outputId": "b1597914-c3e7-4bfb-e6cd-d3290fb76051" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+--------------+--------------------+----------+\n", "| token| ner_label|confidence|\n", "+--------------+--------------------+----------+\n", "| SUBPOENA| O| 1.0|\n", "| TO| O| 1.0|\n", "| PRODUCE| O| 0.9991|\n", "| DOCUMENTS| B-DOCUMENT_TYPE| 0.9844|\n", "| ,| O| 1.0|\n", "| INFORMATION| B-DOCUMENT_TYPE| 0.9345|\n", "| ,| O| 0.9993|\n", "| OR| O| 0.9999|\n", "| OBJECTS| O| 0.9624|\n", "| OR| O| 1.0|\n", "| TO| O| 1.0|\n", "| PERMIT| O| 1.0|\n", "| INSPECTION| O| 0.9982|\n", "| OF| O| 1.0|\n", "| PREMISES| O| 0.9966|\n", "| IN| O| 0.9999|\n", "| A| O| 0.9999|\n", "| CIVIL| O| 0.9797|\n", "| ACTION| O| 0.9995|\n", "| UNITED| O| 0.9977|\n", "| STATES| O| 0.985|\n", "| DISTRICT| O| 0.5852|\n", "| COURT| O| 0.3786|\n", "| DISTRICT| O| 0.4604|\n", "| OF| O| 0.9222|\n", "| NEW| B-STATE| 0.3463|\n", "| YORK| I-COUNTY| 0.6732|\n", "| Plaintiff| O| 0.7924|\n", "| :| O| 0.9997|\n", "| Chang| B-MATTER| 0.6147|\n", "| Lee| I-MATTER| 0.8425|\n", "| v| O| 0.98|\n", "| .| O| 0.9977|\n", "| Defendant| O| 0.7419|\n", "| :| O| 1.0|\n", "| Jie| B-RECEIVER| 0.7958|\n", "| Chen| I-RECEIVER| 0.5887|\n", "| To| O| 0.997|\n", "| :| O| 0.9992|\n", "| Kim| B-RECEIVER| 0.8381|\n", "| Nguyen| I-RECEIVER| 0.7569|\n", "| 789| B-ADDRESS| 0.9807|\n", "| Elm| I-ADDRESS| 0.9924|\n", "| Street| I-ADDRESS| 0.9932|\n", "| New| I-ADDRESS| 0.9938|\n", "| York| I-ADDRESS| 0.992|\n", "| ,| I-ADDRESS| 0.9922|\n", "| NY| I-ADDRESS| 
0.9936|\n", "| 10003| I-ADDRESS| 0.9934|\n", "| You| O| 1.0|\n", "| are| O| 1.0|\n", "| hereby| O| 0.9999|\n", "| commanded| O| 0.9998|\n", "| to| O| 1.0|\n", "| produce| O| 0.9996|\n", "| at| O| 1.0|\n", "| the| O| 1.0|\n", "| time| O| 1.0|\n", "| ,| O| 1.0|\n", "| date| O| 0.9999|\n", "| ,| O| 0.9998|\n", "| and| O| 1.0|\n", "| place| O| 1.0|\n", "| set| O| 1.0|\n", "| forth| O| 1.0|\n", "| below| O| 1.0|\n", "| the| O| 1.0|\n", "| following| O| 1.0|\n", "| documents| B-DOCUMENT_TYPE| 0.9848|\n", "| ,| O| 0.9997|\n", "|electronically| B-DOCUMENT_TYPE| 0.9925|\n", "| stored| I-DOCUMENT_TYPE| 0.9401|\n", "| information| I-DOCUMENT_TYPE| 0.9836|\n", "| ,| O| 0.9996|\n", "| or| O| 0.9995|\n", "| tangible| O| 0.948|\n", "| things| O| 0.9717|\n", "| :| O| 1.0|\n", "| All| O| 0.9995|\n", "| financial| B-DOCUMENT_TYPE| 0.9791|\n", "| records| I-DOCUMENT_TYPE| 0.9918|\n", "| ,| O| 0.9999|\n", "| including| O| 1.0|\n", "| bank| B-DOCUMENT_TYPE| 0.8418|\n", "| statements| I-DOCUMENT_TYPE| 0.963|\n", "| ,| O| 0.9993|\n", "| credit| B-DOCUMENT_TYPE| 0.7652|\n", "| card| I-DOCUMENT_TYPE| 0.4994|\n", "| statements| I-DOCUMENT_TYPE| 0.9563|\n", "| ,| O| 0.9997|\n", "| and| O| 0.9999|\n", "| tax| B-DOCUMENT_TYPE| 0.8595|\n", "| returns| I-DOCUMENT_TYPE| 0.9063|\n", "| for| O| 0.9999|\n", "| Jie| B-DOCUMENT_PERSON| 0.9927|\n", "| Chen| I-DOCUMENT_PERSON| 0.9865|\n", "| from| O| 0.9997|\n", "| January|B-DOCUMENT_DATE_FROM| 0.9993|\n", "| 1|I-DOCUMENT_DATE_FROM| 0.9997|\n", "| ,|I-DOCUMENT_DATE_FROM| 0.9995|\n", "| 2017|I-DOCUMENT_DATE_FROM| 0.9981|\n", "| to| O| 0.9998|\n", "| present| O| 0.9904|\n", "| ;| O| 1.0|\n", "| All| O| 0.9907|\n", "| emails| B-DOCUMENT_TYPE| 0.9979|\n", "| and| O| 0.9999|\n", "| other| O| 0.9319|\n", "|correspondence| B-DOCUMENT_TYPE| 0.9553|\n", "| between| O| 0.9998|\n", "| Jie| B-DOCUMENT_PERSON| 0.9817|\n", "| Chen| I-DOCUMENT_PERSON| 0.9883|\n", "| and| O| 0.9998|\n", "| any| O| 0.9997|\n", "| business| B-DOCUMENT_PERSON| 0.6979|\n", "| partners| 
I-DOCUMENT_PERSON| 0.4181|\n", "| ,| O| 1.0|\n", "| associates| B-DOCUMENT_PERSON| 0.6085|\n", "| or| O| 0.9997|\n", "| employees| B-DOCUMENT_PERSON| 0.9321|\n", "| related| O| 0.9999|\n", "| to| O| 0.9999|\n", "| the| O| 0.9998|\n", "| above| O| 0.9998|\n", "| financial| B-DOCUMENT_TYPE| 0.4994|\n", "| records| I-DOCUMENT_TYPE| 0.6143|\n", "| from| O| 0.9997|\n", "| January|B-DOCUMENT_DATE_FROM| 0.9991|\n", "| 1|I-DOCUMENT_DATE_FROM| 0.9998|\n", "| ,|I-DOCUMENT_DATE_FROM| 0.9994|\n", "| 2017|I-DOCUMENT_DATE_FROM| 0.9959|\n", "| to| O| 1.0|\n", "| present| O| 0.9958|\n", "| ;| O| 1.0|\n", "| All| O| 0.9994|\n", "| contracts| B-DOCUMENT_TYPE| 0.9421|\n", "| and| O| 0.9998|\n", "| agreements| B-DOCUMENT_TYPE| 0.9462|\n", "| entered| O| 0.9382|\n", "| into| O| 0.9981|\n", "| by| O| 0.9998|\n", "| Jie| B-DOCUMENT_PERSON| 0.9464|\n", "| Chen| I-DOCUMENT_PERSON| 0.9799|\n", "| ,| O| 0.9996|\n", "| including| O| 1.0|\n", "| any| O| 1.0|\n", "|non-disclosure| O| 0.931|\n", "| agreements| B-DOCUMENT_TYPE| 0.3859|\n", "| ,| O| 0.9999|\n", "| from| O| 0.9998|\n", "| January|B-DOCUMENT_DATE_FROM| 0.9992|\n", "| 1|I-DOCUMENT_DATE_FROM| 0.9998|\n", "| ,|I-DOCUMENT_DATE_FROM| 0.999|\n", "| 2017|I-DOCUMENT_DATE_FROM| 0.9978|\n", "| to| O| 1.0|\n", "| present| O| 0.9993|\n", "| .| O| 0.9998|\n", "| The| O| 0.9996|\n", "| production| O| 0.9709|\n", "| shall| O| 1.0|\n", "| occur| O| 1.0|\n", "| at| O| 1.0|\n", "| the| O| 1.0|\n", "| following| O| 1.0|\n", "| time| O| 1.0|\n", "| and| O| 1.0|\n", "| location| O| 1.0|\n", "| :| O| 1.0|\n", "| Date| O| 1.0|\n", "| :| O| 1.0|\n", "| August| B-APPOINTMENT_DATE| 0.8871|\n", "| 15| I-APPOINTMENT_DATE| 0.856|\n", "| ,| I-APPOINTMENT_DATE| 0.9067|\n", "| 2023| I-APPOINTMENT_DATE| 0.9204|\n", "| Time| O| 1.0|\n", "| :| O| 1.0|\n", "| 10:00| B-APPOINTMENT_HOUR| 0.9982|\n", "| a.m| I-APPOINTMENT_HOUR| 0.9995|\n", "| .| O| 0.9653|\n", "| Location| O| 0.9998|\n", "| :| O| 1.0|\n", "| Law| O| 0.9499|\n", "| Office| O| 0.9776|\n", "| of| O| 
0.9892|\n", "| Lee| B-DOCUMENT_PERSON| 0.8784|\n", "| &| I-DOCUMENT_PERSON| 0.9816|\n", "| Associates| I-DOCUMENT_PERSON| 0.9753|\n", "| ,| O| 0.9478|\n", "| 456| B-COURT_ADDRESS| 0.5613|\n", "| Broadway| I-COURT_ADDRESS| 0.7624|\n", "| ,| I-COURT_ADDRESS| 0.8556|\n", "| Suite| I-COURT_ADDRESS| 0.9617|\n", "| 800| I-COURT_ADDRESS| 0.9469|\n", "| ,| I-COURT_ADDRESS| 0.907|\n", "| New| I-COURT_ADDRESS| 0.8847|\n", "| York| I-COURT_ADDRESS| 0.8566|\n", "| ,| I-COURT_ADDRESS| 0.7641|\n", "| NY| I-COURT_ADDRESS| 0.7735|\n", "| 10003| I-COURT_ADDRESS| 0.7114|\n", "| .| O| 0.8257|\n", "+--------------+--------------------+----------+\n", "only showing top 200 rows\n", "\n" ] } ], "source": [ "from pyspark.sql import functions as F\n", "\n", "result.select(F.explode(F.arrays_zip(result.token.result, \n", " result.ner.result, \n", " result.ner.metadata)).alias(\"cols\"))\\\n", " .select(F.expr(\"cols['0']\").alias(\"token\"),\n", " F.expr(\"cols['1']\").alias(\"ner_label\"),\n", " F.expr(\"cols['2']['confidence']\").alias(\"confidence\")).show(200, truncate=100)" ] }, { "cell_type": "code", "execution_count": 7, "id": "0LVgACqoK-9r", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0LVgACqoK-9r", "outputId": "575d1293-3e82-4823-b7ea-7609da98f405" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+---------------------------------+------------------+----------+\n", "|chunk |ner_label |confidence|\n", "+---------------------------------+------------------+----------+\n", "|DOCUMENTS |DOCUMENT_TYPE |0.9844 |\n", "|INFORMATION |DOCUMENT_TYPE |0.9345 |\n", "|NEW YORK |STATE |0.50975 |\n", "|Chang Lee |MATTER |0.7286 |\n", "|Jie Chen |RECEIVER |0.69225 |\n", "|Kim Nguyen |RECEIVER |0.7975 |\n", "|789 Elm Street\n", "New York, NY 10003|ADDRESS |0.99141246|\n", "|documents |DOCUMENT_TYPE |0.9848 |\n", "|electronically stored information|DOCUMENT_TYPE |0.9720667 |\n", "|financial records |DOCUMENT_TYPE |0.98545 |\n", "|bank statements 
|DOCUMENT_TYPE |0.9024 |\n", "|credit card statements |DOCUMENT_TYPE |0.7403 |\n", "|tax returns |DOCUMENT_TYPE |0.8829 |\n", "|Jie Chen |DOCUMENT_PERSON |0.9896 |\n", "|January 1, 2017 |DOCUMENT_DATE_FROM|0.99915004|\n", "|emails |DOCUMENT_TYPE |0.9979 |\n", "|correspondence |DOCUMENT_TYPE |0.9553 |\n", "|Jie Chen |DOCUMENT_PERSON |0.985 |\n", "|business partners |DOCUMENT_PERSON |0.55799997|\n", "|associates |DOCUMENT_PERSON |0.6085 |\n", "+---------------------------------+------------------+----------+\n", "only showing top 20 rows\n", "\n" ] } ], "source": [ "result.select(F.explode(F.arrays_zip(result.ner_span.result, result.ner_span.metadata)).alias(\"cols\")) \\\n", " .select(F.expr(\"cols['0']\").alias(\"chunk\"),\n", " F.expr(\"cols['1']['entity']\").alias(\"ner_label\"),\n", " F.expr(\"cols['1']['confidence']\").alias(\"confidence\")).show(truncate=False)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "tUOsiWnbOEeG", "metadata": { "id": "tUOsiWnbOEeG" }, "source": [ "### 🖨️ **Getting Result with LightPipeline**" ] }, { "cell_type": "code", "execution_count": 8, "id": "5ceGwzl0LJrE", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 677 }, "id": "5ceGwzl0LJrE", "outputId": "18f9a575-d907-4bb5-850f-37cfa7fa1ced" }, "outputs": [ { "data": { "text/plain": [ " chunks begin end sentence_id \\\n", "0 DOCUMENTS 20 28 0 \n", "1 INFORMATION 31 41 0 \n", "2 NEW YORK 151 158 0 \n", "3 Chang Lee 172 180 0 \n", "4 Jie Chen 196 203 0 \n", "5 Kim Nguyen 210 219 0 \n", "6 789 Elm Street\\nNew York, NY 10003 221 253 0 \n", "7 documents 351 359 0 \n", "8 electronically stored information 362 394 0 \n", "9 financial records 422 438 0 \n", "10 bank statements 451 465 0 \n", "11 credit card statements 468 489 0 \n", "12 tax returns 496 506 0 \n", "13 Jie Chen 512 519 0 \n", "14 January 1, 2017 526 540 0 \n", "15 emails 558 563 0 \n", "16 correspondence 575 588 0 \n", "17 Jie Chen 598 605 0 \n", "18 business partners 615 631 0 \n", "19 associates 634 643 0 \n", "\n", " entities \n", "0 DOCUMENT_TYPE \n", "1 DOCUMENT_TYPE \n", "2 STATE \n", "3 MATTER \n", "4 RECEIVER \n", "5 RECEIVER \n", "6 ADDRESS \n", "7 DOCUMENT_TYPE \n", "8 DOCUMENT_TYPE \n", "9 DOCUMENT_TYPE \n", "10 DOCUMENT_TYPE \n", "11 DOCUMENT_TYPE \n", "12 DOCUMENT_TYPE \n", "13 DOCUMENT_PERSON \n", "14 DOCUMENT_DATE_FROM \n", "15 DOCUMENT_TYPE \n", "16 DOCUMENT_TYPE \n", "17 DOCUMENT_PERSON \n", "18 DOCUMENT_PERSON \n", "19 DOCUMENT_PERSON " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "light_model = nlp.LightPipeline(prediction_model)\n", "\n", "light_result = light_model.fullAnnotate(text)\n", "\n", "\n", "chunks = []\n", "entities = []\n", "sentence = []\n", "begin = []\n", "end = []\n", "\n", "for n in light_result[0]['ner_span']:\n", "\n", "    begin.append(n.begin)\n", "    end.append(n.end)\n", "    chunks.append(n.result)\n", "    entities.append(n.metadata['entity'])\n", "    sentence.append(n.metadata['sentence'])\n", "\n", "\n", "\n", "df = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end, \n", " 'sentence_id':sentence, 'entities':entities})\n", "\n", "df.head(20)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9kld4JdpOQLP", "metadata": { "id": 
"9kld4JdpOQLP" }, "source": [ "### 📌 **NER Visualizer**\n", "To save the visualization as HTML, pass the `save_path` parameter to the `display` function." ] }, { "cell_type": "code", "execution_count": 9, "id": "ZDQdpZesLo8X", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "ZDQdpZesLo8X", "outputId": "ff216bbd-3815-461d-98b9-2e3c616ee8e0" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " SUBPOENA TO PRODUCE DOCUMENTS DOCUMENT_TYPE, INFORMATION DOCUMENT_TYPE, OR OBJECTS OR TO PERMIT INSPECTION OF PREMISES IN A CIVIL ACTION

UNITED STATES DISTRICT COURT
DISTRICT OF
NEW YORK STATE

Plaintiff:
Chang Lee MATTER
v.
Defendant:
Jie Chen RECEIVER

To:
Kim Nguyen RECEIVER
789 Elm Street
New York, NY 10003
ADDRESS


You are hereby commanded to produce at the time, date, and place set forth below the following
documents DOCUMENT_TYPE, electronically stored information DOCUMENT_TYPE, or tangible things:

All
financial records DOCUMENT_TYPE, including bank statements DOCUMENT_TYPE, credit card statements DOCUMENT_TYPE, and tax returns DOCUMENT_TYPE for Jie Chen DOCUMENT_PERSON from January 1, 2017 DOCUMENT_DATE_FROM to present;
All
emails DOCUMENT_TYPE and other correspondence DOCUMENT_TYPE between Jie Chen DOCUMENT_PERSON and any business partners DOCUMENT_PERSON, associates DOCUMENT_PERSON or employees DOCUMENT_PERSON related to the above financial records DOCUMENT_TYPE from January 1, 2017 DOCUMENT_DATE_FROM to present;
All
contracts DOCUMENT_TYPE and agreements DOCUMENT_TYPE entered into by Jie Chen DOCUMENT_PERSON, including any non-disclosure agreements DOCUMENT_TYPE, from January 1, 2017 DOCUMENT_DATE_FROM to present.
The production shall occur at the following time and location:

Date:
August 15, 2023 APPOINTMENT_DATE
Time:
10:00 a.m APPOINTMENT_HOUR.
Location: Law Office of
Lee & Associates DOCUMENT_PERSON, 456 Broadway, Suite 800, New York, NY 10003 COURT_ADDRESS.

You are further commanded to preserve and protect the confidentiality of any
documents DOCUMENT_TYPE, electronically stored information DOCUMENT_TYPE, or tangible things produced or inspected, in accordance with the applicable law or agreement DOCUMENT_TYPE.

You are not required to produce or permit inspection of any privileged or protected
documents DOCUMENT_TYPE or information DOCUMENT_TYPE.

This subpoena is issued by the court at the request of the
Plaintiff's DOCUMENT_PERSON attorney DOCUMENT_PERSON, and you are hereby ordered to comply with this subpoena as provided by the Federal Rules of Civil Procedure.

You must comply with this subpoena under the penalty of law.

Dated:
May 4, 2023 SUBPOENA_DATE

[Signature of Clerk of Court]
By:
Sarah Johnson SIGNER
Deputy Clerk
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# from sparknlp_display import NerVisualizer\n", "\n", "visualiser = nlp.viz.NerVisualizer()\n", "\n", "visualiser.display(light_result[0], label_col='ner_span', document_col='document')" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "machine_shape": "hm", "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "tf-gpu", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)]" }, "vscode": { "interpreter": { "hash": "3f47d918ae832c68584484921185f5c85a1760864bf927a683dc6fb56366cc77" } } }, "nbformat": 4, "nbformat_minor": 5 }