{ "cells": [ { "cell_type": "markdown", "id": "lzFxcBXcLkl4", "metadata": { "id": "lzFxcBXcLkl4" }, "source": [ "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)" ] }, { "cell_type": "markdown", "id": "GjVryg8IMrpz", "metadata": { "id": "GjVryg8IMrpz" }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/06.1.Relation_Extraction_and_ZeroShotRE.ipynb)" ] }, { "cell_type": "markdown", "id": "gk3kZHmNj51v", "metadata": { "collapsed": false, "id": "gk3kZHmNj51v" }, "source": [ "#🎬 Installation" ] }, { "cell_type": "code", "execution_count": null, "id": "_914itZsj51v", "metadata": { "id": "_914itZsj51v", "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "! pip install -q johnsnowlabs" ] }, { "cell_type": "markdown", "id": "YPsbAnNoPt0Z", "metadata": { "id": "YPsbAnNoPt0Z" }, "source": [ "##🔗 Automatic Installation\n", "Using my.johnsnowlabs.com SSO" ] }, { "cell_type": "code", "execution_count": null, "id": "fY0lcShkj51w", "metadata": { "id": "fY0lcShkj51w", "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "from johnsnowlabs import nlp, legal\n", "\n", "# nlp.install(force_browser=True)" ] }, { "cell_type": "markdown", "id": "hsJvn_WWM2GL", "metadata": { "id": "hsJvn_WWM2GL" }, "source": [ "##🔗 Manual downloading\n", "If you are not registered at my.johnsnowlabs.com, received your license via email, or are using Safari, you may need to upload the license manually.\n", "\n", "- Go to my.johnsnowlabs.com\n", "- Download your license\n", "- Upload it using the following command" ] }, { "cell_type": "code", "execution_count": null, "id": "i57QV3-_P2sQ", "metadata": { "id": "i57QV3-_P2sQ" }, "outputs": [], "source": [ "from google.colab import files\n", "print('Please Upload your John Snow Labs License using the button below')\n", "license_keys = files.upload()" ] }, { 
"cell_type": "markdown", "id": "xGgNdFzZP_hQ", "metadata": { "id": "xGgNdFzZP_hQ" }, "source": [ "- Install it" ] }, { "cell_type": "code", "execution_count": null, "id": "OfmmPqknP4rR", "metadata": { "id": "OfmmPqknP4rR" }, "outputs": [], "source": [ "nlp.install()" ] }, { "cell_type": "markdown", "id": "DCl5ErZkNNLk", "metadata": { "id": "DCl5ErZkNNLk" }, "source": [ "#📌 Starting" ] }, { "cell_type": "code", "execution_count": null, "id": "wRXTnNl3j51w", "metadata": { "id": "wRXTnNl3j51w" }, "outputs": [], "source": [ "spark = nlp.start()" ] }, { "cell_type": "markdown", "id": "f-XE0f0XLmqY", "metadata": { "id": "f-XE0f0XLmqY" }, "source": [ "#🔎 Legal Relation Extraction(RE) and Zero-shot Relation Extraction" ] }, { "cell_type": "markdown", "id": "RzMC8vF3lHNa", "metadata": { "id": "RzMC8vF3lHNa" }, "source": [ "Legal relation extraction is a task in natural language processing (NLP) that involves extracting relationships between entities in legal documents. These relationships can be between people, organizations, or legal concepts.\n", "\n", "Legal relation extraction is useful for a variety of purposes, including legal research, contract analysis, and legal case management. For example, legal relation extraction can be used to identify relationships between parties in a contract, such as the buyer and seller, or to extract clauses in a contract that outline certain obligations or rights." 
] }, { "cell_type": "markdown", "id": "R5HblWBolMg4", "metadata": { "id": "R5HblWBolMg4" }, "source": [ "##✔️ Pretrained Relation Extraction Models and Pipelines for Legal\n", "\n", "Here is the list of pretrained Relation Extraction models and pipelines:" ] }, { "cell_type": "markdown", "id": "1KIZMcuYlSzd", "metadata": { "id": "1KIZMcuYlSzd" }, "source": [ "**Relation Extraction Models**\n", "\n", "|index|model|\n", "|-----:|:-----|\n", "| 1| [Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/08/12/legre_contract_doc_parties_en_3_2.html) | \n", "| 2| [Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/02/legre_contract_doc_parties_md_en.html) | \n", "| 3| [Legal Relation Extraction (Alias)](https://nlp.johnsnowlabs.com/2022/08/17/legre_org_prod_alias_en_3_2.html) |\n", "| 4| [Legal Relation Extraction (Whereas) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/08/24/legre_whereas_en.html) | \n", "| 5| [Legal Relation Extraction (Whereas) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_whereas_md_en.html) | \n", "| 6| [Legal Relation Extraction (Indemnification) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/09/28/legre_indemnifications_en.html) |\n", "| 7| [Legal Relation Extraction (Indemnification) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_indemnifications_md_en.html) | \n", "| 8| [Legal Relation Extraction (Confidentiality) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/10/18/legre_confidentiality_en.html) |\n", "| 9| [Legal Relation Extraction (Confidentiality) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_confidentiality_md_en.html) |\n", "| 10| [Legal Relation Extraction (Warranty)](https://nlp.johnsnowlabs.com/2022/10/19/legre_warranty_en.html) |\n", "| 11| [Legal Relation Extraction (Grants) (Medium, 
Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_grants_md_en.html) |\n", "| 12| [Legal Relation Extraction (Obligations) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/03/legre_obligations_md_en.html) |\n", "| 13| [Legal Relation Extraction (Notice Clause)](https://nlp.johnsnowlabs.com/2022/12/17/legre_notice_clause_xs_en.html) |\n", "| 14| [Legal Zero-shot Relation Extraction](https://nlp.johnsnowlabs.com/2022/08/22/legre_zero_shot_en_3_2.html) |\n", "| 15| [Pretrained Pipeline(Whereas)](https://nlp.johnsnowlabs.com/2022/08/24/legpipe_whereas_en.html) |\n" ] }, { "cell_type": "markdown", "id": "nPvog1bUUIc2", "metadata": { "id": "nPvog1bUUIc2" }, "source": [ "##✔️ Relation Extraction Model to Infer Relations Between Elements in OBLIGATIONS-like sentences" ] }, { "cell_type": "markdown", "id": "nj_8AwKZZEUW", "metadata": { "id": "nj_8AwKZZEUW" }, "source": [ "An obligation sentence in a legal agreement is a provision that specifies the duties and responsibilities of one or more parties to the agreement. These sentences outline the specific actions that a party must take or refrain from taking in order to fulfill its obligations under the agreement. They are an important part of any legal agreement, as they help to ensure that the parties understand and agree to their respective roles and responsibilities." ] }, { "cell_type": "markdown", "id": "zLs8RCciZMPH", "metadata": { "id": "zLs8RCciZMPH" }, "source": [ "📚We define an `obligation` as one or more sentences in which a party (**OBLIGATION_SUBJECT**) must do (**OBLIGATION_ACTION**) something (**OBLIGATION**) to or with another party (**OBLIGATION_INDIRECT_OBJECT**)." 
] }, { "cell_type": "code", "execution_count": null, "id": "n6EDSiPlVOri", "metadata": { "id": "n6EDSiPlVOri" }, "outputs": [], "source": [ "# Create Generic Function to Show Relations in Dataframe\n", "\n", "import pandas as pd\n", "\n", "def get_relations_df(results, col='relations'):\n", "    rel_pairs = []\n", "    for i in range(len(results)):\n", "        for rel in results[i][col]:\n", "            rel_pairs.append((\n", "                rel.result,\n", "                rel.metadata['entity1'],\n", "                rel.metadata['entity1_begin'],\n", "                rel.metadata['entity1_end'],\n", "                rel.metadata['chunk1'],\n", "                rel.metadata['entity2'],\n", "                rel.metadata['entity2_begin'],\n", "                rel.metadata['entity2_end'],\n", "                rel.metadata['chunk2'],\n", "                rel.metadata['confidence']\n", "            ))\n", "    rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])\n", "    return rel_df" ] }, { "cell_type": "code", "execution_count": null, "id": "27cY0DlVURBl", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "27cY0DlVURBl", "outputId": "ea2503c2-947d-4b95-d383-c7d26b506359" }, "outputs": [ { "data": { "text/plain": [ "['In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as \"OTHER TAXES\").',\n", " 'Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark.']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_text = [\"\"\"In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or 
otherwise with respect to, this Agreement or any Note (hereinafter referred to as \"OTHER TAXES\").\"\"\",\n", " \n", " \"\"\"Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark.\"\"\"]\n", "\n", "sample_text " ] }, { "cell_type": "code", "execution_count": null, "id": "Afc4zElaZzPD", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Afc4zElaZzPD", "outputId": "c2f76a21-8ca2-429d-9962-ad72266de8ab" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "legner_obligations download started this may take some time.\n", "[OK!]\n", "legre_obligations_md download started this may take some time.\n", "[OK!]\n" ] } ], "source": [ "document_assembler = nlp.DocumentAssembler()\\\n", " .setInputCol(\"text\")\\\n", " .setOutputCol(\"document\")\n", "\n", "tokenizer = nlp.Tokenizer()\\\n", " .setInputCols(\"document\")\\\n", " .setOutputCol(\"token\")\n", "\n", "ner_model = legal.BertForTokenClassification.pretrained(\"legner_obligations\", \"en\", \"legal/models\")\\\n", " .setInputCols(\"token\", \"document\")\\\n", " .setOutputCol(\"ner\")\\\n", " .setMaxSentenceLength(512)\\\n", " .setCaseSensitive(True)\n", "\n", "ner_converter = nlp.NerConverter()\\\n", " .setInputCols([\"document\",\"token\",\"ner\"])\\\n", " .setOutputCol(\"ner_chunk\")\n", "\n", "re_model = legal.RelationExtractionDLModel().pretrained(\"legre_obligations_md\", \"en\", \"legal/models\")\\\n", " .setPredictionThreshold(0.4)\\\n", " .setInputCols([\"ner_chunk\", \"document\"])\\\n", " .setOutputCol(\"relations\")\n", "\n", "pipeline = nlp.Pipeline(stages=[\n", " document_assembler, \n", " tokenizer,\n", " ner_model,\n", " ner_converter,\n", " re_model\n", "])\n", "\n", "empty_df = spark.createDataFrame([[\"\"]]).toDF(\"text\")\n", "\n", "model = pipeline.fit(empty_df)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2SPBXErsaLcv", "metadata": { "id": "2SPBXErsaLcv" }, "outputs": [], "source": [ 
"light_model = nlp.LightPipeline(model)\n", "\n", "result = light_model.fullAnnotate(sample_text)" ] }, { "cell_type": "code", "execution_count": null, "id": "xyZ0uwpAaWrR", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 332 }, "id": "xyZ0uwpAaWrR", "outputId": "6ce5265a-0c77-4326-ca2f-cb698a49aae9" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
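<table><thead><tr><th></th><th>relation</th><th>entity1</th><th>entity1_begin</th><th>entity1_end</th><th>chunk1</th><th>entity2</th><th>entity2_begin</th><th>entity2_end</th><th>chunk2</th><th>confidence</th></tr></thead><tbody>
<tr><th>0</th><td>is_obliged_to</td><td>OBLIGATION_ACTION</td><td>27</td><td>38</td><td>agree to pay</td><td>OBLIGATION_SUBJECT</td><td>13</td><td>25</td><td>the Borrowers</td><td>0.9983413</td></tr>
<tr><th>1</th><td>is_obliged_to</td><td>OBLIGATION_SUBJECT</td><td>13</td><td>25</td><td>the Borrowers</td><td>OBLIGATION</td><td>40</td><td>143</td><td>any present or future stamp or documentary tax...</td><td>0.46110857</td></tr>
<tr><th>2</th><td>is_obliged_object</td><td>OBLIGATION_ACTION</td><td>27</td><td>38</td><td>agree to pay</td><td>OBLIGATION</td><td>40</td><td>143</td><td>any present or future stamp or documentary tax...</td><td>0.9991379</td></tr>
<tr><th>3</th><td>is_obliged_to</td><td>OBLIGATION_ACTION</td><td>9</td><td>38</td><td>agrees to reasonably cooperate</td><td>OBLIGATION_SUBJECT</td><td>0</td><td>7</td><td>Licensee</td><td>0.9090177</td></tr>
<tr><th>4</th><td>is_obliged_with</td><td>OBLIGATION_SUBJECT</td><td>0</td><td>7</td><td>Licensee</td><td>OBLIGATION_INDIRECT_OBJECT</td><td>45</td><td>52</td><td>Licensor</td><td>0.8136201</td></tr>
<tr><th>5</th><td>is_obliged_to</td><td>OBLIGATION</td><td>54</td><td>100</td><td>in achieving registration of the Licensed Mark.</td><td>OBLIGATION_SUBJECT</td><td>0</td><td>7</td><td>Licensee</td><td>0.86316615</td></tr>
<tr><th>6</th><td>is_obliged_object</td><td>OBLIGATION_ACTION</td><td>9</td><td>38</td><td>agrees to reasonably cooperate</td><td>OBLIGATION_INDIRECT_OBJECT</td><td>45</td><td>52</td><td>Licensor</td><td>0.96135247</td></tr>
<tr><th>7</th><td>is_obliged_object</td><td>OBLIGATION_ACTION</td><td>9</td><td>38</td><td>agrees to reasonably cooperate</td><td>OBLIGATION</td><td>54</td><td>100</td><td>in achieving registration of the Licensed Mark.</td><td>0.82649904</td></tr>
<tr><th>8</th><td>is_obliged_to</td><td>OBLIGATION_INDIRECT_OBJECT</td><td>45</td><td>52</td><td>Licensor</td><td>OBLIGATION</td><td>54</td><td>100</td><td>in achieving registration of the Licensed Mark.</td><td>0.9142798</td></tr>
</tbody></table>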
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " relation entity1 entity1_begin entity1_end \\\n", "0 is_obliged_to OBLIGATION_ACTION 27 38 \n", "1 is_obliged_to OBLIGATION_SUBJECT 13 25 \n", "2 is_obliged_object OBLIGATION_ACTION 27 38 \n", "3 is_obliged_to OBLIGATION_ACTION 9 38 \n", "4 is_obliged_with OBLIGATION_SUBJECT 0 7 \n", "5 is_obliged_to OBLIGATION 54 100 \n", "6 is_obliged_object OBLIGATION_ACTION 9 38 \n", "7 is_obliged_object OBLIGATION_ACTION 9 38 \n", "8 is_obliged_to OBLIGATION_INDIRECT_OBJECT 45 52 \n", "\n", " chunk1 \\\n", "0 agree to pay \n", "1 the Borrowers \n", "2 agree to pay \n", "3 agrees to reasonably cooperate \n", "4 Licensee \n", "5 in achieving registration of the Licensed Mark. \n", "6 agrees to reasonably cooperate \n", "7 agrees to reasonably cooperate \n", "8 Licensor \n", "\n", " entity2 entity2_begin entity2_end \\\n", "0 OBLIGATION_SUBJECT 13 25 \n", "1 OBLIGATION 40 143 \n", "2 OBLIGATION 40 143 \n", "3 OBLIGATION_SUBJECT 0 7 \n", "4 OBLIGATION_INDIRECT_OBJECT 45 52 \n", "5 OBLIGATION_SUBJECT 0 7 \n", "6 OBLIGATION_INDIRECT_OBJECT 45 52 \n", "7 OBLIGATION 54 100 \n", "8 OBLIGATION 54 100 \n", "\n", " chunk2 confidence \n", "0 the Borrowers 0.9983413 \n", "1 any present or future stamp or documentary tax... 0.46110857 \n", "2 any present or future stamp or documentary tax... 0.9991379 \n", "3 Licensee 0.9090177 \n", "4 Licensor 0.8136201 \n", "5 Licensee 0.86316615 \n", "6 Licensor 0.96135247 \n", "7 in achieving registration of the Licensed Mark. 0.82649904 \n", "8 in achieving registration of the Licensed Mark. 
0.9142798 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rel_df = get_relations_df(result)\n", "\n", "rel_df[rel_df[\"relation\"] != \"other\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "a1jBvSE6aWtn", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 877 }, "id": "a1jBvSE6aWtn", "outputId": "396480b3-456f-490d-f626-19a0cd751f4f" }, "outputs": [ { "data": { "text/html": [ "[Relation extraction visualization. Entities: the Borrowers (OBLIGATION_SUBJECT); agree to pay (OBLIGATION_ACTION); any present or future stamp or documentary taxes or any other excise or property taxes or similar levies (OBLIGATION). Relation arcs: is_obliged_to, is_obliged_object, is_obliged_to.]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "[Relation extraction visualization. Entities: Licensee (OBLIGATION_SUBJECT); agrees to reasonably cooperate (OBLIGATION_ACTION); Licensor (OBLIGATION_INDIRECT_OBJECT); in achieving registration of the Licensed Mark. (OBLIGATION). Relation arcs: is_obliged_object, is_obliged_to, is_obliged_object, is_obliged_to, is_obliged_to, is_obliged_with.]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "re_vis = nlp.viz.RelationExtractionVisualizer()\n", "\n", "for i in range(len(sample_text)):\n", " re_vis.display(result = result[i],\n", " relation_col = \"relations\",\n", " document_col = \"document\",\n", " exclude_relations = [\"other\"],\n", " show_relations=True\n", " )" ] }, { "cell_type": "markdown", "id": "a2e7799b-6495-416e-93dc-c529a6fef840", "metadata": { "id": "a2e7799b-6495-416e-93dc-c529a6fef840" }, "source": [ "##✔️ Zero Shot Relation Extraction to Extract Relations Between Legal Entities" ] }, { "cell_type": "markdown", "id": "AttnVaEeYuYu", "metadata": { "id": "AttnVaEeYuYu" }, "source": [ "Now, let's suppose we want to extract `GRANTS` and `GRANTS_TO` 
relations between the **OBLIGATION_SUBJECT**, **OBLIGATION_ACTION** and **OBLIGATION_INDIRECT_OBJECT** entities. We don't have a model trained for these relations.\n", "\n", "That is where zero-shot RE comes in. A zero-shot RE model lets you define your own relations **without training data** and **without a model pretrained for those specific relations**." ] }, { "cell_type": "markdown", "id": "UbDfTSnyI6Cb", "metadata": { "collapsed": false, "id": "UbDfTSnyI6Cb" }, "source": [ "##✔️ A variation of NLI for Zero-shot Relation Extraction\n", "Like zero-shot NER, zero-shot RE works with `H` (hypotheses) and `P` (premises): a relation is extracted only when `H` is `entailed` by `P`.\n", "\n", "📜In this case, what we do is:\n", "- We write a prompt in the form of {ENT_1} [some_text] {ENT_2}\n", "- ENT_1 is filled with entities from a previous NER step\n", "- ENT_2 is filled the same way\n", "- We ask the ZeroShotRE model whether, given the whole text as the premise, the hypothesis {ENT_1} [some_text] {ENT_2} is entailed.\n", "\n", "For example, `ENT_1` is `PARTY`, `ENT_2` is `DOC`, and `[some_text]` is `signed`.\n", "\n", "Given the premise `Meta, Inc. signed a Purchase Agreement with Whatsapp, Inc.`, the prompt will be `entailed` for both pairs: `Meta, Inc.` with `Purchase Agreement`, and `Whatsapp, Inc.` with `Purchase Agreement`." ] }, { "cell_type": "markdown", "id": "SUHPnNDAI6Cb", "metadata": { "collapsed": false, "id": "SUHPnNDAI6Cb" }, "source": [ "##🔎 Some examples" ] }, { "cell_type": "markdown", "id": "XdK2bv0xI6Cb", "metadata": { "collapsed": false, "id": "XdK2bv0xI6Cb" }, "source": [ "\n", "Just a few examples of the relation types you are looking for are enough to produce a proper result.\n", "\n", "⚡**!!!Make sure you keep the proper syntax of the relations you want to extract!!!**" ] }, { "cell_type": "markdown", "id": "u_CQG_RDuYUV", "metadata": { "id": "u_CQG_RDuYUV" }, "source": [ "First, we will download a sample dataset and work with it throughout this section." 
] }, { "cell_type": "code", "execution_count": null, "id": "UEUGQkOh6jyT", "metadata": { "id": "UEUGQkOh6jyT" }, "outputs": [], "source": [ "! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/legal-nlp/data/intellectual_property_agreement.txt" ] }, { "cell_type": "code", "execution_count": null, "id": "En7u5YETWe6s", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "En7u5YETWe6s", "outputId": "a9928ea8-424d-4e0d-bb18-e7de13c7734c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Exhibit 10.2\n", "\n", "Execution Version\n", "\n", "INTELLECTUAL PROPERTY AGREEMENT\n", "\n", "This INTELLECTUAL PROPERTY AGREEMENT (this \"Agreement\"), dated as of December 31, 2018 (the \"Effective Date\") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation (\"Seller\") and AFI Licensing LLC, a Delaware limited liability company (\"Licensing\" and together with Seller, \"Arizona\") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation (\"Buyer\") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the \"Company\" and together with Buyer the \"Buyer Entities\") (each of Arizona on the one hand and the Buyer Entities on the other hand, a \"Party\" and collectively, the \"Parties\").\n", "\n", "WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the \"Stock Purchase Agreement\"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller's right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation (\"AWP\") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation (\"HHFC,\" and together with the Company, the \"Company Subsidiaries\" and together with AWP, the \"Company Entities\" and each a \"Company Entity\") by way of a 
purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;\n", "\n", "WHEREAS, Arizona owns certain Co\n" ] } ], "source": [ "with open('intellectual_property_agreement.txt', 'r') as f:\n", " agreement = f.read()\n", "print(agreement[:1500])" ] }, { "cell_type": "markdown", "id": "o8CMSMujdKAs", "metadata": { "id": "o8CMSMujdKAs" }, "source": [ "###📚 Get sample clause from agreement" ] }, { "cell_type": "markdown", "id": "rdnjRvLBaDzX", "metadata": { "id": "rdnjRvLBaDzX" }, "source": [ "First, we will get a sample text from the agreement. We will use the `GRANT OF COPYRIGHT LICENSE` clauses, so we split the agreement into sections to locate them." ] }, { "cell_type": "code", "execution_count": null, "id": "bgk18S0z-frM", "metadata": { "id": "bgk18S0z-frM" }, "outputs": [], "source": [ "document_assembler = nlp.DocumentAssembler() \\\n", " .setInputCol(\"text\") \\\n", " .setOutputCol(\"document\")\n", "\n", "text_splitter = legal.TextSplitter() \\\n", " .setInputCols([\"document\"]) \\\n", " .setOutputCol(\"sections\")\\\n", " .setCustomBounds([\"\n\n\",\"\d\.?\d? 
\"])\\\n", " .setUseCustomBoundsOnly(True)\\\n", " .setExplodeSentences(True)\n", "\n", "nlp_pipeline = nlp.Pipeline(stages=[\n", " document_assembler,\n", " text_splitter])\n", "\n", "empty_df = spark.createDataFrame([[\"\"]]).toDF(\"text\")\n", "\n", "model = nlp_pipeline.fit(empty_df)\n", "\n", "light_model = nlp.LightPipeline(model)" ] }, { "cell_type": "code", "execution_count": null, "id": "uPizleiGvcyI", "metadata": { "id": "uPizleiGvcyI" }, "outputs": [], "source": [ "result = light_model.annotate(agreement)\n", "\n", "sections = result['sections']\n" ] }, { "cell_type": "code", "execution_count": null, "id": "RMbADN22vvk6", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RMbADN22vvk6", "outputId": "c1df01da-9dfe-4f59-df51-6117741e4846" }, "outputs": [ { "data": { "text/plain": [ "['Exhibit 10.2',\n", " 'Execution Version',\n", " 'INTELLECTUAL PROPERTY AGREEMENT',\n", " 'This INTELLECTUAL PROPERTY AGREEMENT (this \"Agreement\"), dated as of December 31, 20',\n", " '(the \"Effective Date\") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation (\"Seller\") and AFI Licensing LLC, a Delaware limited liability company (\"Licensing\" and together with Seller, \"Arizona\") and AHF Holding, Inc. 
(formerly known as Tarzan HoldCo, Inc.), a Delaware corporation (\"Buyer\") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the \"Company\" and together with Buyer the \"Buyer Entities\") (each of Arizona on the one hand and the Buyer Entities on the other hand, a \"Party\" and collectively, the \"Parties\").',\n", " 'WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 20',\n", " '(the \"Stock Purchase Agreement\"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller\\'s right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation (\"AWP\") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation (\"HHFC,\" and together with the Company, the \"Company Subsidiaries\" and together with AWP, the \"Company Entities\" and each a \"Company Entity\") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;',\n", " \"WHEREAS, Arizona owns certain Copyrights, Know-How, Patents and Trademarks which may be used in the Company Field, and in connection with the transactions contemplated by the Stock Purchase Agreement the Company desires to acquire all of Arizona's right, title and interest in and to such Intellectual Property used exclusively in the Company Field, and obtain a license from Arizona to use other such Intellectual Property on the terms and subject to the conditions set forth herein;\",\n", " 'WHEREAS, Seller is signatory to the Trademark License Agreement pursuant to which Seller obtains a license to the Arizona Licensed Trademarks;',\n", " 'WHEREAS, the Company desires to obtain a sublicense to use the Arizona Licensed Trademarks in the Company Field;',\n", " 'WHEREAS, Arizona has obtained consent from all counterparties to the Trademark License Agreement to grant to the 
Company the sublicenses to the Arizona Licensed Trademarks included in this Agreement; and',\n", " 'WHEREAS, the Company Entities own certain Copyrights and Know-How which may be used in the Arizona Field, and in connection with the transactions contemplated by the Stock Purchase Agreement, Arizona desires to obtain a license from the Company Entities to use such Intellectual Property on the terms and subject to the conditions set forth herein.',\n", " 'NOW, THEREFORE, in consideration of the foregoing and the mutual agreements, provisions and covenants contained in this Agreement, and for other good and valuable consideration, the receipt and sufficiency of which are hereby acknowledged, the Parties hereby agree as follows:',\n", " 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019',\n", " 'DEFINITIONS AND INTERPRETATION',\n", " 'Certain Definitions. As used herein, capitalized terms have the meaning ascribed to them herein, including the following terms have the meanings set forth below. Capitalized terms that are not defined in this Agreement shall have the meaning set forth in the Stock Purchase Agreement. (a) \"Arizona Assigned Copyrights\" means all Copyrights, whether registered or unregistered, owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of November 14, 20',\n", " '(the \"SPA Signing Date\") and/or as of the Effective Date. (b) \"Arizona Assigned Internet Domain Names\" means the Internet domain names set forth on Schedule 1.1(b) and all other Internet domain names owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than any Internet domain names that include any Arizona Licensed Trademarks). 
(c) \"Arizona Assigned IP\" means the Arizona Assigned Copyrights, Arizona Assigned Internet Domain Names, Arizona Assigned Know- How, Arizona Assigned Patents and Arizona Assigned Trademarks. (d) \"Arizona Assigned Know-How\" means all Know-How owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (e) \"Arizona Assigned Patents\" means the Patents set forth on Schedule 1.1(e) and all other Patents owned by Licensing or Seller and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (f) \"Arizona Assigned Trademarks\" means the Trademarks set forth on Schedule 1.1(f) and all other Trademarks owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than, for clarity any Arizona Licensed Trademarks). (g) \"Arizona Domain Names\" means the Internet domain names set forth on Schedule 1.1(g). (h) \"Arizona Field\" means all activities conducted by Arizona or its Affiliates, other than the Company Field. (i) \"Arizona Licensed Copyrights\" means all Copyrights owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Copyrights). 2',\n", " 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019',\n", " '(j) \"Arizona Licensed IP\" means the Arizona Licensed Copyrights, the Arizona Licensed Know-How, the Arizona Licensed Patents, the Arizona Licensed Trademarks, the Diamond Licensed Trademarks and the Phase-Out Marks. 
(k) \"Arizona Licensed Know-How\" means all Know-How owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Know- How). (l) \"Arizona Licensed Patents\" means the Patents set forth on Schedule 1.1(l) and all other Patents owned by Licensing or Seller or their respective Affiliates as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Patents). (m) \"Arizona Licensed Trademarks\" means the Trademarks set forth on Schedule 1.1(m). (n) \"Arizona Trademark License Term\" means the period commencing on the Effective Date and ending twenty-four (24) months thereafter. (o) \"Company Field\" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of solid hardwood and engineered wood flooring products by or for any Company Entity. (p) \"Company Licensed Copyrights\" means all Copyrights and registrations and applications for any of the foregoing owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (q) \"Company Licensed IP\" means the Company Licensed Copyrights, the Company Licensed Know-How and the Company Licensed Patents. (r) \"Company Licensed Know-How\" means all Know-How owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (s) \"Company Licensed Patents\" means the Patents set forth on Schedule 1.1(s). (t) \"Copyrights\" means copyrights (whether registered or unregistered) including applications for copyright (excluding, for clarity, Trademarks). (u) \"Diamond Licensed Trademarks\" means the Trademarks set forth on Schedule 1.1(u). 
(v) \"Diamond Product\" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of the solid hardwood flooring product by any Company Entity as conducted under the Diamond Licensed Trademarks by any Company Entity prior to the Effective Date 3',\n", " 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sections[:20]" ] }, { "cell_type": "code", "execution_count": null, "id": "p2UpDUq61G5V", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "p2UpDUq61G5V", "outputId": "85199236-d548-4901-9cd6-f5b3946706df" }, "outputs": [ { "data": { "text/plain": [ "30" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sections.index('GRANT OF COPYRIGHT LICENSE')" ] }, { "cell_type": "markdown", "id": "0pcwgYnBa4gS", "metadata": { "id": "0pcwgYnBa4gS" }, "source": [ "We will get the first clause after the title as the sample text." ] }, { "cell_type": "code", "execution_count": null, "id": "xtPdq9pR3Msa", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "id": "xtPdq9pR3Msa", "outputId": "f8d757c2-fe52-42b8-fe7a-46869eb43a48" }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "'Arizona Copyright Grant. 
Subject to the terms and conditions of this Agreement, Arizona hereby grants to the Company a perpetual, non- exclusive, royalty-free license in, to and under the Arizona Licensed Copyrights for use in the Company Field throughout the world.'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = sections[31]\n", "\n", "text" ] }, { "cell_type": "markdown", "id": "bkU-JmyAdSTi", "metadata": { "id": "bkU-JmyAdSTi" }, "source": [ "###📚 Extract Relations with Zero-shot RE Model" ] }, { "cell_type": "markdown", "id": "mqApaLC6bY5_", "metadata": { "id": "mqApaLC6bY5_" }, "source": [ "As mentioned above, we want to extract `GRANTS` and `GRANTS_TO` relations between the **OBLIGATION_SUBJECT**, **OBLIGATION_ACTION** and **OBLIGATION_INDIRECT_OBJECT** entities. To do this, we first use the `legner_obligations` NER model to detect the entities, and then the `legre_zero_shot` model to extract the relations between them.\n", "\n", "**Make sure you keep the proper syntax for the relations you want to extract:** each relation name maps to a list of template sentences in which the entity labels appear in curly brackets, e.g. `{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}`." ] }, { "cell_type": "code", "execution_count": null, "id": "TMqH89vW-ftb", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TMqH89vW-ftb", "outputId": "845c1a77-9905-4de8-fdc6-ef4992388176" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "legner_obligations download started this may take some time.\n", "[OK!]\n", "legre_zero_shot download started this may take some time.\n", "[OK!]\n" ] } ], "source": [ "documentAssembler = nlp.DocumentAssembler()\\\n", " .setInputCol(\"text\")\\\n", " .setOutputCol(\"document\")\n", "\n", "tokenizer = nlp.Tokenizer()\\\n", " .setInputCols(\"document\")\\\n", " .setOutputCol(\"token\")\n", "\n", "tokenClassifier = legal.BertForTokenClassification.pretrained('legner_obligations','en', 'legal/models')\\\n", " .setInputCols(\"token\", \"document\")\\\n", " .setOutputCol(\"ner\")\\\n", " .setMaxSentenceLength(512)\\\n", " .setCaseSensitive(True)\n", "\n", "ner_converter = 
nlp.NerConverter()\\\n", " .setInputCols([\"document\", \"token\", \"ner\"])\\\n", " .setOutputCol(\"ner_chunk\")\n", "\n", "re_model = legal.ZeroShotRelationExtractionModel.pretrained(\"legre_zero_shot\", \"en\", \"legal/models\")\\\n", " .setInputCols([\"ner_chunk\", \"document\"]) \\\n", " .setOutputCol(\"relations\")\n", "\n", "# Remember it's 2 curly brackets instead of one if you are using Spark NLP < 4.0\n", "re_model.setRelationalCategories({\n", " \"GRANTS_TO\": [\"{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}\"],\n", " \"GRANTS\": [\"{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}\"]\n", "})\n", "\n", "pipeline = nlp.Pipeline(stages = [\n", " documentAssembler, \n", " tokenizer,\n", " tokenClassifier, \n", " ner_converter,\n", " re_model\n", " ])\n", "\n", "empty_df = spark.createDataFrame([[\"\"]]).toDF(\"text\")\n", "\n", "model = pipeline.fit(empty_df)\n", "\n", "light_model = nlp.LightPipeline(model)" ] }, { "cell_type": "code", "execution_count": null, "id": "zlRzJg9n4idT", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 112 }, "id": "zlRzJg9n4idT", "outputId": "3dbeac7d-361e-4b65-e992-11be62fa7e52" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
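<table><thead><tr><th></th><th>relation</th><th>entity1</th><th>entity1_begin</th><th>entity1_end</th><th>chunk1</th><th>entity2</th><th>entity2_begin</th><th>entity2_end</th><th>chunk2</th><th>confidence</th></tr></thead><tbody>
<tr><th>0</th><td>GRANTS_TO</td><td>OBLIGATION_SUBJECT</td><td>80</td><td>86</td><td>Arizona</td><td>OBLIGATION_INDIRECT_OBJECT</td><td>109</td><td>115</td><td>Company</td><td>0.9535338</td></tr>
<tr><th>1</th><td>GRANTS</td><td>OBLIGATION_SUBJECT</td><td>80</td><td>86</td><td>Arizona</td><td>OBLIGATION_ACTION</td><td>88</td><td>100</td><td>hereby grants</td><td>0.9873099</td></tr></tbody></table>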
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " relation entity1 entity1_begin entity1_end chunk1 \\\n", "0 GRANTS_TO OBLIGATION_SUBJECT 80 86 Arizona \n", "1 GRANTS OBLIGATION_SUBJECT 80 86 Arizona \n", "\n", " entity2 entity2_begin entity2_end chunk2 \\\n", "0 OBLIGATION_INDIRECT_OBJECT 109 115 Company \n", "1 OBLIGATION_ACTION 88 100 hereby grants \n", "\n", " confidence \n", "0 0.9535338 \n", "1 0.9873099 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = light_model.fullAnnotate(text)\n", "\n", "rel_df = get_relations_df(result)\n", "\n", "rel_df[rel_df[\"relation\"] != \"no_rel\"]" ] }, { "cell_type": "markdown", "id": "b7019ccb-dfe1-47b7-83e4-553baa4d3c3b", "metadata": { "id": "b7019ccb-dfe1-47b7-83e4-553baa4d3c3b" }, "source": [ "###📚 Visualization of Extracted Relations" ] }, { "cell_type": "code", "execution_count": null, "id": "c8f5569f-56c0-4c36-adf5-819537542c43", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 397 }, "id": "c8f5569f-56c0-4c36-adf5-819537542c43", "outputId": "5e1796f1-1811-46a3-aa38-1c944d2cdd6f" }, "outputs": [ { "data": { "text/html": [ "[Relation diagram of the sample clause: Arizona (OBLIGATION_SUBJECT) is linked to hereby grants (OBLIGATION_ACTION) by GRANTS and to Company (OBLIGATION_INDIRECT_OBJECT) by GRANTS_TO.]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# from sparknlp_display import RelationExtractionVisualizer\n", "\n", "re_vis = nlp.viz.RelationExtractionVisualizer()\n", "\n", "re_vis.display(result = result[0],\n", " relation_col = \"relations\",\n", " document_col = \"document\",\n", " exclude_relations = [\"no_rel\"],\n", " show_relations=True,\n", " )" ] }, { "cell_type": "markdown", "id": "gvH9-FQhcRnH", "metadata": { "id": "gvH9-FQhcRnH" }, "source": [ "You can use the Zero-shot RE model with 
other NER models to extract different relations between different entities." ] } ], "metadata": { "colab": { "provenance": [], "toc_visible": true }, "gpuClass": "standard", "kernelspec": { "display_name": "tf-gpu", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)]" }, "toc-showtags": false, "vscode": { "interpreter": { "hash": "3f47d918ae832c68584484921185f5c85a1760864bf927a683dc6fb56366cc77" } } }, "nbformat": 4, "nbformat_minor": 5 }