![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/06.1.Relation_Extraction_and_ZeroShotRE.ipynb)

#🎬 Installation

In [None]:
! pip install -q johnsnowlabs

##🔗 Automatic Installation
Using my.johnsnowlabs.com SSO

In [None]:
from johnsnowlabs import nlp, legal

# nlp.install(force_browser=True)

##🔗 Manual downloading
If you are not registered in my.johnsnowlabs.com, you received a license via email, or you are using Safari, you may need to upload the license manually.

- Go to my.johnsnowlabs.com
- Download your license
- Upload it using the following command

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

- Install it

In [None]:
nlp.install()

#📌 Starting

In [None]:
spark = nlp.start()

#🔎 Legal Relation Extraction (RE) and Zero-shot Relation Extraction

Legal relation extraction is a task in natural language processing (NLP) that involves extracting relationships between entities in legal documents. These relationships can be between people, organizations, or legal concepts.

Legal relation extraction is useful for a variety of purposes, including legal research, contract analysis, and legal case management. For example, legal relation extraction can be used to identify relationships between parties in a contract, such as the buyer and seller, or to extract clauses in a contract that outline certain obligations or rights.

##✔️ Pretrained Relation Extraction Models and Pipelines for Legal

Here is the list of pretrained Relation Extraction models and pipelines:

**Relation Extraction Models**

|index|model|
|-----:|:-----|
| 1| [Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/08/12/legre_contract_doc_parties_en_3_2.html)  | 
| 2| [Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/02/legre_contract_doc_parties_md_en.html)  | 
| 3| [Legal Relation Extraction (Alias)](https://nlp.johnsnowlabs.com/2022/08/17/legre_org_prod_alias_en_3_2.html)  |
| 4| [Legal Relation Extraction (Whereas) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/08/24/legre_whereas_en.html)  | 
| 5| [Legal Relation Extraction (Whereas) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_whereas_md_en.html)  | 
| 6| [Legal Relation Extraction (Indemnification) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/09/28/legre_indemnifications_en.html)  |
| 7| [Legal Relation Extraction (Indemnification) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_indemnifications_md_en.html)  | 
| 8| [Legal Relation Extraction (Confidentiality) (Small, Bidirectional)](https://nlp.johnsnowlabs.com/2022/10/18/legre_confidentiality_en.html)  |
| 9| [Legal Relation Extraction (Confidentiality) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_confidentiality_md_en.html)  |
| 10| [Legal Relation Extraction (Warranty)](https://nlp.johnsnowlabs.com/2022/10/19/legre_warranty_en.html)  |
| 11| [Legal Relation Extraction (Grants) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/09/legre_grants_md_en.html)  |
| 12| [Legal Relation Extraction (Obligations) (Medium, Unidirectional)](https://nlp.johnsnowlabs.com/2022/11/03/legre_obligations_md_en.html)  |
| 13| [Legal Relation Extraction (Notice Clause)](https://nlp.johnsnowlabs.com/2022/12/17/legre_notice_clause_xs_en.html)  |
| 14| [Legal Zero-shot Relation Extraction](https://nlp.johnsnowlabs.com/2022/08/22/legre_zero_shot_en_3_2.html)  |
| 15| [Pretrained Pipeline(Whereas)](https://nlp.johnsnowlabs.com/2022/08/24/legpipe_whereas_en.html)  |


##✔️ Relation Extraction Model to Infer Relations Between Elements in OBLIGATIONS-like sentences

An obligation sentence in a legal agreement is a provision that specifies the duties, responsibilities, and obligations of one or more parties to the agreement. These sentences outline the specific actions that a party must take or refrain from taking in order to fulfill its obligations under the agreement. They are an important part of any legal agreement, as they help to ensure that the parties understand and agree to their respective roles and responsibilities.

📚We understand an `obligation` as a sentence or sentences in which a Party (**OBLIGATION_SUBJECT**) must do (**OBLIGATION_ACTION**) something (**OBLIGATION_OBJECT**) to another Party (**OBLIGATION_INDIRECT_OBJECT**).

In [None]:
# Generic helper: flatten the relation annotations of every document into one pandas DataFrame

import pandas as pd
def get_relations_df(results, col='relations'):
    rel_pairs=[]
    for i in range(len(results)):
        for rel in results[i][col]:
            rel_pairs.append((
              rel.result, 
              rel.metadata['entity1'], 
              rel.metadata['entity1_begin'],
              rel.metadata['entity1_end'],
              rel.metadata['chunk1'], 
              rel.metadata['entity2'],
              rel.metadata['entity2_begin'],
              rel.metadata['entity2_end'],
              rel.metadata['chunk2'], 
              rel.metadata['confidence']
          ))
    rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])
    return rel_df
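
To see which fields this helper reads without starting Spark, the sketch below runs the same loop over a stand-in annotation. `flatten_relations` and `fake_relation` are hypothetical names, and the metadata values are illustrative strings; the only assumption is that each real annotation exposes `.result` and a `.metadata` dict with the keys used above.

```python
from types import SimpleNamespace

# Metadata keys read by the helper, in column order.
FIELDS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
          "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]

def flatten_relations(results, col="relations"):
    """Return one dict per relation, mirroring the columns of get_relations_df."""
    rows = []
    for doc in results:
        for rel in doc[col]:
            row = {"relation": rel.result}
            row.update({key: rel.metadata[key] for key in FIELDS})
            rows.append(row)
    return rows

# Hypothetical stand-in for a single relation annotation (values are illustrative).
fake_relation = SimpleNamespace(
    result="is_obliged_with",
    metadata={
        "entity1": "OBLIGATION_SUBJECT", "entity1_begin": "0", "entity1_end": "7",
        "chunk1": "Licensee",
        "entity2": "OBLIGATION_INDIRECT_OBJECT", "entity2_begin": "45", "entity2_end": "52",
        "chunk2": "Licensor",
        "confidence": "0.8136201",
    },
)

fake_results = [{"relations": [fake_relation]}]
rows = flatten_relations(fake_results)
print(rows[0]["chunk1"], rows[0]["relation"], rows[0]["chunk2"])
```

A missing metadata key raises `KeyError` in both this sketch and the helper, which is a quick way to notice that a model emits a different metadata layout.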

In [None]:
sample_text = ["""In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").""",
              
               """Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark."""]

sample_text               

['In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").',
 'Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark.']

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

ner_model = legal.BertForTokenClassification.pretrained("legner_obligations", "en", "legal/models")\
    .setInputCols("token", "document")\
    .setOutputCol("ner")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["document","token","ner"])\
    .setOutputCol("ner_chunk")

re_model = legal.RelationExtractionDLModel.pretrained("legre_obligations_md", "en", "legal/models")\
    .setPredictionThreshold(0.4)\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")

pipeline = nlp.Pipeline(stages=[
        document_assembler, 
        tokenizer,
        ner_model,
        ner_converter,
        re_model
])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_df)


legner_obligations download started this may take some time.
[OK!]
legre_obligations_md download started this may take some time.
[OK!]


In [None]:
light_model = nlp.LightPipeline(model)

result = light_model.fullAnnotate(sample_text)

In [None]:
rel_df = get_relations_df(result)

rel_df[rel_df["relation"] != "other"]

| |relation|entity1|entity1_begin|entity1_end|chunk1|entity2|entity2_begin|entity2_end|chunk2|confidence|
|-----:|:-----|:-----|-----:|-----:|:-----|:-----|-----:|-----:|:-----|-----:|
| 0| is_obliged_to| OBLIGATION_ACTION| 27| 38| agree to pay| OBLIGATION_SUBJECT| 13| 25| the Borrowers| 0.9983413|
| 1| is_obliged_to| OBLIGATION_SUBJECT| 13| 25| the Borrowers| OBLIGATION| 40| 143| any present or future stamp or documentary tax...| 0.46110857|
| 2| is_obliged_object| OBLIGATION_ACTION| 27| 38| agree to pay| OBLIGATION| 40| 143| any present or future stamp or documentary tax...| 0.9991379|
| 3| is_obliged_to| OBLIGATION_ACTION| 9| 38| agrees to reasonably cooperate| OBLIGATION_SUBJECT| 0| 7| Licensee| 0.9090177|
| 4| is_obliged_with| OBLIGATION_SUBJECT| 0| 7| Licensee| OBLIGATION_INDIRECT_OBJECT| 45| 52| Licensor| 0.8136201|
| 5| is_obliged_to| OBLIGATION| 54| 100| in achieving registration of the Licensed Mark.| OBLIGATION_SUBJECT| 0| 7| Licensee| 0.86316615|
| 6| is_obliged_object| OBLIGATION_ACTION| 9| 38| agrees to reasonably cooperate| OBLIGATION_INDIRECT_OBJECT| 45| 52| Licensor| 0.96135247|
| 7| is_obliged_object| OBLIGATION_ACTION| 9| 38| agrees to reasonably cooperate| OBLIGATION| 54| 100| in achieving registration of the Licensed Mark.| 0.82649904|
| 8| is_obliged_to| OBLIGATION_INDIRECT_OBJECT| 45| 52| Licensor| OBLIGATION| 54| 100| in achieving registration of the Licensed Mark.| 0.9142798|
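
The pipeline above uses a prediction threshold of 0.4, so a relation with a confidence of about 0.46 still appears in the output. If a stricter cut is wanted after the fact, a post-filter along these lines could be applied. This is a minimal sketch: the rows are copied from the output above, while `MIN_CONFIDENCE` and `keep_confident` are hypothetical names chosen for illustration.

```python
# A few rows copied from the output above: (relation, chunk1, chunk2, confidence).
relations = [
    ("is_obliged_to", "agree to pay", "the Borrowers", 0.9983413),
    ("is_obliged_to", "the Borrowers",
     "any present or future stamp or documentary tax...", 0.46110857),
    ("is_obliged_object", "agree to pay",
     "any present or future stamp or documentary tax...", 0.9991379),
    ("is_obliged_with", "Licensee", "Licensor", 0.8136201),
]

# Hypothetical cut-off, stricter than the 0.4 prediction threshold of the pipeline.
MIN_CONFIDENCE = 0.8

def keep_confident(rows, min_confidence):
    """Keep rows whose confidence reaches the cut-off (cast to float to be safe)."""
    return [row for row in rows if float(row[3]) >= min_confidence]

confident = keep_confident(relations, MIN_CONFIDENCE)
for relation, chunk1, chunk2, confidence in confident:
    print(f"{chunk1} --{relation}--> {chunk2} ({confidence:.2f})")
```

On the DataFrame itself, the equivalent filter would be `rel_df[rel_df["confidence"].astype(float) >= 0.8]`; the cast is there on the assumption that the metadata stores confidence as a string.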


In [None]:
re_vis = nlp.viz.RelationExtractionVisualizer()

for i in range(len(sample_text)):
  re_vis.display(result = result[i],
            relation_col = "relations",
            document_col = "document",
            exclude_relations = ["other"],
            show_relations=True
            )

##✔️ Zero Shot Relation Extraction to Extract Relations Between Legal Entities

Now, let's suppose we want to extract `GRANTS` and `GRANTS_TO` relations between the **OBLIGATION_SUBJECT**, **OBLIGATION_ACTION** and **OBLIGATION_INDIRECT_OBJECT** entities. There is no pretrained model for these relations.

That is where Zero-shot RE comes in. You can use a Zero-shot RE model **without training data** and **without any pretrained model** dedicated to those relations to create your own RE model.

##✔️ A variation of NLI for Zero-shot Relation Extraction
Similarly to Zero-shot NER, Zero-shot RE works with `H` (hypotheses) and `P` (premises): a relation is returned as a positive result only if `H` is `entailed` by `P`.

📜In this case, what we do is:
- We take a prompt in the form of {ENT_1} [some_text] {ENT_2}.
- ENT_1 is filled with entities from a previous NER step.
- ENT_2 is filled the same way.
- We ask the ZeroShotRE model whether, given the whole text as the premise, the hypothesis {ENT_1} [some_text] {ENT_2} is entailed.

For example, `ENT_1` is `PARTY`. `ENT_2` is `DOC`. `[some_text]` is `was signed`.

Given a premise `Meta, Inc. signed a Purchase Agreement with Whatsapp, Inc.`, the result of the previous prompt will be `entailed` for both pairs: `Meta, Inc.` with `Purchase Agreement`, and `Whatsapp, Inc.` with `Purchase Agreement`.
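
The prompt-filling step can be sketched in plain Python. This only illustrates how candidate hypotheses might be generated from NER chunks; it is not the library's internal implementation, and the entailment scoring itself is left to the model. The template wording `{PARTY} signed {DOC}` and the names `build_hypotheses` and `chunks_by_label` are assumptions made for this example.

```python
import re
from itertools import product

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def build_hypotheses(template, chunks_by_label):
    """Fill a two-slot template with every combination of matching NER chunks."""
    # Unpacking raises ValueError unless the template has exactly two placeholders.
    first_label, second_label = PLACEHOLDER.findall(template)
    hypotheses = []
    for first, second in product(chunks_by_label.get(first_label, []),
                                 chunks_by_label.get(second_label, [])):
        filled = template.replace("{" + first_label + "}", first, 1)
        filled = filled.replace("{" + second_label + "}", second, 1)
        hypotheses.append(filled)
    return hypotheses

premise = "Meta, Inc. signed a Purchase Agreement with Whatsapp, Inc."
# Chunks a previous NER step might return for the premise (illustrative).
chunks_by_label = {
    "PARTY": ["Meta, Inc.", "Whatsapp, Inc."],
    "DOC": ["Purchase Agreement"],
}

hypotheses = build_hypotheses("{PARTY} signed {DOC}", chunks_by_label)
for hypothesis in hypotheses:
    # Each (premise, hypothesis) pair is what an NLI model would be asked to judge.
    print(f"P: {premise}\nH: {hypothesis}\n")
```

Each party chunk is paired with the document chunk, which is why both parties can end up in an entailed relation with `Purchase Agreement`.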

##🔎 Some examples

You only need to provide a few examples of the relation types you are looking for to get a proper result.

⚡**!!!Make sure you keep the proper syntax of the relations you want to extract!!!**

First, we will download a sample dataset and run all the following steps on it.

In [None]:
! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/legal-nlp/data/intellectual_property_agreement.txt

In [None]:
with open('intellectual_property_agreement.txt', 'r') as f:
  agreement = f.read()
print(agreement[:1500])

Exhibit 10.2

Execution Version

INTELLECTUAL PROPERTY AGREEMENT

This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").

WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all

###📚 Get sample clause from agreement

First, we will get a sample text from the agreement. We will use the `GRANT OF COPYRIGHT LICENSE` clauses, so we split the agreement into sections to find them.

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

text_splitter = legal.TextSplitter() \
    .setInputCols(["document"]) \
    .setOutputCol("sections")\
    .setCustomBounds(["\n\n", r"\d\.?\d? "])\
    .setUseCustomBoundsOnly(True)\
    .setExplodeSentences(True)

nlp_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    text_splitter])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = nlp_pipeline.fit(empty_df)

light_model = nlp.LightPipeline(model)

In [None]:
result = light_model.annotate(agreement)

sections = result['sections']


In [None]:
sections[:20]

['Exhibit 10.2',
 'Execution Version',
 'INTELLECTUAL PROPERTY AGREEMENT',
 'This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 20',
 '(the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").',
 'WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 20',
 '(the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchas
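
Some of the sections above end in `20` where the agreement says `2018`. One plausible explanation is that the custom bound `\d\.?\d? ` also matches the last two digits of a year followed by a space. The sketch below reproduces this with Python's `re.split`, on the assumption that the splitter treats its bounds like ordinary regular-expression delimiters. `STRICTER_BOUNDS` is a hypothetical alternative that has not been tested against `TextSplitter`.

```python
import re

# The two custom bounds from the splitter cell, joined into one pattern.
ORIGINAL_BOUNDS = r"\n\n|\d\.?\d? "
# Hypothetical stricter variant: the first digit must not follow another digit.
STRICTER_BOUNDS = r"\n\n|(?<!\d)\d\.?\d? "

sample = ('This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), '
          'dated as of December 31, 2018 (the "Effective Date")')

# "18 " matches the numeric bound, so the year is cut and the match is dropped.
original_parts = re.split(ORIGINAL_BOUNDS, sample)
# With the lookbehind, no digit inside "2018" can start a match.
stricter_parts = re.split(STRICTER_BOUNDS, sample)

print(original_parts)
print(stricter_parts)
```

The stricter pattern still splits on a leading clause number such as `2.1 `, but it is not a complete fix: whether it suits the numbering of a given contract needs to be checked on that contract.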

In [None]:
sections.index('GRANT OF COPYRIGHT LICENSE')

30

We will get the first clause after the title as the sample text.

In [None]:
text = sections[31]

text

'Arizona Copyright Grant. Subject to the terms and conditions of this Agreement, Arizona hereby grants to the Company a perpetual, non- exclusive, royalty-free license in, to and under the Arizona Licensed Copyrights for use in the Company Field throughout the world.'
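
The index `31` above is simply the position of the heading plus one. A small helper makes that relationship explicit, so the lookup still works if the split changes. This is a sketch: `clause_after` and `toy_sections` are hypothetical names, and the toy list only stands in for the real `sections` list.

```python
def clause_after(sections, title, offset=1):
    """Return the section `offset` positions after the first exact match of `title`."""
    position = sections.index(title)  # raises ValueError if the title is absent
    return sections[position + offset]

# Toy list standing in for `sections`; the clause text is shortened here.
toy_sections = [
    "INTELLECTUAL PROPERTY AGREEMENT",
    "GRANT OF COPYRIGHT LICENSE",
    "Arizona Copyright Grant. Subject to the terms and conditions of this Agreement, ...",
]

clause = clause_after(toy_sections, "GRANT OF COPYRIGHT LICENSE")
print(clause)
```

With the real list, `clause_after(sections, 'GRANT OF COPYRIGHT LICENSE')` would return the same element as `sections[31]`.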

###📚 Extract Relations with Zero-shot RE Model

As mentioned above, we want to extract `GRANTS` and `GRANTS_TO` relations between the **OBLIGATION_SUBJECT**, **OBLIGATION_ACTION** and **OBLIGATION_INDIRECT_OBJECT** entities. To do this, we first use the `legner_obligations` NER model and then the `legre_zero_shot` model to extract the relations.

But **!!!make sure you keep the proper syntax of the relations you want to extract!!!**
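
One way to catch syntax slips before running the pipeline is a small check over the relation templates. The sketch below assumes that "proper syntax" means two brace-delimited entity labels per template, each matching a label the NER model emits. `check_templates` and `KNOWN_LABELS` are hypothetical names, and the label set is taken from the entity columns of the earlier output.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")

# Entity labels seen in the legner_obligations output earlier in this notebook.
KNOWN_LABELS = {"OBLIGATION_SUBJECT", "OBLIGATION_ACTION",
                "OBLIGATION", "OBLIGATION_INDIRECT_OBJECT"}

def check_templates(categories, known_labels):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    for relation, templates in categories.items():
        for template in templates:
            labels = PLACEHOLDER.findall(template)
            if len(labels) != 2:
                problems.append(f"{relation}: expected 2 placeholders in {template!r}")
            for label in labels:
                if label not in known_labels:
                    problems.append(f"{relation}: unknown label {label!r}")
    return problems

categories = {
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"],
}
problems = check_templates(categories, KNOWN_LABELS)
print(problems or "templates look consistent")

# A misspelled label is reported instead of silently producing no relations.
typo_problems = check_templates(
    {"GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACITON}"]}, KNOWN_LABELS)
print(typo_problems)
```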

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassification.pretrained('legner_obligations','en', 'legal/models')\
    .setInputCols("token", "document")\
    .setOutputCol("ner")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

re_model = legal.ZeroShotRelationExtractionModel.pretrained("legre_zero_shot", "en", "legal/models")\
    .setInputCols(["ner_chunk", "document"]) \
    .setOutputCol("relations")

# Remember it's 2 curly brackets instead of one if you are using Spark NLP < 4.0
re_model.setRelationalCategories({
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"]
})

pipeline = nlp.Pipeline(stages = [
                document_assembler,  
                tokenizer,
                tokenClassifier, 
                ner_converter,
                re_model
               ])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_df)

light_model = nlp.LightPipeline(model)

legner_obligations download started this may take some time.
[OK!]
legre_zero_shot download started this may take some time.
[OK!]


In [None]:
result = light_model.fullAnnotate(text)

rel_df = get_relations_df(result)

rel_df[rel_df["relation"] != "no_rel"]

| |relation|entity1|entity1_begin|entity1_end|chunk1|entity2|entity2_begin|entity2_end|chunk2|confidence|
|-----:|:-----|:-----|-----:|-----:|:-----|:-----|-----:|-----:|:-----|-----:|
| 0| GRANTS_TO| OBLIGATION_SUBJECT| 80| 86| Arizona| OBLIGATION_INDIRECT_OBJECT| 109| 115| Company| 0.9535338|
| 1| GRANTS| OBLIGATION_SUBJECT| 80| 86| Arizona| OBLIGATION_ACTION| 88| 100| hereby grants| 0.9873099|
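
For downstream use, rows like these are often easier to handle as subject/relation/object triples. The sketch below is one possible shape, with the rows copied from the output above; `to_triples` and `group_by_subject` are hypothetical helper names.

```python
# Rows copied from the zero-shot output above.
rows = [
    {"relation": "GRANTS_TO", "chunk1": "Arizona", "chunk2": "Company",
     "confidence": 0.9535338},
    {"relation": "GRANTS", "chunk1": "Arizona", "chunk2": "hereby grants",
     "confidence": 0.9873099},
]

def to_triples(rows):
    """Turn each row into a (subject, relation, object) tuple."""
    return [(row["chunk1"], row["relation"], row["chunk2"]) for row in rows]

def group_by_subject(triples):
    """Collect, per subject, the objects found for each relation."""
    grouped = {}
    for subject, relation, obj in triples:
        grouped.setdefault(subject, {}).setdefault(relation, []).append(obj)
    return grouped

triples = to_triples(rows)
grouped = group_by_subject(triples)
print(grouped)
```

With a DataFrame, the same triples could be taken from the `chunk1`, `relation` and `chunk2` columns of `rel_df`.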


###📚 Visualization of Extracted Relations

In [None]:
# from sparknlp_display import RelationExtractionVisualizer

re_vis = nlp.viz.RelationExtractionVisualizer()

re_vis.display(result = result[0],
           relation_col = "relations",
           document_col = "document",
           exclude_relations = ["no_rel"],
           show_relations=True,
           )

You can use the Zero-shot RE model with other NER models to extract different relations between different entities.