![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sequence2sequence/T5_question_answering.ipynb)

# `Open book` and `Closed book` question answering with Google's T5  
With the latest NLU release and Google's T5, you can answer **general knowledge questions given no context** and also answer **questions about text databases**.  
These questions can be asked in natural human language and answered in just one line with NLU!




## What is an `open book question`?
You can think of an `open book` question as an exam where you are allowed to bring text documents or cheat sheets that help you answer the questions, like bringing a history book to a history exam.

In `T5`'s terms, this means the model is given a `question` together with an **additional piece of textual information**, the so-called `context`.

This lets the `T5` model answer questions on textual datasets like `medical records`, `news articles`, `wiki databases`, `stories`, `movie scripts`, `product descriptions`, `legal documents` and many more.
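The open book input is a single string: the question, then a `context:` tag, then the context text. Below is a minimal sketch of a helper that assembles such a string. The name `make_open_book_input` is hypothetical and not part of NLU. It only reproduces the `<question> context: <context>` layout used in the examples of this notebook.

```python
def make_open_book_input(question: str, context: str) -> str:
    """Join a question and its context into one open book input string.

    Hypothetical helper: it only reproduces the "<question> context: <context>"
    layout used throughout this notebook.
    """
    return f"{question.strip()}\ncontext: {context.strip()}"


prompt = make_open_book_input(
    "Where did Jebe die?",
    "Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, "
    "and Jebe died on the road back to Samarkand",
)
print(prompt)
```

The resulting string is what you would pass to `nlu.load('answer_question').predict(...)`.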

You can answer an `open book question` in one line of code, using the latest NLU release and Google's T5.  
All it takes is:

```python
import nlu

nlu.load('answer_question').predict("""
Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,
 and Jebe died on the road back to Samarkand""")
# Output: Samarkand
```

Here is an example of answering a medical question based on medical context:
```python
question ='''
What does increased oxygen concentrations in the patient’s lungs displace? 
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. 
Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
 Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''


# Predict on text data with T5
nlu.load('answer_question').predict(question)
# Output: carbon monoxide
```

Take a look at this example on a snippet from a recent news article:
```python
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'

# from https://www.bbc.com/news/business-55728338 
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
# Join each question with the context (this also works with a Pandas DataFrame)
questions = [
    question1 + news_article_snippet,
    question2 + news_article_snippet,
    question3 + news_article_snippet,
    question4 + news_article_snippet,
    question5 + news_article_snippet,
    question6 + news_article_snippet,
]
nlu.load('answer_question').predict(questions)
```
This will output a Pandas DataFrame similar to this:

| Answer | Question |
|---|---|
| Alibaba Group founder | Who is Jack ma? |
| Jack Ma | Who is founder of Alibaba Group? |
| Wednesday | When did Jack Ma re-appear? |
| surged 5% | How did Alibaba stocks react? |
| 100 rural teachers | Whom did Jack Ma meet? |
| Chinese regulators | Who did Jack Ma hide from? |
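The six concatenations above all follow the same pattern, so you can also generate the batch from a list of questions. Here is a minimal sketch in plain Python, assuming the same `<question> context: <context>` layout. The variable names are illustrative. In the output table above, the answers come back in the same order as the input list.

```python
# Illustrative inputs, taken from the news example above
questions = [
    "Who is Jack ma?",
    "Who is founder of Alibaba Group?",
    "When did Jack Ma re-appear?",
]
context = (
    "Alibaba Group founder Jack Ma has made his first appearance since "
    "Chinese regulators cracked down on his business empire."
)

# One input string per question, all sharing the same context
batch = [f"{q} context: {context}" for q in questions]

for text in batch:
    print(text)
```

The `batch` list can then be passed to `predict` in the same way as `questions` above.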



## What is a `closed book question`?
A `closed book question` is the exact opposite of an `open book question`. In an exam scenario, you are only allowed to use what you have memorized and nothing else.  
In `T5`'s terms, this means that T5 can only use its stored weights to answer a `question` and is given **no additional context**.  
`T5` was pre-trained on the C4 dataset (Colossal Clean Crawled Corpus). C4 is a cleaned English-language corpus of roughly 750 GB, built from the [Common Crawl](https://commoncrawl.org/) web archive, which itself contains **petabytes of web crawl data**.


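This gives `T5` broad knowledge of the internet, stored in its weights, which it uses to answer various `closed book questions`. Because this knowledge comes from the training data, answers to time-sensitive questions may be out of date.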

You can answer a `closed book question` in one line of code, using the latest NLU release and Google's T5.  
You only need to pass the question string itself to NLU, without any `context:` tag.  
All it takes is:


```python
nlu.load('en.t5').predict('Who is president of Nigeria?')
# Output: Muhammadu Buhari
```


```python
nlu.load('en.t5').predict('What is the most spoken language in India?')
# Output: Hindi
```


```python
nlu.load('en.t5').predict('What is the capital of Germany?')
# Output: Berlin
```
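In these examples, the only thing that distinguishes an open book input from a closed book input is the `context:` tag. If you mix both kinds of questions, a hypothetical routing helper (not part of NLU) could choose which model reference to load: `answer_question` for open book inputs and `en.t5` for closed book inputs.

```python
def pick_t5_model(text: str) -> str:
    """Return the NLU model reference to load for a question string.

    Hypothetical helper: inputs containing the literal `context:` tag go to the
    open book model, everything else to the closed book model. A question that
    happens to contain the word "context:" would be misrouted.
    """
    return "answer_question" if "context:" in text else "en.t5"


print(pick_t5_model("What is the capital of Germany?"))
print(pick_t5_model(
    "Where did Jebe die?\n"
    "context: Jebe died on the road back to Samarkand"
))
```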



```python
import os

# Install Java 8, which PySpark 2.4 requires
! apt-get update -qq > /dev/null
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

# Install NLU and PySpark 2.4.7
! pip install nlu pyspark==2.4.7 > /dev/null
import nlu
```


# Closed book question answering example

```python
t5_closed_book = nlu.load('en.t5')
```

google_t5_small_ssm_nq download started this may take some time.
Approximate size to download 139 MB
[OK!]


```python
t5_closed_book.predict('What is the capital of Germany?')
```

| origin_index | document | T5 |
|---|---|---|
| 0 | What is the capital of Germany? | Berlin |


```python
t5_closed_book.predict('Who is president of Nigeria?')
```

| origin_index | document | T5 |
|---|---|---|
| 0 | Who is president of Nigeria? | Muhammadu Buhari |


```python
t5_closed_book.predict('What is the most spoken language in India?')
```


| origin_index | document | T5 |
|---|---|---|
| 0 | What is the most spoken language in India? | Hindi |


# Open Book question examples

**Your context must be prefixed with `context:`**
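One way to catch inputs that are missing this prefix is to split them on the tag before calling `predict`. The helper below is a hypothetical sketch, not an NLU function. It only checks the string layout used in this notebook.

```python
from typing import Tuple


def split_open_book_input(text: str) -> Tuple[str, str]:
    """Split '<question> context: <context>' into (question, context).

    Hypothetical helper: raises ValueError when the `context:` tag is missing,
    so a malformed input is caught before it reaches the model.
    """
    # partition() splits on the first occurrence of the tag only
    question, tag, context = text.partition("context:")
    if not tag:
        raise ValueError("open book input must contain a 'context:' tag")
    return question.strip(), context.strip()


q, c = split_open_book_input(
    "Where did Jebe die?\n"
    "context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, "
    "and Jebe died on the road back to Samarkand"
)
print(q)
print(c)
```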


```python
t5_open_book = nlu.load('answer_question')
```


t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]


## The context:
`Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand`

```python
t5_open_book.predict("""Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand""")
```

| origin_index | document | T5 |
|---|---|---|
| 0 | Where did Jebe die? context: Ghenkis Khan reca... | Samarkand |



## Open book question example on a news article

```python
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'

# from https://www.bbc.com/news/business-55728338
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""

# Join each question with the shared context
questions = [
    question1 + news_article_snippet,
    question2 + news_article_snippet,
    question3 + news_article_snippet,
    question4 + news_article_snippet,
    question5 + news_article_snippet,
    question6 + news_article_snippet,
]

t5_open_book.predict(questions)
```


| origin_index | document | T5 |
|---|---|---|
| 0 | Who is Jack ma? context: Alibaba Group founder... | Alibaba Group founder |
| 1 | Who is founder of Alibaba Group? context: Alib... | Jack Ma |
| 2 | When did Jack Ma re-appear? context: Alibaba G... | Wednesday |
| 3 | How did Alibaba stocks react? context: Alibaba... | surged 5% |
| 4 | Whom did Jack Ma meet? context: Alibaba Group ... | 100 rural teachers |
| 5 | Who did Jack Ma hide from? context: Alibaba Gr... | Chinese regulators |


```python
# Define the data: the question, then a `context:` tag followed by the context
question ='''
What does increased oxygen concentrations in the patient’s lungs displace? 
context: Hyperbaric (high-pressure) medicine uses special oxygen 
chambers to increase the partial pressure of O 2 around the patient and,
 when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, 
 and decompression sickness (the ’bends’) are sometimes treated using these devices.
  Increased O 2 concentration in the lungs helps to displace carbon monoxide from the
   heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''

# Predict on text data with T5
t5_open_book.predict(question)
```

| origin_index | document | T5 |
|---|---|---|
| 0 | What does increased oxygen concentrations in t... | carbon monoxide |
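The context in the cell above is wrapped across several lines and has stray indentation. If you prefer to give the model a single clean line, the short sketch below collapses every run of whitespace into a single space before building the input. This notebook does not test whether the clean-up changes T5's answer. The helper name `normalize_whitespace` is hypothetical.

```python
def normalize_whitespace(text: str) -> str:
    # str.split() without arguments splits on any run of spaces, tabs or newlines
    return " ".join(text.split())


# Shortened excerpt of the wrapped context from the cell above
raw_context = """Hyperbaric (high-pressure) medicine uses special oxygen 
chambers to increase the partial pressure of O 2 around the patient and,
 when needed, the medical staff."""

question = "What does increased oxygen concentrations in the patient’s lungs displace?"
prompt = f"{question} context: {normalize_whitespace(raw_context)}"
print(prompt)
```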


# Summarize

```python
t5_sum = nlu.load('en.t5.base')
```

t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]


```python
# Set the task on T5
t5_sum['t5'].setTask('summarize ')

# Define the texts to summarize
data = [
    '''
        The belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .
                            ''',
    '''  Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations. It has two major branches, differential calculus and integral calculus; the former concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.[1] Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz.[2][3] Today, calculus has widespread uses in science, engineering, and economics.[4] In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. The word calculus (plural calculi) is a Latin word, meaning originally "small pebble" (this meaning is kept in medicine – see Calculus (medicine)). Because such pebbles were used for calculation, the meaning of the word has evolved and today usually means a method of computation. It is therefore used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.''',
]

# Predict on text data with T5
t5_sum.predict(data)
```

| origin_index | document | T5 |
|---|---|---|
| 0 | The belgian duo took to the dance floor on mon... | manchester united face newcastle in the premie... |
| 1 | Calculus, originally called infinitesimal calc... | calculus, originally called infinitesimal calc... |
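T5 is a text-to-text model that selects its task through a text prefix. The original T5 setup uses `summarize: ` for summarization, and the `setTask('summarize ')` call above configures this kind of prefix. The sketch below only imitates the idea by prepending a prefix string to each document. It is an illustration under that assumption, not the Spark NLP implementation, and `with_task_prefix` is a hypothetical name.

```python
from typing import List


def with_task_prefix(task: str, documents: List[str]) -> List[str]:
    """Prepend a T5-style task prefix to every document (illustration only)."""
    return [task + doc.strip() for doc in documents]


inputs = with_task_prefix(
    "summarize: ",
    ["  Calculus, originally called infinitesimal calculus, is the mathematical study of continuous change."],
)
print(inputs[0])
# summarize: Calculus, originally called infinitesimal calculus, is the mathematical study of continuous change.
```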


```python
text = """(Reuters) - Mastercard Inc said on Wednesday it was planning to offer support for some cryptocurrencies on its network this year, joining a string of big-ticket firms that have pledged similar support.

The credit-card giant’s announcement comes days after Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin and would soon accept it as a form of payment.

Asset manager BlackRock Inc and payments companies Square and PayPal have also recently backed cryptocurrencies.

Mastercard already offers customers cards that allow people to transact using their cryptocurrencies, although without going through its network.

"Doing this work will create a lot more possibilities for shoppers and merchants, allowing them to transact in an entirely new form of payment. This change may open merchants up to new customers who are already flocking to digital assets," Mastercard said. (mstr.cd/3tLaPZM)

Mastercard specified that not all cryptocurrencies will be supported on its network, adding that many of the hundreds of digital assets in circulation still need to tighten their compliance measures.

Many cryptocurrencies have struggled to win the trust of mainstream investors and the general public due to their speculative nature and potential for money laundering.
"""
short = t5_sum.predict(text)
short
```

| origin_index | document | T5 |
|---|---|---|
| 0 | (Reuters) - Mastercard Inc said on Wednesday i... | mastercard said on Wednesday it was planning t... |


```python
short.T5.iloc[0]
```

'mastercard said on Wednesday it was planning to offer support for some cryptocurrencies on its network this year . the credit-card giant’s announcement comes days after Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin . asset manager blackrock and payments companies Square and PayPal have also recently backed cryptocurrencies .'
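The raw T5 output above keeps a space before each period and starts sentences in lowercase. The minimal post-processing sketch below uses Python's `re` module to fix both. It is a cosmetic clean-up, not part of NLU, and `tidy_summary` is a hypothetical name.

```python
import re


def tidy_summary(text: str) -> str:
    # Drop the whitespace T5 leaves before punctuation, e.g. "this year ." -> "this year."
    text = re.sub(r"\s+([.,;:!?])", r"\1", text.strip())
    # Split after sentence-ending punctuation and upper-case each sentence's first letter
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s[:1].upper() + s[1:] for s in sentences)


summary = (
    "mastercard said on Wednesday it was planning to offer support for some cryptocurrencies "
    "on its network this year . the credit-card giant’s announcement comes days after "
    "Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin . asset manager "
    "blackrock and payments companies Square and PayPal have also recently backed cryptocurrencies ."
)
print(tidy_summary(summary))
```

Only sentence-initial letters are capitalized, so proper nouns that T5 lowercased mid-sentence (such as "blackrock") stay as they are.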

```python
len(short.T5.iloc[0])
```

352

```python
len(text)
```

1284
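Both lengths are counted in characters, so the compression achieved by the summary is a simple ratio. Here is a worked example using the two numbers printed above:

```python
# Character counts printed by the two len() cells above
summary_length = 352
original_length = 1284

ratio = summary_length / original_length
print(f"The summary keeps {ratio:.1%} of the original characters")
# The summary keeps 27.4% of the original characters
print(f"Reduction factor: {original_length / summary_length:.1f}x")
# Reduction factor: 3.6x
```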