![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sequence2sequence/T5_question_answering.ipynb)

# `Open book` and `Closed book` question answering with Google's T5  
With the latest NLU release and Google's T5 you can answer **general knowledge questions without any context** and also answer **questions on text databases**.  
These questions can be asked in natural human language and answered in just 1 line with NLU!




## What is an `open book question`? 
You can think of an `open book` question as an exam where you are allowed to bring text documents or cheat sheets that help you answer the questions, like bringing a history book to a history exam. 

In `T5's` terms, this means the model is given a `question` and an **additional piece of textual information** or so called `context`.

This enables the `T5` model to answer questions on textual datasets like `medical records`, `news articles`, `wiki-databases`, `stories`, `movie scripts`, `product descriptions`, `legal documents` and many more.

You can answer `open book questions` in 1 line of code, leveraging the latest NLU release and Google's T5.  
All it takes is:



```python
import nlu

nlu.load('answer_question').predict("""
Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,
 and Jebe died on the road back to Samarkand""")
# Output: Samarkand
```
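The input format described above (a `question`, then the `context:` tag, then the context text) can be assembled with a small helper. This is only a minimal sketch: `build_open_book_input` is a hypothetical name and not part of NLU, and it only produces the string that would then be passed to `predict`.

```python
def build_open_book_input(question: str, context: str) -> str:
    """Join a question and its context into one open book input string.

    Hypothetical helper (not part of NLU): it only builds the
    'question context: text' string format shown in the examples.
    """
    question = question.strip()
    context = context.strip()
    if not question or not context:
        raise ValueError("both question and context must be non-empty")
    # The context must be introduced by the `context:` tag
    return f"{question} context: {context}"


text = build_open_book_input(
    "Where did Jebe die?",
    "Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,\n"
    " and Jebe died on the road back to Samarkand",
)
print(text)
```

The resulting `text` could be passed to `nlu.load('answer_question').predict(text)` in the same way as the hand-written string.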

Example for answering medical questions based on medical context:
```python
question ='''
What does increased oxygen concentrations in the patient’s lungs displace? 
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. 
Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
 Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''


# Predict on text data with T5
nlu.load('answer_question').predict(question)
# Output: carbon monoxide
```

Take a look at this example on a recent news article snippet:
```python
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'

# from https://www.bbc.com/news/business-55728338 
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
# Join each question with the context; this works with a Pandas DataFrame as well!
questions = [
             question1+ news_article_snippet,
             question2+ news_article_snippet,
             question3+ news_article_snippet,
             question4+ news_article_snippet,
             question5+ news_article_snippet,
             question6+ news_article_snippet,]
nlu.load('answer_question').predict(questions)
```
This will output a Pandas DataFrame similar to this:

| Answer | Question |
|--------|----------|
| Alibaba Group founder | Who is Jack ma? |
| Jack Ma | Who is founder of Alibaba Group? |
| Wednesday | When did Jack Ma re-appear? |
| surged 5% | How did Alibaba stocks react? |
| 100 rural teachers | Whom did Jack Ma meet? |
| Chinese regulators | Who did Jack Ma hide from? |
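
Because each input string carries both the question and its context, it can be handy to split them apart again when pairing answers with questions. The following is a sketch under stated assumptions: `split_open_book_input` is a hypothetical helper (not part of NLU), and it assumes the first occurrence of `context:` marks the start of the context.

```python
from typing import Tuple


def split_open_book_input(text: str) -> Tuple[str, str]:
    """Split 'question context: text' into (question, context).

    Hypothetical helper: splits at the first 'context:' tag.
    If no tag is present, the context is returned as an empty string.
    """
    question, tag, context = text.partition("context:")
    if not tag:
        # No tag found, so the whole string is treated as the question
        return text.strip(), ""
    return question.strip(), context.strip()


document = (
    "Who is founder of Alibaba Group? context:\n"
    "Alibaba Group founder Jack Ma has made his first appearance "
    "since Chinese regulators cracked down on his business empire."
)
question, context = split_open_book_input(document)
print(question)
```

A question that itself contains the text `context:` would be split too early, so this sketch is only suitable for inputs built in the format shown here.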



## What is a `closed book question`? 
A `closed book question` is the exact opposite of an `open book question`. In an exam scenario, you are only allowed to use what you have memorized and nothing else.  
In `T5's` terms this means that T5 can only use its stored weights to answer a `question` and is given **no additional context**.  
`T5` was pre-trained on the C4 dataset, which is derived from [Common Crawl](https://commoncrawl.org/), an archive holding **petabytes of web crawling data** collected over many years.


This gives `T5` broad knowledge of the internet, stored in its weights, to answer various `closed book questions`.

You can answer `closed book questions` in 1 line of code, leveraging the latest NLU release and Google's T5.  
You pass one string to NLU that contains only the `question`. Unlike an open book question, it is not followed by a `context:` tag or any context.  
All it takes is:


```python
nlu.load('en.t5').predict('Who is president of Nigeria?')
# Output: Muhammadu Buhari
```


```python
nlu.load('en.t5').predict('What is the most spoken language in India?')
# Output: Hindi
```


```python
nlu.load('en.t5').predict('What is the capital of Germany?')
# Output: Berlin
```
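
The two question types use different model references in the examples above: `answer_question` for open book and `en.t5` for closed book. As an illustration only, the choice can be made from the input string itself. `pick_model_ref` and `group_by_model` are hypothetical helpers, and treating the presence of `context:` as the deciding signal is an assumption of this sketch, not documented NLU behavior.

```python
from collections import defaultdict
from typing import Dict, List

OPEN_BOOK_REF = "answer_question"  # reference used above for open book questions
CLOSED_BOOK_REF = "en.t5"          # reference used above for closed book questions


def pick_model_ref(text: str) -> str:
    """Return the model reference matching the input format (hypothetical helper)."""
    return OPEN_BOOK_REF if "context:" in text else CLOSED_BOOK_REF


def group_by_model(texts: List[str]) -> Dict[str, List[str]]:
    """Group inputs by model reference so each model is loaded only once."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        groups[pick_model_ref(text)].append(text)
    return dict(groups)


inputs = [
    "What is the capital of Germany?",
    "Where did Jebe die? context: Jebe died on the road back to Samarkand",
]
print(group_by_model(inputs))
```

Each group could then be passed to `nlu.load(ref).predict(texts)` for its reference, so that a model is loaded once per group rather than once per question.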



In [None]:
!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash
  

import nlu

Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...


# Closed book question answering example

In [None]:
t5_closed_book = nlu.load('en.t5')

google_t5_small_ssm_nq download started this may take some time.
Approximate size to download 139 MB
[OK!]


In [None]:
t5_closed_book.predict('What is the capital of Germany?')

Unnamed: 0,document,t5
0,What is the capital of Germany?,[Berlin]


In [None]:
t5_closed_book.predict('Who is president of Nigeria?')

Unnamed: 0,document,t5
0,Who is president of Nigeria?,[Muhammadu Buhari]


In [None]:
t5_closed_book.predict('What is the most spoken language in India?')


Unnamed: 0,document,t5
0,What is the most spoken language in India?,[Hindi]
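
In the outputs above, each value in the `t5` column is printed as a list such as `[Berlin]`. A minimal sketch for turning those cells into plain answer strings follows. `first_answer` is a hypothetical helper, and the `records` list is hand-written to mirror the outputs shown above (for a Pandas DataFrame, something like `result.to_dict("records")` would give rows of this shape); the exact cell type returned by NLU is an assumption here.

```python
from typing import Any


def first_answer(cell: Any) -> str:
    """Return the first answer from a `t5` cell (hypothetical helper).

    Handles both a list of answers and a plain string.
    """
    if isinstance(cell, (list, tuple)):
        return str(cell[0]) if cell else ""
    return str(cell)


# Hand-written rows mirroring the closed book outputs printed above
records = [
    {"document": "What is the capital of Germany?", "t5": ["Berlin"]},
    {"document": "Who is president of Nigeria?", "t5": ["Muhammadu Buhari"]},
    {"document": "What is the most spoken language in India?", "t5": ["Hindi"]},
]

# Map each question to its answer string
answers = {row["document"]: first_answer(row["t5"]) for row in records}
print(answers)
```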


# Open Book question examples

**Your context must be prefixed with `context:`**


In [None]:
t5_open_book = nlu.load('answer_question')


t5_base download started this may take some time.
Approximate size to download 446 MB
[OK!]


In [None]:
t5_open_book.predict("""Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand""")

Unnamed: 0,document,t5
0,Where did Jebe die? context: Ghenkis Khan reca...,[Samarkand]


## Open Book question example on a Story

In [None]:
question1 = 'What does Jimmy like to eat for breakfast usually?'
question2 = 'Why was Jimmy suprised?'

story = """ context:
Once upon a time, there was a squirrel named Joey.
Joey loved to go outside and play with his cousin Jimmy.
Joey and Jimmy played silly games together, and were always laughing.
One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond.
Joey woke up early in the morning to eat some food before they left.
He couldn’t find anything to eat except for pie!
Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast.
After he ate, he and Jimmy went to the pond.
On their way there they saw their friend Jack Rabbit.
They dove into the water and swam for several hours.
The sun was out, but the breeze was cold.
Joey and Jimmy got out of the water and started walking home.
Their fur was wet, and the breeze chilled them.
When they got home, they dried off, and Jimmy put on his favorite purple shirt.
Joey put on a blue shirt with red and green dots.
The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed.
 """
questions = [
             question1+ story,
             question2+ story,]
t5_open_book.predict(questions)


Unnamed: 0,document,t5
0,What does Jimmy like to eat for breakfast usua...,"[cereal, fruit (a pear), or oatmeal]"
1,Why was Jimmy suprised? context: Once upon a t...,[He couldn’t find anything to eat except for pie]


## Open book question example on news article

In [None]:
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'


# from https://www.bbc.com/news/business-55728338 
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""

questions = [
             question1+ news_article_snippet,
             question2+ news_article_snippet,
             question3+ news_article_snippet,
             question4+ news_article_snippet,
             question5+ news_article_snippet,
             question6+ news_article_snippet,]



t5_open_book.predict(questions)


Unnamed: 0,document,t5
0,Who is Jack ma? context: Alibaba Group founder...,[Alibaba Group founder]
1,Who is founder of Alibaba Group? context: Alib...,[Jack Ma]
2,When did Jack Ma re-appear? context: Alibaba G...,[Wednesday]
3,How did Alibaba stocks react? context: Alibaba...,[surged 5%]
4,Whom did Jack Ma meet? context: Alibaba Group ...,[100 rural teachers]
5,Who did Jack Ma hide from? context: Alibaba Gr...,[Chinese regulators]


In [None]:


# Define the data: the question, followed by the `context:` tag and the context text
question ='''
What does increased oxygen concentrations in the patient’s lungs displace? 
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''


# Predict on text data with T5
t5_open_book.predict(question)

Unnamed: 0,document,t5
0,What does increased oxygen concentrations in t...,[carbon monoxide]
