{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"T5_question_answering.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"CebbOFtOMv6X"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sequence2sequence/T5_question_answering.ipynb)\n","\n","# `Open book` and `Closed book` question answering with Google's T5 \n","With the latest NLU release and Google's T5, you can answer **general knowledge questions given no context** and also answer **questions about text databases**. \n","These questions can be asked in natural human language and answered in just 1 line of code with NLU!\n","\n","\n","\n","\n","## What is an `open book question`? \n","You can think of an `open book` question as an exam where you are allowed to bring text documents or cheat sheets that help you answer the questions. It's like bringing a history book to a history exam. \n","\n","In `T5's` terms, this means the model is given a `question` and an **additional piece of textual information**, the so-called `context`.\n","\n","This enables the `T5` model to answer questions on textual datasets like `medical records`, `news articles`, `wiki databases`, `stories`, `movie scripts`, `product descriptions`, `legal documents` and many more.\n","\n","You can answer an `open book question` in 1 line of code, leveraging the latest NLU release and Google's T5. 
\n","All it takes is: \n","\n","\n","\n","```python\n","nlu.load('answer_question').predict(\"\"\"\n","Where did Jebe die?\n","context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,\n"," and Jebe died on the road back to Samarkand\"\"\")\n",">>> Output: Samarkand\n","```\n","\n","Here is an example of answering a medical question based on a medical context:\n","```python\n","question ='''\n","What does increased oxygen concentrations in the patient’s lungs displace? \n","context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. \n","Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.\n"," Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. 
Increasing the pressure of O 2 as soon as possible is part of the treatment.\n","'''\n","\n","\n","# Predict on text data with T5\n","nlu.load('answer_question').predict(question)\n",">>> Output: carbon monoxide\n","```\n","\n","Take a look at this example based on a recent news article snippet: \n","```python\n","question1 = 'Who is Jack ma?'\n","question2 = 'Who is founder of Alibaba Group?'\n","question3 = 'When did Jack Ma re-appear?'\n","question4 = 'How did Alibaba stocks react?'\n","question5 = 'Whom did Jack Ma meet?'\n","question6 = 'Who did Jack Ma hide from?'\n","\n","# from https://www.bbc.com/news/business-55728338 \n","news_article_snippet = \"\"\" context:\n","Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.\n","His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.\n","The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.\n","Alibaba shares surged 5% on Hong Kong's stock exchange on the news.\n","\"\"\"\n","# Join each question with the context (this works with a Pandas DataFrame as well!)\n","questions = [\n","    question1 + news_article_snippet,\n","    question2 + news_article_snippet,\n","    question3 + news_article_snippet,\n","    question4 + news_article_snippet,\n","    question5 + news_article_snippet,\n","    question6 + news_article_snippet,\n","]\n","nlu.load('answer_question').predict(questions)\n","```\n","This will output a Pandas DataFrame similar to this: \n","\n","|Answer|Question|\n","|-----|---------|\n","|Alibaba Group founder|Who is Jack ma?|\n","|Jack Ma|Who is founder of Alibaba Group?|\n","|Wednesday|When did Jack Ma re-appear?|\n","|surged 5%|How did Alibaba stocks react?|\n","|100 rural teachers|Whom did Jack Ma meet?|\n","|Chinese regulators|Who did Jack Ma hide from?|\n","\n","\n","\n","## What is a `closed book question`? 
\n","A `closed book question` is the exact opposite of an `open book question`. In an exam scenario, you are only allowed to use what you have memorized and nothing else. \n","In `T5's` terms, this means that T5 can only use its stored weights to answer a `question` and is given **no additional context**. \n","`T5` was pre-trained on the C4 dataset (Colossal Clean Crawled Corpus), roughly 750 GB of cleaned English web text extracted from [Common Crawl](https://commoncrawl.org/), an archive of **petabytes of web crawling data**.\n","\n","\n","This gives `T5` broad knowledge of the web, stored in its weights, which it can use to answer various `closed book questions`. \n","\n","You can answer a `closed book question` in 1 line of code, leveraging the latest NLU release and Google's T5. \n","Unlike an `open book question`, you pass only the `question` itself to NLU, without a `context:` tag. \n","All it takes is: \n","\n","\n","```python\n","nlu.load('en.t5').predict('Who is president of Nigeria?')\n",">>> Muhammadu Buhari\n","```\n","\n","\n","```python\n","nlu.load('en.t5').predict('What is the most spoken language in India?')\n",">>> Hindi\n","```\n","\n","\n","```python\n","nlu.load('en.t5').predict('What is the capital of Germany?')\n",">>> Berlin\n","```\n","\n"]},{"cell_type":"code","metadata":{"id":"s6p3BcAQYeBl","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1619910199312,"user_tz":-120,"elapsed":137566,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"5dd51aeb-8121-41a0-85d5-4423a68fe1a0"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n"," \n","\n","import nlu"],"execution_count":null,"outputs":[{"output_type":"stream","text":["--2021-05-01 23:01:02-- 
https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1671 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","\r- 0%[ ] 0 --.-KB/s Installing NLU 3.0.0 with PySpark 3.0.2 and Spark NLP 3.0.1 for Google Colab ...\n","\r- 100%[===================>] 1.63K --.-KB/s in 0.001s \n","\n","2021-05-01 23:01:02 (1.66 MB/s) - written to stdout [1671/1671]\n","\n","\u001b[K |████████████████████████████████| 204.8MB 73kB/s \n","\u001b[K |████████████████████████████████| 153kB 45.6MB/s \n","\u001b[K |████████████████████████████████| 204kB 20.4MB/s \n","\u001b[K |████████████████████████████████| 204kB 50.9MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... \u001b[?25l\u001b[?25hdone\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"CqI-ovPLjzH7"},"source":["# Closed book question answering example"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"FYZQHT4FYjlQ","executionInfo":{"status":"ok","timestamp":1619910251528,"user_tz":-120,"elapsed":189765,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"eac7625a-7a1a-4dbf-b683-e1324d4f0804"},"source":["t5_closed_book = nlu.load('en.t5')"],"execution_count":null,"outputs":[{"output_type":"stream","text":["google_t5_small_ssm_nq download started this may take some time.\n","Approximate size to download 139 
MB\n","[OK!]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"uHK91QxwYn6y","executionInfo":{"status":"ok","timestamp":1619910268144,"user_tz":-120,"elapsed":206373,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"d9864b9e-e9e6-4187-e9d3-d44a6f301263"},"source":["t5_closed_book.predict('What is the capital of Germany?')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 What is the capital of Germany? [Berlin]"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"4IugHdKcZMTW","executionInfo":{"status":"ok","timestamp":1619910269805,"user_tz":-120,"elapsed":208028,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"882d4cb6-f9e1-4ac5-9f01-90432486504d"},"source":["t5_closed_book.predict('Who is president of Nigeria?')"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 Who is president of Nigeria? [Muhammadu Buhari]"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"dZfMDsyXqvZ0","executionInfo":{"status":"ok","timestamp":1619910270728,"user_tz":-120,"elapsed":208945,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"f1b6ddcb-ee63-46bf-e81b-44e6f63a73b2"},"source":["t5_closed_book.predict('What is the most spoken language in India?')\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 What is the most spoken language in India? [Hindi]"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"3Bu-Beo7ZNps"},"source":["# Open Book question examples\n","\n","**Your context must be prefixed with `context:`**\n"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"886fxf0iZO5A","executionInfo":{"status":"ok","timestamp":1619910324424,"user_tz":-120,"elapsed":262635,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"10c17c0d-9622-4f73-ed5c-2a38c5c2f6ed"},"source":["t5_open_book = nlu.load('answer_question')\n"],"execution_count":null,"outputs":[{"output_type":"stream","text":["t5_base download started this may take some time.\n","Approximate size to download 446 MB\n","[OK!]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"5koR8GOOZqUN","executionInfo":{"status":"ok","timestamp":1619910341165,"user_tz":-120,"elapsed":279370,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"3189115a-0221-4e31-dfc3-992c51558b85"},"source":["t5_open_book.predict(\"\"\"Where did Jebe die?\n","context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand\"\"\" )"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 Where did Jebe die? context: Ghenkis Khan reca... [Samarkand]"]},"metadata":{"tags":[]},"execution_count":7}]},{"cell_type":"markdown","metadata":{"id":"PM2NCOTnjSY8"},"source":["## Open Book question example on a Story"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":111},"id":"b8LZV-DjaejR","executionInfo":{"status":"ok","timestamp":1619910358092,"user_tz":-120,"elapsed":296292,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"fb234179-e448-4317-a129-1aa96add7f6b"},"source":["question1 = 'What does Jimmy like to eat for breakfast usually?'\n","question2 = 'Why was Jimmy suprised?'\n","\n","story = \"\"\" context:\n","Once upon a time, there was a squirrel named Joey.\n","Joey loved to go outside and play with his cousin Jimmy.\n","Joey and Jimmy played silly games together, and were always laughing.\n","One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond.\n","Joey woke up early in the morning to eat some food before they left.\n","He couldn’t find anything to eat except for pie!\n","Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast.\n","After he ate, he and Jimmy went to the pond.\n","On their way there they saw their friend Jack Rabbit.\n","They dove into the water and swam for several hours.\n","The sun was out, but the breeze was cold.\n","Joey and Jimmy got out of the water and started walking home.\n","Their fur was wet, and the breeze chilled them.\n","When they got home, they dried off, and Jimmy put on his favorite purple shirt.\n","Joey put on a blue shirt with red and green dots.\n","The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed.\n"," \"\"\"\n","questions = [\n"," question1+ story,\n"," question2+ 
story,]\n","t5_open_book.predict(questions)\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 What does Jimmy like to eat for breakfast usua... [cereal, fruit (a pear), or oatmeal]\n","1 Why was Jimmy suprised? context: Once upon a t... [He couldn’t find anything to eat except for pie]"]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"-OlZcBoGjPTS"},"source":["## Open book question example on news article"]},{"cell_type":"code","metadata":{"id":"JSSQz8jxa4Bg","colab":{"base_uri":"https://localhost:8080/","height":235},"executionInfo":{"status":"ok","timestamp":1619910367143,"user_tz":-120,"elapsed":305337,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"2dde8fcb-f877-4924-a964-cbda7aaa821d"},"source":["question1 = 'Who is Jack ma?'\n","question2 = 'Who is founder of Alibaba Group?'\n","question3 = 'When did Jack Ma re-appear?'\n","question4 = 'How did Alibaba stocks react?'\n","question5 = 'Whom did Jack Ma meet?'\n","question6 = 'Who did Jack Ma hide from?'\n","\n","\n","# from https://www.bbc.com/news/business-55728338 \n","news_article_snippet = \"\"\" context:\n","Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.\n","His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.\n","The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.\n","Alibaba shares surged 5% on Hong Kong's stock exchange on the news.\n","\"\"\"\n","\n","questions = [\n"," question1+ news_article_snippet,\n"," question2+ news_article_snippet,\n"," question3+ news_article_snippet,\n"," question4+ news_article_snippet,\n"," question5+ news_article_snippet,\n"," question6+ 
news_article_snippet,]\n","\n","\n","\n","t5_open_book.predict(questions)\n"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 Who is Jack ma? context: Alibaba Group founder... [Alibaba Group founder]\n","1 Who is founder of Alibaba Group? context: Alib... [Jack Ma]\n","2 When did Jack Ma re-appear? context: Alibaba G... [Wednesday]\n","3 How did Alibaba stocks react? context: Alibaba... [surged 5%]\n","4 Whom did Jack Ma meet? context: Alibaba Group ... [100 rural teachers]\n","5 Who did Jack Ma hide from? context: Alibaba Gr... [Chinese regulators]"]},"metadata":{"tags":[]},"execution_count":9}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":80},"id":"vlpHM1m8ixDL","executionInfo":{"status":"ok","timestamp":1619910370125,"user_tz":-120,"elapsed":308315,"user":{"displayName":"Christian Kasim Loan","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GjqAD-ircKP-s5Eh6JSdkDggDczfqQbJGU_IRb4Hw=s64","userId":"14469489166467359317"}},"outputId":"43d918c9-d053-45a0-c80a-0d89d110eb2e"},"source":["\n","\n","# Define the input: the question, followed by a `context:` tag and the context passage\n","question ='''\n","What does increased oxygen concentrations in the patient’s lungs displace? \n","context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. 
Increasing the pressure of O 2 as soon as possible is part of the treatment.\n","'''\n","\n","\n","#Predict on text data with T5\n","t5_open_book.predict(question)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["
"],"text/plain":[" document t5\n","0 What does increased oxygen concentrations in t... [carbon monoxide]"]},"metadata":{"tags":[]},"execution_count":10}]},{"cell_type":"code","metadata":{"id":"-DRunqWhs6QN"},"source":[""],"execution_count":null,"outputs":[]}]}