{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "4.T5_tasks_summarize_question_answering_and_more.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "YcJLn3NGaKIH" }, "source": [ "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", "\n", "\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sequence2sequence/T5_tasks_summarize_question_answering_and_more.ipynb)\n", "\n", "# Overview of every task available with T5\n", "[The T5 model](https://arxiv.org/pdf/1910.10683.pdf) is trained on various datasets for 17 different tasks which fall into 8 categories.\n", "\n", "\n", "\n", "1. Text summarization\n", "2. Question answering\n", "3. Translation\n", "4. Sentiment analysis\n", "5. Natural Language inference\n", "6. Coreference resolution\n", "7. Sentence Completion\n", "8. Word sense disambiguation\n", "\n", "# Every T5 Task with explanation:\n", "|Task Name | Explanation | \n", "|----------|--------------|\n", "|[1.CoLA](https://nyu-mll.github.io/CoLA/) | Classify if a sentence is grammatically correct|\n", "|[2.RTE](https://dl.acm.org/doi/10.1007/11736790_9) | Classify whether a statement can be deduced from a sentence|\n", "|[3.MNLI](https://arxiv.org/abs/1704.05426) | Classify for a hypothesis and premise whether they entail or contradict each other, or neither (3 classes).|\n", "|[4.MRPC](https://www.aclweb.org/anthology/I05-5002.pdf) | Classify whether a pair of sentences is a re-phrasing of each other (semantically equivalent)|\n", "|[5.QNLI](https://arxiv.org/pdf/1804.07461.pdf) | Classify whether the answer to a question can be deduced from an answer candidate.|\n", "|[6.QQP](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs) | Classify whether a pair of questions is a re-phrasing of each other (semantically equivalent)|\n", "|[7.SST2](https://www.aclweb.org/anthology/D13-1170.pdf) | Classify the sentiment of a sentence as positive or negative|\n", "|[8.STSB](https://www.aclweb.org/anthology/S17-2001/) | Score the semantic similarity of two sentences on a scale from 0 to 5 (21 similarity classes)|\n", "|[9.CB](https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/601) | Classify for a premise and a hypothesis whether they contradict each other or not (binary).|\n", "|[10.COPA](https://www.aaai.org/ocs/index.php/SSS/SSS11/paper/view/2418/0) | Classify for a question, a premise, and 2 choices which choice is correct (binary).|\n", "|[11.MultiRc](https://www.aclweb.org/anthology/N18-1023.pdf) | Classify for a question, a paragraph of text, and an answer candidate whether the answer is correct (binary).|\n", "|[12.WiC](https://arxiv.org/abs/1808.09121) | Classify for a pair of sentences and an ambiguous word whether the word has the same meaning in both sentences.|\n", "|[13.WSC/DPR](https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492/0) | Predict what an ambiguous pronoun in a sentence is referring to. 
|\n", "|[14.Summarization](https://arxiv.org/abs/1506.03340) | Summarize text into a shorter representation.|\n", "|[15.SQuAD](https://arxiv.org/abs/1606.05250) | Answer a question for a given context.|\n", "|[16.WMT1.](https://arxiv.org/abs/1706.03762) | Translate English to German|\n", "|[17.WMT2.](https://arxiv.org/abs/1706.03762) | Translate English to French|\n", "|[18.WMT3.](https://arxiv.org/abs/1706.03762) | Translate English to Romanian|\n", "\n", "\n", "# Information about pre-processing for T5 tasks\n", "\n", "## Tasks that require no pre-processing\n", "The following tasks work fine without any additional pre-processing; only setting the `task parameter` on the T5 model is required:\n", "\n", "- CoLA\n", "- Summarization\n", "- SST2\n", "- WMT1.\n", "- WMT2.\n", "- WMT3.\n", "\n", "\n", "## Tasks that require pre-processing with 1 tag\n", "The following tasks require `exactly 1 additional tag` added by manual pre-processing.\n", "Set the `task parameter` and then join the sentences on the `tag` for these tasks.\n", "\n", "- RTE\n", "- MNLI\n", "- MRPC\n", "- QNLI\n", "- QQP\n", "- SST2\n", "- STSB\n", "- CB\n", "\n", "\n", "## Tasks that require pre-processing with multiple tags\n", "The following tasks require `more than 1 additional tag` added by manual pre-processing.\n", "Set the `task parameter` and then prefix sentences with their corresponding tags and join them for these tasks:\n", "\n", "- COPA\n", "- MultiRc\n", "- WiC\n", "\n", "\n", "## WSC/DPR is a special case that requires `*` surrounding\n", "The task WSC/DPR requires highlighting the ambiguous pronoun with `*` and configuring a `task parameter`.\n", "
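\n", "Below is a minimal Python sketch of these two pre-processing styles, using only the NLU calls demonstrated in the code cells later in this notebook (`nlu.load`, `setTask`, `predict`). The example sentences are taken from the task descriptions below and are purely illustrative.\n", "\n", "```python\n", "import nlu\n", "\n", "# Load the T5 pipeline (same model used throughout this notebook)\n", "t5 = nlu.load('en.t5.base')\n", "\n", "# A task that needs no pre-processing: only the task prefix is set\n", "t5['t5'].setTask('summarize: ')\n", "t5.predict(['Manchester United face Newcastle in the Premier League on Wednesday.'])\n", "\n", "# A task that needs 1 additional tag: join the second sentence onto the first with its tag\n", "t5['t5'].setTask('rte sentence1: ')\n", "t5.predict(['Peter loves New York, it is his favorite city. sentence2: Peter loves New York.'])\n", "```\n", "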
\n", "\n", "\n", "\n", "\n", "\n", "The following sections describe each task in detail, with an example and also a pre-processed example.\n", "\n", "***NOTE:*** Linebreaks are added to the `pre-processed examples` in the following section. The T5 model also works with linebreaks, but it can hinder the performance and it is not recommended to intentionally add them.\n", "\n", "\n", "\n", "# Task 1 [CoLA - Binary Grammatical Sentence acceptability classification](https://nyu-mll.github.io/CoLA/)\n", "Judges if a sentence is grammatically acceptable. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "\n", "## Example\n", "\n", "|sentence | prediction|\n", "|------------|------------|\n", "| Anna and Mike is going skiing and they is liked is | unacceptable | \n", "| Anna and Mike like to dance | acceptable | \n", "\n", "\n", "## How to configure T5 task for CoLA\n", "`.setTask(cola sentence:)` prefix.\n", "\n", "### Example pre-processed input for T5 CoLA sentence acceptability judgement:\n", "```\n", "cola \n", "sentence: Anna and Mike is going skiing and they is liked is\n", "```\n", "\n", "# Task 2 [RTE - Natural language inference Deduction Classification](https://dl.acm.org/doi/10.1007/11736790_9)\n", "The RTE task is defined as recognizing, given two text fragments, whether the meaning of one text can be inferred (entailed) from the other or not. \n", "Classification of sentence pairs as entailed and not_entailed \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf) and [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "\n", "## Example\n", "\n", "|sentence 1 | sentence 2 | prediction|\n", "|------------|------------|----------|\n", "Kessler ’s team conducted 60,643 interviews with adults in 14 countries. | Kessler ’s team interviewed more than 60,000 adults in 14 countries | entailed\n", "Peter loves New York, it is his favorite city| Peter loves new York. | entailed\n", "Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. |Johnny is a millionare | entailment|\n", "Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. |Johnny is a poor man | not_entailment | \n", "| It was raining in England for the last 4 weeks | England was very dry yesterday | not_entailment|\n", "\n", "\n", "## How to configure T5 task for RTE\n", "`.setTask('rte sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "\n", "### Example pre-processed input for T5 RTE - 2 Class Natural language inference\n", "```\n", "rte \n", "sentence1: Recent report say Peter makes he alot of money, he earned 10 million USD each year for the last 5 years. \n", "sentence2: Peter is a millionare.\n", "```\n", "\n", "### References\n", "- https://arxiv.org/abs/2010.03061\n", "\n", "\n", "# Task 3 [MNLI - 3 Class Natural Language Inference 3-class contradiction classification](https://arxiv.org/abs/1704.05426)\n", "Classification of sentence pairs with the labels `entailment`, `contradiction`, and `neutral`. 
\n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "This classifier predicts for two sentences :\n", "- Whether the first sentence logically and semantically follows from the second sentence as entailment\n", "- Whether the first sentence is a contradiction to the second sentence as a contradiction\n", "- Whether the first sentence does not entail or contradict the first sentence as neutral\n", "\n", "| Hypothesis | Premise | prediction|\n", "|------------|------------|----------|\n", "| Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. | Johnny is a poor man. | contradiction|\n", "|It rained in England the last 4 weeks.| It was snowing in New York last week| neutral | \n", "\n", "## How to configure T5 task for MNLI\n", "`.setTask('mnli hypothesis:)` and prefix second sentence with `premise:`\n", "\n", "### Example pre-processed input for T5 MNLI - 3 Class Natural Language Inference\n", "\n", "```\n", "mnli \n", "hypothesis: At 8:34, the Boston Center controller received a third, transmission from American 11. \n", "premise: The Boston Center controller got a third transmission from American 11.\n", "```\n", "\n", "\n", "# Task 4 [MRPC - Binary Paraphrasing/ sentence similarity classification ](https://www.aclweb.org/anthology/I05-5002.pdf)\n", "Detect whether one sentence is a re-phrasing or similar to another sentence \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "| Sentence1 | Sentence2 | prediction|\n", "|------------|------------|----------|\n", "|We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .| Rather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" . | equivalent | \n", "| I like to eat peanutbutter for breakfast| I like to play football | not_equivalent | \n", "\n", "\n", "## How to configure T5 task for MRPC\n", "`.setTask('mrpc sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "### Example pre-processed input for T5 MRPC - Binary Paraphrasing/ sentence similarity\n", "\n", "```\n", "mrpc \n", "sentence1: We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said . \n", "sentence2: Rather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11\",\n", "```\n", "\n", "*ISSUE:* Can only get neutral and contradiction as prediction results for tested samples but no entailment predictions.\n", "\n", "\n", "# Task 5 [QNLI - Natural Language Inference question answered classification](https://arxiv.org/pdf/1804.07461.pdf)\n", "Classify whether a question is answered by a sentence (`entailed`). \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Question | Answer | prediction|\n", "|------------|------------|----------|\n", "|Where did Jebe die?| Ghenkis Khan recalled Subtai back to Mongolia soon afterward, and Jebe died on the road back to Samarkand | entailment|\n", "|What does Steve like to eat? 
| Steve watches TV all day | not_entailment\n", "\n", "## How to configure T5 task for QNLI - Natural Language Inference question answered classification\n", "`.setTask('qnli question:)` and prefix the answer candidate sentence with `sentence:`\n", "\n", "### Example pre-processed input for T5 QNLI - Natural Language Inference question answered classification\n", "\n", "```\n", "qnli\n", "question: Where did Jebe die? \n", "sentence: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand,\n", "```\n", "\n", "\n", "# Task 6 [QQP - Binary Question Similarity/Paraphrasing](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs)\n", "Based on a Quora dataset, determine whether a pair of questions are semantically equivalent. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Question1 | Question2 | prediction|\n", "|------------|------------|----------|\n", "|What attributes would have made you highly desirable in ancient Rome? | How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER? | not_duplicate | \n", "|What was it like in Ancient rome? | What was Ancient rome like?| duplicate | \n", "\n", "\n", "## How to configure T5 task for QQP\n", "`.setTask('qqp question1:)` and prefix the second question with `question2:`\n", "\n", "\n", "### Example pre-processed input for T5 QQP - Binary Question Similarity/Paraphrasing\n", "\n", "```\n", "qqp \n", "question1: What attributes would have made you highly desirable in ancient Rome? \n", "question2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?',\n", "```\n", "\n", "# Task 7 [SST2 - Binary Sentiment Analysis](https://www.aclweb.org/anthology/D13-1170.pdf)\n", "Binary sentiment classification. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Sentence1 | Prediction | \n", "|-----------|-----------|\n", "|it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight | positive| \n", "|I really hated that movie | negative | \n", "\n", "\n", "## How to configure T5 task for SST2\n", "`.setTask('sst2 sentence: ')`\n", "\n", "### Example pre-processed input for T5 SST2 - Binary Sentiment Analysis\n", "\n", "```\n", "sst2\n", "sentence: I hated that movie\n", "```\n", "\n", "\n", "\n", "# Task 8 [STSB - Regressive semantic sentence similarity](https://www.aclweb.org/anthology/S17-2001/)\n", "Measures how similar two sentences are on a scale from 0 to 5 with 21 classes representing a regression label. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "| Question1 | Question2 | prediction|\n", "|------------|------------|----------|\n", "|What attributes would have made you highly desirable in ancient Rome? | How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER? | 0 | \n", "|What was it like in Ancient rome? | What was Ancient rome like?| 5.0 | \n", "|What was live like as a King in Ancient Rome?? | What is it like to live in Rome? | 3.2 | \n", "\n", "## How to configure T5 task for STSB\n", "`.setTask('stsb sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "\n", "### Example pre-processed input for T5 STSB - Regressive semantic sentence similarity\n", "\n", "```\n", "stsb\n", "sentence1: What attributes would have made you highly desirable in ancient Rome? 
\n", "sentence2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?',\n", "```\n", "\n", "\n", "# Task 9[ CB - Natural language inference contradiction classification](https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/601)\n", "Classify whether a Premise contradicts a Hypothesis. \n", "Predicts entailment, neutral and contradiction \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "| Hypothesis | Premise | Prediction | \n", "|--------|-------------|----------|\n", "|Valence was helping | Valence the void-brain, Valence the virtuous valet. Why couldn’t the figger choose his own portion of titanic anatomy to shaft? Did he think he was helping'| Contradiction|\n", "\n", "\n", "## How to configure T5 task for CB\n", "`.setTask('cb hypothesis:)` and prefix premise with `premise:`\n", "\n", "### Example pre-processed input for T5 CB - Natural language inference contradiction classification\n", "\n", "```\n", "cb \n", "hypothesis: Valence was helping \n", "premise: Valence the void-brain, Valence the virtuous valet. Why couldn’t the figger choose his own portion of titanic anatomy to shaft? Did he think he was helping,\n", "```\n", "\n", "\n", "# Task 10 [COPA - Sentence Completion/ Binary choice selection](https://www.aaai.org/ocs/index.php/SSS/SSS11/paper/view/2418/0)\n", "The Choice of Plausible Alternatives (COPA) task by Roemmele et al. (2011) evaluates\n", "causal reasoning between events, which requires commonsense knowledge about what usually takes\n", "place in the world. Each example provides a premise and either asks for the correct cause or effect\n", "from two choices, thus testing either ``backward`` or `forward causal reasoning`. COPA data, which\n", "consists of 1,000 examples total, can be downloaded at https://people.ict.usc.e\n", "\n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "This classifier selects from a choice of `2 options` which one the correct is based on a `premise`.\n", "\n", "\n", "## forward causal reasoning\n", "Premise: The man lost his balance on the ladder. \n", "question: What happened as a result? \n", "Alternative 1: He fell off the ladder. \n", "Alternative 2: He climbed up the ladder.\n", "## backwards causal reasoning\n", "Premise: The man fell unconscious. What was the cause\n", "of this? \n", "Alternative 1: The assailant struck the man in the head. \n", "Alternative 2: The assailant took the man’s wallet.\n", "\n", "\n", "| Question | Premise | Choice 1 | Choice 2 | Prediction | \n", "|--------|-------------|----------|---------|-------------|\n", "|effect | Politcal Violence broke out in the nation. | many citizens relocated to the capitol. | Many citizens took refuge in other territories | Choice 1 | \n", "|correct| The men fell unconscious | The assailant struckl the man in the head | he assailant s took the man's wallet. 
| choice1 | \n", "\n", "\n", "## How to configure T5 task for COPA\n", "`.setTask('copa choice1:)`, prefix choice2 with `choice2:` , prefix premise with `premise:` and prefix the question with `question`\n", "\n", "### Example pre-processed input for T5 COPA - Sentence Completion/ Binary choice selection\n", "\n", "```\n", "copa \n", "choice1: He fell off the ladder \n", "choice2: He climbed up the lader \n", "premise: The man lost his balance on the ladder \n", "question: effect\n", "```\n", "\n", "\n", "\n", "\n", "# Task 11 [MultiRc - Question Answering](https://www.aclweb.org/anthology/N18-1023.pdf)\n", "Evaluates an `answer` for a `question` as `true` or `false` based on an input `paragraph`\n", "The T5 model predicts for a `question` and a `paragraph` of `sentences` wether an `answer` is true or not,\n", "based on the semantic contents of the paragraph. \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "\n", "**Exeeds human performance by a large margin**\n", "\n", "\n", "\n", "| Question | Answer | Prediction | paragraph|\n", "|--------------------------------------------------------------|---------------------------------------------------------------------|------------|----------|\n", "| Why was Joey surprised the morning he woke up for breakfast? | There was only pie to eat, rather than traditional breakfast foods | True |Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. He couldn’t find anything to eat except for pie! Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed., |\n", "| Why was Joey surprised the morning he woke up for breakfast? | There was a T-Rex in his garden | False |Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. He couldn’t find anything to eat except for pie! Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. 
The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed., |\n", "\n", "## How to configure T5 task for MultiRC\n", "`.setTask('multirc questions:)` followed by `answer:` prefix for the answer to evaluate, followed by `paragraph:` and then a series of sentences, where each sentence is prefixed with `Sent n:`.\n", "\n", "\n", "### Example pre-processed input for T5 MultiRc task:\n", "```\n", "multirc questions: Why was Joey surprised the morning he woke up for breakfast? \n", "answer: There was a T-REX in his garden. \n", "paragraph: \n", "Sent 1: Once upon a time, there was a squirrel named Joey. \n", "Sent 2: Joey loved to go outside and play with his cousin Jimmy. \n", "Sent 3: Joey and Jimmy played silly games together, and were always laughing. \n", "Sent 4: One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. \n", "Sent 5: Joey woke up early in the morning to eat some food before they left. \n", "Sent 6: He couldn’t find anything to eat except for pie! \n", "Sent 7: Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. \n", "Sent 8: After he ate, he and Jimmy went to the pond. \n", "Sent 9: On their way there they saw their friend Jack Rabbit. \n", "Sent 10: They dove into the water and swam for several hours. \n", "Sent 11: The sun was out, but the breeze was cold. \n", "Sent 12: Joey and Jimmy got out of the water and started walking home. \n", "Sent 13: Their fur was wet, and the breeze chilled them. \n", "Sent 14: When they got home, they dried off, and Jimmy put on his favorite purple shirt. \n", "Sent 15: Joey put on a blue shirt with red and green dots. \n", "Sent 16: The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed. \n", "```\n", "\n", "\n", "# Task 12 [WiC - Word sense disambiguation](https://arxiv.org/abs/1808.09121)\n", "Decide for `two sentences` with a shared `ambiguous word` whether the target word has the same `semantic meaning` in both sentences. \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "|Predicted | ambiguous word| Sentence 1 | Sentence 2 | \n", "|----------|-----------------|------------|------------|\n", "| False | kill | He totally killed that rock show! | The airplane crash killed his family | \n", "| True | window | The expanded window will give us time to catch the thieves.|You have a two-hour window for turning in your homework. | \n", "| False | window | He jumped out of the window.|You have a two-hour window for turning in your homework. | \n", "\n", "\n", "## How to configure T5 task for WiC\n", "`.setTask('wic pos:)` followed by `sentence1:` prefix for the first sentence, followed by `sentence2:` prefix for the second sentence.\n", "\n", "\n", "### Example pre-processed input for T5 WiC task:\n", "\n", "```\n", "wic pos:\n", "sentence1: The expanded window will give us time to catch the thieves.\n", "sentence2: You have a two-hour window of turning in your homework.\n", "word : window\n", "```\n", "\n", "\n", "\n", "# Task 13 [WSC and DPR - Coreference resolution/ Pronoun ambiguity resolver ](https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492/0)\n", "Predict which `noun` an `ambiguous pronoun` is referring to. 
\n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf) and [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "|Prediction| Text | \n", "|----------|-------|\n", "| stable | The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy. | \n", "\n", "\n", "\n", "## How to configure T5 task for WSC/DPR\n", "`.setTask('wsc:)` and surround pronoun with asteriks symbols..\n", "\n", "\n", "### Example pre-processed input for T5 WSC/DPR task:\n", "The `ambiguous pronous` should be surrounded with `*` symbols.\n", "\n", "***Note*** Read [Appendix A.](https://arxiv.org/pdf/1910.10683.pdf#page=64&zoom=100,84,360) for more info\n", "```\n", "wsc: \n", "The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy.\n", "```\n", "\n", "\n", "# Task 14 [Text summarization](https://arxiv.org/abs/1506.03340)\n", "`Summarizes` a paragraph into a shorter version with the same semantic meaning.\n", "\n", "| Predicted summary| Text | \n", "|------------------|-------|\n", "| manchester united face newcastle in the premier league on wednesday . louis van gaal's side currently sit two points clear of liverpool in fourth . the belgian duo took to the dance floor on monday night with some friends . | the belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth . | \n", "\n", "\n", "## How to configure T5 task for summarization\n", "`.setTask('summarize:)`\n", "\n", "\n", "### Example pre-processed input for T5 summarization task:\n", "This task requires no pre-processing, setting the task to `summarize` is sufficient.\n", "```\n", "the belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .\n", "```\n", "\n", "# Task 15 [SQuAD - Context based question answering](https://arxiv.org/abs/1606.05250)\n", "Predict an `answer` to a `question` based on input `context`.\n", "\n", "|Predicted Answer | Question | Context | \n", "|-----------------|----------|------|\n", "|carbon monoxide| What does increased oxygen concentrations in the patient’s lungs displace? | Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.\n", "|pie| What did Joey eat for breakfast?| Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. 
Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed,'| \n", "\n", "## How to configure T5 task parameter for Squad Context based question answering\n", "`.setTask('question:)` and prefix the context which can be made up of multiple sentences with `context:`\n", "\n", "## Example pre-processed input for T5 Squad Context based question answering:\n", "```\n", "question: What does increased oxygen concentrations in the patient’s lungs displace? \n", "context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.\n", "```\n", "\n", "\n", "\n", "# Task 16 [WMT1 Translate English to German](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for WMT Translate English to German\n", "`.setTask('translate English to German:)`\n", "\n", "# Task 17 [WMT2 Translate English to French](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for WMT Translate English to French\n", "`.setTask('translate English to French:)`\n", "\n", "\n", "# 18 [WMT3 - Translate English to Romanian](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for English to Romanian\n", "`.setTask('translate English to Romanian:)`\n" ] }, { "cell_type": "markdown", "metadata": { "id": "l73NeLo6r71r" }, "source": [ "# Spark-NLP Example for every Task:\n", "\n", "\n", "# 0.1 Install Spark NLP and NLU" ] }, { "cell_type": "code", "metadata": { "id": "sDSc7ZPkajRW" }, "source": [ "import os\n", "! apt-get update -qq > /dev/null \n", "# Install java\n", "! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null\n", "os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-8-openjdk-amd64\"\n", "os.environ[\"PATH\"] = os.environ[\"JAVA_HOME\"] + \"/bin:\" + os.environ[\"PATH\"]\n", "! 
pip install nlu pyspark==2.4.7\n", "import nlu\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "zRfTmN5Rp59D" }, "source": [ "## 0.2 Define Document assembler and T5 model for running the tasks" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3fJqtnZtaDpd", "outputId": "e0e4461c-6a5b-4f60-dc01-ba3347e5e801" }, "source": [ "t5 = nlu.load('en.t5.base')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "t5_base download started this may take some time.\n", "Approximate size to download 446 MB\n", "[OK!]\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "88eNPgivmCTl", "outputId": "e7900be9-68fc-4879-9b06-09f2e65d82de" }, "source": [ "t5.print_info()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :\n", ">>> pipe['t5'] has settable params:\n", "pipe['t5'].setMaxOutputLength(200) | Info: Set the maximum length of output text | Currently set to : 200\n", "pipe['t5'].setTask('base') | Info: Transformer's task, e.g. summarize> | Currently set to : base\n", ">>> pipe['sentence_detector'] has settable params:\n", "pipe['sentence_detector'].setUseAbbreviations(True) | Info: whether to apply abbreviations at sentence detection | Currently set to : True\n", "pipe['sentence_detector'].setDetectLists(True) | Info: whether detect lists during sentence detection | Currently set to : True\n", "pipe['sentence_detector'].setUseCustomBoundsOnly(False) | Info: Only utilize custom bounds in sentence detection | Currently set to : False\n", "pipe['sentence_detector'].setCustomBounds([]) | Info: characters used to explicitly mark sentence bounds | Currently set to : []\n", "pipe['sentence_detector'].setExplodeSentences(False) | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False\n", "pipe['sentence_detector'].setMinLength(0) | Info: Set the minimum allowed length for each sentence. | Currently set to : 0\n", "pipe['sentence_detector'].setMaxLength(99999) | Info: Set the maximum allowed length for each sentence | Currently set to : 99999\n", ">>> pipe['default_tokenizer'] has settable params:\n", "pipe['default_tokenizer'].setTargetPattern('\\S+') | Info: pattern to grab from text as token candidates. 
Defaults \\S+ | Currently set to : \\S+\n", "pipe['default_tokenizer'].setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]) | Info: character list used to separate from token boundaries | Currently set to : ['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '\"', \"'\"]\n", "pipe['default_tokenizer'].setCaseSensitiveExceptions(True) | Info: Whether to care for case sensitiveness in exceptions | Currently set to : True\n", "pipe['default_tokenizer'].setMinLength(0) | Info: Set the minimum allowed legth for each token | Currently set to : 0\n", "pipe['default_tokenizer'].setMaxLength(99999) | Info: Set the maximum allowed legth for each token | Currently set to : 99999\n", ">>> pipe['document_assembler'] has settable params:\n", "pipe['document_assembler'].setCleanupMode('shrink') | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "KBKk3WWFabhn" }, "source": [ "\n", "# Task 1 [CoLA - Binary Grammatical Sentence acceptability classification](https://nyu-mll.github.io/CoLA/)\n", "Judges if a sentence is grammatically acceptable. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "\n", "## Example\n", "\n", "|sentence | prediction|\n", "|------------|------------|\n", "| Anna and Mike is going skiing and they is liked is | unacceptable | \n", "| Anna and Mike like to dance | acceptable | \n", "\n", "## How to configure T5 task for CoLA\n", "`.setTask(cola sentence:)` prefix.\n", "\n", "### Example pre-processed input for T5 CoLA sentence acceptability judgement:\n", "```\n", "cola \n", "sentence: Anna and Mike is going skiing and they is liked is\n", "```" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "roTo1IGqaayd", "outputId": "d49e13eb-2115-4c6c-e1ac-14fb5940dc67" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('cola sentence: ') \n", "\n", "# define Data\n", "data = ['Anna and Mike is going skiing and they is liked is','Anna and Mike like to dance']\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 unacceptable Anna and Mike is going skiing and they is like...\n", "1 acceptable Anna and Mike like to dance" ] }, "metadata": { "tags": [] }, "execution_count": 27 } ] }, { "cell_type": "markdown", "metadata": { "id": "FeSXp9bVio-T" }, "source": [ "# Task 2 [RTE - Natural language inference Deduction Classification](https://dl.acm.org/doi/10.1007/11736790_9)\n", "The RTE task is defined as recognizing, given two text fragments, whether the meaning of one text can be inferred (entailed) from the other or not. \n", "Classification of sentence pairs as entailed and not_entailed \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf) and [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "\n", "## Example\n", "\n", "|sentence 1 | sentence 2 | prediction|\n", "|------------|------------|----------|\n", "Kessler ’s team conducted 60,643 interviews with adults in 14 countries. | Kessler ’s team interviewed more than 60,000 adults in 14 countries | entailed\n", "Peter loves New York, it is his favorite city| Peter loves new York. | entailed\n", "Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. |Johnny is a millionare | entailment|\n", "Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. |Johnny is a poor man | not_entailment | \n", "| It was raining in England for the last 4 weeks | England was very dry yesterday | not_entailment|\n", "\n", "## How to configure T5 task for RTE\n", "`.setTask('rte sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "\n", "### Example pre-processed input for T5 RTE - 2 Class Natural language inference\n", "```\n", "rte \n", "sentence1: Recent report say Peter makes he alot of money, he earned 10 million USD each year for the last 5 years. \n", "sentence2: Peter is a millionare.\n", "```\n", "\n", "### References\n", "- https://arxiv.org/abs/2010.03061\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "azVTmLRHf_tO", "outputId": "788df121-492b-419e-b2f3-80ae38667a73" }, "source": [ "# Set the task on T5\n", "\n", "t5['t5'].setTask('rte sentence: ') \n", "\n", "data = [\n", " 'Recent report say Peter makes he alot of money, he earned 10 million USD each year for the last 5 years. sentence2: Peter is a millionare',\n", " 'Recent report say Peter makes he alot of money, he earned 10 million USD each year for the last 5 years. sentence2: Peter is a poor man']\n", " \n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 entailment Recent report say Peter makes he alot of money...\n", "1 not_entailment Recent report say Peter makes he alot of money..." ] }, "metadata": { "tags": [] }, "execution_count": 28 } ] }, { "cell_type": "markdown", "metadata": { "id": "4SnQfLLIjZjG" }, "source": [ "\n", "# Task 3 [MNLI - 3 Class Natural Language Inference 3-class contradiction classification](https://arxiv.org/abs/1704.05426)\n", "Classification of sentence pairs with the labels `entailment`, `contradiction`, and `neutral`. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "This classifier predicts for two sentences :\n", "- Whether the first sentence logically and semantically follows from the second sentence as entailment\n", "- Whether the first sentence is a contradiction to the second sentence as a contradiction\n", "- Whether the first sentence does not entail or contradict the first sentence as neutral\n", "\n", "| Hypothesis | Premise | prediction|\n", "|------------|------------|----------|\n", "| Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years. | Johnny is a poor man. | contradiction|\n", "|It rained in England the last 4 weeks.| It was snowing in New York last week| neutral | \n", "\n", "## How to configure T5 task for MNLI\n", "`.setTask('mnli hypothesis:)` and prefix second sentence with `premise:`\n", "\n", "### Example pre-processed input for T5 MNLI - 3 Class Natural Language Inference\n", "\n", "```\n", "mnli \n", "hypothesis: At 8:34, the Boston Center controller received a third, transmission from American 11. \n", "premise: The Boston Center controller got a third transmission from American 11.\n", "```" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "PP_-VKIojMUh", "outputId": "bf53e2b6-5b2a-48df-de52-ecc60ca9f0df" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('mnli ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " \n", " ''' hypothesis: At 8:34, the Boston Center controller received a third, transmission from American 11.\n", " premise: The Boston Center controller got a third transmission from American 11.\n", " '''\n", " ,\n", " ''' \n", " hypothesis: Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years.\n", " premise: Johnny is a poor man.\n", " '''\n", "\n", " ]\n", "# Set the task on T5\n", "\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 neutral hypothesis: At 8:34, the Boston Center control...\n", "1 contradiction hypothesis: Recent report say Johnny makes he ..." ] }, "metadata": { "tags": [] }, "execution_count": 29 } ] }, { "cell_type": "markdown", "metadata": { "id": "FDaoLg8Dkj8W" }, "source": [ "\n", "# Task 4 [MRPC - Binary Paraphrasing/ sentence similarity classification ](https://www.aclweb.org/anthology/I05-5002.pdf)\n", "Detect whether one sentence is a re-phrasing or similar to another sentence \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "| Sentence1 | Sentence2 | prediction|\n", "|------------|------------|----------|\n", "|We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .| Rather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" . | equivalent | \n", "| I like to eat peanutbutter for breakfast| I like to play football | not_equivalent | \n", "\n", "\n", "## How to configure T5 task for MRPC\n", "`.setTask('mrpc sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "### Example pre-processed input for T5 MRPC - Binary Paraphrasing/ sentence similarity\n", "\n", "```\n", "mrpc \n", "sentence1: We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said . \n", "sentence2: Rather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11\",\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "KmRnjdDBkFJf", "outputId": "9d570a83-1304-460f-e8bd-d1941027d78f" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('mrpc ') \n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " ''' sentence1: We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .\n", " sentence2: Rather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" \n", " '''\n", " ,\n", " ''' \n", " sentence1: I like to eat peanutbutter for breakfast\n", " sentence2: \tI like to play football.\n", " '''\n", " ]\n", "\n", "\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 equivalent sentence1: We acted because we saw the existin...\n", "1 not_equivalent sentence1: I like to eat peanutbutter for brea..." ] }, "metadata": { "tags": [] }, "execution_count": 30 } ] }, { "cell_type": "markdown", "metadata": { "id": "pHcRyNahk8x-" }, "source": [ "\n", "# Task 5 [QNLI - Natural Language Inference question answered classification](https://arxiv.org/pdf/1804.07461.pdf)\n", "Classify whether a question is answered by a sentence (`entailed`). \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Question | Answer | prediction|\n", "|------------|------------|----------|\n", "|Where did Jebe die?| Ghenkis Khan recalled Subtai back to Mongolia soon afterward, and Jebe died on the road back to Samarkand | entailment|\n", "|What does Steve like to eat? | Steve watches TV all day | not_netailment\n", "\n", "## How to configure T5 task for QNLI - Natural Language Inference question answered classification\n", "`.setTask('QNLI sentence1:)` and prefix question with `question:` sentence with `sentence:`:\n", "\n", "### Example pre-processed input for T5 QNLI - Natural Language Inference question answered classification\n", "\n", "```\n", "qnli\n", "question: Where did Jebe die? \n", "sentence: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand,\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "JGG9z8Vmk4zJ", "outputId": "8afb4051-27f4-4044-fe83-6e22abe16d30" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('QNLI ') \n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " ''' question: Where did Jebe die? \n", " sentence: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand,\n", " '''\n", " ,\n", " ''' \n", " question: What does Steve like to eat?\t\n", " sentence: \tSteve watches TV all day\n", " '''\n", "\n", " ]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 entailment question: Where did Jebe die? sentence: Ghenki...\n", "1 not_entailment question: What does Steve like to eat? sentenc..." ] }, "metadata": { "tags": [] }, "execution_count": 31 } ] }, { "cell_type": "markdown", "metadata": { "id": "lzBXz5calRkP" }, "source": [ "\n", "# Task 6 [QQP - Binary Question Similarity/Paraphrasing](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs)\n", "Based on a quora dataset, determine whether a pair of questions are semantically equivalent. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Question1 | Question2 | prediction|\n", "|------------|------------|----------|\n", "|What attributes would have made you highly desirable in ancient Rome? | How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER? | not_duplicate | \n", "|What was it like in Ancient rome? | What was Ancient rome like?| duplicate | \n", "\n", "\n", "## How to configure T5 task for QQP\n", ".setTask('qqp question1:) and\n", "prefix second sentence with question2:\n", "\n", "\n", "### Example pre-processed input for T5 QQP - Binary Question Similarity/Paraphrasing\n", "\n", "```\n", "qqp \n", "question1: What attributes would have made you highly desirable in ancient Rome? \n", "question2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?',\n", "```" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "2WWW-0X6lRJT", "outputId": "8fbc9654-07a0-4f1c-e34e-23c22ce0bf40" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('qqp ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " ''' question1: What attributes would have made you highly desirable in ancient Rome? \n", " question2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?'\n", " '''\n", " ,\n", " ''' \n", " question1: What was it like in Ancient rome?\n", " question2: \tWhat was Ancient rome like?\n", " '''\n", "\n", " ]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 not_duplicate question1: What attributes would have made you...\n", "1 duplicate question1: What was it like in Ancient rome? q..." ] }, "metadata": { "tags": [] }, "execution_count": 32 } ] }, { "cell_type": "markdown", "metadata": { "id": "GsvuFTkjlm-N" }, "source": [ "\n", "# Task 7 [SST2 - Binary Sentiment Analysis](https://www.aclweb.org/anthology/D13-1170.pdf)\n", "Binary sentiment classification. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "| Sentence1 | Prediction | \n", "|-----------|-----------|\n", "|it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight | positive| \n", "|I really hated that movie | negative | \n", "\n", "\n", "## How to configure T5 task for SST2\n", "`.setTask('sst2 sentence: ')`\n", "\n", "### Example pre-processed input for T5 SST2 - Binary Sentiment Analysis\n", "\n", "```\n", "sst2\n", "sentence: I hated that movie\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "FPP4pMVQlOtz", "outputId": "9795d595-7db1-441f-b7c4-718563c84ee2" }, "source": [ "# Set the task on T5\n", "\n", "t5['t5'].setTask('sst2 sentence: ') \n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " ''' I really hated that movie''',\n", " ''' it confirms fincher ’s status as a film maker who artfully bends technical know-how to the service of psychological insight'''\n", " ]\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 negative I really hated that movie\n", "1 positive it confirms fincher ’s status as a film maker ..." ] }, "metadata": { "tags": [] }, "execution_count": 33 } ] }, { "cell_type": "markdown", "metadata": { "id": "dpZQ_H8fl4OV" }, "source": [ "\n", "# Task8 [STSB - Regressive semantic sentence similarity](https://www.aclweb.org/anthology/S17-2001/)\n", "Measures how similar two sentences are on a scale from 0 to 5 with 21 classes representing a regressive label. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf).\n", "\n", "\n", "| Question1 | Question2 | prediction|\n", "|------------|------------|----------|\n", "|What attributes would have made you highly desirable in ancient Rome? | How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER? | 0 | \n", "|What was it like in Ancient rome? | What was Ancient rome like?| 5.0 | \n", "|What was live like as a King in Ancient Rome?? | What is it like to live in Rome? | 3.2 | \n", "\n", "## How to configure T5 task for STSB\n", "`.setTask('stsb sentence1:)` and prefix second sentence with `sentence2:`\n", "\n", "\n", "### Example pre-processed input for T5 STSB - Regressive semantic sentence similarity\n", "\n", "```\n", "stsb\n", "sentence1: What attributes would have made you highly desirable in ancient Rome? \n", "sentence2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?',\n", "```" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 167 }, "id": "yPODqDWYl35B", "outputId": "4505d7a9-02bf-4e8b-b41f-7f71fece8bbe" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('stsb ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " \n", " ''' sentence1: What attributes would have made you highly desirable in ancient Rome? \n", " sentence2: How I GET OPPERTINUTY TO JOIN IT COMPANY AS A FRESHER?'\n", " '''\n", " ,\n", " ''' \n", " sentence1: What was it like in Ancient rome?\n", " sentence2: \tWhat was Ancient rome like?\n", " ''',\n", " ''' \n", " sentence1: What was live like as a King in Ancient Rome??\n", " sentence2: \tWhat was Ancient rome like?\n", " '''\n", "\n", " ]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 not_duplicate sentence1: What attributes would have made you...\n", "1 duplicate sentence1: What was it like in Ancient rome? s...\n", "2 not_duplicate sentence1: What was live like as a King in Anc..." ] }, "metadata": { "tags": [] }, "execution_count": 34 } ] }, { "cell_type": "markdown", "metadata": { "id": "2pGz7_qsmQUF" }, "source": [ "\n", "# Task 9[ CB - Natural language inference contradiction classification](https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/601)\n", "Classify whether a Premise contradicts a Hypothesis. \n", "Predicts entailment, neutral and contradiction \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "| Hypothesis | Premise | Prediction | \n", "|--------|-------------|----------|\n", "|Valence was helping | Valence the void-brain, Valence the virtuous valet. Why couldn’t the figger choose his own portion of titanic anatomy to shaft? Did he think he was helping'| Contradiction|\n", "\n", "\n", "## How to configure T5 task for CB\n", "`.setTask('cb hypothesis:)` and prefix premise with `premise:`\n", "\n", "### Example pre-processed input for T5 CB - Natural language inference contradiction classification\n", "\n", "```\n", "cb \n", "hypothesis: Valence was helping \n", "premise: Valence the void-brain, Valence the virtuous valet. Why couldn’t the figger choose his own portion of titanic anatomy to shaft? Did he think he was helping,\n", "```\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "XjGzx2v8l2lk", "outputId": "b81e12cc-e959-4ac4-a868-2878d14d314e" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('cb ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " '''\n", " hypothesis: Recent report say Johnny makes he alot of money, he earned 10 million USD each year for the last 5 years.\n", " premise: Johnny is a poor man.\n", " ''']\n", "\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0contradictionhypothesis: Recent report say Johnny makes he ...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 contradiction hypothesis: Recent report say Johnny makes he ..." ] }, "metadata": { "tags": [] }, "execution_count": 35 } ] }, { "cell_type": "markdown", "metadata": { "id": "Q8Zg4o7NnDy7" }, "source": [ "\n", "# Task 10 [COPA - Sentence Completion/ Binary choice selection](https://www.aaai.org/ocs/index.php/SSS/SSS11/paper/view/2418/0)\n", "The Choice of Plausible Alternatives (COPA) task by Roemmele et al. (2011) evaluates\n", "causal reasoning between events, which requires commonsense knowledge about what usually takes\n", "place in the world. Each example provides a premise and either asks for the correct cause or effect\n", "from two choices, thus testing either ``backward`` or `forward causal reasoning`. COPA data, which\n", "consists of 1,000 examples total, can be downloaded at https://people.ict.usc.e\n", "\n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "This classifier selects from a choice of `2 options` which one the correct is based on a `premise`.\n", "\n", "\n", "## forward causal reasoning\n", "Premise: The man lost his balance on the ladder. \n", "question: What happened as a result? \n", "Alternative 1: He fell off the ladder. \n", "Alternative 2: He climbed up the ladder.\n", "## backwards causal reasoning\n", "Premise: The man fell unconscious. What was the cause\n", "of this? \n", "Alternative 1: The assailant struck the man in the head. \n", "Alternative 2: The assailant took the man’s wallet.\n", "\n", "\n", "| Question | Premise | Choice 1 | Choice 2 | Prediction | \n", "|--------|-------------|----------|---------|-------------|\n", "|effect | Politcal Violence broke out in the nation. | many citizens relocated to the capitol. | Many citizens took refuge in other territories | Choice 1 | \n", "|correct| The men fell unconscious | The assailant struckl the man in the head | he assailant s took the man's wallet. | choice1 | \n", "\n", "\n", "## How to configure T5 task for COPA\n", "`.setTask('copa choice1:)`, prefix choice2 with `choice2:` , prefix premise with `premise:` and prefix the question with `question`\n", "\n", "### Example pre-processed input for T5 COPA - Sentence Completion/ Binary choice selection\n", "\n", "```\n", "copa \n", "choice1: He fell off the ladder \n", "choice2: He climbed up the lader \n", "premise: The man lost his balance on the ladder \n", "question: effect\n", "```\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "bTt90f0pmtxi", "outputId": "0ae0300b-815a-419a-ca3b-c44c79080831" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('copa ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " '''\n", " choice1: He fell off the ladder \n", " choice2: He climbed up the lader \n", " premise: The man lost his balance on the ladder \n", " question: effect\n", "\n", " ''']\n", "\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0choice1choice1: He fell off the ladder choice2: He cl...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 choice1 choice1: He fell off the ladder choice2: He cl..." ] }, "metadata": { "tags": [] }, "execution_count": 36 } ] }, { "cell_type": "markdown", "metadata": { "id": "MtTZAk02nR16" }, "source": [ "\n", "# Task 11 [MultiRc - Question Answering](https://www.aclweb.org/anthology/N18-1023.pdf)\n", "Evaluates an `answer` for a `question` as `true` or `false` based on an input `paragraph`\n", "The T5 model predicts for a `question` and a `paragraph` of `sentences` wether an `answer` is true or not,\n", "based on the semantic contents of the paragraph. \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "\n", "**Exeeds human performance by a large margin**\n", "\n", "\n", "\n", "| Question | Answer | Prediction | paragraph|\n", "|--------------------------------------------------------------|---------------------------------------------------------------------|------------|----------|\n", "| Why was Joey surprised the morning he woke up for breakfast? | There was only pie to eat, rather than traditional breakfast foods | True |Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. He couldn’t find anything to eat except for pie! Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed., |\n", "| Why was Joey surprised the morning he woke up for breakfast? | There was a T-Rex in his garden | False |Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. He couldn’t find anything to eat except for pie! Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. 
The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed., |\n", "\n", "## How to configure T5 task for MultiRC\n", "`.setTask('multirc questions:)` followed by `answer:` prefix for the answer to evaluate, followed by `paragraph:` and then a series of sentences, where each sentence is prefixed with `Sent n:`prefix second sentence with sentence2:\n", "\n", "\n", "### Example pre-processed input for T5 MultiRc task:\n", "```\n", "multirc questions: Why was Joey surprised the morning he woke up for breakfast? \n", "answer: There was a T-REX in his garden. \n", "paragraph: \n", "Sent 1: Once upon a time, there was a squirrel named Joey. \n", "Sent 2: Joey loved to go outside and play with his cousin Jimmy. \n", "Sent 3: Joey and Jimmy played silly games together, and were always laughing. \n", "Sent 4: One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. \n", "Sent 5: Joey woke up early in the morning to eat some food before they left. \n", "Sent 6: He couldn’t find anything to eat except for pie! \n", "Sent 7: Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. \n", "Sent 8: After he ate, he and Jimmy went to the pond. \n", "Sent 9: On their way there they saw their friend Jack Rabbit. \n", "Sent 10: They dove into the water and swam for several hours. \n", "Sent 11: The sun was out, but the breeze was cold. \n", "Sent 12: Joey and Jimmy got out of the water and started walking home. \n", "Sent 13: Their fur was wet, and the breeze chilled them. \n", "Sent 14: When they got home, they dried off, and Jimmy put on his favorite purple shirt. \n", "Sent 15: Joey put on a blue shirt with red and green dots. \n", "Sent 16: The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed. \n", "```" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "U87YZ46LnNpx", "outputId": "e214fce2-79a6-49b8-f82b-ecff658c8dd8" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('multirc ') \n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " '''\n", "questions: Why was Joey surprised the morning he woke up for breakfast? \n", "answer: There was a T-REX in his garden. \n", "paragraph: \n", "Sent 1: Once upon a time, there was a squirrel named Joey. \n", "Sent 2: Joey loved to go outside and play with his cousin Jimmy. \n", "Sent 3: Joey and Jimmy played silly games together, and were always laughing. \n", "Sent 4: One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. \n", "Sent 5: Joey woke up early in the morning to eat some food before they left. \n", "Sent 6: He couldn’t find anything to eat except for pie! \n", "Sent 7: Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. \n", "Sent 8: After he ate, he and Jimmy went to the pond. \n", "Sent 9: On their way there they saw their friend Jack Rabbit. \n", "Sent 10: They dove into the water and swam for several hours. \n", "Sent 11: The sun was out, but the breeze was cold. \n", "Sent 12: Joey and Jimmy got out of the water and started walking home. \n", "Sent 13: Their fur was wet, and the breeze chilled them. \n", "Sent 14: When they got home, they dried off, and Jimmy put on his favorite purple shirt. \n", "Sent 15: Joey put on a blue shirt with red and green dots. \n", "Sent 16: The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed. 
\n", "\n", " ''',\n", " \n", " '''\n", "questions: Why was Joey surprised the morning he woke up for breakfast? \n", "answer: There was only pie for breakfast. \n", "paragraph: \n", "Sent 1: Once upon a time, there was a squirrel named Joey. \n", "Sent 2: Joey loved to go outside and play with his cousin Jimmy. \n", "Sent 3: Joey and Jimmy played silly games together, and were always laughing. \n", "Sent 4: One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. \n", "Sent 5: Joey woke up early in the morning to eat some food before they left. \n", "Sent 6: He couldn’t find anything to eat except for pie! \n", "Sent 7: Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. \n", "Sent 8: After he ate, he and Jimmy went to the pond. \n", "Sent 9: On their way there they saw their friend Jack Rabbit. \n", "Sent 10: They dove into the water and swam for several hours. \n", "Sent 11: The sun was out, but the breeze was cold. \n", "Sent 12: Joey and Jimmy got out of the water and started walking home. \n", "Sent 13: Their fur was wet, and the breeze chilled them. \n", "Sent 14: When they got home, they dried off, and Jimmy put on his favorite purple shirt. \n", "Sent 15: Joey put on a blue shirt with red and green dots. \n", "Sent 16: The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed. \n", "\n", " '''\n", " ]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0Falsequestions: Why was Joey surprised the morning ...
1Truequestions: Why was Joey surprised the morning ...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 False questions: Why was Joey surprised the morning ...\n", "1 True questions: Why was Joey surprised the morning ..." ] }, "metadata": { "tags": [] }, "execution_count": 37 } ] }, { "cell_type": "markdown", "metadata": { "id": "KevnNo0pnpaA" }, "source": [ "\n", "# Task 12 [WiC - Word sense disambiguation](https://arxiv.org/abs/1808.09121)\n", "Decide for `two sentence`s with a shared `disambigous word` wether they have the target word has the same `semantic meaning` in both sentences. \n", "This is a sub-task of [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "\n", "|Predicted | disambigous word| Sentence 1 | Sentence 2 | \n", "|----------|-----------------|------------|------------|\n", "| False | kill | He totally killed that rock show! | The airplane crash killed his family | \n", "| True | window | The expanded window will give us time to catch the thieves.|You have a two-hour window for turning in your homework. | \n", "| False | window | He jumped out of the window.|You have a two-hour window for turning in your homework. | \n", "\n", "\n", "## How to configure T5 task for MultiRC\n", "`.setTask('wic pos:)` followed by `sentence1:` prefix for the first sentence, followed by `sentence2:` prefix for the second sentence.\n", "\n", "\n", "### Example pre-processed input for T5 WiC task:\n", "\n", "```\n", "wic pos:\n", "sentence1: The expanded window will give us time to catch the thieves.\n", "sentence2: You have a two-hour window of turning in your homework.\n", "word : window\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "1KoYaT8cnboA", "outputId": "240a1e3e-6445-4bd4-bc54-1d29efee3b34" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('wic ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " '''\n", "pos:\n", "sentence1: The expanded window will give us time to catch the thieves.\n", "sentence2: You have a two-hour window of turning in your homework.\n", "word : window\n", "\n", " ''',]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0Truepos: sentence1: The expanded window will give ...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 True pos: sentence1: The expanded window will give ..." ] }, "metadata": { "tags": [] }, "execution_count": 38 } ] }, { "cell_type": "markdown", "metadata": { "id": "YvcjGf82n39w" }, "source": [ "\n", "# Task 13 [WSC and DPR - Coreference resolution/ Pronoun ambiguity resolver ](https://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492/0)\n", "Predict for an `ambiguous pronoun` to which `noun` it is referring to. \n", "This is a sub-task of [GLUE](https://arxiv.org/pdf/1804.07461.pdf) and [SuperGLUE](https://w4ngatang.github.io/static/papers/superglue.pdf).\n", "\n", "|Prediction| Text | \n", "|----------|-------|\n", "| stable | The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy. | \n", "\n", "\n", "\n", "## How to configure T5 task for WSC/DPR\n", "`.setTask('wsc:)` and surround pronoun with asteriks symbols..\n", "\n", "\n", "### Example pre-processed input for T5 WSC/DPR task:\n", "The `ambiguous pronous` should be surrounded with `*` symbols.\n", "\n", "***Note*** Read [Appendix A.](https://arxiv.org/pdf/1910.10683.pdf#page=64&zoom=100,84,360) for more info\n", "```\n", "wsc: \n", "The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy.\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "g4ZLodm1nyGQ", "outputId": "129ce0b2-c0d4-40cc-e485-35b55b1732cf" }, "source": [ "# Does not work yet 100% correct\n", "# Set the task on T5\n", "t5['t5'].setTask('wsc ') \n", "\n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = ['''The stable was very roomy, with four good stalls; a large swinging window opened into the yard , which made *it* pleasant and airy.''']\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0wsc The stable was very roomy, with four good ...The stable was very roomy, with four good stal...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 wsc The stable was very roomy, with four good ... The stable was very roomy, with four good stal..." ] }, "metadata": { "tags": [] }, "execution_count": 39 } ] }, { "cell_type": "markdown", "metadata": { "id": "_MyugUXQoJd8" }, "source": [ "\n", "# Task 14 [Text summarization](https://arxiv.org/abs/1506.03340)\n", "`Summarizes` a paragraph into a shorter version with the same semantic meaning.\n", "\n", "| Predicted summary| Text | \n", "|------------------|-------|\n", "| manchester united face newcastle in the premier league on wednesday . louis van gaal's side currently sit two points clear of liverpool in fourth . the belgian duo took to the dance floor on monday night with some friends . | the belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth . | \n", "\n", "\n", "## How to configure T5 task for summarization\n", "`.setTask('summarize:)`\n", "\n", "\n", "### Example pre-processed input for T5 summarization task:\n", "This task requires no pre-processing, setting the task to `summarize` is sufficient.\n", "```\n", "the belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 137 }, "id": "hBM0VycaoAnB", "outputId": "f1309cb5-ed7b-4084-f722-bd38268798fb" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('summarize ') \n", "\n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = [\n", " '''\n", " The belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .\n", " ''',\n", " ''' Calculus, originally called infinitesimal calculus or \"the calculus of infinitesimals\", is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations. It has two major branches, differential calculus and integral calculus; the former concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.[1] Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz.[2][3] Today, calculus has widespread uses in science, engineering, and economics.[4] In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. 
The word calculus (plural calculi) is a Latin word, meaning originally \"small pebble\" (this meaning is kept in medicine – see Calculus (medicine)). Because such pebbles were used for calculation, the meaning of the word has evolved and today usually means a method of computation. It is therefore used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.'''\n", " ]\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0manchester united face newcastle in the premie...The belgian duo took to the dance floor on mon...
1calculus, originally called infinitesimal calc...Calculus, originally called infinitesimal calc...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 manchester united face newcastle in the premie... The belgian duo took to the dance floor on mon...\n", "1 calculus, originally called infinitesimal calc... Calculus, originally called infinitesimal calc..." ] }, "metadata": { "tags": [] }, "execution_count": 40 } ] }, { "cell_type": "markdown", "metadata": { "id": "ZyxqNOO1obBv" }, "source": [ "\n", "# Task 15 [SQuAD - Context based question answering](https://arxiv.org/abs/1606.05250)\n", "Predict an `answer` to a `question` based on input `context`.\n", "\n", "|Predicted Answer | Question | Context | \n", "|-----------------|----------|------|\n", "|carbon monoxide| What does increased oxygen concentrations in the patient’s lungs displace? | Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.\n", "|pie| What did Joey eat for breakfast?| Once upon a time, there was a squirrel named Joey. Joey loved to go outside and play with his cousin Jimmy. Joey and Jimmy played silly games together, and were always laughing. One day, Joey and Jimmy went swimming together 50 at their Aunt Julie’s pond. Joey woke up early in the morning to eat some food before they left. Usually, Joey would eat cereal, fruit (a pear), or oatmeal for breakfast. After he ate, he and Jimmy went to the pond. On their way there they saw their friend Jack Rabbit. They dove into the water and swam for several hours. The sun was out, but the breeze was cold. Joey and Jimmy got out of the water and started walking home. Their fur was wet, and the breeze chilled them. When they got home, they dried off, and Jimmy put on his favorite purple shirt. Joey put on a blue shirt with red and green dots. The two squirrels ate some food that Joey’s mom, Jasmine, made and went off to bed,'| \n", "\n", "## How to configure T5 task parameter for Squad Context based question answering\n", "`.setTask('question:)` and prefix the context which can be made up of multiple sentences with `context:`\n", "\n", "## Example pre-processed input for T5 Squad Context based question answering:\n", "```\n", "question: What does increased oxygen concentrations in the patient’s lungs displace? \n", "context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. 
Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.\n", "```\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "OiotszcWoY4r", "outputId": "0dc97618-85f4-43b0-b9c6-531a6a5fb854" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('question ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = ['''\n", "What does increased oxygen concentrations in the patient’s lungs displace? \n", "context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.\n", "''']\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0carbon monoxideWhat does increased oxygen concentrations in t...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 carbon monoxide What does increased oxygen concentrations in t..." ] }, "metadata": { "tags": [] }, "execution_count": 41 } ] }, { "cell_type": "code", "metadata": { "id": "ZRWahi-cop_w" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "EV4YUQdkoqXw" }, "source": [ "# Task 16 [WMT1 Translate English to German](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for WMT Translate English to German\n", "`.setTask('translate English to German:)`\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "2aEz3xiGombi", "outputId": "3820f376-944d-436b-dcff-05524230343a" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('translate English to German: ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "sentences = ['''I like sausage and Tea for breakfast with potatoes''']\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0Die heutige Variante des Oxygen-Stahls bietet ...What does increased oxygen concentrations in t...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 Die heutige Variante des Oxygen-Stahls bietet ... What does increased oxygen concentrations in t..." ] }, "metadata": { "tags": [] }, "execution_count": 42 } ] }, { "cell_type": "markdown", "metadata": { "id": "kZQe18dQo3CH" }, "source": [ "# Task 17 [WMT2 Translate English to French](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for WMT Translate English to French\n", "`.setTask('translate English to French:)`" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "SaSJkuK8o1sL", "outputId": "bf3dcf15-97c3-4e05-9104-5ca20e5bc763" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('translate English to French: ') \n", "\n", "\n", "# define Data, add additional tags between sentences\n", "data = ['''I like sausage and Tea for breakfast with potatoes''']\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0J'aime les saucisses et le thé au petit déjeun...I like sausage and Tea for breakfast with pota...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 J'aime les saucisses et le thé au petit déjeun... I like sausage and Tea for breakfast with pota..." ] }, "metadata": { "tags": [] }, "execution_count": 43 } ] }, { "cell_type": "markdown", "metadata": { "id": "_nozgTDFo7cK" }, "source": [ "# 18 [WMT3 - Translate English to Romanian](https://arxiv.org/abs/1706.03762)\n", "For translation tasks use the `marian` model\n", "## How to configure T5 task parameter for English to Romanian\n", "`.setTask('translate English to Romanian:)`" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 107 }, "id": "_879n2Ljo5tc", "outputId": "35226e44-3f8c-4b44-d78e-f49027946bfe" }, "source": [ "# Set the task on T5\n", "t5['t5'].setTask('translate English to Romanian: ') \n", "\n", "# define Data, add additional tags between sentences\n", "data = [ '''I like sausage and Tea for breakfast with potatoes''']\n", "\n", "\n", "#Predict on text data with T5\n", "t5.predict(data)" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
T5document
origin_index
0Mi-ar plăcea cârnaţi şi ceai la micul dejun cu...I like sausage and Tea for breakfast with pota...
\n", "
" ], "text/plain": [ " T5 document\n", "origin_index \n", "0 Mi-ar plăcea cârnaţi şi ceai la micul dejun cu... I like sausage and Tea for breakfast with pota..." ] }, "metadata": { "tags": [] }, "execution_count": 44 } ] } ] }