{
"cells": [
{
"cell_type": "markdown",
"id": "1684dd94",
"metadata": {
"papermill": {
"duration": 0.065982,
"end_time": "2022-04-07T12:57:39.624742",
"exception": false,
"start_time": "2022-04-07T12:57:39.558760",
"status": "completed"
},
"tags": []
},
"source": [
"We're going to replicate the benchmark in [A Named Entity Based Approach to Model Recipes](https://arxiv.org/abs/2004.12184), by Diwan, Batra, and Bagler, using Stanford NER, and check the results using [seqeval](https://github.com/chakki-works/seqeval).\n",
"\n",
"Evaluating NER is surprisingly tricky, as [David Batista explains](https://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/), and I want to check that the results in the paper match what seqeval gives, so I can compare them to other models.\n",
"\n",
"The authors share their data in an [associated git repository](https://github.com/cosylabiiit/recipe-knowledge-mining) and train a model using [Stanford NER](https://nlp.stanford.edu/software/CRF-NER.html), which is open source, so we have a chance of replicating the results."
]
},
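{
"cell_type": "markdown",
"id": "a7c31e52",
"metadata": {},
"source": [
"To make the difficulty concrete, here is a minimal sketch contrasting token-level accuracy with exact-match entity-level precision and recall. It is an illustration only, not the scoring used by the paper or by seqeval: the helpers `spans`, `token_accuracy` and `entity_prf` are hypothetical names, and it assumes flat tags where a maximal run of identical non-`O` tags forms one entity.\n",
"\n",
"```python\n",
"from typing import List, Set, Tuple\n",
"\n",
"Span = Tuple[int, int, str]  # (start, end_exclusive, label)\n",
"\n",
"def spans(tags: List[str]) -> Set[Span]:\n",
"    '''Group maximal runs of identical non-O tags into entity spans (an assumption).'''\n",
"    found, start = set(), 0\n",
"    for i in range(1, len(tags) + 1):\n",
"        if i == len(tags) or tags[i] != tags[start]:\n",
"            if tags[start] != 'O':\n",
"                found.add((start, i, tags[start]))\n",
"            start = i\n",
"    return found\n",
"\n",
"def token_accuracy(gold: List[str], pred: List[str]) -> float:\n",
"    '''Fraction of tokens whose predicted tag equals the gold tag.'''\n",
"    return sum(g == p for g, p in zip(gold, pred)) / len(gold)\n",
"\n",
"def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float]:\n",
"    '''Exact-match (precision, recall) over entity spans; assumes both have entities.'''\n",
"    g, p = spans(gold), spans(pred)\n",
"    correct = len(g & p)\n",
"    return correct / len(p), correct / len(g)\n",
"\n",
"# Hand-written example: the prediction misses the second NAME token.\n",
"gold = ['QUANTITY', 'UNIT', 'NAME', 'NAME', 'O', 'STATE']\n",
"pred = ['QUANTITY', 'UNIT', 'NAME', 'O', 'O', 'STATE']\n",
"\n",
"print(token_accuracy(gold, pred))\n",
"print(entity_prf(gold, pred))\n",
"```\n",
"\n",
"Here one wrong token out of six gives a token accuracy of 5/6, but it also breaks the two-token `NAME` entity, so only 3 of the 4 predicted entities match and entity-level precision and recall are both 3/4."
]
},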
{
"cell_type": "markdown",
"id": "e559cc76",
"metadata": {
"papermill": {
"duration": 0.059759,
"end_time": "2022-04-07T12:57:39.748864",
"exception": false,
"start_time": "2022-04-07T12:57:39.689105",
"status": "completed"
},
"tags": []
},
"source": [
"# Installing Stanford NLP\n",
"\n",
"We're going to install Stanford CoreNLP, which is a Java library.\n",
"To make things easier we will use [stanza](https://stanfordnlp.github.io/stanza/) which includes tools for [installing and invoking Stanford NLP](https://stanfordnlp.github.io/stanza/corenlp_client.html)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2873cc65",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:57:39.879337Z",
"iopub.status.busy": "2022-04-07T12:57:39.878337Z",
"iopub.status.idle": "2022-04-07T12:57:53.376009Z",
"shell.execute_reply": "2022-04-07T12:57:53.374964Z",
"shell.execute_reply.started": "2022-04-07T12:49:07.145062Z"
},
"papermill": {
"duration": 13.566942,
"end_time": "2022-04-07T12:57:53.376225",
"exception": false,
"start_time": "2022-04-07T12:57:39.809283",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting stanza\r\n",
" Downloading stanza-1.3.0-py3-none-any.whl (432 kB)\r\n",
" |████████████████████████████████| 432 kB 292 kB/s \r\n",
"\u001b[?25hRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from stanza) (2.26.0)\r\n",
"Requirement already satisfied: protobuf in /opt/conda/lib/python3.7/site-packages (from stanza) (3.19.4)\r\n",
"Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from stanza) (4.62.3)\r\n",
"Requirement already satisfied: torch>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from stanza) (1.9.1+cpu)\r\n",
"Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from stanza) (1.20.3)\r\n",
"Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from stanza) (1.16.0)\r\n",
"Requirement already satisfied: emoji in /opt/conda/lib/python3.7/site-packages (from stanza) (1.7.0)\r\n",
"Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch>=1.3.0->stanza) (4.1.1)\r\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2.0.9)\r\n",
"Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2021.10.8)\r\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (1.26.7)\r\n",
"Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (3.1)\r\n",
"Installing collected packages: stanza\r\n",
"Successfully installed stanza-1.3.0\r\n",
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n"
]
}
],
"source": [
"!pip install stanza"
]
},
{
"cell_type": "markdown",
"id": "f71bca3f",
"metadata": {
"papermill": {
"duration": 0.072074,
"end_time": "2022-04-07T12:57:53.522613",
"exception": false,
"start_time": "2022-04-07T12:57:53.450539",
"status": "completed"
},
"tags": []
},
"source": [
"We can specify where to install CoreNLP, but we will use the default, which is `$CORENLP_HOME` if that environment variable is set and `$HOME/stanza_corenlp` otherwise. (Ideally we'd use stanza to get this path, but I couldn't easily work out how.)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "85b13351",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:57:53.670403Z",
"iopub.status.busy": "2022-04-07T12:57:53.669467Z",
"iopub.status.idle": "2022-04-07T12:58:29.230320Z",
"shell.execute_reply": "2022-04-07T12:58:29.229643Z",
"shell.execute_reply.started": "2022-04-07T12:49:18.684182Z"
},
"papermill": {
"duration": 35.633674,
"end_time": "2022-04-07T12:58:29.230514",
"exception": false,
"start_time": "2022-04-07T12:57:53.596840",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b90172ce78ce4c519efa02f06f3c6835",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip: 0%| …"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import stanza\n",
"stanza.install_corenlp()"
]
},
{
"cell_type": "markdown",
"id": "4483cf45",
"metadata": {
"papermill": {
"duration": 0.06657,
"end_time": "2022-04-07T12:58:29.364624",
"exception": false,
"start_time": "2022-04-07T12:58:29.298054",
"status": "completed"
},
"tags": []
},
"source": [
"We'll need to invoke the Stanford Core NLP JAR that we just installed, so let's find it."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3b31ac2f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:58:29.505152Z",
"iopub.status.busy": "2022-04-07T12:58:29.504095Z",
"iopub.status.idle": "2022-04-07T12:58:29.516285Z",
"shell.execute_reply": "2022-04-07T12:58:29.515710Z",
"shell.execute_reply.started": "2022-04-07T12:49:53.274276Z"
},
"papermill": {
"duration": 0.084307,
"end_time": "2022-04-07T12:58:29.516468",
"exception": false,
"start_time": "2022-04-07T12:58:29.432161",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'/root/stanza_corenlp/stanford-corenlp-4.4.0.jar'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"import re\n",
"from pathlib import Path\n",
"\n",
"\n",
"# Reimplement the logic to find the path where stanza_corenlp is installed.\n",
"core_nlp_path = os.getenv('CORENLP_HOME', str(Path.home() / 'stanza_corenlp'))\n",
"\n",
"# A heuristic to find the right jar file\n",
"classpath = [str(p) for p in Path(core_nlp_path).iterdir() if re.match(r\"stanford-corenlp-[0-9.]+\\.jar\", p.name)][0]\n",
"classpath"
]
},
{
"cell_type": "markdown",
"id": "98419a70",
"metadata": {
"papermill": {
"duration": 0.074162,
"end_time": "2022-04-07T12:58:29.661879",
"exception": false,
"start_time": "2022-04-07T12:58:29.587717",
"status": "completed"
},
"tags": []
},
"source": [
"Let's test the [basic usage](https://stanfordnlp.github.io/stanza/client_usage.html).\n",
"\n",
"CoreNLP currently has models for 8 languages, and supports some fairly complex tasks like coreference resolution."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5a8e1173",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:58:29.804047Z",
"iopub.status.busy": "2022-04-07T12:58:29.803014Z",
"iopub.status.idle": "2022-04-07T12:59:11.230446Z",
"shell.execute_reply": "2022-04-07T12:59:11.229515Z",
"shell.execute_reply.started": "2022-04-07T12:49:53.286134Z"
},
"papermill": {
"duration": 41.500822,
"end_time": "2022-04-07T12:59:11.230672",
"exception": false,
"start_time": "2022-04-07T12:58:29.729850",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---\n",
"[main] INFO CoreNLP - Server default properties:\n",
"\t\t\t(Note: unspecified annotator properties are English defaults)\n",
"\t\t\tannotators = tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n",
"\t\t\tinputFormat = text\n",
"\t\t\toutputFormat = serialized\n",
"\t\t\tprettyPrint = false\n",
"\t\t\tthreads = 5\n",
"[main] INFO CoreNLP - Threads: 5\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n",
"[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.1 sec].\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n",
"[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].\n",
"[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].\n",
"[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].\n",
"[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.\n",
"[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt\n",
"[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.\n",
"[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.\n",
"[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files\n",
"[main] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n",
"[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.8 sec].\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n",
"[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... Time elapsed: 2.1 sec\n",
"[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 20000 vectors, elapsed Time: 2.204 sec\n",
"[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [4.3 sec].\n",
"[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n",
"[main] INFO edu.stanford.nlp.coref.statistical.SimpleLinearClassifier - Loading coref model edu/stanford/nlp/models/coref/statistical/ranking_model.ser.gz ... done [0.9 sec].\n",
"[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency\n",
"[main] INFO CoreNLP - Starting server...\n",
"[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:9000\n",
"[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:36852] API call w/annotators tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n",
"[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"David Batista wrote a blog post on NER evaluation. Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. We will test his library against Stanford Core NLP. \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Thread-0] INFO CoreNLP - CoreNLP Server is shutting down.\n"
]
}
],
"source": [
"from stanza.server import CoreNLPClient\n",
"\n",
"text = \"David Batista wrote a blog post on NER evaluation. \" \\\n",
"       \"Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. \" \\\n",
"       \"We will test his library against Stanford Core NLP. \"\n",
"\n",
"with CoreNLPClient(\n",
"        annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'],\n",
"        timeout=30000,\n",
"        memory='6G') as client:\n",
"\n",
"    ann = client.annotate(text)"
]
},
{
"cell_type": "markdown",
"id": "27657787",
"metadata": {
"papermill": {
"duration": 0.073188,
"end_time": "2022-04-07T12:59:11.379679",
"exception": false,
"start_time": "2022-04-07T12:59:11.306491",
"status": "completed"
},
"tags": []
},
"source": [
"We get 3 sentences out."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "81ca28fd",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:11.534658Z",
"iopub.status.busy": "2022-04-07T12:59:11.533863Z",
"iopub.status.idle": "2022-04-07T12:59:11.538234Z",
"shell.execute_reply": "2022-04-07T12:59:11.537672Z",
"shell.execute_reply.started": "2022-04-07T12:50:29.701187Z"
},
"papermill": {
"duration": 0.083411,
"end_time": "2022-04-07T12:59:11.538434",
"exception": false,
"start_time": "2022-04-07T12:59:11.455023",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"David Batista wrote a blog post on NER evaluation .\n",
"Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks , such as NER .\n",
"We will test his library against Stanford Core NLP .\n"
]
}
],
"source": [
"for sentence in ann.sentence:\n",
"    print(\" \".join([token.word for token in sentence.token]))"
]
},
{
"cell_type": "markdown",
"id": "6a71d850",
"metadata": {
"papermill": {
"duration": 0.075015,
"end_time": "2022-04-07T12:59:11.688350",
"exception": false,
"start_time": "2022-04-07T12:59:11.613335",
"status": "completed"
},
"tags": []
},
"source": [
"It can even do clever things like coreference resolution, working out that \"his library\" refers to \"Hiroki Nakayama's library\"."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "15c591e5",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:11.844039Z",
"iopub.status.busy": "2022-04-07T12:59:11.843369Z",
"iopub.status.idle": "2022-04-07T12:59:11.847528Z",
"shell.execute_reply": "2022-04-07T12:59:11.848178Z",
"shell.execute_reply.started": "2022-04-07T12:50:29.708506Z"
},
"papermill": {
"duration": 0.081987,
"end_time": "2022-04-07T12:59:11.848387",
"exception": false,
"start_time": "2022-04-07T12:59:11.766400",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['nakayama', 'his']\n"
]
}
],
"source": [
"for chain in ann.corefChain:\n",
"    print([ann.mentionsForCoref[mention.mentionID].headString for mention in chain.mention])"
]
},
{
"cell_type": "markdown",
"id": "7b1942b0",
"metadata": {
"papermill": {
"duration": 0.074006,
"end_time": "2022-04-07T12:59:11.997611",
"exception": false,
"start_time": "2022-04-07T12:59:11.923605",
"status": "completed"
},
"tags": []
},
"source": [
"We can extract things such as lemmas, parts of speech and standard NER tags.\n",
"\n",
"But we want to train our own NER model to detect ingredients. First we will need to collect the data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "bb5b69e7",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:12.165040Z",
"iopub.status.busy": "2022-04-07T12:59:12.156660Z",
"iopub.status.idle": "2022-04-07T12:59:12.184016Z",
"shell.execute_reply": "2022-04-07T12:59:12.184541Z",
"shell.execute_reply.started": "2022-04-07T12:50:29.721655Z"
},
"papermill": {
"duration": 0.11083,
"end_time": "2022-04-07T12:59:12.184757",
"exception": false,
"start_time": "2022-04-07T12:59:12.073927",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th><th>11</th><th>12</th><th>13</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>word</th><td>Hiroki</td><td>Nakayama</td><td>wrote</td><td>seqeval</td><td>to</td><td>evaluate</td><td>sequential</td><td>labelling</td><td>tasks</td><td>,</td><td>such</td><td>as</td><td>NER</td><td>.</td></tr>\n",
"    <tr><th>lemma</th><td>Hiroki</td><td>Nakayama</td><td>write</td><td>seqeval</td><td>to</td><td>evaluate</td><td>sequential</td><td>labelling</td><td>task</td><td>,</td><td>such</td><td>as</td><td>ner</td><td>.</td></tr>\n",
"    <tr><th>pos</th><td>NNP</td><td>NNP</td><td>VBD</td><td>NN</td><td>TO</td><td>VB</td><td>JJ</td><td>NN</td><td>NNS</td><td>,</td><td>JJ</td><td>IN</td><td>NN</td><td>.</td></tr>\n",
"    <tr><th>ner</th><td>PERSON</td><td>PERSON</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"            0         1      2        3   4         5           6          7  \\\n",
"word   Hiroki  Nakayama  wrote  seqeval  to  evaluate  sequential  labelling   \n",
"lemma  Hiroki  Nakayama  write  seqeval  to  evaluate  sequential  labelling   \n",
"pos       NNP       NNP    VBD       NN  TO        VB          JJ         NN   \n",
"ner    PERSON    PERSON      O        O   O         O           O          O   \n",
"\n",
"           8  9    10  11   12  13  \n",
"word   tasks  ,  such  as  NER   .  \n",
"lemma   task  ,  such  as  ner   .  \n",
"pos      NNS  ,    JJ  IN   NN   .  \n",
"ner        O  O     O   O    O   O  "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"tokens = ann.sentence[1].token\n",
"\n",
"pd.DataFrame({'word': [s.word for s in tokens],\n",
"              'lemma': [s.lemma for s in tokens],\n",
"              'pos': [s.pos for s in tokens],\n",
"              'ner': [s.ner for s in tokens]}).T"
]
},
{
"cell_type": "markdown",
"id": "01c281d3",
"metadata": {
"papermill": {
"duration": 0.074855,
"end_time": "2022-04-07T12:59:12.333648",
"exception": false,
"start_time": "2022-04-07T12:59:12.258793",
"status": "completed"
},
"tags": []
},
"source": [
"# Get Data\n",
"\n",
"Helpfully, the authors provide the annotated ingredient data in Stanford NER format, which we can download [from GitHub](https://github.com/cosylabiiit/recipe-knowledge-mining).\n",
"\n",
"There are two sources of ingredients: `ar` is AllRecipes and `gk` is FOOD.com (formerly GeniusKitchen.com)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "04edd65b",
"metadata": {
"_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
"_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
"execution": {
"iopub.execute_input": "2022-04-07T12:59:12.498569Z",
"iopub.status.busy": "2022-04-07T12:59:12.497854Z",
"iopub.status.idle": "2022-04-07T12:59:14.808353Z",
"shell.execute_reply": "2022-04-07T12:59:14.807269Z",
"shell.execute_reply.started": "2022-04-07T12:50:29.755893Z"
},
"papermill": {
"duration": 2.400074,
"end_time": "2022-04-07T12:59:14.808574",
"exception": false,
"start_time": "2022-04-07T12:59:12.408500",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"\n",
"data_sources = ['ar', 'gk']\n",
"data_splits = ['train', 'test']\n",
"\n",
"base_url = 'https://raw.githubusercontent.com/cosylabiiit/recipe-knowledge-mining/master/'\n",
"\n",
"def data_filename(source, split):\n",
"    return f'{source}_{split}.tsv'\n",
"\n",
"for source in data_sources:\n",
"    for split in data_splits:\n",
"        name = data_filename(source, split)\n",
"        urlretrieve(base_url + name, name)"
]
},
{
"cell_type": "markdown",
"id": "2c260e5f",
"metadata": {
"papermill": {
"duration": 0.073279,
"end_time": "2022-04-07T12:59:14.957042",
"exception": false,
"start_time": "2022-04-07T12:59:14.883763",
"status": "completed"
},
"tags": []
},
"source": [
"Each line of the file is either a single tab (separating different texts), or a token followed by a tab and then the entity type.\n",
"\n",
"So for example the first ingredient is `4 cloves garlic`, which is a quantity (4) followed by a unit (cloves) and a name (garlic)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "20fd23c4",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:15.124352Z",
"iopub.status.busy": "2022-04-07T12:59:15.122955Z",
"iopub.status.idle": "2022-04-07T12:59:15.897018Z",
"shell.execute_reply": "2022-04-07T12:59:15.897645Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.106713Z"
},
"papermill": {
"duration": 0.866332,
"end_time": "2022-04-07T12:59:15.897874",
"exception": false,
"start_time": "2022-04-07T12:59:15.031542",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"^I\r\n",
"4^IQUANTITY\r\n",
"cloves^IUNIT\r\n",
"garlic^INAME\r\n",
"^I\r\n",
"2^IQUANTITY\r\n",
"tablespoons^IUNIT\r\n",
"vegetable^INAME\r\n",
"oil^INAME\r\n",
",^IO\r\n"
]
}
],
"source": [
"!head {data_filename('ar', 'train')} | cat -t"
]
},
{
"cell_type": "markdown",
"id": "eb5b9b7d",
"metadata": {
"papermill": {
"duration": 0.077612,
"end_time": "2022-04-07T12:59:16.051180",
"exception": false,
"start_time": "2022-04-07T12:59:15.973568",
"status": "completed"
},
"tags": []
},
"source": [
"We can read this into Python, converting it to a list of annotated sentences, each of which is just a sequence of (token, label) pairs."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e3682be8",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:16.206078Z",
"iopub.status.busy": "2022-04-07T12:59:16.204972Z",
"iopub.status.idle": "2022-04-07T12:59:16.214225Z",
"shell.execute_reply": "2022-04-07T12:59:16.214726Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.911852Z"
},
"papermill": {
"duration": 0.089243,
"end_time": "2022-04-07T12:59:16.214969",
"exception": false,
"start_time": "2022-04-07T12:59:16.125726",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from typing import List, Tuple, Generator\n",
"\n",
"Annotation = Tuple[str, str]\n",
"AnnotatedSentence = List[Annotation]\n",
"\n",
"def segment_texts(data: str) -> Generator[AnnotatedSentence, None, None]:\n",
"    \"\"\"Yield sentences as lists of (token, label) pairs, split at blank or tab-only lines.\"\"\"\n",
"    output = []\n",
"    for line in data.split('\\n'):\n",
"        if line.strip():\n",
"            token, label = line.split('\\t')\n",
"            output.append((token.strip(), label.strip()))\n",
"        elif output:\n",
"            yield output\n",
"            output = []\n",
"    # Emit the final sentence if the data does not end with a separator line\n",
"    if output:\n",
"        yield output\n",
"\n",
"def segment_file(filename: str) -> List[AnnotatedSentence]:\n",
"    with open(filename, 'rt') as f:\n",
"        return list(segment_texts(f.read()))"
]
},
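{
"cell_type": "markdown",
"id": "b4e90d17",
"metadata": {},
"source": [
"As a quick sanity check of the format, here is a self-contained sketch that parses a small sample typed in by hand to mirror the first lines of the file. The function `parse_sample` is a hypothetical stand-in written for this illustration, and the sample string is an assumption rather than data read from the repository.\n",
"\n",
"```python\n",
"from typing import List, Tuple\n",
"\n",
"def parse_sample(data: str) -> List[List[Tuple[str, str]]]:\n",
"    '''Split tab-separated token/label lines into sentences at tab-only lines.'''\n",
"    sentences, current = [], []\n",
"    for line in data.splitlines():\n",
"        if line.strip():\n",
"            token, label = line.split('\\t')\n",
"            current.append((token.strip(), label.strip()))\n",
"        elif current:\n",
"            sentences.append(current)\n",
"            current = []\n",
"    if current:  # keep the last sentence when there is no trailing separator\n",
"        sentences.append(current)\n",
"    return sentences\n",
"\n",
"# Hand-written sample: a tab-only line separates the two ingredients.\n",
"sample = '\\t\\n4\\tQUANTITY\\ncloves\\tUNIT\\ngarlic\\tNAME\\n\\t\\n2\\tQUANTITY\\ntablespoons\\tUNIT'\n",
"print(parse_sample(sample))\n",
"```\n",
"\n",
"The leading tab-only line produces no empty sentence, because a separator is only acted on once at least one token has been collected."
]
},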
{
"cell_type": "code",
"execution_count": 11,
"id": "411d8e65",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:16.373155Z",
"iopub.status.busy": "2022-04-07T12:59:16.371969Z",
"iopub.status.idle": "2022-04-07T12:59:16.388214Z",
"shell.execute_reply": "2022-04-07T12:59:16.388796Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.921731Z"
},
"papermill": {
"duration": 0.0992,
"end_time": "2022-04-07T12:59:16.389053",
"exception": false,
"start_time": "2022-04-07T12:59:16.289853",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"ar_train = segment_file(data_filename('ar', 'train'))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "9f048122",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:16.545563Z",
"iopub.status.busy": "2022-04-07T12:59:16.544455Z",
"iopub.status.idle": "2022-04-07T12:59:16.551302Z",
"shell.execute_reply": "2022-04-07T12:59:16.551951Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.947480Z"
},
"papermill": {
"duration": 0.087288,
"end_time": "2022-04-07T12:59:16.552158",
"exception": false,
"start_time": "2022-04-07T12:59:16.464870",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[[('4', 'QUANTITY'), ('cloves', 'UNIT'), ('garlic', 'NAME')],\n",
" [('2', 'QUANTITY'),\n",
" ('tablespoons', 'UNIT'),\n",
" ('vegetable', 'NAME'),\n",
" ('oil', 'NAME'),\n",
" (',', 'O'),\n",
" ('divided', 'STATE')]]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ar_train[:2]"
]
},
{
"cell_type": "markdown",
"id": "d2dcacf1",
"metadata": {
"papermill": {
"duration": 0.07699,
"end_time": "2022-04-07T12:59:16.705265",
"exception": false,
"start_time": "2022-04-07T12:59:16.628275",
"status": "completed"
},
"tags": []
},
"source": [
"We can then calculate the number of sentences in the training set for a source."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "170a4c27",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:16.860723Z",
"iopub.status.busy": "2022-04-07T12:59:16.859821Z",
"iopub.status.idle": "2022-04-07T12:59:16.863539Z",
"shell.execute_reply": "2022-04-07T12:59:16.864041Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.954556Z"
},
"papermill": {
"duration": 0.084373,
"end_time": "2022-04-07T12:59:16.864216",
"exception": false,
"start_time": "2022-04-07T12:59:16.779843",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"1470"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(ar_train)"
]
},
{
"cell_type": "markdown",
"id": "76312b92",
"metadata": {
"papermill": {
"duration": 0.076091,
"end_time": "2022-04-07T12:59:17.014758",
"exception": false,
"start_time": "2022-04-07T12:59:16.938667",
"status": "completed"
},
"tags": []
},
"source": [
"We can use this to check the types of entities annotated, as in the paper (DF is Dried/Fresh)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1142681f",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:17.175245Z",
"iopub.status.busy": "2022-04-07T12:59:17.174209Z",
"iopub.status.idle": "2022-04-07T12:59:17.178876Z",
"shell.execute_reply": "2022-04-07T12:59:17.178340Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.968592Z"
},
"papermill": {
"duration": 0.089217,
"end_time": "2022-04-07T12:59:17.179088",
"exception": false,
"start_time": "2022-04-07T12:59:17.089871",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({'QUANTITY': 1583,\n",
" 'UNIT': 1338,\n",
" 'NAME': 2501,\n",
" 'O': 1662,\n",
" 'STATE': 879,\n",
" 'DF': 154,\n",
" 'SIZE': 64,\n",
" 'TEMP': 31})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"\n",
"tag_counts = Counter([annotation[1] for sentence in ar_train for annotation in sentence])\n",
"tag_counts"
]
},
{
"cell_type": "markdown",
"id": "677e4069",
"metadata": {
"papermill": {
"duration": 0.073955,
"end_time": "2022-04-07T12:59:17.327416",
"exception": false,
"start_time": "2022-04-07T12:59:17.253461",
"status": "completed"
},
"tags": []
},
"source": [
"# Train NER Model\n",
"\n",
"Now we want to train a Stanford NER model on the new annotations.\n",
"\n",
"First we have to configure it, but there's no information in the paper on how it was configured.\n",
"I've copied this template configuration out of the [FAQ](https://nlp.stanford.edu/software/crf-faq.html).\n",
"For more information on the parameters you can check the [NERFeatureFactory documentation](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html) or the [source](https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/ie/NERFeatureFactory.java)."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "59a6e2c0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:17.491773Z",
"iopub.status.busy": "2022-04-07T12:59:17.490773Z",
"iopub.status.idle": "2022-04-07T12:59:17.497225Z",
"shell.execute_reply": "2022-04-07T12:59:17.497813Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.983052Z"
},
"papermill": {
"duration": 0.092886,
"end_time": "2022-04-07T12:59:17.498022",
"exception": false,
"start_time": "2022-04-07T12:59:17.405136",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"def ner_prop_str(train_files: List[str], test_files: List[str], output: str) -> str:\n",
"    \"\"\"Returns configuration string to train NER model\"\"\"\n",
"    train_file_str = ','.join(train_files)\n",
"    test_file_str = ','.join(test_files)\n",
"    return f\"\"\"\n",
"trainFileList = {train_file_str}\n",
"testFiles = {test_file_str}\n",
"serializeTo = {output}\n",
"map = word=0,answer=1\n",
"\n",
"useClassFeature=true\n",
"useWord=true\n",
"useNGrams=true\n",
"noMidNGrams=true\n",
"maxNGramLeng=6\n",
"usePrev=true\n",
"useNext=true\n",
"useSequences=true\n",
"usePrevSequences=true\n",
"maxLeft=1\n",
"useTypeSeqs=true\n",
"useTypeSeqs2=true\n",
"useTypeySequences=true\n",
"wordShape=chris2useLC\n",
"useDisjunctive=true\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"id": "207441dc",
"metadata": {
"papermill": {
"duration": 0.075391,
"end_time": "2022-04-07T12:59:17.650533",
"exception": false,
"start_time": "2022-04-07T12:59:17.575142",
"status": "completed"
},
"tags": []
},
"source": [
"Stanford NER expects this configuration as a properties file, so let's write a helper that writes it to disk. (An alternative would be to pass these as arguments to the trainer.)"
]
},
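{
"cell_type": "markdown",
"id": "c58f2a6b",
"metadata": {},
"source": [
"A rough sketch of that alternative: as far as I can tell, `CRFClassifier` also reads properties given as `-key value` command-line flags, so the configuration could be flattened into an argument list instead of a file. The helper `props_to_args` and the model file name below are hypothetical, only a few of the properties are shown, and this variant is not what the rest of the notebook uses.\n",
"\n",
"```python\n",
"from typing import Dict, List\n",
"\n",
"def props_to_args(props: Dict[str, str]) -> List[str]:\n",
"    '''Flatten a property mapping into ['-key', 'value', ...] arguments.'''\n",
"    args: List[str] = []\n",
"    for key, value in props.items():\n",
"        args += [f'-{key}', value]\n",
"    return args\n",
"\n",
"# Illustrative subset of the configuration; file names are assumptions.\n",
"props = {\n",
"    'trainFileList': 'ar_train.tsv',\n",
"    'testFiles': 'ar_test.tsv',\n",
"    'serializeTo': 'ar.model.ser.gz',\n",
"    'map': 'word=0,answer=1',\n",
"    'useWord': 'true',\n",
"}\n",
"print(props_to_args(props))\n",
"```\n",
"\n",
"The resulting list could be appended to the `java ... edu.stanford.nlp.ie.crf.CRFClassifier` command in place of `-prop`, at the cost of a long command line that is harder to inspect than a saved properties file."
]
},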
{
"cell_type": "code",
"execution_count": 16,
"id": "d9e7eb79",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:17.812064Z",
"iopub.status.busy": "2022-04-07T12:59:17.810962Z",
"iopub.status.idle": "2022-04-07T12:59:17.816510Z",
"shell.execute_reply": "2022-04-07T12:59:17.817089Z",
"shell.execute_reply.started": "2022-04-07T12:50:32.994122Z"
},
"papermill": {
"duration": 0.087414,
"end_time": "2022-04-07T12:59:17.817303",
"exception": false,
"start_time": "2022-04-07T12:59:17.729889",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"def write_ner_prop_file(ner_prop_file: str, train_files: List[str], test_files: List[str], output_file: str) -> None:\n",
"    with open(ner_prop_file, 'wt') as f:\n",
"        props = ner_prop_str(train_files, test_files, output_file)\n",
"        f.write(props)"
]
},
{
"cell_type": "markdown",
"id": "daec3df3",
"metadata": {
"papermill": {
"duration": 0.075681,
"end_time": "2022-04-07T12:59:17.970722",
"exception": false,
"start_time": "2022-04-07T12:59:17.895041",
"status": "completed"
},
"tags": []
},
"source": [
"Stanza doesn't give an interface to train a CRF NER model using Stanford NLP, but we can invoke `edu.stanford.nlp.ie.crf.CRFClassifier` directly.\n",
"\n",
"Let's write a properties file and invoke Java to run the classifier.\n",
"It prints a lot of training information, and importantly a summary report at the end which we want to see."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "6d0cb59c",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:18.128663Z",
"iopub.status.busy": "2022-04-07T12:59:18.127603Z",
"iopub.status.idle": "2022-04-07T12:59:18.136205Z",
"shell.execute_reply": "2022-04-07T12:59:18.136742Z",
"shell.execute_reply.started": "2022-04-07T12:50:33.006937Z"
},
"papermill": {
"duration": 0.089125,
"end_time": "2022-04-07T12:59:18.136964",
"exception": false,
"start_time": "2022-04-07T12:59:18.047839",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"import subprocess\n",
"from typing import List\n",
"\n",
"def train_model(model_name, train_files: List[str], test_files: List[str], print_report=True, classpath=classpath) -> str:\n",
"    \"\"\"Trains a CRF NER model using StanfordNLP and returns the model filename\"\"\"\n",
"    model_file = f'{model_name}.model.ser.gz'\n",
"    ner_prop_filename = f'{model_name}.model.props'\n",
"    write_ner_prop_file(ner_prop_filename, train_files, test_files, model_file)\n",
"\n",
"    result = subprocess.run(\n",
"        ['java',\n",
"         '-Xmx2g',\n",
"         '-cp', classpath,\n",
"         'edu.stanford.nlp.ie.crf.CRFClassifier',\n",
"         '-prop', ner_prop_filename],\n",
"        capture_output=True)\n",
"\n",
"    # If there's an error with invocation, log the stack trace before raising\n",
"    if result.returncode != 0:\n",
"        print(result.stderr.decode('utf-8'))\n",
"    result.check_returncode()\n",
"\n",
"    # The summary report is in the last 11 lines of stderr\n",
"    if print_report:\n",
"        print(*result.stderr.decode('utf-8').split('\\n')[-11:], sep='\\n')\n",
"\n",
"    return model_file"
]
},
{
"cell_type": "markdown",
"id": "9936f352",
"metadata": {
"papermill": {
"duration": 0.074215,
"end_time": "2022-04-07T12:59:18.286972",
"exception": false,
"start_time": "2022-04-07T12:59:18.212757",
"status": "completed"
},
"tags": []
},
"source": [
"We can train models on each dataset separately, and all together.\n",
"For evaluation we'll use the corresponding test set.\n",
"\n",
"This only takes a few minutes."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8232dd04",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T12:59:18.443591Z",
"iopub.status.busy": "2022-04-07T12:59:18.442776Z",
"iopub.status.idle": "2022-04-07T13:01:35.835973Z",
"shell.execute_reply": "2022-04-07T13:01:35.836649Z",
"shell.execute_reply.started": "2022-04-07T12:50:33.017883Z"
},
"papermill": {
"duration": 137.47581,
"end_time": "2022-04-07T13:01:35.836960",
"exception": false,
"start_time": "2022-04-07T12:59:18.361150",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ar\n",
"CRFClassifier tagged 2788 words in 483 documents at 7185.57 words per second.\n",
" Entity\tP\tR\tF1\tTP\tFP\tFN\n",
" DF\t1.0000\t0.9608\t0.9800\t49\t0\t2\n",
" NAME\t0.9297\t0.9279\t0.9288\t463\t35\t36\n",
" QUANTITY\t1.0000\t0.9962\t0.9981\t522\t0\t2\n",
" SIZE\t1.0000\t1.0000\t1.0000\t20\t0\t0\n",
" STATE\t0.9601\t0.9633\t0.9617\t289\t12\t11\n",
" TEMP\t0.8750\t0.7000\t0.7778\t7\t1\t3\n",
" UNIT\t0.9819\t0.9841\t0.9830\t434\t8\t7\n",
" Totals\t0.9696\t0.9669\t0.9682\t1784\t56\t61\n",
"\n",
"\n",
"gk\n",
"CRFClassifier tagged 9886 words in 1705 documents at 11727.16 words per second.\n",
" Entity\tP\tR\tF1\tTP\tFP\tFN\n",
" DF\t0.9718\t0.9517\t0.9617\t138\t4\t7\n",
" NAME\t0.9132\t0.9021\t0.9076\t1621\t154\t176\n",
" QUANTITY\t0.9882\t0.9870\t0.9876\t1598\t19\t21\n",
" SIZE\t0.9750\t0.9398\t0.9571\t78\t2\t5\n",
" STATE\t0.9255\t0.9503\t0.9377\t708\t57\t37\n",
" TEMP\t0.8125\t0.8125\t0.8125\t26\t6\t6\n",
" UNIT\t0.9810\t0.9721\t0.9766\t1291\t25\t37\n",
" Totals\t0.9534\t0.9497\t0.9516\t5460\t267\t289\n",
"\n",
"\n",
"ar_gk\n",
"CRFClassifier tagged 12674 words in 2188 documents at 11648.90 words per second.\n",
" Entity\tP\tR\tF1\tTP\tFP\tFN\n",
" DF\t0.9738\t0.9490\t0.9612\t186\t5\t10\n",
" NAME\t0.9136\t0.9077\t0.9106\t2084\t197\t212\n",
" QUANTITY\t0.9911\t0.9897\t0.9904\t2121\t19\t22\n",
" SIZE\t0.9798\t0.9417\t0.9604\t97\t2\t6\n",
" STATE\t0.9386\t0.9512\t0.9449\t994\t65\t51\n",
" TEMP\t0.8140\t0.8333\t0.8235\t35\t8\t7\n",
" UNIT\t0.9801\t0.9763\t0.9782\t1727\t35\t42\n",
" Totals\t0.9563\t0.9539\t0.9551\t7244\t331\t350\n",
"\n",
"\n",
"CPU times: user 276 ms, sys: 134 ms, total: 410 ms\n",
"Wall time: 2min 17s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"models = {}\n",
"for source in ['ar', 'gk', 'ar_gk']:\n",
"    print(source)\n",
"    train_files = [data_filename(s, 'train') for s in source.split('_')]\n",
"    test_files = [data_filename(s, 'test') for s in source.split('_')]\n",
"    models[source] = train_model(source, train_files, test_files)\n",
"    print()"
]
},
{
"cell_type": "markdown",
"id": "59f6f0c9",
"metadata": {
"papermill": {
"duration": 0.077949,
"end_time": "2022-04-07T13:01:35.992806",
"exception": false,
"start_time": "2022-04-07T13:01:35.914857",
"status": "completed"
},
"tags": []
},
"source": [
"The summary report shows for each model and entity type:\n",
"\n",
"* True Positives (TP): The number of times an entity of that type was predicted correctly\n",
"* False Positives (FP): The number of times an entity of that type was predicted but is not in the text\n",
"* False Negatives (FN): The number of times an entity of that type is in the text but was not predicted\n",
"* Precision (P): Probability a predicted entity is correct, TP/(TP+FP)\n",
"* Recall (R): Probability a correct entity is predicted, TP/(TP+FN)\n",
"* F1 Score (F1): Harmonic mean of precision and recall, 2/(1/P + 1/R)\n",
"\n",
"We can compare the F1 Totals to the diagonal of Table IV in the paper:\n",
"\n",
"* AllRecipes.com (ar): We get 0.9682, they report 0.9682\n",
"* FOOD.com (gk): We get 0.9516, they report 0.9519\n",
"* Both (ar_gk): We get 0.9551, they report 0.9611\n",
"\n",
"These are very close: `ar` matches exactly, `gk` differs by 0.0003 and `ar_gk` by 0.0060.\n",
"The furthest is `ar_gk` and in the repository they have a separate `ar_gk_train.tsv`; it would be interesting to check whether using it directly gives a closer result and why there is a difference."
]
},
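{
"cell_type": "markdown",
"id": "7b3e91a4",
"metadata": {},
"source": [
"As a quick worked check of these formulas, we can plug in the `Totals` row for the `ar` model (TP = 1784, FP = 56, FN = 61); the values agree with the report up to rounding.\n",
"\n",
"$$P = \\frac{TP}{TP + FP} = \\frac{1784}{1784 + 56} \\approx 0.9696$$\n",
"\n",
"$$R = \\frac{TP}{TP + FN} = \\frac{1784}{1784 + 61} \\approx 0.9669$$\n",
"\n",
"$$F_1 = \\frac{2}{1/P + 1/R} = \\frac{2\\,TP}{2\\,TP + FP + FN} = \\frac{3568}{3685} \\approx 0.9682$$"
]
},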
{
"cell_type": "markdown",
"id": "54563997",
"metadata": {
"papermill": {
"duration": 0.0776,
"end_time": "2022-04-07T13:01:36.149703",
"exception": false,
"start_time": "2022-04-07T13:01:36.072103",
"status": "completed"
},
"tags": []
},
"source": [
"# Running the model in Python"
]
},
{
"cell_type": "markdown",
"id": "fb2def70",
"metadata": {
"papermill": {
"duration": 0.076927,
"end_time": "2022-04-07T13:01:36.304500",
"exception": false,
"start_time": "2022-04-07T13:01:36.227573",
"status": "completed"
},
"tags": []
},
"source": [
"We can now use these trained models in Python by invoking Stanford NLP with Stanza.\n",
"\n",
"First we'll load in the test data."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "0564ac40",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:36.466360Z",
"iopub.status.busy": "2022-04-07T13:01:36.465286Z",
"iopub.status.idle": "2022-04-07T13:01:36.487026Z",
"shell.execute_reply": "2022-04-07T13:01:36.487684Z",
"shell.execute_reply.started": "2022-04-07T12:52:33.237540Z"
},
"papermill": {
"duration": 0.105083,
"end_time": "2022-04-07T13:01:36.487888",
"exception": false,
"start_time": "2022-04-07T13:01:36.382805",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ar 483\n",
"gk 1705\n"
]
}
],
"source": [
"test_data = {}\n",
"\n",
"for source in data_sources:\n",
"    test_data[source] = segment_file(data_filename(source, 'test'))\n",
"    print(source, len(test_data[source]))"
]
},
{
"cell_type": "markdown",
"id": "3e03b03d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-06T23:58:43.359429Z",
"iopub.status.busy": "2022-04-06T23:58:43.359055Z",
"iopub.status.idle": "2022-04-06T23:58:43.365707Z",
"shell.execute_reply": "2022-04-06T23:58:43.36474Z",
"shell.execute_reply.started": "2022-04-06T23:58:43.35939Z"
},
"papermill": {
"duration": 0.078031,
"end_time": "2022-04-07T13:01:36.643468",
"exception": false,
"start_time": "2022-04-07T13:01:36.565437",
"status": "completed"
},
"tags": []
},
"source": [
"We can call StanfordNLP with our custom model by passing the property `ner.model`.\n",
"\n",
"Our test data is already tokenized, and differently from StanfordNLP's default, so we'll set an option on the [Tokenizer](https://stanfordnlp.github.io/CoreNLP/tokenize.html) to use whitespace tokenization, which is easy to invert.\n",
"\n",
"It takes a while to start up the server, so we want to annotate a large number of texts at once."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "30730c9e",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:36.807245Z",
"iopub.status.busy": "2022-04-07T13:01:36.806406Z",
"iopub.status.idle": "2022-04-07T13:01:36.813188Z",
"shell.execute_reply": "2022-04-07T13:01:36.813766Z",
"shell.execute_reply.started": "2022-04-07T12:52:33.259904Z"
},
"papermill": {
"duration": 0.09251,
"end_time": "2022-04-07T13:01:36.813991",
"exception": false,
"start_time": "2022-04-07T13:01:36.721481",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from tqdm.notebook import tqdm\n",
"from stanza.server import CoreNLPClient\n",
"\n",
"def annotate_ner(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True):\n",
"    properties = {\"ner.model\": ner_model_file, \"tokenize.whitespace\": tokenize_whitespace, \"ner.applyNumericClassifiers\": False}\n",
"\n",
"    annotated = []\n",
"    with CoreNLPClient(\n",
"            annotators=['tokenize','ssplit','ner'],\n",
"            properties=properties,\n",
"            timeout=30000,\n",
"            be_quiet=True,\n",
"            memory='6G') as client:\n",
"\n",
"        for text in tqdm(texts):\n",
"            annotated.append(client.annotate(text))\n",
"    return annotated"
]
},
{
"cell_type": "markdown",
"id": "392be996",
"metadata": {
"papermill": {
"duration": 0.077305,
"end_time": "2022-04-07T13:01:36.971129",
"exception": false,
"start_time": "2022-04-07T13:01:36.893824",
"status": "completed"
},
"tags": []
},
"source": [
"We can then get the annotations for a few example ingredients."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "b2b52eb9",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:37.136508Z",
"iopub.status.busy": "2022-04-07T13:01:37.135776Z",
"iopub.status.idle": "2022-04-07T13:01:51.580370Z",
"shell.execute_reply": "2022-04-07T13:01:51.579773Z",
"shell.execute_reply.started": "2022-04-07T12:52:33.268568Z"
},
"papermill": {
"duration": 14.527687,
"end_time": "2022-04-07T13:01:51.580543",
"exception": false,
"start_time": "2022-04-07T13:01:37.052856",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9c0da7f9ecde470a8ad513bd82b4e638",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/4 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"annotations = annotate_ner(models['ar'],\n",
" ['1 cup of frozen peas',\n",
" 'A dash of salt . Or to taste',\n",
" '12 slices pancetta -LRB- Italian unsmoked cured bacon -RRB-',\n",
" 'pumpkin sliced into 3 cm moons'])"
]
},
{
"cell_type": "markdown",
"id": "3eb072c0",
"metadata": {
"papermill": {
"duration": 0.077157,
"end_time": "2022-04-07T13:01:51.736018",
"exception": false,
"start_time": "2022-04-07T13:01:51.658861",
"status": "completed"
},
"tags": []
},
"source": [
"Note here that the word \"Italian\" has ner \"NATIONALITY\", which comes from CoreNLP's built-in fine-grained NER rather than from our model (it wasn't in the training set!).\n",
"\n",
"We want the label from our own model, which is in `coarseNER`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "8f65e5d0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:51.899603Z",
"iopub.status.busy": "2022-04-07T13:01:51.898653Z",
"iopub.status.idle": "2022-04-07T13:01:51.902411Z",
"shell.execute_reply": "2022-04-07T13:01:51.902921Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.398492Z"
},
"papermill": {
"duration": 0.088481,
"end_time": "2022-04-07T13:01:51.903104",
"exception": false,
"start_time": "2022-04-07T13:01:51.814623",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"word: \"Italian\"\n",
"pos: \"JJ\"\n",
"value: \"Italian\"\n",
"originalText: \"Italian\"\n",
"ner: \"NATIONALITY\"\n",
"lemma: \"italian\"\n",
"beginChar: 25\n",
"endChar: 32\n",
"tokenBeginIndex: 4\n",
"tokenEndIndex: 5\n",
"hasXmlContext: false\n",
"isNewline: false\n",
"coarseNER: \"O\"\n",
"fineGrainedNER: \"NATIONALITY\"\n",
"entityMentionIndex: 3\n",
"nerLabelProbs: \"O=0.870902471545891\""
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"annotations[2].sentence[0].token[4]"
]
},
{
"cell_type": "markdown",
"id": "d5113166",
"metadata": {
"papermill": {
"duration": 0.077775,
"end_time": "2022-04-07T13:01:52.059609",
"exception": false,
"start_time": "2022-04-07T13:01:51.981834",
"status": "completed"
},
"tags": []
},
"source": [
"When I didn't set `\"ner.applyNumericClassifiers\": False` the token \"3\" below would come up as a `NUMBER`."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "fa0cfe7d",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:52.223603Z",
"iopub.status.busy": "2022-04-07T13:01:52.222675Z",
"iopub.status.idle": "2022-04-07T13:01:52.226917Z",
"shell.execute_reply": "2022-04-07T13:01:52.226383Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.408128Z"
},
"papermill": {
"duration": 0.089989,
"end_time": "2022-04-07T13:01:52.227089",
"exception": false,
"start_time": "2022-04-07T13:01:52.137100",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"word: \"3\"\n",
"pos: \"CD\"\n",
"value: \"3\"\n",
"originalText: \"3\"\n",
"ner: \"O\"\n",
"lemma: \"3\"\n",
"beginChar: 20\n",
"endChar: 21\n",
"tokenBeginIndex: 3\n",
"tokenEndIndex: 4\n",
"hasXmlContext: false\n",
"isNewline: false\n",
"coarseNER: \"O\"\n",
"fineGrainedNER: \"O\"\n",
"nerLabelProbs: \"O=0.8599887537555505\""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"annotations[3].sentence[0].token[3]"
]
},
{
"cell_type": "markdown",
"id": "1ffc242b",
"metadata": {
"papermill": {
"duration": 0.078642,
"end_time": "2022-04-07T13:01:52.389005",
"exception": false,
"start_time": "2022-04-07T13:01:52.310363",
"status": "completed"
},
"tags": []
},
"source": [
"We can then flatten the sentences and extract the tokens and their NER labels."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "16628086",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:52.593218Z",
"iopub.status.busy": "2022-04-07T13:01:52.587639Z",
"iopub.status.idle": "2022-04-07T13:01:52.596815Z",
"shell.execute_reply": "2022-04-07T13:01:52.597409Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.420198Z"
},
"papermill": {
"duration": 0.109423,
"end_time": "2022-04-07T13:01:52.597623",
"exception": false,
"start_time": "2022-04-07T13:01:52.488200",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from dataclasses import dataclass, asdict\n",
"\n",
"@dataclass\n",
"class NERData:\n",
"    ner: List[str]\n",
"    tokens: List[str]\n",
"\n",
"    # Let's use Pandas to make it pretty in a notebook\n",
"    def _repr_html_(self):\n",
"        return pd.DataFrame(asdict(self)).T._repr_html_()\n",
"\n",
"def extract_ner_data(annotation) -> NERData:\n",
"    tokens = [token for sentence in annotation.sentence for token in sentence.token]\n",
"    return NERData(tokens=[t.word for t in tokens], ner=[t.coarseNER for t in tokens])"
]
},
{
"cell_type": "markdown",
"id": "41ef7ddd",
"metadata": {
"papermill": {
"duration": 0.079568,
"end_time": "2022-04-07T13:01:52.761448",
"exception": false,
"start_time": "2022-04-07T13:01:52.681880",
"status": "completed"
},
"tags": []
},
"source": [
"A relatively simple ingredient works well."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "9d4dd5cb",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:52.923565Z",
"iopub.status.busy": "2022-04-07T13:01:52.922824Z",
"iopub.status.idle": "2022-04-07T13:01:52.934377Z",
"shell.execute_reply": "2022-04-07T13:01:52.934830Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.431673Z"
},
"papermill": {
"duration": 0.094408,
"end_time": "2022-04-07T13:01:52.935033",
"exception": false,
"start_time": "2022-04-07T13:01:52.840625",
"status": "completed"
},
"tags": []
},
"outputs": [
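{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>0</th>\n",
"      <th>1</th>\n",
"      <th>2</th>\n",
"      <th>3</th>\n",
"      <th>4</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>ner</th>\n",
"      <td>QUANTITY</td>\n",
"      <td>UNIT</td>\n",
"      <td>O</td>\n",
"      <td>TEMP</td>\n",
"      <td>NAME</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>tokens</th>\n",
"      <td>1</td>\n",
"      <td>cup</td>\n",
"      <td>of</td>\n",
"      <td>frozen</td>\n",
"      <td>peas</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],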
"text/plain": [
"NERData(ner=['QUANTITY', 'UNIT', 'O', 'TEMP', 'NAME'], tokens=['1', 'cup', 'of', 'frozen', 'peas'])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"extract_ner_data(annotations[0])"
]
},
{
"cell_type": "markdown",
"id": "f68d4ab6",
"metadata": {
"papermill": {
"duration": 0.080341,
"end_time": "2022-04-07T13:01:53.095739",
"exception": false,
"start_time": "2022-04-07T13:01:53.015398",
"status": "completed"
},
"tags": []
},
"source": [
"A more complex sentence does quite badly, perhaps because nothing like it was seen in the training data."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "84662f1c",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:53.259562Z",
"iopub.status.busy": "2022-04-07T13:01:53.258809Z",
"iopub.status.idle": "2022-04-07T13:01:53.269051Z",
"shell.execute_reply": "2022-04-07T13:01:53.269619Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.452987Z"
},
"papermill": {
"duration": 0.094219,
"end_time": "2022-04-07T13:01:53.269840",
"exception": false,
"start_time": "2022-04-07T13:01:53.175621",
"status": "completed"
},
"tags": []
},
"outputs": [
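{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>0</th>\n",
"      <th>1</th>\n",
"      <th>2</th>\n",
"      <th>3</th>\n",
"      <th>4</th>\n",
"      <th>5</th>\n",
"      <th>6</th>\n",
"      <th>7</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>ner</th>\n",
"      <td>QUANTITY</td>\n",
"      <td>UNIT</td>\n",
"      <td>NAME</td>\n",
"      <td>NAME</td>\n",
"      <td>NAME</td>\n",
"      <td>NAME</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>tokens</th>\n",
"      <td>A</td>\n",
"      <td>dash</td>\n",
"      <td>of</td>\n",
"      <td>salt</td>\n",
"      <td>.</td>\n",
"      <td>Or</td>\n",
"      <td>to</td>\n",
"      <td>taste</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],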
"text/plain": [
"NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'NAME', 'NAME', 'NAME', 'O', 'O'], tokens=['A', 'dash', 'of', 'salt', '.', 'Or', 'to', 'taste'])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"extract_ner_data(annotations[1])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "e85ccce9",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:53.436990Z",
"iopub.status.busy": "2022-04-07T13:01:53.436294Z",
"iopub.status.idle": "2022-04-07T13:01:53.445951Z",
"shell.execute_reply": "2022-04-07T13:01:53.446566Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.468618Z"
},
"papermill": {
"duration": 0.094843,
"end_time": "2022-04-07T13:01:53.446783",
"exception": false,
"start_time": "2022-04-07T13:01:53.351940",
"status": "completed"
},
"tags": []
},
"outputs": [
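{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>0</th>\n",
"      <th>1</th>\n",
"      <th>2</th>\n",
"      <th>3</th>\n",
"      <th>4</th>\n",
"      <th>5</th>\n",
"      <th>6</th>\n",
"      <th>7</th>\n",
"      <th>8</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>ner</th>\n",
"      <td>QUANTITY</td>\n",
"      <td>UNIT</td>\n",
"      <td>NAME</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"      <td>O</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>tokens</th>\n",
"      <td>12</td>\n",
"      <td>slices</td>\n",
"      <td>pancetta</td>\n",
"      <td>-LRB-</td>\n",
"      <td>Italian</td>\n",
"      <td>unsmoked</td>\n",
"      <td>cured</td>\n",
"      <td>bacon</td>\n",
"      <td>-RRB-</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],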
"text/plain": [
"NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'O', 'O', 'O', 'O', 'O', 'O'], tokens=['12', 'slices', 'pancetta', '-LRB-', 'Italian', 'unsmoked', 'cured', 'bacon', '-RRB-'])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"extract_ner_data(annotations[2])"
]
},
{
"cell_type": "markdown",
"id": "d067bdb5",
"metadata": {
"papermill": {
"duration": 0.080591,
"end_time": "2022-04-07T13:01:53.610955",
"exception": false,
"start_time": "2022-04-07T13:01:53.530364",
"status": "completed"
},
"tags": []
},
"source": [
"We can chain these functions together to get from text to NER labels."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "7e1f8e27",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:53.777728Z",
"iopub.status.busy": "2022-04-07T13:01:53.777006Z",
"iopub.status.idle": "2022-04-07T13:01:53.782184Z",
"shell.execute_reply": "2022-04-07T13:01:53.782741Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.490584Z"
},
"papermill": {
"duration": 0.091584,
"end_time": "2022-04-07T13:01:53.782964",
"exception": false,
"start_time": "2022-04-07T13:01:53.691380",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"from typing import Dict\n",
"\n",
"def ner_extract(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True) -> List[NERData]:\n",
"    annotations = annotate_ner(ner_model_file, texts, tokenize_whitespace)\n",
"    return [extract_ner_data(ann) for ann in annotations]"
]
},
{
"cell_type": "markdown",
"id": "92d193f3",
"metadata": {
"papermill": {
"duration": 0.081116,
"end_time": "2022-04-07T13:01:53.944944",
"exception": false,
"start_time": "2022-04-07T13:01:53.863828",
"status": "completed"
},
"tags": []
},
"source": [
"Then for each model and each test set we can calculate the predictions."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "2c32ef0a",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:01:54.111985Z",
"iopub.status.busy": "2022-04-07T13:01:54.111261Z",
"iopub.status.idle": "2022-04-07T13:06:33.038766Z",
"shell.execute_reply": "2022-04-07T13:06:33.039358Z",
"shell.execute_reply.started": "2022-04-07T12:52:46.497380Z"
},
"papermill": {
"duration": 279.012135,
"end_time": "2022-04-07T13:06:33.039789",
"exception": false,
"start_time": "2022-04-07T13:01:54.027654",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "26e07430d6f54a9bbd9acea509197ce1",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/483 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5ae58c17d7a2462dadf527338d943cea",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/1705 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "39c71bb6e43e4330b2b149c25da98d1d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/483 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "652276ddba8640708df4285c1ddf5ff9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/1705 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "87190b2caa46403a82d7cb69319b1262",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/483 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8ab8307799ba406e85852cb620015885",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"  0%|          | 0/1705 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"preds = {}\n",
"for model, modelfile in models.items():\n",
"    preds[model] = {}\n",
"    for test_source, token_data in test_data.items():\n",
"        texts = [' '.join([x[0] for x in text]) for text in token_data]\n",
"        preds[model][test_source] = ner_extract(modelfile, texts)"
]
},
{
"cell_type": "markdown",
"id": "5d2fdccc",
"metadata": {
"papermill": {
"duration": 0.090737,
"end_time": "2022-04-07T13:06:33.217549",
"exception": false,
"start_time": "2022-04-07T13:06:33.126812",
"status": "completed"
},
"tags": []
},
"source": [
"## Sanity checks\n",
"\n",
"Let's check that the same tokens come out of the model as went in."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "5fc5053e",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:33.412009Z",
"iopub.status.busy": "2022-04-07T13:06:33.410951Z",
"iopub.status.idle": "2022-04-07T13:06:33.414218Z",
"shell.execute_reply": "2022-04-07T13:06:33.413529Z",
"shell.execute_reply.started": "2022-04-07T12:56:31.467543Z"
},
"papermill": {
"duration": 0.109894,
"end_time": "2022-04-07T13:06:33.414392",
"exception": false,
"start_time": "2022-04-07T13:06:33.304498",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"for test_source, token_data in test_data.items():\n",
"    tokens = [[x[0] for x in tokens] for tokens in token_data]\n",
"\n",
"    for model in models:\n",
"        model_preds = preds[model][test_source]\n",
"\n",
"        model_tokens = [p.tokens for p in model_preds]\n",
"\n",
"        if tokens != model_tokens:\n",
"            raise ValueError(\"Tokenization issue in %s with model %s\" % (test_source, model))"
]
},
{
"cell_type": "markdown",
"id": "384d097b",
"metadata": {
"papermill": {
"duration": 0.086069,
"end_time": "2022-04-07T13:06:33.585161",
"exception": false,
"start_time": "2022-04-07T13:06:33.499092",
"status": "completed"
},
"tags": []
},
"source": [
"# Evaluating\n",
"\n",
"Now that we have predictions we can evaluate with [seqeval](https://github.com/chakki-works/seqeval)."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "1ecad075",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:33.765716Z",
"iopub.status.busy": "2022-04-07T13:06:33.764994Z",
"iopub.status.idle": "2022-04-07T13:06:50.298649Z",
"shell.execute_reply": "2022-04-07T13:06:50.297970Z",
"shell.execute_reply.started": "2022-04-07T12:56:31.630845Z"
},
"papermill": {
"duration": 16.624488,
"end_time": "2022-04-07T13:06:50.298801",
"exception": false,
"start_time": "2022-04-07T13:06:33.674313",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting seqeval\n",
"  Downloading seqeval-1.2.2.tar.gz (43 kB)\n",
"  Preparing metadata (setup.py) ... done\n",
"Requirement already satisfied: numpy>=1.14.0 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.20.3)\n",
"Requirement already satisfied: scikit-learn>=0.21.3 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.0.1)\n",
"Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.1.0)\n",
"Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (3.0.0)\n",
"Requirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.7.3)\n",
"Building wheels for collected packages: seqeval\n",
"  Building wheel for seqeval (setup.py) ... done\n",
"  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16181 sha256=117220ab957b2dfbf6fad8b7cf7fb429b409f1fb1b62fef7ea14d20e38b36203\n",
"  Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7\n",
"Successfully built seqeval\n",
"Installing collected packages: seqeval\n",
"Successfully installed seqeval-1.2.2\n",
"WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\n"
]
}
],
"source": [
"!pip install seqeval"
]
},
{
"cell_type": "markdown",
"id": "442daabb",
"metadata": {
"papermill": {
"duration": 0.088059,
"end_time": "2022-04-07T13:06:50.475096",
"exception": false,
"start_time": "2022-04-07T13:06:50.387037",
"status": "completed"
},
"tags": []
},
"source": [
"Seqeval expects the data to be in one of the following formats:\n",
"\n",
"* IOB1\n",
"* IOB2\n",
"* IOE1\n",
"* IOE2\n",
"* IOBES (only in strict mode)\n",
"* BILOU (only in strict mode)\n",
"\n",
"The differences between these only matter when distinguishing distinct entities that are adjacent, which is quite rare in practice.\n",
"See Wikipedia for a detailed explanation of [IOB (inside-outside-beginning)](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)).\n",
"\n",
"In this dataset the tags have no prefixes, and it's assumed there's only one entity of each type (which can be wrong when multiple names are listed in a single ingredient).\n",
"Using this assumption we can easily convert the tags to IOB1 by prefixing every tag other than 'O' with 'I-'."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "816f82e2",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:50.659609Z",
"iopub.status.busy": "2022-04-07T13:06:50.658619Z",
"iopub.status.idle": "2022-04-07T13:06:50.661057Z",
"shell.execute_reply": "2022-04-07T13:06:50.660405Z",
"shell.execute_reply.started": "2022-04-07T12:56:44.754303Z"
},
"papermill": {
"duration": 0.098039,
"end_time": "2022-04-07T13:06:50.661208",
"exception": false,
"start_time": "2022-04-07T13:06:50.563169",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"def convert_to_iob1(labels):\n",
"    return ['I-' + label if label != 'O' else 'O' for label in labels]\n",
"\n",
"assert convert_to_iob1(['QUANTITY', 'SIZE', 'NAME', 'NAME', 'O', 'STATE']) == ['I-QUANTITY', 'I-SIZE', 'I-NAME', 'I-NAME', 'O', 'I-STATE']"
]
},
{
"cell_type": "markdown",
"id": "2bd57cb3",
"metadata": {
"papermill": {
"duration": 0.088945,
"end_time": "2022-04-07T13:06:50.837958",
"exception": false,
"start_time": "2022-04-07T13:06:50.749013",
"status": "completed"
},
"tags": []
},
"source": [
"Let's check the classification report for a single example and compare it to the report from StanfordNER.\n",
"\n",
"The classification report doesn't have the TP, FP and FN, but instead has the support: the number of true entities in the data.\n",
"The two sets of numbers are equivalent:\n",
"\n",
"* support = TP + FN\n",
"* TP = R * support\n",
"* FP = TP (1/P - 1)\n",
"* FN = support - TP\n",
"\n",
"The results are the same."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "9bf3fab1",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:51.024498Z",
"iopub.status.busy": "2022-04-07T13:06:51.023501Z",
"iopub.status.idle": "2022-04-07T13:06:52.408718Z",
"shell.execute_reply": "2022-04-07T13:06:52.407592Z",
"shell.execute_reply.started": "2022-04-07T12:56:44.760756Z"
},
"papermill": {
"duration": 1.482137,
"end_time": "2022-04-07T13:06:52.408950",
"exception": false,
"start_time": "2022-04-07T13:06:50.926813",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" DF 1.0000 0.9608 0.9800 51\n",
" NAME 0.9297 0.9279 0.9288 499\n",
" QUANTITY 1.0000 0.9962 0.9981 524\n",
" SIZE 1.0000 1.0000 1.0000 20\n",
" STATE 0.9601 0.9633 0.9617 300\n",
" TEMP 0.8750 0.7000 0.7778 10\n",
" UNIT 0.9819 0.9841 0.9830 441\n",
"\n",
" micro avg 0.9696 0.9669 0.9682 1845\n",
" macro avg 0.9638 0.9332 0.9471 1845\n",
"weighted avg 0.9695 0.9669 0.9682 1845\n",
"\n"
]
}
],
"source": [
"from seqeval.metrics import classification_report\n",
"\n",
"test_source = 'ar'\n",
"model = 'ar'\n",
"\n",
"actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in test_data[test_source]]\n",
"pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n",
"\n",
"print(classification_report(actual_ner, pred_ner, digits=4))"
]
},
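{
"cell_type": "markdown",
"id": "e2d4c806",
"metadata": {},
"source": [
"Working backwards from the `micro avg` row of this report (P = 0.9696, R = 0.9669, support = 1845) recovers the `Totals` counts that Stanford NER printed for the `ar` model. Because P and R are rounded to four digits, the counts only come out exact after rounding to the nearest integer.\n",
"\n",
"$$TP = R \\cdot \\text{support} \\approx 0.9669 \\times 1845 \\approx 1784$$\n",
"\n",
"$$FP = TP \\left( \\frac{1}{P} - 1 \\right) \\approx 1784 \\times \\left( \\frac{1}{0.9696} - 1 \\right) \\approx 56$$\n",
"\n",
"$$FN = \\text{support} - TP = 1845 - 1784 = 61$$"
]
},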
{
"cell_type": "markdown",
"id": "58e08983",
"metadata": {
"papermill": {
"duration": 0.08957,
"end_time": "2022-04-07T13:06:52.587948",
"exception": false,
"start_time": "2022-04-07T13:06:52.498378",
"status": "completed"
},
"tags": []
},
"source": [
"We can get the micro f1-score directly."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "c7f7bfd0",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:52.771575Z",
"iopub.status.busy": "2022-04-07T13:06:52.770572Z",
"iopub.status.idle": "2022-04-07T13:06:52.797700Z",
"shell.execute_reply": "2022-04-07T13:06:52.798273Z",
"shell.execute_reply.started": "2022-04-07T12:56:45.725087Z"
},
"papermill": {
"duration": 0.120808,
"end_time": "2022-04-07T13:06:52.798476",
"exception": false,
"start_time": "2022-04-07T13:06:52.677668",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'0.9682'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from seqeval.metrics import f1_score\n",
"'%0.4f' % f1_score(actual_ner, pred_ner)"
]
},
{
"cell_type": "markdown",
"id": "0ca961cf",
"metadata": {
"papermill": {
"duration": 0.090745,
"end_time": "2022-04-07T13:06:52.978636",
"exception": false,
"start_time": "2022-04-07T13:06:52.887891",
"status": "completed"
},
"tags": []
},
"source": [
"We can then try to reproduce Table IV by computing the f1-score for each model and data."
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "97fe7bab",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:53.162034Z",
"iopub.status.busy": "2022-04-07T13:06:53.161354Z",
"iopub.status.idle": "2022-04-07T13:06:53.462721Z",
"shell.execute_reply": "2022-04-07T13:06:53.462124Z",
"shell.execute_reply.started": "2022-04-07T12:56:45.748363Z"
},
"papermill": {
"duration": 0.395147,
"end_time": "2022-04-07T13:06:53.462918",
"exception": false,
"start_time": "2022-04-07T13:06:53.067771",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"scores = {model: {} for model in models}\n",
"for test_source, data in test_data.items():\n",
" actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in data]\n",
" for model in models:\n",
" pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n",
" scores[model][test_source] = f1_score(actual_ner, pred_ner)"
]
},
{
"cell_type": "markdown",
"id": "02ace148",
"metadata": {
"papermill": {
"duration": 0.093906,
"end_time": "2022-04-07T13:06:53.649107",
"exception": false,
"start_time": "2022-04-07T13:06:53.555201",
"status": "completed"
},
"tags": []
},
"source": [
"We also need to calculate the scores on the combined test set, by contatenating them"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "d19d3eb8",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:53.834560Z",
"iopub.status.busy": "2022-04-07T13:06:53.833800Z",
"iopub.status.idle": "2022-04-07T13:06:54.131037Z",
"shell.execute_reply": "2022-04-07T13:06:54.131536Z",
"shell.execute_reply.started": "2022-04-07T12:56:45.944156Z"
},
"papermill": {
"duration": 0.392623,
"end_time": "2022-04-07T13:06:54.131761",
"exception": false,
"start_time": "2022-04-07T13:06:53.739138",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"actual_ner = [convert_to_iob1([x[1] for x in ann]) for data in test_data.values() for ann in data]\n",
"for model in models:\n",
" pred_ner = [convert_to_iob1(p.ner) for test_source in test_data for p in preds[model][test_source]]\n",
" scores[model]['combined'] = f1_score(actual_ner, pred_ner)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "be344047",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:54.322051Z",
"iopub.status.busy": "2022-04-07T13:06:54.321346Z",
"iopub.status.idle": "2022-04-07T13:06:54.399490Z",
"shell.execute_reply": "2022-04-07T13:06:54.398955Z",
"shell.execute_reply.started": "2022-04-07T12:56:46.135926Z"
},
"papermill": {
"duration": 0.177398,
"end_time": "2022-04-07T13:06:54.399653",
"exception": false,
"start_time": "2022-04-07T13:06:54.222255",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
" \n",
" \n",
" | \n",
" ar | \n",
" gk | \n",
" ar_gk | \n",
"
\n",
" \n",
" \n",
" \n",
" ar | \n",
" 0.9682 | \n",
" 0.9331 | \n",
" 0.9704 | \n",
"
\n",
" \n",
" gk | \n",
" 0.8666 | \n",
" 0.9511 | \n",
" 0.9499 | \n",
"
\n",
" \n",
" combined | \n",
" 0.8911 | \n",
" 0.9469 | \n",
" 0.9549 | \n",
"
\n",
" \n",
"
\n"
],
"text/plain": [
""
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(scores).style.format('{:0.4f}')"
]
},
{
"cell_type": "markdown",
"id": "0e61f00f",
"metadata": {
"papermill": {
"duration": 0.090834,
"end_time": "2022-04-07T13:06:54.582503",
"exception": false,
"start_time": "2022-04-07T13:06:54.491669",
"status": "completed"
},
"tags": []
},
"source": [
"The results are *slightly* different to those in the paper, but all agree within 0.01 for each row.\n",
"\n",
"So we've successfully reproduced the results in the paper, and shown the evaulation from Stanford NER toolkit is very close to that of seqeval (if you work around hallucinated entities)."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "94e67a94",
"metadata": {
"execution": {
"iopub.execute_input": "2022-04-07T13:06:54.781566Z",
"iopub.status.busy": "2022-04-07T13:06:54.780354Z",
"iopub.status.idle": "2022-04-07T13:06:54.784566Z",
"shell.execute_reply": "2022-04-07T13:06:54.785228Z",
"shell.execute_reply.started": "2022-04-07T12:56:46.204022Z"
},
"papermill": {
"duration": 0.11174,
"end_time": "2022-04-07T13:06:54.785439",
"exception": false,
"start_time": "2022-04-07T13:06:54.673699",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" AllRecipes | \n",
" FOOD.com | \n",
" BOTH | \n",
"
\n",
" \n",
" \n",
" \n",
" AllRecipes | \n",
" 0.9682 | \n",
" 0.9317 | \n",
" 0.9709 | \n",
"
\n",
" \n",
" FOOD.com | \n",
" 0.8672 | \n",
" 0.9519 | \n",
" 0.9498 | \n",
"
\n",
" \n",
" BOTH | \n",
" 0.8972 | \n",
" 0.9472 | \n",
" 0.9611 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" AllRecipes FOOD.com BOTH\n",
"AllRecipes 0.9682 0.9317 0.9709\n",
"FOOD.com 0.8672 0.9519 0.9498\n",
"BOTH 0.8972 0.9472 0.9611"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reported_scores = pd.DataFrame([[0.9682, 0.9317, 0.9709],\n",
" [0.8672, 0.9519, 0.9498],\n",
" [0.8972, 0.9472, 0.9611]],\n",
" columns = ['AllRecipes', 'FOOD.com', 'BOTH'],\n",
" index = ['AllRecipes', 'FOOD.com', 'BOTH'])\n",
"reported_scores"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
},
"papermill": {
"default_parameters": {},
"duration": 567.381694,
"end_time": "2022-04-07T13:06:56.199628",
"environment_variables": {},
"exception": null,
"input_path": "__notebook__.ipynb",
"output_path": "__notebook__.ipynb",
"parameters": {},
"start_time": "2022-04-07T12:57:28.817934",
"version": "2.3.3"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"04e4948e123444569b4b09d696e122ea": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"05cf3b36eb1e4ddb9c4f9b675fdd6b1d": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"081d02fab37a4e29bd2531213f508808": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_f688a6aae02749a6874511e7c4d0da95",
"max": 1705.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_a6e2ba7f52a042fb8f3595b50c406bc6",
"value": 1705.0
}
},
"0b95b19116f44d5dae17156430a708ec": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"1455809b8e1945e0bebdc8ea8adcc87d": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"15dca12ce1ee4b8ca4ec62aa8864f8c7": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"171199414c5a4954b0ac767628554d8b": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"183452cec8804fd5903953436a43bf20": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_e4473ff688a9440690198f9e4cf6858e",
"placeholder": "",
"style": "IPY_MODEL_2474d59c1f9d47d4b59ffef46d4f8443",
"value": " 483/483 [00:29<00:00, 41.45it/s]"
}
},
"19c7b297a53444608e29aa3fcce236b9": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"1a8c3ed5abda483baee14f7c855449cd": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_337d520658594202b091a3a25a60ab2c",
"placeholder": "",
"style": "IPY_MODEL_46133dd43dca46eba08636f28973f190",
"value": " 483/483 [00:28<00:00, 37.52it/s]"
}
},
"1fe61ae43c2f4a9b93b25d6c60204c8a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"20145d12fdbb45a8a3c7ed1126260403": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"214063d8f2bf4d2f854bdfaa75dd3469": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_661e0655ba4b41139b1be60f73f4ea87",
"placeholder": "",
"style": "IPY_MODEL_385d18bccb9245548deed2a3b554525c",
"value": " 1705/1705 [01:00<00:00, 36.52it/s]"
}
},
"21a60b9d20784d38afdbdbdf9fa74f78": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"2474d59c1f9d47d4b59ffef46d4f8443": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"263283c3d0764d38b9c3d1ddb6c8b427": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_20145d12fdbb45a8a3c7ed1126260403",
"placeholder": "",
"style": "IPY_MODEL_a7eb0e99aee4463785eea58d0cc847af",
"value": " 1705/1705 [01:00<00:00, 34.91it/s]"
}
},
"26e07430d6f54a9bbd9acea509197ce1": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_4191ab859c984f81a50aeb218fc1ac5a",
"IPY_MODEL_c03a18909027459681be5a9113e88a0a",
"IPY_MODEL_1a8c3ed5abda483baee14f7c855449cd"
],
"layout": "IPY_MODEL_f04f3af815f24025b341de0249819145"
}
},
"28efee3ebc4b44e4a2411844af855e92": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_53c764a85af5434eabf8513c4a5e8e6f",
"placeholder": "",
"style": "IPY_MODEL_9ad75aea6d7341dba7d6e56cf625c10b",
"value": "100%"
}
},
"2a539d7b18e4425abbd7ae22e76a26fa": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"2b5a6815608b41d69923bd5e845385a3": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"316de487dcb041e9aef79533ede9ab20": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"337d520658594202b091a3a25a60ab2c": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"345e4d287bd24ae9975f3be4e2027c4c": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"385d18bccb9245548deed2a3b554525c": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"38d19c56c37947469a993acb7f0213a0": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"39c71bb6e43e4330b2b149c25da98d1d": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_28efee3ebc4b44e4a2411844af855e92",
"IPY_MODEL_8bc62d6d2ec34b848127dfe4d57b8c0e",
"IPY_MODEL_a9267cfc2fb54d408ab6377e7cf9a2cb"
],
"layout": "IPY_MODEL_ec60bd5d40374f4296bf13b4d1a6e957"
}
},
"3cc71d45aca94e189e597f9532b0ff83": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"40754c3ca8c049eeaf44280e861bb455": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_2a539d7b18e4425abbd7ae22e76a26fa",
"placeholder": "",
"style": "IPY_MODEL_345e4d287bd24ae9975f3be4e2027c4c",
"value": "Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip: 100%"
}
},
"4191ab859c984f81a50aeb218fc1ac5a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_4827280778c24f8ca7bfc579f2bc0806",
"placeholder": "",
"style": "IPY_MODEL_5fe9b045a4014a4a80ab8190de207966",
"value": "100%"
}
},
"42e7709bb52842e999ccf9a7a385973a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_aacee86f941e404b8c9f0345c0e85a94",
"placeholder": "",
"style": "IPY_MODEL_04e4948e123444569b4b09d696e122ea",
"value": "100%"
}
},
"44ca9e6935704d00925e55eaf8f5e5ec": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_38d19c56c37947469a993acb7f0213a0",
"placeholder": "",
"style": "IPY_MODEL_0b95b19116f44d5dae17156430a708ec",
"value": "100%"
}
},
"46133dd43dca46eba08636f28973f190": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"47ef62581c17433e93855d6789befc2b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_21a60b9d20784d38afdbdbdf9fa74f78",
"placeholder": "",
"style": "IPY_MODEL_ae9e033f42884508ac8a7f49f6e62c2a",
"value": "100%"
}
},
"4827280778c24f8ca7bfc579f2bc0806": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"4b524361ba0947b9af302faa8518050f": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_73fc8072162e45de89abf389fa5d4b90",
"max": 1705.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_5012502d87f24cd192200979e56af080",
"value": 1705.0
}
},
"5012502d87f24cd192200979e56af080": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"53c764a85af5434eabf8513c4a5e8e6f": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"555a9bda2a024e9a8a9abc444e4fe4cf": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_15dca12ce1ee4b8ca4ec62aa8864f8c7",
"max": 505207915.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_730961bd290541e9a01938ca4876beb9",
"value": 505207915.0
}
},
"5ae58c17d7a2462dadf527338d943cea": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_47ef62581c17433e93855d6789befc2b",
"IPY_MODEL_757532f367d248dcb369c157e5565861",
"IPY_MODEL_263283c3d0764d38b9c3d1ddb6c8b427"
],
"layout": "IPY_MODEL_76e64492e3d7480aa2d96bfc56f3cfa8"
}
},
"5cb06a46903341a9b50676a31addad14": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"5fe9b045a4014a4a80ab8190de207966": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"652276ddba8640708df4285c1ddf5ff9": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_8e12508157c44fbcbb77aac67bc12549",
"IPY_MODEL_4b524361ba0947b9af302faa8518050f",
"IPY_MODEL_71093fc9838b4e9ea8a1fbeda6150111"
],
"layout": "IPY_MODEL_2b5a6815608b41d69923bd5e845385a3"
}
},
"661e0655ba4b41139b1be60f73f4ea87": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"67197e8331f74dd1a7ae96d9d4ee7490": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_75992a1df6f6420e8bc06349c0a99076",
"max": 483.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_93654bad7fcf4739bb54a6b45c60c5ed",
"value": 483.0
}
},
"6a9d691c650c4d779882190429cbe86b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_05cf3b36eb1e4ddb9c4f9b675fdd6b1d",
"placeholder": "",
"style": "IPY_MODEL_5cb06a46903341a9b50676a31addad14",
"value": " 505M/505M [00:30<00:00, 17.9MB/s]"
}
},
"6fc700ddba53400b889b51216731058b": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"71093fc9838b4e9ea8a1fbeda6150111": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_8594609c694247669c25c78c8ded0342",
"placeholder": "",
"style": "IPY_MODEL_b9e45e43c1ae4ff483e64abefa8ed8de",
"value": " 1705/1705 [01:02<00:00, 37.06it/s]"
}
},
"730961bd290541e9a01938ca4876beb9": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"73fc8072162e45de89abf389fa5d4b90": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"757532f367d248dcb369c157e5565861": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_acb9785ff9dc422bb40cb371d4180419",
"max": 1705.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_c7ab303608734bf18388966fda9a2d77",
"value": 1705.0
}
},
"75992a1df6f6420e8bc06349c0a99076": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"75b56360c2064ec2b7b5d7d4bdc22d42": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_c84f6aba9d924d5ba166018593531c9a",
"placeholder": "",
"style": "IPY_MODEL_cdf8f8e1213e4056aff15837fd6cd1b8",
"value": " 4/4 [00:13<00:00, 3.62s/it]"
}
},
"76e64492e3d7480aa2d96bfc56f3cfa8": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"7f60ca33e67449c1bab4a7c52e290d35": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"8594609c694247669c25c78c8ded0342": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"87190b2caa46403a82d7cb69319b1262": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_44ca9e6935704d00925e55eaf8f5e5ec",
"IPY_MODEL_67197e8331f74dd1a7ae96d9d4ee7490",
"IPY_MODEL_183452cec8804fd5903953436a43bf20"
],
"layout": "IPY_MODEL_c12c8164a6f2450b804b2f350ed2d580"
}
},
"8ab8307799ba406e85852cb620015885": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_42e7709bb52842e999ccf9a7a385973a",
"IPY_MODEL_081d02fab37a4e29bd2531213f508808",
"IPY_MODEL_214063d8f2bf4d2f854bdfaa75dd3469"
],
"layout": "IPY_MODEL_171199414c5a4954b0ac767628554d8b"
}
},
"8bc62d6d2ec34b848127dfe4d57b8c0e": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_7f60ca33e67449c1bab4a7c52e290d35",
"max": 483.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_99a70625c1a5402e9214a7dc5ed53cd1",
"value": 483.0
}
},
"8e12508157c44fbcbb77aac67bc12549": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_e6207369245244299c4e72cab9ee9e6f",
"placeholder": "",
"style": "IPY_MODEL_316de487dcb041e9aef79533ede9ab20",
"value": "100%"
}
},
"92d594f090a64ae0994bb4e7e59a362b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1455809b8e1945e0bebdc8ea8adcc87d",
"max": 4.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_3cc71d45aca94e189e597f9532b0ff83",
"value": 4.0
}
},
"93654bad7fcf4739bb54a6b45c60c5ed": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"94f4b01c55604c8ead1ebc742ec981dc": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"99a70625c1a5402e9214a7dc5ed53cd1": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"9ad75aea6d7341dba7d6e56cf625c10b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"9c0da7f9ecde470a8ad513bd82b4e638": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_eaa545004794465bb69a876a78e8386c",
"IPY_MODEL_92d594f090a64ae0994bb4e7e59a362b",
"IPY_MODEL_75b56360c2064ec2b7b5d7d4bdc22d42"
],
"layout": "IPY_MODEL_beca40aef11d4d4fbcc39c9f54709889"
}
},
"a6e2ba7f52a042fb8f3595b50c406bc6": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"a7eb0e99aee4463785eea58d0cc847af": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"a9267cfc2fb54d408ab6377e7cf9a2cb": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_6fc700ddba53400b889b51216731058b",
"placeholder": "",
"style": "IPY_MODEL_d9376b3ffa6741409f6218d1041020cf",
"value": " 483/483 [00:34<00:00, 38.36it/s]"
}
},
"aacee86f941e404b8c9f0345c0e85a94": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"acb9785ff9dc422bb40cb371d4180419": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"ae9e033f42884508ac8a7f49f6e62c2a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"b90172ce78ce4c519efa02f06f3c6835": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_40754c3ca8c049eeaf44280e861bb455",
"IPY_MODEL_555a9bda2a024e9a8a9abc444e4fe4cf",
"IPY_MODEL_6a9d691c650c4d779882190429cbe86b"
],
"layout": "IPY_MODEL_fbe7eab600d2483e98a914ba761c293b"
}
},
"b9e45e43c1ae4ff483e64abefa8ed8de": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"beca40aef11d4d4fbcc39c9f54709889": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"c03a18909027459681be5a9113e88a0a": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_19c7b297a53444608e29aa3fcce236b9",
"max": 483.0,
"min": 0.0,
"orientation": "horizontal",
"style": "IPY_MODEL_1fe61ae43c2f4a9b93b25d6c60204c8a",
"value": 483.0
}
},
"c12c8164a6f2450b804b2f350ed2d580": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"c7ab303608734bf18388966fda9a2d77": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"c84f6aba9d924d5ba166018593531c9a": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"cda950e3c9bc4afdb263d7e294ad4276": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"cdf8f8e1213e4056aff15837fd6cd1b8": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"d9376b3ffa6741409f6218d1041020cf": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"e4473ff688a9440690198f9e4cf6858e": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"e6207369245244299c4e72cab9ee9e6f": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"eaa545004794465bb69a876a78e8386c": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_94f4b01c55604c8ead1ebc742ec981dc",
"placeholder": "",
"style": "IPY_MODEL_cda950e3c9bc4afdb263d7e294ad4276",
"value": "100%"
}
},
"ec60bd5d40374f4296bf13b4d1a6e957": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"f04f3af815f24025b341de0249819145": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"f688a6aae02749a6874511e7c4d0da95": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"fbe7eab600d2483e98a914ba761c293b": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
}
},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}