{ "cells": [ { "cell_type": "markdown", "id": "1684dd94", "metadata": { "papermill": { "duration": 0.065982, "end_time": "2022-04-07T12:57:39.624742", "exception": false, "start_time": "2022-04-07T12:57:39.558760", "status": "completed" }, "tags": [] }, "source": [ "We're going to replicate the benchmark in [A Named Entity Based Approach to Model Recipes](https://arxiv.org/abs/2004.12184), by Diwan, Batra, and Bagler using StanfordNLP, and check it using [seqeval](https://github.com/chakki-works/seqeval).\n", "\n", "Evaluating NER is surprisingly tricky, as [David Batista explains](https://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/), and I want to check that the results in the paper are the same as what seqeval gives, so I can compare it to other models.\n", "\n", "The authors share their data in an [associated git repository](https://github.com/cosylabiiit/recipe-knowledge-mining) and train a model using [Stanford NER](https://nlp.stanford.edu/software/CRF-NER.html), which is open source, so we have a chance of replicating the results." ] }, { "cell_type": "markdown", "id": "e559cc76", "metadata": { "papermill": { "duration": 0.059759, "end_time": "2022-04-07T12:57:39.748864", "exception": false, "start_time": "2022-04-07T12:57:39.689105", "status": "completed" }, "tags": [] }, "source": [ "# Installing Stanford NLP\n", "\n", "We're going to install Stanford NLP which is a Java library.\n", "To make things easier we will use [stanza](https://stanfordnlp.github.io/stanza/) which includes tools for [installing and invoking Stanford NLP](https://stanfordnlp.github.io/stanza/corenlp_client.html)." 
] }, { "cell_type": "code", "execution_count": 1, "id": "2873cc65", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:57:39.879337Z", "iopub.status.busy": "2022-04-07T12:57:39.878337Z", "iopub.status.idle": "2022-04-07T12:57:53.376009Z", "shell.execute_reply": "2022-04-07T12:57:53.374964Z", "shell.execute_reply.started": "2022-04-07T12:49:07.145062Z" }, "papermill": { "duration": 13.566942, "end_time": "2022-04-07T12:57:53.376225", "exception": false, "start_time": "2022-04-07T12:57:39.809283", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting stanza\r\n", " Downloading stanza-1.3.0-py3-none-any.whl (432 kB)\r\n", " |████████████████████████████████| 432 kB 292 kB/s \r\n", "\u001b[?25hRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from stanza) (2.26.0)\r\n", "Requirement already satisfied: protobuf in /opt/conda/lib/python3.7/site-packages (from stanza) (3.19.4)\r\n", "Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from stanza) (4.62.3)\r\n", "Requirement already satisfied: torch>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from stanza) (1.9.1+cpu)\r\n", "Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from stanza) (1.20.3)\r\n", "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from stanza) (1.16.0)\r\n", "Requirement already satisfied: emoji in /opt/conda/lib/python3.7/site-packages (from stanza) (1.7.0)\r\n", "Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch>=1.3.0->stanza) (4.1.1)\r\n", "Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2.0.9)\r\n", "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2021.10.8)\r\n", "Requirement already satisfied: 
urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (1.26.7)\r\n", "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (3.1)\r\n", "Installing collected packages: stanza\r\n", "Successfully installed stanza-1.3.0\r\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n" ] } ], "source": [ " !pip install stanza" ] }, { "cell_type": "markdown", "id": "f71bca3f", "metadata": { "papermill": { "duration": 0.072074, "end_time": "2022-04-07T12:57:53.522613", "exception": false, "start_time": "2022-04-07T12:57:53.450539", "status": "completed" }, "tags": [] }, "source": [ "We can specify where to install Core NLP, but we will use the default, which is either \"\\\\$CORENLP_HOME\" if that environment variable is set, or \"\\\\$HOME/stanza_corenlp\" otherwise. 
(Ideally we'd use stanza to get this, but I couldn't easily work out how.)" ] }, { "cell_type": "code", "execution_count": 2, "id": "85b13351", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:57:53.670403Z", "iopub.status.busy": "2022-04-07T12:57:53.669467Z", "iopub.status.idle": "2022-04-07T12:58:29.230320Z", "shell.execute_reply": "2022-04-07T12:58:29.229643Z", "shell.execute_reply.started": "2022-04-07T12:49:18.684182Z" }, "papermill": { "duration": 35.633674, "end_time": "2022-04-07T12:58:29.230514", "exception": false, "start_time": "2022-04-07T12:57:53.596840", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b90172ce78ce4c519efa02f06f3c6835", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip: 0%| …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import stanza\n", "stanza.install_corenlp()" ] }, { "cell_type": "markdown", "id": "4483cf45", "metadata": { "papermill": { "duration": 0.06657, "end_time": "2022-04-07T12:58:29.364624", "exception": false, "start_time": "2022-04-07T12:58:29.298054", "status": "completed" }, "tags": [] }, "source": [ "We'll need to invoke the Stanford Core NLP JAR that we just installed, so let's find it." 
] }, { "cell_type": "code", "execution_count": 3, "id": "3b31ac2f", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:58:29.505152Z", "iopub.status.busy": "2022-04-07T12:58:29.504095Z", "iopub.status.idle": "2022-04-07T12:58:29.516285Z", "shell.execute_reply": "2022-04-07T12:58:29.515710Z", "shell.execute_reply.started": "2022-04-07T12:49:53.274276Z" }, "papermill": { "duration": 0.084307, "end_time": "2022-04-07T12:58:29.516468", "exception": false, "start_time": "2022-04-07T12:58:29.432161", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'/root/stanza_corenlp/stanford-corenlp-4.4.0.jar'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "import re\n", "from pathlib import Path\n", "\n", "\n", "# Reimplement the logic to find the path where stanza_corenlp is installed.\n", "core_nlp_path = os.getenv('CORENLP_HOME', str(Path.home() / 'stanza_corenlp'))\n", "\n", "# A heuristic to find the right jar file\n", "classpath = [str(p) for p in Path(core_nlp_path).iterdir() if re.match(r\"stanford-corenlp-[0-9.]+\\.jar\", p.name)][0]\n", "classpath" ] }, { "cell_type": "markdown", "id": "98419a70", "metadata": { "papermill": { "duration": 0.074162, "end_time": "2022-04-07T12:58:29.661879", "exception": false, "start_time": "2022-04-07T12:58:29.587717", "status": "completed" }, "tags": [] }, "source": [ "Let's test the [basic usage](https://stanfordnlp.github.io/stanza/client_usage.html).\n", "\n", "There are currently models for 8 languages, and for some fairly complex tasks like coreference resolution." 
] }, { "cell_type": "code", "execution_count": 4, "id": "5a8e1173", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:58:29.804047Z", "iopub.status.busy": "2022-04-07T12:58:29.803014Z", "iopub.status.idle": "2022-04-07T12:59:11.230446Z", "shell.execute_reply": "2022-04-07T12:59:11.229515Z", "shell.execute_reply.started": "2022-04-07T12:49:53.286134Z" }, "papermill": { "duration": 41.500822, "end_time": "2022-04-07T12:59:11.230672", "exception": false, "start_time": "2022-04-07T12:58:29.729850", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---\n", "[main] INFO CoreNLP - Server default properties:\n", "\t\t\t(Note: unspecified annotator properties are English defaults)\n", "\t\t\tannotators = tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n", "\t\t\tinputFormat = text\n", "\t\t\toutputFormat = serialized\n", "\t\t\tprettyPrint = false\n", "\t\t\tthreads = 5\n", "[main] INFO CoreNLP - Threads: 5\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n", "[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.1 sec].\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n", "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].\n", "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... 
done [0.6 sec].\n", "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].\n", "[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.\n", "[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt\n", "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.\n", "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.\n", "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files\n", "[main] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n", "[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.8 sec].\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n", "[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... 
Time elapsed: 2.1 sec\n", "[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 20000 vectors, elapsed Time: 2.204 sec\n", "[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [4.3 sec].\n", "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n", "[main] INFO edu.stanford.nlp.coref.statistical.SimpleLinearClassifier - Loading coref model edu/stanford/nlp/models/coref/statistical/ranking_model.ser.gz ... done [0.9 sec].\n", "[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency\n", "[main] INFO CoreNLP - Starting server...\n", "[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:9000\n", "[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:36852] API call w/annotators tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n", "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "David Batista wrote a blog post on NER evaluation. Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. We will test his library against Stanford Core NLP. 
\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Thread-0] INFO CoreNLP - CoreNLP Server is shutting down.\n" ] } ], "source": [ "from stanza.server import CoreNLPClient\n", "\n", "text = \"David Batista wrote a blog post on NER evaluation. \" \\\n", " \"Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. \" \\\n", " \"We will test his library against Stanford Core NLP. \"\n", "\n", "with CoreNLPClient(\n", " annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'],\n", " timeout=30000,\n", " memory='6G') as client:\n", " \n", " ann = client.annotate(text)" ] }, { "cell_type": "markdown", "id": "27657787", "metadata": { "papermill": { "duration": 0.073188, "end_time": "2022-04-07T12:59:11.379679", "exception": false, "start_time": "2022-04-07T12:59:11.306491", "status": "completed" }, "tags": [] }, "source": [ "We get 3 sentences out." ] }, { "cell_type": "code", "execution_count": 5, "id": "81ca28fd", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:11.534658Z", "iopub.status.busy": "2022-04-07T12:59:11.533863Z", "iopub.status.idle": "2022-04-07T12:59:11.538234Z", "shell.execute_reply": "2022-04-07T12:59:11.537672Z", "shell.execute_reply.started": "2022-04-07T12:50:29.701187Z" }, "papermill": { "duration": 0.083411, "end_time": "2022-04-07T12:59:11.538434", "exception": false, "start_time": "2022-04-07T12:59:11.455023", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "David Batista wrote a blog post on NER evaluation .\n", "Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks , such as NER .\n", "We will test his library against Stanford Core NLP .\n" ] } ], "source": [ "for sentence in ann.sentence:\n", " print(\" \".join([token.word for token in sentence.token]))" ] }, { "cell_type": "markdown", "id": "6a71d850", "metadata": { "papermill": { "duration": 0.075015, "end_time": 
"2022-04-07T12:59:11.688350", "exception": false, "start_time": "2022-04-07T12:59:11.613335", "status": "completed" }, "tags": [] }, "source": [ "It can even do clever things like coreference resolution; resolving that \"his library\" refers to \"Hiroki Nakayama's library\"." ] }, { "cell_type": "code", "execution_count": 6, "id": "15c591e5", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:11.844039Z", "iopub.status.busy": "2022-04-07T12:59:11.843369Z", "iopub.status.idle": "2022-04-07T12:59:11.847528Z", "shell.execute_reply": "2022-04-07T12:59:11.848178Z", "shell.execute_reply.started": "2022-04-07T12:50:29.708506Z" }, "papermill": { "duration": 0.081987, "end_time": "2022-04-07T12:59:11.848387", "exception": false, "start_time": "2022-04-07T12:59:11.766400", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['nakayama', 'his']\n" ] } ], "source": [ "for chain in ann.corefChain:\n", " print([ann.mentionsForCoref[mention.mentionID].headString for mention in chain.mention])" ] }, { "cell_type": "markdown", "id": "7b1942b0", "metadata": { "papermill": { "duration": 0.074006, "end_time": "2022-04-07T12:59:11.997611", "exception": false, "start_time": "2022-04-07T12:59:11.923605", "status": "completed" }, "tags": [] }, "source": [ "We can extract things such as lemmas, parts of speech and standard NER tags.\n", "\n", "But we want to train our own NER model to detect ingredients. First we will need to collect the data." 
] }, { "cell_type": "code", "execution_count": 7, "id": "bb5b69e7", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:12.165040Z", "iopub.status.busy": "2022-04-07T12:59:12.156660Z", "iopub.status.idle": "2022-04-07T12:59:12.184016Z", "shell.execute_reply": "2022-04-07T12:59:12.184541Z", "shell.execute_reply.started": "2022-04-07T12:50:29.721655Z" }, "papermill": { "duration": 0.11083, "end_time": "2022-04-07T12:59:12.184757", "exception": false, "start_time": "2022-04-07T12:59:12.073927", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>10</th><th>11</th><th>12</th><th>13</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>word</th><td>Hiroki</td><td>Nakayama</td><td>wrote</td><td>seqeval</td><td>to</td><td>evaluate</td><td>sequential</td><td>labelling</td><td>tasks</td><td>,</td><td>such</td><td>as</td><td>NER</td><td>.</td></tr>\n",
"<tr><th>lemma</th><td>Hiroki</td><td>Nakayama</td><td>write</td><td>seqeval</td><td>to</td><td>evaluate</td><td>sequential</td><td>labelling</td><td>task</td><td>,</td><td>such</td><td>as</td><td>ner</td><td>.</td></tr>\n",
"<tr><th>pos</th><td>NNP</td><td>NNP</td><td>VBD</td><td>NN</td><td>TO</td><td>VB</td><td>JJ</td><td>NN</td><td>NNS</td><td>,</td><td>JJ</td><td>IN</td><td>NN</td><td>.</td></tr>\n",
"<tr><th>ner</th><td>PERSON</td><td>PERSON</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td><td>O</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 \\\n", "word Hiroki Nakayama wrote seqeval to evaluate sequential labelling \n", "lemma Hiroki Nakayama write seqeval to evaluate sequential labelling \n", "pos NNP NNP VBD NN TO VB JJ NN \n", "ner PERSON PERSON O O O O O O \n", "\n", " 8 9 10 11 12 13 \n", "word tasks , such as NER . \n", "lemma task , such as ner . \n", "pos NNS , JJ IN NN . \n", "ner O O O O O O " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "tokens = ann.sentence[1].token\n", "\n", "pd.DataFrame({'word': [s.word for s in tokens],\n", " 'lemma': [s.lemma for s in tokens],\n", " 'pos': [s.pos for s in tokens],\n", " 'ner': [s.ner for s in tokens]}).T" ] }, { "cell_type": "markdown", "id": "01c281d3", "metadata": { "papermill": { "duration": 0.074855, "end_time": "2022-04-07T12:59:12.333648", "exception": false, "start_time": "2022-04-07T12:59:12.258793", "status": "completed" }, "tags": [] }, "source": [ "# Get Data\n", "\n", "Helpfully the authors provide the annotated ingredients data in the format for Stanford NER that we can download [from github](https://github.com/cosylabiiit/recipe-knowledge-mining).\n", "\n", "There are two sources of ingredients, `ar` is AllRecipes and `gk` is FOOD.com (formerly GeniusKitchen.com)." 
] }, { "cell_type": "code", "execution_count": 8, "id": "04edd65b", "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5", "execution": { "iopub.execute_input": "2022-04-07T12:59:12.498569Z", "iopub.status.busy": "2022-04-07T12:59:12.497854Z", "iopub.status.idle": "2022-04-07T12:59:14.808353Z", "shell.execute_reply": "2022-04-07T12:59:14.807269Z", "shell.execute_reply.started": "2022-04-07T12:50:29.755893Z" }, "papermill": { "duration": 2.400074, "end_time": "2022-04-07T12:59:14.808574", "exception": false, "start_time": "2022-04-07T12:59:12.408500", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "from urllib.request import urlretrieve\n", "\n", "data_sources = ['ar', 'gk']\n", "data_splits = ['train', 'test']\n", "\n", "base_url = 'https://raw.githubusercontent.com/cosylabiiit/recipe-knowledge-mining/master/'\n", "\n", "def data_filename(source, split):\n", " return f'{source}_{split}.tsv'\n", "\n", "for source in data_sources:\n", " for split in data_splits:\n", " name = data_filename(source, split)\n", " urlretrieve(base_url + name, name)" ] }, { "cell_type": "markdown", "id": "2c260e5f", "metadata": { "papermill": { "duration": 0.073279, "end_time": "2022-04-07T12:59:14.957042", "exception": false, "start_time": "2022-04-07T12:59:14.883763", "status": "completed" }, "tags": [] }, "source": [ "Each line of the file is either a single tab (separating different texts), or a token followed by a tab and then the entity type.\n", "\n", "So for example the first ingredient is `4 cloves garlic`, which is a quantity (4) followed by a unit (cloves) and a name (garlic)." 
] }, { "cell_type": "code", "execution_count": 9, "id": "20fd23c4", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:15.124352Z", "iopub.status.busy": "2022-04-07T12:59:15.122955Z", "iopub.status.idle": "2022-04-07T12:59:15.897018Z", "shell.execute_reply": "2022-04-07T12:59:15.897645Z", "shell.execute_reply.started": "2022-04-07T12:50:32.106713Z" }, "papermill": { "duration": 0.866332, "end_time": "2022-04-07T12:59:15.897874", "exception": false, "start_time": "2022-04-07T12:59:15.031542", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "^I\r\n", "4^IQUANTITY\r\n", "cloves^IUNIT\r\n", "garlic^INAME\r\n", "^I\r\n", "2^IQUANTITY\r\n", "tablespoons^IUNIT\r\n", "vegetable^INAME\r\n", "oil^INAME\r\n", ",^IO\r\n" ] } ], "source": [ "!head {data_filename('ar', 'train')} | cat -t" ] }, { "cell_type": "markdown", "id": "eb5b9b7d", "metadata": { "papermill": { "duration": 0.077612, "end_time": "2022-04-07T12:59:16.051180", "exception": false, "start_time": "2022-04-07T12:59:15.973568", "status": "completed" }, "tags": [] }, "source": [ "We can read this in to Python, converting it to a list of annotated sentences, which is just a sequence of token, label pairs." 
] }, { "cell_type": "code", "execution_count": 10, "id": "e3682be8", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:16.206078Z", "iopub.status.busy": "2022-04-07T12:59:16.204972Z", "iopub.status.idle": "2022-04-07T12:59:16.214225Z", "shell.execute_reply": "2022-04-07T12:59:16.214726Z", "shell.execute_reply.started": "2022-04-07T12:50:32.911852Z" }, "papermill": { "duration": 0.089243, "end_time": "2022-04-07T12:59:16.214969", "exception": false, "start_time": "2022-04-07T12:59:16.125726", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "from typing import List, Tuple, Generator\n", "\n", "Annotation = Tuple[str, str]\n", "AnnotatedSentence = List[Annotation]\n", "\n", "def segment_texts(data: str) -> Generator[AnnotatedSentence, None, None]:\n", " output = []\n", " for line in data.split('\\n'):\n", " if line.strip():\n", " text, token = line.split('\\t')\n", " output.append((text.strip(), token.strip()))\n", " elif output:\n", " yield output\n", " output = []\n", " \n", "def segment_file(filename: str) -> List[AnnotatedSentence]:\n", " with open(filename, 'rt') as f:\n", " return list(segment_texts(f.read()))" ] }, { "cell_type": "code", "execution_count": 11, "id": "411d8e65", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:16.373155Z", "iopub.status.busy": "2022-04-07T12:59:16.371969Z", "iopub.status.idle": "2022-04-07T12:59:16.388214Z", "shell.execute_reply": "2022-04-07T12:59:16.388796Z", "shell.execute_reply.started": "2022-04-07T12:50:32.921731Z" }, "papermill": { "duration": 0.0992, "end_time": "2022-04-07T12:59:16.389053", "exception": false, "start_time": "2022-04-07T12:59:16.289853", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "ar_train = segment_file(data_filename('ar', 'train'))" ] }, { "cell_type": "code", "execution_count": 12, "id": "9f048122", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:16.545563Z", "iopub.status.busy": 
"2022-04-07T12:59:16.544455Z", "iopub.status.idle": "2022-04-07T12:59:16.551302Z", "shell.execute_reply": "2022-04-07T12:59:16.551951Z", "shell.execute_reply.started": "2022-04-07T12:50:32.947480Z" }, "papermill": { "duration": 0.087288, "end_time": "2022-04-07T12:59:16.552158", "exception": false, "start_time": "2022-04-07T12:59:16.464870", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[[('4', 'QUANTITY'), ('cloves', 'UNIT'), ('garlic', 'NAME')],\n", " [('2', 'QUANTITY'),\n", " ('tablespoons', 'UNIT'),\n", " ('vegetable', 'NAME'),\n", " ('oil', 'NAME'),\n", " (',', 'O'),\n", " ('divided', 'STATE')]]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ar_train[:2]" ] }, { "cell_type": "markdown", "id": "d2dcacf1", "metadata": { "papermill": { "duration": 0.07699, "end_time": "2022-04-07T12:59:16.705265", "exception": false, "start_time": "2022-04-07T12:59:16.628275", "status": "completed" }, "tags": [] }, "source": [ "We can then calculate the number of sentences in the training set for a source." 
] }, { "cell_type": "code", "execution_count": 13, "id": "170a4c27", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:16.860723Z", "iopub.status.busy": "2022-04-07T12:59:16.859821Z", "iopub.status.idle": "2022-04-07T12:59:16.863539Z", "shell.execute_reply": "2022-04-07T12:59:16.864041Z", "shell.execute_reply.started": "2022-04-07T12:50:32.954556Z" }, "papermill": { "duration": 0.084373, "end_time": "2022-04-07T12:59:16.864216", "exception": false, "start_time": "2022-04-07T12:59:16.779843", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "1470" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ar_train)" ] }, { "cell_type": "markdown", "id": "76312b92", "metadata": { "papermill": { "duration": 0.076091, "end_time": "2022-04-07T12:59:17.014758", "exception": false, "start_time": "2022-04-07T12:59:16.938667", "status": "completed" }, "tags": [] }, "source": [ "We can use this to check the types of entities annotated, as in the paper (DF is Dried/Fresh)." 
] }, { "cell_type": "code", "execution_count": 14, "id": "1142681f", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:17.175245Z", "iopub.status.busy": "2022-04-07T12:59:17.174209Z", "iopub.status.idle": "2022-04-07T12:59:17.178876Z", "shell.execute_reply": "2022-04-07T12:59:17.178340Z", "shell.execute_reply.started": "2022-04-07T12:50:32.968592Z" }, "papermill": { "duration": 0.089217, "end_time": "2022-04-07T12:59:17.179088", "exception": false, "start_time": "2022-04-07T12:59:17.089871", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "Counter({'QUANTITY': 1583,\n", " 'UNIT': 1338,\n", " 'NAME': 2501,\n", " 'O': 1662,\n", " 'STATE': 879,\n", " 'DF': 154,\n", " 'SIZE': 64,\n", " 'TEMP': 31})" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from collections import Counter\n", "\n", "tag_counts = Counter([annotation[1] for sentence in ar_train for annotation in sentence])\n", "tag_counts" ] }, { "cell_type": "markdown", "id": "677e4069", "metadata": { "papermill": { "duration": 0.073955, "end_time": "2022-04-07T12:59:17.327416", "exception": false, "start_time": "2022-04-07T12:59:17.253461", "status": "completed" }, "tags": [] }, "source": [ "# Train NER Model\n", "\n", "Now we want to train a Stanford NER model on the new annotations.\n", "\n", "First we have to configure it, but the paper gives no information on how the authors configured theirs.\n", "I've copied this template configuration out of the [FAQ](https://nlp.stanford.edu/software/crf-faq.html).\n", "For more information on the parameters you can check the [NERFeatureFactory documentation](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html) or the [source](https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/ie/NERFeatureFactory.java)." 
] }, { "cell_type": "code", "execution_count": 15, "id": "59a6e2c0", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:17.491773Z", "iopub.status.busy": "2022-04-07T12:59:17.490773Z", "iopub.status.idle": "2022-04-07T12:59:17.497225Z", "shell.execute_reply": "2022-04-07T12:59:17.497813Z", "shell.execute_reply.started": "2022-04-07T12:50:32.983052Z" }, "papermill": { "duration": 0.092886, "end_time": "2022-04-07T12:59:17.498022", "exception": false, "start_time": "2022-04-07T12:59:17.405136", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "def ner_prop_str(train_files: List[str], test_files: List[str], output: str) -> str:\n", " \"\"\"Returns configuration string to train NER model\"\"\"\n", " train_file_str = ','.join(train_files)\n", " test_file_str = ','.join(test_files)\n", " return f\"\"\"\n", "trainFileList = {train_file_str}\n", "testFiles = {test_file_str}\n", "serializeTo = {output}\n", "map = word=0,answer=1\n", "\n", "useClassFeature=true\n", "useWord=true\n", "useNGrams=true\n", "noMidNGrams=true\n", "maxNGramLeng=6\n", "usePrev=true\n", "useNext=true\n", "useSequences=true\n", "usePrevSequences=true\n", "maxLeft=1\n", "useTypeSeqs=true\n", "useTypeSeqs2=true\n", "useTypeySequences=true\n", "wordShape=chris2useLC\n", "useDisjunctive=true\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "207441dc", "metadata": { "papermill": { "duration": 0.075391, "end_time": "2022-04-07T12:59:17.650533", "exception": false, "start_time": "2022-04-07T12:59:17.575142", "status": "completed" }, "tags": [] }, "source": [ "This is expected to be a file, so let's write a helper that writes it to a file. (An alternative would be to pass these as arguments to the trainer)." 
] }, { "cell_type": "code", "execution_count": 16, "id": "d9e7eb79", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:17.812064Z", "iopub.status.busy": "2022-04-07T12:59:17.810962Z", "iopub.status.idle": "2022-04-07T12:59:17.816510Z", "shell.execute_reply": "2022-04-07T12:59:17.817089Z", "shell.execute_reply.started": "2022-04-07T12:50:32.994122Z" }, "papermill": { "duration": 0.087414, "end_time": "2022-04-07T12:59:17.817303", "exception": false, "start_time": "2022-04-07T12:59:17.729889", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "def write_ner_prop_file(ner_prop_file: str, train_files: List[str], test_files: List[str], output_file: str) -> None:\n", " with open(ner_prop_file, 'wt') as f:\n", " props = ner_prop_str(train_files, test_files, output_file)\n", " f.write(props)" ] }, { "cell_type": "markdown", "id": "daec3df3", "metadata": { "papermill": { "duration": 0.075681, "end_time": "2022-04-07T12:59:17.970722", "exception": false, "start_time": "2022-04-07T12:59:17.895041", "status": "completed" }, "tags": [] }, "source": [ "Stanza doesn't give an interface to train a CRF NER model using Stanford NLP, but we can invoke `edu.stanford.nlp.ie.crf.CRFClassifier` directly.\n", "\n", "Let's write a properties file and invoke Java to run the classifier.\n", "It prints a lot of training information, and importantly a summary report at the end which we want to see." 
] }, { "cell_type": "code", "execution_count": 17, "id": "6d0cb59c", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:18.128663Z", "iopub.status.busy": "2022-04-07T12:59:18.127603Z", "iopub.status.idle": "2022-04-07T12:59:18.136205Z", "shell.execute_reply": "2022-04-07T12:59:18.136742Z", "shell.execute_reply.started": "2022-04-07T12:50:33.006937Z" }, "papermill": { "duration": 0.089125, "end_time": "2022-04-07T12:59:18.136964", "exception": false, "start_time": "2022-04-07T12:59:18.047839", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import subprocess\n", "from typing import List\n", "\n", "def train_model(model_name, train_files: List[str], test_files: List[str], print_report=True, classpath=classpath) -> str:\n", " \"\"\"Trains CRF NER Model using StanfordNLP\"\"\"\n", " model_file = f'{model_name}.model.ser.gz'\n", " ner_prop_filename = f'{model_name}.model.props'\n", " write_ner_prop_file(ner_prop_filename, train_files, test_files, model_file)\n", " \n", " result = subprocess.run(\n", " ['java',\n", " '-Xmx2g',\n", " '-cp', classpath,\n", " 'edu.stanford.nlp.ie.crf.CRFClassifier',\n", " '-prop', ner_prop_filename],\n", " capture_output=True)\n", " \n", " # If there's an error with invocation better log the stacktrace\n", " if result.returncode != 0:\n", " print(result.stderr.decode('utf-8'))\n", " result.check_returncode()\n", " \n", " if print_report:\n", " print(*result.stderr.decode('utf-8').split('\\n')[-11:], sep='\\n')\n", " \n", " return model_file" ] }, { "cell_type": "markdown", "id": "9936f352", "metadata": { "papermill": { "duration": 0.074215, "end_time": "2022-04-07T12:59:18.286972", "exception": false, "start_time": "2022-04-07T12:59:18.212757", "status": "completed" }, "tags": [] }, "source": [ "We can train models on each dataset separately, and all together.\n", "For evaluation we'll use the corresponding test set.\n", "\n", "This only takes a few minutes." 
] }, { "cell_type": "code", "execution_count": 18, "id": "8232dd04", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T12:59:18.443591Z", "iopub.status.busy": "2022-04-07T12:59:18.442776Z", "iopub.status.idle": "2022-04-07T13:01:35.835973Z", "shell.execute_reply": "2022-04-07T13:01:35.836649Z", "shell.execute_reply.started": "2022-04-07T12:50:33.017883Z" }, "papermill": { "duration": 137.47581, "end_time": "2022-04-07T13:01:35.836960", "exception": false, "start_time": "2022-04-07T12:59:18.361150", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ar\n", "CRFClassifier tagged 2788 words in 483 documents at 7185.57 words per second.\n", " Entity\tP\tR\tF1\tTP\tFP\tFN\n", " DF\t1.0000\t0.9608\t0.9800\t49\t0\t2\n", " NAME\t0.9297\t0.9279\t0.9288\t463\t35\t36\n", " QUANTITY\t1.0000\t0.9962\t0.9981\t522\t0\t2\n", " SIZE\t1.0000\t1.0000\t1.0000\t20\t0\t0\n", " STATE\t0.9601\t0.9633\t0.9617\t289\t12\t11\n", " TEMP\t0.8750\t0.7000\t0.7778\t7\t1\t3\n", " UNIT\t0.9819\t0.9841\t0.9830\t434\t8\t7\n", " Totals\t0.9696\t0.9669\t0.9682\t1784\t56\t61\n", "\n", "\n", "gk\n", "CRFClassifier tagged 9886 words in 1705 documents at 11727.16 words per second.\n", " Entity\tP\tR\tF1\tTP\tFP\tFN\n", " DF\t0.9718\t0.9517\t0.9617\t138\t4\t7\n", " NAME\t0.9132\t0.9021\t0.9076\t1621\t154\t176\n", " QUANTITY\t0.9882\t0.9870\t0.9876\t1598\t19\t21\n", " SIZE\t0.9750\t0.9398\t0.9571\t78\t2\t5\n", " STATE\t0.9255\t0.9503\t0.9377\t708\t57\t37\n", " TEMP\t0.8125\t0.8125\t0.8125\t26\t6\t6\n", " UNIT\t0.9810\t0.9721\t0.9766\t1291\t25\t37\n", " Totals\t0.9534\t0.9497\t0.9516\t5460\t267\t289\n", "\n", "\n", "ar_gk\n", "CRFClassifier tagged 12674 words in 2188 documents at 11648.90 words per second.\n", " Entity\tP\tR\tF1\tTP\tFP\tFN\n", " DF\t0.9738\t0.9490\t0.9612\t186\t5\t10\n", " NAME\t0.9136\t0.9077\t0.9106\t2084\t197\t212\n", " QUANTITY\t0.9911\t0.9897\t0.9904\t2121\t19\t22\n", " SIZE\t0.9798\t0.9417\t0.9604\t97\t2\t6\n", " 
STATE\t0.9386\t0.9512\t0.9449\t994\t65\t51\n", " TEMP\t0.8140\t0.8333\t0.8235\t35\t8\t7\n", " UNIT\t0.9801\t0.9763\t0.9782\t1727\t35\t42\n", " Totals\t0.9563\t0.9539\t0.9551\t7244\t331\t350\n", "\n", "\n", "CPU times: user 276 ms, sys: 134 ms, total: 410 ms\n", "Wall time: 2min 17s\n" ] } ], "source": [ "%%time\n", "\n", "models = {}\n", "for source in ['ar', 'gk', 'ar_gk']:\n", " print(source)\n", " train_files = [data_filename(s, 'train') for s in source.split('_')]\n", " test_files = [data_filename(s, 'test') for s in source.split('_')]\n", " models[source] = train_model(source, train_files, test_files)\n", " print()" ] }, { "cell_type": "markdown", "id": "59f6f0c9", "metadata": { "papermill": { "duration": 0.077949, "end_time": "2022-04-07T13:01:35.992806", "exception": false, "start_time": "2022-04-07T13:01:35.914857", "status": "completed" }, "tags": [] }, "source": [ "The summary report shows for each model and entity type:\n", "\n", "* True Positives (TP): The number of times that entity was in the text and predicted correctly\n", "* False Positives (FP): The number of times that entity was predicted but was not in the text\n", "* False Negatives (FN): The number of times that entity was in the text but not predicted\n", "* Precision (P): Probability a predicted entity is correct, TP/(TP+FP)\n", "* Recall (R): Probability a correct entity is predicted, TP/(TP+FN)\n", "* F1 Score (F1): Harmonic mean of precision and recall, 2/(1/P + 1/R).\n", "\n", "We can compare the F1 Totals to the diagonal of Table IV in the paper:\n", "\n", "* AllRecipes.com (ar): We get 0.9682, they report 0.9682\n", "* FOOD.com (gk): We get 0.9516, they report 0.9519\n", "* Both (ar_gk): We get 0.9551, they report 0.9611\n", "\n", "These are super close.\n", "The furthest is `ar_gk`; the repository has a separate `ar_gk_train.tsv`, and it would be interesting to check whether training on it directly gives a closer result, and if so why there is a difference." 
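, "\n", "\n", "As a quick sanity check on the definitions of P, R and F1, here is a minimal sketch in plain Python that recomputes them from the TP, FP and FN counts in the `ar` Totals row of the report. The helper name `prf_from_counts` is made up for illustration; it is not part of Stanford NER or seqeval, and it assumes at least one predicted and one true entity so the divisions are defined.\n", "\n", "
```python
def prf_from_counts(tp, fp, fn):
    '''Return (precision, recall, f1) from entity-level counts.'''
    precision = tp / (tp + fp)  # share of predicted entities that are correct
    recall = tp / (tp + fn)     # share of true entities that were found
    f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts copied from the `ar` Totals row: TP=1784, FP=56, FN=61
p, r, f1 = prf_from_counts(1784, 56, 61)
print(f'P={p:.4f} R={r:.4f} F1={f1:.4f}')  # P=0.9696 R=0.9669 F1=0.9682
```
"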
] }, { "cell_type": "markdown", "id": "54563997", "metadata": { "papermill": { "duration": 0.0776, "end_time": "2022-04-07T13:01:36.149703", "exception": false, "start_time": "2022-04-07T13:01:36.072103", "status": "completed" }, "tags": [] }, "source": [ "# Running the model in Python" ] }, { "cell_type": "markdown", "id": "fb2def70", "metadata": { "papermill": { "duration": 0.076927, "end_time": "2022-04-07T13:01:36.304500", "exception": false, "start_time": "2022-04-07T13:01:36.227573", "status": "completed" }, "tags": [] }, "source": [ "We can now use these trained models in Python by invoking Stanford NLP with Stanza.\n", "\n", "First we'll load in the test data." ] }, { "cell_type": "code", "execution_count": 19, "id": "0564ac40", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:36.466360Z", "iopub.status.busy": "2022-04-07T13:01:36.465286Z", "iopub.status.idle": "2022-04-07T13:01:36.487026Z", "shell.execute_reply": "2022-04-07T13:01:36.487684Z", "shell.execute_reply.started": "2022-04-07T12:52:33.237540Z" }, "papermill": { "duration": 0.105083, "end_time": "2022-04-07T13:01:36.487888", "exception": false, "start_time": "2022-04-07T13:01:36.382805", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ar 483\n", "gk 1705\n" ] } ], "source": [ "test_data = {}\n", "\n", "for source in data_sources:\n", " test_data[source] = segment_file(data_filename(source, 'test'))\n", " print(source, len(test_data[source]))" ] }, { "cell_type": "markdown", "id": "3e03b03d", "metadata": { "execution": { "iopub.execute_input": "2022-04-06T23:58:43.359429Z", "iopub.status.busy": "2022-04-06T23:58:43.359055Z", "iopub.status.idle": "2022-04-06T23:58:43.365707Z", "shell.execute_reply": "2022-04-06T23:58:43.36474Z", "shell.execute_reply.started": "2022-04-06T23:58:43.35939Z" }, "papermill": { "duration": 0.078031, "end_time": "2022-04-07T13:01:36.643468", "exception": false, "start_time": 
"2022-04-07T13:01:36.565437", "status": "completed" }, "tags": [] }, "source": [ "We can call StanfordNLP with our custom model by passing the property `ner.model`.\n", "\n", "Our test data is already tokenized in a different way to StanfordNLP, so we'll add an option to the [Tokenizer](https://stanfordnlp.github.io/CoreNLP/tokenize.html) to use whitespace tokenization which is easy to invert.\n", "\n", "It takes a while to start up the server so we want to annotate a large number of texts at once." ] }, { "cell_type": "code", "execution_count": 20, "id": "30730c9e", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:36.807245Z", "iopub.status.busy": "2022-04-07T13:01:36.806406Z", "iopub.status.idle": "2022-04-07T13:01:36.813188Z", "shell.execute_reply": "2022-04-07T13:01:36.813766Z", "shell.execute_reply.started": "2022-04-07T12:52:33.259904Z" }, "papermill": { "duration": 0.09251, "end_time": "2022-04-07T13:01:36.813991", "exception": false, "start_time": "2022-04-07T13:01:36.721481", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "from tqdm.notebook import tqdm\n", "from stanza.server import CoreNLPClient\n", "\n", "def annotate_ner(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True):\n", " properties = {\"ner.model\": ner_model_file, \"tokenize.whitespace\": tokenize_whitespace, \"ner.applyNumericClassifiers\": False}\n", " \n", " annotated = []\n", " with CoreNLPClient(\n", " annotators=['tokenize','ssplit','ner'],\n", " properties=properties,\n", " timeout=30000,\n", " be_quiet=True,\n", " memory='6G') as client:\n", " \n", " for text in tqdm(texts):\n", " annotated.append(client.annotate(text))\n", " return annotated" ] }, { "cell_type": "markdown", "id": "392be996", "metadata": { "papermill": { "duration": 0.077305, "end_time": "2022-04-07T13:01:36.971129", "exception": false, "start_time": "2022-04-07T13:01:36.893824", "status": "completed" }, "tags": [] }, "source": [ "We can then get the 
annotations" ] }, { "cell_type": "code", "execution_count": 21, "id": "b2b52eb9", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:37.136508Z", "iopub.status.busy": "2022-04-07T13:01:37.135776Z", "iopub.status.idle": "2022-04-07T13:01:51.580370Z", "shell.execute_reply": "2022-04-07T13:01:51.579773Z", "shell.execute_reply.started": "2022-04-07T12:52:33.268568Z" }, "papermill": { "duration": 14.527687, "end_time": "2022-04-07T13:01:51.580543", "exception": false, "start_time": "2022-04-07T13:01:37.052856", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9c0da7f9ecde470a8ad513bd82b4e638", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00 NERData:\n", " tokens = [token for sentence in annotation.sentence for token in sentence.token]\n", " return NERData(tokens=[t.word for t in tokens], ner=[t.coarseNER for t in tokens])" ] }, { "cell_type": "markdown", "id": "41ef7ddd", "metadata": { "papermill": { "duration": 0.079568, "end_time": "2022-04-07T13:01:52.761448", "exception": false, "start_time": "2022-04-07T13:01:52.681880", "status": "completed" }, "tags": [] }, "source": [ "A relatively simple ingredient works well" ] }, { "cell_type": "code", "execution_count": 25, "id": "9d4dd5cb", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:52.923565Z", "iopub.status.busy": "2022-04-07T13:01:52.922824Z", "iopub.status.idle": "2022-04-07T13:01:52.934377Z", "shell.execute_reply": "2022-04-07T13:01:52.934830Z", "shell.execute_reply.started": "2022-04-07T12:52:46.431673Z" }, "papermill": { "duration": 0.094408, "end_time": "2022-04-07T13:01:52.935033", "exception": false, "start_time": "2022-04-07T13:01:52.840625", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
               0     1   2       3     4
ner     QUANTITY  UNIT   O    TEMP  NAME
tokens         1   cup  of  frozen  peas
\n", "
" ], "text/plain": [ "NERData(ner=['QUANTITY', 'UNIT', 'O', 'TEMP', 'NAME'], tokens=['1', 'cup', 'of', 'frozen', 'peas'])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_ner_data(annotations[0])" ] }, { "cell_type": "markdown", "id": "f68d4ab6", "metadata": { "papermill": { "duration": 0.080341, "end_time": "2022-04-07T13:01:53.095739", "exception": false, "start_time": "2022-04-07T13:01:53.015398", "status": "completed" }, "tags": [] }, "source": [ "A more complex sentence does quite badly, perhaps because this kind of thing wasn't seen." ] }, { "cell_type": "code", "execution_count": 26, "id": "84662f1c", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:53.259562Z", "iopub.status.busy": "2022-04-07T13:01:53.258809Z", "iopub.status.idle": "2022-04-07T13:01:53.269051Z", "shell.execute_reply": "2022-04-07T13:01:53.269619Z", "shell.execute_reply.started": "2022-04-07T12:52:46.452987Z" }, "papermill": { "duration": 0.094219, "end_time": "2022-04-07T13:01:53.269840", "exception": false, "start_time": "2022-04-07T13:01:53.175621", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
               0     1     2     3     4     5   6      7
ner     QUANTITY  UNIT  NAME  NAME  NAME  NAME   O      O
tokens         A  dash    of  salt     .    Or  to  taste
\n", "
" ], "text/plain": [ "NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'NAME', 'NAME', 'NAME', 'O', 'O'], tokens=['A', 'dash', 'of', 'salt', '.', 'Or', 'to', 'taste'])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_ner_data(annotations[1])" ] }, { "cell_type": "code", "execution_count": 27, "id": "e85ccce9", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:53.436990Z", "iopub.status.busy": "2022-04-07T13:01:53.436294Z", "iopub.status.idle": "2022-04-07T13:01:53.445951Z", "shell.execute_reply": "2022-04-07T13:01:53.446566Z", "shell.execute_reply.started": "2022-04-07T12:52:46.468618Z" }, "papermill": { "duration": 0.094843, "end_time": "2022-04-07T13:01:53.446783", "exception": false, "start_time": "2022-04-07T13:01:53.351940", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
               0       1         2      3        4         5      6      7      8
ner     QUANTITY    UNIT      NAME      O        O         O      O      O      O
tokens        12  slices  pancetta  -LRB-  Italian  unsmoked  cured  bacon  -RRB-
\n", "
" ], "text/plain": [ "NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'O', 'O', 'O', 'O', 'O', 'O'], tokens=['12', 'slices', 'pancetta', '-LRB-', 'Italian', 'unsmoked', 'cured', 'bacon', '-RRB-'])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "extract_ner_data(annotations[2])" ] }, { "cell_type": "markdown", "id": "d067bdb5", "metadata": { "papermill": { "duration": 0.080591, "end_time": "2022-04-07T13:01:53.610955", "exception": false, "start_time": "2022-04-07T13:01:53.530364", "status": "completed" }, "tags": [] }, "source": [ "We can chain these functions together to get from text to NER" ] }, { "cell_type": "code", "execution_count": 28, "id": "7e1f8e27", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:53.777728Z", "iopub.status.busy": "2022-04-07T13:01:53.777006Z", "iopub.status.idle": "2022-04-07T13:01:53.782184Z", "shell.execute_reply": "2022-04-07T13:01:53.782741Z", "shell.execute_reply.started": "2022-04-07T12:52:46.490584Z" }, "papermill": { "duration": 0.091584, "end_time": "2022-04-07T13:01:53.782964", "exception": false, "start_time": "2022-04-07T13:01:53.691380", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "from typing import Dict\n", "\n", "def ner_extract(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True) -> List[Dict[str, List[str]]]:\n", " annotations = annotate_ner(ner_model_file, texts, tokenize_whitespace)\n", " return [extract_ner_data(ann) for ann in annotations]" ] }, { "cell_type": "markdown", "id": "92d193f3", "metadata": { "papermill": { "duration": 0.081116, "end_time": "2022-04-07T13:01:53.944944", "exception": false, "start_time": "2022-04-07T13:01:53.863828", "status": "completed" }, "tags": [] }, "source": [ "And then for each model, and test data we can calculate the predictions." 
] }, { "cell_type": "code", "execution_count": 29, "id": "2c32ef0a", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:01:54.111985Z", "iopub.status.busy": "2022-04-07T13:01:54.111261Z", "iopub.status.idle": "2022-04-07T13:06:33.038766Z", "shell.execute_reply": "2022-04-07T13:06:33.039358Z", "shell.execute_reply.started": "2022-04-07T12:52:46.497380Z" }, "papermill": { "duration": 279.012135, "end_time": "2022-04-07T13:06:33.039789", "exception": false, "start_time": "2022-04-07T13:01:54.027654", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "26e07430d6f54a9bbd9acea509197ce1", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/483 [00:00=1.14.0 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.20.3)\r\n", "Requirement already satisfied: scikit-learn>=0.21.3 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.0.1)\r\n", "Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.1.0)\r\n", "Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (3.0.0)\r\n", "Requirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.7.3)\r\n", "Building wheels for collected packages: seqeval\r\n", " Building wheel for seqeval (setup.py) ... 
\u001b[?25l-\b \b\\\b \b|\b \bdone\r\n", "\u001b[?25h Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16181 sha256=117220ab957b2dfbf6fad8b7cf7fb429b409f1fb1b62fef7ea14d20e38b36203\r\n", " Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7\r\n", "Successfully built seqeval\r\n", "Installing collected packages: seqeval\r\n", "Successfully installed seqeval-1.2.2\r\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n" ] } ], "source": [ "!pip install seqeval" ] }, { "cell_type": "markdown", "id": "442daabb", "metadata": { "papermill": { "duration": 0.088059, "end_time": "2022-04-07T13:06:50.475096", "exception": false, "start_time": "2022-04-07T13:06:50.387037", "status": "completed" }, "tags": [] }, "source": [ "Seqeval expects the data to be in one of the following formats:\n", "\n", "* IOB1\n", "* IOB2\n", "* IOE1\n", "* IOE2\n", "* IOBES (only in strict mode)\n", "* BILOU (only in strict mode)\n", "\n", "The differences between them only matter when distinguishing distinct entities that are adjacent, which is quite rare in practice.\n", "See Wikipedia for a detailed explanation of [IOB (inside-outside-beginning)](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)).\n", "\n", "In this case it's assumed that adjacent tokens with the same tag form a single entity (which can be wrong when multiple names are listed in a single ingredient).\n", "We can easily convert it to IOB1 using this assumption by prefixing every tag other than 'O' with an 'I-'." 
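, "\n", "\n", "To see what this assumption implies, here is a minimal sketch in plain Python that groups a run of identical non-`O` labels into one entity span. The helper `entity_spans` is hypothetical and written only for illustration; it is not how seqeval itself is implemented. Note how two adjacent `NAME` tokens can only ever come out as a single entity, which is exactly the case where a `B-` tag would be needed.\n", "\n", "
```python
def entity_spans(labels):
    '''Group consecutive identical non-O labels into (label, start, end) spans.'''
    spans = []
    start = None  # index where the currently open entity began, if any
    for i, label in enumerate(labels + ['O']):  # trailing 'O' closes a final run
        if start is not None and label != labels[start]:
            spans.append((labels[start], start, i - 1))
            start = None
        if start is None and label != 'O':
            start = i
    return spans


print(entity_spans(['QUANTITY', 'UNIT', 'NAME', 'NAME', 'O', 'STATE']))
# [('QUANTITY', 0, 0), ('UNIT', 1, 1), ('NAME', 2, 3), ('STATE', 5, 5)]
```
"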
] }, { "cell_type": "code", "execution_count": 32, "id": "816f82e2", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:50.659609Z", "iopub.status.busy": "2022-04-07T13:06:50.658619Z", "iopub.status.idle": "2022-04-07T13:06:50.661057Z", "shell.execute_reply": "2022-04-07T13:06:50.660405Z", "shell.execute_reply.started": "2022-04-07T12:56:44.754303Z" }, "papermill": { "duration": 0.098039, "end_time": "2022-04-07T13:06:50.661208", "exception": false, "start_time": "2022-04-07T13:06:50.563169", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "def convert_to_iob1(tokens):\n", " return ['I-' + label if label != 'O' else 'O' for label in tokens]\n", "\n", "assert convert_to_iob1(['QUANTITY', 'SIZE', 'NAME', 'NAME', 'O', 'STATE']) == ['I-QUANTITY', 'I-SIZE', 'I-NAME', 'I-NAME', 'O', 'I-STATE']" ] }, { "cell_type": "markdown", "id": "2bd57cb3", "metadata": { "papermill": { "duration": 0.088945, "end_time": "2022-04-07T13:06:50.837958", "exception": false, "start_time": "2022-04-07T13:06:50.749013", "status": "completed" }, "tags": [] }, "source": [ "Let's check the classification report for a single model and test set and compare it to the report from StanfordNER.\n", "\n", "The classification report doesn't have the TP, FP and FN, but instead has the support - the number of true entities in the data.\n", "The two sets of numbers are equivalent:\n", "\n", "* support = TP + FN\n", "* TP = R * support\n", "* FP = TP * (1/P - 1)\n", "* FN = support - TP\n", "\n", "The results are the same." 
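, "\n", "\n", "The conversion between the two report styles can be sketched in a few lines of plain Python. The helper `counts_from_report` is a made-up name for illustration; it rounds to whole entities because the reports only show P and R to four decimal places, and it assumes a non-zero precision. The inputs are the `NAME` row for the `ar` model (P=0.9297, R=0.9279, support=499).\n", "\n", "
```python
def counts_from_report(precision, recall, support):
    '''Recover (TP, FP, FN) from one precision/recall/support row.'''
    tp = round(recall * support)          # TP = R * support
    fp = round(tp * (1 / precision - 1))  # FP = TP * (1/P - 1)
    fn = support - tp                     # FN = support - TP
    return tp, fp, fn


# NAME row for the `ar` model
print(counts_from_report(0.9297, 0.9279, 499))  # (463, 35, 36)
```
"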
] }, { "cell_type": "code", "execution_count": 33, "id": "9bf3fab1", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:51.024498Z", "iopub.status.busy": "2022-04-07T13:06:51.023501Z", "iopub.status.idle": "2022-04-07T13:06:52.408718Z", "shell.execute_reply": "2022-04-07T13:06:52.407592Z", "shell.execute_reply.started": "2022-04-07T12:56:44.760756Z" }, "papermill": { "duration": 1.482137, "end_time": "2022-04-07T13:06:52.408950", "exception": false, "start_time": "2022-04-07T13:06:50.926813", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " DF 1.0000 0.9608 0.9800 51\n", " NAME 0.9297 0.9279 0.9288 499\n", " QUANTITY 1.0000 0.9962 0.9981 524\n", " SIZE 1.0000 1.0000 1.0000 20\n", " STATE 0.9601 0.9633 0.9617 300\n", " TEMP 0.8750 0.7000 0.7778 10\n", " UNIT 0.9819 0.9841 0.9830 441\n", "\n", " micro avg 0.9696 0.9669 0.9682 1845\n", " macro avg 0.9638 0.9332 0.9471 1845\n", "weighted avg 0.9695 0.9669 0.9682 1845\n", "\n" ] } ], "source": [ "from seqeval.metrics import classification_report\n", "\n", "test_source = 'ar'\n", "model = 'ar'\n", "\n", "actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in test_data[test_source]]\n", "pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n", "\n", "print(classification_report(actual_ner, pred_ner, digits=4))" ] }, { "cell_type": "markdown", "id": "58e08983", "metadata": { "papermill": { "duration": 0.08957, "end_time": "2022-04-07T13:06:52.587948", "exception": false, "start_time": "2022-04-07T13:06:52.498378", "status": "completed" }, "tags": [] }, "source": [ "We can get the micro f1-score directly." 
] }, { "cell_type": "code", "execution_count": 34, "id": "c7f7bfd0", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:52.771575Z", "iopub.status.busy": "2022-04-07T13:06:52.770572Z", "iopub.status.idle": "2022-04-07T13:06:52.797700Z", "shell.execute_reply": "2022-04-07T13:06:52.798273Z", "shell.execute_reply.started": "2022-04-07T12:56:45.725087Z" }, "papermill": { "duration": 0.120808, "end_time": "2022-04-07T13:06:52.798476", "exception": false, "start_time": "2022-04-07T13:06:52.677668", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'0.9682'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from seqeval.metrics import f1_score\n", "'%0.4f' % f1_score(actual_ner, pred_ner)" ] }, { "cell_type": "markdown", "id": "0ca961cf", "metadata": { "papermill": { "duration": 0.090745, "end_time": "2022-04-07T13:06:52.978636", "exception": false, "start_time": "2022-04-07T13:06:52.887891", "status": "completed" }, "tags": [] }, "source": [ "We can then try to reproduce Table IV by computing the f1-score for each model and data." 
] }, { "cell_type": "code", "execution_count": 35, "id": "97fe7bab", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:53.162034Z", "iopub.status.busy": "2022-04-07T13:06:53.161354Z", "iopub.status.idle": "2022-04-07T13:06:53.462721Z", "shell.execute_reply": "2022-04-07T13:06:53.462124Z", "shell.execute_reply.started": "2022-04-07T12:56:45.748363Z" }, "papermill": { "duration": 0.395147, "end_time": "2022-04-07T13:06:53.462918", "exception": false, "start_time": "2022-04-07T13:06:53.067771", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "scores = {model: {} for model in models}\n", "for test_source, data in test_data.items():\n", " actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in data]\n", " for model in models:\n", " pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n", " scores[model][test_source] = f1_score(actual_ner, pred_ner)" ] }, { "cell_type": "markdown", "id": "02ace148", "metadata": { "papermill": { "duration": 0.093906, "end_time": "2022-04-07T13:06:53.649107", "exception": false, "start_time": "2022-04-07T13:06:53.555201", "status": "completed" }, "tags": [] }, "source": [ "We also need to calculate the scores on the combined test set, by concatenating the individual test sets." 
] }, { "cell_type": "code", "execution_count": 36, "id": "d19d3eb8", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:53.834560Z", "iopub.status.busy": "2022-04-07T13:06:53.833800Z", "iopub.status.idle": "2022-04-07T13:06:54.131037Z", "shell.execute_reply": "2022-04-07T13:06:54.131536Z", "shell.execute_reply.started": "2022-04-07T12:56:45.944156Z" }, "papermill": { "duration": 0.392623, "end_time": "2022-04-07T13:06:54.131761", "exception": false, "start_time": "2022-04-07T13:06:53.739138", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "actual_ner = [convert_to_iob1([x[1] for x in ann]) for data in test_data.values() for ann in data]\n", "for model in models:\n", " pred_ner = 
[convert_to_iob1(p.ner) for test_source in test_data for p in preds[model][test_source]]\n", " scores[model]['combined'] = f1_score(actual_ner, pred_ner)" ] }, { "cell_type": "code", "execution_count": 37, "id": "be344047", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:54.322051Z", "iopub.status.busy": "2022-04-07T13:06:54.321346Z", "iopub.status.idle": "2022-04-07T13:06:54.399490Z", "shell.execute_reply": "2022-04-07T13:06:54.398955Z", "shell.execute_reply.started": "2022-04-07T12:56:46.135926Z" }, "papermill": { "duration": 0.177398, "end_time": "2022-04-07T13:06:54.399653", "exception": false, "start_time": "2022-04-07T13:06:54.222255", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
              ar      gk   ar_gk
ar        0.9682  0.9331  0.9704
gk        0.8666  0.9511  0.9499
combined  0.8911  0.9469  0.9549
\n" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(scores).style.format('{:0.4f}')" ] }, { "cell_type": "markdown", "id": "0e61f00f", "metadata": { "papermill": { "duration": 0.090834, "end_time": "2022-04-07T13:06:54.582503", "exception": false, "start_time": "2022-04-07T13:06:54.491669", "status": "completed" }, "tags": [] }, "source": [ "The results are *slightly* different to those in the paper, but every entry agrees to within 0.01.\n", "\n", "So we've successfully reproduced the results in the paper, and shown the evaluation from the Stanford NER toolkit is very close to that of seqeval (if you work around hallucinated entities)." ] }, { "cell_type": "code", "execution_count": 38, "id": "94e67a94", "metadata": { "execution": { "iopub.execute_input": "2022-04-07T13:06:54.781566Z", "iopub.status.busy": "2022-04-07T13:06:54.780354Z", "iopub.status.idle": "2022-04-07T13:06:54.784566Z", "shell.execute_reply": "2022-04-07T13:06:54.785228Z", "shell.execute_reply.started": "2022-04-07T12:56:46.204022Z" }, "papermill": { "duration": 0.11174, "end_time": "2022-04-07T13:06:54.785439", "exception": false, "start_time": "2022-04-07T13:06:54.673699", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            AllRecipes  FOOD.com    BOTH
AllRecipes      0.9682    0.9317  0.9709
FOOD.com        0.8672    0.9519  0.9498
BOTH            0.8972    0.9472  0.9611
\n", "
" ], "text/plain": [ " AllRecipes FOOD.com BOTH\n", "AllRecipes 0.9682 0.9317 0.9709\n", "FOOD.com 0.8672 0.9519 0.9498\n", "BOTH 0.8972 0.9472 0.9611" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reported_scores = pd.DataFrame([[0.9682, 0.9317, 0.9709],\n", " [0.8672, 0.9519, 0.9498],\n", " [0.8972, 0.9472, 0.9611]],\n", " columns = ['AllRecipes', 'FOOD.com', 'BOTH'],\n", " index = ['AllRecipes', 'FOOD.com', 'BOTH'])\n", "reported_scores" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "papermill": { "default_parameters": {}, "duration": 567.381694, "end_time": "2022-04-07T13:06:56.199628", "environment_variables": {}, "exception": null, "input_path": "__notebook__.ipynb", "output_path": "__notebook__.ipynb", "parameters": {}, "start_time": "2022-04-07T12:57:28.817934", "version": "2.3.3" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": { "04e4948e123444569b4b09d696e122ea": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "05cf3b36eb1e4ddb9c4f9b675fdd6b1d": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": 
"1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "081d02fab37a4e29bd2531213f508808": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_f688a6aae02749a6874511e7c4d0da95", "max": 1705.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_a6e2ba7f52a042fb8f3595b50c406bc6", "value": 1705.0 } }, "0b95b19116f44d5dae17156430a708ec": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "1455809b8e1945e0bebdc8ea8adcc87d": { "model_module": 
"@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "15dca12ce1ee4b8ca4ec62aa8864f8c7": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": 
null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "171199414c5a4954b0ac767628554d8b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "183452cec8804fd5903953436a43bf20": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_e4473ff688a9440690198f9e4cf6858e", "placeholder": "​", "style": "IPY_MODEL_2474d59c1f9d47d4b59ffef46d4f8443", "value": " 483/483 [00:29<00:00, 41.45it/s]" } 
}, "19c7b297a53444608e29aa3fcce236b9": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "1a8c3ed5abda483baee14f7c855449cd": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_337d520658594202b091a3a25a60ab2c", "placeholder": "​", "style": "IPY_MODEL_46133dd43dca46eba08636f28973f190", "value": " 483/483 [00:28<00:00, 37.52it/s]" } }, "1fe61ae43c2f4a9b93b25d6c60204c8a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "20145d12fdbb45a8a3c7ed1126260403": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "214063d8f2bf4d2f854bdfaa75dd3469": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_661e0655ba4b41139b1be60f73f4ea87", "placeholder": "​", "style": "IPY_MODEL_385d18bccb9245548deed2a3b554525c", "value": " 1705/1705 
[01:00<00:00, 36.52it/s]" } }, "21a60b9d20784d38afdbdbdf9fa74f78": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "2474d59c1f9d47d4b59ffef46d4f8443": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "263283c3d0764d38b9c3d1ddb6c8b427": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", 
"_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_20145d12fdbb45a8a3c7ed1126260403", "placeholder": "​", "style": "IPY_MODEL_a7eb0e99aee4463785eea58d0cc847af", "value": " 1705/1705 [01:00<00:00, 34.91it/s]" } }, "26e07430d6f54a9bbd9acea509197ce1": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_4191ab859c984f81a50aeb218fc1ac5a", "IPY_MODEL_c03a18909027459681be5a9113e88a0a", "IPY_MODEL_1a8c3ed5abda483baee14f7c855449cd" ], "layout": "IPY_MODEL_f04f3af815f24025b341de0249819145" } }, "28efee3ebc4b44e4a2411844af855e92": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_53c764a85af5434eabf8513c4a5e8e6f", "placeholder": "​", "style": "IPY_MODEL_9ad75aea6d7341dba7d6e56cf625c10b", "value": "100%" } }, "2a539d7b18e4425abbd7ae22e76a26fa": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, 
"flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "2b5a6815608b41d69923bd5e845385a3": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "316de487dcb041e9aef79533ede9ab20": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "337d520658594202b091a3a25a60ab2c": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "345e4d287bd24ae9975f3be4e2027c4c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "385d18bccb9245548deed2a3b554525c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": 
"@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "38d19c56c37947469a993acb7f0213a0": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "39c71bb6e43e4330b2b149c25da98d1d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_28efee3ebc4b44e4a2411844af855e92", "IPY_MODEL_8bc62d6d2ec34b848127dfe4d57b8c0e", "IPY_MODEL_a9267cfc2fb54d408ab6377e7cf9a2cb" ], "layout": 
"IPY_MODEL_ec60bd5d40374f4296bf13b4d1a6e957" } }, "3cc71d45aca94e189e597f9532b0ff83": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "40754c3ca8c049eeaf44280e861bb455": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_2a539d7b18e4425abbd7ae22e76a26fa", "placeholder": "​", "style": "IPY_MODEL_345e4d287bd24ae9975f3be4e2027c4c", "value": "Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip: 100%" } }, "4191ab859c984f81a50aeb218fc1ac5a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_4827280778c24f8ca7bfc579f2bc0806", "placeholder": "​", "style": "IPY_MODEL_5fe9b045a4014a4a80ab8190de207966", "value": "100%" } }, "42e7709bb52842e999ccf9a7a385973a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], 
"_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_aacee86f941e404b8c9f0345c0e85a94", "placeholder": "​", "style": "IPY_MODEL_04e4948e123444569b4b09d696e122ea", "value": "100%" } }, "44ca9e6935704d00925e55eaf8f5e5ec": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_38d19c56c37947469a993acb7f0213a0", "placeholder": "​", "style": "IPY_MODEL_0b95b19116f44d5dae17156430a708ec", "value": "100%" } }, "46133dd43dca46eba08636f28973f190": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "47ef62581c17433e93855d6789befc2b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_21a60b9d20784d38afdbdbdf9fa74f78", 
"placeholder": "​", "style": "IPY_MODEL_ae9e033f42884508ac8a7f49f6e62c2a", "value": "100%" } }, "4827280778c24f8ca7bfc579f2bc0806": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "4b524361ba0947b9af302faa8518050f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_73fc8072162e45de89abf389fa5d4b90", "max": 1705.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_5012502d87f24cd192200979e56af080", "value": 1705.0 } }, "5012502d87f24cd192200979e56af080": { "model_module": 
"@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "53c764a85af5434eabf8513c4a5e8e6f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "555a9bda2a024e9a8a9abc444e4fe4cf": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", 
"description": "", "description_tooltip": null, "layout": "IPY_MODEL_15dca12ce1ee4b8ca4ec62aa8864f8c7", "max": 505207915.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_730961bd290541e9a01938ca4876beb9", "value": 505207915.0 } }, "5ae58c17d7a2462dadf527338d943cea": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_47ef62581c17433e93855d6789befc2b", "IPY_MODEL_757532f367d248dcb369c157e5565861", "IPY_MODEL_263283c3d0764d38b9c3d1ddb6c8b427" ], "layout": "IPY_MODEL_76e64492e3d7480aa2d96bfc56f3cfa8" } }, "5cb06a46903341a9b50676a31addad14": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "5fe9b045a4014a4a80ab8190de207966": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "652276ddba8640708df4285c1ddf5ff9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", 
"_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_8e12508157c44fbcbb77aac67bc12549", "IPY_MODEL_4b524361ba0947b9af302faa8518050f", "IPY_MODEL_71093fc9838b4e9ea8a1fbeda6150111" ], "layout": "IPY_MODEL_2b5a6815608b41d69923bd5e845385a3" } }, "661e0655ba4b41139b1be60f73f4ea87": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "67197e8331f74dd1a7ae96d9d4ee7490": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": 
"success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_75992a1df6f6420e8bc06349c0a99076", "max": 483.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_93654bad7fcf4739bb54a6b45c60c5ed", "value": 483.0 } }, "6a9d691c650c4d779882190429cbe86b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_05cf3b36eb1e4ddb9c4f9b675fdd6b1d", "placeholder": "​", "style": "IPY_MODEL_5cb06a46903341a9b50676a31addad14", "value": " 505M/505M [00:30<00:00, 17.9MB/s]" } }, "6fc700ddba53400b889b51216731058b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": 
null, "width": null } }, "71093fc9838b4e9ea8a1fbeda6150111": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_8594609c694247669c25c78c8ded0342", "placeholder": "​", "style": "IPY_MODEL_b9e45e43c1ae4ff483e64abefa8ed8de", "value": " 1705/1705 [01:02<00:00, 37.06it/s]" } }, "730961bd290541e9a01938ca4876beb9": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "73fc8072162e45de89abf389fa5d4b90": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, 
"max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "757532f367d248dcb369c157e5565861": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_acb9785ff9dc422bb40cb371d4180419", "max": 1705.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_c7ab303608734bf18388966fda9a2d77", "value": 1705.0 } }, "75992a1df6f6420e8bc06349c0a99076": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": 
null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "75b56360c2064ec2b7b5d7d4bdc22d42": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_c84f6aba9d924d5ba166018593531c9a", "placeholder": "​", "style": "IPY_MODEL_cdf8f8e1213e4056aff15837fd6cd1b8", "value": " 4/4 [00:13<00:00, 3.62s/it]" } }, "76e64492e3d7480aa2d96bfc56f3cfa8": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7f60ca33e67449c1bab4a7c52e290d35": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": 
"LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "8594609c694247669c25c78c8ded0342": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, 
"order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "87190b2caa46403a82d7cb69319b1262": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_44ca9e6935704d00925e55eaf8f5e5ec", "IPY_MODEL_67197e8331f74dd1a7ae96d9d4ee7490", "IPY_MODEL_183452cec8804fd5903953436a43bf20" ], "layout": "IPY_MODEL_c12c8164a6f2450b804b2f350ed2d580" } }, "8ab8307799ba406e85852cb620015885": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_42e7709bb52842e999ccf9a7a385973a", "IPY_MODEL_081d02fab37a4e29bd2531213f508808", "IPY_MODEL_214063d8f2bf4d2f854bdfaa75dd3469" ], "layout": "IPY_MODEL_171199414c5a4954b0ac767628554d8b" } }, "8bc62d6d2ec34b848127dfe4d57b8c0e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_7f60ca33e67449c1bab4a7c52e290d35", "max": 
483.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_99a70625c1a5402e9214a7dc5ed53cd1", "value": 483.0 } }, "8e12508157c44fbcbb77aac67bc12549": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_e6207369245244299c4e72cab9ee9e6f", "placeholder": "​", "style": "IPY_MODEL_316de487dcb041e9aef79533ede9ab20", "value": "100%" } }, "92d594f090a64ae0994bb4e7e59a362b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_1455809b8e1945e0bebdc8ea8adcc87d", "max": 4.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_3cc71d45aca94e189e597f9532b0ff83", "value": 4.0 } }, "93654bad7fcf4739bb54a6b45c60c5ed": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "94f4b01c55604c8ead1ebc742ec981dc": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": 
"LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "99a70625c1a5402e9214a7dc5ed53cd1": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "9ad75aea6d7341dba7d6e56cf625c10b": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "9c0da7f9ecde470a8ad513bd82b4e638": { "model_module": "@jupyter-widgets/controls", 
"model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_eaa545004794465bb69a876a78e8386c", "IPY_MODEL_92d594f090a64ae0994bb4e7e59a362b", "IPY_MODEL_75b56360c2064ec2b7b5d7d4bdc22d42" ], "layout": "IPY_MODEL_beca40aef11d4d4fbcc39c9f54709889" } }, "a6e2ba7f52a042fb8f3595b50c406bc6": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "a7eb0e99aee4463785eea58d0cc847af": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "a9267cfc2fb54d408ab6377e7cf9a2cb": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_6fc700ddba53400b889b51216731058b", "placeholder": "​", "style": 
"IPY_MODEL_d9376b3ffa6741409f6218d1041020cf", "value": " 483/483 [00:34<00:00, 38.36it/s]" } }, "aacee86f941e404b8c9f0345c0e85a94": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "acb9785ff9dc422bb40cb371d4180419": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": 
null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "ae9e033f42884508ac8a7f49f6e62c2a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "b90172ce78ce4c519efa02f06f3c6835": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_40754c3ca8c049eeaf44280e861bb455", "IPY_MODEL_555a9bda2a024e9a8a9abc444e4fe4cf", "IPY_MODEL_6a9d691c650c4d779882190429cbe86b" ], "layout": "IPY_MODEL_fbe7eab600d2483e98a914ba761c293b" } }, "b9e45e43c1ae4ff483e64abefa8ed8de": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "beca40aef11d4d4fbcc39c9f54709889": { "model_module": "@jupyter-widgets/base", "model_module_version": 
"1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "c03a18909027459681be5a9113e88a0a": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_19c7b297a53444608e29aa3fcce236b9", "max": 483.0, "min": 0.0, "orientation": "horizontal", "style": "IPY_MODEL_1fe61ae43c2f4a9b93b25d6c60204c8a", "value": 483.0 } }, "c12c8164a6f2450b804b2f350ed2d580": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": 
null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "c7ab303608734bf18388966fda9a2d77": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "c84f6aba9d924d5ba166018593531c9a": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, 
"grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "cda950e3c9bc4afdb263d7e294ad4276": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "cdf8f8e1213e4056aff15837fd6cd1b8": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "d9376b3ffa6741409f6218d1041020cf": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "e4473ff688a9440690198f9e4cf6858e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": 
"1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e6207369245244299c4e72cab9ee9e6f": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, 
"right": null, "top": null, "visibility": null, "width": null } }, "eaa545004794465bb69a876a78e8386c": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_94f4b01c55604c8ead1ebc742ec981dc", "placeholder": "​", "style": "IPY_MODEL_cda950e3c9bc4afdb263d7e294ad4276", "value": "100%" } }, "ec60bd5d40374f4296bf13b4d1a6e957": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "f04f3af815f24025b341de0249819145": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": 
"@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "f688a6aae02749a6874511e7c4d0da95": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, 
"overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "fbe7eab600d2483e98a914ba761c293b": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } } }, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }