{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1684dd94",
   "metadata": {
    "papermill": {
     "duration": 0.065982,
     "end_time": "2022-04-07T12:57:39.624742",
     "exception": false,
     "start_time": "2022-04-07T12:57:39.558760",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We're going to replicate the benchmark in [A Named Entity Based Approach to Model Recipes](https://arxiv.org/abs/2004.12184) by Diwan, Batra, and Bagler, using Stanford NER, and check the results using [seqeval](https://github.com/chakki-works/seqeval).\n",
    "\n",
    "Evaluating NER is surprisingly tricky, as [David Batista explains](https://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/), and I want to check that the results in the paper match what seqeval gives, so I can compare them with other models.\n",
    "\n",
    "The authors share their data in an [associated git repository](https://github.com/cosylabiiit/recipe-knowledge-mining) and train a model using [Stanford NER](https://nlp.stanford.edu/software/CRF-NER.html), which is open source, so we have a chance of replicating the results."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e559cc76",
   "metadata": {
    "papermill": {
     "duration": 0.059759,
     "end_time": "2022-04-07T12:57:39.748864",
     "exception": false,
     "start_time": "2022-04-07T12:57:39.689105",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Installing Stanford CoreNLP\n",
    "\n",
    "We're going to install Stanford CoreNLP, which is a Java library.\n",
    "To make things easier we will use [stanza](https://stanfordnlp.github.io/stanza/), which includes tools for [installing and invoking CoreNLP](https://stanfordnlp.github.io/stanza/corenlp_client.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2873cc65",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:57:39.879337Z",
     "iopub.status.busy": "2022-04-07T12:57:39.878337Z",
     "iopub.status.idle": "2022-04-07T12:57:53.376009Z",
     "shell.execute_reply": "2022-04-07T12:57:53.374964Z",
     "shell.execute_reply.started": "2022-04-07T12:49:07.145062Z"
    },
    "papermill": {
     "duration": 13.566942,
     "end_time": "2022-04-07T12:57:53.376225",
     "exception": false,
     "start_time": "2022-04-07T12:57:39.809283",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting stanza\r\n",
      "  Downloading stanza-1.3.0-py3-none-any.whl (432 kB)\r\n",
      "     |████████████████████████████████| 432 kB 292 kB/s            \r\n",
      "\u001b[?25hRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from stanza) (2.26.0)\r\n",
      "Requirement already satisfied: protobuf in /opt/conda/lib/python3.7/site-packages (from stanza) (3.19.4)\r\n",
      "Requirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from stanza) (4.62.3)\r\n",
      "Requirement already satisfied: torch>=1.3.0 in /opt/conda/lib/python3.7/site-packages (from stanza) (1.9.1+cpu)\r\n",
      "Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from stanza) (1.20.3)\r\n",
      "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from stanza) (1.16.0)\r\n",
      "Requirement already satisfied: emoji in /opt/conda/lib/python3.7/site-packages (from stanza) (1.7.0)\r\n",
      "Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch>=1.3.0->stanza) (4.1.1)\r\n",
      "Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2.0.9)\r\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (2021.10.8)\r\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (1.26.7)\r\n",
      "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->stanza) (3.1)\r\n",
      "Installing collected packages: stanza\r\n",
      "Successfully installed stanza-1.3.0\r\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n"
     ]
    }
   ],
   "source": [
    "!pip install stanza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f71bca3f",
   "metadata": {
    "papermill": {
     "duration": 0.072074,
     "end_time": "2022-04-07T12:57:53.522613",
     "exception": false,
     "start_time": "2022-04-07T12:57:53.450539",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can specify where to install CoreNLP, but we will use the default, which is `$CORENLP_HOME` if that environment variable is set, or `$HOME/stanza_corenlp` otherwise. (Ideally we'd use stanza to get this, but I couldn't easily work out how.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "85b13351",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:57:53.670403Z",
     "iopub.status.busy": "2022-04-07T12:57:53.669467Z",
     "iopub.status.idle": "2022-04-07T12:58:29.230320Z",
     "shell.execute_reply": "2022-04-07T12:58:29.229643Z",
     "shell.execute_reply.started": "2022-04-07T12:49:18.684182Z"
    },
    "papermill": {
     "duration": 35.633674,
     "end_time": "2022-04-07T12:58:29.230514",
     "exception": false,
     "start_time": "2022-04-07T12:57:53.596840",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b90172ce78ce4c519efa02f06f3c6835",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip:   0%|        …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import stanza\n",
    "stanza.install_corenlp()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4483cf45",
   "metadata": {
    "papermill": {
     "duration": 0.06657,
     "end_time": "2022-04-07T12:58:29.364624",
     "exception": false,
     "start_time": "2022-04-07T12:58:29.298054",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We'll need to invoke the Stanford CoreNLP JAR that we just installed, so let's find it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3b31ac2f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:58:29.505152Z",
     "iopub.status.busy": "2022-04-07T12:58:29.504095Z",
     "iopub.status.idle": "2022-04-07T12:58:29.516285Z",
     "shell.execute_reply": "2022-04-07T12:58:29.515710Z",
     "shell.execute_reply.started": "2022-04-07T12:49:53.274276Z"
    },
    "papermill": {
     "duration": 0.084307,
     "end_time": "2022-04-07T12:58:29.516468",
     "exception": false,
     "start_time": "2022-04-07T12:58:29.432161",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'/root/stanza_corenlp/stanford-corenlp-4.4.0.jar'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "import re\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "# Reimplement stanza's logic for locating the CoreNLP install directory.\n",
    "core_nlp_path = os.getenv('CORENLP_HOME', str(Path.home() / 'stanza_corenlp'))\n",
    "\n",
    "# A heuristic to find the main jar file (not the models or sources jars).\n",
    "classpath = [str(p) for p in Path(core_nlp_path).iterdir() if re.fullmatch(r\"stanford-corenlp-[0-9.]+\\.jar\", p.name)][0]\n",
    "classpath"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98419a70",
   "metadata": {
    "papermill": {
     "duration": 0.074162,
     "end_time": "2022-04-07T12:58:29.661879",
     "exception": false,
     "start_time": "2022-04-07T12:58:29.587717",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Let's test the [basic usage](https://stanfordnlp.github.io/stanza/client_usage.html).\n",
    "\n",
    "CoreNLP currently has models for 8 languages, and supports some fairly complex tasks such as coreference resolution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5a8e1173",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:58:29.804047Z",
     "iopub.status.busy": "2022-04-07T12:58:29.803014Z",
     "iopub.status.idle": "2022-04-07T12:59:11.230446Z",
     "shell.execute_reply": "2022-04-07T12:59:11.229515Z",
     "shell.execute_reply.started": "2022-04-07T12:49:53.286134Z"
    },
    "papermill": {
     "duration": 41.500822,
     "end_time": "2022-04-07T12:59:11.230672",
     "exception": false,
     "start_time": "2022-04-07T12:58:29.729850",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---\n",
      "[main] INFO CoreNLP - Server default properties:\n",
      "\t\t\t(Note: unspecified annotator properties are English defaults)\n",
      "\t\t\tannotators = tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n",
      "\t\t\tinputFormat = text\n",
      "\t\t\toutputFormat = serialized\n",
      "\t\t\tprettyPrint = false\n",
      "\t\t\tthreads = 5\n",
      "[main] INFO CoreNLP - Threads: 5\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n",
      "[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.1 sec].\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n",
      "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].\n",
      "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.6 sec].\n",
      "[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.0 sec].\n",
      "[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.\n",
      "[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt\n",
      "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.\n",
      "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.\n",
      "[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files\n",
      "[main] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n",
      "[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.8 sec].\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n",
      "[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... Time elapsed: 2.1 sec\n",
      "[main] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 20000 vectors, elapsed Time: 2.204 sec\n",
      "[main] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [4.3 sec].\n",
      "[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n",
      "[main] INFO edu.stanford.nlp.coref.statistical.SimpleLinearClassifier - Loading coref model edu/stanford/nlp/models/coref/statistical/ranking_model.ser.gz ... done [0.9 sec].\n",
      "[main] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency\n",
      "[main] INFO CoreNLP - Starting server...\n",
      "[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0.0.0.0:9000\n",
      "[pool-1-thread-3] INFO CoreNLP - [/127.0.0.1:36852] API call w/annotators tokenize,ssplit,pos,lemma,ner,parse,depparse,coref\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse\n",
      "[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator coref\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "David Batista wrote a blog post on NER evaluation. Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. We will test his library against Stanford Core NLP. \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Thread-0] INFO CoreNLP - CoreNLP Server is shutting down.\n"
     ]
    }
   ],
   "source": [
    "from stanza.server import CoreNLPClient\n",
    "\n",
    "text = \"David Batista wrote a blog post on NER evaluation. \" \\\n",
    "       \"Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks, such as NER. \" \\\n",
    "       \"We will test his library against Stanford Core NLP. \"\n",
    "\n",
    "with CoreNLPClient(\n",
    "        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse', 'coref'],\n",
    "        timeout=30000,\n",
    "        memory='6G') as client:\n",
    "    ann = client.annotate(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27657787",
   "metadata": {
    "papermill": {
     "duration": 0.073188,
     "end_time": "2022-04-07T12:59:11.379679",
     "exception": false,
     "start_time": "2022-04-07T12:59:11.306491",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We get 3 sentences out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "81ca28fd",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:11.534658Z",
     "iopub.status.busy": "2022-04-07T12:59:11.533863Z",
     "iopub.status.idle": "2022-04-07T12:59:11.538234Z",
     "shell.execute_reply": "2022-04-07T12:59:11.537672Z",
     "shell.execute_reply.started": "2022-04-07T12:50:29.701187Z"
    },
    "papermill": {
     "duration": 0.083411,
     "end_time": "2022-04-07T12:59:11.538434",
     "exception": false,
     "start_time": "2022-04-07T12:59:11.455023",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "David Batista wrote a blog post on NER evaluation .\n",
      "Hiroki Nakayama wrote seqeval to evaluate sequential labelling tasks , such as NER .\n",
      "We will test his library against Stanford Core NLP .\n"
     ]
    }
   ],
   "source": [
    "for sentence in ann.sentence:\n",
    "    print(\" \".join([token.word for token in sentence.token]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a71d850",
   "metadata": {
    "papermill": {
     "duration": 0.075015,
     "end_time": "2022-04-07T12:59:11.688350",
     "exception": false,
     "start_time": "2022-04-07T12:59:11.613335",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "It can even do clever things like coreference resolution: here it works out that \"his library\" refers to Hiroki Nakayama's library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "15c591e5",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:11.844039Z",
     "iopub.status.busy": "2022-04-07T12:59:11.843369Z",
     "iopub.status.idle": "2022-04-07T12:59:11.847528Z",
     "shell.execute_reply": "2022-04-07T12:59:11.848178Z",
     "shell.execute_reply.started": "2022-04-07T12:50:29.708506Z"
    },
    "papermill": {
     "duration": 0.081987,
     "end_time": "2022-04-07T12:59:11.848387",
     "exception": false,
     "start_time": "2022-04-07T12:59:11.766400",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['nakayama', 'his']\n"
     ]
    }
   ],
   "source": [
    "for chain in ann.corefChain:\n",
    "    print([ann.mentionsForCoref[mention.mentionID].headString for mention in chain.mention])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b1942b0",
   "metadata": {
    "papermill": {
     "duration": 0.074006,
     "end_time": "2022-04-07T12:59:11.997611",
     "exception": false,
     "start_time": "2022-04-07T12:59:11.923605",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can extract things such as lemmas, parts of speech and standard NER tags.\n",
    "\n",
    "But we want to train our own NER model to detect ingredients. First we will need to collect the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "bb5b69e7",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:12.165040Z",
     "iopub.status.busy": "2022-04-07T12:59:12.156660Z",
     "iopub.status.idle": "2022-04-07T12:59:12.184016Z",
     "shell.execute_reply": "2022-04-07T12:59:12.184541Z",
     "shell.execute_reply.started": "2022-04-07T12:50:29.721655Z"
    },
    "papermill": {
     "duration": 0.11083,
     "end_time": "2022-04-07T12:59:12.184757",
     "exception": false,
     "start_time": "2022-04-07T12:59:12.073927",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>10</th>\n",
       "      <th>11</th>\n",
       "      <th>12</th>\n",
       "      <th>13</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>word</th>\n",
       "      <td>Hiroki</td>\n",
       "      <td>Nakayama</td>\n",
       "      <td>wrote</td>\n",
       "      <td>seqeval</td>\n",
       "      <td>to</td>\n",
       "      <td>evaluate</td>\n",
       "      <td>sequential</td>\n",
       "      <td>labelling</td>\n",
       "      <td>tasks</td>\n",
       "      <td>,</td>\n",
       "      <td>such</td>\n",
       "      <td>as</td>\n",
       "      <td>NER</td>\n",
       "      <td>.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lemma</th>\n",
       "      <td>Hiroki</td>\n",
       "      <td>Nakayama</td>\n",
       "      <td>write</td>\n",
       "      <td>seqeval</td>\n",
       "      <td>to</td>\n",
       "      <td>evaluate</td>\n",
       "      <td>sequential</td>\n",
       "      <td>labelling</td>\n",
       "      <td>task</td>\n",
       "      <td>,</td>\n",
       "      <td>such</td>\n",
       "      <td>as</td>\n",
       "      <td>ner</td>\n",
       "      <td>.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pos</th>\n",
       "      <td>NNP</td>\n",
       "      <td>NNP</td>\n",
       "      <td>VBD</td>\n",
       "      <td>NN</td>\n",
       "      <td>TO</td>\n",
       "      <td>VB</td>\n",
       "      <td>JJ</td>\n",
       "      <td>NN</td>\n",
       "      <td>NNS</td>\n",
       "      <td>,</td>\n",
       "      <td>JJ</td>\n",
       "      <td>IN</td>\n",
       "      <td>NN</td>\n",
       "      <td>.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ner</th>\n",
       "      <td>PERSON</td>\n",
       "      <td>PERSON</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           0         1      2        3   4         5           6          7   \\\n",
       "word   Hiroki  Nakayama  wrote  seqeval  to  evaluate  sequential  labelling   \n",
       "lemma  Hiroki  Nakayama  write  seqeval  to  evaluate  sequential  labelling   \n",
       "pos       NNP       NNP    VBD       NN  TO        VB          JJ         NN   \n",
       "ner    PERSON    PERSON      O        O   O         O           O          O   \n",
       "\n",
       "          8  9     10  11   12 13  \n",
       "word   tasks  ,  such  as  NER  .  \n",
       "lemma   task  ,  such  as  ner  .  \n",
       "pos      NNS  ,    JJ  IN   NN  .  \n",
       "ner        O  O     O   O    O  O  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "tokens = ann.sentence[1].token\n",
    "\n",
    "pd.DataFrame({'word': [s.word for s in tokens],\n",
    "              'lemma': [s.lemma for s in tokens],\n",
    "              'pos': [s.pos for s in tokens],\n",
    "              'ner': [s.ner for s in tokens]}).T"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01c281d3",
   "metadata": {
    "papermill": {
     "duration": 0.074855,
     "end_time": "2022-04-07T12:59:12.333648",
     "exception": false,
     "start_time": "2022-04-07T12:59:12.258793",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Get Data\n",
    "\n",
    "Helpfully, the authors provide the annotated ingredient data in Stanford NER format, which we can download [from GitHub](https://github.com/cosylabiiit/recipe-knowledge-mining).\n",
    "\n",
    "There are two sources of ingredients: `ar` is AllRecipes and `gk` is FOOD.com (formerly GeniusKitchen.com)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "04edd65b",
   "metadata": {
    "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
    "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:12.498569Z",
     "iopub.status.busy": "2022-04-07T12:59:12.497854Z",
     "iopub.status.idle": "2022-04-07T12:59:14.808353Z",
     "shell.execute_reply": "2022-04-07T12:59:14.807269Z",
     "shell.execute_reply.started": "2022-04-07T12:50:29.755893Z"
    },
    "papermill": {
     "duration": 2.400074,
     "end_time": "2022-04-07T12:59:14.808574",
     "exception": false,
     "start_time": "2022-04-07T12:59:12.408500",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from urllib.request import urlretrieve\n",
    "\n",
    "data_sources = ['ar', 'gk']\n",
    "data_splits = ['train', 'test']\n",
    "\n",
    "base_url = 'https://raw.githubusercontent.com/cosylabiiit/recipe-knowledge-mining/master/'\n",
    "\n",
    "def data_filename(source, split):\n",
    "    return f'{source}_{split}.tsv'\n",
    "\n",
    "for source in data_sources:\n",
    "    for split in data_splits:\n",
    "        name = data_filename(source, split)\n",
    "        urlretrieve(base_url + name, name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c260e5f",
   "metadata": {
    "papermill": {
     "duration": 0.073279,
     "end_time": "2022-04-07T12:59:14.957042",
     "exception": false,
     "start_time": "2022-04-07T12:59:14.883763",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Each line of the file is either a single tab (separating different texts), or a token followed by a tab and then the entity type.\n",
    "\n",
    "For example, the first ingredient is `4 cloves garlic`, which is a quantity (4) followed by a unit (cloves) and a name (garlic)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "20fd23c4",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:15.124352Z",
     "iopub.status.busy": "2022-04-07T12:59:15.122955Z",
     "iopub.status.idle": "2022-04-07T12:59:15.897018Z",
     "shell.execute_reply": "2022-04-07T12:59:15.897645Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.106713Z"
    },
    "papermill": {
     "duration": 0.866332,
     "end_time": "2022-04-07T12:59:15.897874",
     "exception": false,
     "start_time": "2022-04-07T12:59:15.031542",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "^I\r\n",
      "4^IQUANTITY\r\n",
      "cloves^IUNIT\r\n",
      "garlic^INAME\r\n",
      "^I\r\n",
      "2^IQUANTITY\r\n",
      "tablespoons^IUNIT\r\n",
      "vegetable^INAME\r\n",
      "oil^INAME\r\n",
      ",^IO\r\n"
     ]
    }
   ],
   "source": [
    "!head {data_filename('ar', 'train')} | cat -t"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb5b9b7d",
   "metadata": {
    "papermill": {
     "duration": 0.077612,
     "end_time": "2022-04-07T12:59:16.051180",
     "exception": false,
     "start_time": "2022-04-07T12:59:15.973568",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can read this into Python, converting it to a list of annotated sentences, each of which is just a sequence of (token, label) pairs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "e3682be8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:16.206078Z",
     "iopub.status.busy": "2022-04-07T12:59:16.204972Z",
     "iopub.status.idle": "2022-04-07T12:59:16.214225Z",
     "shell.execute_reply": "2022-04-07T12:59:16.214726Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.911852Z"
    },
    "papermill": {
     "duration": 0.089243,
     "end_time": "2022-04-07T12:59:16.214969",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.125726",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from typing import List, Tuple, Generator\n",
    "\n",
    "Annotation = Tuple[str, str]\n",
    "AnnotatedSentence = List[Annotation]\n",
    "\n",
    "def segment_texts(data: str) -> Generator[AnnotatedSentence, None, None]:\n",
    "    # Yield each sentence as a list of (token, label) pairs.\n",
    "    output = []\n",
    "    for line in data.split('\\n'):\n",
    "        if line.strip():\n",
    "            token, label = line.split('\\t')\n",
    "            output.append((token.strip(), label.strip()))\n",
    "        elif output:\n",
    "            # A blank (tab-only) line ends the current sentence.\n",
    "            yield output\n",
    "            output = []\n",
    "    # Emit the final sentence if the data doesn't end with a separator.\n",
    "    if output:\n",
    "        yield output\n",
    "\n",
    "def segment_file(filename: str) -> List[AnnotatedSentence]:\n",
    "    with open(filename, 'rt') as f:\n",
    "        return list(segment_texts(f.read()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "411d8e65",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:16.373155Z",
     "iopub.status.busy": "2022-04-07T12:59:16.371969Z",
     "iopub.status.idle": "2022-04-07T12:59:16.388214Z",
     "shell.execute_reply": "2022-04-07T12:59:16.388796Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.921731Z"
    },
    "papermill": {
     "duration": 0.0992,
     "end_time": "2022-04-07T12:59:16.389053",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.289853",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "ar_train = segment_file(data_filename('ar', 'train'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "9f048122",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:16.545563Z",
     "iopub.status.busy": "2022-04-07T12:59:16.544455Z",
     "iopub.status.idle": "2022-04-07T12:59:16.551302Z",
     "shell.execute_reply": "2022-04-07T12:59:16.551951Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.947480Z"
    },
    "papermill": {
     "duration": 0.087288,
     "end_time": "2022-04-07T12:59:16.552158",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.464870",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[('4', 'QUANTITY'), ('cloves', 'UNIT'), ('garlic', 'NAME')],\n",
       " [('2', 'QUANTITY'),\n",
       "  ('tablespoons', 'UNIT'),\n",
       "  ('vegetable', 'NAME'),\n",
       "  ('oil', 'NAME'),\n",
       "  (',', 'O'),\n",
       "  ('divided', 'STATE')]]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ar_train[:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2dcacf1",
   "metadata": {
    "papermill": {
     "duration": 0.07699,
     "end_time": "2022-04-07T12:59:16.705265",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.628275",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can then calculate the number of sentences in the training set for a source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "170a4c27",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:16.860723Z",
     "iopub.status.busy": "2022-04-07T12:59:16.859821Z",
     "iopub.status.idle": "2022-04-07T12:59:16.863539Z",
     "shell.execute_reply": "2022-04-07T12:59:16.864041Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.954556Z"
    },
    "papermill": {
     "duration": 0.084373,
     "end_time": "2022-04-07T12:59:16.864216",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.779843",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1470"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(ar_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76312b92",
   "metadata": {
    "papermill": {
     "duration": 0.076091,
     "end_time": "2022-04-07T12:59:17.014758",
     "exception": false,
     "start_time": "2022-04-07T12:59:16.938667",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can use this to check the types of entities annotated, as in the paper (DF is Dried/Fresh)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "1142681f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:17.175245Z",
     "iopub.status.busy": "2022-04-07T12:59:17.174209Z",
     "iopub.status.idle": "2022-04-07T12:59:17.178876Z",
     "shell.execute_reply": "2022-04-07T12:59:17.178340Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.968592Z"
    },
    "papermill": {
     "duration": 0.089217,
     "end_time": "2022-04-07T12:59:17.179088",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.089871",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({'QUANTITY': 1583,\n",
       "         'UNIT': 1338,\n",
       "         'NAME': 2501,\n",
       "         'O': 1662,\n",
       "         'STATE': 879,\n",
       "         'DF': 154,\n",
       "         'SIZE': 64,\n",
       "         'TEMP': 31})"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import Counter\n",
    "\n",
    "tag_counts = Counter([annotation[1] for sentence in ar_train for annotation in sentence])\n",
    "tag_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "677e4069",
   "metadata": {
    "papermill": {
     "duration": 0.073955,
     "end_time": "2022-04-07T12:59:17.327416",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.253461",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Train NER Model\n",
    "\n",
    "Now we want to train a Stanford NER model on the new annotations.\n",
    "\n",
    "First we have to configure it, but the paper gives no information on how the authors configured theirs.\n",
    "I've copied this template configuration from the [FAQ](https://nlp.stanford.edu/software/crf-faq.html).\n",
    "For more information on the parameters you can check the [NERFeatureFactory documentation](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html) or the [source](https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/ie/NERFeatureFactory.java)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "59a6e2c0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:17.491773Z",
     "iopub.status.busy": "2022-04-07T12:59:17.490773Z",
     "iopub.status.idle": "2022-04-07T12:59:17.497225Z",
     "shell.execute_reply": "2022-04-07T12:59:17.497813Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.983052Z"
    },
    "papermill": {
     "duration": 0.092886,
     "end_time": "2022-04-07T12:59:17.498022",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.405136",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def ner_prop_str(train_files: List[str], test_files: List[str], output: str) -> str:\n",
    "    \"\"\"Returns configuration string to train NER model\"\"\"\n",
    "    train_file_str = ','.join(train_files)\n",
    "    test_file_str = ','.join(test_files)\n",
    "    return f\"\"\"\n",
    "trainFileList = {train_file_str}\n",
    "testFiles = {test_file_str}\n",
    "serializeTo = {output}\n",
    "map = word=0,answer=1\n",
    "\n",
    "useClassFeature=true\n",
    "useWord=true\n",
    "useNGrams=true\n",
    "noMidNGrams=true\n",
    "maxNGramLeng=6\n",
    "usePrev=true\n",
    "useNext=true\n",
    "useSequences=true\n",
    "usePrevSequences=true\n",
    "maxLeft=1\n",
    "useTypeSeqs=true\n",
    "useTypeSeqs2=true\n",
    "useTypeySequences=true\n",
    "wordShape=chris2useLC\n",
    "useDisjunctive=true\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "207441dc",
   "metadata": {
    "papermill": {
     "duration": 0.075391,
     "end_time": "2022-04-07T12:59:17.650533",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.575142",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Stanford NER expects this configuration in a properties file, so let's write a helper that saves it to disk. (An alternative would be to pass these properties as arguments to the trainer.)"
   ]
  },
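  {
   "cell_type": "markdown",
   "id": "5d2a9c37",
   "metadata": {},
   "source": [
    "For illustration, here is a minimal sketch of that alternative. It assumes `CRFClassifier` accepts each property as a `-name value` pair on the command line, as the `-trainFile` and `-serializeTo` examples in the FAQ suggest. The helper `crf_train_command` and the file names are hypothetical placeholders, and the command is only built, not run.\n",
    "\n",
    "```python\n",
    "from typing import Dict, List\n",
    "\n",
    "def crf_train_command(classpath: str, props: Dict[str, str], memory: str = '2g') -> List[str]:\n",
    "    # Hypothetical helper: turn each property into a '-name value' pair\n",
    "    command = ['java', f'-Xmx{memory}', '-cp', classpath,\n",
    "               'edu.stanford.nlp.ie.crf.CRFClassifier']\n",
    "    for name, value in props.items():\n",
    "        command += [f'-{name}', str(value)]\n",
    "    return command\n",
    "\n",
    "command = crf_train_command(\n",
    "    'stanford-corenlp.jar',  # placeholder classpath\n",
    "    {'trainFileList': 'ar_train.tsv',  # placeholder file names\n",
    "     'serializeTo': 'ar.model.ser.gz',\n",
    "     'map': 'word=0,answer=1',\n",
    "     'useWord': 'true'})\n",
    "print(' '.join(command))\n",
    "```\n",
    "\n",
    "A properties file is easier to inspect and rerun later, which is why the rest of this notebook sticks with it."
   ]
  },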
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d9e7eb79",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:17.812064Z",
     "iopub.status.busy": "2022-04-07T12:59:17.810962Z",
     "iopub.status.idle": "2022-04-07T12:59:17.816510Z",
     "shell.execute_reply": "2022-04-07T12:59:17.817089Z",
     "shell.execute_reply.started": "2022-04-07T12:50:32.994122Z"
    },
    "papermill": {
     "duration": 0.087414,
     "end_time": "2022-04-07T12:59:17.817303",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.729889",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def write_ner_prop_file(ner_prop_file: str, train_files: List[str], test_files: List[str], output_file: str) -> None:\n",
    "    \"\"\"Writes the NER training configuration to ner_prop_file\"\"\"\n",
    "    props = ner_prop_str(train_files, test_files, output_file)\n",
    "    with open(ner_prop_file, 'wt') as f:\n",
    "        f.write(props)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "daec3df3",
   "metadata": {
    "papermill": {
     "duration": 0.075681,
     "end_time": "2022-04-07T12:59:17.970722",
     "exception": false,
     "start_time": "2022-04-07T12:59:17.895041",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Stanza doesn't give an interface to train a CRF NER model using Stanford NLP, but we can invoke `edu.stanford.nlp.ie.crf.CRFClassifier` directly.\n",
    "\n",
    "Let's write a properties file and invoke Java to run the classifier.\n",
    "It prints a lot of training information, and importantly a summary report at the end which we want to see."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "6d0cb59c",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:18.128663Z",
     "iopub.status.busy": "2022-04-07T12:59:18.127603Z",
     "iopub.status.idle": "2022-04-07T12:59:18.136205Z",
     "shell.execute_reply": "2022-04-07T12:59:18.136742Z",
     "shell.execute_reply.started": "2022-04-07T12:50:33.006937Z"
    },
    "papermill": {
     "duration": 0.089125,
     "end_time": "2022-04-07T12:59:18.136964",
     "exception": false,
     "start_time": "2022-04-07T12:59:18.047839",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import subprocess\n",
    "from typing import List\n",
    "\n",
    "def train_model(model_name, train_files: List[str], test_files: List[str], print_report=True, classpath=classpath) -> str:\n",
    "    \"\"\"Trains CRF NER Model using StanfordNLP\"\"\"\n",
    "    model_file = f'{model_name}.model.ser.gz'\n",
    "    ner_prop_filename = f'{model_name}.model.props'\n",
    "    write_ner_prop_file(ner_prop_filename, train_files, test_files, model_file)\n",
    "        \n",
    "    result = subprocess.run(\n",
    "                ['java',\n",
    "                 '-Xmx2g',\n",
    "                 '-cp', classpath,\n",
    "                 'edu.stanford.nlp.ie.crf.CRFClassifier',\n",
    "                 '-prop', ner_prop_filename],\n",
    "                capture_output=True)\n",
    "    \n",
    "    # If there's an error with invocation better log the stacktrace\n",
    "    if result.returncode != 0:\n",
    "        print(result.stderr.decode('utf-8'))\n",
    "    result.check_returncode()\n",
    "    \n",
    "    if print_report:\n",
    "        # The summary report is at the end of stderr; print its last 11 lines\n",
    "        print(*result.stderr.decode('utf-8').split('\\n')[-11:], sep='\\n')\n",
    "        \n",
    "    return model_file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9936f352",
   "metadata": {
    "papermill": {
     "duration": 0.074215,
     "end_time": "2022-04-07T12:59:18.286972",
     "exception": false,
     "start_time": "2022-04-07T12:59:18.212757",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can train models on each dataset separately, and all together.\n",
    "For evaluation we'll use the corresponding test set.\n",
    "\n",
    "Training all three models only takes a couple of minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "8232dd04",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T12:59:18.443591Z",
     "iopub.status.busy": "2022-04-07T12:59:18.442776Z",
     "iopub.status.idle": "2022-04-07T13:01:35.835973Z",
     "shell.execute_reply": "2022-04-07T13:01:35.836649Z",
     "shell.execute_reply.started": "2022-04-07T12:50:33.017883Z"
    },
    "papermill": {
     "duration": 137.47581,
     "end_time": "2022-04-07T13:01:35.836960",
     "exception": false,
     "start_time": "2022-04-07T12:59:18.361150",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ar\n",
      "CRFClassifier tagged 2788 words in 483 documents at 7185.57 words per second.\n",
      "         Entity\tP\tR\tF1\tTP\tFP\tFN\n",
      "             DF\t1.0000\t0.9608\t0.9800\t49\t0\t2\n",
      "           NAME\t0.9297\t0.9279\t0.9288\t463\t35\t36\n",
      "       QUANTITY\t1.0000\t0.9962\t0.9981\t522\t0\t2\n",
      "           SIZE\t1.0000\t1.0000\t1.0000\t20\t0\t0\n",
      "          STATE\t0.9601\t0.9633\t0.9617\t289\t12\t11\n",
      "           TEMP\t0.8750\t0.7000\t0.7778\t7\t1\t3\n",
      "           UNIT\t0.9819\t0.9841\t0.9830\t434\t8\t7\n",
      "         Totals\t0.9696\t0.9669\t0.9682\t1784\t56\t61\n",
      "\n",
      "\n",
      "gk\n",
      "CRFClassifier tagged 9886 words in 1705 documents at 11727.16 words per second.\n",
      "         Entity\tP\tR\tF1\tTP\tFP\tFN\n",
      "             DF\t0.9718\t0.9517\t0.9617\t138\t4\t7\n",
      "           NAME\t0.9132\t0.9021\t0.9076\t1621\t154\t176\n",
      "       QUANTITY\t0.9882\t0.9870\t0.9876\t1598\t19\t21\n",
      "           SIZE\t0.9750\t0.9398\t0.9571\t78\t2\t5\n",
      "          STATE\t0.9255\t0.9503\t0.9377\t708\t57\t37\n",
      "           TEMP\t0.8125\t0.8125\t0.8125\t26\t6\t6\n",
      "           UNIT\t0.9810\t0.9721\t0.9766\t1291\t25\t37\n",
      "         Totals\t0.9534\t0.9497\t0.9516\t5460\t267\t289\n",
      "\n",
      "\n",
      "ar_gk\n",
      "CRFClassifier tagged 12674 words in 2188 documents at 11648.90 words per second.\n",
      "         Entity\tP\tR\tF1\tTP\tFP\tFN\n",
      "             DF\t0.9738\t0.9490\t0.9612\t186\t5\t10\n",
      "           NAME\t0.9136\t0.9077\t0.9106\t2084\t197\t212\n",
      "       QUANTITY\t0.9911\t0.9897\t0.9904\t2121\t19\t22\n",
      "           SIZE\t0.9798\t0.9417\t0.9604\t97\t2\t6\n",
      "          STATE\t0.9386\t0.9512\t0.9449\t994\t65\t51\n",
      "           TEMP\t0.8140\t0.8333\t0.8235\t35\t8\t7\n",
      "           UNIT\t0.9801\t0.9763\t0.9782\t1727\t35\t42\n",
      "         Totals\t0.9563\t0.9539\t0.9551\t7244\t331\t350\n",
      "\n",
      "\n",
      "CPU times: user 276 ms, sys: 134 ms, total: 410 ms\n",
      "Wall time: 2min 17s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "models = {}\n",
    "for source in ['ar', 'gk', 'ar_gk']:\n",
    "    print(source)\n",
    "    train_files = [data_filename(s, 'train') for s in source.split('_')]\n",
    "    test_files = [data_filename(s, 'test') for s in source.split('_')]\n",
    "    models[source] = train_model(source, train_files, test_files)\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59f6f0c9",
   "metadata": {
    "papermill": {
     "duration": 0.077949,
     "end_time": "2022-04-07T13:01:35.992806",
     "exception": false,
     "start_time": "2022-04-07T13:01:35.914857",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The summary report shows for each model and entity type:\n",
    "\n",
    "* True Positives (TP): The number of times that entity was predicted and matched the annotation\n",
    "* False Positives (FP): The number of times that entity was predicted but was not in the annotated text\n",
    "* False Negatives (FN): The number of times that entity was in the annotated text but was not predicted\n",
    "* Precision (P): Probability a predicted entity is correct, TP/(TP+FP)\n",
    "* Recall (R): Probability a correct entity is predicted, TP/(TP+FN)\n",
    "* F1 Score (F1): Harmonic mean of precision and recall, 2/(1/P + 1/R).\n",
    "\n",
    "We can compare the F1 Totals to the diagonal of Table IV in the paper:\n",
    "\n",
    "* AllRecipes.com (ar): We get 0.9682, they report 0.9682\n",
    "* FOOD.com (gk): We get 0.9516, they report 0.9519\n",
    "* Both (ar_gk): We get 0.9551, they report 0.9611\n",
    "\n",
    "These are super close.\n",
    "The furthest is `ar_gk` and in the repository they have a separate `ar_gk_train.tsv`; it would be interesting to check whether using it directly gives a closer result and why there is a difference."
   ]
  },
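  {
   "cell_type": "markdown",
   "id": "7b3e9a10",
   "metadata": {},
   "source": [
    "As a quick sanity check on the precision, recall and F1 formulas above, this sketch recomputes them from the `ar` Totals counts in the report (TP=1784, FP=56, FN=61). The helper name `prf1` is my own and not part of Stanford NER, and it assumes the denominators are non-zero.\n",
    "\n",
    "```python\n",
    "def prf1(tp: int, fp: int, fn: int):\n",
    "    # Precision, recall and F1 from entity-level counts\n",
    "    precision = tp / (tp + fp)\n",
    "    recall = tp / (tp + fn)\n",
    "    f1 = 2 * precision * recall / (precision + recall)\n",
    "    return precision, recall, f1\n",
    "\n",
    "# Totals row for the `ar` model from the report above\n",
    "p, r, f1 = prf1(tp=1784, fp=56, fn=61)\n",
    "print(f'P={p:.4f} R={r:.4f} F1={f1:.4f}')\n",
    "```\n",
    "\n",
    "The values agree with the `Totals` row for `ar` to four decimal places, so the report's totals are computed from the pooled counts rather than averaged over entity types."
   ]
  },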
  {
   "cell_type": "markdown",
   "id": "54563997",
   "metadata": {
    "papermill": {
     "duration": 0.0776,
     "end_time": "2022-04-07T13:01:36.149703",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.072103",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Running the model in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb2def70",
   "metadata": {
    "papermill": {
     "duration": 0.076927,
     "end_time": "2022-04-07T13:01:36.304500",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.227573",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can now use these trained models in Python by invoking Stanford NLP with Stanza.\n",
    "\n",
    "First we'll load in the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "0564ac40",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:36.466360Z",
     "iopub.status.busy": "2022-04-07T13:01:36.465286Z",
     "iopub.status.idle": "2022-04-07T13:01:36.487026Z",
     "shell.execute_reply": "2022-04-07T13:01:36.487684Z",
     "shell.execute_reply.started": "2022-04-07T12:52:33.237540Z"
    },
    "papermill": {
     "duration": 0.105083,
     "end_time": "2022-04-07T13:01:36.487888",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.382805",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ar 483\n",
      "gk 1705\n"
     ]
    }
   ],
   "source": [
    "test_data = {}\n",
    "\n",
    "for source in data_sources:\n",
    "    test_data[source] = segment_file(data_filename(source, 'test'))\n",
    "    print(source, len(test_data[source]))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e03b03d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-06T23:58:43.359429Z",
     "iopub.status.busy": "2022-04-06T23:58:43.359055Z",
     "iopub.status.idle": "2022-04-06T23:58:43.365707Z",
     "shell.execute_reply": "2022-04-06T23:58:43.36474Z",
     "shell.execute_reply.started": "2022-04-06T23:58:43.35939Z"
    },
    "papermill": {
     "duration": 0.078031,
     "end_time": "2022-04-07T13:01:36.643468",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.565437",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can call StanfordNLP with our custom model by passing the property `ner.model`.\n",
    "\n",
    "Our test data is already tokenized, and differently from StanfordNLP's default, so we'll set the [Tokenizer](https://stanfordnlp.github.io/CoreNLP/tokenize.html) option for whitespace tokenization, which is easy to invert.\n",
    "\n",
    "It takes a while to start up the server so we want to annotate a large number of texts at once."
   ]
  },
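  {
   "cell_type": "markdown",
   "id": "2f8c6d41",
   "metadata": {},
   "source": [
    "To illustrate why whitespace tokenization is easy to invert, here is a minimal sketch. The helper names `sentence_to_text` and `text_to_tokens` are hypothetical, and it assumes sentences are lists of `(token, tag)` pairs like `ar_train` above and that no token contains a space.\n",
    "\n",
    "```python\n",
    "from typing import List, Tuple\n",
    "\n",
    "def sentence_to_text(sentence: List[Tuple[str, str]]) -> str:\n",
    "    # Join the tokens of one annotated sentence with single spaces\n",
    "    return ' '.join(token for token, _tag in sentence)\n",
    "\n",
    "def text_to_tokens(text: str) -> List[str]:\n",
    "    # Inverse of the join above, valid as long as no token contains a space\n",
    "    return text.split(' ')\n",
    "\n",
    "example = [('2', 'QUANTITY'), ('tablespoons', 'UNIT'), ('vegetable', 'NAME'),\n",
    "           ('oil', 'NAME'), (',', 'O'), ('divided', 'STATE')]\n",
    "text = sentence_to_text(example)\n",
    "# Splitting on spaces recovers exactly the original tokens\n",
    "assert text_to_tokens(text) == [token for token, _tag in example]\n",
    "print(text)\n",
    "```\n",
    "\n",
    "Because the round trip is exact, the predicted tags can be lined up one-to-one with the gold tags for evaluation."
   ]
  },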
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "30730c9e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:36.807245Z",
     "iopub.status.busy": "2022-04-07T13:01:36.806406Z",
     "iopub.status.idle": "2022-04-07T13:01:36.813188Z",
     "shell.execute_reply": "2022-04-07T13:01:36.813766Z",
     "shell.execute_reply.started": "2022-04-07T12:52:33.259904Z"
    },
    "papermill": {
     "duration": 0.09251,
     "end_time": "2022-04-07T13:01:36.813991",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.721481",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from tqdm.notebook import tqdm\n",
    "from stanza.server import CoreNLPClient\n",
    "\n",
    "def annotate_ner(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True):\n",
    "    \"\"\"Annotates texts with a custom NER model using a CoreNLP server\"\"\"\n",
    "    properties = {\n",
    "        \"ner.model\": ner_model_file,\n",
    "        \"tokenize.whitespace\": tokenize_whitespace,\n",
    "        # Without this, numeric tokens come back tagged as NUMBER\n",
    "        \"ner.applyNumericClassifiers\": False,\n",
    "    }\n",
    "\n",
    "    annotated = []\n",
    "    # Starting the server is slow, so annotate all the texts with a single client\n",
    "    with CoreNLPClient(\n",
    "            annotators=['tokenize', 'ssplit', 'ner'],\n",
    "            properties=properties,\n",
    "            timeout=30000,\n",
    "            be_quiet=True,\n",
    "            memory='6G') as client:\n",
    "        for text in tqdm(texts):\n",
    "            annotated.append(client.annotate(text))\n",
    "    return annotated"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "392be996",
   "metadata": {
    "papermill": {
     "duration": 0.077305,
     "end_time": "2022-04-07T13:01:36.971129",
     "exception": false,
     "start_time": "2022-04-07T13:01:36.893824",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can then get the annotations for a few example ingredient texts, using the model trained on `ar`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "b2b52eb9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:37.136508Z",
     "iopub.status.busy": "2022-04-07T13:01:37.135776Z",
     "iopub.status.idle": "2022-04-07T13:01:51.580370Z",
     "shell.execute_reply": "2022-04-07T13:01:51.579773Z",
     "shell.execute_reply.started": "2022-04-07T12:52:33.268568Z"
    },
    "papermill": {
     "duration": 14.527687,
     "end_time": "2022-04-07T13:01:51.580543",
     "exception": false,
     "start_time": "2022-04-07T13:01:37.052856",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9c0da7f9ecde470a8ad513bd82b4e638",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/4 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "annotations = annotate_ner(models['ar'],\n",
    "                           ['1 cup of frozen peas',\n",
    "                            'A dash of salt . Or to taste',\n",
    "                           '12 slices pancetta -LRB- Italian unsmoked cured bacon -RRB-',\n",
    "                           'pumpkin sliced into 3 cm moons'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eb072c0",
   "metadata": {
    "papermill": {
     "duration": 0.077157,
     "end_time": "2022-04-07T13:01:51.736018",
     "exception": false,
     "start_time": "2022-04-07T13:01:51.658861",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Note that the token \"Italian\" shown below has `ner` value `NATIONALITY`, which comes from another model rather than ours (that label wasn't in the training set!).\n",
    "\n",
    "We want to use `coarseNER` instead, which is `O` for this token."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "8f65e5d0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:51.899603Z",
     "iopub.status.busy": "2022-04-07T13:01:51.898653Z",
     "iopub.status.idle": "2022-04-07T13:01:51.902411Z",
     "shell.execute_reply": "2022-04-07T13:01:51.902921Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.398492Z"
    },
    "papermill": {
     "duration": 0.088481,
     "end_time": "2022-04-07T13:01:51.903104",
     "exception": false,
     "start_time": "2022-04-07T13:01:51.814623",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "word: \"Italian\"\n",
       "pos: \"JJ\"\n",
       "value: \"Italian\"\n",
       "originalText: \"Italian\"\n",
       "ner: \"NATIONALITY\"\n",
       "lemma: \"italian\"\n",
       "beginChar: 25\n",
       "endChar: 32\n",
       "tokenBeginIndex: 4\n",
       "tokenEndIndex: 5\n",
       "hasXmlContext: false\n",
       "isNewline: false\n",
       "coarseNER: \"O\"\n",
       "fineGrainedNER: \"NATIONALITY\"\n",
       "entityMentionIndex: 3\n",
       "nerLabelProbs: \"O=0.870902471545891\""
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "annotations[2].sentence[0].token[4]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5113166",
   "metadata": {
    "papermill": {
     "duration": 0.077775,
     "end_time": "2022-04-07T13:01:52.059609",
     "exception": false,
     "start_time": "2022-04-07T13:01:51.981834",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "When I didn't set `\"ner.applyNumericClassifiers\": False`, the token \"3\" shown below would come up as a `NUMBER`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "fa0cfe7d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:52.223603Z",
     "iopub.status.busy": "2022-04-07T13:01:52.222675Z",
     "iopub.status.idle": "2022-04-07T13:01:52.226917Z",
     "shell.execute_reply": "2022-04-07T13:01:52.226383Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.408128Z"
    },
    "papermill": {
     "duration": 0.089989,
     "end_time": "2022-04-07T13:01:52.227089",
     "exception": false,
     "start_time": "2022-04-07T13:01:52.137100",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "word: \"3\"\n",
       "pos: \"CD\"\n",
       "value: \"3\"\n",
       "originalText: \"3\"\n",
       "ner: \"O\"\n",
       "lemma: \"3\"\n",
       "beginChar: 20\n",
       "endChar: 21\n",
       "tokenBeginIndex: 3\n",
       "tokenEndIndex: 4\n",
       "hasXmlContext: false\n",
       "isNewline: false\n",
       "coarseNER: \"O\"\n",
       "fineGrainedNER: \"O\"\n",
       "nerLabelProbs: \"O=0.8599887537555505\""
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "annotations[3].sentence[0].token[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ffc242b",
   "metadata": {
    "papermill": {
     "duration": 0.078642,
     "end_time": "2022-04-07T13:01:52.389005",
     "exception": false,
     "start_time": "2022-04-07T13:01:52.310363",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can then flatten the sentences of each annotation and extract the tokens with their NER tags."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "16628086",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:52.593218Z",
     "iopub.status.busy": "2022-04-07T13:01:52.587639Z",
     "iopub.status.idle": "2022-04-07T13:01:52.596815Z",
     "shell.execute_reply": "2022-04-07T13:01:52.597409Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.420198Z"
    },
    "papermill": {
     "duration": 0.109423,
     "end_time": "2022-04-07T13:01:52.597623",
     "exception": false,
     "start_time": "2022-04-07T13:01:52.488200",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from dataclasses import dataclass, asdict\n",
    "\n",
    "@dataclass\n",
    "class NERData:\n",
    "    ner: List[str]\n",
    "    tokens: List[str]\n",
    "        \n",
    "    # Let's use Pandas to make it pretty in a notebook\n",
    "    def _repr_html_(self):\n",
    "        return pd.DataFrame(asdict(self)).T._repr_html_()\n",
    "\n",
    "def extract_ner_data(annotation) -> NERData:\n",
    "    tokens = [token for sentence in annotation.sentence for token in sentence.token]\n",
    "    return NERData(tokens=[t.word for t in tokens], ner=[t.coarseNER for t in tokens])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41ef7ddd",
   "metadata": {
    "papermill": {
     "duration": 0.079568,
     "end_time": "2022-04-07T13:01:52.761448",
     "exception": false,
     "start_time": "2022-04-07T13:01:52.681880",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "A relatively simple ingredient works well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "9d4dd5cb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:52.923565Z",
     "iopub.status.busy": "2022-04-07T13:01:52.922824Z",
     "iopub.status.idle": "2022-04-07T13:01:52.934377Z",
     "shell.execute_reply": "2022-04-07T13:01:52.934830Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.431673Z"
    },
    "papermill": {
     "duration": 0.094408,
     "end_time": "2022-04-07T13:01:52.935033",
     "exception": false,
     "start_time": "2022-04-07T13:01:52.840625",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ner</th>\n",
       "      <td>QUANTITY</td>\n",
       "      <td>UNIT</td>\n",
       "      <td>O</td>\n",
       "      <td>TEMP</td>\n",
       "      <td>NAME</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tokens</th>\n",
       "      <td>1</td>\n",
       "      <td>cup</td>\n",
       "      <td>of</td>\n",
       "      <td>frozen</td>\n",
       "      <td>peas</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "NERData(ner=['QUANTITY', 'UNIT', 'O', 'TEMP', 'NAME'], tokens=['1', 'cup', 'of', 'frozen', 'peas'])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extract_ner_data(annotations[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f68d4ab6",
   "metadata": {
    "papermill": {
     "duration": 0.080341,
     "end_time": "2022-04-07T13:01:53.095739",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.015398",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "A more complex text made of two sentences does quite badly, perhaps because the model never saw this kind of thing in training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "84662f1c",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:53.259562Z",
     "iopub.status.busy": "2022-04-07T13:01:53.258809Z",
     "iopub.status.idle": "2022-04-07T13:01:53.269051Z",
     "shell.execute_reply": "2022-04-07T13:01:53.269619Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.452987Z"
    },
    "papermill": {
     "duration": 0.094219,
     "end_time": "2022-04-07T13:01:53.269840",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.175621",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ner</th>\n",
       "      <td>QUANTITY</td>\n",
       "      <td>UNIT</td>\n",
       "      <td>NAME</td>\n",
       "      <td>NAME</td>\n",
       "      <td>NAME</td>\n",
       "      <td>NAME</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tokens</th>\n",
       "      <td>A</td>\n",
       "      <td>dash</td>\n",
       "      <td>of</td>\n",
       "      <td>salt</td>\n",
       "      <td>.</td>\n",
       "      <td>Or</td>\n",
       "      <td>to</td>\n",
       "      <td>taste</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'NAME', 'NAME', 'NAME', 'O', 'O'], tokens=['A', 'dash', 'of', 'salt', '.', 'Or', 'to', 'taste'])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extract_ner_data(annotations[1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "e85ccce9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:53.436990Z",
     "iopub.status.busy": "2022-04-07T13:01:53.436294Z",
     "iopub.status.idle": "2022-04-07T13:01:53.445951Z",
     "shell.execute_reply": "2022-04-07T13:01:53.446566Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.468618Z"
    },
    "papermill": {
     "duration": 0.094843,
     "end_time": "2022-04-07T13:01:53.446783",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.351940",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ner</th>\n",
       "      <td>QUANTITY</td>\n",
       "      <td>UNIT</td>\n",
       "      <td>NAME</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>tokens</th>\n",
       "      <td>12</td>\n",
       "      <td>slices</td>\n",
       "      <td>pancetta</td>\n",
       "      <td>-LRB-</td>\n",
       "      <td>Italian</td>\n",
       "      <td>unsmoked</td>\n",
       "      <td>cured</td>\n",
       "      <td>bacon</td>\n",
       "      <td>-RRB-</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "NERData(ner=['QUANTITY', 'UNIT', 'NAME', 'O', 'O', 'O', 'O', 'O', 'O'], tokens=['12', 'slices', 'pancetta', '-LRB-', 'Italian', 'unsmoked', 'cured', 'bacon', '-RRB-'])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extract_ner_data(annotations[2])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d067bdb5",
   "metadata": {
    "papermill": {
     "duration": 0.080591,
     "end_time": "2022-04-07T13:01:53.610955",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.530364",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can chain these functions together to get from text to NER"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "7e1f8e27",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:53.777728Z",
     "iopub.status.busy": "2022-04-07T13:01:53.777006Z",
     "iopub.status.idle": "2022-04-07T13:01:53.782184Z",
     "shell.execute_reply": "2022-04-07T13:01:53.782741Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.490584Z"
    },
    "papermill": {
     "duration": 0.091584,
     "end_time": "2022-04-07T13:01:53.782964",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.691380",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from typing import Dict\n",
    "\n",
    "def ner_extract(ner_model_file: str, texts: List[str], tokenize_whitespace: bool = True) -> List[Dict[str, List[str]]]:\n",
    "    annotations = annotate_ner(ner_model_file, texts, tokenize_whitespace)\n",
    "    return [extract_ner_data(ann) for ann in annotations]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92d193f3",
   "metadata": {
    "papermill": {
     "duration": 0.081116,
     "end_time": "2022-04-07T13:01:53.944944",
     "exception": false,
     "start_time": "2022-04-07T13:01:53.863828",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "And then for each model, and test data we can calculate the predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "2c32ef0a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:01:54.111985Z",
     "iopub.status.busy": "2022-04-07T13:01:54.111261Z",
     "iopub.status.idle": "2022-04-07T13:06:33.038766Z",
     "shell.execute_reply": "2022-04-07T13:06:33.039358Z",
     "shell.execute_reply.started": "2022-04-07T12:52:46.497380Z"
    },
    "papermill": {
     "duration": 279.012135,
     "end_time": "2022-04-07T13:06:33.039789",
     "exception": false,
     "start_time": "2022-04-07T13:01:54.027654",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "26e07430d6f54a9bbd9acea509197ce1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/483 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5ae58c17d7a2462dadf527338d943cea",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1705 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "39c71bb6e43e4330b2b149c25da98d1d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/483 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "652276ddba8640708df4285c1ddf5ff9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1705 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "87190b2caa46403a82d7cb69319b1262",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/483 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "8ab8307799ba406e85852cb620015885",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/1705 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "preds = {}\n",
    "for model, modelfile in models.items():\n",
    "    preds[model] = {}\n",
    "    for test_source, token_data in test_data.items():\n",
    "        texts = [' '.join([x[0] for x in text]) for text in token_data]\n",
    "        preds[model][test_source] = ner_extract(modelfile, texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d2fdccc",
   "metadata": {
    "papermill": {
     "duration": 0.090737,
     "end_time": "2022-04-07T13:06:33.217549",
     "exception": false,
     "start_time": "2022-04-07T13:06:33.126812",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Sanity checks\n",
    "\n",
    "Let's check the same tokens come through the model as were input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "5fc5053e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:33.412009Z",
     "iopub.status.busy": "2022-04-07T13:06:33.410951Z",
     "iopub.status.idle": "2022-04-07T13:06:33.414218Z",
     "shell.execute_reply": "2022-04-07T13:06:33.413529Z",
     "shell.execute_reply.started": "2022-04-07T12:56:31.467543Z"
    },
    "papermill": {
     "duration": 0.109894,
     "end_time": "2022-04-07T13:06:33.414392",
     "exception": false,
     "start_time": "2022-04-07T13:06:33.304498",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "for test_source, token_data in test_data.items():\n",
    "    tokens = [[x[0] for x in tokens] for tokens in token_data]\n",
    "    \n",
    "    for model in models:\n",
    "        model_preds = preds[model][test_source]\n",
    "        \n",
    "        model_tokens = [p.tokens for p in model_preds]\n",
    "        \n",
    "        if tokens != model_tokens:\n",
    "            raise ValueError(\"Tokenization issue in %s with model %s\" % (test_source, model))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "384d097b",
   "metadata": {
    "papermill": {
     "duration": 0.086069,
     "end_time": "2022-04-07T13:06:33.585161",
     "exception": false,
     "start_time": "2022-04-07T13:06:33.499092",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Evaluating\n",
    "\n",
    "Now that we have predictions we can evaulate with [seqeval](https://github.com/chakki-works/seqeval)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "1ecad075",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:33.765716Z",
     "iopub.status.busy": "2022-04-07T13:06:33.764994Z",
     "iopub.status.idle": "2022-04-07T13:06:50.298649Z",
     "shell.execute_reply": "2022-04-07T13:06:50.297970Z",
     "shell.execute_reply.started": "2022-04-07T12:56:31.630845Z"
    },
    "papermill": {
     "duration": 16.624488,
     "end_time": "2022-04-07T13:06:50.298801",
     "exception": false,
     "start_time": "2022-04-07T13:06:33.674313",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting seqeval\r\n",
      "  Downloading seqeval-1.2.2.tar.gz (43 kB)\r\n",
      "     |████████████████████████████████| 43 kB 102 kB/s            \r\n",
      "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l-\b \bdone\r\n",
      "\u001b[?25hRequirement already satisfied: numpy>=1.14.0 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.20.3)\r\n",
      "Requirement already satisfied: scikit-learn>=0.21.3 in /opt/conda/lib/python3.7/site-packages (from seqeval) (1.0.1)\r\n",
      "Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.1.0)\r\n",
      "Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (3.0.0)\r\n",
      "Requirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.21.3->seqeval) (1.7.3)\r\n",
      "Building wheels for collected packages: seqeval\r\n",
      "  Building wheel for seqeval (setup.py) ... \u001b[?25l-\b \b\\\b \b|\b \bdone\r\n",
      "\u001b[?25h  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16181 sha256=117220ab957b2dfbf6fad8b7cf7fb429b409f1fb1b62fef7ea14d20e38b36203\r\n",
      "  Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7\r\n",
      "Successfully built seqeval\r\n",
      "Installing collected packages: seqeval\r\n",
      "Successfully installed seqeval-1.2.2\r\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\r\n"
     ]
    }
   ],
   "source": [
    "!pip install seqeval"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "442daabb",
   "metadata": {
    "papermill": {
     "duration": 0.088059,
     "end_time": "2022-04-07T13:06:50.475096",
     "exception": false,
     "start_time": "2022-04-07T13:06:50.387037",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Seqeval expects the data to be in one of the following formats:\n",
    "\n",
    "* IOB1\n",
    "* IOB2\n",
    "* IOE1\n",
    "* IOE2\n",
    "* IOBES(only in strict mode)\n",
    "* BILOU(only in strict mode)\n",
    "\n",
    "These all become important when trying to distinguish distinct entities that are adjacent; these are quite rare in practice.\n",
    "See Wikipedia for a detailed explanation of [IOB (inside-outside-beginning)](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)).\n",
    "\n",
    "In this case it's assumed there's only one entity of each type (which can be wrong when multiple names are listing in a single ingredient).\n",
    "We can easily convert it to IOB1 using this assumption by prefixing every tag other than 'O' with an 'I-'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "816f82e2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:50.659609Z",
     "iopub.status.busy": "2022-04-07T13:06:50.658619Z",
     "iopub.status.idle": "2022-04-07T13:06:50.661057Z",
     "shell.execute_reply": "2022-04-07T13:06:50.660405Z",
     "shell.execute_reply.started": "2022-04-07T12:56:44.754303Z"
    },
    "papermill": {
     "duration": 0.098039,
     "end_time": "2022-04-07T13:06:50.661208",
     "exception": false,
     "start_time": "2022-04-07T13:06:50.563169",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def convert_to_iob1(tokens):\n",
    "    return ['I-' + label if label != 'O' else 'O' for label in tokens]\n",
    "\n",
    "assert convert_to_iob1(['QUANTITY', 'SIZE', 'NAME', 'NAME', 'O', 'STATE']) == ['I-QUANTITY', 'I-SIZE', 'I-NAME', 'I-NAME', 'O', 'I-STATE']"
   ]
  },
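  {
   "cell_type": "markdown",
   "id": "b7a1c0e1",
   "metadata": {},
   "source": [
    "As a rough illustration of what this assumption costs, here is a minimal sketch of how a run of 'I-' tags is read as entity spans.\n",
    "The helper `spans_from_i_only_tags` and the example tag sequence are hypothetical, written for this note; this is not seqeval's implementation.\n",
    "Two adjacent NAME tokens come out as a single entity, even if they were meant to be two distinct ingredients."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7a1c0e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import groupby\n",
    "\n",
    "def spans_from_i_only_tags(tags):\n",
    "    # Group runs of identical non-'O' tags into (type, start, end) spans.\n",
    "    spans = []\n",
    "    position = 0\n",
    "    for tag, run in groupby(tags):\n",
    "        length = len(list(run))\n",
    "        if tag != 'O':\n",
    "            # Drop the 'I-' prefix to recover the entity type.\n",
    "            spans.append((tag[2:], position, position + length - 1))\n",
    "        position += length\n",
    "    return spans\n",
    "\n",
    "# Hypothetical tags for an ingredient whose last two tokens are two separate names.\n",
    "example_tags = ['I-QUANTITY', 'I-UNIT', 'I-NAME', 'I-NAME']\n",
    "example_spans = spans_from_i_only_tags(example_tags)\n",
    "print(example_spans)"
   ]
  },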
  {
   "cell_type": "markdown",
   "id": "2bd57cb3",
   "metadata": {
    "papermill": {
     "duration": 0.088945,
     "end_time": "2022-04-07T13:06:50.837958",
     "exception": false,
     "start_time": "2022-04-07T13:06:50.749013",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Let's check the classification report for a single example and compare it to the report from StanfordNER.\n",
    "\n",
    "The classification report doesn't have the TP, TN and FN, but instead has the support - the number of true entities in the data.\n",
    "The set of data is equivalent:\n",
    "\n",
    "* support = TP + FN\n",
    "* TP = R * support\n",
    "* FP = TP (1/P - 1)\n",
    "* FN = support - TP\n",
    "\n",
    "The results are the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "9bf3fab1",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:51.024498Z",
     "iopub.status.busy": "2022-04-07T13:06:51.023501Z",
     "iopub.status.idle": "2022-04-07T13:06:52.408718Z",
     "shell.execute_reply": "2022-04-07T13:06:52.407592Z",
     "shell.execute_reply.started": "2022-04-07T12:56:44.760756Z"
    },
    "papermill": {
     "duration": 1.482137,
     "end_time": "2022-04-07T13:06:52.408950",
     "exception": false,
     "start_time": "2022-04-07T13:06:50.926813",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "          DF     1.0000    0.9608    0.9800        51\n",
      "        NAME     0.9297    0.9279    0.9288       499\n",
      "    QUANTITY     1.0000    0.9962    0.9981       524\n",
      "        SIZE     1.0000    1.0000    1.0000        20\n",
      "       STATE     0.9601    0.9633    0.9617       300\n",
      "        TEMP     0.8750    0.7000    0.7778        10\n",
      "        UNIT     0.9819    0.9841    0.9830       441\n",
      "\n",
      "   micro avg     0.9696    0.9669    0.9682      1845\n",
      "   macro avg     0.9638    0.9332    0.9471      1845\n",
      "weighted avg     0.9695    0.9669    0.9682      1845\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from seqeval.metrics import classification_report\n",
    "\n",
    "test_source = 'ar'\n",
    "model = 'ar'\n",
    "\n",
    "actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in test_data[test_source]]\n",
    "pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n",
    "\n",
    "print(classification_report(actual_ner, pred_ner, digits=4))"
   ]
  },
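  {
   "cell_type": "markdown",
   "id": "c3d9f4a1",
   "metadata": {},
   "source": [
    "The identities above can be inverted to recover the counts from the seqeval report.\n",
    "This is a minimal sketch using the NAME row printed above (P = 0.9297, R = 0.9279, support = 499).\n",
    "The function name `counts_from_report` is my own, and the rounding assumes the report's four decimal places are precise enough to pin down whole-number counts; it also assumes the precision is non-zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d9f4a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def counts_from_report(precision, recall, support):\n",
    "    # TP = R * support, FP = TP * (1/P - 1), FN = support - TP\n",
    "    true_positives = round(recall * support)\n",
    "    false_positives = round(true_positives * (1 / precision - 1))\n",
    "    false_negatives = support - true_positives\n",
    "    return true_positives, false_positives, false_negatives\n",
    "\n",
    "# NAME row of the classification report above.\n",
    "name_counts = counts_from_report(0.9297, 0.9279, 499)\n",
    "print('TP=%d FP=%d FN=%d' % name_counts)"
   ]
  },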
  {
   "cell_type": "markdown",
   "id": "58e08983",
   "metadata": {
    "papermill": {
     "duration": 0.08957,
     "end_time": "2022-04-07T13:06:52.587948",
     "exception": false,
     "start_time": "2022-04-07T13:06:52.498378",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can get the micro f1-score directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "c7f7bfd0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:52.771575Z",
     "iopub.status.busy": "2022-04-07T13:06:52.770572Z",
     "iopub.status.idle": "2022-04-07T13:06:52.797700Z",
     "shell.execute_reply": "2022-04-07T13:06:52.798273Z",
     "shell.execute_reply.started": "2022-04-07T12:56:45.725087Z"
    },
    "papermill": {
     "duration": 0.120808,
     "end_time": "2022-04-07T13:06:52.798476",
     "exception": false,
     "start_time": "2022-04-07T13:06:52.677668",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'0.9682'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from seqeval.metrics import f1_score\n",
    "'%0.4f' % f1_score(actual_ner, pred_ner)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ca961cf",
   "metadata": {
    "papermill": {
     "duration": 0.090745,
     "end_time": "2022-04-07T13:06:52.978636",
     "exception": false,
     "start_time": "2022-04-07T13:06:52.887891",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We can then try to reproduce Table IV by computing the f1-score for each model and data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "97fe7bab",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:53.162034Z",
     "iopub.status.busy": "2022-04-07T13:06:53.161354Z",
     "iopub.status.idle": "2022-04-07T13:06:53.462721Z",
     "shell.execute_reply": "2022-04-07T13:06:53.462124Z",
     "shell.execute_reply.started": "2022-04-07T12:56:45.748363Z"
    },
    "papermill": {
     "duration": 0.395147,
     "end_time": "2022-04-07T13:06:53.462918",
     "exception": false,
     "start_time": "2022-04-07T13:06:53.067771",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "scores = {model: {} for model in models}\n",
    "for test_source, data in test_data.items():\n",
    "    actual_ner = [convert_to_iob1([x[1] for x in ann]) for ann in data]\n",
    "    for model in models:\n",
    "        pred_ner = [convert_to_iob1(p.ner) for p in preds[model][test_source]]\n",
    "        scores[model][test_source] = f1_score(actual_ner, pred_ner)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02ace148",
   "metadata": {
    "papermill": {
     "duration": 0.093906,
     "end_time": "2022-04-07T13:06:53.649107",
     "exception": false,
     "start_time": "2022-04-07T13:06:53.555201",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We also need to calculate the scores on the combined test set, by contatenating them"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "d19d3eb8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:53.834560Z",
     "iopub.status.busy": "2022-04-07T13:06:53.833800Z",
     "iopub.status.idle": "2022-04-07T13:06:54.131037Z",
     "shell.execute_reply": "2022-04-07T13:06:54.131536Z",
     "shell.execute_reply.started": "2022-04-07T12:56:45.944156Z"
    },
    "papermill": {
     "duration": 0.392623,
     "end_time": "2022-04-07T13:06:54.131761",
     "exception": false,
     "start_time": "2022-04-07T13:06:53.739138",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "actual_ner = [convert_to_iob1([x[1] for x in ann]) for data in test_data.values() for ann in data]\n",
    "for model in models:\n",
    "    pred_ner = [convert_to_iob1(p.ner) for test_source in test_data for p in preds[model][test_source]]\n",
    "    scores[model]['combined'] = f1_score(actual_ner, pred_ner)"
   ]
  },
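  {
   "cell_type": "markdown",
   "id": "d5e2a7b1",
   "metadata": {},
   "source": [
    "Concatenating matters because seqeval's `f1_score` is micro-averaged by default: it pools the entity counts, so the larger test set dominates.\n",
    "The sketch below uses made-up counts (not taken from this data) and a hypothetical helper `f1_from_counts` to show that the pooled score is generally not the mean of the per-set scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5e2a7b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def f1_from_counts(tp, fp, fn):\n",
    "    # F1 written directly in terms of counts: 2TP / (2TP + FP + FN)\n",
    "    return 2 * tp / (2 * tp + fp + fn)\n",
    "\n",
    "# Hypothetical (TP, FP, FN) counts for a small and a large test set.\n",
    "small_set = (90, 5, 5)\n",
    "large_set = (800, 100, 100)\n",
    "\n",
    "# Micro-averaging adds the counts before computing the score.\n",
    "pooled_counts = tuple(a + b for a, b in zip(small_set, large_set))\n",
    "pooled_f1 = f1_from_counts(*pooled_counts)\n",
    "mean_f1 = (f1_from_counts(*small_set) + f1_from_counts(*large_set)) / 2\n",
    "\n",
    "print('pooled: %0.4f, mean of per-set scores: %0.4f' % (pooled_f1, mean_f1))"
   ]
  },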
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "be344047",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:54.322051Z",
     "iopub.status.busy": "2022-04-07T13:06:54.321346Z",
     "iopub.status.idle": "2022-04-07T13:06:54.399490Z",
     "shell.execute_reply": "2022-04-07T13:06:54.398955Z",
     "shell.execute_reply.started": "2022-04-07T12:56:46.135926Z"
    },
    "papermill": {
     "duration": 0.177398,
     "end_time": "2022-04-07T13:06:54.399653",
     "exception": false,
     "start_time": "2022-04-07T13:06:54.222255",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "</style>\n",
       "<table id=\"T_1db5d_\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th class=\"blank level0\" >&nbsp;</th>\n",
       "      <th class=\"col_heading level0 col0\" >ar</th>\n",
       "      <th class=\"col_heading level0 col1\" >gk</th>\n",
       "      <th class=\"col_heading level0 col2\" >ar_gk</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th id=\"T_1db5d_level0_row0\" class=\"row_heading level0 row0\" >ar</th>\n",
       "      <td id=\"T_1db5d_row0_col0\" class=\"data row0 col0\" >0.9682</td>\n",
       "      <td id=\"T_1db5d_row0_col1\" class=\"data row0 col1\" >0.9331</td>\n",
       "      <td id=\"T_1db5d_row0_col2\" class=\"data row0 col2\" >0.9704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_1db5d_level0_row1\" class=\"row_heading level0 row1\" >gk</th>\n",
       "      <td id=\"T_1db5d_row1_col0\" class=\"data row1 col0\" >0.8666</td>\n",
       "      <td id=\"T_1db5d_row1_col1\" class=\"data row1 col1\" >0.9511</td>\n",
       "      <td id=\"T_1db5d_row1_col2\" class=\"data row1 col2\" >0.9499</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th id=\"T_1db5d_level0_row2\" class=\"row_heading level0 row2\" >combined</th>\n",
       "      <td id=\"T_1db5d_row2_col0\" class=\"data row2 col0\" >0.8911</td>\n",
       "      <td id=\"T_1db5d_row2_col1\" class=\"data row2 col1\" >0.9469</td>\n",
       "      <td id=\"T_1db5d_row2_col2\" class=\"data row2 col2\" >0.9549</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x7fdef2b77ed0>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(scores).style.format('{:0.4f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e61f00f",
   "metadata": {
    "papermill": {
     "duration": 0.090834,
     "end_time": "2022-04-07T13:06:54.582503",
     "exception": false,
     "start_time": "2022-04-07T13:06:54.491669",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The results are *slightly* different to those in the paper, but all agree within 0.01 for each row.\n",
    "\n",
    "So we've successfully reproduced the results in the paper, and shown the evaulation from Stanford NER toolkit is very close to that of seqeval (if you work around hallucinated entities)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "94e67a94",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-04-07T13:06:54.781566Z",
     "iopub.status.busy": "2022-04-07T13:06:54.780354Z",
     "iopub.status.idle": "2022-04-07T13:06:54.784566Z",
     "shell.execute_reply": "2022-04-07T13:06:54.785228Z",
     "shell.execute_reply.started": "2022-04-07T12:56:46.204022Z"
    },
    "papermill": {
     "duration": 0.11174,
     "end_time": "2022-04-07T13:06:54.785439",
     "exception": false,
     "start_time": "2022-04-07T13:06:54.673699",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AllRecipes</th>\n",
       "      <th>FOOD.com</th>\n",
       "      <th>BOTH</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AllRecipes</th>\n",
       "      <td>0.9682</td>\n",
       "      <td>0.9317</td>\n",
       "      <td>0.9709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FOOD.com</th>\n",
       "      <td>0.8672</td>\n",
       "      <td>0.9519</td>\n",
       "      <td>0.9498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BOTH</th>\n",
       "      <td>0.8972</td>\n",
       "      <td>0.9472</td>\n",
       "      <td>0.9611</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            AllRecipes  FOOD.com    BOTH\n",
       "AllRecipes      0.9682    0.9317  0.9709\n",
       "FOOD.com        0.8672    0.9519  0.9498\n",
       "BOTH            0.8972    0.9472  0.9611"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reported_scores = pd.DataFrame([[0.9682, 0.9317, 0.9709],\n",
    "              [0.8672, 0.9519, 0.9498],\n",
    "              [0.8972, 0.9472, 0.9611]],\n",
    "             columns = ['AllRecipes', 'FOOD.com', 'BOTH'],\n",
    "             index = ['AllRecipes', 'FOOD.com', 'BOTH'])\n",
    "reported_scores"
   ]
  }
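  ,
  {
   "cell_type": "markdown",
   "id": "e8f1b6c1",
   "metadata": {},
   "source": [
    "To make the comparison concrete, here is a small sketch that copies the two tables above into plain lists and reports the largest absolute difference.\n",
    "It assumes both tables use the same layout, with ar lining up with AllRecipes, gk with FOOD.com, and combined / ar_gk with BOTH; that is my reading of the labels rather than something checked against the paper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8f1b6c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scores computed with seqeval, copied from the table displayed above.\n",
    "computed = [\n",
    "    [0.9682, 0.9331, 0.9704],\n",
    "    [0.8666, 0.9511, 0.9499],\n",
    "    [0.8911, 0.9469, 0.9549],\n",
    "]\n",
    "# Scores entered from the paper in reported_scores.\n",
    "reported = [\n",
    "    [0.9682, 0.9317, 0.9709],\n",
    "    [0.8672, 0.9519, 0.9498],\n",
    "    [0.8972, 0.9472, 0.9611],\n",
    "]\n",
    "\n",
    "# Entry-by-entry absolute differences, assuming the layouts match.\n",
    "differences = [\n",
    "    [abs(c - r) for c, r in zip(computed_row, reported_row)]\n",
    "    for computed_row, reported_row in zip(computed, reported)\n",
    "]\n",
    "max_difference = max(max(row) for row in differences)\n",
    "print('largest difference: %0.4f' % max_difference)"
   ]
  }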
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 567.381694,
   "end_time": "2022-04-07T13:06:56.199628",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2022-04-07T12:57:28.817934",
   "version": "2.3.3"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {
     "04e4948e123444569b4b09d696e122ea": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "05cf3b36eb1e4ddb9c4f9b675fdd6b1d": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "081d02fab37a4e29bd2531213f508808": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_f688a6aae02749a6874511e7c4d0da95",
       "max": 1705.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_a6e2ba7f52a042fb8f3595b50c406bc6",
       "value": 1705.0
      }
     },
     "0b95b19116f44d5dae17156430a708ec": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "1455809b8e1945e0bebdc8ea8adcc87d": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "15dca12ce1ee4b8ca4ec62aa8864f8c7": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "171199414c5a4954b0ac767628554d8b": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "183452cec8804fd5903953436a43bf20": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_e4473ff688a9440690198f9e4cf6858e",
       "placeholder": "​",
       "style": "IPY_MODEL_2474d59c1f9d47d4b59ffef46d4f8443",
       "value": " 483/483 [00:29&lt;00:00, 41.45it/s]"
      }
     },
     "19c7b297a53444608e29aa3fcce236b9": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "1a8c3ed5abda483baee14f7c855449cd": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_337d520658594202b091a3a25a60ab2c",
       "placeholder": "​",
       "style": "IPY_MODEL_46133dd43dca46eba08636f28973f190",
       "value": " 483/483 [00:28&lt;00:00, 37.52it/s]"
      }
     },
     "1fe61ae43c2f4a9b93b25d6c60204c8a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "20145d12fdbb45a8a3c7ed1126260403": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "214063d8f2bf4d2f854bdfaa75dd3469": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_661e0655ba4b41139b1be60f73f4ea87",
       "placeholder": "​",
       "style": "IPY_MODEL_385d18bccb9245548deed2a3b554525c",
       "value": " 1705/1705 [01:00&lt;00:00, 36.52it/s]"
      }
     },
     "21a60b9d20784d38afdbdbdf9fa74f78": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "2474d59c1f9d47d4b59ffef46d4f8443": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "263283c3d0764d38b9c3d1ddb6c8b427": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_20145d12fdbb45a8a3c7ed1126260403",
       "placeholder": "​",
       "style": "IPY_MODEL_a7eb0e99aee4463785eea58d0cc847af",
       "value": " 1705/1705 [01:00&lt;00:00, 34.91it/s]"
      }
     },
     "26e07430d6f54a9bbd9acea509197ce1": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_4191ab859c984f81a50aeb218fc1ac5a",
        "IPY_MODEL_c03a18909027459681be5a9113e88a0a",
        "IPY_MODEL_1a8c3ed5abda483baee14f7c855449cd"
       ],
       "layout": "IPY_MODEL_f04f3af815f24025b341de0249819145"
      }
     },
     "28efee3ebc4b44e4a2411844af855e92": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_53c764a85af5434eabf8513c4a5e8e6f",
       "placeholder": "​",
       "style": "IPY_MODEL_9ad75aea6d7341dba7d6e56cf625c10b",
       "value": "100%"
      }
     },
     "2a539d7b18e4425abbd7ae22e76a26fa": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "2b5a6815608b41d69923bd5e845385a3": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "316de487dcb041e9aef79533ede9ab20": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "337d520658594202b091a3a25a60ab2c": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "345e4d287bd24ae9975f3be4e2027c4c": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "385d18bccb9245548deed2a3b554525c": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "38d19c56c37947469a993acb7f0213a0": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "39c71bb6e43e4330b2b149c25da98d1d": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_28efee3ebc4b44e4a2411844af855e92",
        "IPY_MODEL_8bc62d6d2ec34b848127dfe4d57b8c0e",
        "IPY_MODEL_a9267cfc2fb54d408ab6377e7cf9a2cb"
       ],
       "layout": "IPY_MODEL_ec60bd5d40374f4296bf13b4d1a6e957"
      }
     },
     "3cc71d45aca94e189e597f9532b0ff83": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "40754c3ca8c049eeaf44280e861bb455": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_2a539d7b18e4425abbd7ae22e76a26fa",
       "placeholder": "​",
       "style": "IPY_MODEL_345e4d287bd24ae9975f3be4e2027c4c",
       "value": "Downloading https://huggingface.co/stanfordnlp/CoreNLP/resolve/main/stanford-corenlp-latest.zip: 100%"
      }
     },
     "4191ab859c984f81a50aeb218fc1ac5a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_4827280778c24f8ca7bfc579f2bc0806",
       "placeholder": "​",
       "style": "IPY_MODEL_5fe9b045a4014a4a80ab8190de207966",
       "value": "100%"
      }
     },
     "42e7709bb52842e999ccf9a7a385973a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_aacee86f941e404b8c9f0345c0e85a94",
       "placeholder": "​",
       "style": "IPY_MODEL_04e4948e123444569b4b09d696e122ea",
       "value": "100%"
      }
     },
     "44ca9e6935704d00925e55eaf8f5e5ec": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_38d19c56c37947469a993acb7f0213a0",
       "placeholder": "​",
       "style": "IPY_MODEL_0b95b19116f44d5dae17156430a708ec",
       "value": "100%"
      }
     },
     "46133dd43dca46eba08636f28973f190": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "47ef62581c17433e93855d6789befc2b": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_21a60b9d20784d38afdbdbdf9fa74f78",
       "placeholder": "​",
       "style": "IPY_MODEL_ae9e033f42884508ac8a7f49f6e62c2a",
       "value": "100%"
      }
     },
     "4827280778c24f8ca7bfc579f2bc0806": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "4b524361ba0947b9af302faa8518050f": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_73fc8072162e45de89abf389fa5d4b90",
       "max": 1705.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_5012502d87f24cd192200979e56af080",
       "value": 1705.0
      }
     },
     "5012502d87f24cd192200979e56af080": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "53c764a85af5434eabf8513c4a5e8e6f": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "555a9bda2a024e9a8a9abc444e4fe4cf": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_15dca12ce1ee4b8ca4ec62aa8864f8c7",
       "max": 505207915.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_730961bd290541e9a01938ca4876beb9",
       "value": 505207915.0
      }
     },
     "5ae58c17d7a2462dadf527338d943cea": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_47ef62581c17433e93855d6789befc2b",
        "IPY_MODEL_757532f367d248dcb369c157e5565861",
        "IPY_MODEL_263283c3d0764d38b9c3d1ddb6c8b427"
       ],
       "layout": "IPY_MODEL_76e64492e3d7480aa2d96bfc56f3cfa8"
      }
     },
     "5cb06a46903341a9b50676a31addad14": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "5fe9b045a4014a4a80ab8190de207966": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "652276ddba8640708df4285c1ddf5ff9": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_8e12508157c44fbcbb77aac67bc12549",
        "IPY_MODEL_4b524361ba0947b9af302faa8518050f",
        "IPY_MODEL_71093fc9838b4e9ea8a1fbeda6150111"
       ],
       "layout": "IPY_MODEL_2b5a6815608b41d69923bd5e845385a3"
      }
     },
     "661e0655ba4b41139b1be60f73f4ea87": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "67197e8331f74dd1a7ae96d9d4ee7490": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_75992a1df6f6420e8bc06349c0a99076",
       "max": 483.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_93654bad7fcf4739bb54a6b45c60c5ed",
       "value": 483.0
      }
     },
     "6a9d691c650c4d779882190429cbe86b": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_05cf3b36eb1e4ddb9c4f9b675fdd6b1d",
       "placeholder": "​",
       "style": "IPY_MODEL_5cb06a46903341a9b50676a31addad14",
       "value": " 505M/505M [00:30&lt;00:00, 17.9MB/s]"
      }
     },
     "6fc700ddba53400b889b51216731058b": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "71093fc9838b4e9ea8a1fbeda6150111": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_8594609c694247669c25c78c8ded0342",
       "placeholder": "​",
       "style": "IPY_MODEL_b9e45e43c1ae4ff483e64abefa8ed8de",
       "value": " 1705/1705 [01:02&lt;00:00, 37.06it/s]"
      }
     },
     "730961bd290541e9a01938ca4876beb9": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "73fc8072162e45de89abf389fa5d4b90": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "757532f367d248dcb369c157e5565861": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_acb9785ff9dc422bb40cb371d4180419",
       "max": 1705.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_c7ab303608734bf18388966fda9a2d77",
       "value": 1705.0
      }
     },
     "75992a1df6f6420e8bc06349c0a99076": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "75b56360c2064ec2b7b5d7d4bdc22d42": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_c84f6aba9d924d5ba166018593531c9a",
       "placeholder": "​",
       "style": "IPY_MODEL_cdf8f8e1213e4056aff15837fd6cd1b8",
       "value": " 4/4 [00:13&lt;00:00,  3.62s/it]"
      }
     },
     "76e64492e3d7480aa2d96bfc56f3cfa8": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "7f60ca33e67449c1bab4a7c52e290d35": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "8594609c694247669c25c78c8ded0342": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "87190b2caa46403a82d7cb69319b1262": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_44ca9e6935704d00925e55eaf8f5e5ec",
        "IPY_MODEL_67197e8331f74dd1a7ae96d9d4ee7490",
        "IPY_MODEL_183452cec8804fd5903953436a43bf20"
       ],
       "layout": "IPY_MODEL_c12c8164a6f2450b804b2f350ed2d580"
      }
     },
     "8ab8307799ba406e85852cb620015885": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_42e7709bb52842e999ccf9a7a385973a",
        "IPY_MODEL_081d02fab37a4e29bd2531213f508808",
        "IPY_MODEL_214063d8f2bf4d2f854bdfaa75dd3469"
       ],
       "layout": "IPY_MODEL_171199414c5a4954b0ac767628554d8b"
      }
     },
     "8bc62d6d2ec34b848127dfe4d57b8c0e": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_7f60ca33e67449c1bab4a7c52e290d35",
       "max": 483.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_99a70625c1a5402e9214a7dc5ed53cd1",
       "value": 483.0
      }
     },
     "8e12508157c44fbcbb77aac67bc12549": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_e6207369245244299c4e72cab9ee9e6f",
       "placeholder": "​",
       "style": "IPY_MODEL_316de487dcb041e9aef79533ede9ab20",
       "value": "100%"
      }
     },
     "92d594f090a64ae0994bb4e7e59a362b": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_1455809b8e1945e0bebdc8ea8adcc87d",
       "max": 4.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_3cc71d45aca94e189e597f9532b0ff83",
       "value": 4.0
      }
     },
     "93654bad7fcf4739bb54a6b45c60c5ed": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "94f4b01c55604c8ead1ebc742ec981dc": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "99a70625c1a5402e9214a7dc5ed53cd1": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "9ad75aea6d7341dba7d6e56cf625c10b": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "9c0da7f9ecde470a8ad513bd82b4e638": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_eaa545004794465bb69a876a78e8386c",
        "IPY_MODEL_92d594f090a64ae0994bb4e7e59a362b",
        "IPY_MODEL_75b56360c2064ec2b7b5d7d4bdc22d42"
       ],
       "layout": "IPY_MODEL_beca40aef11d4d4fbcc39c9f54709889"
      }
     },
     "a6e2ba7f52a042fb8f3595b50c406bc6": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "a7eb0e99aee4463785eea58d0cc847af": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "a9267cfc2fb54d408ab6377e7cf9a2cb": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_6fc700ddba53400b889b51216731058b",
       "placeholder": "​",
       "style": "IPY_MODEL_d9376b3ffa6741409f6218d1041020cf",
       "value": " 483/483 [00:34&lt;00:00, 38.36it/s]"
      }
     },
     "aacee86f941e404b8c9f0345c0e85a94": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "acb9785ff9dc422bb40cb371d4180419": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "ae9e033f42884508ac8a7f49f6e62c2a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "b90172ce78ce4c519efa02f06f3c6835": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_40754c3ca8c049eeaf44280e861bb455",
        "IPY_MODEL_555a9bda2a024e9a8a9abc444e4fe4cf",
        "IPY_MODEL_6a9d691c650c4d779882190429cbe86b"
       ],
       "layout": "IPY_MODEL_fbe7eab600d2483e98a914ba761c293b"
      }
     },
     "b9e45e43c1ae4ff483e64abefa8ed8de": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "beca40aef11d4d4fbcc39c9f54709889": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "c03a18909027459681be5a9113e88a0a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_19c7b297a53444608e29aa3fcce236b9",
       "max": 483.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_1fe61ae43c2f4a9b93b25d6c60204c8a",
       "value": 483.0
      }
     },
     "c12c8164a6f2450b804b2f350ed2d580": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "c7ab303608734bf18388966fda9a2d77": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "c84f6aba9d924d5ba166018593531c9a": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "cda950e3c9bc4afdb263d7e294ad4276": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "cdf8f8e1213e4056aff15837fd6cd1b8": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "d9376b3ffa6741409f6218d1041020cf": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "DescriptionStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "DescriptionStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "StyleView",
       "description_width": ""
      }
     },
     "e4473ff688a9440690198f9e4cf6858e": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "e6207369245244299c4e72cab9ee9e6f": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "eaa545004794465bb69a876a78e8386c": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "1.5.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "1.5.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "1.5.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_tooltip": null,
       "layout": "IPY_MODEL_94f4b01c55604c8ead1ebc742ec981dc",
       "placeholder": "​",
       "style": "IPY_MODEL_cda950e3c9bc4afdb263d7e294ad4276",
       "value": "100%"
      }
     },
     "ec60bd5d40374f4296bf13b4d1a6e957": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "f04f3af815f24025b341de0249819145": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "f688a6aae02749a6874511e7c4d0da95": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "fbe7eab600d2483e98a914ba761c293b": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "1.2.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "1.2.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "1.2.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "overflow_x": null,
       "overflow_y": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     }
    },
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}