{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Progress report 1\n",
"\n",
"*Asura Enkhbayar, 09.03.2020*\n",
"\n",
"This report covers intermediate results for:\n",
"\n",
"- **Citation parsing** of the unstructured references in the input dataset, to retrieve identifiers and other metadata\n",
"- **Crossref queries** using the unstructured references from the original dataset\n",
"- Additional **NCBI identifiers** queried with the DOIs retrieved from Crossref\n",
"- **Altmetric counts** for articles with DOIs retrieved from Crossref"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import os\n",
"from pathlib import Path\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"from tracking_grants import project_dir, data_dir\n",
"from tracking_grants import CR_THRESH"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Articles after processing with anystyle\n",
"articles = pd.read_csv(data_dir / \"interim/structured.csv\", index_col=\"article_id\")\n",
"\n",
"# External data from CR/Altmetric\n",
"crossref = pd.read_csv(data_dir / \"interim/_crossref.csv\", index_col=\"article_id\", low_memory=False)\n",
"altmetric = pd.read_csv(data_dir / \"interim/_altmetric.csv\", index_col=\"article_id\", low_memory=False)\n",
"ncbi = pd.read_csv(data_dir / \"interim/_ncbi.csv\", index_col=\"article_id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Citation Parsing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our input dataset contains unstructured references in the form of strings that were typed in by the original authors.\n",
"\n",
"Using [anystyle](https://github.com/inukshuk/anystyle) we can attempt to retrieve DOIs, PMIDs, PMCIDs, and other structured metadata from these strings."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>DOI</th><td>546</td><td>2.9</td></tr>\n",
"    <tr><th>PMID</th><td>163</td><td>0.9</td></tr>\n",
"    <tr><th>PMCID</th><td>96</td><td>0.5</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Articles with % (n=18708)\n",
"DOI 546 2.9\n",
"PMID 163 0.9\n",
"PMCID 96 0.5"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = articles[['DOI', 'PMID', 'PMCID']].count().to_frame('Articles with')\n",
"x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n",
"x.round(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**As we can see, citation parsing extracted identifiers for only a small fraction of the input dataset (DOIs for 2.9% of articles), too few to be useful on their own.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Crossref results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Crossref provides an API endpoint for textual searches using references. We are using that endpoint and retrieving the best candidate for each query. The results contain a score which refers to the quality of the match.\n",
"\n",
"We are currently using 80 as the threshold for that score. We are still hoping to get in touch with one developer at Crossref who has been working on citation matching."
]
},
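{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of this lookup, the sketch below builds a request URL for the Crossref REST API and picks the top candidate from a parsed response. It assumes the `/works` route with the `query.bibliographic` parameter, which may differ from the endpoint used in this project. `build_query_url`, `best_candidate`, and `fake_response` are hypothetical names, and the response is hand-written rather than real API output.\n",
"\n",
"```python\n",
"from urllib.parse import urlencode\n",
"\n",
"CROSSREF_WORKS = 'https://api.crossref.org/works'\n",
"\n",
"\n",
"def build_query_url(reference, rows=1):\n",
"    # query.bibliographic is Crossref's free-text field for citation strings\n",
"    params = {'query.bibliographic': reference, 'rows': rows}\n",
"    return CROSSREF_WORKS + '?' + urlencode(params)\n",
"\n",
"\n",
"def best_candidate(response):\n",
"    # Return (DOI, score) of the first item, or (None, None) if there is none\n",
"    items = response.get('message', {}).get('items', [])\n",
"    if not items:\n",
"        return None, None\n",
"    top = items[0]\n",
"    return top.get('DOI'), top.get('score')\n",
"\n",
"\n",
"# Hand-written stand-in for a parsed JSON response (not real API output)\n",
"fake_response = {'message': {'items': [{'DOI': '10.1000/example', 'score': 93.4}]}}\n",
"\n",
"url = build_query_url('Doe J. An example article. J Examples. 2015;1(2):3-4.')\n",
"doi, score = best_candidate(fake_response)\n",
"```\n",
"\n",
"Only a candidate whose score reaches the chosen threshold would then be kept as a match."
]
},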
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"80"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CR_THRESH"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following plot shows the distribution of matching scores including the score of 80 which I have currently chosen based on some prelim experimentation and manual inspection of random articles."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFNCAYAAAD7De1wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8GearUAAAgAElEQVR4nOzde3xU1b3//9fccr8nMwmEqxduIqJFRVRsPdIIJZUD+tAHnMbab7HH09qKX/laBeqt1l74Fbwc21OsPdXCOXDoEUq1ES9Vq6AFVEAJyMUkkMBM7skkk8lc9u+PkCmRhCSQZC55Px/yMLPX3ns+KzOTz6y1117LZBiGgYiIiEQlc7gDEBERkbOnRC4iIhLFlMhFRESimBK5iIhIFFMiFxERiWJK5CIiIlFMiVykB4FAgN/97nfMnz+fm266iTlz5vCLX/yCtra2cIfWyUMPPcT111/PqlWr+v3c//M//8PatWvPuM/TTz/No48+2mXZ4sWLOXToUL/HJSJgDXcAIpHu4YcfpqGhgd///vekpqbS0tLCfffdx7Jly/jFL34R7vBC1q9fz1tvvUVeXl6/n3vXrl1ceOGFZ338mjVr+jEaETmVErnIGRw9epQtW7bw7rvvkpKSAkBSUhKPPPIIH330EQA//OEPqa+v5+jRo3z5y1/mX//1X3nkkUfYv38/JpOJa6+9lnvvvRer1cpTTz3Fa6+9hs1mIzMzkyeeeAKHw9Ht9smTJ/NP//RP7N+/n5UrV5KUlMTjjz9OfX09gUCAb3zjG9x8880sXLgQwzBYvHgxDz30ENOmTQvV4emnn6a8vJyjR4/icrmYMmUKV199NZs2beLYsWMsXbqUuXPnUl1dzY9+9CNqamqoqqoiPz+f1atX8+GHH/Lmm2/y3nvvkZCQwK233sovfvEL3nrrLSwWC5deeikPPfQQAEeOHOEb3/gGVVVV5OTk8Mtf/hKHw8H111/Pk08+SUtLC6tWrWLkyJEcPHiQtrY2fvSjHzF9+nRqa2t54IEHKC8vJyMjA7vdzoUXXsjdd9/d6TXZuXMnP/3pTwkGgwB85zvfoaCggObmZn784x/z4YcfYrFYuOGGG1iyZAlut7vb16O3v9/m5mYeeOABysrKMJvNXHTRRTz66KOYzerUlAhgiEi3iouLjQULFpxxn/vvv9+4/fbbQ4//3//7f8Zjjz1mBINBw+v1Gt/61reM//iP/zAqKyuNyy67zPB6vYZhGMZvf/tb47XXXut2u2EYxrhx44yXXnrJMAzD8Pl8xpw5c4xPPvnEMAzDaGxsNGbPnm189NFHoX1rampOi++pp54yvvKVrxiNjY2Gx+MxLr/8cuOJJ54wDMMwXnvtNeOrX/2qYRiG8Z//+Z/Gf/zHfxiGYRjBYND49re/bfz2t78N1fG5554zDMMwfv/73xuLFi0yPB6PEQgEjB/84AfGSy+9ZDz11FPG9ddfH4rhrrvuMp555hnDMAzjK1/5irFnzx7j/fffNyZOnGjs27cvVNdFixYZhmEYS5YsMX7+858bhmEYTqfTuPrqq42nnnrqtPoUFRUZf/7znw3DMIySkhLj4YcfNgzDMH7yk58YS5YsMfx+v+H1eo1FixYZ77//frevR19+vy+99JLxrW99yzAMw/D7/cayZcuM0tLS7t8UIoNILXKRMzCbzaGW35l86UtfCv38zjvv8F//9V+YTCbi4uK47bbb+P3vf8+3v/1tJkyYwD//8z8zc+ZMZs6cyVVXXUUwGOxye4eO1nVpaSnl5eU8+OCDobLW1lb27dvH1KlTzxjfjBkzSE1NBcDhcHDttdcCMGrUKOrr6wG4/fbb2blzJ7/73e8oLS3l4MGDXHLJJaeda9u2bdx0000kJCQAsHr1aqC95X/11VeTlZUFwIQJE6itrT3t+OHDhzNx4kQAJk2axEsvvQTA22+/HfrZ4XBw4403dlmX2bNn8+
ijj/Lmm28yY8YM7r333lBcDzzwABaLBYvFwh/+8AcA7rnnni5fjzvvvLPXv99rr72WVatW8Y1vfIMZM2Zw++23M3r06DP+zkUGixK5yBlMmTKFI0eO4Ha7Q13rAE6nkxUrVvDUU08B7d3tHb6Y+IPBIH6/H7PZzB/+8Af27t3L9u3b+clPfsKVV17J8uXLu91+6rkDgQBpaWls3rw5dO7q6upQgj6TuLi4To+t1tM/+r/4xS/Ys2cPCxYs4Morr8Tv92N0sRTDF4+trq4O1fnUMpPJ1OXxHV8AvriP1WrttH933da33XYbX/nKV3jvvff429/+xjPPPMOf/vQnrFYrJpMptN/x48dJSEjo9vXo0Jvfb3x8PK+99hoffPAB77//PnfccQfLly/v9suGyGDSBR6RM8jNzaWwsJAHH3wQt9sNgNvt5uGHHyYjI6NTUupwzTXXsHbtWgzDoK2tjQ0bNjBjxgz279/P3LlzOf/88/nOd77DN7/5TQ4cONDt9i8aO3Ys8fHxoURz/Phx5s6dyyeffNIvdX333Xe5/fbbmTdvHtnZ2Wzbto1AIACAxWIJJb+rrrqKP//5z7S1tREMBnn44Yd5+eWXz/n5r7vuOjZu3AhAXV0dr7/+eqfE3OG2226jpKSE+fPn89hjj9HY2EhDQwNXXXUVL730EsFgkLa2Nr7//e+zY8eObl+PLzrT73fdunU88MADXHPNNSxdupRrrrmGgwcPnnOdRfqDWuQiPXjooYd49tlnue2227BYLLS1tXHDDTecNgirw/Lly/nxj39MYWEhPp+Pa6+9ln/9138lLi6O2bNns2DBApKSkkhISGD58uVMmDChy+1fFBcXx7PPPsvjjz/Oc889h9/v5wc/+EGnbv1z8d3vfpef//znPPvss1gsFi677DLKy8sBmDlzJo899hgA3/72t6moqGD+/PkYhsEVV1zBN77xDX71q1+d0/M/8MADLF++nMLCQjIyMhg+fHiXX5Tuu+8+fvKTn7B69WrMZjPf+973GDFiBN/73vd4/PHHuemmmwgEAsyZM4evfvWrXH755V2+Hl90pt/vxIkT+fvf/86cOXNITExk+PDhFBUVnVN9RfqLyeiq70tEZJCtXbuWSZMmcemll9LW1sbChQu5++67ue6668IdmkhEU4tcRCLCBRdcwGOPPUYwGMTn83HjjTcqiYv0glrkIiIiUUyD3URERKKYErmIiEgUUyIXERGJYkrkIiIiUSwqR63X1TUTDPZ+jF52dgo1Ne4BjCj8VMfYoDrGhqFQRxga9YyEOprNJjIzk7stj8pEHgwafUrkHcfEOtUxNqiOsWEo1BGGRj0jvY7qWhcREYliSuQiIiJRTIlcREQkiimRi4iIRDElchERkSimRC4iIhLFlMhFRESimBK5iIhIFFMiFxERiWJRObObxB5/ELw+f7fl8Ta9VUVEuqK/jhIRvD4/O0qc3ZZfPjF3EKMREYke6loXERGJYkrkIiIiUUyJXEREJIrpGrlEpGaPj5KyOlq8fjxeP8UflJOcaOP2gvGMyk0Nd3giIhFDLXKJSNs/PUFJWR3V9a0YBuTnJNPY3MZP137Ip6W14Q5PRCRiqEUuEafB7aWyuoWpF+Yw5fxsoH3UekpKAit+/R6rN+zmW3MmctXkvDBHKiISfmqRS8QpKavHbDYxbmR6p+05GYn8cNGXuHBEOmv+vI+Xt5diGAb+IDR7/d3+8wfDUw8RkcHQq0S+ZcsW5syZw6xZs1i7du1p5SUlJSxYsICCggKWLVuG3995Yo8nn3ySp59++rTjTpw4wRVXXMGxY8fOMnyJNV5fgCOVDYwdlkpC3OkdRkkJVu69dSrTJ+Xyx7eP8F+vH8TT5mNHibPbf2eaaEZEJNr1mMidTierVq1i3bp1bN68mfXr13Po0KFO+yxdupQVK1bw6quvYhgGGzZsAKCpqYkHH3yQ559//rTzBoNBli1bhs/n66eqSCw4eKwBf8Bg4u
jMbvexWsx8u3ASX718JK/vOsaLxQcIBo1BjFJEJHL0mMi3bdvG9OnTycjIICkpiYKCAoqLi0PlFRUVtLa2MnXqVADmz58fKn/jjTcYM2YMd9xxx2nnfe6555gxYwaZmd3/wZahJRg0OFBWR25WIllpCWfc12wycev1F7DguvPYud/FXz+qwB9QH7qIDD09DnZzuVzY7fbQY4fDwZ49e7ott9vtOJ3tU23OmzcP4LRu9U8++YQPPviANWvWdNlV35Ps7JQ+H2O3x/4tS9FcR6O2hapGL82tfmZeOoLUlM6JPCkpHji9jt/8+sUkJ8Xzwl9K+OuHFcy99jzirJbTjrVnJQ1sBfpRNL+OvaU6xo6hUM9Ir2OPidwwTu+yNJlMvS7/Io/Hw6OPPsrq1asxm89urF1NjbtPXal2eypVVU1n9VzRItrr2OL189EBFymJNrLT4mhyt3Yq97S24aqFlhbvacd+abyd0uMNvPNxJdt2V/Kl8fZO5S0tXqoCgQGNv79E++vYG6pj7BgK9YyEOprNpjM2YHtM5Lm5uezcuTP02OVy4XA4OpVXV1eHHldVVXUq/6KdO3dSXV3NXXfdFTrfnXfeyTPPPMN5553XUzgSo8qdTbjqPEybYMfcxRdBry9ASZnrtAQPcMk4O2PyUqnIT6OktI5xI9NJTYobjLBFRMKuxybxjBkz2L59O7W1tXg8HrZu3crMmTND5fn5+cTHx7Nr1y4ANm3a1Kn8i6699lrefPNNNm/ezObNm3E4HPzmN79REh/i3ttzHKvFxAX56T3v3I1LL7RjNsPO/VX9GJmISGTrMZHn5uayZMkSioqKmDdvHnPnzmXKlCksXryYvXv3ArBy5UqeeOIJZs+ejcfjoaioaMADl9hhGAYlZXUMz0kmzmbp+YBuJCVYufi8bI663Byvae7HCEVEIlevZnYrLCyksLCw07Y1a9aEfp4wYQIbN27s9vi7776727I333yzNyFIDKuq91DX5OXCEWffGu8waUwmB481sKPExdwZYzCbux+vISISCzSzm4RdSVkdAMOyz31kucVi5kvj7dS72zh4rP6czyciEumUyCXsSsrqSEuOIy25fwaojcpNITczkY8P1uD1RcdodRGRs6VELmFlGAb7y+oYPzLjjLct9oXJZGLaRAdeX4DDFQ39ck4RkUilRC5hVVHdTGOLj3GjMvr1vNlpCaQnx1FRpUFvIhLblMglrEpK26+PjxvZv4kcIN+ejLO2hdY2LZoiIrFLiVzCqqSsDkdGz3Orn40R9hSCBhwo16A3EYldSuQSNoFgkANH65hwhpXOzoUjMxGb1cynn9cOyPlFRCKBErmETdkJNx5vgEljBiaRm80mhuck8+nntQS7WBNARCQWKJFL2JSUtbeUJ4wauKVsR9iTaWxuo9wZ2ws7iMjQpUQuYVNSVscIe3K/3T/eleE5yZiAPYdqBuw5RETCSYlcwsLnD3LwWMOAXR/vkBhvZXReKrsPK5GLSGxSIpewOFLZgM8fZNLorAF/rovGZlF6vJHG5rYBfy4RkcGmRC5hsa+0DpNpYO4f/6KLxmZhAHuPqFUuIrFHiVzC4kB5HWPyUklK6NUCfOdkhCOF9JQ4da+LSExSIpdB5w8EKT3RxAX5A98ah/a516ecl82nn9fgDwQH5TlFRAaLErkMuqMuN23+IOfnpw3ac045PwePN8ChY1pERURiixK5DLojlY0AXJCfPmjPOWlMJmaTiX0n1z4XEYkVA3+BUuQkfxC8Pj8HjtaRnhxHXJyFZm/7gibBAZ54LTHeyvCcZEpPNA7sE4mIDDIlchk0Xp+fHSVO9pfVk5UWz879rlDZJePsA/a8JrOJZq+fEY5k9h6uwd3qC619Hm+zYlW/lIhEMSVyGVQerx+3x8f4fl5//Ey8vgC7P6vCMKC51c9bH1aQkmQD4PKJuVjj9TEQkeiltogMqqp6DwD2jP5ftrQn2entz1nT2Drozy0iMlCUyGVQVd
W3YjZB9gCsP96TzNQ4zCaoblAiF5HYoUQug6q63kNWWgIWy+C/9SxmM5mpCdQokYtIDFEil0ETCBpUN7Riz0gMWwzZ6fHUNLZiaH1yEYkRSuQyaCqr3ASCBjlhuD7eITs9AZ8/SFOLL2wxiIj0JyVyGTSfH28CCG+L/OS1eV0nF5FY0atEvmXLFubMmcOsWbNYu3btaeUlJSUsWLCAgoICli1bht/v71T+5JNP8vTTT4ceHz58mIULF3LTTTdx6623UlJSco7VkGjw+fFGEuMtJA/CQindyUiJx2I26Tq5iMSMHhO50+lk1apVrFu3js2bN7N+/XoOHTrUaZ+lS5eyYsUKXn31VQzDYMOGDQA0NTXx4IMP8vzzz3faf/ny5SxevJjNmzdzzz33cP/99/djlSRSlR5vxJ6RGJqMJRzMZhNZafG6BU1EYkaPiXzbtm1Mnz6djIwMkpKSKCgooLi4OFReUVFBa2srU6dOBWD+/Pmh8jfeeIMxY8Zwxx13dDrnLbfcwsyZMwEYP348x48f77cKSWRqbG6juqGVnDB2q3fITkugtrGVoAa8iUgM6DGRu1wu7PZ/TJ/pcDhwOp3dltvt9lD5vHnzuPPOO7FYLJ3OOX/+/NC2p556ihtuuOHcaiER73Bl+6pj4ZgI5ouy0xPwBwwa3W3hDkVE5Jz1eLGyq9t0Tu0a7an8TOf9+c9/zu7du3nhhRd63P9U2dkpfdofwG5P7fMx0SaS63hix1EsZhNjhmdg7eIecpvNSmpK90neZmt/q3a1T2+OPbV81DB4b+8Jmr0BkpLisWcl9aUqAy6SX8f+ojrGjqFQz0ivY4+JPDc3l507d4Yeu1wuHA5Hp/Lq6urQ46qqqk7lXfH7/dx///04nU5eeOEFUlP79kuqqXET7MNyWXZ7KlVVTX16jmgT6XXce7CKfHsyHk/XrWCfz0+Tu/vr1j5f+wDKrvbpzbGnlltMBlaLiWOuJlpavFQFAr2txoCL9NexP6iOsWMo1DMS6mg2m87YgO2xa33GjBls376d2tpaPB4PW7duDV3fBsjPzyc+Pp5du3YBsGnTpk7lXfnZz36G2+3m+eef73MSl+gTNAxKTzQxOi8yXmuzyUR2mmZ4E5HY0KsW+ZIlSygqKsLn83HzzTczZcoUFi9ezPe//30uvvhiVq5cyfLly2lubmbSpEkUFRV1e77a2lrWrl3LiBEjuOWWW0LbN2/e3D81kojjqvPQ2hZgpCMyEjm0XyffX15PIBAMdygiIuekVzf0FhYWUlhY2GnbmjVrQj9PmDCBjRs3dnv83XffHfo5KyuLffv29TVOiWKlJxoBGJmbwvHq5jBH0y47PYFg0OB4TQtpSXHhDkdE5KxpZjcZcGUnmrBazAyLoEFlHTO8lTlj+/qeiMQ+JXIZcGUnmhjpSA7LimfdSU2yEWc1c9TpDncoIiLnJHL+skpMChoGZc4mRuelhTuUTkwmE5lp8RyrUiIXkeimRC4Dqqreg8cbYEyEjFg/VVZqApVVzQSCGvAmItFLiVwGVNmJ9mvQo3MjMJGnxeMLBDlR0xLuUEREzpoSuQyo0hNNWC0m8u3J4Q7lNFknB7yV6zq5iEQxJXIZUGUnmsi3p3Q5LWu4pSfHYbOYNXJdRKJa5P11lZhhGAblzqaIvD4O7dMeDstJplyJXESimBK5DJiqhlaaW/0ReX28w0hHMuVOd5eL/4iIRAMlchkw5R0D3SK0RQ4wwp5Ci9eveddFJGopkcuAKT3RhMVsYoS978vODpYRjvbYyjTgTUSilBK5DJiyE43k25OxWSP3bTY8JxmTCV0nF5GoFbl/YSWqGR1Ll0bw9XGAOJuFYdka8CYi0UuJXAZETWP7QLdIHbF+qlGOFMpd6loXkeikRC4DIjSjW4TNsd6VUbmp1DV5aWxpC3coIiJ9pkQuA6L0RBNmk4kRETij2xeNym0f8KaV0EQkGimRy4AoO9HE8Jxk4myWcIfSo1Enr+PrOr
mIRCMlcul3Rmjp0si97exUKYk2stPiNVWriEQlJXLpd/XuNppafBE/Yv1Uo3JTtXiKiEQlJXLpdx0t21FRlsidtS20tvnDHYqISJ8okUu/8Qeh2evncGUDANkZCTR7/aF/wQieznxUbgoGcMzVHO5QRET6xBruACR2eH1+dpQ42Xu4htQkG3sP13Qqv2ScPUyR9azjMkCZs4kLRqSHORoRkd5Ti1z6XW2jl6y0hHCH0Ssms4lmr5+4OAvJCVaOHG/s1IvgD4Y7QhGRM1OLXPpVmy+A2+Pjwihp1Xp9AXZ/VgVAWnIcB8rr2FHiDJVfPjEXa7w+JiISudQil35V2+QFICstPsyR9F1WWgL1TV4CQTXDRSR6KJFLv6pr7Ejk0dG1fqrs9ASCBtQ3aapWEYkeSuTSr2obW0mIs5AYhd3R2Sd7EWoaW8MciYhI7/UqkW/ZsoU5c+Ywa9Ys1q5de1p5SUkJCxYsoKCggGXLluH3d74X98knn+Tpp58OPW5sbOTOO+9k9uzZLFq0iKqqqnOshkSK2qboGej2RSmJNmxWM7VK5CISRXpM5E6nk1WrVrFu3To2b97M+vXrOXToUKd9li5dyooVK3j11VcxDIMNGzYA0NTUxIMPPsjzzz/faf/Vq1czbdo0/vKXv3DLLbfw+OOP92OVJFx8/iD1bm9UXh8HMJlMZKclUHPy8oCISDToMZFv27aN6dOnk5GRQVJSEgUFBRQXF4fKKyoqaG1tZerUqQDMnz8/VP7GG28wZswY7rjjjk7nfOuttygsLARg7ty5vPPOO/h8vn6rlITHiZpmDAOyUqMzkUP7IL26Ji/BSJ69RkTkFD1eyHS5XNjt/5jIw+FwsGfPnm7L7XY7Tmf77Tvz5s0D6NSt/sVjrFYrKSkp1NbWkpub26ugs7P7vhiH3R4904WerXDXcfPPngPOY2ReOqkppydzm81KakrX3e5nKusoB7rcpzfH9vZ5R+Smsq+0jrYg2NMSSEqKx56V1O25B0K4X8fBoDrGjqFQz0ivY4+J3DBOb5mYTKZel/eW2dz7cXc1Ne4+tZjs9lSqqmJ7ZatIqGOlkYTN8GMmSJP79OvMPp+/y+09lXWUA30+b1+fNymufdnVoycaSbCaaGnxUhUIdHvu/hYJr+NAUx1jx1CoZyTU0Ww2nbEB22P2zM3Npbq6OvTY5XLhcDi6La+qqupU3hWHwxE6xu/343a7ycjI6CkUiXAnSMIebD6rL3KRIjXJhs2iAW8iEj16TOQzZsxg+/bt1NbW4vF42Lp1KzNnzgyV5+fnEx8fz65duwDYtGlTp/KuXHfddWzatAmAV155hWnTpmGz2c6lHhJmQcPASTKOYHQvOmIymchKi6emQYlcRKJDr1rkS5YsoaioiHnz5jF37lymTJnC4sWL2bt3LwArV67kiSeeYPbs2Xg8HoqKis54zh/84Ad8/PHHfO1rX2PdunX86Ec/6p/aSNhU1Xlow4I9GP1remelJWjAm4hEjV7N2lFYWBgaZd5hzZo1oZ8nTJjAxo0buz3+7rvv7vQ4IyODX//6132JUyJcxxrksZDIs9PjCZQZNDRrhjcRiXya2U36xVGXGzNBso2WcIdyzjomtNF1chGJBkrk0i/KnE3Y8WAl+ruj05LjsFpMmqpVRKKCErn0i3KnmzyivzUOYDaZyExNoKZBM7yJSORTIpdz1uD20tjcRi7RPWL9VNlp8dQ1tWrAm4hEPCVyOWflrvYBbnmxlMjTE/AHDFx1nnCHIiJyRkrkcs7KT45Yj5WudfjHgLejrtietUpEop8SuZyzoy43OekJJJgGbyrTgZaeHIfFbOKoK/pvpxOR2KZELues3OlmVG5kLyrQV2aziczUeI46lchFJLIpkcs58bYFcNa2MMrR9xXpIl1WWgLHqtwEu1gYSEQkUiiRyzk5WuXGAEbmxl4iz06Lp7UtQFW9BryJSORSIpdzcvTkQLdRjtjqWod/DHgrV/e6iEQwJX
I5J+UuN8kJVrLS4sMdSr/LSI3DbDZRdkIj10UkcimRyzkpd7oZ6UiJ6jXIu2MxmxmenRRaEEZEJBIpkctZCwSDHKuKvRHrpxrhSKHc2YShAW8iEqGUyOWsOWs9+PxBRsbgiPUOIxwpNLX4qGvSvOsiEpmUyOWsdczoNjqGW+QdX1LUvS4ikUqJXM5aucuN1WIiLzsp3KEMmHx7CiY0cl1EIpcSuZy1o84m8nNSsFpi920Ub7OQl52kkesiErGs4Q5Aooc/CF6fHwDDMChzupl8XhbN3vZtxOh4sNG5qRw4Wh/uMEREuqRELr3m9fnZUeIEoKXVj9vjIxA0QttGhzO4ATQqN5X39zlpbGkjLSku3OGIiHQSu32iMqBqm1oByEqNvYlgvmj0yelnyzXgTUQikBK5nJW6xvbbsTKHQCIfldc+Kl/XyUUkEimRy1mpbWwlJdFGnM0S7lAGXHKCjZz0BI1cF5GIpEQuZ6W2yRuT86t3Z3Requ4lF5GIpEQufebzB2lq8Q2J6+MdRuWm4qrz0NLqD3coIiKdKJFLn9WdHOiWeXKZz6GgY/a6oy61ykUksvQqkW/ZsoU5c+Ywa9Ys1q5de1p5SUkJCxYsoKCggGXLluH3t7daKisrWbRoETfeeCN33XUXzc3NADQ0NLB48WK+/vWvc/PNN1NSUtKPVZKBVntyoNtQ61oHKNN1chGJMD0mcqfTyapVq1i3bh2bN29m/fr1HDp0qNM+S5cuZcWKFbz66qsYhsGGDRsAeOSRR1i4cCHFxcVMnjyZZ599FoDf/e53jBs3jj/96U/827/9G48++ugAVE0GSm2Tl3ibhaT4oTMNQXpyHOkpcRq5LiIRp8dEvm3bNqZPn05GRgZJSUkUFBRQXFwcKq+oqKC1tZWpU6cCMH/+fIqLi/H5fOzYsYOCgoJO2wGCwWCode7xeEhIGDpdtLGgtrGVrLT4mFyD/ExG56bqXnIRiTg9NqlcLhd2uz302OFwsGfPnm7L7XY7TqeTuro6UlJSsFqtnbYDfOtb3+LWW2/lmmuuobm5meeff75PQWdn933ZTLs9dlfo6jDQdTRqW0hKiqfe3caUC3JITen8BcxkMmGxmE/b3sFms55VWUc50OU+vclOjzIAACAASURBVDn2bJ83Lt6GcXIu+THD09l7uIYWv0FivJXEBCupAzDTm96rsWEo1BGGRj0jvY49JnLDOH0C7VNbYt2Vn+m4xx57jEWLFlFUVMRHH33EkiVLePnll0lOTu5V0DU1boLB3k/sbbenUlUV2y2pwahji9dPhbOBYNAgJcFKk7u1U3mWYRAIBE/b3sHn859VWUc50OU+vTn2bJ/X3eJl92dVADS3tGEAf3r7EHnZSVw+MZfW5v5dp1zv1dgwFOoIQ6OekVBHs9l0xgZsj13rubm5VFdXhx67XC4cDke35VVVVTgcDrKysnC73QQCgU7bAd544w0WLFgAwKWXXkp2djaHDx/uY9UkHIbiQLcOOentLfeqek+YIxER+YceE/mMGTPYvn07tbW1eDwetm7dysyZM0Pl+fn5xMfHs2vXLgA2bdrEzJkzsdlsTJs2jVdeeaXTdoAJEybw+uuvA1BaWorL5WLs2LH9Xjnpf7WNXixmE2nJQ2/xkPg4C6lJNqobum/Bi4gMtl61yJcsWUJRURHz5s1j7ty5TJkyhcWLF7N3714AVq5cyRNPPMHs2bPxeDwUFRUB8NBDD7FhwwbmzJnDzp07ueeeewD46U9/yh//+Efmzp3Lvffey89+9jNSUyP7GoS0q21qJTM1HvMQG+jWwZ6RSHWDp8tLRyIi4dCr+4cKCwspLCzstG3NmjWhnydMmMDGjRtPOy4/P58XX3zxtO1jxozhhRde6GusEmaGYVDX6A3dUz0U5aQncKSykWbN8CYiEUIzu0mv1TZ6afMHyRpCM7p9UU5GIgDVuk4uIhFCiVx67VhV+6xmQ3GgW4fM1HjMZpOuk4tIxFAil1475n
JjYmisQd4di9lEdlo8VfVK5CISGZTIpdcqqppJS47Dahnab5uc9ERqG1sJBILhDkVERIlceu9YlZvMIdyt3iEnI4FA0KCiujncoYiIKJFL77g9PuqavENqDfLu2NPbB7yVagEVEYkASuTSK0dPLhYylEesd0hOtJIQZ6HseGO4QxERUSKX3ulYh3soD3TrYDKZyMlIVItcRCKCErn0ylFXE+nJcSQOoTXIz8SenoCrzkNzqy/coYjIEKdELr1S7nQzwtH35WNjVU5G+yWGzyvVvS4i4aVELj1q8wU4XtPCCHvvlpkdCrLTEzABR5TIRSTMlMilRxXVzQQNg3y7WuQd4qwW8rKTOKIBbyISZkrk0qOOVudQXiylK6PzUjlc0UBQK6GJSBgpkUuPjlQ2kJ4cpxHrX3B+fjrNrX4qqzQxjIiEjxK59OhwZSPnDU/DNETXIO/OuJEZAJSU1YU5EhEZypTI5YzcHh+uOg/nDU8LdygRJystAXtGAvvLlchFJHyUyOWMjlQ2AHD+8PQwRxKZJozK5EB5PcGgrpOLSHgokcsZHalsxGSCMcM00K0rE0dn0uL1U+7SLG8iEh5K5HJGhysbyc9JISFOM7p1ZcLoTAD2l9WHORIRGaqUyKVbQcPg85MD3aRrGSnxDMtO0oA3EQkbJXLplrO2hRavn/OVyM9owqhMPjtWjz8QDHcoIjIEKZFLtzomglGL/MwmjM7E2xbQamgiEhZK5NKtI5WNJMZbGJajOdbPZPyo9vvJ96t7XUTCQIlcunW4soExeWmYNRFMl0xmE81ePxaLmeE5yXxaWkuz1x/651dPu4gMAg1Fli55fQGOuZqZc9WocIcSsby+ALs/qwIgPTmOz47W8/6nx7GY278fXz4xF6vWbxeRAaYWuXSp7EQTQcPgvGGaCKY3crMSCQQNqutbwx2KiAwxvUrkW7ZsYc6cOcyaNYu1a9eeVl5SUsKCBQsoKChg2bJl+P1+ACorK1m0aBE33ngjd911F83N7YtLuN1u/u///b/MmzePefPm8emnn/ZjlaQ/aKBb3+RlJWECjte0hDsUERliekzkTqeTVatWsW7dOjZv3sz69es5dOhQp32WLl3KihUrePXVVzEMgw0bNgDwyCOPsHDhQoqLi5k8eTLPPvssAE888QTDhg1j06ZN3HvvvTz88MP9XzM5J4crG8hJTyAtOS7coUSFOJuFrLR4nLVK5CIyuHpM5Nu2bWP69OlkZGSQlJREQUEBxcXFofKKigpaW1uZOnUqAPPnz6e4uBifz8eOHTsoKCjotN0wDLZu3cqdd94JwMyZM/nJT34yEHWTc3CkspHz89Wt3hd52UlU1Xt0P7mIDKoeE7nL5cJut4ceOxwOnE5nt+V2ux2n00ldXR0pKSlYrdZO22tqaoiLi+MPf/gD8+bNo6ioiEAg0J91knPgD0JFTTN1TV5G2FM6jcLWuiBnlpeVTNAAV50n3KGIyBDS45Bawzj9r/ep61J3V97d9kAgQHV1Nenp6WzatIn33nuP7373u7zxxhu9Djo7O6XX+3aw22N/0Y/+qKOrtoW/7TnR/sBsYv/RhlDZ+NGZpKYkdHusyWTCYjF3u4/NZj2rso5yoMt9enPsuTxvb489L8HGXz88Rm2Tl/FjsklKiseeldTtubuj92psGAp1hKFRz0ivY4+JPDc3l507d4Yeu1wuHA5Hp/Lq6urQ46qqKhwOB1lZWbjdbgKBABaLJbQ9MzMTq9XK3LlzAbj66qtpaWmhpqaG7OzsXgVdU+Pu07KRdnsqVVWxPetWf9Wxxevn8NE64mxmEm1mmtz/GIXt8/k7Pf6iLMMgEAh2u8+Zju/p3D5f+wDKrvbpzbHn8rx9OTYnI5Gy401MHttKS4uXqj72Num9GhuGQh1haNQzEupoNpvO2IDtsWt9xowZbN++ndraWjweD1u3bmXmzJmh8vz8fOLj49m1axcAmzZtYubMmdhsNqZNm8Yrr7zSaXtcXB
wzZszg5ZdfBuDjjz8mMTGRzMzMc6qo9I9g0KCiupn8nGTMZk0E01d5WUnUNLbi9elykYgMjh4TeW5uLkuWLKGoqIh58+Yxd+5cpkyZwuLFi9m7dy8AK1eu5IknnmD27Nl4PB6KiooAeOihh9iwYQNz5sxh586d3HPPPQA8/vjjvPPOO8ydO5eHH36YVatWYTbrlvZIUOZsorUtwAh73y9fCAzLbu9K1+h1ERksvZp2qrCwkMLCwk7b1qxZE/p5woQJbNy48bTj8vPzefHFF0/b7nA4+PWvf93XWGUQfPp5LSYTDNf86mclJyMRq8Wk+8lFZNCoGSydfHqkBkdGIvFxlnCHEpUsZhOOzEROKJGLyCBRIpeQ2sZWjlU1k+9Qt/q5yMtOpqG5jQa3N9yhiMgQoEQuIXsO1wAwwq5u9XMx7OQtZ58drQ9zJCIyFCiRS8juQ9XkpCeQrmlZz0lWWjxxNjMHypXIRWTgKZEL0L4k576yOi4am9Vpwh/pO5PJRF5WEp8dre9yYiQRkf6kRC4A7C+rw+cPMvm83k3KI2eWl51EXZMXV72maxWRgaVELgDsPlxDfJxFC6X0k2FZ7eMMSkrrwhyJiMQ6JXLBMAx2H6pm8pgsbFa9JfpDWrKNjJQ49pUpkYvIwNJfbeGoy01dk5cp56tbvb+YTCbGjcxgf1kdQV0nF5EBpEQu7NjvwmRCibyfjRuZgdvj45jLHe5QRCSGKZEPcT5/gLc/rmTqBTmkp8SHO5yYMm5U+0JAJepeF5EBpEQ+xO3Y78Lt8XH9l0aEO5SYk5kaT25WkhK5iAwoJfIh7s0PK8jLSmLSaC0jOxAmjs7kwNF6/IFguEMRkRilRD6EfX68kSOVjVx/Wb4mgRkgk0Zn4m0LUHq8KdyhiEiMUiIfwt788BjxNgszJg8Ldygxa/yoDABKymrDHImIxCol8iGqqaWND/a5mDE5j6SEXi1LL2chNSmOUY4UXScXkQGjRD7E+IPQ7PXzxocV+ANBrpqcR7PXH/oX1C3P/W7imEwOVTTg9QXCHYqIxCAl8iHG6/Pzwb4TvLHzKLlZiRyrcrOjxBn65w9qUFZ/mzg6E3/A4FBFQ7hDEZEYpEQ+BFVUNdPc6mfCKI1UH0gms4lmr598Rwpms4k9h2tCPR9+fV8SkX6ii6NDTNAw2HOomuQEKyMdKeEOJ6Z5fQF2f1YFQHZaPB99VsWw7CQALp+YizVeHz8ROXdqkQ8xH31WRU2jl6kX5mA265azwTIsO5mahlbadJ1cRPqZEvkQ4g8E+fN7pWSmxjN2eFq4wxlS8rKTMIATtS3hDkVEYowS+RDy9seVVDe0ctm4HMyaAGZQ2TMSsJhNSuQi0u+UyIcIj9fPlvc+58IR6QzPSQ53OEOOxWzGkZnIiRolchHpX0rkQ8TWHUdpbPHx9WvHajrWMMnLTqLe3YbH6w93KCISQ5TIh4CG5jaK/17OtPF2xuTp2ni4DMtu7wmprG4OcyQiEkuUyGOcYRisf/MgPl+Q+dedH+5whrTstHgS4ixUVCmRi0j/6VUi37JlC3PmzGHWrFmsXbv2tPKSkhIWLFhAQUEBy5Ytw+9v7zqsrKxk0aJF3Hjjjdx11100N3f+A3bixAmuuOIKjh071g9Vka688n4Z73/q5OtXjyEvKync4QxpJpOJfHsyldXNBDQXroj0kx4TudPpZNWqVaxbt47Nmzezfv16Dh061GmfpUuXsmLFCl599VUMw2DDhg0APPLIIyxcuJDi4mImT57Ms88+GzomGAyybNkyfD5fP1dJOuzc7+KPbx9h+qRcCq8eE+5wBMi3p9DmD1J6vDHcoYhIjOgxkW/bto3p06eTkZFBUlISBQUFFBcXh8orKipobW1l6tSpAMyfP5/i4mJ8Ph87duygoKCg0/YOzz33HDNmzCAzU9OEDoQjlY2s+fM+LshP5445EzTALUIMz07CZIJPP9eypiLSP3qcI9LlcmG320OPHQ4He/bs6b
bcbrfjdDqpq6sjJSUFq9XaaTvAJ598wgcffMCaNWu67KrvSXZ236cWtdtT+3xMtOmoo6u2hWde2ktWWgIP33kV6SnxoX2M2hZSUxK6PYfNZu22/Exl0N51bLGYz+r4ns5ts7W/j7rapzfHnsvz9vexw7KTKSmr6/Y9OZTeq7FsKNQRhkY9I72OPSZywzj9Wt6prbvuyrvb7vF4ePTRR1m9ejVm89mNtaupcRPswzVGuz2Vqqqms3quaGG3p3L8RAPv7K5k87uf4w8E+e4/X0xdfQt19f+4dzloQJO7tdvz+Hz+bsvPVAaQZRgEAsGzOr6nc/t87eMuutqnN8eey/P297F52Ul8eKCKA4eryErrnOiHyntVdYwNQ6GekVBHs9l0xgZsj4k8NzeXnTt3hh67XC4cDken8urq6tDjqqoqHA4HWVlZuN1uAoEAFosltH3nzp1UV1dz1113hc5355138swzz3DeeeedVSWHOsMw+OCT4zy3+RNO1LZwwYh0JozK4KiriaOuzm/AS8bZuzmLDJYROcl8eKCKPUdq+PLU/HCHIyJRrscm8YwZM9i+fTu1tbV4PB62bt3KzJkzQ+X5+fnEx8eza9cuADZt2sTMmTOx2WxMmzaNV155pdP2a6+9ljfffJPNmzezefNmHA4Hv/nNb5TEz1Jdk5dfbtjNj3/3dwC+v2AK3795ymktPYkc6SlxZKXFs+dQTbhDEZEY0KsW+ZIlSygqKsLn83HzzTczZcoUFi9ezPe//30uvvhiVq5cyfLly2lubmbSpEkUFRUB8NBDD/HDH/6QX/3qVwwbNoxf/vKXA16hoWT3oWp++3IJbf4A3/nni/nSBdlYLWaaNXNYRDOZTFw0NosP9jnx+YPYrJrOQUTOXq8WRC4sLKSwsLDTtjVr1oR+njBhAhs3bjztuPz8fF588cUznvvNN9/sTQhyCk9bkI1vHeSvH1YwPCeZO+ZM5LyRmbS0ePH6g+gW5cg3aUwWf9t9nM+O1nPR2KxwhyMiUaxXiVwihz8Q5P/77w85UtnI+FEZTBtv56irifoWX2hgla6DR75xIzOwWc3sPlytRC4i50R9elFmy3ulHKls5OqL87hyUi4Wi17CaBRnszBhVCZ7D+s6uYicG2WBKPL58UZe3l7GFRMdnJ+fHu5w5BxNOT8bZ50Hp9YoF5FzoEQeJXz+AL99uYT0lDgWfPmCcIcj/eCS87MB2HnAFeZIRCSaKZFHiZf+9jmV1c3cMXsCSQka2hALcjISGTcinXf3nuhyAiURkd5QIo8Ch4418OoH5Vw3dTiTz8sOdzjSj66eMgxnbQuHK7WIioicHSXyCOUPQrPXT2NLG2v+vI/MtHjmXj2GZq9ft5fFkGnjHcTZzLy753i4QxGRKKVEHqG8Pj87Spz88a3DVNV7uOSCHPYermFHiRN/MBju8KSfJMZbmTbewY79Try+QLjDEZEopEQewQzDYF9pHWnJcYywJ4c7HBkg11w8DI83wEefVYU7FBGJQkrkEcxV76GmsZWJozO0nngMGzcqg5z0BN7dq+51Eek7JfIIVlJaR5zNzHnDdc94LDObTFx98TBKSutw1emechHpGyXyCFXd4OGo0x2aylNii8lsotnrD/2bOi4HA9jytyM0e/34NQxCRHpJNyRHqLc/qgQTTBiVEe5QZAB4fQF2f+GaeF5WEm/uPEpqgoUrJuVhjdfHU0R6pqZeBGpp9bP9kxOMHZZGUoIt3OHIIDk/P43G5jZcdZ5whyIiUUSJPAL9bU8lXl+AiaMzwx2KDKJRuanE2czsL6sLdygiEkWUyCNMIBjk9Z3HuCA/nez0hHCHI4PIZjVz8fk5lDndGvQmIr2mRB5h9pfVU9PYysypw8MdioTBlAtysJhNvL7zWLhDEZEooUQeYf5e4iQhzqI51YeopAQbF4xI5+/7nNQ1ecMdjohEASXyCOIPBPnwsyouvTBHt5wNYZPGZGIYBlt3lIc7FBGJAsoWEW
RfaS3NrX4un5gb7lAkjFKT4rhsvIO3Pq7E7fGFOxwRiXBK5BFkR4mLxHgrF43JCncoEmazLh+Jty3AXz/UtXIROTMl8gjh8wf58GA1l41Tt7rA8JxkLjk/m9d2HtOqaCJyRsoYEeKTz2vweP1coW51OWnOVaNxe3y8s7sy3KGISARTIo8QO0pcpCTaNAmMhFw4IoNxIzP4y/tl+PxqlYtI15TII0CbL8BHh6q5bJwdq0UvifxjUZWvXjGSencbr++qCC2wogVVRORUWpUhTPxB8Pr8AHx8sBpvW4CLz8+m2du+LWiEMzoJt45FVQzDwJGZyMvbS4mzmbCYzVw+MVcLqohISK+af1u2bGHOnDnMmjWLtWvXnlZeUlLCggULKCgoYNmyZfj97cmosrKSRYsWceONN3LXXXfR3NwMwOHDh1m4cCE33XQTt956KyUlJf1Ypejg9fnZUeJkR4mT13ceJSHOQoPbG9rmD6rZJWAymZhyfjYtrX4OHWsIdzgiEoF6TOROp5NVq1axbt06Nm/ezPr16zl06FCnfZYuXcqKFSt49dVXMQyDDRs2APDII4+wcOFCiouLmTx5Ms8++ywAy5cvZ/HixWzevJl77rmH+++/fwCqFh18/iAVVW5G5aZiNpvCHY5EoGHZSdgzEth7pJaAvuCJyBf0mMi3bdvG9OnTycjIICkpiYKCAoqLi0PlFRUVtLa2MnXqVADmz59PcXExPp+PHTt2UFBQ0Gk7wC233MLMmTMBGD9+PMePH+/3ikWLiupm/AGDMXmp4Q5FIpTJZOKSC3JoafVz+FhjuMMRkQjT44U2l8uF3W4PPXY4HOzZs6fbcrvdjtPppK6ujpSUFKxWa6ft0J7UOzz11FPccMMNfQo6OzulT/u3P39kJUqjtoXUlAQqq50kxls5f1QmZtM/WuQ2m5XUlK5XP+uurGPb2Rx7Ls/bwWQyYbGYz+r43sQFdLnPQNYpHMempiScVj4uOZ69R2r5pLSWuHgbdnvfPwORJNI+jwNhKNQRhkY9I72OPSZywzh91JXplITTXXlvjvv5z3/O7t27eeGFF3odMEBNjZtgH0aD2e2pVFU19ek5BlqL109dQwulxxs4b3gazc2dF8jw+fw0uVu7PLarstSUhNC2vh57Ls97qizDIBAIntXxvYkL6HKfgazTYB/b8Tp2VT55bBZv7DrGX3eU89XLR3b7vJEuEj+P/W0o1BGGRj0joY5ms+mMDdgeu9Zzc3Oprq4OPXa5XDgcjm7Lq6qqcDgcZGVl4Xa7CQQCnbYD+P1+7rvvPvbu3csLL7xAampkf9sZKJUnu9VH5Q7N+kvfDM9pv1b+8vZSPCfvbhAR6TGRz5gxg+3bt1NbW4vH42Hr1q2h69sA+fn5xMfHs2vXLgA2bdrEzJkzsdlsTJs2jVdeeaXTdoCf/exnuN1unn/++SGbxAHKnW7ibGbyspLCHYpEAZPJxOUTHbhbfGx+9/NwhyMiEaJXLfIlS5ZQVFTEvHnzmDt3LlOmTGHx4sXs3bsXgJUrV/LEE08we/ZsPB4PRUVFADz00ENs2LCBOXPmsHPnTu655x5qa2tZu3Ytn3/+Obfccgs33XQTN91008DWMgL5/EGOutyMdKRotLr0Wk56IldNzuP1nceoqHKHOxwRiQC9mlWisLCQwsLCTtvWrFkT+nnChAls3LjxtOPy8/N58cUXT9u+b9++vsYZcz47Wo/PH2S0RqtLHxVeM5bdh6pZ9/pB7rttaqexJyIy9Gg+0DD5+GAVNquZYdnqVpe+SUm08c8zz6OkrI6dB6rCHY6IhJkSeRj4A0H2HK5hpCMFi1kvgfTdl6fmM8qRwvo3D+Jt04IqIkOZskgYHCivp6XVz6jc6L4XWMLHbDax6KvjqG308qdtGvgmMpQpkYfBrgMu4mxmhuckhzsUiWIXjsjg2inD+Mv75Xx8sLrnA0QkJimRD7Jg0GDXZ1VcNDZbS5bKOVs0axyj81L5zZZPqa
xuDnc4IhIGyiSD7EB5HU0tPi69MCfcoUiU6lirvNnrxxc0+D9zJ2Kzmnly4x6qGlq1XrnIEKNEPsje2XOcxHgrF43NCncoEqW8vkBoudsdJU4OHWtgxuQ8qhs8PPk/u/F4feEOUUQGkRL5IGpsaWPXARczJucRZ7OEOxyJIblZSVw5MZfK6mb+pFnfRIaUXk0II/3jvb3H8QcMvjx1eLhDkRg0blQGdW4vb+w6Rmqija9fMzbcIYnIIFAiHyRBw+Dtjyu5cEQ6+fYUmrXohQyAyyc6SE+JZ9O7n2MyQeHVSuYisU6JfJCUlNXhqvNwk1pJMoDMJhOLZo3DYjLx0t8+x2w28bWrxoQ7LBEZQErkg+TtjypISbQxbbw93KFIjDObTfyfr03EMAz++PYRAOZMH6052UVilBL5IGhwe/noYDU3TBuBzapBbjKwTGYTHl+A22aNwx8M8se3j+Cs83Dzl8/HYjETb7Ni1TBXkZihRD4I/rbnOIGgwXVT88MdigwBXl+A3Z+1L6YycXQmrd4A7+45zqGKBq6bOpxrpgzHGq+Pvkis0PfyARYMtg9ymzg6k7wsrXQmg8tkMnHZeDszJufhqm3hL9vLcNV5wh2WiPQjJfIB5A/C3/e7qGlsZfpFeaHZuJq9foJGuKOToeSCEenMunwkXl+Qlf/1Ie/uOY5h6E0oEgvUvzaAWlrbWP/GQVISbbT522fj6nDJOA16k8GVm5XEnKtGsedwDc+/UsLfS5wU3TienPTEcIcmIudALfIB9PbHlTQ0t3H5RAcWs0YMS/ilJsXx/VsuYdGscRw81sCK3/6dN3Ydwx/QBO0i0Uot8gHS4Pbyl/fLyM9JZoRdy5VK5LBYzEyfnMeFIzP479c/Y+1rn7FlWykzpw7n+stGkpFsC3eIItIHSuQDZONbh/H5g1w+0aH7dyWinDqq/fKJDvLtKewrreXP75Xy6gflzJicx5fG2blwZAbxWhNAJOIpkQ+AQ8caeO+TE9wwbSRpyXHhDkekWyaTiXx7Mvn2ZOqavFQ3tLLtkxO8/XElVouZcSPTuWhsFpdeaNddFyIRSom8nwWDBmtf+4yMlDhuvHIUew5XhzskkV7JTI2nYPpobv7y+RyqaGB/WR37y+r4n78e5n/+epjhOclMm+DgyokOhmXrcpFIpFAi72d/+aCMMmcTd359EvFx6paU6OL1Bdh9qP3L50hHCiMdKTR7fJQ5myg74eZP737On979nDF5qcy8ZDhXTMwlKUF/RkTCSZ/AflT8QTl/fPsI08bbuXJiLi1tgXCHJHLOkhNtTBqTxaQxWVw4MoNPj9Ty7p5KXnj1AP/9xkGmTXAwbYKD8SMzSNSMcSKDTp+6frJlWykvvXOEKyY6+PbcSRrgJjEpMy2Bq6cMY8bFeZQ73Wz/5Di7DlSx7ZMTmM0mzhuexqTRmYwbmcHovFSSEzQCXmSgKZGfI8Mw2Pzu5/zpvVKuuiiXb31tIhazbs+X2HTqiHeA8/PTGZOXiqveg8lkYn9ZPVu2ldIxaVxOegIjc1MZlZvChSMyOH9YWpgiF4ldvUrkW7Zs4Ve/+hU+n49vfvObLFq0qFN5SUkJy5cvx+12M23aNB555BGsViuVlZUsXbqUmpoaxo4dy8qVK0lOTqaxsZH77ruPo0ePkpWVxerVq7Hbo2umM8MwOFzZyNa/l7PzQBXXXDyMb86egFkTv8gQY7GYGZadzCXj2ke2e30BahpaqWlspaahlc/K6/joZPI3mWB0XhojcpIZlp1EXnYSeVlJ2DMSsVr0BVjkbPSYyJ1OJ6tWreJ///d/iYuL47bbbuPKK6/kggsuCO2zdOlSfvzjHzN16lQefPBBNmzYwMKFC3nkkUdYuHAhX/va1/j3f/93nn32WZYuXcrq1auZNm0av/nNb9i0aROPP/44q1evHtCK9peWVh+7DlTx5ocVlDmbSIizUHDlKOZcNRqPr/M1cc2nLkNRvM3C8Jxkhuf8Y2S7x+
snOyOR0spGjlY1s+dwDe/uPd7puMR4C8kJNpITbSTFWzGb2pdkNZva/wUNg2DQIGgYGAbY2RznpAAACuhJREFUrGbirGZsVgvxcRZSEm2kJ8eRmtTx/zhSkmykJNj0BVtiWo+JfNu2bUyfPp2MjAwACgoKKC4u5nvf+x4AFRUVtLa2MnXqVADmz5/PU089xS233MKOHTv493//99D2f/mXf2Hp0qW89dZbrF27FoC5c+fy6KOP4vP5sNl6dz3tbD6U3R1TUlaHs7YFA+joDwwEDfwBA18gSCAQpKnFR22jl7qmVlq8fgDyspL41rSJmM0mrBYzJaV1p5174tgskrq5Rmi1mLst66m8q7LEeCsBv+2sjj2X5z1VXGYGCQn+szq+N3GdWse+HhuO38fZHNtRx4F63rON61yPTUqwMXFMFhgw+QI77mYvPn+AphYfbk8bacnxNHt8eLwBPF4frW1BDMMgaLT3fhm03/Nus5kxm0yYTOAPBPG0BWjy+GjzB2lp9dHVOjAmIDHBRmKchThbe9KPt1mwWjr/TQgGDXz+IL6Agc8fCH1h6IjDbAKbxYLN1vEFwkyc1YLNasJmtWCz/iO2lKR4PN620PGGwcm/MRA8+cA4GRsmMJtM0P4fZnP7Ocy0/x+TKbTfyZ8iRnJyDc3NbeEOo1fa30WEfveEXpf21zcYbN/DCJ6yL+3vHU+rL/QamU0nXyPaX48vvkYACXFWpo2391svU085r8dE7nK5OnV7OxwO9uzZ02253W7H6XRSV1dHSkoKVqu10/YvHmO1WklJSaG2tpbc3NxeVSozs+/3sGZnp3S5/ZputveX80ZknlVZVB577f1nPG/Y4tKxUROXiPRdj18Xulrq8NQR2d2V93TcaYFogJiIiEif9Zg9c3Nzqa7+x+xkLpcLh8PRbXlVVRUOh4OsrCzcbjeBQKDTdmhv1Xcc4/f7cbvdoa57ERER6b0eE/mM/7+9uw9pst/DAH5lmjiKSjMxM6mHoJSK0KKX4TJwlZtpJGSByyQLsowIaZZSCJovQ0mKKCrBSjKsTGuUEhhUS0vK/jELS/MtA+1FTefcfuePODtl89RzHk56316fv9wL4764nF/ue7LvqlUwmUzo7u5Gf38/KioqEBwcbH/cx8cHrq6uqK2tBQCUlpYiODgYLi4uCAoKgtFo/OF+AFCpVCgtLQUAGI1GBAUF/fbn40RERPQfE4Sja+DDlJeX48yZM7BYLIiKikJ8fDzi4+ORmJiIRYsW4eXLl0hJSUFfXx/8/f1x/PhxTJo0CW1tbdDr9ejq6oK3tzdyc3MxdepUfPr0CXq9Hi0tLZgyZQoMBgNmz579J/ISERHJym8NciIiIhqb+B9mREREEsZBTkREJGEc5ERERBLGQU5ERCRhHOREREQSJutBXl5ejrCwMISGhtq/210OdDodNBoNIiIiEBERgbq6Otlk7e3thVarRWtrK4Bv3/UfHh4OtVqNvLw8+/Pq6+uxefNmrFu3DkeOHMHQ0NBoHfLfNjxjcnIy1Gq1vc/KykoAI2cf606ePAmNRgONRoPs7GwA8uzRUU65dXnixAmEhYVBo9GgoKAAgPy6dJRRcj0KmXr//r0ICQkRHz9+FH19fSI8PFy8fv16tA/rH7PZbGL16tXCYrHY75NL1ufPnwutVisCAgJES0uL6O/vFyqVSrx7905YLBYRFxcnqqqqhBBCaDQa8ezZMyGEEMnJyeLy5cujeei/bXhGIYTQarWis7Pzh+f9t+xj2cOHD8WWLVuE2WwWg4ODQqfTifLyctn16ChnRUWFrLqsrq4W0dHRwmKxiP7+fhESEiLq6+tl1aWjjI2NjZLrUbZn5N9vbVMoFPatbVL35s0bTJgwAfHx8di4cSMuXbokm6xXr17F0aNH7V/l++LFC/j5+cHX1xfOzs4IDw/HnTt3HG7ck0re4Rm/fv2K9vZ2pKamIjw8HPn5+bDZbCNmH+s8PT2h1+sxadIkuLi44K
+//kJTU5PsenSUs729XVZdLl++HIWFhXB2dkZXVxesViu+fPkiqy4dZXR1dZVcj7/cfiZVv9raJlVfvnzBypUrcezYMQwMDECn02HDhg2yyJqenv7DbUcddnZ2jrhxTwqGZ+zq6sKKFSuQlpYGhUKB3bt3o6SkBAqFwmH2sW7+/Pn2n5uammA0GhETEyO7Hh3lLCoqQk1NjWy6BAAXFxfk5+fjwoULWL9+vSzfk8MzWq1Wyb0nZXtGLv7m9jWpWLp0KbKzs6FQKODu7o6oqCjk5+f/9Dw5ZB2pQzl16+vri1OnTsHDwwNubm6IiYnB/fv3JZ/x9evXiIuLw6FDhzBnzpyfHpdLj9/nnDdvniy7TExMhMlkQkdHB5qamn56XA5dfp/RZDJJrkfZDvJfbW2TqqdPn8JkMtlvCyHg4+Mjy6wjdTjSxj0pamhowN27d+23hRBwdnaW9O9vbW0tYmNjcfDgQWzatEm2PQ7PKbcuGxsbUV9fDwBwc3ODWq1GdXW1rLp0lNFoNEquR9kO8l9tbZOqnp4eZGdnw2w2o7e3Fzdu3EBOTo4ssy5ZsgRv375Fc3MzrFYrbt26heDg4BE37kmREAIZGRn4/PkzLBYLiouLERoaOmL2sa6jowMJCQkwGAzQaDQA5Nmjo5xy67K1tRUpKSkYHBzE4OAg7t27h+joaFl16SjjsmXLJNejbD8j9/LywoEDB6DT6exb2xYvXjzah/WPhYSEoK6uDpGRkbDZbNi2bRsCAwNlmdXV1RWZmZnYt28fzGYzVCoV1q9fDwAwGAw/bNzT6XSjfLT/mwULFmDXrl3YunUrhoaGoFarodVqAWDE7GPZ+fPnYTabkZmZab8vOjpadj2OlFNOXapUKvvfmokTJ0KtVkOj0cDd3V02XTrKuHfvXkyfPl1SPXL7GRERkYTJ9tI6ERHReMBBTkREJGEc5ERERBLGQU5ERCRhHOREREQSxkFOREQkYRzkREREEibbL4QhIsf6+vqQnJyM5uZmODk5ISAgAGlpabh+/ToKCgrg5OSE6dOnIysrC97e3iguLsbFixfh5OSEGTNmIDU1FXPnzoVer8enT5/Q0tKCNWvWYP/+/TAYDHjy5AmsViv8/f2RkpKCyZMnj3ZkIlnjGTnROFNZWYm+vj7cvHkTJSUlAIBXr17BYDDg3LlzKC8vx9q1a3H69GmYTCacO3cOhYWFKCsrg1arRUJCgn2BxMDAAG7fvo2kpCScPXsWEydOxPXr11FWVoaZM2fCYDCMZlSicYFn5ETjTGBgIPLy8hATE4NVq1Zh+/btePDgAZRKJby9vQEAsbGxAIDs7GyEhYXB3d0dwLc90+np6WhtbbW/1r9VVVWhp6cHjx49AgBYLBZ4eHj8wWRE4xMHOdE44+vri8rKSlRXV+Px48fYsWMHoqOjf1jJODAwgLa2NoerG4UQGBoaAgAoFAr7/TabDYcPH4ZKpQLw7RK+2Wz+P6chIl5aJxpnioqKkJycDKVSiaSkJCiVSjQ0NMBkMuHDhw8AgCtXriAnJwdKpRJGoxHd3d0AgGvXrmHatGnw8/P76XWVSiUuX76MwcFB2Gw2pKamIjc3949mIxqPeEZONM5ERkaipqYGYWFhcHNzw6xZs5Ceno6qqirs3LkTAODp6YmMjAx4eXkhNjYW27dvh81mg7u7O86cOQMnp5/PAfbs2YOsrCxs2rQJVqsVCxcuhF6v/9PxiMYdbj8jIiKSMF5aJyIikjAOciIiIgnjICciIpIwDnIiIiIJ4yAnIiKSMA5yIiIiCeMgJyIikrB/ATHDEiMZFsKUAAAAAElFTkSuQmCC\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.distplot(crossref.score)\n",
"plt.title(\"Crossref matching scores\")\n",
"# Mark the current threshold; CR_THRESH is the name imported from tracking_grants\n",
"plt.vlines(CR_THRESH, 0, 0.010, \"r\");"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Join article metadata and crossref results\n",
"df = articles[['type', 'DOI', 'PMCID', 'PMID']]\n",
"df = df.join(crossref, rsuffix=\"_cr\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Different thresholds and the resulting number of matches:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"60: 16906 articles (90%)\n",
"70: 15934 articles (85%)\n",
"80: 14406 articles (77%)\n",
"90: 12248 articles (65%)\n",
"100: 9594 articles (51%)\n"
]
}
],
"source": [
"scores = df.score\n",
"ts = [60, 70, 80, 90, 100]\n",
"\n",
"for t in ts:\n",
"    # Rows without a score compare as False and are not counted\n",
"    n = (scores >= t).sum()\n",
"    print(f\"{t}: {n} articles ({100 * n // len(articles)}%)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this particular notebook, I am using the **threshold of 80** to determine which articles were found in Crossref."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# Filter out DOIs that had a score lower than 80\n",
"df.loc[df.score < cr_thresh, 'DOI_cr'] = None\n",
"articles_with_doi = df[df.DOI_cr.notna()].index"
]
},
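{
"cell_type": "markdown",
"metadata": {},
"source": [
"The filtering step can be tried on a toy frame. This is a minimal sketch with made-up DOIs and scores, not project data. It uses `Series.where`, which also blanks rows whose score is missing, whereas a comparison such as `score < threshold` is False for a missing score and leaves those rows untouched.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up candidates: two good matches, one weak match, one article without a result\n",
"toy = pd.DataFrame(\n",
"    {'DOI_cr': ['10.1000/a', '10.1000/b', '10.1000/c', None],\n",
"     'score': [95.0, 80.0, 42.5, np.nan]},\n",
"    index=pd.Index([1, 2, 3, 4], name='article_id'),\n",
")\n",
"threshold = 80\n",
"\n",
"# where() keeps values where the condition is True and blanks the rest;\n",
"# a missing score compares as False, so that row is blanked as well\n",
"toy['DOI_cr'] = toy['DOI_cr'].where(toy['score'] >= threshold)\n",
"matched = toy[toy['DOI_cr'].notna()].index\n",
"n_matched = len(matched)\n",
"```\n",
"\n",
"Here `matched` plays the role of `articles_with_doi`: the index of articles that keep a Crossref DOI."
]
},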
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>DOI_cr</th><td>14406</td><td>77.0</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Articles with % (n=18708)\n",
"DOI_cr 14406 77.0"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = df[['DOI_cr']].count().to_frame('Articles with')\n",
"x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n",
"x.round(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Using a minimum score of 80, we have found 14,406 DOIs in Crossref which corresponds to 77% of the original dataset**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional identifiers from NCBI\n",
"\n",
"Using the APIs provided by the NCBI we can now also attempt to convert DOIs to pmid/pmcid."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>pmid</th><td>5112</td><td>27.3</td></tr>\n",
"    <tr><th>pmcid</th><td>5123</td><td>27.4</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Articles with % (n=18708)\n",
"pmid 5112 27.3\n",
"pmcid 5123 27.4"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = ncbi[['pmid', 'pmcid']].count().to_frame('Articles with')\n",
"x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n",
"x.round(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results for the NCBI API were not very great once I started to manually check several examples. Furthermore, we can only retrieve pmid/pmcids for articles that already have a DOI. These identifiers are therefore not really useful for the processing pipeline, but might be interesting to report nevertheless."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Altmetric results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use these DOIs to retrieve altmetrics for these articles."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"df2 = altmetric.reindex(articles_with_doi)\n",
"df2 = df2[['altmetric_id', 'cited_by_tweeters_count', 'cited_by_fbwalls_count',\n",
" 'cited_by_feeds_count', 'cited_by_msm_count', 'cited_by_wikipedia_count', 'cited_by_rdts_count']]\n",
"df2.columns = [\"altmetric_id\", 'tweets', 'fb_mentions', 'blogposts', 'news', 'wikipedia', 'reddit']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Coverage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First percentage only considers articles that had a DOI (n=14,406)\n",
"\n",
"Second percentage uses the input dataset as the denominator (n=18,708)"
]
},
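{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two denominators concrete, here is a small sketch on invented counts. It assumes an article counts as covered when it has at least one mention, which may not match the definition used for the actual report. `n_input`, `counts`, and the `pct_*` column names are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"n_input = 10  # hypothetical size of the input dataset\n",
"\n",
"# Invented mention counts for four articles that have a DOI;\n",
"# the missing values stand for an article without an Altmetric record\n",
"counts = pd.DataFrame(\n",
"    {'tweets': [3, 0, np.nan, 1], 'news': [0, 0, np.nan, 2]},\n",
"    index=pd.Index([1, 2, 3, 4], name='article_id'),\n",
")\n",
"\n",
"# Number of articles with at least one mention per source (missing counts as False)\n",
"covered = counts.gt(0).sum()\n",
"\n",
"coverage = pd.DataFrame({\n",
"    'articles': covered,\n",
"    'pct_of_doi': (100 * covered / len(counts)).round(1),\n",
"    'pct_of_input': (100 * covered / n_input).round(1),\n",
"})\n",
"```\n",
"\n",
"The first percentage is always at least as large as the second, since the articles with a DOI are a subset of the input dataset."
]
},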
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"