{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Progress report 1\n", "\n", "*Asura Enkhbayar, 09.03.2020*\n", "\n", "This report covers intermediate results for:\n", "\n", "- **Citation parsing** on the unstructured references in order to retrieve identifiers and other metadata in the input dataset\n", "- **Crossref queries** using the unstructured references from the original dataset\n", "- Additional **NCBI identifiers** queried with the DOIs retrieved from Crossref\n", "- **Altmetric counts** for articles with DOIs retrieved from Crossref" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "from tracking_grants import project_dir, data_dir\n", "from tracking_grants import CR_THRESH" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Articles after processing with anystyle\n", "articles = pd.read_csv(data_dir / \"interim/structured.csv\", index_col=\"article_id\")\n", "\n", "# External data from CR/Altmetric\n", "crossref = pd.read_csv(data_dir / \"interim/_crossref.csv\", index_col=\"article_id\", low_memory=False)\n", "altmetric = pd.read_csv(data_dir / \"interim/_altmetric.csv\", index_col=\"article_id\", low_memory=False)\n", "ncbi = pd.read_csv(data_dir / \"interim/_ncbi.csv\", index_col=\"article_id\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Citation Parsing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our input dataset contains unstructured references in the form of strings that were typed in by the original authors.\n", "\n", "Using [anystyle](https://github.com/inukshuk/anystyle) we can attempt to retrieve DOIs, PMIDs, PMCIDs, and other structured metadata from these strings." 
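] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To give a feel for what pulling identifiers out of a free-text reference involves, the next cell is a minimal regex-based sketch. It is not how anystyle works and it is not part of the `tracking_grants` pipeline: the helper `extract_ids`, the three patterns, and the example reference string are hypothetical and made up purely for illustration. Real references are far messier, which is why a dedicated parser is used here." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "# Hypothetical patterns (assumptions, not taken from anystyle)\n", "DOI_RE = re.compile('10[.][0-9]{4,9}/[^ ,;]+')\n", "PMID_RE = re.compile('PMID:? *([0-9]+)', re.IGNORECASE)\n", "PMCID_RE = re.compile('PMC[0-9]+', re.IGNORECASE)\n", "\n", "def extract_ids(reference):\n", "    # Return the first DOI, PMID and PMCID found in a reference string (None if absent)\n", "    doi = DOI_RE.search(reference)\n", "    pmid = PMID_RE.search(reference)\n", "    pmcid = PMCID_RE.search(reference)\n", "    return {\n", "        # Strip a trailing full stop that belongs to the sentence, not the DOI\n", "        'DOI': doi.group(0).rstrip('.') if doi else None,\n", "        'PMID': pmid.group(1) if pmid else None,\n", "        'PMCID': pmcid.group(0).upper() if pmcid else None,\n", "    }\n", "\n", "# Made-up reference string, for illustration only\n", "example = 'Doe J. An example title. J Ex. 2015. doi: 10.1234/abcd.5678. PMID: 12345678'\n", "extract_ids(example)"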
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n", "  <thead>\n", "    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>DOI</th><td>546</td><td>2.9</td></tr>\n", "    <tr><th>PMID</th><td>163</td><td>0.9</td></tr>\n", "    <tr><th>PMCID</th><td>96</td><td>0.5</td></tr>\n", "  </tbody>\n", "</table>" ], "text/plain": [ " Articles with % (n=18708)\n", "DOI 546 2.9\n", "PMID 163 0.9\n", "PMCID 96 0.5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Count the identifiers extracted by anystyle and express them as a share of all articles\n", "x = articles[['DOI', 'PMID', 'PMCID']].count().to_frame('Articles with')\n", "x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n", "x.round(1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**As we can see, anystyle extracts identifiers for only a small share of the input dataset (a DOI for just 2.9% of articles), so these identifiers are not useful on their own.**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Crossref results" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Crossref provides an API endpoint for free-text searches using reference strings. We query that endpoint with each reference and keep the best candidate. Each result comes with a score that indicates the quality of the match.\n", "\n", "We currently use 80 as the threshold for that score. We are still hoping to get in touch with a developer at Crossref who has been working on citation matching." ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "80" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "CR_THRESH" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The following plot shows the distribution of matching scores. The red vertical line marks the threshold of 80, which I chose based on some preliminary experimentation and manual inspection of random articles." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFNCAYAAAD7De1wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8GearUAAAgAElEQVR4nOzde3xU1b3//9fccr8nMwmEqxduIqJFRVRsPdIIJZUD+tAHnMbab7HH09qKX/laBeqt1l74Fbwc21OsPdXCOXDoEUq1ES9Vq6AFVEAJyMUkkMBM7skkk8lc9u+PkCmRhCSQZC55Px/yMLPX3ns+KzOTz6y1117LZBiGgYiIiEQlc7gDEBERkbOnRC4iIhLFlMhFRESimBK5iIhIFFMiFxERiWJK5CIiIlFMiVykB4FAgN/97nfMnz+fm266iTlz5vCLX/yCtra2cIfWyUMPPcT111/PqlWr+v3c//M//8PatWvPuM/TTz/No48+2mXZ4sWLOXToUL/HJSJgDXcAIpHu4YcfpqGhgd///vekpqbS0tLCfffdx7Jly/jFL34R7vBC1q9fz1tvvUVeXl6/n3vXrl1ceOGFZ338mjVr+jEaETmVErnIGRw9epQtW7bw7rvvkpKSAkBSUhKPPPIIH330EQA//OEPqa+v5+jRo3z5y1/mX//1X3nkkUfYv38/JpOJa6+9lnvvvRer1cpTTz3Fa6+9hs1mIzMzkyeeeAKHw9Ht9smTJ/NP//RP7N+/n5UrV5KUlMTjjz9OfX09gUCAb3zjG9x8880sXLgQwzBYvHgxDz30ENOmTQvV4emnn6a8vJyjR4/icrmYMmUKV199NZs2beLYsWMsXbqUuXPnUl1dzY9+9CNqamqoqqoiPz+f1atX8+GHH/Lmm2/y3nvvkZCQwK233sovfvEL3nrrLSwWC5deeikPPfQQAEeOHOEb3/gGVVVV5OTk8Mtf/hKHw8H111/Pk08+SUtLC6tWrWLkyJEcPHiQtrY2fvSjHzF9+nRqa2t54IEHKC8vJyMjA7vdzoUXXsjdd9/d6TXZuXMnP/3pTwkGgwB85zvfoaCggObmZn784x/z4YcfYrFYuOGGG1iyZAlut7vb16O3v9/m5mYeeOABysrKMJvNXHTRRTz66KOYzerUlAhgiEi3iouLjQULFpxxn/vvv9+4/fbbQ4//3//7f8Zjjz1mBINBw+v1Gt/61reM//iP/zAqKyuNyy67zPB6vYZhGMZvf/tb47XXXut2u2EYxrhx44yXXnrJMAzD8Pl8xpw5c4xPPvnEMAzDaGxsNGbPnm189NFHoX1rampOi++pp54yvvKVrxiNjY2Gx+MxLr/8cuOJJ54wDMMwXnvtNeOrX/2qYRiG8Z//+Z/Gf/zHfxiGYRjBYND49re/bfz2t78N1fG5554zDMMwfv/73xuLFi0yPB6PEQgEjB/84AfGSy+9ZDz11FPG9ddfH4rhrrvuMp555hnDMAzjK1/5irFnzx7j/fffNyZOnGjs27cvVNdFixYZhmEYS5YsMX7+858bhmEYTqfTuPrqq42nnnrqtPoUFRUZf/7znw3DMIySkhLj4YcfNgzDMH7yk58YS5YsMfx+v+H1eo1FixYZ77//frevR19+vy+99JLxrW99yzAMw/D7/cayZcuM0tLS7t8UIoNILXKRMzCbzaGW35l86UtfCv38zjvv8F//9V+YTCbi4uK47bbb+P3vf8+3v/1tJkyYwD//8z8zc+ZMZs6cyVVXXUUwGOxye4eO1nVpaSnl5eU8+OCDobLW1lb27dvH1KlTzxjfjBkzSE1NBcDhcHDttdcCMGrUKOrr6wG4/fbb2blzJ7/73e8oLS3l4MGDXHLJJaeda9u2bd
x0000kJCQAsHr1aqC95X/11VeTlZUFwIQJE6itrT3t+OHDhzNx4kQAJk2axEsvvQTA22+/HfrZ4XBw4403dlmX2bNn8+ijj/Lmm28yY8YM7r333lBcDzzwABaLBYvFwh/+8AcA7rnnni5fjzvvvLPXv99rr72WVatW8Y1vfIMZM2Zw++23M3r06DP+zkUGixK5yBlMmTKFI0eO4Ha7Q13rAE6nkxUrVvDUU08B7d3tHb6Y+IPBIH6/H7PZzB/+8Af27t3L9u3b+clPfsKVV17J8uXLu91+6rkDgQBpaWls3rw5dO7q6upQgj6TuLi4To+t1tM/+r/4xS/Ys2cPCxYs4Morr8Tv92N0sRTDF4+trq4O1fnUMpPJ1OXxHV8AvriP1WrttH933da33XYbX/nKV3jvvff429/+xjPPPMOf/vQnrFYrJpMptN/x48dJSEjo9vXo0Jvfb3x8PK+99hoffPAB77//PnfccQfLly/v9suGyGDSBR6RM8jNzaWwsJAHH3wQt9sNgNvt5uGHHyYjI6NTUupwzTXXsHbtWgzDoK2tjQ0bNjBjxgz279/P3LlzOf/88/nOd77DN7/5TQ4cONDt9i8aO3Ys8fHxoURz/Phx5s6dyyeffNIvdX333Xe5/fbbmTdvHtnZ2Wzbto1AIACAxWIJJb+rrrqKP//5z7S1tREMBnn44Yd5+eWXz/n5r7vuOjZu3AhAXV0dr7/+eqfE3OG2226jpKSE+fPn89hjj9HY2EhDQwNXXXUVL730EsFgkLa2Nr7//e+zY8eObl+PLzrT73fdunU88MADXHPNNSxdupRrrrmGgwcPnnOdRfqDWuQiPXjooYd49tlnue2227BYLLS1tXHDDTecNgirw/Lly/nxj39MYWEhPp+Pa6+9ln/9138lLi6O2bNns2DBApKSkkhISGD58uVMmDChy+1fFBcXx7PPPsvjjz/Oc889h9/v5wc/+EGnbv1z8d3vfpef//znPPvss1gsFi677DLKy8sBmDlzJo899hgA3/72t6moqGD+/PkYhsEVV1zBN77xDX71q1+d0/M/8MADLF++nMLCQjIyMhg+fHiXX5Tuu+8+fvKTn7B69WrMZjPf+973GDFiBN/73vd4/PHHuemmmwgEAsyZM4evfvWrXH755V2+Hl90pt/vxIkT+fvf/86cOXNITExk+PDhFBUVnVN9RfqLyeiq70tEZJCtXbuWSZMmcemll9LW1sbChQu5++67ue6668IdmkhEU4tcRCLCBRdcwGOPPUYwGMTn83HjjTcqiYv0glrkIiIiUUyD3URERKKYErmIiEgUUyIXERGJYkrkIiIiUSwqR63X1TUTDPZ+jF52dgo1Ne4BjCj8VMfYoDrGhqFQRxga9YyEOprNJjIzk7stj8pEHgwafUrkHcfEOtUxNqiOsWEo1BGGRj0jvY7qWhcREYliSuQiIiJRTIlcREQkiimRi4iIRDElchERkSimRC4iIhLFlMhFRESimBK5iIhIFFMiFxERiWJRObObxB5/ELw+f7fl8Ta9VUVEuqK/jhIRvD4/O0qc3ZZfPjF3EKMREYke6loXERGJYkrkIiIiUUyJXEREJIrpGrlEpGaPj5KyOlq8fjxeP8UflJOcaOP2gvGMyk0Nd3giIhFDLXKJSNs/PUFJWR3V9a0YBuTnJNPY3MZP137Ip6W14Q5PRCRiqEUuEafB7aWyuoWpF+Yw5fxsoH3UekpKAit+/R6rN+zmW3MmctXkvDBHKiISfmqRS8QpKavHbDYxbmR6p+05GYn8cNGXuHBEOmv+vI+Xt5diGAb+IDR7/d3+8wfDUw8RkcHQq0S+ZcsW5syZw6xZs1i7du1p5SUlJSxYsICCggKWLVuG3995Yo8nn3ySp59++rTjTpw4wRVXXMGxY8fOMnyJNV5fgCOVDYwdlkpC3OkdRkkJVu69dSrTJ+Xyx7eP8F+vH8TT5mNHibPbf2eaaEZEJNr1mMidTierVq1i3b
p1bN68mfXr13Po0KFO+yxdupQVK1bw6quvYhgGGzZsAKCpqYkHH3yQ559//rTzBoNBli1bhs/n66eqSCw4eKwBf8Bg4ujMbvexWsx8u3ASX718JK/vOsaLxQcIBo1BjFJEJHL0mMi3bdvG9OnTycjIICkpiYKCAoqLi0PlFRUVtLa2MnXqVADmz58fKn/jjTcYM2YMd9xxx2nnfe6555gxYwaZmd3/wZahJRg0OFBWR25WIllpCWfc12wycev1F7DguvPYud/FXz+qwB9QH7qIDD09DnZzuVzY7fbQY4fDwZ49e7ott9vtOJ3tU23OmzcP4LRu9U8++YQPPviANWvWdNlV35Ps7JQ+H2O3x/4tS9FcR6O2hapGL82tfmZeOoLUlM6JPCkpHji9jt/8+sUkJ8Xzwl9K+OuHFcy99jzirJbTjrVnJQ1sBfpRNL+OvaU6xo6hUM9Ir2OPidwwTu+yNJlMvS7/Io/Hw6OPPsrq1asxm89urF1NjbtPXal2eypVVU1n9VzRItrr2OL189EBFymJNrLT4mhyt3Yq97S24aqFlhbvacd+abyd0uMNvPNxJdt2V/Kl8fZO5S0tXqoCgQGNv79E++vYG6pj7BgK9YyEOprNpjM2YHtM5Lm5uezcuTP02OVy4XA4OpVXV1eHHldVVXUq/6KdO3dSXV3NXXfdFTrfnXfeyTPPPMN5553XUzgSo8qdTbjqPEybYMfcxRdBry9ASZnrtAQPcMk4O2PyUqnIT6OktI5xI9NJTYobjLBFRMKuxybxjBkz2L59O7W1tXg8HrZu3crMmTND5fn5+cTHx7Nr1y4ANm3a1Kn8i6699lrefPNNNm/ezObNm3E4HPzmN79REh/i3ttzHKvFxAX56T3v3I1LL7RjNsPO/VX9GJmISGTrMZHn5uayZMkSioqKmDdvHnPnzmXKlCksXryYvXv3ArBy5UqeeOIJZs+ejcfjoaioaMADl9hhGAYlZXUMz0kmzmbp+YBuJCVYufi8bI663Byvae7HCEVEIlevZnYrLCyksLCw07Y1a9aEfp4wYQIbN27s9vi7776727I333yzNyFIDKuq91DX5OXCEWffGu8waUwmB481sKPExdwZYzCbux+vISISCzSzm4RdSVkdAMOyz31kucVi5kvj7dS72zh4rP6czyciEumUyCXsSsrqSEuOIy25fwaojcpNITczkY8P1uD1RcdodRGRs6VELmFlGAb7y+oYPzLjjLct9oXJZGLaRAdeX4DDFQ39ck4RkUilRC5hVVHdTGOLj3GjMvr1vNlpCaQnx1FRpUFvIhLblMglrEpK26+PjxvZv4kcIN+ejLO2hdY2LZoiIrFLiVzCqqSsDkdGz3Orn40R9hSCBhwo16A3EYldSuQSNoFgkANH65hwhpXOzoUjMxGb1cynn9cOyPlFRCKBErmETdkJNx5vgEljBiaRm80mhuck8+nntQS7WBNARCQWKJFL2JSUtbeUJ4wauKVsR9iTaWxuo9wZ2ws7iMjQpUQuYVNSVscIe3K/3T/eleE5yZiAPYdqBuw5RETCSYlcwsLnD3LwWMOAXR/vkBhvZXReKrsPK5GLSGxSIpewOFLZgM8fZNLorAF/rovGZlF6vJHG5rYBfy4RkcGmRC5hsa+0DpNpYO4f/6KLxmZhAHuPqFUuIrFHiVzC4kB5HWPyUklK6NUCfOdkhCOF9JQ4da+LSExSIpdB5w8EKT3RxAX5A98ah/a516ecl82nn9fgDwQH5TlFRAaLErkMuqMuN23+IOfnpw3ac045PwePN8ChY1pERURiixK5DLojlY0AXJCfPmjPOWlMJmaTiX0n1z4XEYkVA3+BUuQkfxC8Pj8HjtaRnhxHXJyFZm/7gibBAZ54LTHeyvCcZEpPNA7sE4mIDDIlchk0Xp+fHSVO9pfVk5UWz879rlDZJePsA/a8JrOJZq+fEY5k9h6uwd3qC619Hm+zYlW/lIhEMSVyGVQerx+3x8f4fl
5//Ey8vgC7P6vCMKC51c9bH1aQkmQD4PKJuVjj9TEQkeiltogMqqp6DwD2jP5ftrQn2entz1nT2Drozy0iMlCUyGVQVdW3YjZB9gCsP96TzNQ4zCaoblAiF5HYoUQug6q63kNWWgIWy+C/9SxmM5mpCdQokYtIDFEil0ETCBpUN7Riz0gMWwzZ6fHUNLZiaH1yEYkRSuQyaCqr3ASCBjlhuD7eITs9AZ8/SFOLL2wxiIj0JyVyGTSfH28CCG+L/OS1eV0nF5FY0atEvmXLFubMmcOsWbNYu3btaeUlJSUsWLCAgoICli1bht/v71T+5JNP8vTTT4ceHz58mIULF3LTTTdx6623UlJSco7VkGjw+fFGEuMtJA/CQindyUiJx2I26Tq5iMSMHhO50+lk1apVrFu3js2bN7N+/XoOHTrUaZ+lS5eyYsUKXn31VQzDYMOGDQA0NTXx4IMP8vzzz3faf/ny5SxevJjNmzdzzz33cP/99/djlSRSlR5vxJ6RGJqMJRzMZhNZafG6BU1EYkaPiXzbtm1Mnz6djIwMkpKSKCgooLi4OFReUVFBa2srU6dOBWD+/Pmh8jfeeIMxY8Zwxx13dDrnLbfcwsyZMwEYP348x48f77cKSWRqbG6juqGVnDB2q3fITkugtrGVoAa8iUgM6DGRu1wu7PZ/TJ/pcDhwOp3dltvt9lD5vHnzuPPOO7FYLJ3OOX/+/NC2p556ihtuuOHcaiER73Bl+6pj4ZgI5ouy0xPwBwwa3W3hDkVE5Jz1eLGyq9t0Tu0a7an8TOf9+c9/zu7du3nhhRd63P9U2dkpfdofwG5P7fMx0SaS63hix1EsZhNjhmdg7eIecpvNSmpK90neZmt/q3a1T2+OPbV81DB4b+8Jmr0BkpLisWcl9aUqAy6SX8f+ojrGjqFQz0ivY4+JPDc3l507d4Yeu1wuHA5Hp/Lq6urQ46qqqk7lXfH7/dx///04nU5eeOEFUlP79kuqqXET7MNyWXZ7KlVVTX16jmgT6XXce7CKfHsyHk/XrWCfz0+Tu/vr1j5f+wDKrvbpzbGnlltMBlaLiWOuJlpavFQFAr2txoCL9NexP6iOsWMo1DMS6mg2m87YgO2xa33GjBls376d2tpaPB4PW7duDV3fBsjPzyc+Pp5du3YBsGnTpk7lXfnZz36G2+3m+eef73MSl+gTNAxKTzQxOi8yXmuzyUR2mmZ4E5HY0KsW+ZIlSygqKsLn83HzzTczZcoUFi9ezPe//30uvvhiVq5cyfLly2lubmbSpEkUFRV1e77a2lrWrl3LiBEjuOWWW0LbN2/e3D81kojjqvPQ2hZgpCMyEjm0XyffX15PIBAMdygiIuekVzf0FhYWUlhY2GnbmjVrQj9PmDCBjRs3dnv83XffHfo5KyuLffv29TVOiWKlJxoBGJmbwvHq5jBH0y47PYFg0OB4TQtpSXHhDkdE5KxpZjcZcGUnmrBazAyLoEFlHTO8lTlj+/qeiMQ+JXIZcGUnmhjpSA7LimfdSU2yEWc1c9TpDncoIiLnJHL+skpMChoGZc4mRuelhTuUTkwmE5lp8RyrUiIXkeimRC4Dqqreg8cbYEyEjFg/VVZqApVVzQSCGvAmItFLiVwGVNmJ9mvQo3MjMJGnxeMLBDlR0xLuUEREzpoSuQyo0hNNWC0m8u3J4Q7lNFknB7yV6zq5iEQxJXIZUGUnmsi3p3Q5LWu4pSfHYbOYNXJdRKJa5P11lZhhGAblzqaIvD4O7dMeDstJplyJXESimBK5DJiqhlaaW/0ReX28w0hHMuVOd5eL/4iIRAMlchkw5R0D3SK0RQ4wwp5Ci9eveddFJGopkcuAKT3RhMVsYoS978vODpYRjvbYyjTgTUSilBK5DJiyE43k25OxWSP3bTY8JxmTCV0nF5GoFbl/YSWqGR1Ll0bw9XGAOJuFYdka8CYi0UuJXAZETWP7QLdIHbF+qlGOFMpd6loXkeikRC4DIjSjW4TNsd
6VUbmp1DV5aWxpC3coIiJ9pkQuA6L0RBNmk4kRETij2xeNym0f8KaV0EQkGimRy4AoO9HE8Jxk4myWcIfSo1Enr+PrOrmIRCMlcul3Rmjp0si97exUKYk2stPiNVWriEQlJXLpd/XuNppafBE/Yv1Uo3JTtXiKiEQlJXLpdx0t21FRlsidtS20tvnDHYqISJ8okUu/8Qeh2evncGUDANkZCTR7/aF/wQieznxUbgoGcMzVHO5QRET6xBruACR2eH1+dpQ42Xu4htQkG3sP13Qqv2ScPUyR9azjMkCZs4kLRqSHORoRkd5Ti1z6XW2jl6y0hHCH0Ssms4lmr5+4OAvJCVaOHG/s1IvgD4Y7QhGRM1OLXPpVmy+A2+Pjwihp1Xp9AXZ/VgVAWnIcB8rr2FHiDJVfPjEXa7w+JiISudQil35V2+QFICstPsyR9F1WWgL1TV4CQTXDRSR6KJFLv6pr7Ejk0dG1fqrs9ASCBtQ3aapWEYkeSuTSr2obW0mIs5AYhd3R2Sd7EWoaW8MciYhI7/UqkW/ZsoU5c+Ywa9Ys1q5de1p5SUkJCxYsoKCggGXLluH3d74X98knn+Tpp58OPW5sbOTOO+9k9uzZLFq0iKqqqnOshkSK2qboGej2RSmJNmxWM7VK5CISRXpM5E6nk1WrVrFu3To2b97M+vXrOXToUKd9li5dyooVK3j11VcxDIMNGzYA0NTUxIMPPsjzzz/faf/Vq1czbdo0/vKXv3DLLbfw+OOP92OVJFx8/iD1bm9UXh8HMJlMZKclUHPy8oCISDToMZFv27aN6dOnk5GRQVJSEgUFBRQXF4fKKyoqaG1tZerUqQDMnz8/VP7GG28wZswY7rjjjk7nfOuttygsLARg7ty5vPPOO/h8vn6rlITHiZpmDAOyUqMzkUP7IL26Ji/BSJ69RkTkFD1eyHS5XNjt/5jIw+FwsGfPnm7L7XY7Tmf77Tvz5s0D6NSt/sVjrFYrKSkp1NbWkpub26ugs7P7vhiH3R4904WerXDXcfPPngPOY2ReOqkppydzm81KakrX3e5nKusoB7rcpzfH9vZ5R+Smsq+0jrYg2NMSSEqKx56V1O25B0K4X8fBoDrGjqFQz0ivY4+J3DBOb5mYTKZel/eW2dz7cXc1Ne4+tZjs9lSqqmJ7ZatIqGOlkYTN8GMmSJP79OvMPp+/y+09lXWUA30+b1+fNymufdnVoycaSbCaaGnxUhUIdHvu/hYJr+NAUx1jx1CoZyTU0Ww2nbEB22P2zM3Npbq6OvTY5XLhcDi6La+qqupU3hWHwxE6xu/343a7ycjI6CkUiXAnSMIebD6rL3KRIjXJhs2iAW8iEj16TOQzZsxg+/bt1NbW4vF42Lp1KzNnzgyV5+fnEx8fz65duwDYtGlTp/KuXHfddWzatAmAV155hWnTpmGz2c6lHhJmQcPASTKOYHQvOmIymchKi6emQYlcRKJDr1rkS5YsoaioiHnz5jF37lymTJnC4sWL2bt3LwArV67kiSeeYPbs2Xg8HoqKis54zh/84Ad8/PHHfO1rX2PdunX86Ec/6p/aSNhU1Xlow4I9GP1remelJWjAm4hEjV7N2lFYWBgaZd5hzZo1oZ8nTJjAxo0buz3+7rvv7vQ4IyODX//6132JUyJcxxrksZDIs9PjCZQZNDRrhjcRiXya2U36xVGXGzNBso2WcIdyzjomtNF1chGJBkrk0i/KnE3Y8WAl+ruj05LjsFpMmqpVRKKCErn0i3KnmzyivzUOYDaZyExNoKZBM7yJSORTIpdz1uD20tjcRi7RPWL9VNlp8dQ1tWrAm4hEPCVyOWflrvYBbnmxlMjTE/AHDFx1nnCHIiJyRkrkcs7KT45Yj5WudfjHgLejrtietUpEop8SuZyzoy43OekJJJgGbyrTgZaeHIfFbOKoK/pvpxOR2KZELues3OlmVG5kLyrQV2aziczUeI46lchFJLIpkcs58bYFcNa2MMrR9xXpIl
1WWgLHqtwEu1gYSEQkUiiRyzk5WuXGAEbmxl4iz06Lp7UtQFW9BryJSORSIpdzcvTkQLdRjtjqWod/DHgrV/e6iEQwJXI5J+UuN8kJVrLS4sMdSr/LSI3DbDZRdkIj10UkcimRyzkpd7oZ6UiJ6jXIu2MxmxmenRRaEEZEJBIpkctZCwSDHKuKvRHrpxrhSKHc2YShAW8iEqGUyOWsOWs9+PxBRsbgiPUOIxwpNLX4qGvSvOsiEpmUyOWsdczoNjqGW+QdX1LUvS4ikUqJXM5aucuN1WIiLzsp3KEMmHx7CiY0cl1EIpcSuZy1o84m8nNSsFpi920Ub7OQl52kkesiErGs4Q5Aooc/CF6fHwDDMChzupl8XhbN3vZtxOh4sNG5qRw4Wh/uMEREuqRELr3m9fnZUeIEoKXVj9vjIxA0QttGhzO4ATQqN5X39zlpbGkjLSku3OGIiHQSu32iMqBqm1oByEqNvYlgvmj0yelnyzXgTUQikBK5nJW6xvbbsTKHQCIfldc+Kl/XyUUkEimRy1mpbWwlJdFGnM0S7lAGXHKCjZz0BI1cF5GIpEQuZ6W2yRuT86t3Z3Requ4lF5GIpEQufebzB2lq8Q2J6+MdRuWm4qrz0NLqD3coIiKdKJFLn9WdHOiWeXKZz6GgY/a6oy61ykUksvQqkW/ZsoU5c+Ywa9Ys1q5de1p5SUkJCxYsoKCggGXLluH3t7daKisrWbRoETfeeCN33XUXzc3NADQ0NLB48WK+/vWvc/PNN1NSUtKPVZKBVntyoNtQ61oHKNN1chGJMD0mcqfTyapVq1i3bh2bN29m/fr1HDp0qNM+S5cuZcWKFbz66qsYhsGGDRsAeOSRR1i4cCHFxcVMnjyZZ599FoDf/e53jBs3jj/96U/827/9G48++ugAVE0GSm2Tl3ibhaT4oTMNQXpyHOkpcRq5LiIRp8dEvm3bNqZPn05GRgZJSUkUFBRQXFwcKq+oqKC1tZWpU6cCMH/+fIqLi/H5fOzYsYOCgoJO2wGCwWCode7xeEhIGDpdtLGgtrGVrLT4mFyD/ExG56bqXnIRiTg9NqlcLhd2uz302OFwsGfPnm7L7XY7TqeTuro6UlJSsFqtnbYDfOtb3+LWW2/lmmuuobm5meeff75PQWdn933ZTLs9dlfo6jDQdTRqW0hKiqfe3caUC3JITen8BcxkMmGxmE/b3sFms55VWUc50OU+vclOjzIAACAASURBVDn2bJ83Lt6GcXIu+THD09l7uIYWv0FivJXEBCupAzDTm96rsWEo1BGGRj0jvY49JnLDOH0C7VNbYt2Vn+m4xx57jEWLFlFUVMRHH33EkiVLePnll0lOTu5V0DU1boLB3k/sbbenUlUV2y2pwahji9dPhbOBYNAgJcFKk7u1U3mWYRAIBE/b3sHn859VWUc50OU+vTn2bJ/X3eJl92dVADS3tGEAf3r7EHnZSVw+MZfW5v5dp1zv1dgwFOoIQ6OekVBHs9l0xgZsj13rubm5VFdXhx67XC4cDke35VVVVTgcDrKysnC73QQCgU7bAd544w0WLFgAwKWXXkp2djaHDx/uY9UkHIbiQLcOOentLfeqek+YIxER+YceE/mMGTPYvn07tbW1eDwetm7dysyZM0Pl+fn5xMfHs2vXLgA2bdrEzJkzsdlsTJs2jVdeeaXTdoAJEybw+uuvA1BaWorL5WLs2LH9Xjnpf7WNXixmE2nJQ2/xkPg4C6lJNqobum/Bi4gMtl61yJcsWUJRURHz5s1j7ty5TJkyhcWLF7N3714AVq5cyRNPPMHs2bPxeDwUFRUB8NBDD7FhwwbmzJnDzp07ueeeewD46U9/yh//+Efmzp3Lvffey89+9jNSUyP7GoS0q21qJTM1HvMQG+jWwZ6RSHWDp8tLRyIi4dCr+4cKCwspLCzstG3NmjWhnydMmMDGjRtPOy4/P58XX3zxtO1jxozhhRde6GusEmaGYVDX6A3dUz0U5aQncK
SykWbN8CYiEUIzu0mv1TZ6afMHyRpCM7p9UU5GIgDVuk4uIhFCiVx67VhV+6xmQ3GgW4fM1HjMZpOuk4tIxFAil1475nJjYmisQd4di9lEdlo8VfVK5CISGZTIpdcqqppJS47Dahnab5uc9ERqG1sJBILhDkVERIlceu9YlZvMIdyt3iEnI4FA0KCiujncoYiIKJFL77g9PuqavENqDfLu2NPbB7yVagEVEYkASuTSK0dPLhYylEesd0hOtJIQZ6HseGO4QxERUSKX3ulYh3soD3TrYDKZyMlIVItcRCKCErn0ylFXE+nJcSQOoTXIz8SenoCrzkNzqy/coYjIEKdELr1S7nQzwtH35WNjVU5G+yWGzyvVvS4i4aVELj1q8wU4XtPCCHvvlpkdCrLTEzABR5TIRSTMlMilRxXVzQQNg3y7WuQd4qwW8rKTOKIBbyISZkrk0qOOVudQXiylK6PzUjlc0UBQK6GJSBgpkUuPjlQ2kJ4cpxHrX3B+fjrNrX4qqzQxjIiEjxK59OhwZSPnDU/DNETXIO/OuJEZAJSU1YU5EhEZypTI5YzcHh+uOg/nDU8LdygRJystAXtGAvvLlchFJHyUyOWMjlQ2AHD+8PQwRxKZJozK5EB5PcGgrpOLSHgokcsZHalsxGSCMcM00K0rE0dn0uL1U+7SLG8iEh5K5HJGhysbyc9JISFOM7p1ZcLoTAD2l9WHORIRGaqUyKVbQcPg85MD3aRrGSnxDMtO0oA3EQkbJXLplrO2hRavn/OVyM9owqhMPjtWjz8QDHcoIjIEKZFLtzomglGL/MwmjM7E2xbQamgiEhZK5NKtI5WNJMZbGJajOdbPZPyo9vvJ96t7XUTCQIlcunW4soExeWmYNRFMl0xmE81ePxaLmeE5yXxaWkuz1x/651dPu4gMAg1Fli55fQGOuZqZc9WocIcSsby+ALs/qwIgPTmOz47W8/6nx7GY278fXz4xF6vWbxeRAaYWuXSp7EQTQcPgvGGaCKY3crMSCQQNqutbwx2KiAwxvUrkW7ZsYc6cOcyaNYu1a9eeVl5SUsKCBQsoKChg2bJl+P1+ACorK1m0aBE33ngjd911F83N7YtLuN1u/u///b/MmzePefPm8emnn/ZjlaQ/aKBb3+RlJWECjte0hDsUERliekzkTqeTVatWsW7dOjZv3sz69es5dOhQp32WLl3KihUrePXVVzEMgw0bNgDwyCOPsHDhQoqLi5k8eTLPPvssAE888QTDhg1j06ZN3HvvvTz88MP9XzM5J4crG8hJTyAtOS7coUSFOJuFrLR4nLVK5CIyuHpM5Nu2bWP69OlkZGSQlJREQUEBxcXFofKKigpaW1uZOnUqAPPnz6e4uBifz8eOHTsoKCjotN0wDLZu3cqdd94JwMyZM/nJT34yEHWTc3CkspHz89Wt3hd52UlU1Xt0P7mIDKoeE7nL5cJut4ceOxwOnE5nt+V2ux2n00ldXR0pKSlYrdZO22tqaoiLi+MPf/gD8+bNo6ioiEAg0J91knPgD0JFTTN1TV5G2FM6jcLWuiBnlpeVTNAAV50n3KGIyBDS45Bawzj9r/ep61J3V97d9kAgQHV1Nenp6WzatIn33nuP7373u7zxxhu9Djo7O6XX+3aw22N/0Y/+qKOrtoW/7TnR/sBsYv/RhlDZ+NGZpKYkdHusyWTCYjF3u4/NZj2rso5yoMt9enPsuTxvb489L8HGXz88Rm2Tl/FjsklKiseeldTtubuj92psGAp1hKFRz0ivY4+JPDc3l507d4Yeu1wuHA5Hp/Lq6urQ46qqKhwOB1lZWbjdbgKBABaLJbQ9MzMTq9XK3LlzAbj66qtpaWmhpqaG7OzsXgVdU+Pu07KRdnsqVVWxPetWf9Wxxevn8NE64mxmEm1mmtz/GIXt8/k7Pf6iLMMgEAh2u8+Zju/p3D5f+wDKrvbpzbHn8rx9OTYnI5Gy401MHttKS4uXqj72Num9GhuGQh1haN
QzEupoNpvO2IDtsWt9xowZbN++ndraWjweD1u3bmXmzJmh8vz8fOLj49m1axcAmzZtYubMmdhsNqZNm8Yrr7zSaXtcXBwzZszg5ZdfBuDjjz8mMTGRzMzMc6qo9I9g0KCiupn8nGTMZk0E01d5WUnUNLbi9elykYgMjh4TeW5uLkuWLKGoqIh58+Yxd+5cpkyZwuLFi9m7dy8AK1eu5IknnmD27Nl4PB6KiooAeOihh9iwYQNz5sxh586d3HPPPQA8/vjjvPPOO8ydO5eHH36YVatWYTbrlvZIUOZsorUtwAh73y9fCAzLbu9K1+h1ERksvZp2qrCwkMLCwk7b1qxZE/p5woQJbNy48bTj8vPzefHFF0/b7nA4+PWvf93XWGUQfPp5LSYTDNf86mclJyMRq8Wk+8lFZNCoGSydfHqkBkdGIvFxlnCHEpUsZhOOzEROKJGLyCBRIpeQ2sZWjlU1k+9Qt/q5yMtOpqG5jQa3N9yhiMgQoEQuIXsO1wAwwq5u9XMx7OQtZ58drQ9zJCIyFCiRS8juQ9XkpCeQrmlZz0lWWjxxNjMHypXIRWTgKZEL0L4k576yOi4am9Vpwh/pO5PJRF5WEp8dre9yYiQRkf6kRC4A7C+rw+cPMvm83k3KI2eWl51EXZMXV72maxWRgaVELgDsPlxDfJxFC6X0k2FZ7eMMSkrrwhyJiMQ6JXLBMAx2H6pm8pgsbFa9JfpDWrKNjJQ49pUpkYvIwNJfbeGoy01dk5cp56tbvb+YTCbGjcxgf1kdQV0nF5EBpEQu7NjvwmRCibyfjRuZgdvj45jLHe5QRCSGKZEPcT5/gLc/rmTqBTmkp8SHO5yYMm5U+0JAJepeF5EBpEQ+xO3Y78Lt8XH9l0aEO5SYk5kaT25WkhK5iAwoJfIh7s0PK8jLSmLSaC0jOxAmjs7kwNF6/IFguEMRkRilRD6EfX68kSOVjVx/Wb4mgRkgk0Zn4m0LUHq8KdyhiEiMUiIfwt788BjxNgszJg8Ldygxa/yoDABKymrDHImIxCol8iGqqaWND/a5mDE5j6SEXi1LL2chNSmOUY4UXScXkQGjRD7E+IPQ7PXzxocV+ANBrpqcR7PXH/oX1C3P/W7imEwOVTTg9QXCHYqIxCAl8iHG6/Pzwb4TvLHzKLlZiRyrcrOjxBn65w9qUFZ/mzg6E3/A4FBFQ7hDEZEYpEQ+BFVUNdPc6mfCKI1UH0gms4lmr598Rwpms4k9h2tCPR9+fV8SkX6ii6NDTNAw2HOomuQEKyMdKeEOJ6Z5fQF2f1YFQHZaPB99VsWw7CQALp+YizVeHz8ROXdqkQ8xH31WRU2jl6kX5mA265azwTIsO5mahlbadJ1cRPqZEvkQ4g8E+fN7pWSmxjN2eFq4wxlS8rKTMIATtS3hDkVEYowS+RDy9seVVDe0ctm4HMyaAGZQ2TMSsJhNSuQi0u+UyIcIj9fPlvc+58IR6QzPSQ53OEOOxWzGkZnIiRolchHpX0rkQ8TWHUdpbPHx9WvHajrWMMnLTqLe3YbH6w93KCISQ5TIh4CG5jaK/17OtPF2xuTp2ni4DMtu7wmprG4OcyQiEkuUyGOcYRisf/MgPl+Q+dedH+5whrTstHgS4ixUVCmRi0j/6VUi37JlC3PmzGHWrFmsXbv2tPKSkhIWLFhAQUEBy5Ytw+9v7zqsrKxk0aJF3Hjjjdx11100N3f+A3bixAmuuOIKjh071g9Vka688n4Z73/q5OtXjyEvKync4QxpJpOJfHsyldXNBDQXroj0kx4TudPpZNWqVaxbt47Nmzezfv16Dh061GmfpUuXsmLFCl599VUMw2DDhg0APPLIIyxcuJDi4mImT57Ms88+GzomGAyybNkyfD5fP1dJOuzc7+KPbx9h+qRcCq8eE+5wBMi3p9DmD1J6vDHcoYhIjOgxkW/bto3p06eTkZFBUlISBQUFFBcXh8orKipobW1l6tSpAMyfP5/i4mJ8Ph87duygoK
Cg0/YOzz33HDNmzCAzU9OEDoQjlY2s+fM+LshP5445EzTALUIMz07CZIJPP9eypiLSP3qcI9LlcmG320OPHQ4He/bs6bbcbrfjdDqpq6sjJSUFq9XaaTvAJ598wgcffMCaNWu67KrvSXZ236cWtdtT+3xMtOmoo6u2hWde2ktWWgIP33kV6SnxoX2M2hZSUxK6PYfNZu22/Exl0N51bLGYz+r4ns5ts7W/j7rapzfHnsvz9vexw7KTKSmr6/Y9OZTeq7FsKNQRhkY9I72OPSZywzj9Wt6prbvuyrvb7vF4ePTRR1m9ejVm89mNtaupcRPswzVGuz2Vqqqms3quaGG3p3L8RAPv7K5k87uf4w8E+e4/X0xdfQt19f+4dzloQJO7tdvz+Hz+bsvPVAaQZRgEAsGzOr6nc/t87eMuutqnN8eey/P297F52Ul8eKCKA4eryErrnOiHyntVdYwNQ6GekVBHs9l0xgZsj4k8NzeXnTt3hh67XC4cDken8urq6tDjqqoqHA4HWVlZuN1uAoEAFosltH3nzp1UV1dz1113hc5355138swzz3DeeeedVSWHOsMw+OCT4zy3+RNO1LZwwYh0JozK4KiriaOuzm/AS8bZuzmLDJYROcl8eKCKPUdq+PLU/HCHIyJRrscm8YwZM9i+fTu1tbV4PB62bt3KzJkzQ+X5+fnEx8eza9cuADZt2sTMmTOx2WxMmzaNV155pdP2a6+9ljfffJPNmzezefNmHA4Hv/nNb5TEz1Jdk5dfbtjNj3/3dwC+v2AK3795ymktPYkc6SlxZKXFs+dQTbhDEZEY0KsW+ZIlSygqKsLn83HzzTczZcoUFi9ezPe//30uvvhiVq5cyfLly2lubmbSpEkUFRUB8NBDD/HDH/6QX/3qVwwbNoxf/vKXA16hoWT3oWp++3IJbf4A3/nni/nSBdlYLWaaNXNYRDOZTFw0NosP9jnx+YPYrJrOQUTOXq8WRC4sLKSwsLDTtjVr1oR+njBhAhs3bjztuPz8fF588cUznvvNN9/sTQhyCk9bkI1vHeSvH1YwPCeZO+ZM5LyRmbS0ePH6g+gW5cg3aUwWf9t9nM+O1nPR2KxwhyMiUaxXiVwihz8Q5P/77w85UtnI+FEZTBtv56irifoWX2hgla6DR75xIzOwWc3sPlytRC4i50R9elFmy3ulHKls5OqL87hyUi4Wi17CaBRnszBhVCZ7D+s6uYicG2WBKPL58UZe3l7GFRMdnJ+fHu5w5BxNOT8bZ50Hp9YoF5FzoEQeJXz+AL99uYT0lDgWfPmCcIcj/eCS87MB2HnAFeZIRCSaKZFHiZf+9jmV1c3cMXsCSQka2hALcjISGTcinXf3nuhyAiURkd5QIo8Ch4418OoH5Vw3dTiTz8sOdzjSj66eMgxnbQuHK7WIioicHSXyCOUPQrPXT2NLG2v+vI/MtHjmXj2GZq9ft5fFkGnjHcTZzLy753i4QxGRKKVEHqG8Pj87Spz88a3DVNV7uOSCHPYermFHiRN/MBju8KSfJMZbmTbewY79Try+QLjDEZEopEQewQzDYF9pHWnJcYywJ4c7HBkg11w8DI83wEefVYU7FBGJQkrkEcxV76GmsZWJozO0nngMGzcqg5z0BN7dq+51Eek7JfIIVlJaR5zNzHnDdc94LDObTFx98TBKSutw1emechHpGyXyCFXd4OGo0x2aylNii8lsotnrD/2bOi4HA9jytyM0e/34NQxCRHpJNyRHqLc/qgQTTBiVEe5QZAB4fQF2f+GaeF5WEm/uPEpqgoUrJuVhjdfHU0R6pqZeBGpp9bP9kxOMHZZGUoIt3OHIIDk/P43G5jZcdZ5whyIiUUSJPAL9bU8lXl+AiaMzwx2KDKJRuanE2czsL6sLdygiEkWUyCNMIBjk9Z3HuCA/nez0hHCHI4PIZjVz8fk5lDndGvQmIr2mRB5h9pfVU9PYysypw8MdioTBlAtysJhNvL7zWLhDEZEooU
QeYf5e4iQhzqI51YeopAQbF4xI5+/7nNQ1ecMdjohEASXyCOIPBPnwsyouvTBHt5wNYZPGZGIYBlt3lIc7FBGJAsoWEWRfaS3NrX4un5gb7lAkjFKT4rhsvIO3Pq7E7fGFOxwRiXBK5BFkR4mLxHgrF43JCncoEmazLh+Jty3AXz/UtXIROTMl8gjh8wf58GA1l41Tt7rA8JxkLjk/m9d2HtOqaCJyRsoYEeKTz2vweP1coW51OWnOVaNxe3y8s7sy3KGISARTIo8QO0pcpCTaNAmMhFw4IoNxIzP4y/tl+PxqlYtI15TII0CbL8BHh6q5bJwdq0UvifxjUZWvXjGSencbr++qCC2wogVVRORUWpUhTPxB8Pr8AHx8sBpvW4CLz8+m2du+LWiEMzoJt45FVQzDwJGZyMvbS4mzmbCYzVw+MVcLqohISK+af1u2bGHOnDnMmjWLtWvXnlZeUlLCggULKCgoYNmyZfj97cmosrKSRYsWceONN3LXXXfR3NwMwOHDh1m4cCE33XQTt956KyUlJf1Ypejg9fnZUeJkR4mT13ceJSHOQoPbG9rmD6rZJWAymZhyfjYtrX4OHWsIdzgiEoF6TOROp5NVq1axbt06Nm/ezPr16zl06FCnfZYuXcqKFSt49dVXMQyDDRs2APDII4+wcOFCiouLmTx5Ms8++ywAy5cvZ/HixWzevJl77rmH+++/fwCqFh18/iAVVW5G5aZiNpvCHY5EoGHZSdgzEth7pJaAvuCJyBf0mMi3bdvG9OnTycjIICkpiYKCAoqLi0PlFRUVtLa2MnXqVADmz59PcXExPp+PHTt2UFBQ0Gk7wC233MLMmTMBGD9+PMePH+/3ikWLiupm/AGDMXmp4Q5FIpTJZOKSC3JoafVz+FhjuMMRkQjT44U2l8uF3W4PPXY4HOzZs6fbcrvdjtPppK6ujpSUFKxWa6ft0J7UOzz11FPccMMNfQo6OzulT/u3P39kJUqjtoXUlAQqq50kxls5f1QmZtM/WuQ2m5XUlK5XP+uurGPb2Rx7Ls/bwWQyYbGYz+r43sQFdLnPQNYpHMempiScVj4uOZ69R2r5pLSWuHgbdnvfPwORJNI+jwNhKNQRhkY9I72OPSZywzh91JXplITTXXlvjvv5z3/O7t27eeGFF3odMEBNjZtgH0aD2e2pVFU19ek5BlqL109dQwulxxs4b3gazc2dF8jw+fw0uVu7PLarstSUhNC2vh57Ls97qizDIBAIntXxvYkL6HKfgazTYB/b8Tp2VT55bBZv7DrGX3eU89XLR3b7vJEuEj+P/W0o1BGGRj0joY5ms+mMDdgeu9Zzc3Oprq4OPXa5XDgcjm7Lq6qqcDgcZGVl4Xa7CQQCnbYD+P1+7rvvPvbu3csLL7xAampkf9sZKJUnu9VH5Q7N+kvfDM9pv1b+8vZSPCfvbhAR6TGRz5gxg+3bt1NbW4vH42Hr1q2h69sA+fn5xMfHs2vXLgA2bdrEzJkzsdlsTJs2jVdeeaXTdoCf/exnuN1unn/++SGbxAHKnW7ibGbyspLCHYpEAZPJxOUTHbhbfGx+9/NwhyMiEaJXLfIlS5ZQVFTEvHnzmDt3LlOmTGHx4sXs3bsXgJUrV/LEE08we/ZsPB4PRUVFADz00ENs2LCBOXPmsHPnTu655x5qa2tZu3Ytn3/+Obfccgs33XQTN91008DWMgL5/EGOutyMdKRotLr0Wk56IldNzuP1nceoqHKHOxwRiQC9mlWisLCQwsLCTtvWrFkT+nnChAls3LjxtOPy8/N58cUXT9u+b9++vsYZcz47Wo/PH2S0RqtLHxVeM5bdh6pZ9/pB7rttaqexJyIy9Gg+0DD5+GAVNquZYdnqVpe+SUm08c8zz6OkrI6dB6rCHY6IhJkSeRj4A0H2HK5hpCMFi1kvgfTdl6fmM8qRwvo3D+Jt04IqIkOZskgYHCivp6XVz6jc6L4XWMLHbDax6KvjqG308qdtGvgmMp
QpkYfBrgMu4mxmhuckhzsUiWIXjsjg2inD+Mv75Xx8sLrnA0QkJimRD7Jg0GDXZ1VcNDZbS5bKOVs0axyj81L5zZZPqaxuDnc4IhIGyiSD7EB5HU0tPi69MCfcoUiU6lirvNnrxxc0+D9zJ2Kzmnly4x6qGlq1XrnIEKNEPsje2XOcxHgrF43NCncoEqW8vkBoudsdJU4OHWtgxuQ8qhs8PPk/u/F4feEOUUQGkRL5IGpsaWPXARczJucRZ7OEOxyJIblZSVw5MZfK6mb+pFnfRIaUXk0II/3jvb3H8QcMvjx1eLhDkRg0blQGdW4vb+w6Rmqija9fMzbcIYnIIFAiHyRBw+Dtjyu5cEQ6+fYUmrXohQyAyyc6SE+JZ9O7n2MyQeHVSuYisU6JfJCUlNXhqvNwk1pJMoDMJhOLZo3DYjLx0t8+x2w28bWrxoQ7LBEZQErkg+TtjypISbQxbbw93KFIjDObTfyfr03EMAz++PYRAOZMH6052UVilBL5IGhwe/noYDU3TBuBzapBbjKwTGYTHl+A22aNwx8M8se3j+Cs83Dzl8/HYjETb7Ni1TBXkZihRD4I/rbnOIGgwXVT88MdigwBXl+A3Z+1L6YycXQmrd4A7+45zqGKBq6bOpxrpgzHGq+Pvkis0PfyARYMtg9ymzg6k7wsrXQmg8tkMnHZeDszJufhqm3hL9vLcNV5wh2WiPQjJfIB5A/C3/e7qGlsZfpFeaHZuJq9foJGuKOToeSCEenMunwkXl+Qlf/1Ie/uOY5h6E0oEgvUvzaAWlrbWP/GQVISbbT522fj6nDJOA16k8GVm5XEnKtGsedwDc+/UsLfS5wU3TienPTEcIcmIudALfIB9PbHlTQ0t3H5RAcWs0YMS/ilJsXx/VsuYdGscRw81sCK3/6dN3Ydwx/QBO0i0Uot8gHS4Pbyl/fLyM9JZoRdy5VK5LBYzEyfnMeFIzP479c/Y+1rn7FlWykzpw7n+stGkpFsC3eIItIHSuQDZONbh/H5g1w+0aH7dyWinDqq/fKJDvLtKewrreXP75Xy6gflzJicx5fG2blwZAbxWhNAJOIpkQ+AQ8caeO+TE9wwbSRpyXHhDkekWyaTiXx7Mvn2ZOqavFQ3tLLtkxO8/XElVouZcSPTuWhsFpdeaNddFyIRSom8nwWDBmtf+4yMlDhuvHIUew5XhzskkV7JTI2nYPpobv7y+RyqaGB/WR37y+r4n78e5n/+epjhOclMm+DgyokOhmXrcpFIpFAi72d/+aCMMmcTd359EvFx6paU6OL1Bdh9qP3L50hHCiMdKTR7fJQ5myg74eZP737On979nDF5qcy8ZDhXTMwlKUF/RkTCSZ/AflT8QTl/fPsI08bbuXJiLi1tgXCHJHLOkhNtTBqTxaQxWVw4MoNPj9Ty7p5KXnj1AP/9xkGmTXAwbYKD8SMzSNSMcSKDTp+6frJlWykvvXOEKyY6+PbcSRrgJjEpMy2Bq6cMY8bFeZQ73Wz/5Di7DlSx7ZMTmM0mzhuexqTRmYwbmcHovFSSEzQCXmSgKZGfI8Mw2Pzu5/zpvVKuuiiXb31tIhazbs+X2HTqiHeA8/PTGZOXiqveg8lkYn9ZPVu2ldIxaVxOegIjc1MZlZvChSMyOH9YWpgiF4ldvUrkW7Zs4Ve/+hU+n49vfvObLFq0qFN5SUkJy5cvx+12M23aNB555BGsViuVlZUsXbqUmpoaxo4dy8qVK0lOTqaxsZH77ruPo0ePkpWVxerVq7Hbo2umM8MwOFzZyNa/l7PzQBXXXDyMb86egFkTv8gQY7GYGZadzCXj2ke2e30BahpaqWlspaahlc/K6/joZPI3mWB0XhojcpIZlp1EXnYSeVlJ2DMSsVr0BVjkbPSYyJ1OJ6tWreJ///d/iYuL47bbbuPKK6/kggsuCO2zdOlSfvzjHzN16lQefPBBNmzYwMKFC3nkkUdYuHAhX/va1/j3f/93nn32WZYuXcrq1auZNm
0av/nNb9i0aROPP/44q1evHtCK9peWVh+7DlTx5ocVlDmbSIizUHDlKOZcNRqPr/M1cc2nLkNRvM3C8Jxkhuf8Y2S7x+snOyOR0spGjlY1s+dwDe/uPd7puMR4C8kJNpITbSTFWzGb2pdkNZva/wUNg2DQIGgYGAbY2RznpAAACuhJREFUrGbirGZsVgvxcRZSEm2kJ8eRmtTx/zhSkmykJNj0BVtiWo+JfNu2bUyfPp2MjAwACgoKKC4u5nvf+x4AFRUVtLa2MnXqVADmz5/PU089xS233MKOHTv493//99D2f/mXf2Hp0qW89dZbrF27FoC5c+fy6KOP4vP5sNl6dz3tbD6U3R1TUlaHs7YFA+joDwwEDfwBA18gSCAQpKnFR22jl7qmVlq8fgDyspL41rSJmM0mrBYzJaV1p5174tgskrq5Rmi1mLst66m8q7LEeCsBv+2sjj2X5z1VXGYGCQn+szq+N3GdWse+HhuO38fZHNtRx4F63rON61yPTUqwMXFMFhgw+QI77mYvPn+AphYfbk8bacnxNHt8eLwBPF4frW1BDMMgaLT3fhm03/Nus5kxm0yYTOAPBPG0BWjy+GjzB2lp9dHVOjAmIDHBRmKchThbe9KPt1mwWjr/TQgGDXz+IL6Agc8fCH1h6IjDbAKbxYLN1vEFwkyc1YLNasJmtWCz/iO2lKR4PN620PGGwcm/MRA8+cA4GRsmMJtM0P4fZnP7Ocy0/x+TKbTfyZ8iRnJyDc3NbeEOo1fa30WEfveEXpf21zcYbN/DCJ6yL+3vHU+rL/QamU0nXyPaX48vvkYACXFWpo2391svU085r8dE7nK5OnV7OxwO9uzZ02253W7H6XRSV1dHSkoKVqu10/YvHmO1WklJSaG2tpbc3NxeVSozs+/3sGZnp3S5/ZputveX80ZknlVZVB577f1nPG/Y4tKxUROXiPRdj18Xulrq8NQR2d2V93TcaYFogJiIiEif9Zg9c3Nzqa7+x+xkLpcLh8PRbXlVVRUOh4OsrCzcbjeBQKDTdmhv1Xcc4/f7cbvdoa57ERER6b0eE/mM/7+9uw9pst/DAH5lmjiKSjMxM6mHoJSK0KKX4TJwlZtpJGSByyQLsowIaZZSCJovQ0mKKCrBSjKsTGuUEhhUS0vK/jELS/MtA+1FTefcfuePODtl89RzHk56316fv9wL4764nF/ue7LvqlUwmUzo7u5Gf38/KioqEBwcbH/cx8cHrq6uqK2tBQCUlpYiODgYLi4uCAoKgtFo/OF+AFCpVCgtLQUAGI1GBAUF/fbn40RERPQfE4Sja+DDlJeX48yZM7BYLIiKikJ8fDzi4+ORmJiIRYsW4eXLl0hJSUFfXx/8/f1x/PhxTJo0CW1tbdDr9ejq6oK3tzdyc3MxdepUfPr0CXq9Hi0tLZgyZQoMBgNmz579J/ISERHJym8NciIiIhqb+B9mREREEsZBTkREJGEc5ERERBLGQU5ERCRhHOREREQSJutBXl5ejrCwMISGhtq/210OdDodNBoNIiIiEBERgbq6Otlk7e3thVarRWtrK4Bv3/UfHh4OtVqNvLw8+/Pq6+uxefNmrFu3DkeOHMHQ0NBoHfLfNjxjcnIy1Gq1vc/KykoAI2cf606ePAmNRgONRoPs7GwA8uzRUU65dXnixAmEhYVBo9GgoKAAgPy6dJRRcj0KmXr//r0ICQkRHz9+FH19fSI8PFy8fv16tA/rH7PZbGL16tXCYrHY75NL1ufPnwutVisCAgJES0uL6O/vFyqVSrx7905YLBYRFxcnqqqqhBBCaDQa8ezZMyGEEMnJyeLy5cujeei/bXhGIYTQarWis7Pzh+f9t+xj2cOHD8WWLVuE2WwWg4ODQqfTifLyctn16ChnRUWFrLqsrq4W0dHRwmKxiP7+fhESEiLq6+tl1aWjjI2NjZLrUbZn5N9vbVMoFPatbVL35s0bTJgwAfHx8di4cSMuXbokm6xXr17F0aNH7V
/l++LFC/j5+cHX1xfOzs4IDw/HnTt3HG7ck0re4Rm/fv2K9vZ2pKamIjw8HPn5+bDZbCNmH+s8PT2h1+sxadIkuLi44K+//kJTU5PsenSUs729XVZdLl++HIWFhXB2dkZXVxesViu+fPkiqy4dZXR1dZVcj7/cfiZVv9raJlVfvnzBypUrcezYMQwMDECn02HDhg2yyJqenv7DbUcddnZ2jrhxTwqGZ+zq6sKKFSuQlpYGhUKB3bt3o6SkBAqFwmH2sW7+/Pn2n5uammA0GhETEyO7Hh3lLCoqQk1NjWy6BAAXFxfk5+fjwoULWL9+vSzfk8MzWq1Wyb0nZXtGLv7m9jWpWLp0KbKzs6FQKODu7o6oqCjk5+f/9Dw5ZB2pQzl16+vri1OnTsHDwwNubm6IiYnB/fv3JZ/x9evXiIuLw6FDhzBnzpyfHpdLj9/nnDdvniy7TExMhMlkQkdHB5qamn56XA5dfp/RZDJJrkfZDvJfbW2TqqdPn8JkMtlvCyHg4+Mjy6wjdTjSxj0pamhowN27d+23hRBwdnaW9O9vbW0tYmNjcfDgQWzatEm2PQ7PKbcuGxsbUV9fDwBwc3ODWq1GdXW1rLp0lNFoNEquR9kO8l9tbZOqnp4eZGdnw2w2o7e3Fzdu3EBOTo4ssy5ZsgRv375Fc3MzrFYrbt26heDg4BE37kmREAIZGRn4/PkzLBYLiouLERoaOmL2sa6jowMJCQkwGAzQaDQA5Nmjo5xy67K1tRUpKSkYHBzE4OAg7t27h+joaFl16SjjsmXLJNejbD8j9/LywoEDB6DT6exb2xYvXjzah/WPhYSEoK6uDpGRkbDZbNi2bRsCAwNlmdXV1RWZmZnYt28fzGYzVCoV1q9fDwAwGAw/bNzT6XSjfLT/mwULFmDXrl3YunUrhoaGoFarodVqAWDE7GPZ+fPnYTabkZmZab8vOjpadj2OlFNOXapUKvvfmokTJ0KtVkOj0cDd3V02XTrKuHfvXkyfPl1SPXL7GRERkYTJ9tI6ERHReMBBTkREJGEc5ERERBLGQU5ERCRhHOREREQSxkFOREQkYRzkREREEibbL4QhIsf6+vqQnJyM5uZmODk5ISAgAGlpabh+/ToKCgrg5OSE6dOnIysrC97e3iguLsbFixfh5OSEGTNmIDU1FXPnzoVer8enT5/Q0tKCNWvWYP/+/TAYDHjy5AmsViv8/f2RkpKCyZMnj3ZkIlnjGTnROFNZWYm+vj7cvHkTJSUlAIBXr17BYDDg3LlzKC8vx9q1a3H69GmYTCacO3cOhYWFKCsrg1arRUJCgn2BxMDAAG7fvo2kpCScPXsWEydOxPXr11FWVoaZM2fCYDCMZlSicYFn5ETjTGBgIPLy8hATE4NVq1Zh+/btePDgAZRKJby9vQEAsbGxAIDs7GyEhYXB3d0dwLc90+np6WhtbbW/1r9VVVWhp6cHjx49AgBYLBZ4eHj8wWRE4xMHOdE44+vri8rKSlRXV+Px48fYsWMHoqOjf1jJODAwgLa2NoerG4UQGBoaAgAoFAr7/TabDYcPH4ZKpQLw7RK+2Wz+P6chIl5aJxpnioqKkJycDKVSiaSkJCiVSjQ0NMBkMuHDhw8AgCtXriAnJwdKpRJGoxHd3d0AgGvXrmHatGnw8/P76XWVSiUuX76MwcFB2Gw2pKamIjc3949mIxqPeEZONM5ERkaipqYGYWFhcHNzw6xZs5Ceno6qqirs3LkTAODp6YmMjAx4eXkhNjYW27dvh81mg7u7O86cOQMnp5/PAfbs2YOsrCxs2rQJVqsVCxcuhF6v/9PxiMYdbj8jIiKSMF5aJyIikjAOciIiIgnjICciIpIwDnIiIiIJ4yAnIiKSMA5yIiIiCeMgJyIikrB/ATHDEiMZFsKUAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Distribution of Crossref match scores, with the threshold marked in red\n", "sns.distplot(crossref.score)\n", "plt.title(\"Crossref matching scores\")\n", "plt.vlines(CR_THRESH, 0, 0.010, \"r\");" ] },
{ "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# Join article metadata and Crossref results; overlapping Crossref columns get the suffix _cr\n", "df = articles[['type', 'DOI', 'PMCID', 'PMID']]\n", "df = df.join(crossref, rsuffix=\"_cr\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The number of matched articles at different score thresholds:" ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "60: 16906 articles (90%)\n", "70: 15934 articles (85%)\n", "80: 14406 articles (77%)\n", "90: 12248 articles (65%)\n", "100: 9594 articles (51%)\n" ] } ], "source": [ "scores = df.score\n", "ts = [60, 70, 80, 90, 100]\n", "\n", "for t in ts:\n", "    # Articles whose best Crossref candidate scores at least t\n", "    n = (scores >= t).sum()\n", "    print(f\"{t}: {n} articles ({100 * n // len(articles)}%)\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "For this particular notebook, I am using the **threshold of 80** to determine which articles were found in Crossref." ] },
{ "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Discard Crossref DOIs whose match score is below the threshold (CR_THRESH = 80)\n", "df.loc[df.score < CR_THRESH, 'DOI_cr'] = None\n", "articles_with_doi = df[df.DOI_cr.notna()].index" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n", "  <thead>\n", "    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>DOI_cr</th><td>14406</td><td>77.0</td></tr>\n", "  </tbody>\n", "</table>" ], "text/plain": [ " Articles with % (n=18708)\n", "DOI_cr 14406 77.0" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = df[['DOI_cr']].count().to_frame('Articles with')\n", "x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n", "x.round(1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Using a minimum score of 80, we found 14,406 DOIs in Crossref, which corresponds to 77% of the original dataset.**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Additional identifiers from NCBI\n", "\n", "Using the APIs provided by the NCBI, we can now also attempt to convert DOIs to PMIDs and PMCIDs." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n", "  <thead>\n", "    <tr><th></th><th>Articles with</th><th>% (n=18708)</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>pmid</th><td>5112</td><td>27.3</td></tr>\n", "    <tr><th>pmcid</th><td>5123</td><td>27.4</td></tr>\n", "  </tbody>\n", "</table>" ], "text/plain": [ " Articles with % (n=18708)\n", "pmid 5112 27.3\n", "pmcid 5123 27.4" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = ncbi[['pmid', 'pmcid']].count().to_frame('Articles with')\n", "x[f'% (n={len(articles)})'] = 100 * x['Articles with'] / len(articles)\n", "x.round(1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The results from the NCBI API did not hold up well when I manually checked several examples. Furthermore, we can only retrieve PMIDs and PMCIDs for articles that already have a DOI. These identifiers are therefore not really useful for the processing pipeline, but they might be interesting to report nevertheless." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Altmetric results" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can use the DOIs retrieved from Crossref to look up altmetrics for these articles." ] },
{ "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# Keep only articles with a Crossref DOI, then select and rename the Altmetric count columns\n", "df2 = altmetric.reindex(articles_with_doi)\n", "df2 = df2[['altmetric_id', 'cited_by_tweeters_count', 'cited_by_fbwalls_count',\n", "           'cited_by_feeds_count', 'cited_by_msm_count', 'cited_by_wikipedia_count', 'cited_by_rdts_count']]\n", "df2.columns = [\"altmetric_id\", 'tweets', 'fb_mentions', 'blogposts', 'news', 'wikipedia', 'reddit']" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Coverage" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The first percentage considers only articles with a Crossref DOI (n=14,406).\n", "\n", "The second percentage uses the full input dataset as the denominator (n=18,708)." ] },
{ "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Articles with% (n=14406)
altmetric_id711749.4
tweets212614.8
wikipedia11718.1
fb_mentions6094.2
news5413.8
blogposts5203.6
reddit370.3
\n", "
" ], "text/plain": [ " Articles with % (n=14406)\n", "altmetric_id 7117 49.4\n", "tweets 2126 14.8\n", "wikipedia 1171 8.1\n", "fb_mentions 609 4.2\n", "news 541 3.8\n", "blogposts 520 3.6\n", "reddit 37 0.3" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = df2.count().to_frame('Articles with')\n", "x[f'% (n={len(df2)})'] = 100 * x['Articles with'] / len(df2)\n", "x.round(1).sort_values(\"Articles with\", ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **7,117 articles (50% of the articles with DOI) returned with an altmetric_id**\n", "- **Twitter: 15%, Facebook: 4%**\n", "- 8.1% for Wikipedia (is that considered high?)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Details for altmetric counts " ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
altmetric_idtweetsfb_mentionsblogpostsnewswikipediareddit
count7117.02126.0609.0520.0541.01171.037.0
mean18811159.07.94.61.65.01.41.3
std18072344.124.562.31.59.01.70.7
min101417.01.01.01.01.01.01.0
25%3311373.01.01.01.01.01.01.0
50%8181500.02.01.01.02.01.01.0
75%41322904.06.02.02.06.01.01.0
max76583497.0460.01538.018.0114.036.04.0
\n", "
" ], "text/plain": [ " altmetric_id tweets fb_mentions blogposts news wikipedia reddit\n", "count 7117.0 2126.0 609.0 520.0 541.0 1171.0 37.0\n", "mean 18811159.0 7.9 4.6 1.6 5.0 1.4 1.3\n", "std 18072344.1 24.5 62.3 1.5 9.0 1.7 0.7\n", "min 101417.0 1.0 1.0 1.0 1.0 1.0 1.0\n", "25% 3311373.0 1.0 1.0 1.0 1.0 1.0 1.0\n", "50% 8181500.0 2.0 1.0 1.0 2.0 1.0 1.0\n", "75% 41322904.0 6.0 2.0 2.0 6.0 1.0 1.0\n", "max 76583497.0 460.0 1538.0 18.0 114.0 36.0 4.0" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.describe().round(1)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.6.10 64-bit ('.venv': venv)", "language": "python", "name": "python361064bitvenvvenvcdc679201519459280111e6b577316b7" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }