{ "metadata": { "name": "Recommender Based on Authors" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Recommender Based on Authors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The git repository for this notebook is at [https://github.com/waltherg/article-recommender](https://github.com/waltherg/article-recommender).\n", "\n", "Fetch and fingerprint an article, then look for similar articles on PubMed.\n", "\n", "For simplicity, let's look for articles that were written by the same authors.\n", "\n", "Let's also assume that an article is more relevant to us the larger the subset of the original paper's authors who co-authored it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import urllib2\n", "import urllib\n", "from BeautifulSoup import BeautifulSoup" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "article_url = 'http://www.pnas.org/content/101/7/1822'\n", "response = urllib2.urlopen(article_url)\n", "html_response = response.read()\n", "data = BeautifulSoup(html_response)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "title = data.findAll('title')\n", "title" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 3, "text": [ "[Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems ]" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "authors = data.findAll('meta', {'name':'citation_author'})" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "authors" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", 
"prompt_number": 5, "text": [ "[,\n", " ,\n", " ]" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "author_names = [a['content'].encode('utf8') for a in authors]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "author_names" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 7, "text": [ "['David Angeli', 'James E. Ferrell', 'Eduardo D. Sontag']" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a first test, look for papers on PubMed co-authored by the first two authors. Each name gets its own [author] field tag, because a PubMed field tag applies only to the term immediately before it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pubmed_param = {'db': 'pubmed',\n", " 'usehistory': 'y',\n", " 'term': author_names[0]+'[author] AND '+author_names[1]+'[author]'}\n", "pubmed_encoded_param = urllib.urlencode(pubmed_param)\n", "pubmed_url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'\n", "pubmed_url = pubmed_url + '?' + pubmed_encoded_param\n", "pubmed_response = urllib2.urlopen(pubmed_url)\n", "xml_response = pubmed_response.read()\n", "pubmed_data = BeautifulSoup(xml_response)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "pubmed_ids = [node.findAll(text=True) for node in pubmed_data.findAll('id')]\n", "pubmed_ids = [int(pubmed_id) for pm_list in pubmed_ids for pubmed_id in pm_list]\n", "print pubmed_ids" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[14766974]\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Let's form all non-empty combinations of the authors and perform a PubMed search for each combination."
] }, { "cell_type": "code", "collapsed": false, "input": [ "import itertools" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "author_combinations = [list(itertools.combinations(author_names, number_authors)) for number_authors in range(len(author_names)+1)]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returns a list of lists of tuples, one inner list per combination size, including the empty combination of size 0. Let's flatten it into a single list of author lists and drop the empty combination." ] }, { "cell_type": "code", "collapsed": false, "input": [ "author_combinations = [list(auth_comb) for auth_list in author_combinations for auth_comb in auth_list if len(auth_comb) > 0]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "author_combinations" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 13, "text": [ "[['David Angeli'],\n", " ['James E. Ferrell'],\n", " ['Eduardo D. Sontag'],\n", " ['David Angeli', 'James E. Ferrell'],\n", " ['David Angeli', 'Eduardo D. Sontag'],\n", " ['James E. Ferrell', 'Eduardo D. Sontag'],\n", " ['David Angeli', 'James E. Ferrell', 'Eduardo D. Sontag']]" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is how we can join author names with the keyword AND to form the corresponding PubMed search terms." ] }, { "cell_type": "code", "collapsed": false, "input": [ "' AND '.join(author_combinations[-1])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 14, "text": [ "'David Angeli AND James E. Ferrell AND Eduardo D. 
Sontag'" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Query PubMed for every combination of authors and store each combination together with the list of PubMed UIDs it returns:\n", "\n", "([authors], [UIDs found on PubMed])" ] }, { "cell_type": "code", "collapsed": false, "input": [ "uid_data = []\n", "for auth_comb in author_combinations:\n", "    # tag every name with [author]; a PubMed field tag applies only to the term right before it\n", "    pubmed_param = {'db': 'pubmed',\n", "                    'usehistory': 'y',\n", "                    'term': ' AND '.join([name + '[author]' for name in auth_comb])}\n", "    pubmed_encoded_param = urllib.urlencode(pubmed_param)\n", "    pubmed_url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'\n", "    pubmed_url = pubmed_url + '?' + pubmed_encoded_param\n", "    pubmed_response = urllib2.urlopen(pubmed_url)\n", "    xml_response = pubmed_response.read()\n", "    pubmed_data = BeautifulSoup(xml_response)\n", "    # collect the text of every Id element and convert it to an integer UID\n", "    pubmed_ids = [node.findAll(text=True) for node in pubmed_data.findAll('id')]\n", "    pubmed_ids = [int(pubmed_id) for pm_list in pubmed_ids for pubmed_id in pm_list]\n", "    print auth_comb, pubmed_ids\n", "    \n", "    uid_data.append([auth_comb, pubmed_ids])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['David Angeli'] [19949950, 19405117, 19336287, 17869313, 20369910, 14766974]\n", "['James E. Ferrell']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [23624406, 23361981, 22726437, 22677291, 22324380, 22078888, 21855788, 21414480, 21292159, 20660152, 20028868, 20005833, 19901979, 19878681, 19589184, 19390045, 19222866, 18633407, 18599789, 18480403]\n", "['Eduardo D. Sontag']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [23729972, 23528097, 23319630, 23133355, 22773816, 21990429, 20711508, 20418962, 19405117, 19280266, 19119410, 19003437, 18987858, 18793131, 18704155, 18277378, 18193928, 18008071, 17869313, 17238280]\n", "['David Angeli', 'James E. 
Ferrell']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [14766974]\n", "['David Angeli', 'Eduardo D. Sontag']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [19405117, 17869313, 20369910, 14766974]\n", "['James E. Ferrell', 'Eduardo D. Sontag']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [14766974, 12629549]\n", "['David Angeli', 'James E. Ferrell', 'Eduardo D. Sontag']" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [14766974]\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the PubMed UID of the original paper (14766974) is not filtered out: it appears in five of the seven lists above. Keeping the original paper may be useful when weighting keywords below.\n", "\n", "Also note that ESearch returns at most 20 UIDs per query by default (its retmax parameter), which is why the single-author searches for Ferrell and Sontag stop at exactly 20 results." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "\n", "# x: size of each author combination, y: number of PubMed hits for that combination\n", "x = [len(t[0]) for t in uid_data if len(t[1]) > 0]\n", "y = [len(t[1]) for t in uid_data if len(t[1]) > 0]\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111)\n", "ax.grid(True)\n", "\n", "ax.set_xlabel('number of authors on paper = significance of match')\n", "ax.set_ylabel('number of papers found on PubMed')\n", "ax.scatter(x, y)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 16, "text": [ "" ] }, { "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAYIAAAEMCAYAAADJQLEhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVOX+B/DPICIgbrlQigru7A4giiE/jFyA3Mh9ScRb\naqllXbO0XLp1bVHTlpstKibuW7mU5U3QxBXUcLu2wCTqVUAzFkGEeX5/cDmBgIeZ4cwwh8/79eLF\nzDAz5/uZM8wz53nOeY5GCCFARER1lo2lCyAiIstiQ0BEVMexISAiquPYEBAR1XFsCIiI6jg2BERE\ndZxiDUF6ejpCQkLg7e2Nrl274t133wUALFy4EC4uLtBqtdBqtdi3b59SJRARUTVolDqO4MaNG8jM\nzISXlxdyc3Ph5+eHrVu34quvvkKjRo3w4osvKrFYIiIykK1ST+zs7AxnZ2cAgJOTE3x8fHD16lUA\nAI9hIyKqPcwyRqDT6XDy5En06dMHAPDxxx/D3d0d48ePx61bt8xRAhERVUUoLCcnRwQEBIidO3cK\nIYTIzMwUer1e6PV6MX/+fDFu3LgKjwHAH/7whz/8MeLHGIo2BIWFhaJ///5i2bJllf796tWrokuX\nLhWLMjKMtViwYIGlS1AU81k3NedTczYhjP/sVKxrSAiByZMnw8PDA7NmzZJuz8jIkC5v374dnp6e\nSpVQa+l0OkuXoCjms25qzqfmbKZQbLA4MTERcXFx8PHxgVarBQD885//xIYNG5CSkoLCwkK0b98e\nq1atUqoEIiKqBsUaguDgYOj1+gq3h4eHK7VIqxEdHW3pEhTFfNZNzfnUnM0Uih1HYAqNRsNdTImI\nDGTsZyenmLCAhIQES5egKOazbmrOp+ZspmBDQERUx7FriIhIJdg1RERERmFDYAFq76dkPuum5nxq\nzmYKNgRERHUcxwiIiFSCYwRERGQUNgQWoPZ+SuazbmrOp+ZspmBDQERUx3GMgIhIJThGQERERmFD\nYAFq76dkPuum5nxqzmYKNgRERHUcxwiIiFSCYwRERGSUKhuCW7duPfCHjKf2fkrms25qzqfmbKao\n8lSVfn5+0mbG5cuX0axZMwDAH3/8gfbt2yMtLc1sRRIRkXJkxwieeeYZDB8+HP379wcA7N+/H9u2\nbcOnn36qXFEcIyAiMpixn52yDUH37t1x5swZ2dtqEhsCIiLDKTZY3KhRIyxevBg6nQ5paWl4++23\n0bhxY6OKpBJq76dkPuum5nxqzmYK2YZgx44d0Ol0iIyMxKBBg6DT6bB9+3Zz1EZERGZQ7eMIsrOz\nzbYlwK4hIiLDKdY1dPDgQXTq1Amenp4AgPPnz+OZZ54xvEIiIqqVZBuC559/HgcOHECLFi0AAJ6e\nnjhy5IjihamZ2vspmc+6qTmfmrOZQrYhEEKgXbt25W7TaDSKFUREROZV5QFlpdq2bYvExEQAQFFR\nEVauXIkOHTooXpiahYaGWroERTGfdVNzPjVnM4XsYPH169fx7LPP4t///jc0Gg0ef/xxrFy5Ei1b\ntlSuKA4WExEZTLHB4ocffhg7duxAdnY2/vzzT2zfvl3RRqAuUHs/JfNZNzXnU3M2U1TZNTRjxowq\nWxeNRoMPPvhA0cKIiMg8quwaql+/Pry8vDBy5Ei0bt0aAKRGQaPRYOLEicoVxa4hIiKDGfvZWeUW\nwX//+19s3boVW7ZsQb169TBq1CiMGDECTZs2NalQIiKqXaocI2jRogWmTZuG+Ph4xMbG4s8//4SH\nhwfWrVtnzvpUSe39lMxn3dScT83ZTCG7+2hycjI2bdqE/fv3Izw8HP7+/uaoi4iIzKTKMYLXX38d\n33zzDdzd3TF69GgMGDAA9evXr/YTp6enY9y4cfjjjz9QWFiIyZMn4+WXX8atW7cwatQo3LhxA488\n8gg2b95cobuJYwRERIar8fMR2NjYwM3NDY6OjpUuLCUl5YFPf
OPGDWRmZsLLywu5ubnw8/PD1q1b\n8cUXX6Bjx4544YUXsHz5cqSlpWHFihU1Eqa2++yzz/Daa+8hOzsPQUFe2L59Ex566CFLl0VEKlHj\nDYFOp3vgA11dXQ1a0PDhwxETE4MZM2bgxIkTaN68ObKystCrVy/8+uuv5YtSYUPwxRdf4OmnZwF4\nF8AdAP9Gs2YXceuWzrKFKSAhIUHVR3Ayn/VSczZAgQPKXF1dpR9bW1ucPHkSycnJsLW1NbgR0Ol0\nOHnyJIKDg5GZmYnmzZsDKBmQzsjIMLhoazRv3jsA3gEwDYA/gF34449s7Nmzx7KFEVGdJztY/PHH\nH2Px4sUICwuDEALPP/885s6di2effbZaC8jNzcXw4cOxYsUKg85nEB0dLTU4TZs2Rffu3aWWvHTk\n35qu3779B4BOZRImAmiDX3/9tVbUV5PXS2+rLfUwH/OVXg8NDa1V9Zh6PSEhAbGxsQAM76UpS3au\noY4dOyI5OVka0L19+zb8/PyQmpoq++T37t3DE088gYEDB2LWrFnS8x0/fhwtWrRAZmYmgoKC6kTX\nUP/+Edi//x6AvQDsAPwIYAAyMy9LU3wTEZlCsbmGWrduDScnJ+l6w4YN0aZNG9knFkJg8uTJ8PDw\nkBoBAIiIiEBcXBwAIC4uDhEREQYXbY22bduE5s1/A+AMwBXAALz55jxVNgKl31jUivmsl5qzmaLK\nrqGlS5cCANzc3BAQEIChQ4cCAHbt2gVvb2/ZJ05MTERcXBx8fHyg1WoBAIsXL8aiRYswatQorF69\nGg8//DC2bNlSEzlqvcaNGyMrKxXfffcd9u7di4ULF3KPISKqFarsGlq4cKF0AhohRIXLCxYsUK4o\nFXYNEREprcZ3H7UkNgRERIZTbIygb9++FX4ee+wxo4qkEmrvp2Q+66bmfGrOZgrZ3Uffe+896XJB\nQQF27twJGxvZ9oOIiKyEUV1DvXr1wrFjx5SoBwC7hoiIjFHj5yModevWLemyXq9HUlISbty4YfCC\niIiodpLt4/Hz84O/vz/8/f3Rs2dPLF68GKtWrTJHbaql9n5K5rNuas6n5mymkN0ikJt8joiIrFuV\nYwQHDx7ElClTkJqaCk9PT8TGxsLX19c8RXGMgIjIYDW+++hzzz2HDz/8ENnZ2Zg7dy5eeOEFkwok\nIqLaqcqGoF69eujXrx/s7e0xYsSIcoPGZBq191Myn3VTcz41ZzNFlWMEOTk52LFjh7SZUfa6RqNB\nVFSU2YokIiLlVDlGEB0dLc0vBJSfbwgA1qxZo1xRHCMgIjIY5xoiIqrjFJtriGqe2vspmc+6qTmf\nmrOZgg0BEVEdx64hIiKVUGyuISEEDh48iPT0dOj1emlhTz31lOFVEhFRrSPbNTRy5EjMnTsXR48e\nRVJSEpKSknDy5Elz1KZaau+nZD7rpuZ8as5mCtktgp9++gmXLl0qt+soERGph+wYwejRo7FixQo4\nOzubqyaOERARGUGxMYLr16+ja9euCAwMRIMGDaSF7dq1y/AqiYio1pFtCBYuXAgAUtfQ/UcYk+ES\nEhIQGhpq6TIUw3zWTc351JzNFLINQWhoKK5evYqjR49Co9EgKCgIrVu3NkdtRERkBrJjBF9++SVe\nffVVhIWFAQAOHDiAxYsXY8KECcoVxTECIiKDKTbXkIeHBw4fPoyHHnoIQMk5jIODg3HhwgXjKq1O\nUWwIiIgMpuhcQ6WNAAA0a9aMH9ImUvu+zMxn3dScT83ZTCE7RhAWFoaBAwdi9OjREEJg69atePzx\nx81RGxERmYFs15Ber8emTZuQmJgIAOjTpw9GjRql6J5D7BoiIjIcz0dARFTH8XwEVkTt/ZTMZ93U\nnE/N2UzBhoCIqI5j1xARkUrU+FxD3t7eD1xYSkqKwQsjIqLap8quod27d2P37t0IDw/HoEGDsGHD\nBqxfvx6DBw9GeHi4OWtUH
bX3UzKfdVNzPjVnM0WVWwSurq4AgPj4+HInovHx8UFgYKDihRERkXnI\njhG4u7tj9erVCAoKAgAcO3YMMTExnGKCiKiWUWz30dWrVyM6Ohrt27dH+/btMWnSJKxevVr2iWNi\nYuDs7FxurGHhwoVwcXGBVquFVqvFvn37DC6YiIhqlmxDEBQUhEuXLiEpKQnJycm4ePEievXqJfvE\nkyZNqvBBr9Fo8OKLL+L06dM4ffo0Bg4caHzlVkzt/ZTMZ93UnE/N2UwhO9dQXl4etm7divT0dOj1\negAlH+jz589/4OP69OkDnU5X4XZ2+RAR1S6yWwSRkZH49ttv0aBBAzg5OcHJyQkNGzY0eoEff/wx\n3N3dMX78eNy6dcvo57Fmaj9DEvNZNzXnU3M2U8huEWRlZdXY5tRzzz0nbUksXLgQM2fORFxcXKX3\njY6OlvZcatq0Kbp37y6txNJ6eJ3XeZ3X6/L1hIQExMbGAvhrT0+jCBlTpkwRZ8+elbtbpdLS0oSX\nl1elf7t69aro0qVLpX+rRllWLT4+3tIlKIr5rJua86k5mxDGf3bKbhEcPHgQq1atgpubGxo0aADA\n+COLMzIy0KpVKwDA9u3b4enpafBzEBFRzZI9jqCyAV9AfjNkzJgxOHjwILKysuDs7IxFixYhPj4e\nKSkpKCwsRPv27bFq1Sq0adOmYlE8joCIyGCKnY/g8uXLld7erl07gxdWXWwIiIgMp9gBZREREYiM\njERkZCTCwsLQoUMHzjVkotLBHrViPuum5nxqzmYK2TGCc+fOlbt+5swZfPTRR4oVRERE5mXU+Qi8\nvLwqNBA1iV1DRESGq/HzEZRaunSpdFmv1+PUqVNo0aKFwQsiIqLaSXaMICcnB7m5ucjNzUVBQQH6\n9++PvXv3mqM21VJ7PyXzWTc151NzNlPIbhEsXLgQQMkRxgC4NUBEpDKyYwSnTp3C+PHjkZubCwBo\n1KgR1q1bBz8/P+WK4hgBEZHBFDuOwN/fH++//z5CQkIAAD/++CNeeOEFJCcnG1dpdYpiQ0BEZDDF\njiMoLCyUGgGgZHrpe/fuGbwg+ova+ymZz7qpOZ+as5lCdoygdevWWLx4McaMGQMhBDZt2oRHHnnE\nHLUREZEZyHYNZWVlYe7cuUhMTARQskXw1ltvoXnz5soVxa4hIiKD1fhxBBMmTMC6desQFxeHzz77\nzKTiiIio9qpyjODEiRO4du0aVq9ejVu3blX4IeOpvZ+S+aybmvOpOZspqtwimDp1KsLCwpCamgp/\nf/9yf9NoNEhNTVW8OCIiUp7sGMHUqVOxcuVKc9UDgGMERETGUOw4AktgQ0BEZDjFjiOgmqf2fkrm\ns25qzqfmbKZgQ0BEVMfJdg3l5OTA0dER9erVw6VLl3D+/HlERkZKJ7JXpCh2DRERGUyxMQKtVotj\nx44hIyMDwcHBCAwMhK2tLTZu3Gh0sbJFsSEgIjKYomMEDRo0wM6dOzF9+nRs3boVFy5cMHhB9Be1\n91Myn3VTcz41ZzNFtRqCkydPYuPGjYiIiAAAflsnIlIR2a6hH374AcuWLUNISAjmzJkDnU6HJUuW\nKHoCe3YNEREZTpExAr1ejzlz5uC9994zqThDsSEgIjKcImMENjY2OHLkiNFFUeXU3k/JfNZNzfnU\nnM0Usucj8Pb2xrBhwxAVFQVHR0cAJa1OVFSU4sUREZHyZMcIoqOjS+6o0ZS7fc2aNcoVxa4hIiKD\nca4hIqI6TrHjCM6fP4/g4GB069YNAHDhwgUsWrTI8ApJovZ+SuazbmrOp+ZsppBtCGJiYrB06VI4\nODgAANzd3bFlyxbFCyMiIvOQ7Rry9fXFTz/9BK1Wi9OnTwMAfHx8kJKSolxR7BoiIjKYYl1DDz30\nEH799Vfp+p49exQ9cT0REZmXbEOwcuVKTJw4ERcvXkS7du2wYMECfP755+aoTbXU3k/JfNZ
NzfnU\nnM0UsscRdO3aFYmJibh58yaEEGjRooU56iIiIjORHSPIyMjA66+/jsOHD0Oj0SA4OBhvvPEGWrVq\npVxRHCMgIjKYYmMEw4YNQ/v27bFnzx7s2rUL7du3x7Bhw4wqkoiIah/ZhiAvLw9z586Fm5sbOnTo\ngFdffRV37tyRfeKYmBg4OzvD29tbuu3WrVvo168ffHx8MGDAANy+fdu06q2U2vspmc+6qTmfmrOZ\nQrYhCAsLw5YtW6DX66HX67Ft2zY89thjsk88adIk7Nu3r9xtCxYsQGRkJFJSUhAeHo4FCxYYXzkR\nEdUI2TECJycn3LlzBzY2JW2GXq9Hw4YNSx6s0SA7O7vKx+p0OgwaNAhnz54FAHTs2BEnTpxA8+bN\nkZWVhV69epXbNVUqimMEREQGM/azU3avodzcXKMKqkxmZqZ0DEKLFi2QkZFRY89NRETGkW0IgJIP\n8F9++QVFRUXSbSEhIYoVBZTMeurq6goAaNq0Kbp3747Q0FAAf/XzWev15cuXqyoP89Wu+piv6uul\nl2tLPTWRJzY2FgCkz0ujCBkrVqwQ7u7uokmTJiI0NFTY29uLvn37yj1MCCFEWlqa8PLykq536NBB\nZGZmCiGEyMjIEB07dqz0cdUoy6rFx8dbugRFMZ91U3M+NWcTwvjPTtnB4o8++gjJyclwdXVFfHw8\nUlJS0LRpU6ManYiICMTFxQEA4uLiEBERYdTzWLvSll2tmM+6qTmfmrOZQrZrqHHjxnBwcEBxcTEK\nCwvRuXNnXLx4UfaJx4wZg4MHDyIrKwtt27bFG2+8gUWLFmHUqFFYvXo1Hn74Yc5iSkRUC8g2BK1b\nt0Z2djaeeOIJhIWFoVmzZmjbtq3sE2/cuLHS2/fv3294lSqTkJCg6m8mzGfd1JxPzdlMIdsQ7Nq1\nCwCwePFifP/99ygoKMDAgQMVL4yIiMyjWqeqPHr0aLm5hnr16qVsUTyOgIjIYIrNNTRv3jxMmTIF\nubm5yM7OxpQpUzBv3jyjiiQiotpHdougU6dOuHDhAuzs7AAAhYWF8PDwqPSI4BorSuVbBGrvp2Q+\n66bmfGrOBii4ReDq6oqCggLpekFBAdzc3AxeEBER1U6yWwRDhgzByZMn0b9/fwAle/0EBgbCxcUF\nGo0GH3zwQc0XpfItAiIiJRj72SnbEJQevlx2IWV/T5w40eCFyhbFhoCIyGCKNQSWoPaGQO39lMxn\n3dScT83ZAAXHCIiISN24RUBEpBI1vkUwYcIEACVT0hIRkXpV2RCcOHEC165dw+rVq3Hr1q0KP2S8\nsnOiqxHzWTc151NzNlNUOdfQ1KlTERYWhtTUVPj7+5f7m0ajQWpqquLFERGR8mTHCKZOnYqVK1ea\nqx4AHCMgIjKGoruPnjhxAocOHYJGo0FISAh69OhhVJHVLooNARGRwRTbffSdd95BTEwMsrOzcfv2\nbcTExODdd981qkgqofZ+SuazbmrOp+ZsppA9H0FsbCxOnz4Ne3t7ACWzkWq1Wrz88suKF0dERMqT\n7Rpyd3fHTz/9VG72UV9f32qdrtLootg1RERkMGM/O2W3CMaPH4+AgABERUVBCIGvvvpKOsaAiIis\nX7VOTLNy5Uo4OjrCyckJK1euxNy5c81Rm2qpvZ+S+aybmvOpOZspZLcIAKB3797o3bu30rUQEZEF\ncK4hIiKV4OyjRERklAc2BMXFxQgLCzNXLXWG2vspmc+6qTmfmrOZ4oENQb169WBra4ucnBxz1UNE\nRGYmO0YwePBgnD59Gv369UPDhg1LHqTQuYqlojhGQERkMMWOI4iKikJUVBQ0Gg0ASOcqJiIidajW\nXkM5OTm4fPkyPD09zVGTarcIjhw5gjfeWIZLl/6Dp54ahVde+TscHBwsXVaNU/t5YZnPeqk5G6Dg\nXkNbt26FVqtFZGQkAODcuXPSZaq+o0ePol+/Yfjuu/7
Q6WLw7runEBk50tJlERHJbxF4enoiMTER\nffv2xenTpwEAPj4+SElJUa4oFW4RREaOxDffPAZg6v9uuQdHRzckJe2Hu7u7JUsjIpVQbIvA1tYW\nTZs2LXdbUVGRwQuq665dywDgVuaW+rC1bYsbN25YqiQiIgDVaAg8PDywfv16FBUVIS0tDbNnz1b8\nxDRqNGJEOBwcVgC4CyABwEHo9b+iZ8+eli1MAWrfV5v5rJeas5lCtiH4/PPPkZycDCEEBg0aBL1e\nj08++cQctanKSy+9gL59G8LBoT0cHaeiUaMR2LFjgyoHi4nIulR7rqHMzExoNBq0aNFC6ZpUOUZQ\n6tdff8X169cREBAgneyHiKgmKHbO4sTERMTExKCgoAAA4ODggNWrVys6G6maGwIiIqUoNlg8efJk\nrFmzBr///jt+//13rFmzBjExMUYVWcrV1RU+Pj7QarUIDAw06bmskdr7KZnPuqk5n5qzmUL2yOLG\njRuX+/YfFBSEJk2amLRQjUaDhIQEPPTQQyY9DxERma7KrqHk5GQAwLp161BYWIiRI0sOftq2bRvs\n7OywbNkyoxfq5uaGpKQkNG/evPKi2DVERGSwGh8jCA0NrXR+odLL8fHxRhfboUMHNG3aFEVFRXjm\nmWcwffr08kWxISAiMliNTzqnZF/asWPH0KpVK2RmZmLgwIHo1q0bHn/88XL3iY6OhqurKwCgadOm\n6N69uzRHSGlt1np9+fLlqsrDfLWrPuar+nrp5dpST03kiY2NBQDp89IYsnsNZWVlITY2Funp6dDr\n9SUPqsFpqBcvXgwAePXVV/8qSuVbBAkqn/iK+aybmvOpORug4O6jfn5+CA0Nhbe3N2xsbKSuoYkT\nJxpV6J07dwAAjo6OyMvLQ0REBF566SUMHjz4r6JU3hAQESlBsfMR2NjYmDQwfL8bN25g6NCh0Gg0\nuHPnDkaPHl2uESAiIvOS3SJYsmQJmjVrhoiICDRo0EC6XcldP9W+RaD2zVPms25qzqfmbICCWwT2\n9vZ48cUX8cYbb8DGxkZaWGpqquFVEhFRrSO7ReDm5oaTJ0+aZY6hUmrfIiDrk52djRdeeAXbt++A\nk1MTzJkzEzNmPMvTtlKtotgWQbdu3eDk5GRUUURqMWzYeCQmNsPdu0eRnZ2BV1/9GxwcGuDpp/9m\n6dKITCY711CDBg3g7e2NZ555BjNmzMCMGTMwc+ZMc9SmWmX3ZVYjteW7evUqjhxJxN27n6Hk5EL5\nuHNnBd57b6WlS1OE2tZfWWrOZgrZLYKhQ4di6NCh5W7j5jDVJXfv3oVGYwegfplbG6GgIN9SJRHV\nqGqfj8CcOEZAtYkQAp6egbh0KQp6/WwA2XBwGIu//7033nhjvqXLI5IodkCZm5tbhduU3muIDQHV\nNjqdDsOGTcCFCynQaARGjRqLL774EPXr15d/MJGZKDZYfPLkSelyQUEBdu7ciYyMDIMXRH9R+77M\naszn6uqK06d/RGZmJpKSkhAeHm7pkhSjxvVXSs3ZTCE7WNyiRQvpx8XFBTNmzMC+ffvMURtRrdOy\nZUueZ5pUR7ZrKDk5WRoc1uv1SEpKwvvvv49Lly4pVxS7hoiIDKZY19BLL70kNQQ2NjZwcXHB9u3b\nDa+QiIhqJe41ZAFq76dkPuum5nxqzgYouEWQl5eHrVu3Ij09HUIIaRrq+fO52xwRkRrIbhGEhobC\n2dkZ/v7+qFevnnT7Sy+9pFxRKt8iICJSgmJbBFlZWTwsm4hIxWR3Hw0ODsa5c+fMUUudofaGlfms\nm5rzqTmbKWS3CA4ePIhVq1bBzc1NOjGNRqNBSkqK4sUREZHyZMcIdDpdpbe7uroqUE4JjhEQERlO\nsbmGLIENAdVW2dnZsLOzg729vaVLIarA2M9O2TECqnlq76dUY7709HT07t0fzZs/jMaNH8LUqS+g\nqKjI0mUpQo3rr5S
as5mCDQFRNURGjsSJE71RVPQH7t1bjy+/PIvFi9+zdFlENYJdQ0Qy0tLS4OkZ\nhPz8qwBKj6U5jnbtJuP337lHHdUe7BoiUkj9+vUhRBGA4jK3FqB+fTtLlURUo9gQWIDa+ynVls/F\nxQU9evSAnd10ANcArIKj4yzMmvW0pUtThNrWX1lqzmYKNgRE1fD11xsQFVUEBwdPNGkyD/Pnj8Oz\nz061dFlENYJjBEREKsExAiIiMgobAgtQez+lWvMJIfD7779j165dli5FUWpdf/n5+diwYQMKCwst\nXUqtw4aAqBp+/vlnuLsHwN09EMOHj8Xw4U+hoKDA0mVRNX344Sdo2bItJk9+ES1btsPmzVssXVKt\nwjECIhlCCHTposVvv02CEDMA3IG9/Vg8/7wv3n77H5Yuj2QcO3YMYWHDcefOAQBdACTD3r4/LlxI\ngpubm6XLq1EcIyBSyG+//YZr1zIhxEyU/Ms4oaBgAdav57m7rcHmzTtQUPA3lDQCAOAPIYbj66+/\ntmRZtQobAgtQax9sKbXla9iwIYqL8wGUdgUlALgFJ6dGlitKQWpbf40bO8HW9tb/riUAAGxtb6JR\nI3WuP2OwISCS8cgjj+Cxx8LQoMFEAOcAnIKj4wy88spzli6NqiEmZiLq198I4HMA16DRLEH9+kcw\nfPhwS5dWa3CMgKga8vLyMHfuQmzatAONGjXBvHkzMWlStKXLompKSkrCrFnzcfHieQQEBGD58rfQ\nrVs3S5dV43g+AiKiOo6DxVZEbX2w92M+66bmfGrOZgqLNAT79u2Dt7c3PDw88M4771iiBIs6c+aM\npUtQFPNZNzXnU3M2U5i9Ibh79y6mTZuGffv2ISUlBdu2bcPp06fNXYZF3b5929IlKIr5rJua86k5\nmynM3hAcP34cnp6eaNOmDWxtbTFq1Cjs3bvX3GUQEdH/mL0huHLlCtq2bStdd3FxwZUrV8xdhkXp\ndDpLl6Ao5rNuas6n5mymsDX3AjUaTY3ez1qtXbvW0iUoivmsm5rzqTmbsczeELi4uCA9PV26np6e\nXm4LAQB3HSUiMiOzdw316NED586dw9WrV3Hv3j1s2bIF4eHh5i6DiIj+x+xbBPb29vjkk08wYMAA\n6PV6TJgwAX5+fuYug4iI/scixxGEh4fj3LlzWLZsGTZs2FDl8QSxsbFo2bIltFottFotVq9ebYFq\njRMTEwNnZ2d4e3tXeZ+ZM2fC09MTfn5+VrcLrVy+hIQENGnSRFp3b775ppkrNF56ejpCQkLg7e2N\nrl274t3xAlohAAAPcklEQVR33630fta6/qqTz5rXX0FBAXr06AGtVosuXbpg1qxZFe5z9+5djBo1\nCt7e3nj00Ufx+++/W6BS41Qnn8GfncJCCgoKhKurq7hy5Yq4d++eCAgIEKdOnSp3n9jYWDFjxgwL\nVWiaQ4cOiVOnTgkvL69K/75t2zYxZMgQIYQQp06dEr6+vuYsz2Ry+eLj48WgQYPMXFXNuH79ujh7\n9qwQQoicnBzRuXNncebMmXL3seb1V5181rz+hBDizp07Qggh7t27J3r27CkOHDhQ7u9LliwRzz//\nvBBCiJ07d4rBgwebvUZTyOUz9LPTYlNMVOd4AiGE1Q4c9+nTB82aNavy79988w0mTJgAANBqtSgq\nKrKq3Wjl8gHWO+jv7OwMLy8vAICTkxN8fHxw7dq1cvex5vVXnXyA9a4/AHBwcAAAFBYWori4GM7O\nzuX+Xnb9DR48GEeOHLGqvHL5DP3stFhDUJ3jCTQaDXbs2AFPT08MHjzYqjbf5Kj9eAqNRoOjR4/C\n29sbYWFh+OmnnyxdklF0Oh1OnjyJ4ODgcrerZf1Vlc/a159er0f37t3h7OyMvn37wsPDo9zfy64/\nGxsbNG/eHBkZGZYo1Shy+Qz97LRYQ1Cd4wRKA5w/fx5DhgzBuHHjzFCZ+dzfYqvp2
Al/f39cuXIF\nZ8+exZw5czB06FBLl2Sw3NxcjBgxAitWrKj0JCbWvv4elM/a15+NjQ3OnDmDK1eu4NChQ6qbbE4u\nn6GfnRZrCKpzPEGzZs1ga1uyY9PkyZOt7lvJg9yf/8qVK3BxcbFgRTXLyckJ9vb2AID+/fvDzs4O\n169ft3BV1Xfv3j08+eSTGDt2bKUfgta+/uTyWfv6K9WkSRNERkbi2LFj5W53cXHB5cuXAZR8u755\n8yZatmxpiRJNUlU+Qz87LdYQVOd4gszMTOny7t270blzZ3OXqZiIiAisX78eAHDq1CnUq1cPbdq0\nsXBVNScrK0u6nJycjLy8PLRq1cqCFVWfEAKTJ0+Gh4dHpXtkANa9/qqTz5rX382bN5GTkwMAyM/P\nx/79+yvs3RYREYG4uDgAwNdff42goCDY2FjHrPzVyWfoZ6fZjyMoVdXxBAsWLEBAQAAGDRqEpUuX\n4ptvvkFxcTGaNWuGdevWWapcg40ZMwYHDx5EVlYW2rZti0WLFuHevXsAgClTpuDJJ59EfHw8PD09\n0aBBA6xZs8bCFRtGLt/GjRvx2WefAQDs7OywYcMGq/lHS0xMRFxcHHx8fKDVagEA//znP6VvkNa+\n/qqTz5rX37Vr1/DUU09BCIGCggKMHTsWkZGR5T5bpk+fjgkTJsDb2xuNGjXChg0bLF12tVUnn6Gf\nnbXyDGVERGQ+1tHEExGRYtgQEBHVcWwIiIjqODYERER1HBuCWiA0NBTJycmKL+f9999H165dpUPr\nTbF8+XLk5+dL152cnEx+Tnqwp59+GhcvXjTqsdeuXcOIESOk6yNHjoSXlxeWL1+OBQsW4Icffqip\nMhVTk+/fsr7++mvZ1zUhIQGDBg2q0eXWJhbbfZT+YsoRqcXFxahXr1617vvZZ58hPj4erVu3Nnp5\npVasWIEJEyZIc56YkkGv11vNronVpUSmzz//3OjHtm7dGlu3bgUAXL9+HadPn8Yvv/xSU6WZRU2+\nf8vauXMnBg0aBHd39xp9Xmuirv8+Bel0Ori7u2Pq1Knw8vJCaGgo8vLyAJT/Rp+VlQU3NzcAJVPB\nDh06FOHh4XBzc8NHH32EJUuWICAgAH5+fuUO2lm3bh0CAwPRrVs3JCYmAiiZAmDMmDHw9fWFp6en\n9I8cGxuLwYMHY8CAAejfv3+FWt966y24u7vD3d1dmt576tSpSE1NxcCBA7F8+fIK2fr06QOtVgsv\nLy8cPHgQQMVvQdOnT8fatWvx4Ycf4tq1a+jbty/CwsKkv7/22mvStLf//e9/AQC//fYbevfuDV9f\nXwQHB0vnjI2OjsbUqVPx6KOPYs6cOfjhhx+kx2q1WumAGblcD1ovZUVHR2PatGno1asXOnbsiJ07\nd8pmDwkJweDBg9G1a1dMmjRJmlJi165d8Pf3h7e3N4YMGSLV6urqildeeQU9e/bE9u3bK9RQXdnZ\n2YiIiICvry+8vb2xZcsWAOXfZ//617/QqVMnPProo3j66acxY8YMKefzzz+PkJAQtGvXTto/XqfT\nSQcd9e/fH1evXoVWq8Xhw4cRHR0t1ZuYmIiAgAB0794dPXr0QF5e3gNfo9DQUIwePRpdunTBiBEj\npNeosucpKirC9OnT4evrC3d3d3zwwQeV5jf0/Vvd/7NPP/0UgYGB8PT0xKBBg5Cbm4sjR45g9+7d\nmD17NrRaLdLS0nDx4kUEBwfD19cXWq0Wqamp0Gg0yM3NrTSrKpgwE2qdkpaWJmxtbaXpe0eOHCnW\nrFkjhBAiNDRUJCcnCyGEyMzMFK6urkIIIdasWSM6deok8vPzRWZmpmjcuLH44osvhBBCzJo1S7z3\n3ntCCCH+7//+T0ybNk0IIURiYqLo0qWLdJ+4uDghhBB//PGH6Nixo8jOzhZr1qwRLi4uIjs7u0Kd\niYmJwtvbW9y9e1fk5+cLT09Pcfz4cSGEEK6ur
uLmzZsVHpOfny8KCwuFEEL8/PPPwtvbWwhRMhXx\nE088Id1v+vTpYu3atZU+l0ajEd9++60QQoiXX35ZLFiwQAghRL9+/cSGDRuEEEKsXbtWDBw4UAgh\nxMSJE8XQoUOlx0dERIgTJ04IIUqmKC8qKpLNdezYsQeul7Kio6NFZGSkEKJkXbZq1Urk5+c/MLu9\nvb24fPmy0Ov1YsCAAWLDhg3i+vXrIigoSJoG+O233xbz5s2TXpNly5ZVWLYQQqxfv1507969ws+I\nESMq3Hfz5s3S+0GIkqmihfjrfabT6UTbtm1Fdna2KC4uFiEhIdKUwxMnThSjR48WQghx4cIF0b59\neylz6ZThOp2u3PTh0dHRYvv27aKgoEC0adNGmpL6zp07oqio6IGvUZMmTcT169eFXq8XQUFBIiEh\nocrnWbFihXjzzTeldezn5yd+/vln2fUs9/6t7v/Zn3/+KT3mtddeE0uWLCmXv5SPj4/Ys2ePEEKI\noqIicefOnUqzxsfHV6jFWrFryABubm7S9L3+/v7l5pqpSt++fWFvbw97e3s0bdoUERERAABvb2+c\nOXMGQEm3ysiRIwEAvXv3RkFBATIzM/H9999j//79WLJkCQCgqKgIly9fhkajQb9+/SqdCO3w4cOI\nioqCnZ0dACAqKgqHDh1CYGBglTXm5eXh2Wefxblz52BnZ4eff/7ZgFelhJ2dHQYOHAig5LX57rvv\nAABHjx7Ft99+C6DkaOTp06dLmaOioqTHh4SEYObMmRgzZgyGDRtWYd6pynL9+OOPGDFiRLXXy/Dh\nwwGUfHPv1q0bzp49iw4dOlSZPTAwUKpj1KhROHz4MOzs7PDLL7+gd+/eAEqmAe7Zs2eFZdxv7Nix\nGDt2rOzrCJRMa/3qq6/ilVdeQUREBEJCQqS/CSFw/PhxhIWFSet/+PDhUt0ajQaDBw8GALi7u5fb\n6iz7HJXdlpKSAldXV/j6+gL4a6rj27dvP/A1Kp0CuXv37rh8+TIcHR0rfZ7vv/8ev/zyC7Zt2wag\nZMsnNTW13PQHxrx/NRpNtf7Pjh8/jtdffx35+fnIycnB448/XuE1yczMxM2bNxEZGQkAqFevnlT/\n/Vmr8/9vLdgQGKBBgwbS5Xr16klvHhsbG+j1egAlZw+q6jE2NjbS9bKPqUxpn/uuXbukrqZSSUlJ\naNiwYZWPK/uPLoSQ7b9funQpXF1dsXnzZhQXF0uTjd1fY9nB4fvVr19fulz2cQ9atqOjo3R5zpw5\neOKJJ/DNN98gODgY33//Pbp27VqtXPevlwe9rmVpNJoqs99fe+nyhBAIDw/Hl19+WelzVrVe1q9f\nLzXoZXXq1Enq8ivVuXNnJCcnY+/evViwYAH69u2L+fPnl6vr/teirNIP0fszyKnqvg96jSp77R+0\nzJUrV6Jv374PrMHQ9+/9dZT9P9NoNNL7YeLEidi/fz88PT2xdu3acjN2GroMQ95n1oBjBCYofcO6\nuLggKSkJAKS+5+o+tvRy6beko0ePwsHBAS1atMCAAQPwr3/9S7rfuXPnKjz2fsHBwfjqq69QWFiI\ngoICfPXVV+W+UVamoKBA+qazYcMGFBcXS7nOnz+PwsJC5OTk4MCBA9JjHBwcKu2Lv1/v3r2lPu5N\nmzahT58+ld5Pp9PB09MTs2fPRmBgIM6fP1+tXA96LcoSQkj94Glpabh06RK8vLyqzA4AJ06cQHp6\nOoQQ2Lp1K4KDg9GnTx/Ex8dL8/IUFBTgt99+k13+uHHjcPr06Qo/9zcCQMlgrqOjI8aNG4eXXnpJ\nem8BJR9YPXv2RHx8PHJyclBcXIwdO3aYPAW2RqOBj48PdDqd9A06Ly8PxcXFD3yNDHmeAQMG4NNP\nP5U+QNPS0ip8uTDm/Vvd90BhYSFatWqF4uJirF+/XnrNyr6XW7ZsiZYtW2LPnj0ASmZpfdAXILXg\nFoEB7v9nK
70+e/ZsPPnkk1i1ahUGDhwo3a7RaMo95v7LZe9nZ2eHnj174s8//5TOL/qPf/wD06ZN\ng4eHB2xtbdG2bVvs3bu3wvOWFRQUhFGjRkmb5ZMmTUKPHj0qrb/UtGnTMHjwYKxfvx79+vWTdgXt\n0KEDhgwZgm7duqFr167w8/OTHjN58mT07dsX7du3xw8//FBlto8//hhPPfUUFi9ejMaNG0szPt5f\nz5IlS3Do0CFoNBp4eHhIm+ZyuXQ6XZXr5f7bXFxcEBQUhIyMDHzyySewt7evMjtQMkPu9OnT8Z//\n/AePPvooRo8eDaBk75XS7he9Xo+33noLHTt2rPS1NUZKSgr+/ve/w9bWFra2tvjoo4/K/b1du3aY\nPXs2unfvjocffhidO3eWui/uz1+dy6Xs7OywefNmxMTEQK/Xw97eHgcOHHjga2TI8zz33HNSg29n\nZ4dmzZph165d5Wo35v1b3f+zRYsWwd/fHy4uLggICEBubi6Akm6/v/3tb3j//fexbds2bNy4EZMn\nT8bcuXNRv359bNu2rdL/OWs7/8SDcNI5qhMmTZqEQYMGlRuXeJCEhAQsXboUu3fvVrgy4+Tn58PB\nwQFFRUWIiorC+PHjpXEmIkOxa4ioEg/a6qoNXn/9dfj5+aFLly5o3bp1uYPFiAzFLQIiojqOWwRE\nRHUcGwIiojqODQERUR3HhoCIqI5jQ0BEVMexISAiquP+H3/ZER9Q7DhkAAAAAElFTkSuQmCC\n" } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "abstracts = []\n", "for el in uid_data:\n", " auth_comb = el[0]\n", " uids = el[1]\n", " if len(uids) > 0:\n", " pubmed_param = {'db': 'pubmed',\n", " 'usehistory': 'y',\n", " 'id': ','.join([str(uid) for uid in uids]),\n", " 'retmode': 'xml'}\n", " pubmed_encoded_param = urllib.urlencode(pubmed_param)\n", " pubmed_url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'\n", " pubmed_url = pubmed_url + '?' 
+ pubmed_encoded_param\n", " pubmed_response = urllib2.urlopen(pubmed_url)\n", " xml_response = pubmed_response.read()\n", " pubmed_data = BeautifulSoup(xml_response)\n", " pubmed_abstracts = [node.findAll(text=True) for node in pubmed_data.findAll('abstracttext')]\n", " for abstract in pubmed_abstracts:\n", " abstracts.append((auth_comb, abstract[0]))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "len(abstracts)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 18, "text": [ "49" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "abstracts[0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 19, "text": [ "(['David Angeli'],\n", " u'This paper derives new results for certain classes of chemical reaction networks, linking structural to dynamical properties. In particular, it investigates their monotonicity and convergence under the assumption that the rates of the reactions are monotone functions of the concentrations of their reactants. This is satisfied for, yet not restricted to, the most common choices of the reaction kinetics such as mass action, Michaelis-Menten and Hill kinetics. The key idea is to find an alternative representation under which the resulting system is monotone. As a simple example, the paper shows that a phosphorylation/dephosphorylation process, which is involved in many signaling cascades, has a global stability property. 
We also provide a global stability result for a more complicated example that describes a regulatory pathway of a prevalent signal transduction module, the MAPK cascade.')" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "abstracts[1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 20, "text": [ "(['David Angeli'],\n", " u'Certain mass-action kinetics models of biochemical reaction networks, although described by nonlinear differential equations, may be partially viewed as state-dependent linear time-varying systems, which in turn may be modeled by convex compact valued positive linear differential inclusions. A result is provided on asymptotic stability of such inclusions, and applied to a ubiquitous biochemical reaction network with inflows and outflows, known as the futile cycle. We also provide a characterization of exponential stability of general homogeneous switched systems which is not only of interest in itself, but also plays a role in the analysis of the futile cycle.')" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "from topia.termextract import extract" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "extractor = extract.TermExtractor()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "from operator import itemgetter" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "keywords_weighted = {}\n", "for abstract in abstracts:\n", " \n", " auth_comb = abstract[0]\n", " \n", " keywords = sorted(extractor(abstract[1]), key=itemgetter(2), reverse=True)\n", " \n", " keywords_filtered = []\n", " for keyword in keywords:\n", " include = True\n", " for el in [')', '(', 'i.e', 'e.g', '.', 
',','/','\\\\','*',';','&']:\n", " if el in keyword[0]:\n", " include = False\n", " if include:\n", " keywords_filtered.append(keyword)\n", " \n", " for keyword in keywords_filtered:\n", " if keyword[0] not in keywords_weighted.keys():\n", " keywords_weighted[keyword[0]] = keyword[2]+len(auth_comb)\n", " else:\n", " keywords_weighted[keyword[0]] =keywords_weighted[keyword[0]]+keyword[2]+len(auth_comb)\n", " " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "[key for key in keywords_weighted.keys() if keywords_weighted[key] > 20]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 25, "text": [ "[u'signal transduction networks',\n", " u'five-variable mitogen-activated protein kinase cascade',\n", " u'feedback',\n", " u'bistable systems',\n", " u'system',\n", " u'feedback systems',\n", " u'reaction networks']" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "[key for key in keywords_weighted.keys() if keywords_weighted[key] > 10 and keywords_weighted[key] <= 20]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 26, "text": [ "[u'method',\n", " u'feedback-blocked system',\n", " u'feedback loop',\n", " u'stability behavior',\n", " u'Certain mass-action kinetics models',\n", " u'oscillatory responses',\n", " u'cell',\n", " u'bifurcation diagrams',\n", " u'stability properties',\n", " u'feedback loops',\n", " u'cell cycle',\n", " u'feedback strengths']" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "[key for key in keywords_weighted.keys() if keywords_weighted[key] > 5 and keywords_weighted[key] <= 10]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 27, "text": [ "[u'mitotic events',\n", " u'small-gain theorem',\n", " u'clock-like oscillations',\n", " u'20- h cell cycle 
period',\n", " u'oscillator',\n", " u'Cdc 2-cyclin B functions',\n", " u'DNA replication',\n", " u'checkable conditions',\n", " u'species',\n", " u'feedback oscillator',\n", " u'parameter',\n", " u'response',\n", " u'1',\n", " u'control theory',\n", " u'chemical reactions',\n", " u'network',\n", " u'Lotka-Volterra systems',\n", " u'mitotic Cdc 2-cyclin B',\n", " u'convergence result',\n", " u'MDA-MB -231 metastatic breast cancer cells',\n", " u'B',\n", " u'cyclin',\n", " u'predator-prey systems',\n", " u'systems biology',\n", " u'Xenopus laevis oocyte maturation network',\n", " u'Cdk 1-cyclin B 1 translocation',\n", " u'reaction',\n", " u'Cdc 2 activation exhibits hysteresis',\n", " u'double-negative feedback loops',\n", " u'loop',\n", " u'population models',\n", " u'equilibrium points',\n", " u'HeLa cells',\n", " u'control systems',\n", " u'response functions',\n", " u'Cdk 1-cyclin B 1 redistribution',\n", " u'input-output properties',\n", " u'non-extinction property',\n", " u'chemical species',\n", " u'segment polarity gene network',\n", " u'reaction network',\n", " u'paper deals',\n", " u'time-varying systems',\n", " u'predator-prey interactions',\n", " u'saddle-node bifurcation']" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "[key for key in keywords_weighted.keys() if keywords_weighted[key] <= 5]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 28, "text": [ "[u'readout neurons',\n", " u'core design',\n", " u'mitosis work',\n", " u'cells act',\n", " u'United States hope',\n", " u'intracellular protein',\n", " u'European Commission',\n", " u'computation',\n", " u'wiring diagrams',\n", " u'DNA topoisomerase II',\n", " u'responsiveness',\n", " u'time-lapse epifluorescence microscopy',\n", " u'response threshold',\n", " u'stability test',\n", " u'unit length',\n", " u'acidic residues',\n", " u'cell-cell competition',\n", " u'Cdk',\n", " u'Monotone subsystems',\n", " u'drug 
combinations',\n", " u'Computational studies',\n", " u'activation',\n", " u'Wee 1 knockdown cells',\n", " u'2 n',\n", " u'target mRNA',\n", " u'cyclin B 1 phosphorylation',\n", " u'Computational modeling',\n", " u'Cdc 25C',\n", " u'level',\n", " u'phosphorylation',\n", " u'progesterone',\n", " u'tubule fluid flow',\n", " u'Turing phenomenon',\n", " u'25C',\n", " u'genomics approach',\n", " u'B 2',\n", " u'Previous work',\n", " u'hysteretic switch',\n", " u'candidate therapies',\n", " u'non-degradable cyclin B',\n", " u'peritubular',\n", " u'residue',\n", " u'600 mRNAs',\n", " u'cascade model',\n", " u'mitotic arrest',\n", " u'Mechanistic studies',\n", " u'microRNA regulation',\n", " u'cell-fate induction',\n", " u'HeLa cell cycle',\n", " u'tridiagonal systems',\n", " u'response dynamics',\n", " u'ultrasensitive functions',\n", " u'Positive-plus-negative oscillators',\n", " u'component',\n", " u'cyclin B 1-Cdk',\n", " u'multidrug combination',\n", " u'3 D collagen matrix',\n", " u'mRNA',\n", " u'interconnection structure',\n", " u'tasks--to transcriptional',\n", " u'Key features',\n", " u'microcircuit models',\n", " u'change',\n", " u'glomerulotubular balance',\n", " u'Cdc 2 regulation',\n", " u'inhibitor',\n", " u'Ser 287 dephosphorylation',\n", " u'Pareto sense',\n", " u'progesterone threshold',\n", " u'gene',\n", " u'cell-fate decisions',\n", " u'retroactivity',\n", " u'ribosome density',\n", " u'term',\n", " u'GSK -3beta',\n", " u'parameter ranges',\n", " u'scale invariance',\n", " u'General results',\n", " u'studies support',\n", " u'translational',\n", " u'Cdc 2 activates',\n", " u'landscape',\n", " u'use',\n", " u'transcriptional components',\n", " u'multisite phosphorylation-dephosphorylation cycle',\n", " u'switch-like character',\n", " u'luminal membrane solute entry',\n", " u'memory',\n", " u'Molecular Cell',\n", " u'phosphorylation sites',\n", " u'latter design',\n", " u'control parameters',\n", " u'mitotic state',\n", " u'Staphylococcus aureus',\n", " 
u'glycogen synthase kinase -3beta',\n", " u'Xenopus oocyte maturation',\n", " u'miR -124 transfection',\n", " u'abundance',\n", " u'tubule',\n", " u'translation profiles',\n", " u'Cdc 25',\n", " u'phosphorylated state',\n", " u'pitchfork bifurcation',\n", " u'control',\n", " u'luminal flow',\n", " u'display modularity',\n", " u'ligand-receptor interactions',\n", " u'process',\n", " u'switch-like oscillations',\n", " u'Positive feedback loops',\n", " u'German Federal Ministry',\n", " u'cell cycle oscillator',\n", " u'quot',\n", " u'delay',\n", " u'progesterone responsiveness',\n", " u'kidney function',\n", " u'pathway',\n", " u'translocation',\n", " u'profile',\n", " u'double-negative feedback loop',\n", " u'circuit',\n", " u'bistable spatiotemporal switches',\n", " u'reconstituted ultrasensitivity',\n", " u'Cdc 25C ultrasensitivity',\n", " u'cell-cell competition process',\n", " u'feedback architecture',\n", " u'reductionistic systems biology',\n", " u'cyclin B 1 abundance',\n", " u'findings show',\n", " u'cancer systems biology',\n", " u'scale-invariant behavior',\n", " u'diffusive instabilities',\n", " u'additive compound matrices',\n", " u'testable hypothesis',\n", " u'input-output systems',\n", " u'non-intuitive changes',\n", " u'Argonaute proteins',\n", " u'stability',\n", " u'basement membrane',\n", " u'cyclin degradation',\n", " u'dissipativity matrix',\n", " u'activator Cdc 25',\n", " u'Cdk 1 activity',\n", " u'metastatic cancer cells',\n", " u'systems theory',\n", " u'systems biologists',\n", " u'chromatin',\n", " u'CDK 1AF',\n", " u'combination',\n", " u'cell-fate commitment',\n", " u'progesterone-induced meiotic entry',\n", " u'resets Cdc 2',\n", " u'live-cell microscopy',\n", " u'front',\n", " u'time-varying inputs',\n", " u'polypeptide elongation',\n", " u'eukaryotic cell cycle',\n", " u'activates proteins',\n", " u'APC',\n", " u'Saccharomyces cerevisiae',\n", " u'energy',\n", " u'acid side chains',\n", " u'activator-repressor oscillators',\n", " 
u'Western analysis',\n", " u'frequency',\n", " u'paper addresses',\n", " u'binding sites',\n", " u'Michaelis-Menten quasi-steady state conditions',\n", " u'balance',\n", " u'control component',\n", " u'ULFO results',\n", " u'space',\n", " u'cytoskeletal effects',\n", " u'response time',\n", " u'parabolic arcs',\n", " u'induction',\n", " u'interdisciplinary approach',\n", " u'theory',\n", " u'rate-balance formalism',\n", " u'feedback control',\n", " u'cyclin B 1 accumulation',\n", " u'interphase state',\n", " u'possibility',\n", " u'multidrug response',\n", " u'histone H 3 phosphorylation',\n", " u'mitotic oscillator',\n", " u'insulation property',\n", " u'oscillator circuit',\n", " u'exit M phase-like states',\n", " u'1-cyclin',\n", " u'iterative stability analysis',\n", " u'protein function',\n", " u'phosphorylation-dephosphorylation cycle',\n", " u'miR -124-mediated regulation',\n", " u'near-constant amplitude',\n", " u'insulation',\n", " u'MAPK cascade',\n", " u'regulation',\n", " u'2',\n", " u'chromatin condensation',\n", " u'period orbits',\n", " u'modeling strategy',\n", " u'translation',\n", " u'GSK -3beta engages',\n", " u'valley',\n", " u'target',\n", " u'kinase cyclin B-Cdk 1',\n", " u'saddle-node bifurcations',\n", " u'ridges right',\n", " u'chemical biology approaches',\n", " u'tissue culture cells',\n", " u'Parameter modulation',\n", " u'cancer researchers',\n", " u'compartmental model',\n", " u'bifurcating valleys',\n", " u'non-saturating sensitivity range',\n", " u'miR',\n", " u'Lateral inhibition',\n", " u'collagen matrix',\n", " u'extracellular matrix',\n", " u'competition effects',\n", " u'mechanism',\n", " u'ribosome occupancy',\n", " u'reaction kinetics',\n", " u'pre-stimulus level',\n", " u'interconnection',\n", " u'tyrosine phosphorylation sites',\n", " u'stability analysis',\n", " u'cell fate switch',\n", " u'Cdc 2 activation',\n", " u'interconnection terms',\n", " u'stimulus triggers',\n", " u'energy consumption',\n", " u'Wee',\n", " 
u'show',\n", " u'Cdk 1-cyclin B 1',\n", " u'interspace pressure act',\n", " u'analog computation',\n", " u'threshold',\n", " u'cyclin B 1',\n", " u'parameter modulation',\n", " u'kinase',\n", " u'case study',\n", " u'\\u223c 15\\u2009 min',\n", " u'systems-level logic',\n", " u'modulating progesterone responsiveness',\n", " u'oocyte maturation',\n", " u'-124',\n", " u'secant criterion',\n", " u'marker beads',\n", " u'peritubular membrane solute exit',\n", " u'mitogen-activated protein kinase',\n", " u'S-phase completion',\n", " u'positive-plus-negative feedback',\n", " u'bifurcation',\n", " u'background signal level',\n", " u'reference condition',\n", " u'G 1 phase',\n", " u'tubule epithelium',\n", " u'1-Cdk',\n", " u'diffusion terms',\n", " u'Boolean models',\n", " u'coding sequence',\n", " u'input-output characteristics',\n", " u'cyclin B knockdown',\n", " u'Cdc 2-cyclin B',\n", " u'passivity properties',\n", " u'parameter variation',\n", " u'miR -124 targets',\n", " u'Xenopus laevis',\n", " u'state',\n", " u'constituent properties',\n", " u'Hill kinetics',\n", " u'membrane',\n", " u'approach',\n", " u'loops exhibit ultrasensitive responses',\n", " u'noise-filtering capabilities',\n", " u'covalent cycles',\n", " u'cell volume',\n", " u'region',\n", " u'drug',\n", " u'feedback loop functions',\n", " u'glucose gradient',\n", " u'gene expression posttranscriptionally',\n", " u'robustness property',\n", " u'load',\n", " u'cell fluorescence microscopy',\n", " u'cancer',\n", " u'mRNA levels',\n", " u'mRNA decay',\n", " u'cell membrane',\n", " u'load-induced modulation',\n", " u'order neurons',\n", " u'findings offer',\n", " u'equation systems',\n", " u'invasion front',\n", " u'computer simulations',\n", " u'miRNA-mediated regulation',\n", " u'Complex networks',\n", " u'transcriptional networks',\n", " u'Waddington',\n", " u'Escherichia coli',\n", " u'feedback term',\n", " u'HEK 293T cells',\n", " u'strain field',\n", " u'EU-US workshop',\n", " u'use drug 
combinations',\n", " u'alternative representation',\n", " u'miRNA effector complexes',\n", " u'bistable kinase network',\n", " u'case',\n", " u'response times',\n", " u'core components',\n", " u'dephosphorylation reactions',\n", " u'Xenopus oocytes',\n", " u'CDK',\n", " u'chemical reaction networks',\n", " u'behavior',\n", " u'loop system',\n", " u'property',\n", " u'neuron',\n", " u'Goldbeter model',\n", " u'Xenopus extracts',\n", " u'protein',\n", " u'contraction theory',\n", " u'translation initiation',\n", " u'ribosome',\n", " u'paper studies',\n", " u'cyclin B 1-Cdk mediates',\n", " u'uniform linearizations',\n", " u'3-6 h',\n", " u'signal transduction module',\n", " u'protein synthesis inhibitors',\n", " u'transcription factor',\n", " u'polypeptide antibiotics',\n", " u'mass action',\n", " u'circadian rhythm',\n", " u'effect',\n", " u'12 proteins',\n", " u'front cell leadership',\n", " u'nonlinear network',\n", " u'invariance',\n", " u'cycle',\n", " u'Cdc',\n", " u'tunable frequency',\n", " u'70 h',\n", " u'candidate mitotic cyclins',\n", " u'model',\n", " u'stability result',\n", " u'cell-fate',\n", " u'nonlinear systems',\n", " u'Xenopus embryos',\n", " u'Marc Kirschner',\n", " u'network model',\n", " u'Cdc 25C activation',\n", " u'translation rate',\n", " u'alternative ways',\n", " u'plug-and-play interconnection architecture',\n", " u'n',\n", " u'limit cycle',\n", " u'systems biology approaches',\n", " u'equation models',\n", " u'sucrose gradients',\n", " u'bistable mitotic',\n", " u'fate',\n", " u'time scale',\n", " u'valley reversibly splitting',\n", " u'phosphorylation site',\n", " u'covalent modification cycle',\n", " u'-3beta',\n", " u'bistable',\n", " u'signal',\n", " u'Biological signal transduction networks',\n", " u'flux',\n", " u'nonfading memory',\n", " u'cell fate',\n", " u'luminal',\n", " u'pattern maintenance',\n", " u'mRNA abundance',\n", " u'cell cycle regulation',\n", " u'scale',\n", " u'National Cancer Institute',\n", " u'cell 
front',\n", " u'polypeptide degradation',\n", " u'drug pairs',\n", " u'mitotic',\n", " u'growth response',\n", " u'biology',\n", " u'noise',\n", " u'bistable triggers',\n", " u'2 knockdowns',\n", " u'ODE models',\n", " u'CDK 1AF cells',\n", " u'protein abundance',\n", " u'tubule epithelial cells',\n", " u'cyclins B 1',\n", " u'hysteretic response',\n", " u'stability property',\n", " u'modules--semiindependent collections',\n", " u'cell pH',\n", " u'Na',\n", " u'Myt 1',\n", " u'step increase',\n", " u'Uri Alon',\n", " u'input streams',\n", " u'ridge',\n", " u'mechanism-independent method',\n", " u'ODE',\n", " u'synthesis',\n", " u'multisite phosphorylation',\n", " u'transcriptional systems',\n", " u'siRNA-resistant form',\n", " u'volume',\n", " u'knockdown',\n", " u'miRNA',\n", " u'translational inhibition',\n", " u'inhibitors Wee 1',\n", " u'cell cycles',\n", " u'CDK 1 oscillations',\n", " u'2 knockdown cells',\n", " u'robustness',\n", " u'ODE model',\n", " u'Cdc 2',\n", " u'Poincar \\xe9-Bendixson property',\n", " u'confocal imaging',\n", " u'cyclic networks',\n", " u'ribosome drop-off',\n", " u'orbit',\n", " u'Hill coefficients',\n", " u'Riccati equation',\n", " u'Cdk 1',\n", " u'non-zero output impedance',\n", " u'Cdk 1-APC system',\n", " u'cyclin B 1-CDK',\n", " u'Protein phosphorylation',\n", " u'promoter-binding sites',\n", " u'cyclin B degradation',\n", " u'network behaviors',\n", " u'test',\n", " u'spike trains',\n", " u'picture',\n", " u'salt bridges',\n", " u'interconnection matrix',\n", " u'meiotic kinase network',\n", " u'submaximal rates',\n", " u'cyclin B 1-',\n", " u'peritubular cell membranes',\n", " u'feedback mechanism',\n", " u'2-cyclin',\n", " u'transporter',\n", " u'structure',\n", " u'matrix',\n", " u'parameter space',\n", " u'condensation',\n", " u'translational response',\n", " u'biologists need',\n", " u'cell cycle progression',\n", " u'Wee 1A',\n", " u'time',\n", " u'translational rate',\n", " u'mRNA targets',\n", " u'Boolean 
representation']" ] } ], "prompt_number": 28 } ], "metadata": {} } ] }