{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Pathfinding\n", "\n", "Ontologies differ in how lattice-like they are, i.e. how often a term has more than one parent. In a highly latticed ontology, the number of distinct paths from a term to the root node grows combinatorially.\n", "\n", "This notebook analyzes path counts for the Human Phenotype Ontology (HPO).\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "## We use a Factory object in the ontobio library\n", "from ontobio import OntologyFactory" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "## Get the HPO using default method (currently OntoBee SPARQL)\n", "## This may take 5-10s the first time you run it; afterwards it is cached\n", "ofa = OntologyFactory()\n", "ont = ofa.create('hp')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## The OWL version of HPO (used here) has many interesting relationship types;\n", "## for now we just care about is-a (subClassOf between named classes)\n", "ont = ont.subontology(relations='subClassOf')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'HP:0000118'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Get the root of the abnormality subset\n", "[root] = ont.search('Phenotypic abnormality')\n", "root" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'HP:0040024'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Arbitrary term\n", "[t] = ont.search('Clinodactyly of the 3rd finger')\n", "t" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## We use the standard python networkx library for 
pathfinding here\n", "## This is easily extracted from an ontology object\n", "import networkx as nx\n", "G = ont.get_graph()\n", "G" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use networkx to find all paths from an arbitrary term\n", "\n", "See https://networkx.github.io/documentation/development/reference/generated/networkx.algorithms.simple_paths.all_simple_paths.html" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## number of paths\n", "## (edges in the networkx graph point from parent to child, so the root is the source and the descendant is the target)\n", "len(list(nx.all_simple_paths(G, root, t)))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['HP:0000118',\n", " 'HP:0000924',\n", " 'HP:0040068',\n", " 'HP:0002813',\n", " 'HP:0011297',\n", " 'HP:0030084',\n", " 'HP:0040019',\n", " 'HP:0040024'],\n", " ['HP:0000118',\n", " 'HP:0000924',\n", " 'HP:0040068',\n", " 'HP:0002813',\n", " 'HP:0011297',\n", " 'HP:0001167',\n", " 'HP:0004097',\n", " 'HP:0009317',\n", " 'HP:0040024']]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## all_simple_paths yields paths lazily (hence list()); each path is a list of node IDs\n", "## Examine the first 2\n", "list(nx.all_simple_paths(G, root, t))[0:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## We (heart) pandas\n", "\n", "Pandas are cute.\n", "\n", "We use a DataFrame object, which we will construct by making a table of terms plus their path stats." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 'HP:0005237',\n", " 'label': 'Degenerative liver disease',\n", " 'longest': 5,\n", " 'pathcount': 1},\n", " {'id': 'HP:0002251',\n", " 'label': 'Aganglionic megacolon',\n", " 'longest': 8,\n", " 'pathcount': 3},\n", " 
{'id': 'HP:0005102',\n", " 'label': 'Cochlear degeneration',\n", " 'longest': 6,\n", " 'pathcount': 1}]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "def get_pathstats(nodes):\n", "    \"\"\"\n", "    For each node in `nodes`, return a table row (dict) with path stats\n", "    \"\"\"\n", "    items = []\n", "    for n in nodes:\n", "        ## all simple paths from the root down to this node\n", "        paths = list(nx.all_simple_paths(G, root, n))\n", "        ## length of the longest path, counted in nodes (root and n included)\n", "        longest = len(max(paths, key=len))\n", "        items.append({'id': n,\n", "                      'label': ont.label(n),\n", "                      'pathcount': len(paths),\n", "                      'longest': longest})\n", "    return items\n", "\n", "## Test it out\n", "sample = list(ont.descendants(root))[0:20]\n", "items = get_pathstats(sample)\n", "items[0:3]" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>id</th><th>label</th><th>longest</th><th>pathcount</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>HP:0005237</td><td>Degenerative liver disease</td><td>5</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>HP:0002251</td><td>Aganglionic megacolon</td><td>8</td><td>3</td></tr>\n",
"    <tr><th>2</th><td>HP:0005102</td><td>Cochlear degeneration</td><td>6</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>HP:0006466</td><td>Ankle contracture</td><td>9</td><td>6</td></tr>\n",
"    <tr><th>4</th><td>HP:0004292</td><td>Undermodelled hand bones</td><td>6</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>HP:0004839</td><td>Pyropoikilocytosis</td><td>7</td><td>1</td></tr>\n",
"    <tr><th>6</th><td>HP:0008970</td><td>Scapulohumeral muscular dystrophy</td><td>5</td><td>1</td></tr>\n",
"    <tr><th>7</th><td>HP:0008573</td><td>Low-frequency sensorineural hearing impairment</td><td>6</td><td>2</td></tr>\n",
"    <tr><th>8</th><td>HP:0005435</td><td>Impaired T cell function</td><td>8</td><td>3</td></tr>\n",
"    <tr><th>9</th><td>HP:0009218</td><td>Fragmentation of the epiphysis of the middle p...</td><td>13</td><td>96</td></tr>\n",
"    <tr><th>10</th><td>HP:0005021</td><td>Bilateral elbow dislocations</td><td>8</td><td>3</td></tr>\n",
"    <tr><th>11</th><td>HP:0010964</td><td>Abnormality of long-chain fatty-acid metabolism</td><td>5</td><td>1</td></tr>\n",
"    <tr><th>12</th><td>HP:0008019</td><td>Superior lens subluxation</td><td>9</td><td>1</td></tr>\n",
"    <tr><th>13</th><td>HP:0030883</td><td>Femoroacetabular Impingement</td><td>8</td><td>4</td></tr>\n",
"    <tr><th>14</th><td>HP:0005303</td><td>Aortic arch calcification</td><td>9</td><td>5</td></tr>\n",
"    <tr><th>15</th><td>HP:0000741</td><td>Apathy</td><td>7</td><td>1</td></tr>\n",
"    <tr><th>16</th><td>HP:0040208</td><td>Elevated CSF biopterin level</td><td>7</td><td>2</td></tr>\n",
"    <tr><th>17</th><td>HP:0030031</td><td>Small toe</td><td>10</td><td>13</td></tr>\n",
"    <tr><th>18</th><td>HP:0025348</td><td>Abnormality of the corneal limbus</td><td>7</td><td>1</td></tr>\n",
"    <tr><th>19</th><td>HP:0100720</td><td>Hypoplasia of the ear cartilage</td><td>5</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " id label longest \\\n", "0 HP:0005237 Degenerative liver disease 5 \n", "1 HP:0002251 Aganglionic megacolon 8 \n", "2 HP:0005102 Cochlear degeneration 6 \n", "3 HP:0006466 Ankle contracture 9 \n", "4 HP:0004292 Undermodelled hand bones 6 \n", "5 HP:0004839 Pyropoikilocytosis 7 \n", "6 HP:0008970 Scapulohumeral muscular dystrophy 5 \n", "7 HP:0008573 Low-frequency sensorineural hearing impairment 6 \n", "8 HP:0005435 Impaired T cell function 8 \n", "9 HP:0009218 Fragmentation of the epiphysis of the middle p... 13 \n", "10 HP:0005021 Bilateral elbow dislocations 8 \n", "11 HP:0010964 Abnormality of long-chain fatty-acid metabolism 5 \n", "12 HP:0008019 Superior lens subluxation 9 \n", "13 HP:0030883 Femoroacetabular Impingement 8 \n", "14 HP:0005303 Aortic arch calcification 9 \n", "15 HP:0000741 Apathy 7 \n", "16 HP:0040208 Elevated CSF biopterin level 7 \n", "17 HP:0030031 Small toe 10 \n", "18 HP:0025348 Abnormality of the corneal limbus 7 \n", "19 HP:0100720 Hypoplasia of the ear cartilage 5 \n", "\n", " pathcount \n", "0 1 \n", "1 3 \n", "2 1 \n", "3 6 \n", "4 1 \n", "5 1 \n", "6 1 \n", "7 2 \n", "8 3 \n", "9 96 \n", "10 3 \n", "11 1 \n", "12 1 \n", "13 4 \n", "14 5 \n", "15 1 \n", "16 2 \n", "17 13 \n", "18 1 \n", "19 1 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Look at same table in pandas\n", "import pandas as pd\n", "df = pd.DataFrame(items)\n", "df" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7.3499999999999996" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Basic aggregate stats (over our small sample, which may not be representative)\n", "df['pathcount'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting with plotly\n", "\n", "Let's do a simple barchart showing distribution of pathcounts for our sample" ] }, { "cell_type": "code", 
"execution_count": 50, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "import plotly.plotly as py\n", "import plotly.graph_objs as go\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = [\n", " go.Bar(\n", " x=df['label'], # assign x as the dataframe column 'x'\n", " y=df['pathcount']\n", " )\n", "]\n", "\n", "# IPython notebook\n", "py.iplot(data, filename='pandas-bar-chart')\n", "\n", "# use this in non-notebook context\n", "# url = py.plot(data, filename='pandas-bar-chart')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarizing over whole ontology\n", "\n", "__warning__ this can take over an hour, if running interactively, be patient!\n", "\n", "__help wanted__ is there a way to make Jupyter show a progress bar for cases like this?\n" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 'HP:0005237',\n", " 'label': 'Degenerative liver disease',\n", " 'longest': 5,\n", " 'pathcount': 1},\n", " {'id': 'HP:0002251',\n", " 'label': 'Aganglionic megacolon',\n", " 'longest': 8,\n", " 'pathcount': 3},\n", " {'id': 'HP:0005102',\n", " 'label': 'Cochlear degeneration',\n", " 'longest': 6,\n", " 'pathcount': 1}]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample = list(ont.descendants(root))\n", "items = get_pathstats(sample)\n", "items[0:3]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12066" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(items)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.DataFrame(items)\n" ] }, { "cell_type": "code", 
"execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6.6176031824962704" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['pathcount'].mean()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['pathcount'].max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting all HP terms\n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = [\n", " go.Bar(\n", " x=df['label'], # assign x as the dataframe column 'x'\n", " y=df['pathcount']\n", " )\n", "]\n", "\n", "# IPython notebook\n", "py.iplot(data, filename='pandas-bar-chart-all')\n" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "data = [\n", " go.Scatter(\n", " x=df['longest'], # assign x as the dataframe column 'x'\n", " y=df['pathcount'],\n", " mode = 'markers'\n", " )\n", "]\n", "\n", "# IPython notebook\n", "py.iplot(data, filename='pandas-longest-vs-numpaths')\n" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['HP:0100379', 'HP:0010432', 'HP:0010102', 'HP:0100378']" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "max_num_paths = df['pathcount'].max()\n", "nodes_with_max = [x['id'] for x in items if x['pathcount'] == max_num_paths]\n", "nodes_with_max" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Aplasia of the distal phalanx of the 
4th toe',\n", " 'Absent distal phalanx of the 2nd toe',\n", " 'Aplasia of the distal phalanx of the hallux',\n", " 'Absent distal phalanx of the 3rd toe']" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[ont.label(n) for n in nodes_with_max]" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(nodes_with_max)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "## Pick an arbitrary term from list\n", "t = nodes_with_max[0]" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "36" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ancs = ont.ancestors(t, reflexive=True)\n", "ancs = [a for a in ancs if a.startswith('HP:')]\n", "len(ancs)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "## Make a sub-ontology with just term and ancestors\n", "subont = ont.subontology(ancs)" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['HP:0000118',\n", " 'HP:0000924',\n", " 'HP:0040068',\n", " 'HP:0040069',\n", " 'HP:0006493',\n", " 'HP:0006494',\n", " 'HP:0001991',\n", " 'HP:0010760',\n", " 'HP:0010185',\n", " 'HP:0100370',\n", " 'HP:0100379']" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_path = list(nx.all_simple_paths(G, root, t))[0]\n", "sample_path" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "## Render the sub-ontology,\n", "## highlighting a sample path\n", "from ontobio.io.ontol_renderers import GraphRenderer\n", "w = GraphRenderer.create('png')\n", "w.outfile = 'output/multipath.png'\n", 
"w.write(subont,query_ids=sample_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![img](output/multipath.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }