{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Fear of Bees: Extracting Ontologies from Wikidata\n", "\n", "Wikidata includes links between entities using predicates such as SubClassOf (P279). These form a classification hierarchy,\n", "although as this comes from multiple sources, it may not conform to the same rules as ontology hierarchies.\n", "\n", "OntoBio includes a wikidata ontology factory, so we can transparently create an Ontology object from wikidata,\n", "and leverage the same methods available in ontobio.\n", "\n", "This example is focused around [Anxiety disorders](https://www.wikidata.org/wiki/Q544006)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:rdflib.term: does not look like a valid URI, trying to serialize this will break.\n" ] } ], "source": [ "from ontobio.ontol_factory import OntologyFactory\n", "f = OntologyFactory()\n", "\n", "## OntologyFactory recognizes the prefix wdq for wikidata queries;\n", "## We use this to make a sub-ontology\n", "## (currently we have no lazy wrapper for WD, only Eager, so we limit the size)\n", "ont = f.create('wdq:Q544006') # Anxiety disorder" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[rdflib.term.URIRef('http://www.wikidata.org/entity/Q544006')]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Find terms starting with Anxiety in the sub-ontology\n", "qids = ont.search('Anxiety%')\n", "qids" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Aktualneurosen',\n", " 'cognitive disorder',\n", " 'Anti-French sentiment in the United States',\n", " 'acarophobia',\n", " 'Organic disease',\n", " 'identifier',\n", " 'Alektorophobia',\n", " 'Katagelasticism',\n", " 'answer',\n", " 'Counterphobic attitude',\n", " 'compulsive act',\n", " 'physical condition',\n", " 'Piblokto',\n", " 'blood phobia',\n", " 'category of being',\n", " 'Childhood phobias',\n", " 'ability',\n", " 'disposition',\n", " 'Entomophobia',\n", " 'physiological condition',\n", " 'property',\n", " 'Cynophobia',\n", " 'neurosis effects',\n", " 'bowel-control anxiety',\n", " 'Anxiety disorder']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Traverse up and down from query node in our sub-ontology\n", "nodes = ont.traverse_nodes(qids, up=True, down=True)\n", "labels = [ont.label(n) for n in nodes]\n", "labels[:25]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['http://www.wikidata.org/entity/Q1347367 ability', 'http://www.wikidata.org/entity/Q151885 concept', 'http://www.wikidata.org/entity/Q9081 knowledge', 'http://www.wikidata.org/entity/Q3695082 sign', 'http://www.wikidata.org/entity/Q853614 identifier', 'http://www.wikidata.org/entity/Q937228 property']\n" ] } ], "source": [ "## Test for cycles\n", "import networkx as nx\n", "g = ont.get_graph()\n", "def show_cycle(nl):\n", " print([\"{} {}\".format(n, ont.label(n)) for n in nl])\n", "\n", "cycles_list = list(nx.simple_cycles(g))\n", "show_cycle(cycles_list[0])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ". http://www.wikidata.org/entity/Q544006 ! Anxiety disorder * \n", " % http://www.wikidata.org/entity/Q741713 ! panic disorder\n", " % http://www.wikidata.org/entity/Q6374996 ! Katagelasticism\n", " % http://www.wikidata.org/entity/Q845224 ! generalized anxiety disorder\n", " % http://www.wikidata.org/entity/Q377493 ! selective mutism\n", " % http://www.wikidata.org/entity/Q5354941 ! Elective mutism\n", " % http://www.wikidata.org/entity/Q202387 ! post-traumatic stress disorder\n", " % http://www.wikidata.org/entity/Q10547816 ! Counterphobic attitude\n", " % http://www.wikidata.org/entity/Q13604751 ! lovesickness\n", " % http://www.wikidata.org/entity/Q1316515 ! School refusal\n", " % http://www.wikidata.org/entity/Q4386741 ! Olfactory Reference Syndrome\n", " % http://www.wikidata.org/entity/Q424221 ! acute stress disorder\n", " % http://www.wikidata.org/entity/Q1482034 ! combat disorder\n", " % http://www.wikidata.org/entity/Q18967153 ! mixed disorder as reaction to stress\n", " % http://www.wikidata.org/entity/Q18967156 ! acute stress reaction with predominant disturbance of consciousness\n", " % http://www.wikidata.org/entity/Q178190 ! obsessive-compulsive disorder\n", " % http://www.wikidata.org/entity/Q7458802 ! Sexual obsessions\n", " % http://www.wikidata.org/entity/Q231624 ! compulsive act\n", " % http://www.wikidata.org/entity/Q7310756 ! Relationship obsessive–compulsive disorder\n", " % http://www.wikidata.org/entity/Q19000444 ! neurotic disorder\n", " % http://www.wikidata.org/entity/Q181032 ! neurosis effects\n", " % http://www.wikidata.org/entity/Q144119 ! hysteria\n", " % http://www.wikidata.org/entity/Q336203 ! Abwehrhysterie\n", " % http://www.wikidata.org/entity/Q1779438 ! Piblokto\n", " % http://www.wikidata.org/entity/Q423509 ! Aktualneurosen\n", " % http://www.wikidata.org/entity/Q2300749 ! separation anxiety disorder\n", " % http://www.wikidata.org/entity/Q19000931 ! organic anxiety disorder\n", " % http://www.wikidata.org/entity/Q175854 ! phobia\n", " % http://www.wikidata.org/entity/Q560107 ! Tryophobia\n", " % http://www.wikidata.org/entity/Q1343559 ! ochlophobia\n", " % http://www.wikidata.org/entity/Q980010 ! Tokophobia\n", " % http://www.wikidata.org/entity/Q5097985 ! Childhood phobias\n", " % http://www.wikidata.org/entity/Q909355 ! Francophobia\n", " % http://www.wikidata.org/entity/Q3427834 ! Anti-French sentiment in the United States\n", " % http://www.wikidata.org/entity/Q174589 ! agoraphobia\n", " % http://www.wikidata.org/entity/Q22906231 ! Afrophobia\n", " % http://www.wikidata.org/entity/Q1363791 ! erythrophobia\n", " % http://www.wikidata.org/entity/Q13 ! triskaidekaphobia\n", " % http://www.wikidata.org/entity/Q2015728 ! specific phobia\n", " % http://www.wikidata.org/entity/Q944108 ! animal phobia\n", " % http://www.wikidata.org/entity/Q619261 ! Ornithophobia\n", " % http://www.wikidata.org/entity/Q4694196 ! Agrizoophobia\n", " % http://www.wikidata.org/entity/Q3321265 ! Fear of fish\n", " % http://www.wikidata.org/entity/Q596505 ! Ophidiophobia\n", " % http://www.wikidata.org/entity/Q4422074 ! Vermiphobia\n", " % http://www.wikidata.org/entity/Q405385 ! Ailurophobia\n", " % http://www.wikidata.org/entity/Q4297397 ! Fear of frogs\n", " % http://www.wikidata.org/entity/Q2319444 ! Herpetophobia\n", " % http://www.wikidata.org/entity/Q38579 ! Cynophobia\n", " % http://www.wikidata.org/entity/Q5384517 ! Equinophobia\n", " % http://www.wikidata.org/entity/Q2157130 ! Entomophobia\n", " % http://www.wikidata.org/entity/Q2160101 ! Fear of bees\n", " % http://www.wikidata.org/entity/Q2822642 ! acarophobia\n", " % http://www.wikidata.org/entity/Q220783 ! arachnophobia\n", " % http://www.wikidata.org/entity/Q3440772 ! Fear of mice\n", " % http://www.wikidata.org/entity/Q16002436 ! Alektorophobia\n", " % http://www.wikidata.org/entity/Q5439392 ! Fear of bats\n", " % http://www.wikidata.org/entity/Q3381344 ! Blood-injection-injury type phobia\n", " % http://www.wikidata.org/entity/Q886731 ! blood phobia\n", " % http://www.wikidata.org/entity/Q6034425 ! Injury phobia\n", " % http://www.wikidata.org/entity/Q169922 ! Fear of needles\n", " % http://www.wikidata.org/entity/Q1127417 ! flying phobia\n", " % http://www.wikidata.org/entity/Q3052614 ! nosophobia\n", " % http://www.wikidata.org/entity/Q18557105 ! cancerophobia\n", " % http://www.wikidata.org/entity/Q18557109 ! AIDS phobia\n", " % http://www.wikidata.org/entity/Q281928 ! social phobia\n", " % http://www.wikidata.org/entity/Q17147649 ! Specific social phobia\n", " % http://www.wikidata.org/entity/Q1335831 ! paruresis\n", " % http://www.wikidata.org/entity/Q612851 ! Telephone phobia\n", " % http://www.wikidata.org/entity/Q7136497 ! Parcopresis\n", " % http://www.wikidata.org/entity/Q2540262 ! Glossophobia\n", " % http://www.wikidata.org/entity/Q3219948 ! bowel-control anxiety\n", " % http://www.wikidata.org/entity/Q168995 ! Surdophobia\n", " % http://www.wikidata.org/entity/Q1131359 ! Amaxophobia\n", "\n", "\n" ] } ], "source": [ "## Show our extract of the sub-ontology as an ascii tree\n", "## (note this is resilient to cycles)\n", "\n", "## only traverse down from our query nodes\n", "## (including ancestors causes multiple paths, and a verbose display)\n", "nodes = ont.traverse_nodes(qids, up=False, down=True)\n", "\n", "from ontobio.io.ontol_renderers import GraphRenderer\n", "w = GraphRenderer.create('tree')\n", "w.write_subgraph(ont, nodes, query_ids=qids)\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Show as graph using GraphViz\n", "## We can do this for both descendants and ancestors\n", "nodes = ont.traverse_nodes(qids, up=True, down=True)\n", "\n", "w = GraphRenderer.create('png')\n", "w.outfile = 'output/anxiety-disorder.png'\n", "w.write_subgraph(ont, nodes, query_ids=qids)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![img](output/anxiety-disorder.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying for associated entities\n", "\n", "TODO: Drugs\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "## What proteins are associated with PTSD? (via GWAS)\n", "[ptsd] = ont.search('post-traumatic stress disorder')\n", "import ontobio.sparql.wikidata as wd\n", "proteins = wd.canned_query('disease2protein', ptsd)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['UniProtKB:Q92831',\n", " 'UniProtKB:P17252',\n", " 'UniProtKB:Q8N9K7',\n", " 'UniProtKB:O75899',\n", " 'UniProtKB:Q92597',\n", " 'UniProtKB:P40145',\n", " 'UniProtKB:Q9HA38',\n", " 'UniProtKB:P42658',\n", " 'UniProtKB:Q9Y243',\n", " 'UniProtKB:Q9NUQ9',\n", " 'UniProtKB:Q9P272',\n", " 'UniProtKB:Q9BY07',\n", " 'UniProtKB:O43897',\n", " 'UniProtKB:A0A024R9G4',\n", " 'UniProtKB:Q4F7X0',\n", " 'UniProtKB:E5RIR1',\n", " 'UniProtKB:Q8IYG9',\n", " 'UniProtKB:A7E2E4']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "proteins" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "## Find GO terms for all genes/products associated with all nodes in Anxiety sub-ontology\n", "\n", "## First create a GO handle and get association sets for GO (in human)\n", "go = f.create('go')\n", "\n", "from ontobio.assoc_factory import AssociationSetFactory\n", "afactory = AssociationSetFactory()\n", "aset = afactory.create(ontology=go,\n", " subject_category='gene',\n", " object_category='function',\n", " taxon='NCBITaxon:9606')\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://www.wikidata.org/entity/Q202387 post-traumatic stress disorder\n", " GO:0007616 long-term memory\n", " GO:0006171 cAMP biosynthetic process\n", " GO:0007193 adenylate cyclase-inhibiting G-protein coupled receptor signaling pathway\n", " GO:0016021 integral component of membrane\n", " GO:0005524 ATP binding\n", " GO:0003091 renal water homeostasis\n", " GO:0005886 plasma membrane\n", " GO:0004016 adenylate cyclase activity\n", " GO:0004383 guanylate cyclase activity\n", " GO:0006182 cGMP biosynthetic process\n", " GO:0007165 signal transduction\n", " GO:0007190 activation of adenylate cyclase activity\n", " GO:0008294 calcium- and calmodulin-responsive adenylate cyclase activity\n", " GO:0008074 guanylate cyclase complex, soluble\n", " GO:0007189 adenylate cyclase-activating G-protein coupled receptor signaling pathway\n", " GO:0046872 metal ion binding\n", " GO:0007611 learning or memory\n", " GO:0071377 cellular response to glucagon stimulus\n", " GO:0016020 membrane\n", " GO:0035556 intracellular signal transduction\n", " GO:0034199 activation of protein kinase A activity\n", " GO:0008198 ferrous iron binding\n", " GO:0016706 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors\n", " GO:0005634 nucleus\n", " GO:0005737 cytoplasm\n", " GO:0055114 oxidation-reduction process\n", " GO:0016300 tRNA (uracil) methyltransferase activity\n", " GO:0030488 tRNA methylation\n", " GO:0002098 tRNA wobble uridine modification\n", " GO:0000049 tRNA binding\n", " GO:0006400 tRNA modification\n", " GO:0008175 tRNA methyltransferase activity\n", "http://www.wikidata.org/entity/Q741713 panic disorder\n", " GO:0003713 transcription coactivator activity\n", " GO:0030374 ligand-dependent nuclear receptor transcription coactivator activity\n", " GO:0043565 sequence-specific DNA binding\n", " GO:0044212 transcription regulatory region DNA binding\n", " GO:0005515 protein binding\n", " GO:0005634 nucleus\n", " GO:0007165 signal transduction\n", " GO:0045893 positive regulation of transcription, DNA-templated\n", " GO:0003682 chromatin binding\n", " GO:0001047 core promoter binding\n", " GO:0003712 transcription cofactor activity\n", " GO:0008022 protein C-terminus binding\n", " GO:0043231 intracellular membrane-bounded organelle\n", " GO:0045944 positive regulation of transcription from RNA polymerase II promoter\n", " GO:0030518 intracellular steroid hormone receptor signaling pathway\n", " GO:0006351 transcription, DNA-templated\n", " GO:0008013 beta-catenin binding\n", " GO:0070016 armadillo repeat domain binding\n", " GO:0010628 positive regulation of gene expression\n", " GO:0016055 Wnt signaling pathway\n", " GO:0005829 cytosol\n", " GO:0000790 nuclear chromatin\n" ] } ], "source": [ "for n in ont.nodes():\n", " proteins = wd.canned_query('disease2protein', n)\n", " anns = [a for p in proteins for a in aset.annotations(p)]\n", " if len(anns) > 0:\n", " print(\"{} {}\".format(n,ont.label(n)))\n", " for a in anns:\n", " print(\" {} {}\".format(a, go.label(a)))\n", " \n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }