{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fear of Bees: Extracting Ontologies from Wikidata\n",
    "\n",
    "Wikidata links entities using predicates such as *subclass of* (P279). These links form a classification hierarchy,\n",
    "although, because the statements come from multiple sources, it may not conform to the same rules as a curated ontology hierarchy.\n",
    "\n",
    "ontobio includes a Wikidata ontology factory, so we can transparently create an Ontology object from Wikidata\n",
    "and use the same methods that ontobio provides for any other ontology.\n",
    "\n",
    "This example focuses on [anxiety disorders](https://www.wikidata.org/wiki/Q544006).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:rdflib.term:  does not look like a valid URI, trying to serialize this will break.\n"
     ]
    }
   ],
   "source": [
    "from ontobio.ontol_factory import OntologyFactory\n",
    "f = OntologyFactory()\n",
    "\n",
    "## OntologyFactory recognizes the prefix wdq for Wikidata queries;\n",
    "## we use it to build a sub-ontology around the given entity.\n",
    "## (There is currently no lazy wrapper for Wikidata, only an eager one, so we limit the size.)\n",
    "ont = f.create('wdq:Q544006') # Anxiety disorder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[rdflib.term.URIRef('http://www.wikidata.org/entity/Q544006')]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Find terms whose label starts with 'Anxiety' in the sub-ontology (% acts as a wildcard)\n",
    "qids = ont.search('Anxiety%')\n",
    "qids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Aktualneurosen',\n",
       " 'cognitive disorder',\n",
       " 'Anti-French sentiment in the United States',\n",
       " 'acarophobia',\n",
       " 'Organic disease',\n",
       " 'identifier',\n",
       " 'Alektorophobia',\n",
       " 'Katagelasticism',\n",
       " 'answer',\n",
       " 'Counterphobic attitude',\n",
       " 'compulsive act',\n",
       " 'physical condition',\n",
       " 'Piblokto',\n",
       " 'blood phobia',\n",
       " 'category of being',\n",
       " 'Childhood phobias',\n",
       " 'ability',\n",
       " 'disposition',\n",
       " 'Entomophobia',\n",
       " 'physiological condition',\n",
       " 'property',\n",
       " 'Cynophobia',\n",
       " 'neurosis effects',\n",
       " 'bowel-control anxiety',\n",
       " 'Anxiety disorder']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## Traverse up and down from query node in our sub-ontology\n",
    "nodes = ont.traverse_nodes(qids, up=True, down=True)\n",
    "labels = [ont.label(n) for n in nodes]\n",
    "labels[:25]"
   ]
  },
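  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the traversal above concrete, here is a minimal sketch of collecting ancestors and descendants over a toy subclass graph. It is only an illustration and not ontobio's implementation: the `toy_edges` list (built from a few labels in this hierarchy) and the `closure` helper are hypothetical names introduced for this example.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "\n",
    "# Hypothetical (child, parent) subclass edges, using labels instead of QIDs\n",
    "toy_edges = [\n",
    "    ('Fear of bees', 'Entomophobia'),\n",
    "    ('Entomophobia', 'animal phobia'),\n",
    "    ('animal phobia', 'specific phobia'),\n",
    "    ('specific phobia', 'phobia'),\n",
    "    ('phobia', 'Anxiety disorder'),\n",
    "]\n",
    "\n",
    "parents = defaultdict(set)\n",
    "children = defaultdict(set)\n",
    "for child, parent in toy_edges:\n",
    "    parents[child].add(parent)\n",
    "    children[parent].add(child)\n",
    "\n",
    "def closure(start, neighbours):\n",
    "    # Collect every node reachable from start; the seen set guards against cycles\n",
    "    seen = set()\n",
    "    stack = [start]\n",
    "    while stack:\n",
    "        node = stack.pop()\n",
    "        for nxt in neighbours.get(node, ()):\n",
    "            if nxt not in seen:\n",
    "                seen.add(nxt)\n",
    "                stack.append(nxt)\n",
    "    return seen\n",
    "\n",
    "ancestors = closure('animal phobia', parents)      # analogous to up=True\n",
    "descendants = closure('animal phobia', children)   # analogous to down=True\n",
    "print(sorted(ancestors))\n",
    "print(sorted(descendants))\n",
    "```\n",
    "\n",
    "Tracking a `seen` set keeps the walk finite even when the hierarchy contains cycles, which matters for data drawn from Wikidata.\n"
   ]
  },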
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['http://www.wikidata.org/entity/Q1347367 ability', 'http://www.wikidata.org/entity/Q151885 concept', 'http://www.wikidata.org/entity/Q9081 knowledge', 'http://www.wikidata.org/entity/Q3695082 sign', 'http://www.wikidata.org/entity/Q853614 identifier', 'http://www.wikidata.org/entity/Q937228 property']\n"
     ]
    }
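   ],
   "source": [
    "## Test for cycles: the Wikidata hierarchy is not guaranteed to be acyclic\n",
    "import networkx as nx\n",
    "g = ont.get_graph()\n",
    "\n",
    "def show_cycle(nl):\n",
    "    \"\"\"Print each node of a cycle as 'URI label'.\"\"\"\n",
    "    print([\"{} {}\".format(n, ont.label(n)) for n in nl])\n",
    "\n",
    "## Take only the first cycle instead of enumerating all of them,\n",
    "## and skip the display if the graph turns out to be acyclic\n",
    "first_cycle = next(nx.simple_cycles(g), None)\n",
    "if first_cycle is not None:\n",
    "    show_cycle(first_cycle)"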
   ]
  },
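  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The check above relies on `nx.simple_cycles`, which yields elementary cycles one at a time and can take a long time to exhaust on a large graph. When a single witness cycle is enough, a depth-first search suffices. The sketch below shows the idea on a toy graph. The `find_one_cycle` helper and the `toy` edges are hypothetical and written for this example; they are not part of ontobio or networkx.\n",
    "\n",
    "```python\n",
    "def find_one_cycle(graph):\n",
    "    # graph maps each node to an iterable of successor nodes.\n",
    "    # Returns one cycle as a list of nodes, or None if the graph is acyclic.\n",
    "    WHITE, GREY, BLACK = 0, 1, 2\n",
    "    colour = {n: WHITE for n in graph}\n",
    "    path = []\n",
    "\n",
    "    def visit(node):\n",
    "        colour[node] = GREY\n",
    "        path.append(node)\n",
    "        for nxt in graph.get(node, ()):\n",
    "            state = colour.get(nxt, WHITE)\n",
    "            if state == GREY:\n",
    "                # nxt is on the current path, so the path from nxt onwards is a cycle\n",
    "                return path[path.index(nxt):]\n",
    "            if state == WHITE:\n",
    "                found = visit(nxt)\n",
    "                if found:\n",
    "                    return found\n",
    "        path.pop()\n",
    "        colour[node] = BLACK\n",
    "        return None\n",
    "\n",
    "    for n in list(graph):\n",
    "        if colour[n] == WHITE:\n",
    "            found = visit(n)\n",
    "            if found:\n",
    "                return found\n",
    "    return None\n",
    "\n",
    "# Made-up edges: a three-node loop plus one ordinary subclass link\n",
    "toy = {\n",
    "    'ability': ['concept'],\n",
    "    'concept': ['knowledge'],\n",
    "    'knowledge': ['ability'],\n",
    "    'phobia': ['Anxiety disorder'],\n",
    "}\n",
    "cycle = find_one_cycle(toy)\n",
    "acyclic = find_one_cycle({'phobia': ['Anxiety disorder']})\n",
    "print(cycle, acyclic)\n",
    "```\n",
    "\n",
    "The recursion keeps the sketch short; for very deep hierarchies an explicit stack would avoid Python's recursion limit.\n"
   ]
  },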
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ". http://www.wikidata.org/entity/Q544006 ! Anxiety disorder * \n",
      " % http://www.wikidata.org/entity/Q741713 ! panic disorder\n",
      " % http://www.wikidata.org/entity/Q6374996 ! Katagelasticism\n",
      " % http://www.wikidata.org/entity/Q845224 ! generalized anxiety disorder\n",
      " % http://www.wikidata.org/entity/Q377493 ! selective mutism\n",
      "  % http://www.wikidata.org/entity/Q5354941 ! Elective mutism\n",
      " % http://www.wikidata.org/entity/Q202387 ! post-traumatic stress disorder\n",
      " % http://www.wikidata.org/entity/Q10547816 ! Counterphobic attitude\n",
      " % http://www.wikidata.org/entity/Q13604751 ! lovesickness\n",
      " % http://www.wikidata.org/entity/Q1316515 ! School refusal\n",
      " % http://www.wikidata.org/entity/Q4386741 ! Olfactory Reference Syndrome\n",
      " % http://www.wikidata.org/entity/Q424221 ! acute stress disorder\n",
      "  % http://www.wikidata.org/entity/Q1482034 ! combat disorder\n",
      "  % http://www.wikidata.org/entity/Q18967153 ! mixed disorder as reaction to stress\n",
      "  % http://www.wikidata.org/entity/Q18967156 ! acute stress reaction with predominant disturbance of consciousness\n",
      " % http://www.wikidata.org/entity/Q178190 ! obsessive-compulsive disorder\n",
      "  % http://www.wikidata.org/entity/Q7458802 ! Sexual obsessions\n",
      "  % http://www.wikidata.org/entity/Q231624 ! compulsive act\n",
      "  % http://www.wikidata.org/entity/Q7310756 ! Relationship obsessive–compulsive disorder\n",
      " % http://www.wikidata.org/entity/Q19000444 ! neurotic disorder\n",
      "  % http://www.wikidata.org/entity/Q181032 ! neurosis effects\n",
      "   % http://www.wikidata.org/entity/Q144119 ! hysteria\n",
      "    % http://www.wikidata.org/entity/Q336203 ! Abwehrhysterie\n",
      "    % http://www.wikidata.org/entity/Q1779438 ! Piblokto\n",
      "   % http://www.wikidata.org/entity/Q423509 ! Aktualneurosen\n",
      " % http://www.wikidata.org/entity/Q2300749 ! separation anxiety disorder\n",
      " % http://www.wikidata.org/entity/Q19000931 ! organic anxiety disorder\n",
      " % http://www.wikidata.org/entity/Q175854 ! phobia\n",
      "  % http://www.wikidata.org/entity/Q560107 ! Tryophobia\n",
      "  % http://www.wikidata.org/entity/Q1343559 ! ochlophobia\n",
      "  % http://www.wikidata.org/entity/Q980010 ! Tokophobia\n",
      "  % http://www.wikidata.org/entity/Q5097985 ! Childhood phobias\n",
      "  % http://www.wikidata.org/entity/Q909355 ! Francophobia\n",
      "   % http://www.wikidata.org/entity/Q3427834 ! Anti-French sentiment in the United States\n",
      "  % http://www.wikidata.org/entity/Q174589 ! agoraphobia\n",
      "  % http://www.wikidata.org/entity/Q22906231 ! Afrophobia\n",
      "  % http://www.wikidata.org/entity/Q1363791 ! erythrophobia\n",
      "  % http://www.wikidata.org/entity/Q13 ! triskaidekaphobia\n",
      "  % http://www.wikidata.org/entity/Q2015728 ! specific phobia\n",
      "   % http://www.wikidata.org/entity/Q944108 ! animal phobia\n",
      "    % http://www.wikidata.org/entity/Q619261 ! Ornithophobia\n",
      "    % http://www.wikidata.org/entity/Q4694196 ! Agrizoophobia\n",
      "    % http://www.wikidata.org/entity/Q3321265 ! Fear of fish\n",
      "    % http://www.wikidata.org/entity/Q596505 ! Ophidiophobia\n",
      "    % http://www.wikidata.org/entity/Q4422074 ! Vermiphobia\n",
      "    % http://www.wikidata.org/entity/Q405385 ! Ailurophobia\n",
      "    % http://www.wikidata.org/entity/Q4297397 ! Fear of frogs\n",
      "    % http://www.wikidata.org/entity/Q2319444 ! Herpetophobia\n",
      "    % http://www.wikidata.org/entity/Q38579 ! Cynophobia\n",
      "    % http://www.wikidata.org/entity/Q5384517 ! Equinophobia\n",
      "    % http://www.wikidata.org/entity/Q2157130 ! Entomophobia\n",
      "     % http://www.wikidata.org/entity/Q2160101 ! Fear of bees\n",
      "     % http://www.wikidata.org/entity/Q2822642 ! acarophobia\n",
      "    % http://www.wikidata.org/entity/Q220783 ! arachnophobia\n",
      "    % http://www.wikidata.org/entity/Q3440772 ! Fear of mice\n",
      "    % http://www.wikidata.org/entity/Q16002436 ! Alektorophobia\n",
      "    % http://www.wikidata.org/entity/Q5439392 ! Fear of bats\n",
      "   % http://www.wikidata.org/entity/Q3381344 ! Blood-injection-injury type phobia\n",
      "    % http://www.wikidata.org/entity/Q886731 ! blood phobia\n",
      "    % http://www.wikidata.org/entity/Q6034425 ! Injury phobia\n",
      "    % http://www.wikidata.org/entity/Q169922 ! Fear of needles\n",
      "   % http://www.wikidata.org/entity/Q1127417 ! flying phobia\n",
      "   % http://www.wikidata.org/entity/Q3052614 ! nosophobia\n",
      "    % http://www.wikidata.org/entity/Q18557105 ! cancerophobia\n",
      "    % http://www.wikidata.org/entity/Q18557109 ! AIDS phobia\n",
      "  % http://www.wikidata.org/entity/Q281928 ! social phobia\n",
      "   % http://www.wikidata.org/entity/Q17147649 ! Specific social phobia\n",
      "    % http://www.wikidata.org/entity/Q1335831 ! paruresis\n",
      "    % http://www.wikidata.org/entity/Q612851 ! Telephone phobia\n",
      "    % http://www.wikidata.org/entity/Q7136497 ! Parcopresis\n",
      "    % http://www.wikidata.org/entity/Q2540262 ! Glossophobia\n",
      "   % http://www.wikidata.org/entity/Q3219948 ! bowel-control anxiety\n",
      "  % http://www.wikidata.org/entity/Q168995 ! Surdophobia\n",
      "  % http://www.wikidata.org/entity/Q1131359 ! Amaxophobia\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "## Show our extract of the sub-ontology as an ASCII tree\n",
    "## (note: the tree renderer is resilient to cycles)\n",
    "\n",
    "## Only traverse down from our query nodes\n",
    "## (including ancestors creates multiple paths and a verbose display)\n",
    "nodes = ont.traverse_nodes(qids, up=False, down=True)\n",
    "\n",
    "from ontobio.io.ontol_renderers import GraphRenderer\n",
    "w = GraphRenderer.create('tree')\n",
    "w.write_subgraph(ont, nodes, query_ids=qids)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "## Render the sub-ontology as a PNG image using Graphviz.\n",
    "## This time we include both descendants and ancestors.\n",
    "nodes = ont.traverse_nodes(qids, up=True, down=True)\n",
    "\n",
    "w = GraphRenderer.create('png')\n",
    "w.outfile = 'output/anxiety-disorder.png'\n",
    "w.write_subgraph(ont, nodes, query_ids=qids)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![img](output/anxiety-disorder.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying for associated entities\n",
    "\n",
    "TODO: Drugs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "## What proteins are associated with PTSD? (via GWAS)\n",
    "[ptsd] = ont.search('post-traumatic stress disorder')\n",
    "import ontobio.sparql.wikidata as wd\n",
    "proteins = wd.canned_query('disease2protein', ptsd)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['UniProtKB:Q92831',\n",
       " 'UniProtKB:P17252',\n",
       " 'UniProtKB:Q8N9K7',\n",
       " 'UniProtKB:O75899',\n",
       " 'UniProtKB:Q92597',\n",
       " 'UniProtKB:P40145',\n",
       " 'UniProtKB:Q9HA38',\n",
       " 'UniProtKB:P42658',\n",
       " 'UniProtKB:Q9Y243',\n",
       " 'UniProtKB:Q9NUQ9',\n",
       " 'UniProtKB:Q9P272',\n",
       " 'UniProtKB:Q9BY07',\n",
       " 'UniProtKB:O43897',\n",
       " 'UniProtKB:A0A024R9G4',\n",
       " 'UniProtKB:Q4F7X0',\n",
       " 'UniProtKB:E5RIR1',\n",
       " 'UniProtKB:Q8IYG9',\n",
       " 'UniProtKB:A7E2E4']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "proteins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Find GO terms for all genes/products associated with all nodes in Anxiety sub-ontology\n",
    "\n",
    "## First create a GO handle and get association sets for GO (in human)\n",
    "go = f.create('go')\n",
    "\n",
    "from ontobio.assoc_factory import AssociationSetFactory\n",
    "afactory = AssociationSetFactory()\n",
    "aset = afactory.create(ontology=go,\n",
    "                       subject_category='gene',\n",
    "                       object_category='function',\n",
    "                       taxon='NCBITaxon:9606')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "http://www.wikidata.org/entity/Q202387 post-traumatic stress disorder\n",
      "  GO:0007616 long-term memory\n",
      "  GO:0006171 cAMP biosynthetic process\n",
      "  GO:0007193 adenylate cyclase-inhibiting G-protein coupled receptor signaling pathway\n",
      "  GO:0016021 integral component of membrane\n",
      "  GO:0005524 ATP binding\n",
      "  GO:0003091 renal water homeostasis\n",
      "  GO:0005886 plasma membrane\n",
      "  GO:0004016 adenylate cyclase activity\n",
      "  GO:0004383 guanylate cyclase activity\n",
      "  GO:0006182 cGMP biosynthetic process\n",
      "  GO:0007165 signal transduction\n",
      "  GO:0007190 activation of adenylate cyclase activity\n",
      "  GO:0008294 calcium- and calmodulin-responsive adenylate cyclase activity\n",
      "  GO:0008074 guanylate cyclase complex, soluble\n",
      "  GO:0007189 adenylate cyclase-activating G-protein coupled receptor signaling pathway\n",
      "  GO:0046872 metal ion binding\n",
      "  GO:0007611 learning or memory\n",
      "  GO:0071377 cellular response to glucagon stimulus\n",
      "  GO:0016020 membrane\n",
      "  GO:0035556 intracellular signal transduction\n",
      "  GO:0034199 activation of protein kinase A activity\n",
      "  GO:0008198 ferrous iron binding\n",
      "  GO:0016706 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors\n",
      "  GO:0005634 nucleus\n",
      "  GO:0005737 cytoplasm\n",
      "  GO:0055114 oxidation-reduction process\n",
      "  GO:0016300 tRNA (uracil) methyltransferase activity\n",
      "  GO:0030488 tRNA methylation\n",
      "  GO:0002098 tRNA wobble uridine modification\n",
      "  GO:0000049 tRNA binding\n",
      "  GO:0006400 tRNA modification\n",
      "  GO:0008175 tRNA methyltransferase activity\n",
      "http://www.wikidata.org/entity/Q741713 panic disorder\n",
      "  GO:0003713 transcription coactivator activity\n",
      "  GO:0030374 ligand-dependent nuclear receptor transcription coactivator activity\n",
      "  GO:0043565 sequence-specific DNA binding\n",
      "  GO:0044212 transcription regulatory region DNA binding\n",
      "  GO:0005515 protein binding\n",
      "  GO:0005634 nucleus\n",
      "  GO:0007165 signal transduction\n",
      "  GO:0045893 positive regulation of transcription, DNA-templated\n",
      "  GO:0003682 chromatin binding\n",
      "  GO:0001047 core promoter binding\n",
      "  GO:0003712 transcription cofactor activity\n",
      "  GO:0008022 protein C-terminus binding\n",
      "  GO:0043231 intracellular membrane-bounded organelle\n",
      "  GO:0045944 positive regulation of transcription from RNA polymerase II promoter\n",
      "  GO:0030518 intracellular steroid hormone receptor signaling pathway\n",
      "  GO:0006351 transcription, DNA-templated\n",
      "  GO:0008013 beta-catenin binding\n",
      "  GO:0070016 armadillo repeat domain binding\n",
      "  GO:0010628 positive regulation of gene expression\n",
      "  GO:0016055 Wnt signaling pathway\n",
      "  GO:0005829 cytosol\n",
      "  GO:0000790 nuclear chromatin\n"
     ]
    }
   ],
   "source": [
    "## For each node in the sub-ontology, look up associated proteins\n",
    "## and print the GO annotations of those proteins\n",
    "for n in ont.nodes():\n",
    "    proteins = wd.canned_query('disease2protein', n)\n",
    "    anns = [a for p in proteins for a in aset.annotations(p)]\n",
    "    if anns:\n",
    "        print(\"{} {}\".format(n, ont.label(n)))\n",
    "        for a in anns:\n",
    "            print(\"  {} {}\".format(a, go.label(a)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}