## Fear of Bees: Extracting Ontologies from Wikidata

Wikidata links entities to one another using properties such as subclass of (P279). These links form a classification hierarchy,
although because the statements come from many sources, the hierarchy may not follow the rules expected of an ontology hierarchy (for example, it can contain cycles).

Ontobio includes a Wikidata ontology factory, so we can transparently create an Ontology object from Wikidata
and use the same methods that ontobio provides for any other ontology.

This example focuses on [anxiety disorders](https://www.wikidata.org/wiki/Q544006).


In [1]:
from ontobio.ontol_factory import OntologyFactory
f = OntologyFactory()

## OntologyFactory recognizes the prefix wdq for wikidata queries;
## We use this to make a sub-ontology
## (currently we have no lazy wrapper for WD, only Eager, so we limit the size)
ont = f.create('wdq:Q544006') # Anxiety disorder



In [2]:
## Find terms starting with Anxiety in the sub-ontology
qids = ont.search('Anxiety%')
qids

[rdflib.term.URIRef('http://www.wikidata.org/entity/Q544006')]
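
The `%` in the search term above acts as a wildcard. As a rough illustration of that idea, here is a minimal sketch in plain Python that turns a `%` pattern into a regular expression and matches it against a small label index. The helper names `like_to_regex` and `search_labels` are hypothetical and this is not ontobio's implementation; in particular, the case-sensitive matching shown here is an assumption of the sketch. The three labels are taken from the tree output later in this notebook.

```python
import re

def like_to_regex(pattern):
    """Translate a pattern where % matches any run of characters into a compiled regex."""
    # Escape every literal piece so characters such as '.' or '(' are matched verbatim
    parts = [re.escape(piece) for piece in pattern.split('%')]
    return re.compile('^' + '.*'.join(parts) + '$')

# Small stand-in for an ontology's id -> label index
labels = {
    'Q544006': 'Anxiety disorder',
    'Q845224': 'generalized anxiety disorder',
    'Q741713': 'panic disorder',
}

def search_labels(pattern, labels):
    """Return the sorted ids whose label matches the % pattern (case-sensitive in this sketch)."""
    rx = like_to_regex(pattern)
    return sorted(k for k, v in labels.items() if rx.match(v))

matches = search_labels('Anxiety%', labels)
print(matches)
```

A leading wildcard works the same way: `search_labels('%disorder', labels)` matches all three labels in this toy index.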

In [3]:
## Traverse up and down from query node in our sub-ontology
nodes = ont.traverse_nodes(qids, up=True, down=True)
labels = [ont.label(n) for n in nodes]
labels[:25]

['Aktualneurosen',
 'cognitive disorder',
 'Anti-French sentiment in the United States',
 'acarophobia',
 'Organic disease',
 'identifier',
 'Alektorophobia',
 'Katagelasticism',
 'answer',
 'Counterphobic attitude',
 'compulsive act',
 'physical condition',
 'Piblokto',
 'blood phobia',
 'category of being',
 'Childhood phobias',
 'ability',
 'disposition',
 'Entomophobia',
 'physiological condition',
 'property',
 'Cynophobia',
 'neurosis effects',
 'bowel-control anxiety',
 'Anxiety disorder']
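
To make the `up` and `down` flags more concrete, the following is a minimal sketch of ancestor and descendant traversal over a toy hierarchy in plain Python. It is not how ontobio implements `traverse_nodes`; the function names and the class names in `SUPERCLASSES` are hypothetical and do not reflect actual Wikidata statements.

```python
from collections import deque

# Toy hierarchy: each key is a class, each value lists its direct superclasses.
# The names are hypothetical and do not reflect actual Wikidata statements.
SUPERCLASSES = {
    'condition': ['thing'],
    'anxiety': ['condition'],
    'phobia': ['anxiety'],
    'panic': ['anxiety'],
    'unrelated': ['thing'],
}

def invert(edges):
    """Build the superclass -> direct subclasses map from the child -> parents map."""
    inverted = {}
    for child, parents in edges.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted

def reachable(starts, edges):
    """Breadth-first closure over edges; the visited set keeps it safe on cyclic graphs."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def traverse(starts, up=True, down=True):
    """Return the start nodes plus their ancestors and/or descendants."""
    result = set(starts)
    if up:
        result |= reachable(starts, SUPERCLASSES)
    if down:
        result |= reachable(starts, invert(SUPERCLASSES))
    return result

both = traverse(['anxiety'])
down_only = traverse(['anxiety'], up=False)
print(sorted(both))
print(sorted(down_only))
```

Note that `'unrelated'` is never returned for the query `'anxiety'`: it shares an ancestor with the query node but is neither an ancestor nor a descendant of it. Traversing upward in a broad graph such as Wikidata can still pull in very general classes, which is why labels like 'property' and 'category of being' appear in the output above.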

In [16]:
## Test for cycles
import networkx as nx
g = ont.get_graph()
def show_cycle(nl):
    # Print each node in the cycle as "<IRI> <label>"
    print(["{} {}".format(n, ont.label(n)) for n in nl])

cycles_list = list(nx.simple_cycles(g))
show_cycle(cycles_list[0])

['http://www.wikidata.org/entity/Q1347367 ability', 'http://www.wikidata.org/entity/Q151885 concept', 'http://www.wikidata.org/entity/Q9081 knowledge', 'http://www.wikidata.org/entity/Q3695082 sign', 'http://www.wikidata.org/entity/Q853614 identifier', 'http://www.wikidata.org/entity/Q937228 property']
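
The cell above relies on `networkx.simple_cycles`. For readers who want to see the underlying idea, here is a minimal sketch of cycle detection by depth-first search in plain Python. The function name `find_cycle` and the toy graphs are hypothetical; this is not the algorithm networkx uses, and it returns only the first cycle it encounters rather than enumerating all of them.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the directed graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, fully explored
    color = {}
    stack = []                     # nodes on the current DFS path, in order

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so the path from nxt onward is a cycle
                return stack[stack.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for start in list(edges):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None

# Toy adjacency maps with made-up node names
cyclic = {'A': ['B'], 'B': ['C'], 'C': ['A'], 'D': ['A']}
acyclic = {'A': ['B'], 'B': ['C'], 'D': ['A']}
print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

The recursive form is fine for small examples like this one; on a graph as deep as a full Wikidata extract, an iterative version would be a safer choice because of Python's recursion limit.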


In [5]:
## Show our extract of the sub-ontology as an ascii tree
## (note this is resilient to cycles)

## only traverse down from our query nodes
## (including ancestors causes multiple paths, and a verbose display)
nodes = ont.traverse_nodes(qids, up=False, down=True)

from ontobio.io.ontol_renderers import GraphRenderer
w = GraphRenderer.create('tree')
w.write_subgraph(ont, nodes, query_ids=qids)


. http://www.wikidata.org/entity/Q544006 ! Anxiety disorder * 
 % http://www.wikidata.org/entity/Q741713 ! panic disorder
 % http://www.wikidata.org/entity/Q6374996 ! Katagelasticism
 % http://www.wikidata.org/entity/Q845224 ! generalized anxiety disorder
 % http://www.wikidata.org/entity/Q377493 ! selective mutism
 % http://www.wikidata.org/entity/Q5354941 ! Elective mutism
 % http://www.wikidata.org/entity/Q202387 ! post-traumatic stress disorder
 % http://www.wikidata.org/entity/Q10547816 ! Counterphobic attitude
 % http://www.wikidata.org/entity/Q13604751 ! lovesickness
 % http://www.wikidata.org/entity/Q1316515 ! School refusal
 % http://www.wikidata.org/entity/Q4386741 ! Olfactory Reference Syndrome
 % http://www.wikidata.org/entity/Q424221 ! acute stress disorder
 % http://www.wikidata.org/entity/Q1482034 ! combat disorder
 % http://www.wikidata.org/entity/Q18967153 ! mixed disorder as reaction to stress
 % http://www.wikidata.org/entity/Q18967156 ! acute stress reaction with pr

In [6]:
## Show as graph using GraphViz
## We can do this for both descendants and ancestors
nodes = ont.traverse_nodes(qids, up=True, down=True)

w = GraphRenderer.create('png')
w.outfile = 'output/anxiety-disorder.png'
w.write_subgraph(ont, nodes, query_ids=qids)


![img](output/anxiety-disorder.png)

## Querying for associated entities

TODO: Drugs


In [4]:
## What proteins are associated with PTSD? (via GWAS)
[ptsd] = ont.search('post-traumatic stress disorder')
import ontobio.sparql.wikidata as wd
proteins = wd.canned_query('disease2protein', ptsd)

In [5]:
proteins

['UniProtKB:Q92831',
 'UniProtKB:P17252',
 'UniProtKB:Q8N9K7',
 'UniProtKB:O75899',
 'UniProtKB:Q92597',
 'UniProtKB:P40145',
 'UniProtKB:Q9HA38',
 'UniProtKB:P42658',
 'UniProtKB:Q9Y243',
 'UniProtKB:Q9NUQ9',
 'UniProtKB:Q9P272',
 'UniProtKB:Q9BY07',
 'UniProtKB:O43897',
 'UniProtKB:A0A024R9G4',
 'UniProtKB:Q4F7X0',
 'UniProtKB:E5RIR1',
 'UniProtKB:Q8IYG9',
 'UniProtKB:A7E2E4']

In [10]:
## Find GO terms for all genes/products associated with all nodes in Anxiety sub-ontology

## First create a GO handle and get association sets for GO (in human)
go = f.create('go')

from ontobio.assoc_factory import AssociationSetFactory
afactory = AssociationSetFactory()
aset = afactory.create(ontology=go,
                       subject_category='gene',
                       object_category='function',
                       taxon='NCBITaxon:9606')


In [19]:
for n in ont.nodes():
    # Proteins linked to this disease node in Wikidata
    proteins = wd.canned_query('disease2protein', n)
    # Flatten the GO annotations of every protein into a single list
    anns = [a for p in proteins for a in aset.annotations(p)]
    if len(anns) > 0:
        print("{} {}".format(n, ont.label(n)))
        for a in anns:
            print(" {} {}".format(a, go.label(a)))

http://www.wikidata.org/entity/Q202387 post-traumatic stress disorder
 GO:0007616 long-term memory
 GO:0006171 cAMP biosynthetic process
 GO:0007193 adenylate cyclase-inhibiting G-protein coupled receptor signaling pathway
 GO:0016021 integral component of membrane
 GO:0005524 ATP binding
 GO:0003091 renal water homeostasis
 GO:0005886 plasma membrane
 GO:0004016 adenylate cyclase activity
 GO:0004383 guanylate cyclase activity
 GO:0006182 cGMP biosynthetic process
 GO:0007165 signal transduction
 GO:0007190 activation of adenylate cyclase activity
 GO:0008294 calcium- and calmodulin-responsive adenylate cyclase activity
 GO:0008074 guanylate cyclase complex, soluble
 GO:0007189 adenylate cyclase-activating G-protein coupled receptor signaling pathway
 GO:0046872 metal ion binding
 GO:0007611 learning or memory
 GO:0071377 cellular response to glucagon stimulus
 GO:0016020 membrane
 GO:0035556 intracellular signal transduction
 GO:0034199 activation of protein kinase A activity
 GO:000