{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarizing annotations to a term and descendants\n", "\n", "This notebook demonstrates summarizing annotation counts for a term and its descendants.\n", "\n", "An example use case is a GO annotator exploring whether to refactor a subtree in GO.\n", "\n", "Of course, if this were a routine task we would build a command-line or even web interface,\n", "but keeping it as a notebook gives us some flexibility in logic, and in any case it is intended largely\n", "as a demonstration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### boilerplate\n", "\n", " * import the relevant ontobio libraries\n", " * set up key objects" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "## Create an ontology factory in order to fetch GO\n", "from ontobio.ontol_factory import OntologyFactory\n", "ofactory = OntologyFactory()\n", "\n", "## GOLR queries\n", "from ontobio.golr.golr_query import GolrAssociationQuery\n", "\n", "## rendering ontologies\n", "from ontobio import GraphRenderer" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "## Load GO.
Note: the first time this runs, Jupyter will show '*' while GO loads - be patient\n", "ont = ofactory.create(\"go\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding descendants\n", "\n", "Here we use the in-memory ontology object; no external service calls are made.\n", "\n", "Change the value of `term_id` to any GO term you like.\n", "\n", "We traverse both `subClassOf` and `BFO:0000050` (part of) relations; `reflexive=True` includes the term itself in the result." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "term_id = \"GO:0009070\" ## serine family amino acid biosynthetic process\n", "descendants = ont.descendants(term_id, reflexive=True, relations=['subClassOf', 'BFO:0000050'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['GO:0016260',\n", " 'GO:0004124',\n", " 'GO:0019343',\n", " 'GO:0019265',\n", " 'GO:0019345',\n", " 'GO:0006564',\n", " 'GO:0006545',\n", " 'GO:0071269',\n", " 'GO:0006535',\n", " 'GO:0009090',\n", " 'GO:0070179',\n", " 'GO:0019264',\n", " 'GO:0009070',\n", " 'GO:0019344']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descendants" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### rendering subtrees\n", "\n", "We use the good old-fashioned tree renderer.\n", "\n", "(This doesn't scale well for lattice-like subontologies.)\n", "\n", "In the output, `%` marks an is-a child and `<` marks a part-of child." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "renderer = GraphRenderer.create('tree')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ". GO:0009070 ! serine family amino acid biosynthetic process\n", " % GO:0006545 ! glycine biosynthetic process\n", " % GO:0019264 ! glycine biosynthetic process from serine\n", " % GO:0019265 ! glycine biosynthetic process, by transamination of glyoxylate\n", " % GO:0006564 ! L-serine biosynthetic process\n", " % GO:0019344 ! cysteine biosynthetic process\n", " % GO:0006535 ! 
cysteine biosynthetic process from serine\n", " % GO:0019343 ! cysteine biosynthetic process via cystathionine\n", " % GO:0019345 ! cysteine biosynthetic process via S-sulfo-L-cysteine\n", " < GO:0004124 ! cysteine synthase activity\n", " % GO:0009090 ! homoserine biosynthetic process\n", " % GO:0016260 ! selenocysteine biosynthetic process\n", " % GO:0070179 ! D-serine biosynthetic process\n", " % GO:0071269 ! L-homocysteine biosynthetic process\n", "\n", "\n" ] } ], "source": [ "print(renderer.render_subgraph(ont, nodes=descendants))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### summarizing annotations\n", "\n", "We write a short function that wraps a call to GOlr and returns a summary dict.\n", "\n", "The dict is keyed by facet value (by default taxon label, evidence label and assigning group). We also include an entry for `ALL`, the total number of matching annotations.\n" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "DEFAULT_FACET_FIELDS = ['taxon_subset_closure_label', 'evidence_label', 'assigned_by']\n", "def summarize(t: str,\n", "              evidence_closure='ECO:0000269', ## restrict to experimental\n", "              facet_fields=None) -> dict:\n", "    \"\"\"\n", "    Summarize experimental annotations to a term as a dict of facet counts\n", "    \"\"\"\n", "    if facet_fields is None:\n", "        facet_fields = DEFAULT_FACET_FIELDS\n", "    q = GolrAssociationQuery(object=t, rows=0, object_category='function',\n", "                             fq={'evidence_closure_label':'experimental evidence'},\n", "                             facet_fields=facet_fields)\n", "    #params = q.solr_params()\n", "    #print(params)\n", "    result = q.exec()\n", "    fc = result['facet_counts']\n", "    item = {'ALL': result['numFound']} ## make sure this is the first entry\n", "    for ff in facet_fields:\n", "        if ff in fc:\n", "            item.update(fc[ff])\n", "    return item" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ALL': 144, 'Eukaryota': 92, 'Bacteria': 52, 'Metazoa': 33, 'Fungi': 32, 'Escherichia coli K-12': 27, 'Viridiplantae': 23, 'Mammalia': 22, 
'Vertebrata <vertebrates>': 22, 'Arabidopsis thaliana': 21, 'Saccharomyces cerevisiae S288C': 17, 'Mycobacterium tuberculosis H37Rv': 11, 'Schizosaccharomyces pombe': 11, 'Homo sapiens': 8, 'Caenorhabditis elegans': 7, 'Mus musculus': 7, 'Rattus norvegicus': 7, 'Bacillus subtilis subsp. subtilis str. 168': 6, 'Pseudomonas aeruginosa PAO1': 4, 'Aspergillus nidulans FGSC A4': 3, 'Apis mellifera': 2, 'Leishmania major strain Friedlin': 2, 'Bombyx mori': 1, 'Candida albicans SC5314': 1, 'Dictyostelium discoideum': 1, 'Drosophila melanogaster': 1, 'direct assay evidence used in manual assertion': 81, 'mutant phenotype evidence used in manual assertion': 53, 'genetic interaction evidence used in manual assertion': 10, 'EcoCyc': 20, 'TAIR': 19, 'SGD': 17, 'UniProt': 16, 'PomBase': 11, 'MTBBASE': 10, 'EcoliWiki': 7, 'RGD': 7, 'WB': 7, 'MGI': 6, 'CAFA': 5, 'PseudoCAP': 5, 'BHF-UCL': 4, 'AspGD': 3, 'GeneDB': 3, 'CGD': 1, 'FlyBase': 1, 'GOC': 1, 'dictyBase': 1}\n" ] } ], "source": [ "print(summarize(term_id))" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "def summarize_set(ids, facet_fields=None) -> pd.DataFrame:\n", "    \"\"\"\n", "    Summarize annotations for a set of term IDs, return a dataframe sorted by total count\n", "    \"\"\"\n", "    items = []\n", "    for id in ids:\n", "        item = {'id': id, 'name:': ont.label(id)}\n", "        for k,v in summarize(id, facet_fields=facet_fields).items():\n", "            item[k] = v\n", "        items.append(item)\n", "    df = pd.DataFrame(items).fillna(0)\n", "    # sort using total number\n", "    df.sort_values('ALL', axis=0, ascending=False, inplace=True)\n", "    return df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarize GO term and descendants\n", "\n", "More advanced visualizations are easy with plotly etc. 
We leave this as an exercise for the reader...\n", "\n", "As an example, for the first query we bundle all facets (species, evidence, assigned by) together." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name: ALL \\\n", "12 GO:0009070 serine family amino acid biosynthetic process 144 \n", "13 GO:0019344 cysteine biosynthetic process 82 \n", "1 GO:0004124 cysteine synthase activity 29 \n", "5 GO:0006564 L-serine biosynthetic process 21 \n", "8 GO:0006535 cysteine biosynthetic process from serine 20 \n", "6 GO:0006545 glycine biosynthetic process 14 \n", "2 GO:0019343 cysteine biosynthetic process via cystathionine 9 \n", "9 GO:0009090 homoserine biosynthetic process 8 \n", "10 GO:0070179 D-serine biosynthetic process 8 \n", "0 GO:0016260 selenocysteine biosynthetic process 6 \n", "11 GO:0019264 glycine biosynthetic process from serine 6 \n", "3 GO:0019265 glycine biosynthetic process, by transaminatio... 4 \n", "7 GO:0071269 L-homocysteine biosynthetic process 1 \n", "4 GO:0019345 cysteine biosynthetic process via S-sulfo-L-cy... 0 \n", "\n", " Bacteria Escherichia coli K-12 Eukaryota Mammalia Metazoa \\\n", "12 52.0 27.0 92.0 22.0 33.0 \n", "13 25.0 8.0 57.0 6.0 15.0 \n", "1 7.0 3.0 22.0 0.0 4.0 \n", "5 16.0 8.0 5.0 1.0 1.0 \n", "8 9.0 4.0 11.0 0.0 4.0 \n", "6 3.0 3.0 11.0 7.0 9.0 \n", "2 1.0 0.0 8.0 1.0 2.0 \n", "9 4.0 4.0 4.0 0.0 0.0 \n", "10 0.0 0.0 8.0 6.0 6.0 \n", "0 4.0 4.0 2.0 1.0 1.0 \n", "11 2.0 2.0 4.0 2.0 4.0 \n", "3 0.0 0.0 4.0 3.0 3.0 \n", "7 0.0 0.0 1.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Mus musculus Trypanosoma brucei brucei TREU927 Vertebrata \\\n", "12 7.0 0.0 22.0 \n", "13 2.0 0.0 6.0 \n", "1 0.0 0.0 0.0 \n", "5 0.0 0.0 1.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 7.0 \n", "2 0.0 0.0 1.0 \n", "9 0.0 0.0 0.0 \n", "10 4.0 0.0 6.0 \n", "0 1.0 1.0 1.0 \n", "11 0.0 0.0 2.0 \n", "3 0.0 0.0 3.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " mutant phenotype evidence used in manual assertion \\\n", "12 53.0 \n", "13 30.0 \n", "1 6.0 \n", "5 9.0 \n", "8 6.0 \n", "6 2.0 \n", "2 6.0 \n", "9 4.0 \n", "10 0.0 \n", "0 4.0 \n", "11 2.0 \n", "3 0.0 \n", "7 0.0 \n", "4 0.0 \n", "\n", " direct assay evidence used in manual 
assertion EcoCyc GeneDB MGI \\\n", "12 81.0 20.0 3.0 6.0 \n", "13 49.0 6.0 2.0 1.0 \n", "1 21.0 2.0 0.0 0.0 \n", "5 6.0 3.0 0.0 0.0 \n", "8 13.0 3.0 1.0 0.0 \n", "6 11.0 3.0 0.0 0.0 \n", "2 3.0 0.0 1.0 0.0 \n", "9 4.0 4.0 0.0 0.0 \n", "10 8.0 0.0 0.0 4.0 \n", "0 2.0 4.0 1.0 1.0 \n", "11 4.0 2.0 0.0 0.0 \n", "3 4.0 0.0 0.0 0.0 \n", "7 1.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 \n", "\n", " Viridiplantae Arabidopsis thaliana Fungi Caenorhabditis elegans \\\n", "12 23.0 21.0 32.0 7.0 \n", "13 21.0 19.0 19.0 7.0 \n", "1 13.0 11.0 5.0 4.0 \n", "5 1.0 1.0 3.0 0.0 \n", "8 0.0 0.0 6.0 3.0 \n", "6 0.0 0.0 2.0 0.0 \n", "2 0.0 0.0 5.0 0.0 \n", "9 0.0 0.0 4.0 0.0 \n", "10 1.0 1.0 0.0 0.0 \n", "0 0.0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 0.0 \n", "3 0.0 0.0 1.0 0.0 \n", "7 0.0 0.0 1.0 0.0 \n", "4 0.0 0.0 0.0 0.0 \n", "\n", " Schizosaccharomyces pombe Bacillus subtilis subsp. subtilis str. 168 \\\n", "12 11.0 6.0 \n", "13 10.0 5.0 \n", "1 4.0 2.0 \n", "5 0.0 1.0 \n", "8 5.0 2.0 \n", "6 0.0 0.0 \n", "2 0.0 0.0 \n", "9 0.0 0.0 \n", "10 0.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 0.0 \n", "3 0.0 0.0 \n", "7 1.0 0.0 \n", "4 0.0 0.0 \n", "\n", " Mycobacterium tuberculosis H37Rv Saccharomyces cerevisiae S288C \\\n", "12 11.0 17.0 \n", "13 8.0 6.0 \n", "1 1.0 1.0 \n", "5 3.0 3.0 \n", "8 2.0 1.0 \n", "6 0.0 2.0 \n", "2 1.0 3.0 \n", "9 0.0 3.0 \n", "10 0.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 0.0 \n", "3 0.0 1.0 \n", "7 0.0 0.0 \n", "4 0.0 0.0 \n", "\n", " Solanum tuberosum Spinacia oleracea Streptomyces lavendulae \\\n", "12 0.0 0.0 0.0 \n", "13 1.0 1.0 0.0 \n", "1 1.0 1.0 1.0 \n", "5 0.0 0.0 0.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " genetic interaction evidence used in manual assertion TAIR PomBase \\\n", "12 10.0 19.0 11.0 \n", "13 3.0 17.0 10.0 \n", "1 2.0 10.0 4.0 \n", "5 6.0 1.0 0.0 \n", "8 1.0 0.0 5.0 \n", "6 1.0 0.0 0.0 \n", 
"2 0.0 0.0 0.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 1.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 1.0 \n", "4 0.0 0.0 0.0 \n", "\n", " UniProt WB CAFA EcoliWiki MTBBASE SGD \\\n", "12 16.0 7.0 5.0 7.0 10.0 17.0 \n", "13 10.0 7.0 5.0 2.0 7.0 6.0 \n", "1 4.0 4.0 2.0 1.0 1.0 1.0 \n", "5 2.0 0.0 0.0 5.0 3.0 3.0 \n", "8 2.0 3.0 2.0 1.0 2.0 1.0 \n", "6 2.0 0.0 0.0 0.0 0.0 2.0 \n", "2 1.0 0.0 0.0 0.0 1.0 3.0 \n", "9 0.0 0.0 0.0 0.0 0.0 3.0 \n", "10 2.0 0.0 0.0 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "11 1.0 0.0 0.0 0.0 0.0 0.0 \n", "3 1.0 0.0 0.0 0.0 0.0 1.0 \n", "7 0.0 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Aspergillus nidulans FGSC A4 Apis mellifera Homo sapiens \\\n", "12 3.0 2.0 8.0 \n", "13 3.0 2.0 3.0 \n", "1 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 \n", "8 0.0 1.0 0.0 \n", "6 0.0 0.0 2.0 \n", "2 2.0 1.0 1.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 0.0 2.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 2.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Leishmania major strain Friedlin AspGD BHF-UCL Rattus norvegicus RGD \\\n", "12 2.0 3.0 4.0 7.0 7.0 \n", "13 2.0 3.0 2.0 1.0 1.0 \n", "1 0.0 0.0 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 1.0 1.0 \n", "8 1.0 0.0 0.0 0.0 0.0 \n", "6 0.0 0.0 1.0 5.0 5.0 \n", "2 1.0 2.0 1.0 0.0 0.0 \n", "9 0.0 0.0 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 2.0 2.0 \n", "3 0.0 0.0 1.0 1.0 1.0 \n", "7 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Pseudomonas aeruginosa PAO1 Thermus thermophilus HB27 PseudoCAP \\\n", "12 4.0 0.0 5.0 \n", "13 1.0 0.0 2.0 \n", "1 0.0 0.0 0.0 \n", "5 3.0 1.0 3.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Bombyx mori Drosophila melanogaster FlyBase Lactobacillus casei \\\n", "12 1.0 1.0 1.0 0.0 \n", "13 0.0 0.0 0.0 1.0 \n", 
"1 0.0 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 0.0 \n", "8 0.0 0.0 0.0 1.0 \n", "6 1.0 1.0 1.0 0.0 \n", "2 0.0 0.0 0.0 0.0 \n", "9 0.0 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 0.0 \n", "11 1.0 1.0 1.0 0.0 \n", "3 0.0 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 \n", "\n", " Candida albicans SC5314 CGD Dictyostelium discoideum dictyBase GOC \\\n", "12 1.0 1.0 1.0 1.0 1.0 \n", "13 0.0 0.0 0.0 0.0 1.0 \n", "1 0.0 0.0 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 0.0 0.0 \n", "8 0.0 0.0 0.0 0.0 0.0 \n", "6 0.0 0.0 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 0.0 0.0 \n", "9 1.0 1.0 0.0 0.0 0.0 \n", "10 0.0 0.0 1.0 1.0 0.0 \n", "0 0.0 0.0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Pseudomonas aeruginosa \n", "12 0.0 \n", "13 1.0 \n", "1 0.0 \n", "5 0.0 \n", "8 0.0 \n", "6 0.0 \n", "2 0.0 \n", "9 0.0 \n", "10 0.0 \n", "0 0.0 \n", "11 0.0 \n", "3 0.0 \n", "7 0.0 \n", "4 0.0 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.max_columns = None\n", "df = summarize_set(descendants)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary by assigned by\n", "\n" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name: ALL \\\n", "12 GO:0009070 serine family amino acid biosynthetic process 144 \n", "13 GO:0019344 cysteine biosynthetic process 82 \n", "1 GO:0004124 cysteine synthase activity 29 \n", "5 GO:0006564 L-serine biosynthetic process 21 \n", "8 GO:0006535 cysteine biosynthetic process from serine 20 \n", "6 GO:0006545 glycine biosynthetic process 14 \n", "2 GO:0019343 cysteine biosynthetic process via cystathionine 9 \n", "9 GO:0009090 homoserine biosynthetic process 8 \n", "10 GO:0070179 D-serine biosynthetic process 8 \n", "0 GO:0016260 selenocysteine biosynthetic process 6 \n", "11 GO:0019264 glycine biosynthetic process from serine 6 \n", "3 GO:0019265 glycine biosynthetic process, by transaminatio... 4 \n", "7 GO:0071269 L-homocysteine biosynthetic process 1 \n", "4 GO:0019345 cysteine biosynthetic process via S-sulfo-L-cy... 0 \n", "\n", " EcoCyc GeneDB MGI TAIR PomBase UniProt WB CAFA EcoliWiki \\\n", "12 20.0 3.0 6.0 19.0 11.0 16.0 7.0 5.0 7.0 \n", "13 6.0 2.0 1.0 17.0 10.0 10.0 7.0 5.0 2.0 \n", "1 2.0 0.0 0.0 10.0 4.0 4.0 4.0 2.0 1.0 \n", "5 3.0 0.0 0.0 1.0 0.0 2.0 0.0 0.0 5.0 \n", "8 3.0 1.0 0.0 0.0 5.0 2.0 3.0 2.0 1.0 \n", "6 3.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 \n", "2 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "9 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "10 0.0 0.0 4.0 1.0 0.0 2.0 0.0 0.0 0.0 \n", "0 4.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "11 2.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " MTBBASE SGD AspGD BHF-UCL RGD PseudoCAP FlyBase CGD dictyBase \\\n", "12 10.0 17.0 3.0 4.0 7.0 5.0 1.0 1.0 1.0 \n", "13 7.0 6.0 3.0 2.0 1.0 2.0 0.0 0.0 0.0 \n", "1 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "5 3.0 3.0 0.0 0.0 1.0 3.0 0.0 0.0 0.0 \n", "8 2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "6 0.0 2.0 0.0 1.0 5.0 0.0 1.0 0.0 0.0 \n", "2 1.0 3.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "9 0.0 3.0 0.0 0.0 0.0 0.0 0.0 1.0 
0.0 \n", "10 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 0.0 2.0 0.0 1.0 0.0 0.0 \n", "3 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " GOC \n", "12 1.0 \n", "13 1.0 \n", "1 0.0 \n", "5 0.0 \n", "8 0.0 \n", "6 0.0 \n", "2 0.0 \n", "9 0.0 \n", "10 0.0 \n", "0 0.0 \n", "11 0.0 \n", "3 0.0 \n", "7 0.0 \n", "4 0.0 " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summarize_set(descendants, facet_fields=['assigned_by'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarize by species\n", "\n", "use `taxon_subset_closure_label` facet" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", "
idname:ALLBacteriaEscherichia coli K-12EukaryotaMammaliaMetazoaMus musculusTrypanosoma brucei brucei TREU927Vertebrata <vertebrates>ViridiplantaeArabidopsis thalianaFungiCaenorhabditis elegansSchizosaccharomyces pombeBacillus subtilis subsp. subtilis str. 168Mycobacterium tuberculosis H37RvSaccharomyces cerevisiae S288CSolanum tuberosumSpinacia oleraceaStreptomyces lavendulaeAspergillus nidulans FGSC A4Apis melliferaHomo sapiensLeishmania major strain FriedlinRattus norvegicusPseudomonas aeruginosa PAO1Thermus thermophilus HB27Bombyx moriDrosophila melanogasterLactobacillus caseiCandida albicans SC5314Dictyostelium discoideumPseudomonas aeruginosa
12GO:0009070serine family amino acid biosynthetic process14452.027.092.022.033.07.00.022.023.021.032.07.011.06.011.017.00.00.00.03.02.08.02.07.04.00.01.01.00.01.01.00.0
13GO:0019344cysteine biosynthetic process8225.08.057.06.015.02.00.06.021.019.019.07.010.05.08.06.01.01.00.03.02.03.02.01.01.00.00.00.01.00.00.01.0
1GO:0004124cysteine synthase activity297.03.022.00.04.00.00.00.013.011.05.04.04.02.01.01.01.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.0
5GO:0006564L-serine biosynthetic process2116.08.05.01.01.00.00.01.01.01.03.00.00.01.03.03.00.00.00.00.00.00.00.01.03.01.00.00.00.00.00.00.0
8GO:0006535cysteine biosynthetic process from serine209.04.011.00.04.00.00.00.00.00.06.03.05.02.02.01.00.00.00.00.01.00.01.00.00.00.00.00.01.00.00.00.0
6GO:0006545glycine biosynthetic process143.03.011.07.09.00.00.07.00.00.02.00.00.00.00.02.00.00.00.00.00.02.00.05.00.00.01.01.00.00.00.00.0
2GO:0019343cysteine biosynthetic process via cystathionine91.00.08.01.02.00.00.01.00.00.05.00.00.00.01.03.00.00.00.02.01.01.01.00.00.00.00.00.00.00.00.00.0
9GO:0009090homoserine biosynthetic process84.04.04.00.00.00.00.00.00.00.04.00.00.00.00.03.00.00.00.00.00.00.00.00.00.00.00.00.00.01.00.00.0
10GO:0070179D-serine biosynthetic process80.00.08.06.06.04.00.06.01.01.00.00.00.00.00.00.00.00.00.00.00.02.00.00.00.00.00.00.00.00.01.00.0
0GO:0016260selenocysteine biosynthetic process64.04.02.01.01.01.01.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
11GO:0019264glycine biosynthetic process from serine62.02.04.02.04.00.00.02.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.02.00.00.01.01.00.00.00.00.0
3GO:0019265glycine biosynthetic process, by transaminatio...40.00.04.03.03.00.00.03.00.00.01.00.00.00.00.01.00.00.00.00.00.02.00.01.00.00.00.00.00.00.00.00.0
7GO:0071269L-homocysteine biosynthetic process10.00.01.00.00.00.00.00.00.00.01.00.01.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
4GO:0019345cysteine biosynthetic process via S-sulfo-L-cy...00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0
\n", "
" ], "text/plain": [ " id name: ALL \\\n", "12 GO:0009070 serine family amino acid biosynthetic process 144 \n", "13 GO:0019344 cysteine biosynthetic process 82 \n", "1 GO:0004124 cysteine synthase activity 29 \n", "5 GO:0006564 L-serine biosynthetic process 21 \n", "8 GO:0006535 cysteine biosynthetic process from serine 20 \n", "6 GO:0006545 glycine biosynthetic process 14 \n", "2 GO:0019343 cysteine biosynthetic process via cystathionine 9 \n", "9 GO:0009090 homoserine biosynthetic process 8 \n", "10 GO:0070179 D-serine biosynthetic process 8 \n", "0 GO:0016260 selenocysteine biosynthetic process 6 \n", "11 GO:0019264 glycine biosynthetic process from serine 6 \n", "3 GO:0019265 glycine biosynthetic process, by transaminatio... 4 \n", "7 GO:0071269 L-homocysteine biosynthetic process 1 \n", "4 GO:0019345 cysteine biosynthetic process via S-sulfo-L-cy... 0 \n", "\n", " Bacteria Escherichia coli K-12 Eukaryota Mammalia Metazoa \\\n", "12 52.0 27.0 92.0 22.0 33.0 \n", "13 25.0 8.0 57.0 6.0 15.0 \n", "1 7.0 3.0 22.0 0.0 4.0 \n", "5 16.0 8.0 5.0 1.0 1.0 \n", "8 9.0 4.0 11.0 0.0 4.0 \n", "6 3.0 3.0 11.0 7.0 9.0 \n", "2 1.0 0.0 8.0 1.0 2.0 \n", "9 4.0 4.0 4.0 0.0 0.0 \n", "10 0.0 0.0 8.0 6.0 6.0 \n", "0 4.0 4.0 2.0 1.0 1.0 \n", "11 2.0 2.0 4.0 2.0 4.0 \n", "3 0.0 0.0 4.0 3.0 3.0 \n", "7 0.0 0.0 1.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 \n", "\n", " Mus musculus Trypanosoma brucei brucei TREU927 Vertebrata \\\n", "12 7.0 0.0 22.0 \n", "13 2.0 0.0 6.0 \n", "1 0.0 0.0 0.0 \n", "5 0.0 0.0 1.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 7.0 \n", "2 0.0 0.0 1.0 \n", "9 0.0 0.0 0.0 \n", "10 4.0 0.0 6.0 \n", "0 1.0 1.0 1.0 \n", "11 0.0 0.0 2.0 \n", "3 0.0 0.0 3.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Viridiplantae Arabidopsis thaliana Fungi Caenorhabditis elegans \\\n", "12 23.0 21.0 32.0 7.0 \n", "13 21.0 19.0 19.0 7.0 \n", "1 13.0 11.0 5.0 4.0 \n", "5 1.0 1.0 3.0 0.0 \n", "8 0.0 0.0 6.0 3.0 \n", "6 0.0 0.0 2.0 0.0 \n", "2 0.0 0.0 5.0 0.0 \n", "9 0.0 0.0 4.0 0.0 \n", "10 
1.0 1.0 0.0 0.0 \n", "0 0.0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 0.0 \n", "3 0.0 0.0 1.0 0.0 \n", "7 0.0 0.0 1.0 0.0 \n", "4 0.0 0.0 0.0 0.0 \n", "\n", " Schizosaccharomyces pombe Bacillus subtilis subsp. subtilis str. 168 \\\n", "12 11.0 6.0 \n", "13 10.0 5.0 \n", "1 4.0 2.0 \n", "5 0.0 1.0 \n", "8 5.0 2.0 \n", "6 0.0 0.0 \n", "2 0.0 0.0 \n", "9 0.0 0.0 \n", "10 0.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 0.0 \n", "3 0.0 0.0 \n", "7 1.0 0.0 \n", "4 0.0 0.0 \n", "\n", " Mycobacterium tuberculosis H37Rv Saccharomyces cerevisiae S288C \\\n", "12 11.0 17.0 \n", "13 8.0 6.0 \n", "1 1.0 1.0 \n", "5 3.0 3.0 \n", "8 2.0 1.0 \n", "6 0.0 2.0 \n", "2 1.0 3.0 \n", "9 0.0 3.0 \n", "10 0.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 0.0 \n", "3 0.0 1.0 \n", "7 0.0 0.0 \n", "4 0.0 0.0 \n", "\n", " Solanum tuberosum Spinacia oleracea Streptomyces lavendulae \\\n", "12 0.0 0.0 0.0 \n", "13 1.0 1.0 0.0 \n", "1 1.0 1.0 1.0 \n", "5 0.0 0.0 0.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Aspergillus nidulans FGSC A4 Apis mellifera Homo sapiens \\\n", "12 3.0 2.0 8.0 \n", "13 3.0 2.0 3.0 \n", "1 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 \n", "8 0.0 1.0 0.0 \n", "6 0.0 0.0 2.0 \n", "2 2.0 1.0 1.0 \n", "9 0.0 0.0 0.0 \n", "10 0.0 0.0 2.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 0.0 \n", "3 0.0 0.0 2.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Leishmania major strain Friedlin Rattus norvegicus \\\n", "12 2.0 7.0 \n", "13 2.0 1.0 \n", "1 0.0 0.0 \n", "5 0.0 1.0 \n", "8 1.0 0.0 \n", "6 0.0 5.0 \n", "2 1.0 0.0 \n", "9 0.0 0.0 \n", "10 0.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 2.0 \n", "3 0.0 1.0 \n", "7 0.0 0.0 \n", "4 0.0 0.0 \n", "\n", " Pseudomonas aeruginosa PAO1 Thermus thermophilus HB27 Bombyx mori \\\n", "12 4.0 0.0 1.0 \n", "13 1.0 0.0 0.0 \n", "1 0.0 0.0 0.0 \n", "5 3.0 1.0 0.0 \n", "8 0.0 0.0 0.0 \n", "6 0.0 0.0 1.0 \n", "2 0.0 0.0 0.0 \n", "9 
0.0 0.0 0.0 \n", "10 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 0.0 0.0 1.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Drosophila melanogaster Lactobacillus casei Candida albicans SC5314 \\\n", "12 1.0 0.0 1.0 \n", "13 0.0 1.0 0.0 \n", "1 0.0 0.0 0.0 \n", "5 0.0 0.0 0.0 \n", "8 0.0 1.0 0.0 \n", "6 1.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "9 0.0 0.0 1.0 \n", "10 0.0 0.0 0.0 \n", "0 0.0 0.0 0.0 \n", "11 1.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "7 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " Dictyostelium discoideum Pseudomonas aeruginosa \n", "12 1.0 0.0 \n", "13 0.0 1.0 \n", "1 0.0 0.0 \n", "5 0.0 0.0 \n", "8 0.0 0.0 \n", "6 0.0 0.0 \n", "2 0.0 0.0 \n", "9 0.0 0.0 \n", "10 1.0 0.0 \n", "0 0.0 0.0 \n", "11 0.0 0.0 \n", "3 0.0 0.0 \n", "7 0.0 0.0 \n", "4 0.0 0.0 " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summarize_set(descendants, facet_fields=['taxon_subset_closure_label'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }