{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Summarizing Multiple Graphs Together\n", "\n", "**Author:** [Charles Tapley Hoyt](https://github.com/cthoyt/)\n", "\n", "**Estimated Run Time:** 45 seconds\n", "\n", "\n", "This notebook shows how to combine multiple graphs from different sources and summarize them together. This is useful in projects where multiple curators write BEL scripts that should be joined for scientific use but kept separate for provenance." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "import sys\n", "\n", "import pybel\n", "import pybel_tools\n", "from pybel_tools.summary import info_str" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.6.3 (default, Oct 9 2017, 09:47:56) \n", "[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]\n" ] } ], "source": [ "print(sys.version)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Thu Mar 15 14:37:02 2018\n" ] } ], "source": [ "print(time.asctime())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dependencies" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.11.2-dev'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pybel.utils.get_version()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.5.2-dev'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pybel_tools.utils.get_version()" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "bms_base = os.environ['BMS_BASE']" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "human_dir = os.path.join(bms_base, 'cbn', 'Human-2.0')\n", "mouse_dir = os.path.join(bms_base, 'cbn', 'Mouse-2.0')\n", "rat_dir = os.path.join(bms_base, 'cbn', 'Rat-2.0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data\n", "\n", "In this notebook, pickled instances of networks from the Causal Biological Networks database are used." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 291 ms, sys: 78.2 ms, total: 369 ms\n", "Wall time: 451 ms\n" ] } ], "source": [ "%%time\n", "graphs = []\n", "\n", "for d in (human_dir, mouse_dir, rat_dir):\n", "    for p in os.listdir(d):\n", "        if not p.endswith('gpickle'):\n", "            continue\n", "\n", "        path = os.path.join(d, p)\n", "        g = pybel.from_pickle(path)\n", "        graphs.append(g)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "138" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(graphs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The graphs are combined with the [`union`](http://pybel.readthedocs.io/en/latest/datamodel.html#pybel.struct.union) function, which retains all nodes and edges from each graph." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 42.4 s, sys: 165 ms, total: 42.5 s\n", "Wall time: 42.7 s\n" ] } ], "source": [ "%%time\n", "combine = pybel.struct.union(graphs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 
[`info_str`](http://pybel-tools.readthedocs.io/en/stable/summary.html#pybel_tools.summary.info_str) function creates a short text summary of the network. The information is generated with [`info_json`](http://pybel-tools.readthedocs.io/en/stable/summary.html#pybel_tools.summary.info_json), which is more useful programmatically." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nodes: 5343\n", "Edges: 28766\n", "Citations: 4580\n", "Authors: 0\n", "Network density: 0.001007837278459561\n", "Components: 466\n", "Average degree: 5.383866741530975\n" ] } ], "source": [ "print(info_str(combine))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion\n", "\n", "Because networks are represented with Python objects, they can easily be operated upon and passed to functions that already create the appropriate summaries." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }