{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Integrating transcriptome profile with KEGG pathway\n", "\n", "by Kozo Nishida (Riken, Japan)\n", "\n", "\n", " This example demonstrates how to integrate transcriptome data (preprocessed with bioconductor packages) with KEGG pathways and visualize it in Cytoscape.\n", " \n", " \n", " ## Software Requirments\n", " \n", " * Cytoscape 3.2.1\n", " * [KEGGScape 0.7.x](http://apps.cytoscape.org/apps/keggscape)\n", " * [cyREST 0.9.x or later](http://apps.cytoscape.org/apps/cyrest)\n", "\n", "\n", "### For data pre-processing\n", "\n", "* [R](http://www.r-project.org/)\n", "* [Bioconductor - ecoliLeucine](http://www.bioconductor.org/packages/release/data/experiment/html/ecoliLeucine.html)\n", "* [Bioconductor - affy](http://www.bioconductor.org/packages/release/bioc/html/affy.html)\n", "* [Bioconductor - genefilter](http://www.bioconductor.org/packages/release/bioc/html/genefilter.html)\n", " \n", "## Input and Output\n", "\n", "* Input - bioconductor ecoliLeucine package\n", "* Output - Cytoscape session file containing KEGG pathway with differentially expressed genes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing a KEGG Pathway\n", "\n", "#### Glycine, serine and threonine metabolism - Escherichia coli K-12 MG1655)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import requests\n", "import json\n", "\n", "# Basic Setup\n", "PORT_NUMBER = 1234\n", "BASE_URL = \"http://localhost:\" + str(PORT_NUMBER) + \"/v1/\"\n", "\n", "# Header for posting data to the server as JSON\n", "HEADERS = {'Content-Type': 'application/json'}\n", "\n", "# Delete all networks in current session\n", "requests.delete(BASE_URL + 'session')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pathway SUID = 70708\n" ] } ], "source": [ "pathway_location = \"http://rest.kegg.jp/get/eco00260/kgml\"\n", "res1 = requests.post(BASE_URL + \"networks?source=url\", data=json.dumps([pathway_location]), headers=HEADERS)\n", "result = json.loads(res1.content)\n", "pathway_suid = result[0][\"networkSUID\"][0]\n", "print(\"Pathway SUID = \" + str(pathway_suid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-processing transcriptome data and testing differentially expressed genes with Bioconductor\n", "\n", "### You need to run the following code in R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "source(\"http://bioconductor.org/biocLite.R\")\n", "biocLite(c(\"genefilter\", \"ecoliLeucine\"))\n", "library(\"ecoliLeucine\")\n", "library(\"genefilter\")\n", "data(\"ecoliLeucine\")\n", "eset = rma(ecoliLeucine)\n", "r = rowttests(eset, eset$strain)\n", "filtered = r[r$p.value < 0.05,]\n", "write.csv(filtered, file=\"ttest.csv\")\n", "``` " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading ttest.csv as Pandas DataFrame" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0statisticdmp.value
0IG_1070_1689385_1697378_fwd_f_st2.4597920.0823830.049133
1IG_10_10495_10642_rev_st-3.009316-0.0463990.023721
2IG_1110_1744617_1744723_fwd_st-2.515037-0.1696260.045592
3IG_1145_1805715_1805819_fwd_st3.5562630.3687730.011981
4IG_1189_1874879_1874911_fwd_st-2.875842-0.2767480.028211
\n", "
" ], "text/plain": [ " Unnamed: 0 statistic dm p.value\n", "0 IG_1070_1689385_1697378_fwd_f_st 2.459792 0.082383 0.049133\n", "1 IG_10_10495_10642_rev_st -3.009316 -0.046399 0.023721\n", "2 IG_1110_1744617_1744723_fwd_st -2.515037 -0.169626 0.045592\n", "3 IG_1145_1805715_1805819_fwd_st 3.556263 0.368773 0.011981\n", "4 IG_1189_1874879_1874911_fwd_st -2.875842 -0.276748 0.028211" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "ttest_df = pd.read_csv('ttest.csv')\n", "ttest_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting node table from Cytoscape and merge with ttest.csv" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUIDshared namenameselectedKEGG_NODE_XKEGG_NODE_YKEGG_NODE_WIDTHKEGG_NODE_HEIGHTKEGG_NODE_LABELKEGG_NODE_LABEL_LIST_FIRSTKEGG_NODE_LABEL_LISTKEGG_IDKEGG_NODE_LABEL_COLORKEGG_NODE_FILL_COLORKEGG_NODE_REACTIONIDKEGG_NODE_TYPEKEGG_NODE_SHAPEKEGG_LINK
070718path:eco00260:46path:eco00260:46False1625474617K17755K17755K17755ko:K17755#000000#FFFFFFrn:R08211orthologrectanglehttp://www.kegg.jp/dbget-bin/www_bget?K17755
170719path:eco00260:47path:eco00260:47False6882224617K12235K12235K12235ko:K12235#000000#FFFFFFrn:R00589orthologrectanglehttp://www.kegg.jp/dbget-bin/www_bget?K12235
270720path:eco00260:48path:eco00260:48False107993088C164325-HydroxyectoineC16432cpd:C16432#000000#FFFFFFNaNcompoundcirclehttp://www.kegg.jp/dbget-bin/www_bget?C16432
370721path:eco00260:49path:eco00260:49False10239304617K10674K10674K10674ko:K10674#000000#FFFFFFrn:R08050orthologrectanglehttp://www.kegg.jp/dbget-bin/www_bget?K10674
470722path:eco00260:50path:eco00260:50False994644617K00499K00499K00499ko:K00499#000000#FFFFFFrn:R07409orthologrectanglehttp://www.kegg.jp/dbget-bin/www_bget?K00499
\n", "
" ], "text/plain": [ " SUID shared name name selected KEGG_NODE_X \\\n", "0 70718 path:eco00260:46 path:eco00260:46 False 162 \n", "1 70719 path:eco00260:47 path:eco00260:47 False 688 \n", "2 70720 path:eco00260:48 path:eco00260:48 False 1079 \n", "3 70721 path:eco00260:49 path:eco00260:49 False 1023 \n", "4 70722 path:eco00260:50 path:eco00260:50 False 99 \n", "\n", " KEGG_NODE_Y KEGG_NODE_WIDTH KEGG_NODE_HEIGHT KEGG_NODE_LABEL \\\n", "0 547 46 17 K17755 \n", "1 222 46 17 K12235 \n", "2 930 8 8 C16432 \n", "3 930 46 17 K10674 \n", "4 464 46 17 K00499 \n", "\n", " KEGG_NODE_LABEL_LIST_FIRST KEGG_NODE_LABEL_LIST KEGG_ID \\\n", "0 K17755 K17755 ko:K17755 \n", "1 K12235 K12235 ko:K12235 \n", "2 5-Hydroxyectoine C16432 cpd:C16432 \n", "3 K10674 K10674 ko:K10674 \n", "4 K00499 K00499 ko:K00499 \n", "\n", " KEGG_NODE_LABEL_COLOR KEGG_NODE_FILL_COLOR KEGG_NODE_REACTIONID \\\n", "0 #000000 #FFFFFF rn:R08211 \n", "1 #000000 #FFFFFF rn:R00589 \n", "2 #000000 #FFFFFF NaN \n", "3 #000000 #FFFFFF rn:R08050 \n", "4 #000000 #FFFFFF rn:R07409 \n", "\n", " KEGG_NODE_TYPE KEGG_NODE_SHAPE KEGG_LINK \n", "0 ortholog rectangle http://www.kegg.jp/dbget-bin/www_bget?K17755 \n", "1 ortholog rectangle http://www.kegg.jp/dbget-bin/www_bget?K12235 \n", "2 compound circle http://www.kegg.jp/dbget-bin/www_bget?C16432 \n", "3 ortholog rectangle http://www.kegg.jp/dbget-bin/www_bget?K10674 \n", "4 ortholog rectangle http://www.kegg.jp/dbget-bin/www_bget?K00499 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deftable = requests.get('http://localhost:1234/v1/networks/' + str(pathway_suid) + '/tables/defaultnode.tsv')\n", "handle = open('defaultnode.tsv','w')\n", "handle.write(deftable.content)\n", "handle.close()\n", "\n", "deftable_df = pd.read_table('defaultnode.tsv')\n", "deftable_df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0statisticdmp.valueKEGG_ID_INPATHWAYKEGG_NODE_LABEL_INPATHWAY
0IG_1070_1689385_1697378_fwd_f_st2.4597920.0823830.049133NoneNone
1IG_10_10495_10642_rev_st-3.009316-0.0463990.023721NoneNone
2IG_1110_1744617_1744723_fwd_st-2.515037-0.1696260.045592NoneNone
3IG_1145_1805715_1805819_fwd_st3.5562630.3687730.011981NoneNone
4IG_1189_1874879_1874911_fwd_st-2.875842-0.2767480.028211NoneNone
\n", "
" ], "text/plain": [ " Unnamed: 0 statistic dm p.value \\\n", "0 IG_1070_1689385_1697378_fwd_f_st 2.459792 0.082383 0.049133 \n", "1 IG_10_10495_10642_rev_st -3.009316 -0.046399 0.023721 \n", "2 IG_1110_1744617_1744723_fwd_st -2.515037 -0.169626 0.045592 \n", "3 IG_1145_1805715_1805819_fwd_st 3.556263 0.368773 0.011981 \n", "4 IG_1189_1874879_1874911_fwd_st -2.875842 -0.276748 0.028211 \n", "\n", " KEGG_ID_INPATHWAY KEGG_NODE_LABEL_INPATHWAY \n", "0 None None \n", "1 None None \n", "2 None None \n", "3 None None \n", "4 None None " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "bnum_re = re.compile('b[0-9]{4}')\n", "\n", "keggids = []\n", "keggnode_labels = []\n", "for index, probe in ttest_df['Unnamed: 0'].iteritems():\n", " m = bnum_re.search(probe)\n", " if m:\n", " keggids.append(None)\n", " keggnode_labels.append(None)\n", " for i, keggid in deftable_df['KEGG_ID'].iteritems():\n", " if m.group(0) in keggid:\n", " keggids.pop()\n", " keggids.append(keggid)\n", " keggnode_labels.pop()\n", " keggnode_labels.append(deftable_df['KEGG_NODE_LABEL'][i])\n", " else:\n", " keggids.append(None)\n", " keggnode_labels.append(None)\n", "\n", "s1 = pd.Series(keggids, name='KEGG_ID_INPATHWAY')\n", "s2 = pd.Series(keggnode_labels, name='KEGG_NODE_LABEL_INPATHWAY')\n", "\n", "merged_df = pd.concat([ttest_df, s1, s2], axis=1)\n", "merged_df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://localhost:1234/v1/networks/70708/tables/defaultnode\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ttestjson = json.loads(merged_df.to_json(orient=\"records\"))\n", "\n", "new_table_data = {\n", " \"key\": \"KEGG_NODE_LABEL\",\n", " \"dataKey\": \"KEGG_NODE_LABEL_INPATHWAY\",\n", " \"data\" : ttestjson\n", "}\n", "\n", "update_table_url = BASE_URL + \"networks/\" + str(pathway_suid) + \"/tables/defaultnode\"\n", "\n", "print(update_table_url)\n", "\n", "requests.put(update_table_url, data=json.dumps(new_table_data), headers=HEADERS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see the t-test results in Cytoscape default node table!\n", "\n", "### Discussion\n", "This workflow integrates data, but visualization part is not fully automated. This is a TODO item..." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }