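{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualise KEGG Pathways with Custom Colours\n", "The goal is to highlight the nodes from the BETS output on a KEGG pathway map by giving them custom colours. Code adapted from [the Biopython KGML introduction notebook](https://nbviewer.jupyter.org/github/widdowquinn/notebooks/blob/master/Biopython_KGML_intro.ipynb). " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from Bio import SeqIO\n", "from Bio.KEGG.REST import kegg_get, kegg_info, kegg_list\n", "from Bio.KEGG.KGML import KGML_parser\n", "from Bio.Graphics.KGML_vis import KGMLCanvas\n", "from Bio.Graphics.ColorSpiral import ColorSpiral\n", "\n", "from IPython.display import Image, HTML, IFrame\n", "\n", "import random\n", "import pandas as pd\n", "import networkx as nx\n", "import seaborn as sns\n", "import re\n", "import os\n", "sns.set()\n", "\n", "\n", "# `head` and `PDF` are called further down but are not defined anywhere in this notebook.\n", "# Minimal stand-ins; the default line count and frame size are assumptions.\n", "def head(text, lines=10):\n", "    \"\"\"Print the first `lines` lines of a block of text.\"\"\"\n", "    print('\\n'.join(text.split('\\n')[:lines] + ['[...]']))\n", "\n", "\n", "def PDF(filename, width=700, height=350):\n", "    \"\"\"Embed a local PDF file in the notebook.\"\"\"\n", "    return IFrame(filename, width=width, height=height)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us check the MAPK Signalling Pathway first. With `Bio.KEGG`, we can display it as an image." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in BETS\n", "Let us read in the BETS graph and find the nodes it shares with the MAPK signalling pathway." 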
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "gene_details = pd.read_csv('../../../data/gene_details.csv', index_col=0)\n", "results_file = '/mnt_volume/BETS/runs/arid_th1-0mean-reps-er-norm-enet-2-g/run_l-fdr/networks/arid_th1-0mean-reps-er-norm-enet-2-g-1-fdr-0.05-effect-matrix.txt'\n", "\n", "bets_graph = pd.read_csv(results_file, sep='\\t', index_col=0)\n", "node_names = list(bets_graph.columns)\n", "\n", "# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array is its replacement\n", "bets_graph = nx.from_numpy_array(bets_graph.to_numpy(), create_using=nx.DiGraph())\n", "\n", "# relabel the integer node indices with KEGG-style Entrez IDs ('mmu:<entrez>')\n", "mapping = {}\n", "for i in range(len(node_names)):\n", "    try:\n", "        entrez_name = 'mmu:' + str(int(gene_details.loc[node_names[i]].entrez))\n", "    except ValueError: # entrez is NaN, so keep the original node name\n", "        entrez_name = node_names[i]\n", "    mapping[i] = entrez_name\n", "\n", "bets_graph = nx.relabel_nodes(bets_graph, mapping)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get KEGG Pathway information" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pathway: MAPK signaling pathway\n", "KEGG ID: path:mmu04010\n", "Image file: http://www.kegg.jp/kegg/pathway/mmu/mmu04010.png\n", "Organism: mmu\n", "Entries: 132\n", "Entry types:\n", "\tgene: 119\n", "\tgroup: 1\n", "\tcompound: 5\n", "\tmap: 7\n", "\n" ] } ], "source": [ "pathway = KGML_parser.read(kegg_get(\"mmu04010\", \"kgml\"))\n", "print(pathway)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'mmu:13537 mmu:18218 mmu:19252 mmu:235584 mmu:240672 mmu:319520 mmu:63953 mmu:67603 mmu:70686 mmu:75590'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pathway.genes[1].name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Potential Problem\n", "\n", "Notice that a single gene entry in the KEGG pathway can have more than one Entrez ID associated with it. 
This is a problem because each gene in the BETS graph is associated with exactly one Entrez ID, so more than one vertex in the BETS graph can map to a single gene in the KEGG pathway. \n", "\n", "For example, the BETS graph can contain \n", "$$ G_1 \\rightarrow G_2 \\rightarrow G_3 $$\n", "while in the KEGG pathway $G_1$, $G_2$ and $G_3$ all map to the same gene.\n", "\n", "In other words, the descendants of a BETS gene may map to the same KEGG gene as the gene itself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Colour Palette\n", "Use the `muted` colour palette from Seaborn." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjwAAABECAYAAACF4e8fAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAADCElEQVR4nO3av2tdZRzH8U9686Md0oQ2tGZyaRdxDYikS/8DB0eRDp06FVMURMShSwwIgjh0KMXR/8GlGQoFF8e6dGmkpOEmDja9idfl4nCtd1CePvDl9VoOnIcLn7Mc3lzO3Hg8DgBAZWd6DwAAaE3wAADlCR4AoDzBAwCUNz/jbCnJRpK9JKdvZg4AwH8ySLKe5HGS4+nDWcGzkeRho1EAAC1cS7I7fXNW8OwlySffP8n+0ajVqK4efPpODr+70XtGMyu37ueLXz7rPaOJjx+c5MrOTn7d2uo9pYkrOzv54cufes9o5qOvrufenQ97z2jm5tc/5uk3j3rPaObt2+9le3u794wmPnj/MFc37+bJ7ue9pzRxdfNuvv35We8ZTSwvDnLj3cvJpF+mzQqe0yTZPxrl+bBm8CTJn4fPe09o6uDVi94TmhjtjybX/c5L2vn94I/eE5o6evFb7wlNnQz/8Y96KcPhsPeEJkYvDybXmu/OJDk8Lv+Vymsf0EfLAEB5ggcAKE/wAADlCR4AoDzBAwCUJ3gAgPIEDwBQnuABAMoTPABAeYIHAChP8AAA5QkeAKA8wQMAlCd4AIDyBA8AUJ7gAQDKEzwAQHmCBwAoT/AAAOUJHgCgPMEDAJQneACA8gQPAFCe4AEAyhM8AEB5ggcAKE/wAADlCR4AoDzBAwCUJ3gAgPIEDwBQnuABAMoTPABAeYIHAChP8AAA5QkeAKA8wQMAlCd4AIDyBA8AUJ7gAQDKEzwAQHmCBwAoT/AAAOUJHgCgPMEDAJQneACA8gQPAFCe4AEAyhM8AEB5ggcAKE/wAADlzc84GyTJ2vmFNzSljzMrl3pPaOrC4sXeE5pYWDuZXNc6L2ln+cK53hOaOn/xrd4TmppfXeo9oanV1dXeE5pYODs3udZ8dybJytKg94Qmlhf/fq7XPuDceDz+t99uJnnYYBMAQCvXkuxO35wVPEtJNpLsJTlttwsA4H8bJFlP8jjJ8fThrOABACjBR8sAQHmCBwAoT/AAAOUJHgCgvL8A85loG2pTlQoAAAAASUVORK5CYII=\n", 
"text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "current_palette = sns.color_palette('muted').as_hex()\n", "sns.palplot(current_palette)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate new graph for each of these nodes with different color for anscenstor and descendants." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "pathway_stats_less = pd.read_csv('/mnt_volume/static/pathway_statistics_less.csv', index_col=0)\n", "lesser_pathways = list(pathway_stats_less[pathway_stats_less['p.val'] < 0.05]['id'])\n", "\n", "pathway_stats_greater = pd.read_csv('/mnt_volume/static/pathway_statistics_greater.csv', index_col=0)\n", "greater_pathways = list(pathway_stats_greater[pathway_stats_greater['p.val'] < 0.05]['id'])\n", "\n", "all_pathways = lesser_pathways + greater_pathways" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(14, 73, 87)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(greater_pathways), len(lesser_pathways), len(all_pathways)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def get_bets_graph():\n", " gene_details = pd.read_csv('/mnt_volume/tt-atac-causal/data/gene_details.csv', index_col=0)\n", " results_file = '/mnt_volume/BETS/runs/arid_th1-0mean-reps-er-norm-enet-2-g/run_l-fdr/networks/arid_th1-0mean-reps' \\\n", " '-er-norm-enet-2-g-1-fdr-0.05-effect-matrix.txt '\n", "\n", " bets_graph = pd.read_csv(results_file, sep='\\t', index_col=0)\n", " node_names = list(bets_graph.columns)\n", "\n", " bets_graph = nx.from_numpy_matrix(bets_graph.to_numpy(), create_using=nx.DiGraph())\n", "\n", " mapping = {}\n", " for i in range(len(node_names)):\n", " try:\n", " entrez_name = 'mmu:' + str(int(gene_details.loc[node_names[i]].entrez))\n", " except ValueError: # nan\n", " entrez_name = 
node_names[i]\n", "        mapping[i] = entrez_name\n", "\n", "    bets_graph = nx.relabel_nodes(bets_graph, mapping)\n", "    return bets_graph\n", "\n", "\n", "def draw_bets_kegg_intersection(pathway, pathway_name):\n", "    common_nodes_in_bets = []\n", "    for gene in pathway.genes:\n", "        # start from white so that a later ID missing from BETS cannot undo an earlier match\n", "        for graphic in gene.graphics:\n", "            graphic.bgcolor = '#FFFFFF'\n", "        names = gene.name.split(' ')\n", "        for name in names:\n", "            if name in bets_graph.nodes():\n", "                colour = current_palette[0]\n", "                # source node: no incoming edges and at least one outgoing edge\n", "                if bets_graph.in_degree(name) == 0 and bets_graph.out_degree(name) != 0:\n", "                    colour = current_palette[2]\n", "                    print(name)\n", "                common_nodes_in_bets.append(name)\n", "                for graphic in gene.graphics:\n", "                    graphic.bgcolor = colour\n", "\n", "    # write to file\n", "    write_directory = os.path.join('/mnt_volume/static/', pathway_name)\n", "    os.makedirs(write_directory, exist_ok=True)\n", "\n", "    filename = os.path.join(write_directory, pathway_name + \".pdf\")\n", "    canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "    canvas.draw(filename)\n", "    return common_nodes_in_bets\n", "\n", "\n", "def get_descendants(common_nodes_in_bets):\n", "    genes_descendants = {}\n", "    for node in common_nodes_in_bets:\n", "        descendants = set(nx.descendants(bets_graph, node)) - {node} # remove self loops\n", "        common = descendants.intersection(set(common_nodes_in_bets))\n", "        if len(common) != 0:\n", "            print('\\n************\\n' + node)\n", "            print(common)\n", "            genes_descendants[node] = common\n", "    return genes_descendants\n", "\n", "\n", "def write_to_file(pathway, gene_name, genes_descendants, pathway_name):\n", "    # make everything white\n", "    for gene in pathway.genes:\n", "        for graphic in gene.graphics:\n", "            graphic.bgcolor = '#FFFFFF'\n", "\n", "    for gene in pathway.genes:\n", "        names = gene.name.split(' ')\n", "        for name in names:\n", "            if name == gene_name:\n", "                colour = current_palette[0]\n", "                if bets_graph.in_degree(name) == 0 and bets_graph.out_degree(name) 
!= 0:\n", "                    colour = current_palette[2]\n", "                for graphic in gene.graphics:\n", "                    graphic.bgcolor = colour\n", "            elif name in genes_descendants[gene_name]:\n", "                for graphic in gene.graphics:\n", "                    graphic.bgcolor = current_palette[-2]\n", "            else:\n", "                pass\n", "\n", "    # write to file\n", "    write_directory = os.path.join('/mnt_volume/static/', pathway_name)\n", "    os.makedirs(write_directory, exist_ok=True)\n", "\n", "    filename = os.path.join(write_directory, pathway_name + '_' + gene_name + \".pdf\")\n", "\n", "    canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "    canvas.draw(filename)\n", "\n", "\n", "def get_pathway_per_gene_anscestor(kegg_pathway_code, pathway_name):\n", "    pathway = KGML_parser.read(kegg_get(kegg_pathway_code, \"kgml\"))\n", "\n", "    common_nodes_in_bets = draw_bets_kegg_intersection(pathway, pathway_name)\n", "\n", "    genes_descendants = get_descendants(common_nodes_in_bets)\n", "\n", "    for gene_name in genes_descendants:\n", "        # all Entrez IDs listed by a KEGG entry that contains gene_name\n", "        temp = set()\n", "        for gene in pathway.genes:\n", "            names = gene.name.split(' ')\n", "            for name in names:\n", "                if name == gene_name:\n", "                    temp = set(gene.name.split(' '))\n", "                    break\n", "        print('Intersection', genes_descendants[gene_name].intersection(temp))\n", "        # drop descendants that are alternative IDs of the same KEGG entry\n", "        genes_descendants[gene_name] = genes_descendants[gene_name] - temp\n", "\n", "        write_to_file(pathway, gene_name, genes_descendants, pathway_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "###########\n", "Running for systemic-lupus-erythematosus\n", "mmu:26914\n", "mmu:319163\n", "mmu:319177\n", "mmu:319187\n", "mmu:319190\n", "mmu:78303\n", "mmu:319150\n", "mmu:319152\n", "mmu:97114\n", "mmu:319159\n", "mmu:326620\n", "mmu:97122\n", "\n", "************\n", "mmu:97114\n", "{'mmu:382523'}\n", "Intersection {'mmu:382523'}\n", "\n", "###########\n", "Running for ribosome\n", "\n", "###########\n", "Running for 
oxidative-phosphorylation\n", "mmu:407790\n", "mmu:407790\n", "mmu:407790\n", "\n", "###########\n", "Running for glycolysis--gluconeogenesis\n", "mmu:16828\n", "mmu:74551\n", "\n", "************\n", "mmu:56421\n", "{'mmu:68738'}\n", "Intersection set()\n", "\n", "###########\n", "Running for parkinsons-disease\n", "mmu:407790\n", "\n", "###########\n", "Running for huntingtons-disease\n", "mmu:407790\n", "mmu:21817\n", "mmu:208647\n", "mmu:52639\n", "mmu:12287\n", "\n", "************\n", "mmu:21817\n", "{'mmu:78330'}\n", "\n", "************\n", "mmu:52639\n", "{'mmu:67530'}\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for alzheimers-disease\n", "mmu:11820\n", "mmu:407790\n", "mmu:26417\n", "mmu:16956\n", "mmu:52639\n", "mmu:11911\n", "mmu:26417\n", "\n", "************\n", "mmu:11820\n", "{'mmu:67530', 'mmu:22147'}\n", "\n", "************\n", "mmu:14812\n", "{'mmu:16439'}\n", "\n", "************\n", "mmu:52639\n", "{'mmu:67530'}\n", "\n", "************\n", "mmu:22143\n", "{'mmu:12006'}\n", "\n", "************\n", "mmu:23797\n", "{'mmu:67530', 'mmu:22147'}\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for cysteine-and-methionine-metabolism\n", "mmu:229905\n", "mmu:70266\n", "mmu:229905\n", "mmu:70266\n", "mmu:16828\n", "mmu:12035\n", "mmu:229905\n", "mmu:70266\n", "\n", "###########\n", "Running for proteasome\n", "\n", "###########\n", "Running for pyruvate-metabolism\n", "mmu:74551\n", "mmu:109264\n", "mmu:16828\n", "\n", "###########\n", "Running for lysosome\n", "mmu:15211\n", "mmu:15932\n", "mmu:14667\n", "mmu:20661\n", "\n", "************\n", "mmu:140494\n", "{'mmu:11775'}\n", "\n", "************\n", "mmu:15211\n", "{'mmu:13032', 'mmu:11775', 'mmu:16792'}\n", "\n", "************\n", "mmu:16792\n", "{'mmu:11775'}\n", "\n", "************\n", "mmu:20661\n", "{'mmu:11775'}\n", "\n", "************\n", "mmu:52120\n", 
"{'mmu:11775'}\n", "\n", "************\n", "mmu:140494\n", "{'mmu:11775'}\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for pyrimidine-metabolism\n", "\n", "###########\n", "Running for valine-leucine-and-isoleucine-degradation\n", "mmu:12035\n", "mmu:12035\n", "mmu:66904\n", "mmu:268860\n", "mmu:12035\n", "\n", "###########\n", "Running for purine-metabolism\n", "mmu:110639\n", "mmu:78801\n", "mmu:18160\n", "mmu:78801\n", "\n", "###########\n", "Running for propanoate-metabolism\n", "mmu:268860\n", "mmu:66904\n", "mmu:16828\n", "\n", "###########\n", "Running for cell-adhesion-molecules-cams\n", "mmu:13003\n", "mmu:20971\n", "mmu:20971\n", "mmu:58235\n", "mmu:58235\n", "mmu:58235\n", "mmu:58235\n", "mmu:58235\n", "mmu:58235\n", "mmu:20971\n", "\n", "************\n", "mmu:20971\n", "{'mmu:20737'}\n", "\n", "************\n", "mmu:20971\n", "{'mmu:20737'}\n", "\n", "************\n", "mmu:20971\n", "{'mmu:20737'}\n", "Intersection set()\n", "\n", "###########\n", "Running for dna-replication\n", "mmu:18972\n", "\n", "###########\n", "Running for glycine-serine-and-threonine-metabolism\n", "\n", "************\n", "mmu:434437\n", "{'mmu:19193'}\n", "Intersection set()\n", "\n", "###########\n", "Running for bacterial-invasion-of-epithelial-cells\n", "mmu:216148\n", "\n", "###########\n", "Running for pentose-phosphate-pathway\n", "mmu:110639\n", "mmu:74419\n", "mmu:74419\n", "\n", "###########\n", "Running for fructose-and-mannose-metabolism\n", "\n", "###########\n", "Running for fatty-acid-metabolism\n", "\n", "###########\n", "Running for spliceosome\n", "mmu:668137\n", "mmu:100043292\n", "mmu:115490088\n", "mmu:97418\n", "\n", "###########\n", "Running for mismatch-repair\n", "mmu:18972\n", "\n", "###########\n", "Running for amyotrophic-lateral-sclerosis-als\n", "\n", "###########\n", "Running for citrate-cycle-tca-cycle\n", "mmu:74551\n", "\n", 
"###########\n", "Running for drug-metabolism--other-enzymes\n", "\n", "###########\n", "Running for cardiac-muscle-contraction\n", "mmu:319734\n", "mmu:11931\n", "mmu:232975\n", "\n", "************\n", "mmu:11931\n", "{'mmu:67530'}\n", "\n", "************\n", "mmu:232975\n", "{'mmu:12292', 'mmu:67530'}\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for peroxisome\n", "mmu:110959\n", "\n", "************\n", "mmu:51798\n", "{'mmu:19193'}\n", "Intersection set()\n", "\n", "###########\n", "Running for histidine-metabolism\n", "mmu:15109\n", "\n", "###########\n", "Running for galactose-metabolism\n", "\n", "###########\n", "Running for rna-transport\n", "mmu:668137\n", "mmu:100043292\n", "mmu:115490088\n", "mmu:97418\n", "\n", "###########\n", "Running for amino-sugar-and-nucleotide-sugar-metabolism\n", "mmu:15211\n", "\n", "###########\n", "Running for glyoxylate-and-dicarboxylate-metabolism\n", "mmu:66904\n", "\n", "************\n", "mmu:434437\n", "{'mmu:68738'}\n", "Intersection set()\n", "\n", "###########\n", "Running for cell-cycle\n", "mmu:12575\n", "mmu:12576\n", "mmu:12447\n", "\n", "************\n", "mmu:12575\n", "{'mmu:17869'}\n", "Intersection set()\n", "\n", "###########\n", "Running for arginine-and-proline-metabolism\n", "mmu:12716\n", "\n", "************\n", "mmu:12716\n", "{'mmu:11847', 'mmu:66988'}\n", "Intersection set()\n", "\n", "###########\n", "Running for tyrosine-metabolism\n", "\n", "###########\n", "Running for axon-guidance\n", "mmu:26417\n", "mmu:20564\n", "mmu:18479\n", "mmu:18208\n", "mmu:26417\n", "mmu:20349\n", "mmu:18479\n", "mmu:108058\n", "mmu:228026\n", "mmu:18479\n", "mmu:18479\n", "mmu:18479\n", "\n", "###########\n", "Running for adherens-junction\n", "mmu:26417\n", "mmu:58235\n", "mmu:58235\n", "\n", "###########\n", "Running for thyroid-cancer\n", "mmu:26417\n", "mmu:20181\n", "mmu:26417\n", "mmu:12575\n", "\n", "************\n", "mmu:12575\n", "{'mmu:17869'}\n", "Intersection set()\n", 
"\n", "###########\n", "Running for prion-diseases\n", "mmu:26417\n", "\n", "###########\n", "Running for notch-signaling-pathway\n", "mmu:13017\n", "mmu:15205\n", "mmu:19719\n", "\n", "###########\n", "Running for aminoacyltrna-biosynthesis\n", "\n", "###########\n", "Running for ppar-signaling-pathway\n", "mmu:74551\n", "mmu:20181\n", "mmu:20181\n", "mmu:20181\n", "mmu:16956\n", "mmu:109264\n", "\n", "###########\n", "Running for nitrogen-metabolism\n", "mmu:12351\n", "mmu:23831\n", "\n", "************\n", "mmu:23831\n", "{'mmu:76459'}\n", "Intersection {'mmu:76459'}\n", "\n", "###########\n", "Running for rna-polymerase\n", "\n", "###########\n", "Running for biosynthesis-of-unsaturated-fatty-acids\n", "\n", "###########\n", "Running for valine-leucine-and-isoleucine-biosynthesis\n", "mmu:12035\n", "mmu:12035\n", "mmu:12035\n", "\n", "###########\n", "Running for homologous-recombination\n", "mmu:18972\n", "\n", "###########\n", "Running for butanoate-metabolism\n", "mmu:69772\n", "mmu:268860\n", "\n", "###########\n", "Running for sulfur-relay-system\n", "\n", "###########\n", "Running for glycosaminoglycan-degradation\n", "mmu:15932\n", "mmu:15932\n", "mmu:15211\n", "mmu:15211\n", "mmu:15932\n", "mmu:15932\n", "mmu:15211\n", "mmu:15211\n", "\n", "###########\n", "Running for collecting-duct-acid-secretion\n", "\n", "###########\n", "Running for pantothenate-and-coa-biosynthesis\n", "mmu:12035\n", "\n", "###########\n", "Running for arrhythmogenic-right-ventricular-cardiomyopathy-arvc\n", "mmu:319734\n", "mmu:16399\n", "mmu:11472\n", "\n", "###########\n", "Running for leukocyte-transendothelial-migration\n", "\n", "###########\n", "Running for base-excision-repair\n", "mmu:18972\n", "mmu:382913\n", "mmu:18972\n", "\n", "###########\n", "Running for glycosphingolipid-biosynthesis--ganglio-series\n", "mmu:15211\n", "\n", "###########\n", "Running for pentose-and-glucuronate-interconversions\n", "\n", "###########\n", "Running for selenocompound-metabolism\n", 
"mmu:229905\n", "mmu:70266\n", "mmu:229905\n", "mmu:70266\n", "mmu:229905\n", "mmu:70266\n", "\n", "###########\n", "Running for proximal-tubule-bicarbonate-reclamation\n", "mmu:12351\n", "mmu:11931\n", "mmu:232975\n", "mmu:74551\n", "\n", "************\n", "mmu:11931\n", "{'mmu:54403'}\n", "Intersection set()\n", "\n", "###########\n", "Running for other-glycan-degradation\n", "mmu:15211\n", "mmu:15211\n", "mmu:15211\n", "mmu:15211\n", "\n", "###########\n", "Running for natural-killer-cell-mediated-cytotoxicity\n", "mmu:26417\n", "mmu:216148\n", "mmu:18479\n", "\n", "###########\n", "Running for phagosome\n", "\n", "************\n", "mmu:14963\n", "{'mmu:22143'}\n", "\n", "************\n", "mmu:14963\n", "{'mmu:22143'}\n", "\n", "************\n", "mmu:14963\n", "{'mmu:22143'}\n", "Intersection set()\n", "\n", "###########\n", "Running for nucleotide-excision-repair\n", "mmu:18972\n", "\n", "###########\n", "Running for one-carbon-pool-by-folate\n", "\n", "###########\n", "Running for glutathione-metabolism\n", "mmu:625249\n", "mmu:75475\n", "mmu:69065\n", "\n", "************\n", "mmu:14859\n", "{'mmu:66988'}\n", "Intersection set()\n", "\n", "###########\n", "Running for glycerolipid-metabolism\n", "mmu:67916\n", "mmu:16956\n", "\n", "###########\n", "Running for nicotinate-and-nicotinamide-metabolism\n", "\n", "###########\n", "Running for betaalanine-metabolism\n", "mmu:268860\n", "\n", "###########\n", "Running for protein-export\n", "\n", "###########\n", "Running for ribosome-biogenesis-in-eukaryotes\n", "\n", "###########\n", "Running for tight-junction\n", "mmu:21844\n", "mmu:102098\n", "mmu:19684\n", "mmu:227157\n", "mmu:16476\n", "mmu:17883\n", "\n", "************\n", "mmu:102098\n", "{'mmu:22147', 'mmu:14463'}\n", "Intersection set()\n", "\n", "###########\n", "Running for tgfbeta-signaling-pathway\n", "mmu:18119\n", "mmu:17131\n", "mmu:18741\n", "mmu:26417\n", "mmu:17131\n", "mmu:12159\n", "mmu:12159\n", "mmu:12159\n", "\n", "************\n", 
"mmu:18741\n", "{'mmu:17869'}\n", "\n", "************\n", "mmu:320202\n", "{'mmu:17869'}\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for vascular-smooth-muscle-contraction\n", "mmu:18784\n", "mmu:26417\n", "mmu:18160\n", "\n", "###########\n", "Running for vitamin-digestion-and-absorption\n", "\n", "###########\n", "Running for erbb-signaling-pathway\n", "mmu:12575\n", "mmu:12576\n", "mmu:18479\n", "mmu:100042150\n", "mmu:216148\n", "mmu:108058\n", "mmu:16476\n", "mmu:26417\n", "\n", "************\n", "mmu:12575\n", "{'mmu:17869'}\n", "Intersection set()\n", "\n", "###########\n", "Running for calcium-signaling-pathway\n", "mmu:108058\n", "mmu:18439\n", "mmu:12287\n", "mmu:12290\n", "\n", "###########\n", "Running for amoebiasis\n", "mmu:12826\n", "mmu:12826\n", "\n", "************\n", "mmu:16772\n", "{'mmu:11847'}\n", "Intersection set()\n", "\n", "###########\n", "Running for osteoclast-differentiation\n", "mmu:18035\n", "mmu:16451\n", "mmu:16476\n", "mmu:26417\n", "\n", "************\n", "mmu:16451\n", "{'mmu:19261'}\n", "\n", "************\n", "mmu:20963\n", "{'mmu:19261', 'mmu:14284', 'mmu:14282'}\n", "\n", "************\n", "mmu:26417\n", "{'mmu:14284'}\n", "Intersection set()\n", "Intersection set()\n", "Intersection set()\n", "\n", "###########\n", "Running for hedgehog-signaling-pathway\n", "mmu:14632\n", "mmu:14632\n", "mmu:14632\n", "mmu:14632\n", "mmu:14632\n", "mmu:14632\n", "\n", "###########\n", "Running for bile-secretion\n", "mmu:11931\n", "mmu:232975\n", "mmu:239273\n", "mmu:11931\n", "mmu:232975\n", "mmu:239273\n", "mmu:20181\n", "mmu:20181\n", "\n", "************\n", "mmu:11931\n", "{'mmu:54403'}\n", "\n", "************\n", "mmu:11931\n", "{'mmu:54403'}\n", "Intersection set()\n", "\n", "###########\n", "Running for gap-junction\n", "mmu:26417\n", "\n", "###########\n", "Running for jakstat-signaling-pathway\n", "mmu:16451\n", "mmu:12575\n", "mmu:16878\n", "mmu:239114\n" ] } ], "source": [ "# generate for 
all significant pathways\n", "all_pathways\n", "\n", "for path in all_pathways:\n", "    temp = path.split()\n", "    code = temp[0]\n", "    temp = [re.sub(r'\\W+', '', _).lower() for _ in temp[1:]]\n", "    name = '-'.join(temp)\n", "\n", "    print('\\n###########\\nRunning for', name)\n", "    get_pathway_per_gene_anscestor(code, name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MAPK and BETS\n", "Let us highlight every node of the MAPK pathway that also appears in the BETS graph." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "common_nodes_in_bets = []\n", "for gene in pathway.genes:\n", "    # start from white so that a later ID missing from BETS cannot undo an earlier match\n", "    for graphic in gene.graphics:\n", "        graphic.bgcolor = '#FFFFFF'\n", "    names = gene.name.split(' ')\n", "    for name in names:\n", "        if name in bets_graph.nodes():\n", "            common_nodes_in_bets.append(name)\n", "            for graphic in gene.graphics:\n", "                graphic.bgcolor = current_palette[1]\n", "\n", "canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "canvas.draw(\"mapk_bets.pdf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Descendant Relationship in BETS\n", "Check if any common nodes are descendants of each other." 
] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "************\n", "mmu:20663\n", "{'mmu:26408'}\n", "\n", "************\n", "mmu:16590\n", "{'mmu:14184', 'mmu:14171', 'mmu:18212', 'mmu:17869', 'mmu:14164', 'mmu:12292'}\n", "\n", "************\n", "mmu:21687\n", "{'mmu:14184', 'mmu:14171', 'mmu:18212', 'mmu:17869', 'mmu:14164'}\n", "\n", "************\n", "mmu:23797\n", "{'mmu:14171'}\n", "\n", "************\n", "mmu:17869\n", "{'mmu:18212'}\n" ] } ], "source": [ "genes_descendants = {}\n", "for node in common_nodes_in_bets:\n", "    descendants = set(nx.descendants(bets_graph, node))\n", "    common = descendants.intersection(set(common_nodes_in_bets))\n", "    if len(common) != 0:\n", "        print('\\n************\\n' + node)\n", "        print(common)\n", "        genes_descendants[node] = common" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List of Interesting Pathways\n", "The parent is shown in orange and its descendants in blue.\n", "\n", "1. MAPK - https://129.70.51.142:5000/mapk-signaling-pathway/mapk-signaling-pathway_mmu%3A17869.pdf\n", "1. MAPK - https://129.70.51.142:5000/mapk-signaling-pathway/mapk-signaling-pathway_mmu%3A21687.pdf\n", "1. Amoebiasis - https://129.70.51.142:5000/amoebiasis/amoebiasis_mmu%3A16772.pdf\n", "1. Huntington's disease - https://129.70.51.142:5000/huntingtons-disease/huntingtons-disease_mmu%3A21817.pdf\n", "1. Alzheimer's disease - https://129.70.51.142:5000/alzheimers-disease/alzheimers-disease_mmu%3A11820.pdf\n", "1. Alzheimer's disease - https://129.70.51.142:5000/alzheimers-disease/alzheimers-disease_mmu%3A14812.pdf\n", "1. Arginine and proline metabolism - https://129.70.51.142:5000/arginine-and-proline-metabolism/arginine-and-proline-metabolism_mmu%3A12716.pdf\n", "1. 
Bile secretion (interesting) - https://129.70.51.142:5000/bile-secretion/bile-secretion_mmu%3A11931.pdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----------------" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'mmu:20663': {'mmu:26408'},\n", " 'mmu:16590': {'mmu:12292', 'mmu:14164', 'mmu:14171', 'mmu:17869'},\n", " 'mmu:21687': {'mmu:14164', 'mmu:14171', 'mmu:17869'},\n", " 'mmu:23797': {'mmu:14171'},\n", " 'mmu:17869': {'mmu:18212'}}" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "genes_descendants" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intersection {'mmu:18212', 'mmu:14184'}\n" ] } ], "source": [ "gene_name = 'mmu:16590'\n", "\n", "for gene in pathway.genes:\n", "    names = gene.name.split(' ')\n", "    for name in names:\n", "        if name == gene_name:\n", "            temp = set(gene.name.split(' '))\n", "            break\n", "\n", "print('Intersection', genes_descendants[gene_name].intersection(temp))\n", "genes_descendants[gene_name] = genes_descendants[gene_name] - temp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Contradiction\n", "We have a problem here. The KEGG entry that contains the current gene (`mmu:16590`) lists several Entrez IDs, and some of those IDs also appear as descendants of `mmu:16590` in the BETS graph.\n", "\n", "HACK: for now, remove the descendants that are alternative IDs of the same KEGG entry." 
] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Descendants {'mmu:17869', 'mmu:14164', 'mmu:12292', 'mmu:14171'}\n" ] } ], "source": [ "gene_name = 'mmu:16590'\n", "print('Descendants', genes_descendants[gene_name])\n", "\n", "# make everything white\n", "for gene in pathway.genes:\n", "    for graphic in gene.graphics:\n", "        graphic.bgcolor = '#FFFFFF'\n", "\n", "for gene in pathway.genes:\n", "    names = gene.name.split(' ')\n", "    for name in names:\n", "        if name == gene_name:\n", "            for graphic in gene.graphics:\n", "                graphic.bgcolor = current_palette[1]\n", "        elif name in genes_descendants[gene_name]:\n", "            for graphic in gene.graphics:\n", "                graphic.bgcolor = current_palette[0]\n", "        else:\n", "            pass" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "canvas.draw(\"mapk_bets_\"+gene_name+\".pdf\")" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intersection {'mmu:18212', 'mmu:14184'}\n" ] } ], "source": [ "gene_name = 'mmu:21687'\n", "\n", "for gene in pathway.genes:\n", "    names = gene.name.split(' ')\n", "    for name in names:\n", "        if name == gene_name:\n", "            temp = set(gene.name.split(' '))\n", "            break\n", "\n", "print('Intersection', genes_descendants[gene_name].intersection(temp))\n", "genes_descendants[gene_name] = genes_descendants[gene_name] - temp" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'mmu:17869', 'mmu:14164', 'mmu:14171'}\n" ] } ], "source": [ "gene_name = 'mmu:21687'\n", "print(genes_descendants[gene_name])\n", "\n", "# make everything white\n", "for gene in pathway.genes:\n", "    for graphic in gene.graphics:\n", "        graphic.bgcolor = '#FFFFFF'\n", "\n", "for gene in pathway.genes:\n", "    
names = gene.name.split(' ')\n", "    for name in names:\n", "        if name == gene_name:\n", "            for graphic in gene.graphics:\n", "                graphic.bgcolor = current_palette[1]\n", "        elif name in genes_descendants[gene_name]:\n", "            for graphic in gene.graphics:\n", "                graphic.bgcolor = current_palette[0]\n", "        else:\n", "            pass\n", "\n", "# write to file\n", "canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "canvas.draw(\"mapk_bets_\"+gene_name+\".pdf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------------" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "kegg Kyoto Encyclopedia of Genes and Genomes\n", "kegg Release 94.0+/04-16, Apr 20\n", " Kanehisa Laboratories\n", " pathway 701,754 entries\n", " brite 246,899 entries\n", " module 463 entries\n", " orthology 23,318 entries\n", " genome 6,830 entries\n", " genes 31,407,917 entries\n", " compound 18,699 entries\n", " glycan 11,039 entries\n", " reaction 11,414 entries\n", " rclass 3,165 entries\n", " enzyme 7,736 entries\n", " network 1,125 entries\n", " variant 416 entries\n", " disease 2,417 entries\n", " drug 11,255 entries\n", " dgroup 2,276 entries\n", " environ 864 entries\n", "\n" ] } ], "source": [ "# Kyoto Encyclopedia of Genes and Genomes\n", "print(kegg_info(\"kegg\").read())" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "path:mmu00010\tGlycolysis / Gluconeogenesis - Mus musculus (mouse)\n", "path:mmu00020\tCitrate cycle (TCA cycle) - Mus musculus (mouse)\n", "path:mmu00030\tPentose phosphate pathway - Mus musculus (mouse)\n", "path:mmu00040\tPentose and glucuronate interconversions - Mus musculus (mouse)\n", "path:mmu00051\tFructose and mannose metabolism - Mus musculus (mouse)\n", "path:mmu00052\tGalactose metabolism - Mus musculus (mouse)\n", "path:mmu00053\tAscorbate and aldarate metabolism - Mus musculus 
(mouse)\n", "path:mmu00061\tFatty acid biosynthesis - Mus musculus (mouse)\n", "path:mmu00062\tFatty acid elongation - Mus musculus (mouse)\n", "path:mmu00071\tFatty acid degradation - Mus musculus (mouse)\n", "[...]\n" ] } ], "source": [ "head(kegg_list('pathway', 'mmu').read())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ENTRY mmu00010 Pathway\n", "NAME Glycolysis / Gluconeogenesis - Mus musculus (mouse)\n", "DESCRIPTION Glycolysis is the process of converting glucose into pyruvate and generating small amounts of ATP (energy) and NADH (reducing power). It is a central pathway that produces important precursor metabolites: six-carbon compounds of glucose-6P and fructose-6P and three-carbon compounds of glycerone-P, glyceraldehyde-3P, glycerate-3P, phosphoenolpyruvate, and pyruvate [MD:M00001]. Acetyl-CoA, another important precursor metabolite, is produced by oxidative decarboxylation of pyruvate [MD:M00307]. When the enzyme genes of this pathway are examined in completely sequenced genomes, the reaction steps of three-carbon compounds from glycerone-P to pyruvate form a conserved core module [MD:M00002], which is found in almost all organisms and which sometimes contains operon structures in bacterial genomes. Gluconeogenesis is a synthesis pathway of glucose from noncarbohydrate precursors. 
It is essentially a reversal of glycolysis with minor variations of alternative paths [MD:M00003].\n", "CLASS Metabolism; Carbohydrate metabolism\n", "PATHWAY_MAP mmu00010 Glycolysis / Gluconeogenesis\n", "MODULE mmu_M00001 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate [PATH:mmu00010]\n", " mmu_M00002 Glycolysis, core module involving three-carbon compounds [PATH:mmu00010]\n", " mmu_M00003 Gluconeogenesis, oxaloacetate => fructose-6P [PATH:mmu00010]\n", " mmu_M00307 Pyruvate oxidation, pyruvate => acetyl-CoA [PATH:mmu00010]\n", "DBLINKS GO: 0006096 0006094\n", "[...]\n" ] } ], "source": [ "head(kegg_get(\"path:mmu00010\").read())" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "canvas = KGMLCanvas(pathway)\n", "canvas.draw(\"fab_map.pdf\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "canvas.import_imagemap = True\n", "canvas.draw(\"fab_map_with_image.pdf\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "def draw_kegg_map(map_id):\n", " \"\"\" Render a local PDF of a KEGG map with the passed map ID\n", " \"\"\"\n", " # Get the background image first\n", " pathway = KGML_parser.read(kegg_get(map_id, \"kgml\"))\n", " canvas = KGMLCanvas(pathway, import_imagemap=True)\n", " img_filename = \"%s.pdf\" % map_id\n", " canvas.draw(img_filename)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rasgrf1, AI844718, CDC25, CDC25Mm, Gnrp, Grf1, Grfbeta, P190-A, Ras-GRF1, p190, p190RhoGEF...\n" ] } ], "source": [ "print(pathway.genes[0].graphics[0].name)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rasgrf1, AI844718, CDC25, CDC25Mm, Gnrp, Grf1, Grfbeta, P190-A, Ras-GRF1, p190, p190RhoGEF...\n", "392.0\n", "236.0\n", "None\n", "rectangle\n", 
"46.0\n", "17.0\n", "#000000\n", "#BFFFBF\n", "[(369.0, 227.5), (415.0, 244.5)]\n", "(392.0, 236.0)\n" ] } ], "source": [ "element = pathway.genes[0].graphics[0]\n", "attrs = [element.name, element.x, element.y, element.coords, element.type, \n", " element.width, element.height, element.fgcolor, element.bgcolor, \n", " element.bounds, element.centre]\n", "print('\\n'.join([str(attr) for attr in attrs]))" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "# Helper function to convert a colour given as an RGB tuple of floats in [0, 1] to a hex string\n", "def rgb_to_hex(rgb):\n", " rgb = tuple([int(255*val) for val in rgb])\n", " # Zero-pad each channel to two digits so that values below 16 still give a valid '#RRGGBB' string\n", " return '#' + ''.join(['%02X' % val for val in rgb])" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# Define arbitrary colours\n", "colorspiral = ColorSpiral()\n", "colorlist = colorspiral.get_colors(len(pathway.genes))\n", "\n", "# Change the colours of gene elements\n", "for color, element in zip(colorlist, pathway.genes):\n", " for graphic in element.graphics:\n", " graphic.bgcolor = rgb_to_hex(color)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \"\n", " frameborder=\"0\"\n", " allowfullscreen\n", " >\n", " " ], "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "canvas = KGMLCanvas(pathway, import_imagemap=True)\n", "canvas.draw(\"fab_map_new_colours.pdf\")\n", "PDF(\"./fab_map_new_colours.pdf\")" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'#a1c9f4'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "current_palette[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { 
"image/png": "\n", "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Image(kegg_get(\"mmu04010\", \"image\").read())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }