{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotly interactive Human Disease Network ##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The human disease network (HDN) is represented by a graph whose nodes (vertices) are diseases (disorders).\n", "Two disorders are connected if they share a genetic component.\n", "\n", "The data source for this network is a `gexf` [file](https://github.com/gephi/gephi.github.io/blob/master/datasets/diseasome.gexf.zip), \n", "that contains information provided in the original [study of the HDN](http://www.pnas.org/content/104/21/8685). \n", "\n", "I parsed and converted the `gexf` file to a `gml` format,\n", "to be read with the Python graph library, `igraph`. To reduce edge clutter \n", "I slightly modified 55 node positions (see the new positions in this [md file](https://github.com/empet/Human-Disease-Network/blob/master/new-positions-HDN.md))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The node attributes recorded in the `gml` file are as follows:\n", "\n", "```\n", "node\n", " [\n", " id 0\n", " name \"30\"\n", " color \"rgb(238, 68, 68)\"\n", " disclass \"Neurological\"\n", " label \"Alzheimer disease\"\n", " position \"(-116.486664, -95.29654)\"\n", " type \"disease\"\n", " size 17.46866\n", " ]\n", "```\n", "\n", "\n", "The HDN displays 516 disorders clustered in 22 classes. The nodes in the same class have a common color, and the size of each node is proportional\n", "to the number of genes that are known to be associated to the corresponding disorder.\n", "\n", "The HDN has links/connections between both disorders in the same class and distinct classes. \n", "\n", "\n", "*The HDN is a connected graph suggesting that the genetic origins of most diseases, to some extent, are shared with other diseases.* [[http://www.pnas.org/content/104/21/8685](http://www.pnas.org/content/104/21/8685)]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following we define the HDN as an `igraph.Graph` instance, and extract from its attributes data to define the Plotly graphical object representing this network.\n", "\n", "Usually to an `igraph.Graph` one associates, via a layout algorithm, a spatial position for each node. \n", "In this case however the node positions are prescribed, i.e. they are given in the `gml` file.\n", "On the other hand we define the Plotly plot layout, as a dict." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python version: 3.7.1\n" ] } ], "source": [ "import platform\n", "print(f'Python version: {platform.python_version()}')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'4.0.0'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import plotly\n", "plotly.__version__" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import igraph as ig\n", "import numpy as np\n", "import ast" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "from chart_studio.plotly import iplot\n", "import plotly.graph_objs as go\n", "import plotly.io as pio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read the `gml` file as an `igraph.Graph`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "K = ig.Graph.Read_GML('human-disease.gml')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the following function to check whether a network is connected or not:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def get_graph_components(G, ret=False):\n", " comp = G.components()\n", " if ret:\n", " return comp, len(comp)\n", " else:\n", " return len(comp)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "d_classes = list(set([v['disclass'] for v in K.vs]))\n", "disorder_classes = sorted(d_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basic information on the graph K:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of vertices: 516 \n", "Number of edges: 2376 \n", "Disorder classes: 22 \n", "Number of components: 1 \n", "\n", "Disorder classes: ['Bone', 'Cancer', 'Cardiovascular', 'Connective tissue disorder', 'Dermatological', 'Developmental', 'Ear,Nose,Throat', 'Endocrine', 'Gastrointestinal', 'Hematological', 'Immunological', 'Metabolic', 'Multiple', 'Muscular', 'Neurological', 'Nutritional', 'Ophthamological', 'Psychiatric', 'Renal', 'Respiratory', 'Skeletal', 'Unclassified']\n" ] } ], "source": [ "print ('Number of vertices: ', len(K.vs), '\\nNumber of edges: ', len(K.es),\n", " '\\nDisorder classes: ', len(disorder_classes),\n", " '\\nNumber of components: ', get_graph_components(K),\n", " '\\n\\nDisorder classes: ', disorder_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us compute the eigenvalue centrality of the disease network nodes:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "v = K.evcent(directed=False)\n", "centralities = np.array(v)\n", "I = np.argsort(centralities)[::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the top twenty diseases, according to their eigenvalue centralities:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Colon cancer', 0.9999999999999999),\n", " ('Breast cancer', 0.7706634205589713),\n", " ('Thyroid carcinoma', 0.6472351048664623),\n", " ('Pancreatic cancer', 0.6139192264158725),\n", " ('Hepatic adenoma', 0.5101682483951583),\n", " ('Li-Fraumeni syndrome', 0.4780901721306633),\n", " ('Osteosarcoma', 0.4780901721306631),\n", " ('Prostate cancer', 0.4604774501831161),\n", " ('Gastric cancer', 0.4524973931721693),\n", " ('Multiple malignancy syndrome', 0.445428744021657),\n", " ('Histiocytoma', 0.4454287440216569),\n", " ('Adrenal cortical carcinoma', 0.4454287440216568),\n", " ('Nasopharyngeal carcinoma', 0.4454287440216568),\n", " ('Ovarian cancer', 0.41094522623707935),\n", " ('Endometrial carcinoma', 0.3968933342610114),\n", " ('Bladder cancer', 0.3805654441304783),\n", " ('Lymphoma', 0.34382444357241887),\n", " ('Leukemia', 0.3270438988001863),\n", " ('Glioblastoma', 0.2697605264679987),\n", " ('Crouzon syndrome', 0.26320974885145487)]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_20 = [K.vs[j]['label'] for j in I[:20]]\n", "list(zip(top_20, np.sort(centralities)[::-1][:20]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Annotation text for data source:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "annotation_text = \"HDN: [1] \"+\\\n", "\" Data source: [2]\" " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function extracts from the graph K, the data needed to plot it via Plotly:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "def get_Plotly_netw(G, graph_layout, title='Plotly Interactive Human Disease Network',\n", " source=annotation_text, flip='lr', width=950, height=850):\n", " #G is an igraph.Graph instance\n", " #graph_layout is an igraph.Layout instance\n", " #title - the network title\n", " #flip is one of the strings 'lr', 'ud' to perform a pseudo-flip effect\n", " #the igraph.Layout is referenced to the screen system of coords, and is upside-down flipped chonging the sign of y-coords\n", " #the global HDN looks better with the left-right flipped layout, by changing the x-coords sign.\n", " #width and height are the sizes of the plot area\n", " \n", " graph_layout=np.array(graph_layout)\n", " \n", " if flip =='lr':\n", " graph_layout[:, 0] = -graph_layout[:,0]\n", " elif flip == 'ud':\n", " graph_layout[:, 1] = -graph_layout[:,1] \n", " else: \n", " raise ValueError('There is no such a flip type')\n", " \n", " m = len(G.vs)\n", " graph_edges = [e.tuple for e in G.es]# represent edges as tuples of end vertex indices\n", " \n", " Xn = [graph_layout[k][0] for k in range(m)]#x-coordinates of graph nodes(vertices)\n", " Yn = [graph_layout[k][1] for k in range(m)]#y-coordinates -\"-\n", " \n", " Xe = []#list of edge ends x-coordinates\n", " Ye = []#list of edge ends y-coordinates\n", " for e in graph_edges:\n", " Xe.extend([graph_layout[e[0]][0],graph_layout[e[1]][0], None])\n", " Ye.extend([graph_layout[e[0]][1],graph_layout[e[1]][1], None]) \n", "\n", " size = [vertex['size'] for vertex in G.vs]\n", " #define the Plotly graphical objects\n", " \n", " plotly_edges = go.Scatter(x=Xe,\n", " y=Ye,\n", " mode='lines',\n", " line=dict(color='rgb(180,180,180)', width=0.75),\n", " hoverinfo='none'\n", " )\n", " plotly_vertices = go.Scatter(x=Xn,\n", " y=Yn,\n", " mode='markers',\n", " name='',\n", " marker=dict(symbol='circle-dot',\n", " size=size, \n", " color=[vertex['color'] for vertex in G.vs], \n", " line=dict(color='rgb(50,50,50)', width=0.5)\n", " ),\n", " text=[f\"{vertex['label']}
({vertex['disclass']})\" for vertex in G.vs],\n", " hoverinfo='text'\n", " )\n", " \n", " #Define the Plotly plot layout:\n", " \n", " plotly_plot_layout = dict(title=title, \n", " width=width,\n", " height=height,\n", " showlegend=False,\n", " xaxis=dict(visible=False),\n", " yaxis=dict(visible=False), \n", " \n", " margin=dict(t=100),\n", " hovermode='closest',\n", " template='none',\n", " paper_bgcolor='rgb(235,235,235)'\n", " )\n", " \n", " if source is not None:\n", " annotations = [dict(showarrow=False, \n", " text=source, \n", " xref='paper', \n", " yref='paper', \n", " x=0, \n", " y=-0.1, \n", " xanchor='left', \n", " yanchor='bottom', \n", " font=dict(size=14))]\n", " else:\n", " annotations = []\n", " \n", " disorder_name = [vertex['label'] for vertex in G.vs]\n", " for k, s in enumerate(size):\n", " if s>= 20:\n", " annotations.append(dict(text=disorder_name[k], \n", " x=graph_layout[k][0], y=graph_layout[k][1],\n", " xref='x1', yref='y1',\n", " font=dict(color='rgb(50,50,50)', size=10),\n", " showarrow=False\n", " )\n", " )\n", " \n", " plotly_plot_layout.update(annotations=annotations)\n", " return go.FigureWidget(data=[plotly_edges, plotly_vertices], layout=plotly_plot_layout) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the position of each node is recorded as a string, we convert it to a tuple of floats:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "pos = [ast.literal_eval(v['position']) for v in K.vs]\n", "HDN_layout = np.array(pos)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "fig = get_Plotly_netw(K, HDN_layout)\n", "#fig" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iplot(fig, filename='Diseasome-Network')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This network is posted on [Wikipedia](https://en.wikipedia.org/wiki/Human_disease_network). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the study of this network we could be interested in the subgraph of a disorder class or a disorder class and its\n", "neighbors (adjacent nodes). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we define and plot the subgraph of a disorder class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function `extract_subgraph` returns the subgraph generated by the subset of vertices in a disorder class:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "def extract_subgraph(G, disclass='Cancer'):\n", " sbgraph_vertices = [v for v in G.vs if v['disclass']==disclass]\n", " return G.subgraph(sbgraph_vertices)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "def subgraph_fig(G, disorder_class, disorder_classes=disorder_classes, \n", " G_layout=HDN_layout, width=800, height=700):\n", " #generate a plotly figure of the `disclass` subgraph\n", " if disorder_class not in disorder_classes:\n", " raise ValueError(\"There is no such a disorder class\")\n", " S = extract_subgraph(G, disclass=disorder_class)\n", " sbgraph_verts_idx = [int(v['id']) for v in S.vs]#extract the S-vertex indexes in the global graph G\n", " sbgraph_layout = np.array(G_layout)[sbgraph_verts_idx]# extract the layout of the subgraph\n", " figure = get_Plotly_netw(S, sbgraph_layout, title=disorder_class+' subnetwork', \n", " source=None, width=width, height=height)\n", " return S, figure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The largest disorder class seems to be the class ``Cancer``" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "C, figc = subgraph_fig(K, 'Cancer')\n", "#figc" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iplot(figc, filename='Cancer-subnetwork')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this default layout (set in the gml data file), the Cancer subnetwork has cluttered edges, and we cannot identify visually\n", "all pairs of connected disorders. After experimenting with different (i)graph layouts we concluded that the best choice to reduce clutter is the Reingold-Tilford layout:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "new_layout = C.layout('rt')\n", "figc1 = get_Plotly_netw(C, new_layout, title='Cancer subnetwork with Reingold-Tilford layout', \n", " source=None, flip='ud', width=900, height=700)\n", "#figc1" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iplot(figc1, filename='Cancer-subnetwork-rtlayt')" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "N, fign=subgraph_fig(K, 'Neurological')\n", "#fign" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iplot(fign, filename='Neurological-subnetwork')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us define a subgraph generated by the vertices in a disorder class and its neighbors, in order to get new insights into\n", "that class, i.e.\n", "to identify disorders that share a genetic component with those in the current class:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "def adjacent_subgraph(G, disorder_class='Neurological', disorder_classes=disorder_classes):\n", " if disorder_class not in disorder_classes:\n", " raise ValueError(\"There is no such a disorder class\")\n", " Ed = [e.tuple for e in G.es]\n", " adjacent_edges = [e for e in Ed if G.vs[e[0]]['disclass']==disorder_class \n", " or K.vs[e[1]]['disclass']==disorder_class]\n", " vertsa = []\n", " for e in adjacent_edges:\n", " vertsa.extend([e[0], e[1]])\n", " vertsa_idx = list(set(vertsa)) \n", " verts = [G.vs[i] for i in vertsa_idx]\n", " S = K.subgraph(verts)\n", " idxes = [int(v['id']) for v in S.vs]\n", " layt = np.array(HDN_layout)[idxes]\n", " return get_Plotly_netw(S, layt, title=f'Subnetwork of the {disorder_class} class, and its neighbors', \n", " source=None, flip='lr', width=800, height=700)\n", " " ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "figna = adjacent_subgraph(K, disorder_class='Neurological')\n", "#figna" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iplot(figna, filename='Neurological-adjacentN')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"./custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 1 }