{ "cells": [ { "cell_type": "markdown", "id": "geographic-department", "metadata": {}, "source": [ "# Diabetes-related genes expressed in the pancreas\n", "\n", "This notebook shows how to integrate genomic and image data resources.\n", "It addresses the question **Which diabetes-related genes are expressed in the pancreas?**\n", "Both the tissue and the disease can be changed.\n", "\n", "\n", "Steps:\n", "\n", "* Query [humanmine.org](https://www.humanmine.org/humanmine), an integrated database of *Homo sapiens* genomic data, using the InterMine API to find the genes.\n", "* Using the list of genes found, search the Image Data Resource (IDR) for images linked to the genes, tissue and disease.\n", "\n", " \n", "We use the InterMine API and the IDR API. This notebook is inspired by [Workshop_Pax6Workflow](https://github.com/intermine/intermine-ws-python-docs/blob/master/Workshop_Pax6Workflow.ipynb).\n", "\n", "## Summary:\n", "\n", "\n", "## Settings:\n", "\n", "\n", "### Auxiliary libraries used\n", "* [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels): Enables a Jupyter Notebook or JupyterLab application in one conda environment to access kernels for Python, R, and other languages found in other environments.\n", "* [jupyter_contrib_nbextensions](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/index.html): Package containing a collection of community-contributed unofficial extensions that add functionality to the Jupyter notebook.\n", "\n", "## Launch\n", "\n", "### binder\n", "\n", "If not already running, you can launch the notebook on [Binder](https://mybinder.org/v2/gh/IDR/idr-notebooks/master?urlpath=notebooks%2Fhumanmine.ipynb).\n", "\n", "### run locally using Docker and repo2docker\n", "\n", "With ``jupyter-repo2docker`` installed, run:\n", "\n", "```\n", "git clone https://github.com/IDR/idr-notebooks.git\n", "cd idr-notebooks\n", "repo2docker .\n", "```" ] }, { "cell_type": "markdown", "id": "conscious-metabolism", 
"metadata": {}, "source": [ "### Install dependencies if required\n", "The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true)." ] }, { "cell_type": "code", "execution_count": 1, "id": "hazardous-complement", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: intermine in /Users/jmarie/opt/anaconda3/envs/stardist-1/lib/python3.9/site-packages (1.13.0)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install intermine" ] }, { "cell_type": "markdown", "id": "external-swaziland", "metadata": {}, "source": [ "### Import libraries " ] }, { "cell_type": "code", "execution_count": 2, "id": "collect-sensitivity", "metadata": {}, "outputs": [], "source": [ "# libraries to interact with intermine\n", "from intermine.webservice import Service\n", "\n", "# libraries to interact with IDR\n", "import requests\n", "import json" ] }, { "cell_type": "markdown", "id": "accomplished-transsexual", "metadata": {}, "source": [ "## Search for genes in HumanMine\n", "\n", "We first define the output columns, then add the constraints i.e. specify the tissue and the disease." 
] }, { "cell_type": "code", "execution_count": 3, "id": "interracial-petite", "metadata": {}, "outputs": [], "source": [ "service = Service(\"https://www.humanmine.org/humanmine/service\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "empty-yeast", "metadata": {}, "outputs": [], "source": [ "query = service.new_query(\"Gene\")" ] }, { "cell_type": "code", "execution_count": 5, "id": "miniature-sailing", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<intermine.query.Query at 0x111b1b220>" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_view(\n", "    \"primaryIdentifier\", \"symbol\", \"proteinAtlasExpression.cellType\",\n", "    \"proteinAtlasExpression.level\", \"proteinAtlasExpression.reliability\",\n", "    \"proteinAtlasExpression.tissue.name\"\n", ")" ] }, { "cell_type": "markdown", "id": "ordered-director", "metadata": {}, "source": [ "We look for genes that are expressed in the specified tissue and are also associated with the specified disease." 
] }, { "cell_type": "code", "execution_count": 6, "id": "684b4167", "metadata": {}, "outputs": [], "source": [ "TISSUE = \"Pancreas\"\n", "DISEASE = \"diabetes\"" ] }, { "cell_type": "code", "execution_count": 7, "id": "prospective-stable", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<BinaryConstraint: Gene.diseases.name CONTAINS diabetes>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_constraint(\"proteinAtlasExpression.tissue.name\", \"=\", TISSUE)\n", "query.add_constraint(\"proteinAtlasExpression.level\", \"ONE OF\", [\"Medium\", \"High\"])\n", "query.add_constraint(\"organism.name\", \"=\", \"Homo sapiens\")\n", "query.add_constraint(\"diseases.name\", \"CONTAINS\", DISEASE)" ] }, { "cell_type": "markdown", "id": "proved-bronze", "metadata": {}, "source": [ "Collect the genes" ] }, { "cell_type": "code", "execution_count": 8, "id": "adopted-treaty", "metadata": {}, "outputs": [], "source": [ "upin_tissue = set()\n", "for row in query.rows():\n", "    upin_tissue.add(row[\"symbol\"])\n", "genes = sorted(upin_tissue, reverse=True)" ] }, { "cell_type": "markdown", "id": "selective-pathology", "metadata": {}, "source": [ "Print out the list of genes" ] }, { "cell_type": "code", "execution_count": 9, "id": "secret-labor", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "YIPF5 WFS1 VEGFA TCF7L2 TBC1D4 SOD2 SLC30A8 PTPN22 \n", "PDX1 MIA3 KCNJ11 IRS2 IRS1 INSR INS IGF2BP2 \n", "IER3IP1 HNF4A HNF1B HMGA1 HFE GPD2 GCK ENPP1 \n", "EIF2AK3 DNAJC3 CEL CAPN10 APPL1 AKT2 ABCC8 " ] } ], "source": [ "for i, a in enumerate(genes):\n", "    print(a, end=' ')\n", "    if i % 8 == 7:\n", "        print(\"\")" ] }, { "cell_type": "markdown", "id": "pending-fusion", "metadata": {}, "source": [ "## Search for images in IDR associated with the genes found in HumanMine\n", "\n", "From the list of genes found using the InterMine API, we are now looking in [Image Data 
Resource](https://idr.openmicroscopy.org/) for studies linked to those genes and with **TISSUE**." ] }, { "cell_type": "markdown", "id": "round-dancing", "metadata": {}, "source": [ "### Set up the IDR URL and the HTTP session" ] }, { "cell_type": "code", "execution_count": 10, "id": "supported-jaguar", "metadata": { "code_folding": [] }, "outputs": [], "source": [ "INDEX_PAGE = \"https://idr.openmicroscopy.org/webclient/?experimenter=-1\"\n", "\n", "# Create an HTTP session. It is kept open because the search cell below reuses it.\n", "session = requests.Session()\n", "request = requests.Request('GET', INDEX_PAGE)\n", "prepped = session.prepare_request(request)\n", "response = session.send(prepped)\n", "# Raise an error if the index page could not be loaded\n", "response.raise_for_status()" ] }, { "cell_type": "markdown", "id": "elect-batch", "metadata": {}, "source": [ "### Search studies\n", "Search the studies related to the list of genes found in the HumanMine resource." ] }, { "cell_type": "code", "execution_count": 11, "id": "e91eab83", "metadata": {}, "outputs": [], "source": [ "SEARCH_URL = \"https://idr.openmicroscopy.org/searchengine/api/v1/resources/{type}/search/\"\n", "KEY_VALUE_SEARCH = SEARCH_URL + \"?key={key}&value={value}\"\n", "KEY = \"Gene Symbol\"" ] }, { "cell_type": "code", "execution_count": 12, "id": "perfect-throw", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.02 s, sys: 335 ms, total: 1.36 s\n", "Wall time: 10.3 s\n" ] } ], "source": [ "%%time\n", "import collections\n", "from collections import defaultdict\n", "\n", "# Map each gene symbol to the images annotated with it in IDR\n", "results = {}\n", "for gene in genes:\n", "    qs1 = {'type': 'image', 'key': KEY, 'value': gene}\n", "    url = KEY_VALUE_SEARCH.format(**qs1)\n", "    # Do not name this variable json: it would shadow the json module imported above\n", "    data = session.get(url).json()\n", "    images = data['results']['results']\n", "    results[gene] = images" ] }, { "cell_type": "markdown", "id": "returning-retreat", "metadata": {}, "source": [ "### First, filter the images by developmental stage" ] }, { "cell_type": "code", "execution_count": 13, "id": 
"aec7ada7", "metadata": {}, "outputs": [], "source": [ "# Annotation key in IDR to find and filter by.\n", "EXPRESSION_KEY = \"Expression Pattern Description\"\n", "EXPRESSION = \"Islets\"\n", "STAGE = \"Developmental Stage\"" ] }, { "cell_type": "code", "execution_count": 14, "id": "ad0e81b0", "metadata": {}, "outputs": [], "source": [ "development_stage = {}\n", "for k in results:\n", " images = results[k]\n", " result_images = defaultdict(list)\n", " for image in images:\n", " values = image[\"key_values\"]\n", " stage = \"\"\n", " for v in values:\n", " name = v[\"name\"]\n", " value = v['value']\n", " if name == STAGE:\n", " stage = value\n", " if name == EXPRESSION_KEY and EXPRESSION in value:\n", " result_images[stage].append(image[\"id\"])\n", " development_stage[k] = result_images.items()" ] }, { "cell_type": "code", "execution_count": 15, "id": "adapted-ethics", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'YIPF5': dict_items([]), 'WFS1': dict_items([]), 'VEGFA': dict_items([]), 'TCF7L2': dict_items([]), 'TBC1D4': dict_items([]), 'SOD2': dict_items([]), 'SLC30A8': dict_items([]), 'PTPN22': dict_items([]), 'PDX1': dict_items([('15PCW', [9841210, 9841211]), ('9PCW', [9841212, 9841213, 9841218, 9841219]), ('CS21', [9841214, 9841215]), ('CS16', [9841216, 9841217])]), 'MIA3': dict_items([]), 'KCNJ11': dict_items([]), 'IRS2': dict_items([]), 'IRS1': dict_items([]), 'INSR': dict_items([]), 'INS': dict_items([('15PCW', [9839153, 9839154])]), 'IGF2BP2': dict_items([]), 'IER3IP1': dict_items([]), 'HNF4A': dict_items([]), 'HNF1B': dict_items([]), 'HMGA1': dict_items([]), 'HFE': dict_items([]), 'GPD2': dict_items([]), 'GCK': dict_items([]), 'ENPP1': dict_items([]), 'EIF2AK3': dict_items([]), 'DNAJC3': dict_items([]), 'CEL': dict_items([]), 'CAPN10': dict_items([]), 'APPL1': dict_items([]), 'AKT2': dict_items([]), 'ABCC8': dict_items([])}\n" ] } ], "source": [ "print(development_stage)" ] }, { "cell_type": "markdown", "id": 
"fallen-girlfriend", "metadata": {}, "source": [ "## Display the images\n", "Display the images associated to the genes." ] }, { "cell_type": "code", "execution_count": 16, "id": "sunset-playlist", "metadata": {}, "outputs": [], "source": [ "# URLs to retrieve the thumbnails and link to the images in IDR\n", "BASE_URL = \"https://idr.openmicroscopy.org/webclient\"\n", "IMAGE_DATA_URL = BASE_URL + \"/render_thumbnail/{id}\"\n", "LINK_URL = BASE_URL + \"/?show=image-{id}\"" ] }, { "cell_type": "markdown", "id": "subjective-radius", "metadata": {}, "source": [ "## Display the images with development stage\n", "Click on the thumbnail to open the image in IDR." ] }, { "cell_type": "code", "execution_count": 17, "id": "detailed-oakland", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c5c1c2a4649f4f5781841abe8671fea4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "AppLayout(children=(HTML(value='<table><tr><td><h2>Gene: PDX1</h2></td></tr><tr><tr><td><h4>Developmental stag…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display the images\n", "from ipywidgets import AppLayout, widgets\n", "\n", "table_widget = widgets.HTML(\"\")\n", "\n", "html = \"<table>\"\n", "for gene in development_stage:\n", " images = development_stage[gene]\n", " if len(images) > 0:\n", " html += '<tr><td><h2>Gene: '+gene+'</h2></td></tr><tr>'\n", " for k, v in images:\n", " html += '<tr><td><h4>Developmental stage: '+k+'</h4></td></tr><tr>'\n", " for i in v:\n", " qs = {'id': i}\n", " url = IMAGE_DATA_URL.format(**qs)\n", " url_link = LINK_URL.format(**qs)\n", " html += '<td><a href=\"'+url_link+'\" target=\"_blank\"><img src=\"'+url+'\"/></a></td>'\n", " html += \"</tr>\"\n", " html += \"</tr>\"\n", "html += \"</table>\"\n", "\n", "table_widget.value = html\n", "AppLayout(header=None,\n", " left_sidebar=None,\n", " center=table_widget,\n", " right_sidebar=None,\n", " footer=None)" ] }, { "cell_type": 
"markdown", "id": "04d0b34e", "metadata": {}, "source": [ "## Find the image with the following key(s) and value(s)" ] }, { "cell_type": "code", "execution_count": 18, "id": "1385e2a7", "metadata": {}, "outputs": [], "source": [ "PART_KEY = \"Organism Part\"\n", "PATHOLOGY_KEY = \"Pathology\"\n", "PATHOLOGY_NORMAL_VALUE = \"Normal\"" ] }, { "cell_type": "code", "execution_count": 19, "id": "55f1cf9b", "metadata": {}, "outputs": [], "source": [ "pathology_images = {}\n", "for k in results:\n", " images = results[k]\n", " result_images = defaultdict(list)\n", " for image in images:\n", " values = image[\"key_values\"]\n", " part = None\n", " for v in values:\n", " name = v[\"name\"]\n", " value = v['value']\n", " if PART_KEY in name and (TISSUE or EXPRESSION in value):\n", " part = value\n", " for v in values:\n", " name = v[\"name\"]\n", " value = v['value']\n", " if part is not None and name == PATHOLOGY_KEY:\n", " if PATHOLOGY_NORMAL_VALUE in value:\n", " result_images[PATHOLOGY_NORMAL_VALUE].append(image[\"id\"])\n", " else:\n", " result_images[value].append(image[\"id\"])\n", " pathology_images[k] = result_images.items()" ] }, { "cell_type": "markdown", "id": "7d107a03", "metadata": {}, "source": [ "**Plot the disease vs number of images found**" ] }, { "cell_type": "markdown", "id": "367f36ee", "metadata": {}, "source": [ "## Filter images with a given Organism Part value\n", "\n", "We explore the images associated to the gene **PDX1** with abnormal pathology status." 
] }, { "cell_type": "code", "execution_count": 20, "id": "7fe56469", "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<Figure size 720x720 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "disease_map = {}\n", "gene = \"PDX1\"\n", "images = pathology_images[gene]\n", "if len(images) == 0:\n", "    print(\"No images found\")\n", "else:\n", "    # Count the images per pathology, skipping the normal tissue\n", "    for k, v in images:\n", "        if k != PATHOLOGY_NORMAL_VALUE:\n", "            disease_map[k] = len(v)\n", "\n", "    disease_ordered = collections.OrderedDict(sorted(disease_map.items()))\n", "    # Use the disease names (keys) as labels, not the (name, count) pairs\n", "    df = pd.DataFrame({'disease': list(disease_ordered.keys()),\n", "                       'number of images': list(disease_ordered.values())})\n", "    df.plot(kind='barh', x='disease', y='number of images', figsize=(10,10))\n" ] }, { "cell_type": "markdown", "id": "ae84af6c", "metadata": {}, "source": [ "## Select the disease and display the associated images\n", "Click on the thumbnail to open the image in IDR." 
] }, { "cell_type": "code", "execution_count": 21, "id": "b16ac025", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "c532ecf4735743f8ae436c7e8ee752c7", "version_major": 2, "version_minor": 0 }, "text/plain": [ "GridspecLayout(children=(HTML(value='Gene: <b>PDX1</b>', layout=Layout(grid_area='widget001')), Dropdown(descr…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from ipywidgets import GridspecLayout, widgets\n", "\n", "increase = 8\n", "max_value = increase\n", "min_value = 0\n", "\n", "disease = \"\"\n", "\n", "def display_images(images, min, max):\n", "    html = \"<table>\"\n", "    html += '<tr>'\n", "    if min < 0:\n", "        min = 0\n", "    if max >= len(images):\n", "        max = len(images)\n", "\n", "    for i in images[min:max]:\n", "        qs = {'id': i}\n", "        url = IMAGE_DATA_URL.format(**qs)\n", "        url_link = LINK_URL.format(**qs)\n", "        html += '<td><a href=\"'+url_link+'\" target=\"_blank\"><img src=\"'+url+'\"/></a> </td>'\n", "    html += \"</tr>\"\n", "    html += \"</table>\"\n", "    html_widget.value = html\n", "\n", "    # Set the number of images found\n", "    count_widget.value = \"<b>Number of images found: \" + str(len(images)) + \"</b>\"\n", "\n", "def on_selection_change(change):\n", "    # The paging state is module-level, so it must be declared global to be reset here\n", "    global disease, min_value, max_value\n", "    if change['name'] == 'value':\n", "        values = get_images(change['new'])\n", "        if values is None:\n", "            return\n", "        disease = change['new']\n", "        min_value = 0\n", "        max_value = increase\n", "        # Back on the first page: reset the navigation buttons\n", "        button_previous.disabled = True\n", "        button_next.disabled = max_value >= len(values)\n", "        display_images(values, min_value, max_value)\n", "\n", "def get_images(disease):\n", "    for k, v in images:\n", "        if k == disease:\n", "            return v\n", "    return None\n", "\n", "def on_click_next(b):\n", "    global min_value\n", "    global max_value\n", "    max_value = max_value + increase\n", "    min_value = min_value + increase\n", "    values = get_images(disease)\n", "    button_previous.disabled = False\n", "    if values is None:\n", "        return\n", "    # Disable Next once the last image is on the current page\n", "    if max_value >= len(values):\n", "        button_next.disabled = True\n", " 
\n", "    display_images(values, min_value, max_value)\n", "\n", "def on_click_previous(b):\n", "    global min_value\n", "    global max_value\n", "    max_value = max_value - increase\n", "    min_value = min_value - increase\n", "    button_next.disabled = False\n", "    if min_value <= 0:  # reset\n", "        min_value = 0\n", "        max_value = increase\n", "        button_previous.disabled = True\n", "    values = get_images(disease)\n", "    if values is not None:\n", "        display_images(values, min_value, max_value)\n", "\n", "def dropdown_widget(disease_list,\n", "                    dropdown_widget_name,\n", "                    displaywidget=False):\n", "\n", "    selection = widgets.Dropdown(\n", "        options=disease_list,\n", "        value=disease_list[0],\n", "        description=dropdown_widget_name,\n", "        disabled=False,\n", "    )\n", "    selection.observe(on_selection_change)\n", "    display_images(get_images(selection.value), min_value, max_value)\n", "    return selection\n", "\n", "disease_list = list(disease_ordered.keys())\n", "disease = disease_list[0]\n", "gene_widget = widgets.HTML(\"\")\n", "count_widget = widgets.HTML(\"\")\n", "html_widget = widgets.HTML(\"\")\n", "disease_box = dropdown_widget(\n", "    disease_list,\n", "    'Disease: ', True\n", ")\n", "\n", "button_next = widgets.Button(description=\"Next>>\")\n", "button_next.on_click(on_click_next)\n", "\n", "button_previous = widgets.Button(description=\"<<Previous\", disabled=True)\n", "button_previous.on_click(on_click_previous)\n", "\n", "gene_widget.value = \"Gene: <b>\" + gene + \"</b>\"\n", "\n", "grid = GridspecLayout(3, 3)\n", "grid[0, 0] = gene_widget\n", "grid[0, 1] = disease_box\n", "grid[0, 2] = count_widget\n", "grid[2, 0] = button_previous\n", "grid[1, :] = html_widget\n", "grid[2, 2] = button_next\n", "grid\n" ] }, { "cell_type": "markdown", "id": "sought-subsection", "metadata": {}, "source": [ "### License (BSD 2-Clause)\n", "\n", "Copyright (C) 2021-2022 University of Dundee. 
All Rights Reserved.\n", "\n", "Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:\n", "\n", "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. " ] }, { "cell_type": "code", "execution_count": null, "id": "31f1dbaa", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "stardist-1", "language": "python", "name": "stardist-1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }