{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Compound\n", "This notebook explores the metadata and images associated with a set of compounds across all IDR studies.\n", "We aim at finding out the range of concentrations used across studies for each compound.\n", "We retrieve images associated to each of the compound and offer all the other metadata associated with those images as a CSV. Using a subset of these images, we further programmatically generate an OMERO.figure that can be viewed in any OMERO.server." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install dependencies if required\n", "The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install idr-py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import libraries " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import csv\n", "import os\n", "import pandas as pd\n", "from tempfile import NamedTemporaryFile\n", "\n", "import scipy\n", "import numpy\n", "from skimage import filters\n", "import matplotlib.pyplot as plt\n", "from idr import connection\n", "\n", "import requests\n", "import json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up where to query and session " ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "code_folding": [ 3 ] }, "outputs": [], "source": [ "INDEX_PAGE = \"https://idr.openmicroscopy.org/webclient/?experimenter=-1\"\n", "\n", "# create http session\n", "with requests.Session() as session:\n", " request = requests.Request('GET', INDEX_PAGE)\n", " prepped = session.prepare_request(request)\n", " response = session.send(prepped)\n", " if response.status_code != 200:\n", " response.raise_for_status()" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### Compounds to query " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "compounds = ['loratadine', 'cycloheximide', 'ML9', 'ML-9']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up base URLS so can use shorter variable names later on" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "URL = \"https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true\"\n", "SCREENS_PROJECTS_URL = \"https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&id={compound_id}\"\n", "PLATES_URL = \"https://idr.openmicroscopy.org/mapr/api/{key}/plates/?value={value}&id={screen_id}&case_sensitive=false\"\n", "IMAGES_URL = \"https://idr.openmicroscopy.org/mapr/api/{key}/images/?value={value}&node={parent_type}&id={parent_id}&case_sensitive=false\"\n", "ATTRIBUTES_URL = \"https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&image={image_id}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find images for each compound specified\n", "For each compound, search of images in plates then search for annotations associated with the images. The results are saved in a CSV file. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "code_folding": [ 1, 2 ] }, "outputs": [], "source": [ "TYPE = \"compound\"\n", "KEYS = {TYPE:\n", " (\"InChIKey\",\n", " \"PubChem InChIKey\",\n", " \"Compound Concentration (microMolar)\",\n", " \"Concentration (microMolar)\",\n", " \"Dose\",\n", " \"Compound MoA\",\n", " \"Compound Action\")\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper method\n", "Parse the output of the json and save it into the CSV file." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "code_folding": [ 0 ] }, "outputs": [], "source": [ "def parse_annotation(writer, json_data, name, data_type):\n", " plate_name = \"-\"\n", " screen_name = name\n", " for p in json_data[data_type]:\n", " parent_id = p['id']\n", " plate_name = p['name']\n", " qs3 = {'key': TYPE, 'value': compound,\n", " 'parent_type': data_type[:-1], 'parent_id': parent_id}\n", " url3 = IMAGES_URL.format(**qs3)\n", " c = compound.lower()\n", " if c.startswith(\"ml\"):\n", " c = 'ml9'\n", " for i in session.get(url3).json()['images']:\n", " image_id = i['id']\n", " url4 = ATTRIBUTES_URL.format(**{'image_id': image_id})\n", " row = {}\n", " inchikey = \"unknown\"\n", " concentration = \"unknown\"\n", " moa = \"unknown\"\n", " for a in session.get(url4).json()['annotations']:\n", " for v in a['values']:\n", " key = str(v[0])\n", " if key in KEYS[TYPE]:\n", " if key in ['InChIKey', 'PubChem InChIKey']:\n", " inchikey = v[1]\n", " elif key in ['Dose', 'Compound Concentration (microMolar)', 'Concentration (microMolar)']:\n", " concentration = float(v[1].replace(' micromolar', ''))\n", " elif key in ['Compound MoA', 'Compound Action']:\n", " moa = v[1]\n", " row.update({'Compound': c,\n", " 'Screen': screen_name,\n", " 'Plate': plate_name,\n", " 'Image': image_id,\n", " 'InChIKey': inchikey,\n", " 'Concentration (microMolar)': concentration,\n", " 'MoA': moa})\n", " writer.writerow(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Retrieve data \n", "A CSV file is first created in the ``home`` directory. The CSV file can then be downloaded to your local machine. To download it, click ``File > Open``, select the CSV file and open it, then click ``File > Download``.\n", "\n", "If you are running the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true), click on the ``Files`` icon on the left-hand side. The files are saved under the ``root`` directory. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "code_folding": [ 2, 21 ] }, "outputs": [], "source": [ "home = os.path.expanduser(\"~\")\n", "csvfile = NamedTemporaryFile(\"w\", delete=False, newline='', dir=home, suffix=\".csv\")\n", "try:\n", " fieldnames = [\n", " 'Compound', 'Screen', 'Plate', 'Image',\n", " 'InChIKey', 'Concentration (microMolar)', 'MoA']\n", " writer = csv.DictWriter(csvfile, fieldnames=fieldnames)\n", " writer.writeheader()\n", " for compound in compounds:\n", " qs1 = {'key': TYPE, 'value': compound}\n", " url1 = URL.format(**qs1)\n", " json_data = session.get(url1).json()\n", " for m in json_data['maps']:\n", " qs2 = {'key': TYPE, 'value': compound, 'compound_id': m['id']}\n", " url2 = SCREENS_PROJECTS_URL.format(**qs2)\n", " json_data = session.get(url2).json()\n", " for s in json_data['screens']:\n", " compound = s['extra']['value']\n", " qs3 = {'key': TYPE, 'value': compound, 'screen_id': s['id']}\n", " url3 = PLATES_URL.format(**qs3)\n", " parse_annotation(writer, session.get(url3).json(), s['name'], 'plates')\n", "finally:\n", " csvfile.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Explore the data\n", "Read the generated CSV file into a dataframe." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Compound | Screen | Plate | Image | InChIKey | Concentration (microMolar) | MoA |
|---|---|---|---|---|---|---|---|
| 1940 | ml9 | idr0094-ellinger-sarscov2/screenB (216) | ESP0025959 | 10631902 | OZSMSRIUUDGTEP-UHFFFAOYSA-N | 0.00636 | unknown |
| 2056 | ml9 | idr0094-ellinger-sarscov2/screenB (216) | ESP0025961 | 10633852 | OZSMSRIUUDGTEP-UHFFFAOYSA-N | 0.00636 | unknown |
| 2007 | ml9 | idr0094-ellinger-sarscov2/screenB (216) | ESP0025960 | 10633344 | OZSMSRIUUDGTEP-UHFFFAOYSA-N | 0.00636 | unknown |
| 1991 | ml9 | idr0094-ellinger-sarscov2/screenB (216) | ESP0025960 | 10633350 | OZSMSRIUUDGTEP-UHFFFAOYSA-N | 0.00636 | unknown |
| 2044 | ml9 | idr0094-ellinger-sarscov2/screenB (216) | ESP0025961 | 10633848 | OZSMSRIUUDGTEP-UHFFFAOYSA-N | 0.00636 | unknown |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1871 | ml9 | idr0017-breinig-drugscreen/screenA (96) | PTEN-/-_LOPAC_Plate_1_Replicate_2 | 1741163 | unknown | unknown | Inhibitor |
| 1872 | ml9 | idr0017-breinig-drugscreen/screenA (96) | PTEN-/-_LOPAC_Plate_1_Replicate_2 | 1741162 | unknown | unknown | Inhibitor |
| 1873 | ml9 | idr0017-breinig-drugscreen/screenA (96) | PTEN-/-_LOPAC_Plate_1_Replicate_2 | 1741160 | unknown | unknown | Inhibitor |
| 1860 | ml9 | idr0017-breinig-drugscreen/screenA (96) | PI3KCA_mt-/wt+_LOPAC_Plate_1_Replicate_1 | 1752609 | unknown | unknown | Inhibitor |
| 716 | loratadine | idr0094-ellinger-sarscov2/screenA (9) | GUF0000027 | 10536416 | JCCNYMKQOSZNPW-UHFFFAOYSA-N | unknown | unknown |
2099 rows × 7 columns
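
The stated aim of the notebook is to find the range of concentrations used for each compound. As a minimal sketch of that step, the snippet below reads rows laid out like the generated CSV file and reports the minimum and maximum numeric concentration per compound, skipping rows where the value is ``unknown``. The helper name `concentration_ranges` and the sample rows are hypothetical and only illustrate the layout; they are not taken from the IDR results. It uses the standard library so that it can be run on its own.

```python
import csv
import io
from collections import defaultdict


def concentration_ranges(rows):
    """Return {compound: (min, max)} over rows with a numeric concentration."""
    values = defaultdict(list)
    for row in rows:
        raw = row['Concentration (microMolar)']
        try:
            value = float(raw)
        except ValueError:
            # The notebook writes "unknown" when no concentration is annotated.
            continue
        values[row['Compound']].append(value)
    return {compound: (min(v), max(v)) for compound, v in values.items()}


# Hypothetical sample rows using the same column names as the generated CSV.
sample_csv = io.StringIO(
    "Compound,Screen,Plate,Image,InChIKey,Concentration (microMolar),MoA\n"
    "ml9,screenB,P1,1,KEY-A,0.00636,unknown\n"
    "ml9,screenB,P2,2,KEY-A,20.0,unknown\n"
    "ml9,screenA,P3,3,unknown,unknown,Inhibitor\n"
    "loratadine,screenA,P4,4,KEY-B,unknown,unknown\n"
)
ranges = concentration_ranges(csv.DictReader(sample_csv))
print(ranges)
```

To apply the same helper to the file written earlier, open it with `open(csvfile.name, newline='')` and pass a `csv.DictReader` over it. Compounds with no numeric concentration at all are left out of the result rather than reported with an empty range.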