{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# myChEMBL drug ADME data\n", "\n", "### myChEMBL team, ChEMBL group, EMBL-EBI.\n", "\n", "A demo notebook illustrating one way of extracting _in vivo_ ADME data from myChEMBL for marketed drugs (as defined by the FDA Orange Book)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%matplotlib inline\n", "%pylab inline\n", "\n", "from __future__ import print_function, division\n", "\n", "import warnings\n", "\n", "import psycopg2\n", "import pandas as pd\n", "pd.set_option('display.max_colwidth', 1000)\n", "from IPython.display import HTML, Javascript" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configuration" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "IPython.notebook.kernel.execute(\"current_url = \" + \"'\"+document.URL+\"'\");" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get the base URL of the VM:\n", "display(Javascript('IPython.notebook.kernel.execute(\"current_url = \" + \"\\'\"+document.URL+\"\\'\");'))\n", "\n", "# Base URL for report cards...\n", "# NB This currently uses the main (external) ChEMBL instance, as myChEMBL doesn't yet offer report cards\n", "\n", "report_url = 'https://www.ebi.ac.uk/chembl'\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cur_base_url = current_url.split('http://')[1].split('/')[0]\n", "base_url = 'localhost:8000' if (cur_base_url == 'localhost:9612' or cur_base_url == '127.0.0.1:9612') else current_url.split('http://')[1].split(':')[0] + ':8000'\n", "img_url = 'http://' + base_url + '/chemblws' " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'http://tannin.windows.ebi.ac.uk:8000/chemblws'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "img_url" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# PostgreSQL connection details...\n", "\n", "username, password, hostname, database = 'mychembl', 'read', 'localhost', 'chembl_21'" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Connect to PostgreSQL...\n", "\n", "conn = psycopg2.connect(host=hostname, database=database, user=username, password=password)\n", "\n", "cursor = conn.cursor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Marketed Drugs\n", "\n", "Here, we query myChEMBL for a list of marketed drugs, and for any alternative salt/solvate forms of those drugs. This is done by obtaining the parent form (_i.e._ the bioactive component) for each drug, and requesting all versions (_i.e._ salt/solvate forms) containing that parent.\n", "\n", "This is done because data may be recorded in the literature for versions of the drug other than the marketed form. Interest is _usually_ in the parent form (_i.e._ stripped of any salt/solvate components), as this is the bioactive component (ignoring prodrugs and active metabolites for the present). Thus, to ensure the greatest amount of data is obtained for a given bioactive compound, it normally makes sense to extend the query out to versions sharing the same parent. \n", "\n", "Note that in some cases this might not be appropriate, as different salt forms might not behave the same in certain assays (_e.g._ dissolution rates and hence oral bioavailabilities might differ). If this was of concern, the query below could be simplified so as only to retrieve versions flagged as the marketed drugs (a simplified version is included but commented out).\n", "\n", "In practice, the differences between the two approaches appear to be fairly small, and the more inclusive query is used mainly for illustrative purposes." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Get list of marketed drugs, including any alternative salt forms...\n", "\n", "drugs_sql = \"\"\"\n", "select\n", " c.chembl_id as parent_chembl_id\n", " , a.chembl_id\n", " , a.pref_name\n", " , d.canonical_smiles as parent_smiles\n", "from\n", " molecule_dictionary a\n", " , molecule_hierarchy b\n", " , chembl_id_lookup c\n", " , compound_structures d\n", " , ( -- List of parent structures (i.e. the bioactive form) of marketed drugs\n", " select distinct\n", " b.parent_molregno \n", " from\n", " molecule_dictionary a \n", " , molecule_hierarchy b\n", " where\n", " a.molregno = b.molregno\n", " and a.max_phase = 4\n", " ) e\n", "where\n", " a.molregno = b.molregno\n", "and b.parent_molregno = c.entity_id and c.entity_type = 'COMPOUND'\n", "and b.parent_molregno = d.molregno\n", "and b.parent_molregno = e.parent_molregno -- restrict to cases where parent is that of as marketed drug\n", "order by\n", " parent_chembl_id\n", " , chembl_id\n", "\"\"\"\n", "\n", "# Simplified query that doesn't pull in other salt forms...\n", "\n", "# drugs_sql = \"\"\"\n", "# select\n", "# c.chembl_id as parent_chembl_id\n", "# , a.chembl_id\n", "# , a.pref_name\n", "# , d.canonical_smiles as parent_smiles\n", "# from\n", "# molecule_dictionary a\n", "# , molecule_hierarchy b\n", "# , chembl_id_lookup c\n", "# , compound_structures d\n", "# where\n", "# a.molregno = b.molregno\n", "# and b.parent_molregno = c.entity_id and c.entity_type = 'COMPOUND'\n", "# and b.parent_molregno = d.molregno\n", "# and a.max_phase = 4\n", "# order by\n", "# parent_chembl_id\n", "# , chembl_id\n", "# \"\"\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Run query...\n", "\n", "cursor.execute(drugs_sql)\n", "\n", "# Get column names...\n", "\n", "columns = [x[0] for x in cursor.description]\n", "\n", "# Retrieve data...\n", "\n", "rows = cursor.fetchall()\n", "\n", "# Convert to Pandas dataframe...\n", "\n", "drugs = pd.DataFrame(rows, columns=columns)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(2947, 4)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drugs.shape" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
parent_chembl_idchembl_idpref_nameparent_smiles
0CHEMBL1000CHEMBL1000CETIRIZINEOC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3
1CHEMBL1000CHEMBL1201113CETIRIZINE HYDROCHLORIDEOC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3
2CHEMBL1000CHEMBL1607273CETIRIZINE DIHYDROCHLORIDEOC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3
3CHEMBL1002CHEMBL1002LEVOSALBUTAMOLCC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1
4CHEMBL1002CHEMBL1201061LEVALBUTEROL HYDROCHLORIDECC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1
5CHEMBL1002CHEMBL2062258LEVALBUTEROL TARTRATECC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1
6CHEMBL1002CHEMBL2106337LEVALBUTEROL SULFATECC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1
7CHEMBL1004CHEMBL1004DOXYLAMINECN(C)CCOC(C)(c1ccccc1)c2ccccn2
8CHEMBL1004CHEMBL1200392DOXYLAMINE SUCCINATECN(C)CCOC(C)(c1ccccc1)c2ccccn2
9CHEMBL1005CHEMBL1005REMIFENTANILCCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC
\n", "
" ], "text/plain": [ " parent_chembl_id chembl_id pref_name \\\n", "0 CHEMBL1000 CHEMBL1000 CETIRIZINE \n", "1 CHEMBL1000 CHEMBL1201113 CETIRIZINE HYDROCHLORIDE \n", "2 CHEMBL1000 CHEMBL1607273 CETIRIZINE DIHYDROCHLORIDE \n", "3 CHEMBL1002 CHEMBL1002 LEVOSALBUTAMOL \n", "4 CHEMBL1002 CHEMBL1201061 LEVALBUTEROL HYDROCHLORIDE \n", "5 CHEMBL1002 CHEMBL2062258 LEVALBUTEROL TARTRATE \n", "6 CHEMBL1002 CHEMBL2106337 LEVALBUTEROL SULFATE \n", "7 CHEMBL1004 CHEMBL1004 DOXYLAMINE \n", "8 CHEMBL1004 CHEMBL1200392 DOXYLAMINE SUCCINATE \n", "9 CHEMBL1005 CHEMBL1005 REMIFENTANIL \n", "\n", " parent_smiles \n", "0 OC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3 \n", "1 OC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3 \n", "2 OC(=O)COCCN1CCN(CC1)C(c2ccccc2)c3ccc(Cl)cc3 \n", "3 CC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1 \n", "4 CC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1 \n", "5 CC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1 \n", "6 CC(C)(C)NC[C@H](O)c1ccc(O)c(CO)c1 \n", "7 CN(C)CCOC(C)(c1ccccc1)c2ccccn2 \n", "8 CN(C)CCOC(C)(c1ccccc1)c2ccccn2 \n", "9 CCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drugs.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect drugs" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Take a copy, as the dataframe will be altered...\n", "\n", "drugs2 = drugs.copy()\n", "\n", "# Add images for parent structure and salt form(s)...\n", "\n", "drugs2['parent_img'] = drugs2['parent_chembl_id'].apply(lambda x: ''.format(img_url, x))\n", "drugs2['version_img'] = drugs2['chembl_id'].apply(lambda x: ''.format(img_url, x))\n", "\n", "# Add links to report cards (NB these are currently calls out to the main ChEMBL instance, as myChEMBL doesn't yet offer report cards)...\n", "\n", "drugs2['parent_chembl_id'] = drugs2['parent_chembl_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "drugs2['chembl_id'] = drugs2['chembl_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "\n", "# Subset and reorder columns for easier inspection...\n", "\n", "drugs2 = drugs2[['parent_chembl_id', 'parent_img', 'chembl_id', 'version_img', 'pref_name', 'parent_smiles']]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
parent_chembl_idparent_imgchembl_idversion_imgpref_name
0CHEMBL1000CHEMBL1000CETIRIZINE
1CHEMBL1000CHEMBL1201113CETIRIZINE HYDROCHLORIDE
2CHEMBL1000CHEMBL1607273CETIRIZINE DIHYDROCHLORIDE
3CHEMBL1002CHEMBL1002LEVOSALBUTAMOL
4CHEMBL1002CHEMBL1201061LEVALBUTEROL HYDROCHLORIDE
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(drugs2.head().drop('parent_smiles', 1).to_html(escape=False))\n", "\n", "# HTML(drugs2.drop('parent_smiles', 1).to_html(escape=False)) # Uncomment to view full table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load ADME data\n", "\n", "This query pulls back _in vivo_ PK data for humans and a variety of species used in drug development and safety studies." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# ADME data query...\n", "\n", "data_sql = \"\"\"\n", "SELECT\n", " cil.chembl_id as chembl_id\n", " , act.standard_type as data_type\n", " , act.standard_relation as modifier\n", " , act.standard_value as value\n", " , act.standard_units as units\n", " , ass.assay_organism as organism\n", " , ass.assay_tissue as tissue\n", " , td.chembl_id as target_id\n", "FROM\n", " activities act\n", " , assays ass\n", " , target_dictionary td\n", " , chembl_id_lookup cil\n", "WHERE \n", " act.assay_id = ass.assay_id\n", " AND td.tid = ass.tid\n", " AND act.standard_value IS NOT NULL\n", " AND act.data_validity_comment IS NULL\n", " AND((act.standard_type = 'Cmax'\n", " AND act.standard_units = 'nM')\n", " OR (act.standard_type = 'F'\n", " AND act.standard_units = '%%')\n", " OR(act.standard_type = 'Tmax'\n", " AND act.standard_units = 'hr')\n", " OR (act.standard_type = 'T1/2'\n", " AND act.standard_units = 'hr')\n", " OR (act.standard_type = 'Vd'\n", " AND act.standard_units = 'L.kg-1')\n", " OR (act.standard_type = 'Vdss'\n", " AND act.standard_units = 'L.kg-1')\n", " OR (act.standard_type = 'CL'\n", " AND act.standard_units = 'mL.min-1.kg-1'))\n", " AND ass.assay_test_type = 'In vivo'\n", " AND (ass.assay_tissue IS NULL\n", " OR lower(ass.assay_tissue) NOT IN ('blood', 'liver', 'adipose', 'spleen', 'prostate', 'skin', 'jejunum', 'lung', 'duodenum', 'hypothalamus', 'cortex', 'pancreas', 'interstitial fluid', 'lung epithelial lining fluid', 'small intestine', 'adrenal gland', 'heart', 'retina', 'brain', 'tumour', 'interstitial fluid (thigh)', 'stomach', 'kidney', 'intestinal segment', 'brain', 'csf', 'liver', 'Pancreas', 'testes', 'hypothalamus', 'peritoneal fluid', 'adrenals', 'intestine', 'gi tract', 'choroid/sclera', 'not recorded', 'muscle', 'kidney', 'bile', 'cornea', 'ovary'))\n", " AND upper(ass.assay_organism) IN ('HOMO SAPIENS', 'RATTUS NORVEGICUS', 'MUS MUSCULUS', 'CANIS LUPUS FAMILIARIS', 'MACACA MULATTA', 'MACACA FASCICULARIS','MACACA','CERCOPITHECIDAE','MARMOSETS', 'PRIMATES','MONKEY')\n", " AND act.molregno = cil.entity_id AND cil.entity_type = 'COMPOUND'\n", "ORDER BY\n", " chembl_id\n", " , target_id\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Run query...\n", "\n", "cursor.execute(data_sql)\n", "\n", "# Get column names...\n", "\n", "columns = [x[0] for x in cursor.description]\n", "\n", "# Retrieve data...\n", "\n", "rows = cursor.fetchall()\n", "\n", "# Convert to Pandas dataframe...\n", "\n", "data = pd.DataFrame(rows, columns=columns)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(41380, 8)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.shape" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chembl_iddata_typemodifiervalueunitsorganismtissuetarget_id
0CHEMBL100049T1/2=3.28hrCanis lupus familiarisNoneCHEMBL373
1CHEMBL100181T1/2=0.6hrCanis lupus familiarisNoneCHEMBL373
2CHEMBL100367Vd=0.29L.kg-1Rattus norvegicusNoneCHEMBL376
3CHEMBL100367T1/2=5.7hrRattus norvegicusNoneCHEMBL376
4CHEMBL100367Vd=0.34L.kg-1Rattus norvegicusNoneCHEMBL376
\n", "
" ], "text/plain": [ " chembl_id data_type modifier value units organism \\\n", "0 CHEMBL100049 T1/2 = 3.28 hr Canis lupus familiaris \n", "1 CHEMBL100181 T1/2 = 0.6 hr Canis lupus familiaris \n", "2 CHEMBL100367 Vd = 0.29 L.kg-1 Rattus norvegicus \n", "3 CHEMBL100367 T1/2 = 5.7 hr Rattus norvegicus \n", "4 CHEMBL100367 Vd = 0.34 L.kg-1 Rattus norvegicus \n", "\n", " tissue target_id \n", "0 None CHEMBL373 \n", "1 None CHEMBL373 \n", "2 None CHEMBL376 \n", "3 None CHEMBL376 \n", "4 None CHEMBL376 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect data table" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Take a copy, as the dataframe will be altered...\n", "\n", "data2 = data.copy()\n", "\n", "# Add images...\n", "\n", "data2['img'] = data2['chembl_id'].apply(lambda x: ''.format(img_url, x))\n", "\n", "# Add links to report cards (NB these are currently calls out to the main ChEMBL instance, as myChEMBL doesn't yet offer report cards)...\n", "\n", "data2['chembl_id'] = data2['chembl_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "data2['target_id'] = data2['target_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "\n", "# Subset and reorder columns for easier inspection...\n", "\n", "data2 = data2[['chembl_id', 'img', 'data_type', 'modifier', 'value', 'units', 'organism', 'tissue', 'target_id']]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
chembl_idimgdata_typemodifiervalueunitsorganismtissuetarget_id
0CHEMBL100049T1/2=3.28hrCanis lupus familiarisNoneCHEMBL373
1CHEMBL100181T1/2=0.6hrCanis lupus familiarisNoneCHEMBL373
2CHEMBL100367Vd=0.29L.kg-1Rattus norvegicusNoneCHEMBL376
3CHEMBL100367T1/2=5.7hrRattus norvegicusNoneCHEMBL376
4CHEMBL100367Vd=0.34L.kg-1Rattus norvegicusNoneCHEMBL376
" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(data2.head(5).to_html(escape=False))\n", "\n", "# HTML(data2.head(5).to_html(escape=False)) # NB Uncomment to render all rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merge Drug and ADME data" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Merge Pandas data frames...\n", "# NB This could obviously also be done in the database using SQL.\n", "\n", "df = pd.merge(drugs, data, how='inner', left_on='chembl_id', right_on='chembl_id')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(5013, 11)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
parent_chembl_idchembl_idpref_nameparent_smilesdata_typemodifiervalueunitsorganismtissuetarget_id
0CHEMBL1005CHEMBL1005REMIFENTANILCCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OCCL=37mL.min-1.kg-1Homo sapiensNoneCHEMBL372
1CHEMBL1005CHEMBL1005REMIFENTANILCCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OCVdss=0.4L.kg-1Homo sapiensNoneCHEMBL612558
2CHEMBL1005CHEMBL1005REMIFENTANILCCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OCT1/2=0.8hrHomo sapiensNoneCHEMBL612558
3CHEMBL1005CHEMBL1005REMIFENTANILCCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OCCL=37mL.min-1.kg-1Homo sapiensNoneCHEMBL612558
4CHEMBL1009CHEMBL1009LEVODOPAN[C@@H](Cc1ccc(O)c(O)c1)C(=O)OCL=23mL.min-1.kg-1Homo sapiensNoneCHEMBL372
\n", "
" ], "text/plain": [ " parent_chembl_id chembl_id pref_name \\\n", "0 CHEMBL1005 CHEMBL1005 REMIFENTANIL \n", "1 CHEMBL1005 CHEMBL1005 REMIFENTANIL \n", "2 CHEMBL1005 CHEMBL1005 REMIFENTANIL \n", "3 CHEMBL1005 CHEMBL1005 REMIFENTANIL \n", "4 CHEMBL1009 CHEMBL1009 LEVODOPA \n", "\n", " parent_smiles data_type modifier value \\\n", "0 CCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC CL = 37 \n", "1 CCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC Vdss = 0.4 \n", "2 CCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC T1/2 = 0.8 \n", "3 CCC(=O)N(c1ccccc1)C2(CCN(CCC(=O)OC)CC2)C(=O)OC CL = 37 \n", "4 N[C@@H](Cc1ccc(O)c(O)c1)C(=O)O CL = 23 \n", "\n", " units organism tissue target_id \n", "0 mL.min-1.kg-1 Homo sapiens None CHEMBL372 \n", "1 L.kg-1 Homo sapiens None CHEMBL612558 \n", "2 hr Homo sapiens None CHEMBL612558 \n", "3 mL.min-1.kg-1 Homo sapiens None CHEMBL612558 \n", "4 mL.min-1.kg-1 Homo sapiens None CHEMBL372 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Uncomment to save data as spreadsheet...\n", "\n", "# df.to_csv('myChEMBL_drugs_ADME.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect merged table" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Take a copy, as the dataframe will be altered...\n", "\n", "df2 = df.copy()\n", "\n", "# Add images...\n", "\n", "df2['parent_img'] = df2['parent_chembl_id'].apply(lambda x: ''.format(img_url, x))\n", "\n", "# Add links to report cards (NB these are currently calls out to the main ChEMBL instance, as myChEMBL doesn't yet offer report cards)...\n", "\n", "df2['parent_chembl_id'] = df2['parent_chembl_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "df2['chembl_id'] = df2['chembl_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "\n", "df2['target_id'] = df2['target_id'].apply(lambda x: '{}'.format(report_url, x, x))\n", "\n", "# Subset and reorder columns for easier inspection...\n", "\n", "df2 = df2[['parent_chembl_id', 'parent_img', 'chembl_id', 'pref_name', 'parent_smiles', 'data_type', 'modifier', 'value', 'units', 'organism', 'tissue', 'target_id']]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
parent_chembl_idparent_imgchembl_idpref_namedata_typemodifiervalueunitsorganismtissuetarget_id
0CHEMBL1005CHEMBL1005REMIFENTANILCL=37mL.min-1.kg-1Homo sapiensNoneCHEMBL372
1CHEMBL1005CHEMBL1005REMIFENTANILVdss=0.4L.kg-1Homo sapiensNoneCHEMBL612558
2CHEMBL1005CHEMBL1005REMIFENTANILT1/2=0.8hrHomo sapiensNoneCHEMBL612558
3CHEMBL1005CHEMBL1005REMIFENTANILCL=37mL.min-1.kg-1Homo sapiensNoneCHEMBL612558
4CHEMBL1009CHEMBL1009LEVODOPACL=23mL.min-1.kg-1Homo sapiensNoneCHEMBL372
" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(df2.head().drop('parent_smiles', 1).to_html(escape=False)) # NB smiles column excluded from rendering of table only\n", "\n", "# HTML(df2.drop('parent_smiles', 1).to_html(escape=False)) # NB Uncomment to render all rows" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
parent_chembl_idparent_imgchembl_idpref_namedata_typemodifiervalueunitsorganismtissuetarget_id
1006CHEMBL1422CHEMBL1422SITAGLIPTINVdss=2.8L.kg-1Homo sapiensNoneCHEMBL372
1007CHEMBL1422CHEMBL1422SITAGLIPTINCL=6mL.min-1.kg-1Homo sapiensNoneCHEMBL372
1008CHEMBL1422CHEMBL1422SITAGLIPTINCL=6mL.min-1.kg-1Homo sapiensNoneCHEMBL372
1009CHEMBL1422CHEMBL1422SITAGLIPTINCL=1.3mL.min-1.kg-1Homo sapiensNoneCHEMBL372
1010CHEMBL1422CHEMBL1422SITAGLIPTINCL=6mL.min-1.kg-1Canis lupus familiarisNoneCHEMBL373
1011CHEMBL1422CHEMBL1422SITAGLIPTINCL=60mL.min-1.kg-1Rattus norvegicusNoneCHEMBL376
1012CHEMBL1422CHEMBL1422SITAGLIPTINCL=60mL.min-1.kg-1Rattus norvegicusNoneCHEMBL376
1013CHEMBL1422CHEMBL1422SITAGLIPTINT1/2=1.7hrRattus norvegicusPlasmaCHEMBL376
1014CHEMBL1422CHEMBL1422SITAGLIPTINT1/2=1.7hrRattus norvegicusNoneCHEMBL376
1015CHEMBL1422CHEMBL1422SITAGLIPTINCmax=330nMRattus norvegicusNoneCHEMBL376
1016CHEMBL1422CHEMBL1422SITAGLIPTINCL=60mL.min-1.kg-1Rattus norvegicusNoneCHEMBL376
1017CHEMBL1422CHEMBL1422SITAGLIPTINT1/2=12hrHomo sapiensNoneCHEMBL612558
1018CHEMBL1422CHEMBL1422SITAGLIPTINCL=6mL.min-1.kg-1Homo sapiensNoneCHEMBL612558
1019CHEMBL1422CHEMBL1422SITAGLIPTINVdss=2.8L.kg-1Homo sapiensNoneCHEMBL612558
1020CHEMBL1422CHEMBL1422SITAGLIPTINCL=28mL.min-1.kg-1Macaca mulattaNoneCHEMBL614875
1021CHEMBL1422CHEMBL539077NoneCmax=330nMRattus norvegicusNoneCHEMBL376
1022CHEMBL1422CHEMBL539077NoneT1/2=1.7hrRattus norvegicusNoneCHEMBL376
1023CHEMBL1422CHEMBL539077NoneCL=60mL.min-1.kg-1Rattus norvegicusNoneCHEMBL376
1024CHEMBL1422CHEMBL539077NoneCmax=330nMCercopithecidaePlasmaCHEMBL612558
1025CHEMBL1422CHEMBL539077NoneCL=60mL.min-1.kg-1Rattus norvegicusNoneCHEMBL612558
1026CHEMBL1422CHEMBL539077NoneCmax=330nMRattus norvegicusPlasmaCHEMBL612558
1027CHEMBL1422CHEMBL539077NoneCL=6mL.min-1.kg-1Canis lupus familiarisNoneCHEMBL612558
1028CHEMBL1422CHEMBL539077NoneCmax=2200nMCanis lupus familiarisPlasmaCHEMBL612558
1029CHEMBL1422CHEMBL539077NoneCL=28mL.min-1.kg-1CercopithecidaeNoneCHEMBL612558
" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# An example of a case where extra data is obtained by looking at other salt forms: CHEMBL539077 (the hydrochloride salt)\n", "# would not have been found using the simple query.\n", "\n", "HTML(df2[df2['parent_chembl_id'].str.contains('CHEMBL1422')].drop('parent_smiles', 1).to_html(escape=False))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 0 }