{ "metadata": { "name": "", "signature": "sha256:f87ed21695bada3bc17557c798ab0b58fe4317b70c4458dfb66cdc3078f4ac0c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Scaffold analysis of ChEMBL data with pandas and RDKit\n", "### Dr. Samo Turk\n", "#### BioMed X Innovation Center, Heidelberg \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Python

[Python](http://www.python.org/) is a very popular programming language, especially in science.

pandas

[Pandas](http://pandas.pydata.org/) is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. No need for R!

RDKit

[RDKit](http://www.rdkit.org/) is an open source cheminformatics toolkit.

IPython

[IPython](http://ipython.org/) is an interactive Python shell. It also provides a web-based interactive computational environment, the IPython Notebook.

chembl_webresource_client

[chembl_webresource_client](https://github.com/chembl/chembl_webresource_client) is a Python client for accessing the ChEMBL web services.

### More notebooks: https://github.com/Team-SKI/snippets

| | bioactivityCount | chemblId | compoundCount | description | geneNames | organism | preferredName | proteinAccession | synonyms | targetType |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9984 | CHEMBL1862 | 2997 | Tyrosine-protein kinase ABL | Unspecified | Homo sapiens | Tyrosine-protein kinase ABL | P00519 | Proto-oncogene c-Abl,JTK7,ABL1,2.7.10.2,Abelso... | SINGLE PROTEIN |

| | activity_comment | assay_chemblid | assay_description | assay_type | bioactivity_type | ingredient_cmpd_chemblid | name_in_reference | operator | organism | parent_cmpd_chemblid | reference | target_chemblid | target_confidence | target_name | units | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unspecified | CHEMBL1063789 | Binding constant for ABL1(E255K) kinase domain | B | Kd | CHEMBL440084 | SB-431542 | > | Homo sapiens | CHEMBL440084 | Nat. Biotechnol., (2008) 26:1:127 | CHEMBL1862 | 9 | Tyrosine-protein kinase ABL | nM | 10000 |
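
The only bioactivity row shown is a Kd reported as `> 10000` nM, which is a censored value: it gives only a lower bound on Kd, not a measurement. Before converting activities to pIC50 it usually makes sense to keep only exact IC50 values in nM. Below is a minimal stdlib sketch, assuming each record is a dict keyed by the column names above and that exact measurements carry the operator `=`; the helper name `exact_ic50_nm` and the second record are hypothetical.

```python
from typing import Dict, List


def exact_ic50_nm(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep IC50 records that report an exact value in nM (hypothetical helper).

    Records with '>' or '<' operators are censored (only bounded), so they
    are dropped, as are other activity types and other units.
    """
    kept = []
    for rec in records:
        if (rec.get("bioactivity_type") == "IC50"
                and rec.get("operator") == "="
                and rec.get("units") == "nM"):
            kept.append(rec)
    return kept


# First record mirrors the row above; the second one is illustrative only.
records = [
    {"parent_cmpd_chemblid": "CHEMBL440084", "bioactivity_type": "Kd",
     "operator": ">", "units": "nM", "value": "10000"},
    {"parent_cmpd_chemblid": "CHEMBL388978", "bioactivity_type": "IC50",
     "operator": "=", "units": "nM", "value": "210"},
]
print([r["parent_cmpd_chemblid"] for r in exact_ic50_nm(records)])
```

Filtering on type, operator and units together avoids silently mixing Kd, Ki and IC50 values or treating bounds as measurements.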

| | acdAcidicPka | acdBasicPka | acdLogd | acdLogp | alogp | chemblId | knownDrug | medChemFriendly | molecularFormula | molecularWeight | numRo5Violations | passesRuleOfThree | preferredCompoundName | rotatableBonds | smiles | species | stdInChiKey | synonyms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 8.97 | 3.71 | 5.27 | 3.82 | CHEMBL388978 | No | Yes | C28H26N4O3 | 466.53 | 0 | No | STAUROSPORINE | 2 | CN[C@@H]1C[C@H]2O[C@@](C)([C@@H]1OC)n3c4ccccc4... | BASE | HKSZLNNOFSGOKW-FYTWVXJKSA-N | SID26755744,SID26755745 |
| 1 | NaN | 6.88 | 4.02 | 4.13 | 3.76 | CHEMBL1290072 | No | Yes | C19H16N2O2 | 304.34 | 0 | No | NaN | 4 | COc1ccc(CNc2ccnc3oc4ccccc4c23)cc1 | NEUTRAL | MNMQBQOLMHVBJT-UHFFFAOYSA-N | NaN |
| 2 | 12.27 | 3.32 | 4.18 | 4.18 | 5.10 | CHEMBL565609 | No | Yes | C28H28Cl2N6O4 | 583.47 | 2 | No | NaN | 9 | CN1C(=O)C(=Cc2cnc(Nc3ccc(NC(=O)CCNC(=O)OC(C)(C... | NEUTRAL | APGJWCRRERWKDQ-UHFFFAOYSA-N | NaN |
| 3 | 12.80 | 7.86 | 5.67 | 6.62 | 5.26 | CHEMBL1171085 | No | Yes | C30H25F3N6O | 542.55 | 2 | No | NaN | 8 | CN(C)Cc1nccn1c2cc(NC(=O)c3ccc(C)c(c3)C#Cc4cnc5... | NEUTRAL | QZYXEXGJVTZDIZ-UHFFFAOYSA-N | NaN |
| 4 | 8.13 | 7.54 | 4.88 | 5.35 | 4.75 | CHEMBL1171657 | No | Yes | C29H27F3N6O | 532.56 | 1 | No | NaN | 7 | CN1CCN(Cc2ccc(NC(=O)c3ccc(C)c(c3)C#Cc4cnc5[nH]... | NEUTRAL | XGDBPKHGUYPWPJ-UHFFFAOYSA-N | NaN |
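
The compound table above and the pIC50 table that follows share the `chemblId` key, so activity values and compound properties have to be joined at some point. Here is a minimal plain-Python sketch of that inner join, assuming activity records reference compounds through `parent_cmpd_chemblid`; the helper `attach_compounds` and the activity values are illustrative, not the notebook's own code.

```python
from typing import Dict, List


def attach_compounds(activities: List[Dict[str, str]],
                     compounds: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Inner-join activity records to compound records (hypothetical helper).

    Activities are matched on 'parent_cmpd_chemblid', compounds on 'chemblId'.
    Activities whose compound is missing are dropped, as in an inner join.
    """
    by_id = {c["chemblId"]: c for c in compounds}  # index compounds by ChEMBL ID
    joined = []
    for act in activities:
        comp = by_id.get(act["parent_cmpd_chemblid"])
        if comp is None:
            continue  # no compound record for this activity
        row = dict(comp)             # copy the compound properties
        row["value"] = act["value"]  # carry the activity value over
        joined.append(row)
    return joined


# Compound fields copied from the table above; activity values are illustrative.
compounds = [
    {"chemblId": "CHEMBL388978", "preferredCompoundName": "STAUROSPORINE",
     "molecularFormula": "C28H26N4O3"},
    {"chemblId": "CHEMBL1290072", "molecularFormula": "C19H16N2O2"},
]
activities = [
    {"parent_cmpd_chemblid": "CHEMBL388978", "value": "210"},
    {"parent_cmpd_chemblid": "CHEMBL440084", "value": "10000"},  # not in compounds
]
for row in attach_compounds(activities, compounds):
    print(row["chemblId"], row["preferredCompoundName"], row["value"])
```

With pandas DataFrames the same inner join would typically be written as `activities.merge(compounds, left_on='parent_cmpd_chemblid', right_on='chemblId')`.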

| | chemblId | smiles | ROMol | knownDrug | preferredCompoundName | pIC50 | LE |
|---|---|---|---|---|---|---|---|
| 0 | CHEMBL388978 | CN[C@@H]1C[C@H]2O[C@@](C)([C@@H]1OC)n3c4ccccc4c5c6CNC(=O)c6c7c8ccccc8n2c7c35 | | No | STAUROSPORINE | 6.677781 | 0.267111 |
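
pIC50 is the negative base-10 logarithm of the molar IC50, so for a value in nM it equals 9 - log10(IC50); the listed pIC50 of 6.677781 for staurosporine therefore corresponds to an IC50 of about 210 nM. The LE column appears consistent with LE = 1.4 × pIC50 / N_heavy: staurosporine (C28H26N4O3) has 28 + 4 + 3 = 35 heavy atoms, and 1.4 × 6.677781 / 35 ≈ 0.267111. The factor 1.4 is inferred from that single row, so treat it as an assumption (1.37 is also widely used). In RDKit the heavy-atom count would come from `mol.GetNumHeavyAtoms()`; the sketch below simply takes it as an argument.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency(pic50: float, heavy_atoms: int, factor: float = 1.4) -> float:
    """Ligand efficiency as factor * pIC50 / heavy-atom count.

    factor=1.4 is an assumption chosen because it reproduces the LE column
    above; 1.37 (about 2.303*R*T in kcal/mol near 300 K) is another common choice.
    """
    return factor * pic50 / heavy_atoms


# Staurosporine, C28H26N4O3: 28 C + 4 N + 3 O = 35 heavy atoms.
print(round(ligand_efficiency(6.677781, 35), 6))  # 0.267111, as in the table
print(round(pic50_from_nm(210.0), 4))              # 6.6778
```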

| | chemblId | smiles | ROMol | knownDrug | preferredCompoundName | pIC50 | LE | Murcko_SMILES |
|---|---|---|---|---|---|---|---|---|
| 0 | CHEMBL388978 | CN[C@@H]1C[C@H]2O[C@@](C)([C@@H]1OC)n3c4ccccc4c5c6CNC(=O)c6c7c8ccccc8n2c7c35 | | No | STAUROSPORINE | 6.677781 | 0.267111 | O=C1NCc2c1c1c3ccccc3n3c1c1c2c2ccccc2n1C1CCCC3O1 |

| | count | Murcko_SMILES |
|---|---|---|
| 1 | 17 | O=c1[nH]c2nc(Nc3ccccc3)ncc2cc1-c1ccccc1 |
| 2 | 11 | O=C(Nc1cccc(Nc2nccc(-c3cccnc3)n2)c1)c1ccccc1 |
| 3 | 10 | O=C(Nc1ccc(CN2CCNCC2)cc1)c1cccc(C#Cc2cnc[nH]2)c1 |
| 4 | 9 | O=C(COc1ccc2c(=O)cc(-c3ccccc3)oc2c1)Nc1ccccn1 |
| 5 | 8 | O=C(Nc1ccc(CN2CCNCC2)cc1)c1cccc(C#Cc2cnc3cccnn23)c1 |
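
The last table ranks Murcko scaffolds by how many compounds share them. With pandas this would typically be a single `value_counts()` on the `Murcko_SMILES` column; the stdlib sketch below does the same ranking with `collections.Counter`. The scaffold strings are taken from the table, but the toy counts are made up, and skipping empty strings assumes ring-free molecules were assigned an empty scaffold.

```python
from collections import Counter
from typing import List, Tuple


def rank_scaffolds(scaffolds: List[str], top: int = 5) -> List[Tuple[str, int]]:
    """Return the `top` most frequent scaffold SMILES with their counts.

    Empty strings (assumed to mark molecules without rings) are ignored.
    """
    counts = Counter(s for s in scaffolds if s)
    return counts.most_common(top)


# Toy input: scaffold strings from the table above, counts invented.
a = "O=c1[nH]c2nc(Nc3ccccc3)ncc2cc1-c1ccccc1"
b = "O=C(Nc1cccc(Nc2nccc(-c3cccnc3)n2)c1)c1ccccc1"
scaffolds = [a, a, a, b, b, ""]
for smi, n in rank_scaffolds(scaffolds, top=2):
    print(n, smi)
```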