{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide a set of notebooks that show how the GDSCTools package can be used in IPython and the IPython notebook.\n", "\n", "The source code is available on GitHub: https://github.com/CancerRxGene/gdsctools\n", "If you encounter a bug, please file an issue at https://github.com/CancerRxGene/gdsctools/issues\n", "\n", "This notebook gives a flavour of what can be done. Other notebooks provide more detailed examples. \n", "\n", "\n", "Documentation for users and developers is also available on the package's PyPI page and at http://gdsctools.readthedocs.org\n", "\n", "
\n", "\n", "
\n", "

**Other notebooks:**

\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this package is to provide tools related to the GDSC project \n", "(Genomics of Drug Sensitivity in Cancer) http://www.cancerrxgene.org/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, GDSCTools provides functionality to identify associations between drugs and genomic features across a set of cell lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The genomic features are provided within the package. Users need to provide IC50 values for a set of drugs and a set of cell lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide an example to play with. First, let us get this IC50 test file and read it:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from gdsctools import ic50_test" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "location: /home/cokelaer/Work/virtualenv/lib/python2.7/site-packages/gdsctools-0.13.0-py2.7.egg/gdsctools/data/IC50_10drugs.tsv\n", "description: IC50s for 10 public drugs across cell lines\n", "authors: GDSC consortium\n", "\n" ] } ], "source": [ "print(ic50_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is just a file with a location and a description. 
It can be read using\n", "the IC50 class:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 988\n", "Percentage of NA 0.206569746043\n", "\n" ] } ], "source": [ "from gdsctools import IC50\n", "data = IC50(ic50_test)\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, it contains 11 drugs across 988 cell lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, a genomic feature data set is provided, which can be read \n", "with the GenomicFeatures class:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from gdsctools import genomic_features, GenomicFeatures" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Genomic features distribution\n", "Number of unique tissues 27\n", "Here are the first 10 tissues: myeloma, nervous_system, soft_tissue, bone, lung_NSCLC, skin, Bladder, cervix, lung_SCLC, lung\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 47 unique features distributed as\n", "- Mutation: 47\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "gf = GenomicFeatures(genomic_features)\n", "print(gf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This file is downloaded automatically when an analysis \n", "is performed. However, you may provide your own file."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now perform the analysis using the ANOVA class:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from gdsctools import ANOVA" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TISSUE FACTOR : included\n", "MEDIA FACTOR : NOT included\n", "MSI FACTOR : included\n", "FEATURE FACTOR : included\n" ] } ], "source": [ "an = ANOVA(data, genomic_features=genomic_features.filename)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 988\n", "Percentage of NA 0.206569746043\n", "\n", "Genomic features distribution\n", "Number of unique tissues 27\n", "Here are the first 10 tissues: lung_NSCLC, prostate, stomach, nervous_system, skin, Bladder, leukemia, kidney, thyroid, soft_tissue\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 47 unique features distributed as\n", "- Mutation: 47\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "print(an)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have 11 drugs and 47 features across 988 cell lines (27 tissues). This \n", "is a PANCAN analysis (across several cancer cell types).\n", "\n", "We can analyse the entire data set, which takes some time (still reasonable; about 1 minute depending on your system). 
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " [-----------------100%-----------------] 11 of 11 complete in 3.4 sec\n", "\n" ] } ], "source": [ "results = an.anova_all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All results are now stored in the new variable results. Its df attribute is a Pandas dataframe. Each association can be accessed using a unique identifier ranging from 0 to the length of the dataframe minus one:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "ASSOC_ID 1\n", "FEATURE TP53_mut\n", "DRUG_ID 1047\n", "DRUG_NAME NaN\n", "DRUG_TARGET NaN\n", "N_FEATURE_neg 292\n", "N_FEATURE_pos 554\n", "FEATURE_pos_logIC50_MEAN 4.06932\n", "FEATURE_neg_logIC50_MEAN 2.49511\n", "FEATURE_delta_MEAN_IC50 1.57421\n", "FEATURE_IC50_effect_size 1.39063\n", "FEATURE_neg_Glass_delta 1.09839\n", "FEATURE_pos_Glass_delta 1.68301\n", "FEATURE_neg_IC50_sd 1.4332\n", "FEATURE_pos_IC50_sd 0.935351\n", "FEATURE_IC50_T_pval 1.27218e-68\n", "ANOVA_FEATURE_pval 1.57507e-58\n", "ANOVA_TISSUE_pval 5.54188e-44\n", "ANOVA_MSI_pval 0.0259029\n", "ANOVA_MEDIA_pval NaN\n", "ANOVA_FEATURE_FDR 8.03288e-54\n", "Name: 0, dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.df.loc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, we can plot the histogram of the FDR column:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD/CAYAAAAKVJb/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEMdJREFUeJzt3UFsW1d2xvHvpAYGWYmxEIx2pRVkTyXunogUzL6a1FkF\nWljpdJdg6k68SrOqg3iRXZyxFl7OuE32E40FrdukIbqdWs7sxoBlqelqMChOF7z0eaFoieJ71GXu\n+/8AwrzXj4+Px3xHlx9J2dxdAICyvJD7AAAAzaO5A0CBaO4AUCCaOwAUiOYOAAW6dNYGZrYp6VjS\nW+7+i7G5VXe/+7w5AEAep67czWxd0oa7P5C0amY9M1uT5GlOZrY2Ya437wMHADzfqc3d3R+4+z+k\n4UvuPpB0TcMVuiQdSNp4zhwAIJNpYpklSe9K+pc01ZH0tLLJsqSlCXMAgEzObO7u/j+SPjGzr8zs\n2ws4JgBATac290qWPpD0n5J+LulI0uW0SUfSE0k+Nnc4YV/8ngMAmIG723lvc9ZHITf0w6b9UNJv\nJa2muVVJv3/O3KQD5OKuDz/8MPsxLMqFWlALanH6ZVZnxTKfS/o7M3tF0pG7fylJZnY1fZLmyIer\n+olzAIA8Tm3u7v69pJ0J81PNYbLvvvsu9yEsDGoRqEWgFvXxDdUMej2+BjBCLQK1CNSiPquT6Zzr\njsz8ou4LAEphZvI5vKEKAPgRorlnsL+/n/sQFga1CNQilFKLlZWuzGzmy8pKd+b7PvNLTACA2Tx+\n/EcNvwY06+3PncY8Q+YOAHNiZqrT3KVhcydzBwBIorlnUUqe2ARqEahFoBb10dwBoEBk7gAwJ2Tu\nAIBG0dwzIE8M1CJQi0At6qO5A0CByNwBYE7I3AEAjaK5Z0CeGKhFoBaBWtRHcweAApG5A8CckLkD\nABpFc8+APDFQi0AtArWoj+YOAAUicweAOSFzBwA0iuaeAXlioBaBWgRqUR/NHQAKROYOAHNC5g4A\naBTNPQPyxEAtArUI1KK+M5u7mW2ny63K3K3R31XmNs1svToHAMjj1MzdzNYlPXT378zsvqQ77r5n\nZk8lHUr6+zRek3TF3b9Mzf0/3H0wti8ydwCtssiZ+6qkjXT9II0l6bq7v+rue2l8TdJxZbsNAQCy\nObW5u/tdd99Jw9ckfZ2uX04RzI007kh6WrnpcrOHWRbyxEAtArUI1KK+S9NslGKXb0ZRy6jhm9mb\nKboBACyQqT7nbmb/6O630/VtSYcpX78h6UjSK5J2U/6+qWH+fntsH2TuAFolZ+Z+5srdzLYrjX1d\n0kNFPLMsaTeNr0ra0zCX3520r62tLXW7XUlSp9NRr9dTv9+XFC/DGDNmzLiUcRiN+1OM9yXdU13T\nfFrmvoar85ckvVVZnUuVFbqZXZf0KM3tTNgXK/dkf3//2ZOg7ahFoBahlFos7Mrd3R9owpuj7v7F\nhLkTDR0AkAe/WwYA5mSRP+cOAPgRorlncPLNlvaiFoFaBGpRH80dAApE5g4Ac0LmDgBoFM09A/LE\nQC0CtQjUoj6aOwAUiMwdAOaEzB0A0CiaewbkiYFaBGoRqEV9NHcAKBCZOwDMCZk7AKBRNPcMyBMD\ntQjUIlCL+mjuAFAgMncAmBMydwBAo2juGZAnBmoRqEWgFvXR3AGgQGTuADAnZO4AgEbR3DMgTwzU\nIlCLQC3qo7kDQIHI3AFgTsjcAQCNorlnQJ4YqEWgFoFa1EdzB4ACkbkDwJzkzNwvnblrs+109RV3\n/yDNbUo6lrTq7nefNwcAyOPUWMbM1iXtpma9amZvmNmaJHf3B2mbtQlzvXkf+I8ZeWKgFoFaBGpR\n31mZ+6qkjXT9II2vabhCH81tPGcOAJDJ1Jm7mX0l6Z8k/ULSHXcfpJX9m5KWJH1emdtw95tjtydz\nB9AqC/859xS7fOPug/PeAQDg4p35hmqyXlmJH0m6nK53JD3R8
EdTde5w0k62trbU7XaHG3U66vV6\n6vf7kiJja8O4micuwvHkHI/mFuV4co4Hg4Hee++9hTmenONPP/20iP4QRuP+FON9SfdU15mxjJlt\nVz4Rs65h477q7jtmdkPSbtr0B3Pjq3ximbC/v//sSdB21CJQi1BKLXLGMqc299TM72u4Wn9J0lvu\nvmdm1yU9knTF3XfStifmxvZFcwfQKgvb3JtEcwfQNgv/hiqadTKPay9qEahFoBb10dwBoEDEMgAw\nJ8QyAIBG0dwzIE8M1CJQi0At6qO5A0CByNwBYE7I3AEAjaK5Z0CeGKhFoBaBWtRHcweAApG5A8Cc\nkLkDABpFc8+APDFQi0AtArWoj+YOAAUicweAOSFzBwA0iuaeAXlioBaBWgRqUR/NHQAKROYOAHNC\n5g4AaBTNPQPyxEAtArUI1KI+mjsAFIjMHQDmhMwdANAomnsG5ImBWgRqEahFfTR3ACgQmTsAzMnC\nZ+5mdmvS2My2K3ObZrZenQMA5HFmc0/NenNs+l0z+4Okh2mbNUnu7g/SuNf0gZaEPDFQi0AtArWo\n78zm7u53JR2MTV9391fdfS+Nr0k6TtcPJG00d4gAgPOaKnM3s9+5+88q4+uSHkl6zd0/MbM7ku64\n+8DM1iVtuPvNsX2QuQNolZyZ+6VZ7s7ddyTJzN5MzRwAsEDO/VFIM9s2s79Nw0NJVyQdSbqc5jpp\nHs9BnhioRaAWgVrUN+3KvfqS4KGkr9P1ZUm7aXxV0p6k1TR3wtbWlrrdriSp0+mo1+up3+9Lin9M\nxu0ajyzK8eQcDwaDhTqenOPBYLBQx1P3+S2Nxv0pxvuS7qmuMzN3M9uU9GtJv6rEMaNPz1xx99tp\nbpTDXxltN7YfMncArZIzc+dLTAAwJwv/JSY06+RLtvaiFoFaBGpRH80dQHFWVroys1qXlZVu7odR\nC7EMgOLUj0MkyVS3ZxHLAAAaRXPPgDwxUItALQK1qI/mDgAFInMHUBwyd1buAFAkmnsG5ImBWgRq\nEahFfTR3ACgQmTuA4pC5s3IHgCLR3DMgTwzUIlCLQC3qo7kDQIHI3AEUh8ydlTsAFInmngF5YqAW\ngVoEalEfzR0ACkTmDqA4ZO6s3AGgSDT3DMgTA7UI1CJQi/po7gBQIDJ3AMUhc2flDgBForlnQJ4Y\nqEWgFoFa1EdzB4ACkbkDKA6ZOyt3ACjSVM3dzG6NjTfNbN3Mtk+bw2TkiYFaBGoRqEV9Zzb31Kw3\nK+M1Se7uD0bjCXO9OR0vAGAKU2XuZvY7d/9Zun5L0lfuvmdm65Jek7Q8Nrfm7rfH9kHmDuBCkLlP\nn7lXd9yR9LQyXpa0NGEOAJAJb6hmQJ4YqEWgFoFa1Hdpyu2qryuOJF1O1zuSnqS/r84dNnJ0AICZ\nTNvcq7HMfUmvS9qTtCppN81fnTD3A1tbW+p2u5KkTqejXq+nfr8vKX5St2Hc7/cX6ngYL854ZFGO\nJ9d4NFfn9tK+pH7lumYYx7HM8nh+eCzT3v++pHuq68w3VM1sU9KvJf3K3XfS3HVJjyRdOW1ubD+8\noQrgQvCG6hSZu7t/4e7L1Ybt7jvu/uCsOUx28qd6e1GLQC0CtaiPN1QBoED8bhkAxSGWYeUOAEWi\nuWdAnhioRaAWgVrUR3MHgAKRuQMoDpk7K3cAKBLNPQPyxEAtArUI1KI+mjsAFIjMHUBxyNxZuQNA\nkWjuGZAnBmoRqEWgFvXR3AGgQGTuAIpD5s7KHQCKRHPPgDwxUItALQK1qI/mDgAFInMHUBwyd1bu\nAFAkmnsG5ImBWgRqEahFfTR3ACgQmTuA4pC5s3IHgCLR3DMgTwzUIlCLQC3qo7kDQIHI3AEUh8yd\nlTsAFInmngF5YqAWgVoEalEfzR0ACjRT5m5mt9z9AzPbdve7aW5T0rGk1dHc2G3I3AFcCDL32Vfu\n75rZHyQ9lCQzWxvevz9I4
96M+wUANGDW5n7d3V919700vqbhql2SDiRt1D6ygpEnBmoRqEWgFvXN\n2twvm9m6md1I446kp5W/X653WACAOmp9zt3MbknalfSWpDvuPjCzdUkb7n5zbFsydwAXgsxdunTu\nuzLblnTo7l9KOpR0RdKRpMtpk06aP2Fra0vdbne4UaejXq+nfr8vKV6GMWbMmHHd8dC+pH7lumYY\nq9bx/PBYpr3/fUn3VNe5V+5m9oakr939+7Ry/036q6vuvpOiml13H4zdjpV7sr+/P/YkbC9qEahF\nqFsLVu4zrNzdfc/MNocHrSejJm5mV1MkczTe2AEAF4vfLQOgOKzc+YYqABSJ5p7ByTdb2otaBGoR\nqEV9NHcAKBCZO4DikLmzcgeAItHcMyBPDNQiUItALeqjuQNAgcjcARSHzJ2VOwAUieaeAXlioBaB\nWoTFqMVPZGa1Ljmd+3fLAEA7/FlNRDu5kLkDKE5TmXv+fZC5AwAqaO4ZLEaeuBioRaAWgVrUR3MH\ngAKRuQMoDpk7K3cAKBLNPQPyxEAtArUI1KI+mjsAFIjMHUBxyNxZuQNAkWjuGZAnBmoRqEWgFvXR\n3AGgQGTuAIpD5s7KHQCKRHPPgDwxUItALQK1qI/mDgAFInMHUBwy9wZX7ma2aWbrZrbd1D6x2FZW\nurX/G7KVlW7uh9EIaoFF00hzN7M1Se7uD9K418R+S1VKnvj48R81XJXMfhnuI68mGnMptVgUpZwj\nOTW1cr8m6ThdP5C00dB+izQYDHIfwgL5q+yr3SYaM5rFOVJfU/9BdkfS08p4edJGn3322cx38MIL\nL+idd97Riy++OPM+FsXx8fHZG7XG/6lOc3z8OO//ML9IVla6tVf/P/3pX+tPf/qumQOa0ehxvP/+\n+1mP48euqeY+lV/+8r9mvq37rpaWlvT222/XOoa6J8AiPPlR9ZP05hniFUidfeSv5fBxfCjpn2vs\nJf/jyK2p5n4k6XK63pF0OGkj93+d+Q7+8pf/1csvvzzz7UfqngBNPPk//vi2Pvroo1r74IfMyJ/V\nzCcaMFT/h2Uzz826t0cjH4VMb6i+7u47ZnZD0q67D8a2IZgEgBnM8lHIRlbu7v6tmb1uZuuSjsYb\n+6wHBwCYzYV9iQmoMrMb7v5Jur6p4aetVt39bt4jA/Iws1vu/kFlfOK8OM+5ciG/fqDtX3Ays+10\nuVWZa21N0iu8jXS9td+RMLO19DzYrsy18nlRedzXJ8wVX4v0GDcr4/HzYu2858rcm3ubT17pWSPb\nTT9lV83sjbbXZEybvyNx092/kLRkZr22Pi/S4z5Ij/tRG2uR+sNBZWrSeXGuc+UiVu5tPnklaVXx\nmA/SuLU1MbO10QmbTPUdidKkl9f/Lknufju9T9Xa54Wkj9OfV1pci+r7kpPOi6UJc891Ec29lSfv\niLvfdfedNHxN0tdqd01eyn0AC+JvJC2nl9s30lwrnxfu/q2kAzN7qnj8raxFk/iVvxckvcz8ZtIn\nidoirdr3xqaPNcV3JAp1mBrbaCXfyk83mNmSpP+WdF3SXTO7kvmQcqn++49/d+iJznmuXMQ3VKf6\nglMLrLv7zXS9rTVZTSfusoar1p6k30i6KmlPw8hqN+PxXaRDRcZ6rOFKvq0/6N6V9Lm7f29mx5J+\nrnaeI9VY5r6k13XyvJj6XLmIlfv9dCBKf/7+Au5zoZjZtrvfTtfXJf1WLayJu3/h7l+m4VKaG0jP\n6jLxOxKF+jfFc6CjYf7eyueFhm+cfp+u7GnY2FtVi/TK7fXRp4Uqr+ienRfnPVcu5HPu6YAfafhm\nyc5Z25ck/UPc1/AJ+5Kkt9x9r801wVB6DhxJujp6VdfW50V63+GhpMujx93WWjSFLzEBQIF4QxUA\nCkRzB4AC0dwBoEA0dwAoEM0dAApEcweAAtHcAaBANHcAKND/A5YGcljUdbosAAAAAElFTkS
uQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%pylab inline\n", "results.df['ANOVA_FEATURE_FDR'].hist(bins=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next notebooks, we will look more closely at: \n", "- the input data sets\n", "- the analysis and in particular how to look at\n", "    - one association\n", "    - associations for a given drug\n", "    - all associations (what we did here when we called the anova_all() function)\n", "- how to generate HTML reports\n", "- the settings\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "**Author: Thomas Cokelaer, Nov 2015**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.5" } }, "nbformat": 4, "nbformat_minor": 0 }