{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide a set of notebooks to show how the GDSCTools package can be used in IPython or the IPython notebook.\n", "\n", "The source code is available on GitHub: https://github.com/CancerRxGene/gdsctools\n", "If you encounter any issues (bugs), please file an issue at https://github.com/CancerRxGene/gdsctools/issues\n", "\n", "In this notebook, we simply give a flavour of what can be done. Other notebooks provide more detailed examples. \n", "\n", "\n", "Documentation for users and developers is also available on the dedicated PyPI page and at http://gdsctools.readthedocs.org\n", "\n", "
\n", "\n", "
\n", "

\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this package is to provide tools related to the GDSC project \n", "(Genomics of Drug Sensitivity in Cancer) http://www.cancerrxgene.org/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, GDSCTools provides functionality to identify associations between drugs and genomic features across a set of cell lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The genomic features are provided with the package. Users need to provide IC50 values for a set of drugs and a set of cell lines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We provide an example to play with. First, let us get this IC50 test file and read it." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline\n", "\n", "from gdsctools import ic50_test, GenomicFeatures, genomic_features_test" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "location: /home/cokelaer/Work/github/gdsctools/gdsctools/data/IC50_10drugs.tsv\n", "description: IC50s for 10 public drugs across cell lines\n", "authors: GDSC consortium\n", "\n" ] } ], "source": [ "print(ic50_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is just a file with a location and description. 
It can be read using\n", "the IC50 class" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 988\n", "Percentage of NA 0.20656974604343026\n", "\n" ] } ], "source": [ "from gdsctools import IC50\n", "data = IC50(ic50_test)\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, it contains 11 drugs across 988 cell lines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, there is a genomic feature data set provided, which can be read \n", "with the GenomicFeatures class" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Genomic features distribution\n", "Number of unique tissues 27\n", "Here are the first 10 tissues: myeloma, nervous_system, soft_tissue, bone, lung_NSCLC, skin, Bladder, cervix, lung_SCLC, lung\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 47 unique features distributed as\n", "- Mutation: 47\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "gf = GenomicFeatures(genomic_features_test)\n", "print(gf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This file is going to be downloaded automatically when an analysis \n", "is performed. However, you may provide your own file." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now perform the analysis using the ANOVA class" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from gdsctools import ANOVA" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "an = ANOVA(data, genomic_features=genomic_features_test.filename)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 988\n", "Percentage of NA 0.20656974604343026\n", "\n", "Genomic features distribution\n", "Number of unique tissues 27\n", "Here are the first 10 tissues: lung_NSCLC, prostate, stomach, nervous_system, skin, Bladder, leukemia, kidney, thyroid, soft_tissue\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 47 unique features distributed as\n", "- Mutation: 47\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "print(an)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have 11 drugs and 47 features across 988 cell lines (27 tissues). This \n", "is a PANCAN analysis (across several cancer cell types).\n", "\n", "We can analyse the entire data set, which takes some time (still reasonable: about 1 minute, depending on your system). 
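
To give an intuition for what a single drug/feature association test compares, here is a minimal sketch in plain Python. It is not the GDSCTools implementation: the helper one_way_f and the log(IC50) values are hypothetical, invented for illustration, and the ANOVA class also handles factors such as tissue and MSI, which this sketch ignores.

```python
from statistics import mean


def one_way_f(groups):
    # Hypothetical helper: classic one-way ANOVA F statistic,
    # i.e. between-group mean square over within-group mean square.
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up log(IC50) values for cell lines with and without a mutation
mutated = [4.1, 3.9, 4.3, 4.0]
wild_type = [2.4, 2.6, 2.5, 2.7]

delta = mean(mutated) - mean(wild_type)
f_stat = one_way_f([mutated, wild_type])
print('difference of mean log(IC50):', round(delta, 3))
print('F statistic:', round(f_stat, 2))
```

A large F statistic relative to the group sizes suggests that the mutated and wild-type cell lines respond differently to the drug; the full analysis repeats this kind of comparison for every drug/feature pair.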
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " [ 0% ] 0 of 11 complete in 0.0 sec[--- 9% ] 1 of 11 complete in 0.2 sec[------ 18% ] 2 of 11 complete in 0.4 sec[---------- 27% ] 3 of 11 complete in 0.6 sec[------------- 36% ] 4 of 11 complete in 0.9 sec[-----------------45% ] 5 of 11 complete in 1.1 sec[-----------------54% ] 6 of 11 complete in 1.3 sec[-----------------63%---- ] 7 of 11 complete in 1.6 sec[-----------------72%------- ] 8 of 11 complete in 1.8 sec[-----------------81%----------- ] 9 of 11 complete in 2.1 sec[-----------------90%-------------- ] 10 of 11 complete in 2.3 sec[-----------------100%-----------------] 11 of 11 complete in 2.6 sec\n", "\n" ] } ], "source": [ "results = an.anova_all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All results are now stored in the new variable results, which can be inspected. The associations are held in a Pandas dataframe, available as results.df. 
Each association can be accessed using a unique index, ranging from 0 to the number of rows in the dataframe minus one:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ASSOC_ID 1\n", "FEATURE TP53_mut\n", "DRUG_ID 1047\n", "DRUG_NAME NaN\n", "DRUG_TARGET NaN\n", "N_FEATURE_neg 292\n", "N_FEATURE_pos 554\n", "FEATURE_pos_logIC50_MEAN 4.06932\n", "FEATURE_neg_logIC50_MEAN 2.49511\n", "FEATURE_delta_MEAN_IC50 1.57421\n", "FEATURE_IC50_effect_size 1.39063\n", "FEATURE_neg_Glass_delta 1.09839\n", "FEATURE_pos_Glass_delta 1.68301\n", "FEATURE_neg_IC50_sd 1.4332\n", "FEATURE_pos_IC50_sd 0.935351\n", "FEATURE_IC50_T_pval 1.27218e-68\n", "ANOVA_FEATURE_pval 1.57507e-58\n", "ANOVA_TISSUE_pval 1.02587e-44\n", "ANOVA_MSI_pval 0.0259029\n", "ANOVA_MEDIA_pval NaN\n", "ANOVA_FEATURE_FDR 8.03288e-54\n", "Name: 0, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.df.loc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, we can plot a histogram of the FDR column:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEh1JREFUeJzt3X+s3XV9x/Hn25ah9rq2DL2pbbNbY+dEjPy4wW4uy73g\nZoFlxUQ2iMGiLNc/0OFi4qr7QxdHhpmRacLIqmUt6rgywNGUzoVV7oiJoC0SWqiMKh3ctqMipXIx\nU2Hv/XG+TY71tufc7zmHy/mc5yM5ud/v53w+5/t553v7Ot9+7veeG5mJJKlcr5jvCUiSesugl6TC\nGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBVu4XxPAOD000/PkZGRWmOff/55Fi1a1N0J\nvcxZ82Cw5sHQSc27du16OjNf26pfy6CPiFcC9wKnVv1vy8xPRsQqYBI4DXgAuCIzfx4RpwI3A+cC\nPwb+NDP3n+wYIyMj7Ny5s9VUZjU1NcXY2Fitsf3KmgeDNQ+GTmqOiP9up187Szc/A87PzLcBZwFr\nI2IN8Bng+sxcDRwBrqr6XwUcycw3AtdX/SRJ86Rl0GfDTLV7SvVI4Hzgtqp9C3BJtb2u2qd6/oKI\niK7NWJI0J9HOp1dGxAJgF/BG4Abg74D7qqt2ImIl8G+ZeWZE7AHWZuZ09dwPgLdn5tPHveYEMAEw\nPDx87uTkZK0CZmZmGBoaqjW2X1nzYLDmwdBJzePj47syc7RVv7Z+GJuZLwJnRcQS4OvAm2frVn2d\n7er9V95NMnMjsBFgdHQ0665RuaY3GKx5MFhzb8zp9srMfBaYAtYASyLi2BvFCuBgtT0NrASonl8M\nPNONyUqS5q5l0EfEa6sreSLiVcA7gb3APcB7qm7rgTur7a3VPtXz30z/uokkzZt2lm6WAVuqdfpX\nALdm5raIeASYjIi/Ab4HbKr6bwK+HBH7aFzJX9aDeUuS2tQy6DPzIeDsWdp/CJw3S/v/Apd2ZXaS\npI75EQiSVLiXxUcgSFI/G9lwV+2xm9f2/iMfvKKXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPo\nJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16S\nCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVrmXQR8TKiLgnIvZGxMMRcU3V/qmIOBARD1aPi5rGfDwi\n9kXEoxHxrl4WIEk6uYVt9HkB+GhmPhARrwF2RcTd1XPXZ+ZnmztHxBnAZcBbgNcD/xERv5WZL3Zz\n4pKk9rS8os/MQ5n5QLX9HLAXWH6SIeuAycz8WWY+DuwDzuvGZCVJczenNfqIGAHOBu6vmj4UEQ9F\nxE0RsbRqWw482TRsmpO/MUiSeigys72OEUPAfwLXZuYdETEMPA0k8GlgWWZ+ICJuAL6dmV+pxm0C\ntmfm7ce93gQwATA8PHzu5ORkrQJmZmYYGhqqNbZfWfNgsOb+sfvA0dpjVy1eULvm8fHxXZk52qpf\nO2v0RMQpwO3AVzPzDoDMfKrp+S8C26rdaWBl0/AVwMHjXzMzNwIbAUZHR3NsbKydqfyKqakp6o7t\nV9Y8GKy5f1y54a7aYzevXdTzmtu56yaATcDezPxcU/uypm7vBvZU21uByyLi1IhYBawGvtO9KUuS\n5qKdK/p3AFcAuyPiwartE8DlEXEWjaWb/cAHATLz4Yi4FXiExh07V3vHjSTNn5ZBn5nfAmKWp7af\nZMy1wLUdzEuS1CX+ZqwkFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJek\nwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqc\nQS9JhTPoJalwBr0kFc6gl6TCGfSSVLiWQR8RKyPinojYGxEPR
8Q1VftpEXF3RDxWfV1atUdEfCEi\n9kXEQxFxTq+LkCSdWDtX9C8AH83MNwNrgKsj4gxgA7AjM1cDO6p9gAuB1dVjArix67OWJLWtZdBn\n5qHMfKDafg7YCywH1gFbqm5bgEuq7XXAzdlwH7AkIpZ1feaSpLZEZrbfOWIEuBc4E3giM5c0PXck\nM5dGxDbgusz8VtW+A/jLzNx53GtN0LjiZ3h4+NzJyclaBczMzDA0NFRrbL+y5sFgzf1j94Gjtceu\nWrygds3j4+O7MnO0Vb+F7b5gRAwBtwMfycyfRMQJu87S9ivvJpm5EdgIMDo6mmNjY+1O5ZdMTU1R\nd2y/subBYM3948oNd9Ueu3ntop7X3NZdNxFxCo2Q/2pm3lE1P3VsSab6erhqnwZWNg1fARzsznQl\nSXPVzl03AWwC9mbm55qe2gqsr7bXA3c2tb+vuvtmDXA0Mw91cc6SpDloZ+nmHcAVwO6IeLBq+wRw\nHXBrRFwFPAFcWj23HbgI2Af8FHh/V2csSZqTlkFf/VD1RAvyF8zSP4GrO5yXJKlL/M1YSSqcQS9J\nhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4\ng16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhWsZ\n9BFxU0Qcjog9TW2fiogDEfFg9bio6bmPR8S+iHg0It7Vq4lLktrTzhX9ZmDtLO3XZ+ZZ1WM7QESc\nAVwGvKUa8w8RsaBbk5UkzV3LoM/Me4Fn2ny9dcBkZv4sMx8H9gHndTA/SVKHOlmj/1BEPFQt7Syt\n2pYDTzb1ma7aJEnzJDKzdaeIEWBbZp5Z7Q8DTwMJfBpYlpkfiIgbgG9n5leqfpuA7Zl5+yyvOQFM\nAAwPD587OTlZq4CZmRmGhoZqje1X1jwYrLl/7D5wtPbYVYsX1K55fHx8V2aOtuq3sM6LZ+ZTx7Yj\n4ovAtmp3GljZ1HUFcPAEr7ER2AgwOjqaY2NjdabC1NQUdcf2K2seDNbcP67ccFftsZvXLup5zbWW\nbiJiWdPuu4Fjd+RsBS6LiFMjYhWwGvhOZ1OUJHWi5RV9RNwCjAGnR8Q08ElgLCLOorF0sx/4IEBm\nPhwRtwKPAC8AV2fmi72ZuiSpHS2DPjMvn6V500n6Xwtc28mkJGmuRjpYPtl/3cVdnMnLj78ZK0mF\nM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiD\nXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+gl\nqXAGvSQVrmXQR8RNEXE4IvY0tZ0WEXdHxGPV16VVe0TEFyJiX0Q8FBHn9HLykqTW2rmi3wysPa5t\nA7AjM1cDO6p9gAuB1dVjArixO9OUJNXVMugz817gmeOa1wFbqu0twCVN7Tdnw33AkohY1q3JSpLm\nru4a/XBmHgKovr6ual8OPNnUb7pqkyTNk8jM1p0iRoBtmXlmtf9sZi5pev5IZi6NiLuAv83Mb1Xt\nO4CPZeauWV5zgsbyDsPDw+dOTk7WKmBmZoahoaFaY/uVNQ8Ga56b3QeO1j7uW5cvrj2202OvWryg\nds3j4+O7MnO0Vb+FtV4dnoqIZZl5qFqaOVy1TwMrm/qtAA7O9gKZuRHYCDA6OppjY2O1JjI1NUXd\nsf3KmgeDNc/NlRvuqn3c/e+td8xuHHvz2kU9P891l262Auur7fXAnU3t76vuvlkDHD22xCNJmh8t\nr+gj4hZgDDg9IqaBTwLXAbdGxFXAE8ClVfftwEXAPuCnwPt7MGdJ0hy0DPrMvPwET10wS98Eru50\nUpKk7vE3YyWpcAa9JBXOo
Jekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ\n9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEv\nSYUz6CWpcAa9JBXOoJekwi3sZHBE7AeeA14EXsjM0Yg4DfgaMALsB/4kM490Nk1JUl3duKIfz8yz\nMnO02t8A7MjM1cCOal+SNE96sXSzDthSbW8BLunBMSRJbYrMrD844nHgCJDAP2bmxoh4NjOXNPU5\nkplLZxk7AUwADA8Pnzs5OVlrDjMzMwwNDdUa26+seTBY89zsPnC09nHfunxx7bGdHnvV4gW1ax4f\nH9/VtJpyQp0G/esz82BEvA64G/gwsLWdoG82OjqaO3furDWHqakpxsbGao3tV9Y8GKx5bkY23FX7\nuPuvu7j22E6PvXntoto1R0RbQd/R0k1mHqy+Hga+DpwHPBURy6pJLAMOd3IMSVJnagd9RCyKiNcc\n2wb+ENgDbAXWV93WA3d2OklJUn2d3F45DHw9Io69zj9n5jci4rvArRFxFfAEcGnn05Qk1VU76DPz\nh8DbZmn/MXBBJ5OSJHWPvxkrSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCtfRxxRLUgk6+QiDfuAV\nvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BL\nUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwvmnBPWS6eTPtW1eu6iLM3npDGLNevnpWdBHxFrg88AC\n4EuZeV2vjqXy7T5wlCtrhub+6y7u6Nil/z1Rla8nQR8RC4AbgD8ApoHvRsTWzHykF8eT9Ms6eXPq\n9I2xE528oevEenVFfx6wLzN/CBARk8A6oOtB3+k3Riff1P36j2nQeEWuQderoF8OPNm0Pw28vUfH\nGjjz+eam8nX6xuj318tPZGb3XzTiUuBdmfln1f4VwHmZ+eGmPhPARLX7JuDRmoc7HXi6g+n2I2se\nDNY8GDqp+Tcz87WtOvXqin4aWNm0vwI42NwhMzcCGzs9UETszMzRTl+nn1jzYLDmwfBS1Nyr++i/\nC6yOiFUR8WvAZcDWHh1LknQSPbmiz8wXIuJDwL/TuL3ypsx8uBfHkiSdXM/uo8/M7cD2Xr1+k46X\nf/qQNQ8Gax4MPa+5Jz+MlSS9fPhZN5JUuL4O+ohYGxGPRsS+iNgw3/PphYhYGRH3RMTeiHg4Iq6p\n2k+LiLsj4rHq69L5nmu3RcSCiPheRGyr9ldFxP1VzV+rftBfjIhYEhG3RcT3q/P9O6Wf54j4i+r7\nek9E3BIRryztPEfETRFxOCL2NLXNel6j4QtVpj0UEed0Yw59G/RNH7NwIXAGcHlEnDG/s+qJF4CP\nZuabgTXA1VWdG4Admbka2FHtl+YaYG/T/meA66uajwBXzcuseufzwDcy87eBt9GovdjzHBHLgT8H\nRjPzTBo3blxGeed5M7D2uLYTndcLgdXVYwK4sRsT6Nugp+ljFjLz58Cxj1koSmYeyswHqu3naPzj\nX06j1i1Vty3AJfMzw96IiBXAxcCXqv0Azgduq7oUVXNE/Drw+8AmgMz8eWY+S+HnmcYNIa+KiIXA\nq4FDFHaeM/Ne4Jnjmk90XtcBN2fDfcCSiFjW6Rz6Oehn+5iF5fM0l5dERIwAZwP3A8OZeQgabwbA\n6+ZvZj3x98DHgP+r9n8DeDYzX6j2SzvfbwB+BPxTtVz1pYhYRMHnOTMPAJ8FnqAR8EeBXZR9no85\n0XntSa71c9DHLG3F3kIUEUPA7cBHMvMn8z2fXoqIPwIOZ+au5uZZupZ0vhcC5wA3ZubZwPMUtEwz\nm2pdeh2wCng9sIjG0sXxSjrPrfTk+7yfg77lxyyUIiJOoRHyX83MO6rmp479l676eni+5tc
D7wD+\nOCL201iSO5/GFf6S6r/4UN75ngamM/P+av82GsFf8nl+J/B4Zv4oM38B3AH8LmWf52NOdF57kmv9\nHPQD8TEL1dr0JmBvZn6u6amtwPpqez1w50s9t17JzI9n5orMHKFxXr+Zme8F7gHeU3Urreb/AZ6M\niDdVTRfQ+FjvYs8zjSWbNRHx6ur7/FjNxZ7nJic6r1uB91V336wBjh5b4ulIZvbtA7gI+C/gB8Bf\nzfd8elTj79H4r9tDwIPV4yIaa9Y7gMeqr6fN91x7VP8YsK3afgPwHWAf8C/AqfM9vy7XehawszrX\n/wosLf08A38NfB/YA3wZOLW08wzcQuNnEL+gccV+1YnOK42lmxuqTNtN446kjufgb8ZKUuH6eelG\nktQGg16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpML9P7ARhCUZaWQVAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%pylab inline\n", "results.df['ANOVA_FEATURE_FDR'].hist(bins=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next notebooks, we will now investigate more precisely \n", "- the input data sets\n", "- the analysis and in particular how to look at\n", " - one association\n", " - associations for a given drug\n", " - all associations (what we did here when we called anova_all() function)\n", "- How to generate HTML reports\n", "- The settings\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 1 }