{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pan-cancer analysis (ANOVA)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline\n", "matplotlib.rcParams['figure.figsize'] = (10,6)\n", "from gdsctools import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "gdsc = ANOVA(ic50_test)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['lung_NSCLC', 'prostate', 'stomach', 'nervous_system', 'skin',\n", " 'Bladder', 'leukemia', 'kidney', 'thyroid', 'soft_tissue',\n", " 'aero_dig_tract', 'ovary', 'lymphoma', 'myeloma', 'endometrium',\n", " 'pancreas', 'breast', 'neuroblastoma', 'large_intestine', 'cervix',\n", " 'liver', 'bone', 'lung_SCLC', 'lung', 'biliary_tract',\n", " 'urogenital_system_other', 'testis'], dtype=object)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdsc.tissue_factor.unique()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 988\n", "Percentage of NA 0.20656974604343026\n", "\n", "Genomic features distribution\n", "Number of unique tissues 27\n", "Here are the first 10 tissues: lung_NSCLC, prostate, stomach, nervous_system, skin, Bladder, leukemia, kidney, thyroid, soft_tissue\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 47 unique features distributed as\n", "- Mutation: 47\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "print(gdsc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 27 different tissues in this original data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 
Select a subset of cancers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us restrict the analysis to stomach and pancreas only:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "gdsc.set_cancer_type(['pancreas', 'stomach'])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of drugs: 11\n", "Number of cell lines: 61\n", "Percentage of NA 0.2414307004470939\n", "\n", "Genomic features distribution\n", "Number of unique tissues 2\n", "Here are the tissues: stomach,pancreas\n", "MSI column: yes\n", "MEDIA column: no\n", "\n", "There are 24 unique features distributed as\n", "- Mutation: 24\n", "- CNA (gain): 0\n", "- CNA (loss): 0\n" ] } ], "source": [ "print(gdsc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of cell lines has decreased significantly (from 988 to 61), which is expected.\n", "The number of features has also decreased, from 47 to 24.\n", "\n", "Indeed, features that do not have at least 1 positive or 1\n", "negative are removed. Note that later, further features may be ignored based on the content of the settings.featFactorPopulationTh" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'PANCAN'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# because we have more than one tissue, the analysis type is still PANCAN\n", "gdsc.settings.analysis_type" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "results = gdsc.anova_one_drug_one_feature(1047, 'ACACA_mut', \n", " show=True)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | ANOVA_FEATURE_FDR | \n", "ANOVA_FEATURE_pval | \n", "ANOVA_MEDIA_pval | \n", "ANOVA_MSI_pval | \n", "ANOVA_TISSUE_pval | \n", "ASSOC_ID | \n", "DRUG_ID | \n", "DRUG_NAME | \n", "DRUG_TARGET | \n", "FEATURE | \n", "... | \n", "FEATURE_IC50_effect_size | \n", "FEATURE_delta_MEAN_IC50 | \n", "FEATURE_neg_Glass_delta | \n", "FEATURE_neg_IC50_sd | \n", "FEATURE_neg_logIC50_MEAN | \n", "FEATURE_pos_Glass_delta | \n", "FEATURE_pos_IC50_sd | \n", "FEATURE_pos_logIC50_MEAN | \n", "N_FEATURE_neg | \n", "N_FEATURE_pos | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "1047 | \n", "NaN | \n", "NaN | \n", "ACACA_mut | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "50 | \n", "1 | \n", "
1 rows × 21 columns
\n", "