{ "metadata": { "name": "", "signature": "sha256:b791ac5a8cf88a101367ce342c8fc2337229ac0fb66ee4f6ef0da8a9628860ff" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "TCGA HNSCC Molecular Validation Cohort (TCGA MAF)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we analyze the TCGA HNSCC molecular validation cohort. For these data we depart from the data-versioned Firehose run and instead use a newer MAF file and copy-number matrix." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import Data and Packages\n", "For the full list of data and packages imported, see the [Imports](../Analysis_Notebooks/Imports.ipynb) notebook." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import NotebookImport\n", "from Imports import *" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 46 }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Read in TCGA HNSCC Molecular Validation Calls" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = '../Data/MAFs/PR_TCGA_HNSC_PAIR_Capture_All_Pairs_QCPASS_v4.aggregated.capture.tcga.uuid.automated.somatic.maf.txt'\n", "mut_new = pd.read_table(f, skiprows=4, low_memory=False)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "# drop silent, intronic and UTR variants\n", "keep = ~mut_new.Variant_Classification.isin(['Silent', 'Intron', \"3'UTR\", \"5'UTR\"])\n", "mut_new = mut_new[keep]\n", "# collapse sample barcodes to 12-character patient barcodes\n", "mut_new['barcode'] = mut_new.Tumor_Sample_Barcode.map(lambda s: s[:12])\n", "# genes x patients matrix of mutation counts\n", "mut_new = mut_new.groupby(['barcode','Hugo_Symbol']).size().unstack().fillna(0).T" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 43 }, { "cell_type": "code", "collapsed": false, "input": [ "# previous mutation calls, restricted to the genes and patients in the new MAF\n", "mut_old = mut.df.ix[mut_new.index, mut_new.columns].dropna([0,1], how='all')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 44 
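}, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the reshaping above concrete, here is a minimal sketch on a made-up four-row MAF. The barcodes and calls are hypothetical and are not TCGA data; only the column names (`Tumor_Sample_Barcode`, `Hugo_Symbol`, `Variant_Classification`) follow the MAF read above. After filtering, patient `TCGA-XX-0001` keeps a single TP53 mutation (its silent call is dropped) and `TCGA-XX-0002` has one mutation each in CASP8 and TP53." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "# hypothetical toy MAF rows, for illustration only\n", "toy = pd.DataFrame({\n", "    'Tumor_Sample_Barcode': ['TCGA-XX-0001-01A', 'TCGA-XX-0001-01A', 'TCGA-XX-0002-01A', 'TCGA-XX-0002-01A'],\n", "    'Hugo_Symbol': ['TP53', 'TP53', 'CASP8', 'TP53'],\n", "    'Variant_Classification': ['Missense_Mutation', 'Silent', 'Nonsense_Mutation', 'Frame_Shift_Del'],\n", "})\n", "\n", "# same filter as above: drop silent, intronic and UTR variants\n", "toy = toy[~toy.Variant_Classification.isin(['Silent', 'Intron', \"3'UTR\", \"5'UTR\"])].copy()\n", "# patient barcode = first 12 characters of the sample barcode\n", "toy['barcode'] = toy.Tumor_Sample_Barcode.map(lambda s: s[:12])\n", "# genes x patients matrix of mutation counts\n", "toy.groupby(['barcode', 'Hugo_Symbol']).size().unstack().fillna(0).T" ], "language": "python", "metadata": {}, "outputs": []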
}, { "cell_type": "code", "collapsed": false, "input": [ "del_3p = cn.features.ix['Deletion'].ix['3p14.2']\n", "del_3p.name = '3p_deletion'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "I downloaded an updated version of the GISTIC gene-by-patient matrix from the April 16, 2014 Firehose run. It has calls for 511 patients, as opposed to 452 in the January 15 run." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = '../Extra_Data/FH_HNSC__4_16_all_data_thresholded_by_genes.txt'\n", "# columns 2, 1, 0 become the row index, with the cytoband as the outer level\n", "gistic = pd.read_table(f, index_col=[2, 1, 0], low_memory=False)\n", "gistic = FH.fix_barcode_columns(gistic, tissue_code='01')\n", "# per-patient median GISTIC call across the genes in cytoband 3p14.2\n", "del_3p = gistic.ix['3p14.2'].median(0)\n", "del_3p.name = '3p_deletion'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Validation of Molecular Associations in Recent TCGA Patients" ] }, { "cell_type": "code", "collapsed": false, "input": [ "mut_all = mut.df.combine_first(mut_new)\n", "\n", "# discovery patients come from the original mutation matrix\n", "clinical_cohort = mut.df.columns\n", "# validation patients: in the new MAF but not among the original mutation features\n", "molecular_cohort = mut_new.columns.diff(mut.features.columns)\n", "hpv_neg_cohort = mut_all.columns.intersection(true_index(hpv == 0))\n", "molecular_cohort_n = molecular_cohort.intersection(hpv_neg_cohort)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 35 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "TP53 is mutually exclusive with HPV status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This association is highly significant but weaker in the validation data. This could be due to less accurate mutation calls, or to less accurate HPV status assignment, as most of these patients' HPV statuses were inferred from the expression data."
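] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a sanity check on the odds ratios reported below, the Discovery-cohort value can be recomputed by hand from its 2x2 counts. This reads the `combine` labels as HPV-only (41), TP53-only (211), both (2) and neither (52). This is a minimal sketch using `scipy.stats.fisher_exact`. The notebook's own `fisher_exact_test` helper (loaded through `Imports`) is not shown here, so its table orientation and test options are assumptions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "from scipy.stats import fisher_exact\n", "\n", "# Discovery-cohort 2x2 table, counts copied from the contingency table below\n", "# rows: HPV+ / HPV-; columns: TP53 mutant / TP53 wild-type\n", "table = np.array([[2, 41],\n", "                  [211, 52]])\n", "\n", "# fisher_exact reports the sample odds ratio (2*52)/(41*211) = 104/8651\n", "odds_ratio, p = fisher_exact(table)\n", "odds_ratio  # about 0.012, in line with the 1.20e-02 reported below" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An odds ratio this far below 1 is the signature of mutual exclusivity. With 43 HPV+ and 213 TP53-mutant patients out of 306, independence would put roughly 30 patients in the both cell, versus the 2 observed."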
] }, { "cell_type": "code", "collapsed": false, "input": [ "cohorts = {'Discovery': clinical_cohort, 'Validation': molecular_cohort, 'All': mut_all.columns}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "hpv.name = 'HPV'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "ct = pd.concat({c: combine(hpv, mut_all.ix['TP53']>0).ix[s].value_counts()\n", "                for c,s in cohorts.iteritems()}, axis=1)\n", "ct.ix[['neither','HPV','TP53','both'],['Discovery','Validation','All']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 38, "text": [ "         Discovery  Validation  All\n", "neither         52          32   84\n", "HPV             41          31   72\n", "TP53           211         139  350\n", "both             2           3    5" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "stats = pd.concat({c: fisher_exact_test(hpv.ix[s], mut_all.ix['TP53'].ix[s]>0)\n", "                   for c,s in cohorts.iteritems()}, axis=1)\n", "stats[['Discovery','Validation','All']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 39, "text": [ "            Discovery  Validation       All\n", "odds_ratio   1.20e-02    2.23e-02  1.67e-02\n", "p            1.70e-22    5.90e-16  2.93e-37" ] } ], "prompt_number": 39 },
{ "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "TP53 mutation and 3p deletion have high co-occurrence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the HPV- cohort also includes a small number of patients from the original TCGA analysis set who were excluded from the discovery cohort because of missing data or old age. Judging by the column totals below (385 HPV- patients versus 250 + 126 = 376 in the discovery and validation cohorts combined), this amounts to roughly nine patients." ] }, { "cell_type": "code", "collapsed": false, "input": [ "cohorts = {'Discovery': keepers_o, 'Validation': molecular_cohort_n, 'HPV-': hpv_neg_cohort}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "ct = pd.concat({c: combine(mut_all.ix['TP53'].ix[s].dropna()>0, del_3p<0).value_counts()\n", "                for c,s in cohorts.iteritems()}, axis=1)\n", "ct.ix[['neither','3p_deletion','TP53','both'],['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ "             Discovery  Validation  HPV-\n", "neither             22          18    42\n", "3p_deletion         26           7    33\n", "TP53                23          20    45\n", "both               179          81   265" ] } ], "prompt_number": 52 }, { "cell_type": "code", "collapsed": false, "input": [ "ct.sum()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ "Discovery     250\n", "HPV-          385\n", "Validation    126\n", "dtype: int64" ] } ], "prompt_number": 53 }, { "cell_type": "code", "collapsed": false, "input": [ "stats = pd.concat({c: fisher_exact_test(mut_all.ix['TP53'].ix[s]>0, del_3p<0)\n", "                   for c,s in cohorts.iteritems()}, axis=1)\n", "stats[['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "            Discovery  Validation      HPV-\n", "odds_ratio   6.59e+00    1.04e+01  7.49e+00\n", "p            3.56e-07    1.44e-06  8.11e-13" ] } ], "prompt_number": 54 },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Here we show that 3p deletion has the strongest association (largest odds ratio) of any chromosomal segment with TP53 mutation in both the discovery and validation cohorts." ] }, { "cell_type": "code", "collapsed": false, "input": [ "cn.features.index = cn.features.index.droplevel(2)\n", "# screen TP53 mutation against each deletion and amplification feature, separately per cohort\n", "r1 = screen_feature(mut_all.ix['TP53'].ix[molecular_cohort_n] > 0, fisher_exact_test,\n", "                    cn.features.ix['Deletion'] < 0)\n", "r2 = screen_feature(mut_all.ix['TP53'].ix[molecular_cohort_n] > 0, fisher_exact_test,\n", "                    cn.features.ix['Amplification'] > 0)\n", "\n", "r3 = screen_feature(mut_all.ix['TP53'].ix[keepers_o] > 0, fisher_exact_test,\n", "                    cn.features.ix['Deletion'] < 0)\n", "r4 = screen_feature(mut_all.ix['TP53'].ix[keepers_o] > 0, fisher_exact_test,\n", "                    cn.features.ix['Amplification'] > 0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "v1 = pd.concat([r3, r1], keys=['Discovery','Validation'], axis=1).sort([('Discovery','p')])\n", "v2 = pd.concat([r4, r2], keys=['Discovery','Validation'], axis=1).sort([('Discovery','p')])\n", "v3 = pd.concat([v1.head(6), v2.head(6)], keys=['Deletion','Amplification'])\n", "v3.columns = v3.columns.swaplevel(0,1)\n", "v3 = v3.sort_index(axis=1)\n", "del v3['q']\n", "# Bonferroni correction: scale the Discovery p-values by the number of segments screened\n", "v3[('q','bonf')] = pd.concat([v3.p.Discovery['Deletion'] * len(r3),\n", "                              v3.p.Discovery['Amplification'] * len(r4)],\n", "                             keys=['Deletion','Amplification'])\n", "v3" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "                        odds_ratio                      p                     q\n", "                         Discovery  Validation  Discovery  Validation      bonf\n", "Deletion      3p14.3          6.59       10.41   3.56e-07    1.44e-06  1.71e-05\n", "              3p14.2          6.59       10.41   3.56e-07    1.44e-06  1.71e-05\n", "              3p25.3          5.49        9.23   1.81e-06    4.17e-06  8.68e-05\n", "              3p12.2          5.07        8.72   6.28e-06    6.87e-06  3.02e-04\n", "              10p15.3         4.00        6.10   6.74e-04    7.33e-03  3.24e-02\n", "              11q23.1         3.23        3.04   1.12e-03    5.79e-02  5.36e-02\n", "Amplification 3q26.33         6.35        7.82   9.00e-08    2.43e-05  2.34e-06\n", "              8q24.21         4.24        2.70   3.98e-05    7.84e-02  1.04e-03\n", "              12p13.33        3.21        1.46   2.63e-03    6.12e-01  6.85e-02\n", "              9p24.1          0.40        0.76   8.88e-03    6.05e-01  2.31e-01\n", "              18p11.31        2.61        4.41   1.69e-02    3.91e-02  4.40e-01\n", "              8q11.21         2.13        2.41   2.15e-02    5.88e-02  5.60e-01" ] } ], "prompt_number": 16 },
{ "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "TP53-3p and CASP8 are Mutually Exclusive" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# patients with both a TP53 mutation and a 3p deletion\n", "combo_all = combine(mut_all.ix['TP53']>0, del_3p<0)\n", "two_hit = combo_all == 'both'\n", "two_hit.name = 'TP53-3p'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "ct = pd.concat({c: combine(two_hit, mut_all.ix['CASP8']>0).ix[s].value_counts()\n", "                for c,s in cohorts.iteritems()}, axis=1)\n", "ct.ix[['neither','CASP8','TP53-3p','both'],['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "         Discovery  Validation  HPV-\n", "neither         45          26    71\n", "CASP8           11          18    31\n", "TP53-3p        184         116   304\n", "both            10           9    22" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "stats = pd.concat({c: fisher_exact_test(two_hit.ix[s], mut_all.ix['CASP8']>0)\n", "                   for c,s in cohorts.iteritems()}, axis=1)\n", "stats[['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "            Discovery  Validation      HPV-\n", "odds_ratio       0.22    1.12e-01  1.66e-01\n", "p                0.00    1.21e-06  5.77e-09" ] } ], "prompt_number": 19 },
{ "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "TP53-3p and RAS/SOS1 Pathway are Mutually Exclusive" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# patients with both a TP53 mutation and a 3p deletion\n", "combo_all = combine(mut_all.ix['TP53']>0, del_3p<0)\n", "two_hit = combo_all == 'both'\n", "two_hit.name = 'TP53-3p'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "gs = run.gene_sets['REACTOME_SOS_MEDIATED_SIGNALLING']\n", "# a patient counts as altered if any gene in the Reactome set is mutated\n", "sos1_pathway = mut_all.ix[gs].sum()>0\n", "sos1_pathway.name = 'SOS1 Pathway'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "ct = pd.concat({c: combine(two_hit, sos1_pathway>0).ix[s].value_counts()\n", "                for c,s in cohorts.iteritems()}, axis=1)\n", "ct.ix[['neither','SOS1 Pathway','TP53-3p','both'],['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "              Discovery  Validation  HPV-\n", "neither              41          26    67\n", "SOS1 Pathway         15          18    35\n", "TP53-3p             186         117   308\n", "both                  8           8    18" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "stats = pd.concat({c: fisher_exact_test(two_hit.ix[s], sos1_pathway)\n", "                   for c,s in cohorts.iteritems()}, axis=1)\n", "stats[['Discovery','Validation','HPV-']]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "            Discovery  Validation      HPV-\n", "odds_ratio   1.18e-01    9.88e-02  1.12e-01\n", "p            4.04e-06    4.85e-07  2.01e-12" ] } ], "prompt_number": 23 } ], "metadata": {} } ] }