{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 3: Joining dataframes with `cptac`\n", "\n", "In this tutorial, we provide several examples of how to use the built-in `cptac` functions for joining different dataframes." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "import cptac\n", "cptac.download(dataset=\"endometrial\", version=\"latest\")\n", "en = cptac.Endometrial()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## General format\n", "\n", "In all of the join functions, you specify the dataframes you want to join by passing their names to the appropriate parameters in the function call. Each function automatically checks that the dataframes you named are valid for that join, and prints an error message if they aren't.\n", "\n", "To avoid confusion, whenever a column from an -omics dataframe is included in a joined table, the name of the -omics dataframe it came from is appended to the column header (for example, `A1BG_proteomics`).\n", "\n", "To include only particular columns in the join, pass them to the corresponding parameters of the join function. Each such parameter accepts either a single column name as a string or a list of column name strings. In this tutorial we usually select only specific columns for readability, but in every case except the mutations dataframe you could select the whole dataframe instead.\n", "\n", "The join functions use logic analogous to an SQL INNER JOIN." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `join_omics_to_omics`\n", "\n", "The `join_omics_to_omics` function joins two -omics dataframes to each other. The -omics data types valid for this function are acetylproteomics, CNV, phosphoproteomics, phosphoproteomics_gene, proteomics, and transcriptomics." 
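] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two conventions described under *General format* (appending the source dataframe's name to each column header, and joining with INNER JOIN semantics) can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not `cptac`'s own implementation: the helper `suffix_and_inner_join` and the toy tables `prot` and `tran`, including their patient IDs and values, are hypothetical. Real `cptac` tables such as phosphoproteomics also carry a multi-level (Name, Site) column index, which this sketch ignores.

```python
import pandas as pd


def suffix_and_inner_join(df1, name1, df2, name2):
    # Hypothetical helper, not part of cptac: tag every column with the name
    # of the dataframe it came from, e.g. A1BG -> A1BG_proteomics.
    left = df1.add_suffix('_' + name1)
    right = df2.add_suffix('_' + name2)
    # how='inner' keeps only the index values (samples) present in both
    # tables, like an SQL INNER JOIN on Patient_ID.
    return left.join(right, how='inner')


# Toy tables indexed by patient ID; all IDs and values are made up.
prot = pd.DataFrame({'A1BG': [0.1, 0.2, 0.3]},
                    index=pd.Index(['P1', 'P2', 'P3'], name='Patient_ID'))
tran = pd.DataFrame({'ZZZ3': [10.0, 11.0]},
                    index=pd.Index(['P2', 'P3'], name='Patient_ID'))

joined = suffix_and_inner_join(prot, 'proteomics', tran, 'transcriptomics')
print(joined)
```

Patient P1 has no transcriptomics row, so the inner join drops it; the result keeps P2 and P3, with the columns A1BG_proteomics and ZZZ3_transcriptomics. DataFrame.join with how='inner' keeps the row order of the left table."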
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Name A1BG_proteomics A2M_proteomics A2ML1_proteomics A4GALT_proteomics \\\n", "Site \n", "Patient_ID \n", "C3L-00006 -1.180 -0.8630 -0.802 0.222 \n", "C3L-00008 -0.685 -1.0700 -0.684 0.984 \n", "C3L-00032 -0.528 -1.3200 0.435 NaN \n", "C3L-00090 -1.670 -1.1900 -0.443 0.243 \n", "C3L-00098 -0.374 -0.0206 -0.537 0.311 \n", "\n", "Name AAAS_proteomics AACS_proteomics AADAT_proteomics AAED1_proteomics \\\n", "Site \n", "Patient_ID \n", "C3L-00006 0.2560 0.6650 1.2800 -0.3390 \n", "C3L-00008 0.1350 0.3340 1.3000 0.1390 \n", "C3L-00032 -0.2400 1.0400 -0.0213 -0.0479 \n", "C3L-00090 -0.0993 0.7570 0.7400 -0.9290 \n", "C3L-00098 0.3750 0.0131 -1.1000 NaN \n", "\n", "Name AAGAB_proteomics AAK1_proteomics ... ZZZ3_phosphoproteomics \\\n", "Site ... S397 S411 \n", "Patient_ID ... \n", "C3L-00006 0.412 -0.664 ... 0.18400 NaN \n", "C3L-00008 1.330 -0.367 ... -0.17100 NaN \n", "C3L-00032 0.419 -0.500 ... NaN NaN \n", "C3L-00090 0.229 -0.223 ... 0.13970 NaN \n", "C3L-00098 0.565 -0.101 ... -0.15875 NaN \n", "\n", "Name \n", "Site S420 S424 S426 S468 S89 T415 T418 Y399 \n", "Patient_ID \n", "C3L-00006 NaN NaN -0.20500 NaN NaN NaN NaN NaN \n", "C3L-00008 NaN -0.393 -0.17100 NaN 0.29 NaN 0.1605 -0.0635 \n", "C3L-00032 NaN NaN NaN NaN NaN NaN NaN NaN \n", "C3L-00090 NaN NaN -0.55900 NaN NaN NaN NaN 0.2980 \n", "C3L-00098 NaN 0.196 0.06175 NaN NaN NaN NaN -0.2900 \n", "\n", "[5 rows x 84211 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prot_and_phos = en.join_omics_to_omics(df1_name=\"proteomics\", df2_name=\"phosphoproteomics\")\n", "prot_and_phos.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Joining only specific columns.\n", "(Note that when a gene is selected from the phosphoproteomics dataframe, data for all sites of the gene are selected. 
The same is done for acetylproteomics data.)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Name A1BG_proteomics PIK3CA_phosphoproteomics \n", "Site S312 T313\n", "Patient_ID \n", "C3L-00006 -1.180 -0.00615 0.0731\n", "C3L-00008 -0.685 -0.02220 NaN\n", "C3L-00032 -0.528 NaN 0.0830\n", "C3L-00090 -1.670 NaN -0.8460\n", "C3L-00098 -0.374 0.43600 NaN" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prot_and_phos_selected = en.join_omics_to_omics(\n", " df1_name=\"proteomics\", \n", " df2_name=\"phosphoproteomics\", \n", " genes1=\"A1BG\", \n", " genes2=\"PIK3CA\")\n", "\n", "prot_and_phos_selected.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `join_metadata_to_omics`\n", "\n", "The `join_metadata_to_omics` function joins a metadata dataframe (e.g. clinical or derived_molecular) with an -omics dataframe:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: transcriptomics data was not found for the following samples, so transcriptomics data columns were filled with NaN for these samples: C3L-00563.N, C3L-00605.N, C3L-00769.N, C3L-00770.N, C3L-00771.N, C3L-00930.N, C3L-00947.N, C3L-00963.N, C3L-01246.N, C3L-01249.N, C3L-01252.N, C3L-01256.N, C3L-01257.N, C3L-01744.N, C3N-00200.N, C3N-00729.N, C3N-01211.N, NX1.N, NX10.N, NX11.N, NX12.N, NX13.N, NX14.N, NX15.N, NX16.N, NX17.N, NX18.N, NX2.N, NX3.N, NX4.N, NX5.N, NX6.N, NX7.N, NX8.N, NX9.N (, line 1)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Name Sample_ID Sample_Tumor_Normal Proteomics_Tumor_Normal \\\n", "Patient_ID \n", "C3L-00006 S001 Tumor Tumor \n", "C3L-00008 S002 Tumor Tumor \n", "C3L-00032 S003 Tumor Tumor \n", "C3L-00090 S005 Tumor Tumor \n", "C3L-00098 S006 Tumor Tumor \n", "\n", "Name Country Histologic_Grade_FIGO Myometrial_invasion_Specify \\\n", "Patient_ID \n", "C3L-00006 United States FIGO grade 1 under 50 % \n", "C3L-00008 United States FIGO grade 1 under 50 % \n", "C3L-00032 United States FIGO grade 2 under 50 % \n", "C3L-00090 United States FIGO grade 2 under 50 % \n", "C3L-00098 United States NaN under 50 % \n", "\n", "Name Histologic_type Treatment_naive Tumor_purity \\\n", "Patient_ID \n", "C3L-00006 Endometrioid YES Normal \n", "C3L-00008 Endometrioid YES Normal \n", "C3L-00032 Endometrioid YES Normal \n", "C3L-00090 Endometrioid YES Normal \n", "C3L-00098 Serous YES Normal \n", "\n", "Name Path_Stage_Primary_Tumor-pT ... ZWILCH_transcriptomics \\\n", "Patient_ID ... \n", "C3L-00006 pT1a (FIGO IA) ... 11.06 \n", "C3L-00008 pT1a (FIGO IA) ... 10.87 \n", "C3L-00032 pT1a (FIGO IA) ... 10.06 \n", "C3L-00090 pT1a (FIGO IA) ... 10.29 \n", "C3L-00098 pT1a (FIGO IA) ... 
10.36 \n", "\n", "Name ZWINT_transcriptomics ZXDA_transcriptomics ZXDB_transcriptomics \\\n", "Patient_ID \n", "C3L-00006 10.73 8.40 9.78 \n", "C3L-00008 11.43 8.39 9.14 \n", "C3L-00032 10.13 8.35 9.27 \n", "C3L-00090 10.41 9.10 9.59 \n", "C3L-00098 11.24 8.60 9.44 \n", "\n", "Name ZXDC_transcriptomics ZYG11A_transcriptomics \\\n", "Patient_ID \n", "C3L-00006 10.88 5.93 \n", "C3L-00008 10.38 7.25 \n", "C3L-00032 10.46 6.85 \n", "C3L-00090 10.15 7.89 \n", "C3L-00098 11.80 9.32 \n", "\n", "Name ZYG11B_transcriptomics ZYX_transcriptomics ZZEF1_transcriptomics \\\n", "Patient_ID \n", "C3L-00006 11.52 10.23 11.50 \n", "C3L-00008 11.64 10.64 11.26 \n", "C3L-00032 11.60 10.21 11.51 \n", "C3L-00090 11.90 10.21 11.34 \n", "C3L-00098 11.97 9.77 11.37 \n", "\n", "Name ZZZ3_transcriptomics \n", "Patient_ID \n", "C3L-00006 11.47 \n", "C3L-00008 11.57 \n", "C3L-00032 11.09 \n", "C3L-00090 11.51 \n", "C3L-00098 12.35 \n", "\n", "[5 rows x 28084 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clin_and_tran = en.join_metadata_to_omics(metadata_df_name=\"clinical\", omics_df_name=\"transcriptomics\")\n", "clin_and_tran.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Joining only specific columns:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: transcriptomics data was not found for the following samples, so transcriptomics data columns were filled with NaN for these samples: C3L-00563.N, C3L-00605.N, C3L-00769.N, C3L-00770.N, C3L-00771.N, C3L-00930.N, C3L-00947.N, C3L-00963.N, C3L-01246.N, C3L-01249.N, C3L-01252.N, C3L-01256.N, C3L-01257.N, C3L-01744.N, C3N-00200.N, C3N-00729.N, C3N-01211.N, NX1.N, NX10.N, NX11.N, NX12.N, NX13.N, NX14.N, NX15.N, NX16.N, NX17.N, NX18.N, NX2.N, NX3.N, NX4.N, NX5.N, NX6.N, NX7.N, NX8.N, NX9.N (, line 1)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Name Age Histologic_type ZZZ3_transcriptomics\n", "Patient_ID \n", "C3L-00006 64.0 Endometrioid 11.47\n", "C3L-00008 58.0 Endometrioid 11.57\n", "C3L-00032 50.0 Endometrioid 11.09\n", "C3L-00090 75.0 Endometrioid 11.51\n", "C3L-00098 63.0 Serous 12.35" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clin_and_tran = en.join_metadata_to_omics(\n", " metadata_df_name=\"clinical\", \n", " omics_df_name=\"transcriptomics\", \n", " metadata_cols=[\"Age\", \"Histologic_type\"], \n", " omics_genes=\"ZZZ3\")\n", "\n", "clin_and_tran.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `join_metadata_to_metadata`\n", "\n", "The `join_metadata_to_metadata` function joins two metadata dataframes (e.g. clinical or derived_molecular) to each other. In the example below, we pass a column name to select from the clinical dataframe, but omit the `cols2` parameter for the derived_molecular dataframe. Because `cols2` defaults to `None`, the entire derived_molecular dataframe is selected." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Name Histologic_type Estrogen_Receptor Estrogen_Receptor_% \\\n", "Patient_ID \n", "C3L-00006 Endometrioid Cannot be determined NaN \n", "C3L-00008 Endometrioid Cannot be determined NaN \n", "C3L-00032 Endometrioid Cannot be determined NaN \n", "C3L-00090 Endometrioid Cannot be determined NaN \n", "C3L-00098 Serous Cannot be determined NaN \n", "\n", "Name Progesterone_Receptor Progesterone_Receptor_% \\\n", "Patient_ID \n", "C3L-00006 Cannot be determined NaN \n", "C3L-00008 Cannot be determined NaN \n", "C3L-00032 Cannot be determined NaN \n", "C3L-00090 Cannot be determined NaN \n", "C3L-00098 Cannot be determined NaN \n", "\n", "Name MLH1 MLH2 \\\n", "Patient_ID \n", "C3L-00006 Intact nuclear expression Intact nuclear expression \n", "C3L-00008 Intact nuclear expression Intact nuclear expression \n", "C3L-00032 Intact nuclear expression Intact nuclear expression \n", "C3L-00090 Intact nuclear expression Intact nuclear expression \n", "C3L-00098 Intact nuclear expression Intact nuclear expression \n", "\n", "Name MSH6 PMS2 \\\n", "Patient_ID \n", "C3L-00006 Loss of nuclear expression Intact nuclear expression \n", "C3L-00008 Intact nuclear expression Loss of nuclear expression \n", "C3L-00032 Intact nuclear expression Intact nuclear expression \n", "C3L-00090 Intact nuclear expression Intact nuclear expression \n", "C3L-00098 Intact nuclear expression Intact nuclear expression \n", "\n", "Name p53 ... Log2_variant_total Log2_SNP_total \\\n", "Patient_ID ... \n", "C3L-00006 Cannot be determined ... 10.062046 9.984418 \n", "C3L-00008 Cannot be determined ... 8.861087 8.330917 \n", "C3L-00032 Cannot be determined ... 5.321928 5.000000 \n", "C3L-00090 Cannot be determined ... 5.672425 5.523562 \n", "C3L-00098 Normal ... 
6.108524 5.954196 \n", "\n", "Name Log2_INDEL_total Genomics_subtype Mutation_signature_C>A \\\n", "Patient_ID \n", "C3L-00006 5.832890 MSI-H 8.300395 \n", "C3L-00008 7.169925 MSI-H 14.641745 \n", "C3L-00032 3.169925 CNV_low 16.129032 \n", "C3L-00090 2.584963 CNV_low 17.777778 \n", "C3L-00098 3.000000 CNV_high 9.836066 \n", "\n", "Name Mutation_signature_C>G Mutation_signature_C>T \\\n", "Patient_ID \n", "C3L-00006 1.482213 72.529644 \n", "C3L-00008 2.803738 64.485981 \n", "C3L-00032 3.225806 70.967742 \n", "C3L-00090 8.888889 62.222222 \n", "C3L-00098 13.114754 62.295082 \n", "\n", "Name Mutation_signature_T>C Mutation_signature_T>A \\\n", "Patient_ID \n", "C3L-00006 14.426877 1.383399 \n", "C3L-00008 15.264798 0.934579 \n", "C3L-00032 3.225806 3.225806 \n", "C3L-00090 8.888889 2.222222 \n", "C3L-00098 3.278689 8.196721 \n", "\n", "Name Mutation_signature_T>G \n", "Patient_ID \n", "C3L-00006 1.877470 \n", "C3L-00008 1.869159 \n", "C3L-00032 3.225806 \n", "C3L-00090 0.000000 \n", "C3L-00098 3.278689 \n", "\n", "[5 rows x 126 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist_and_derived_molecular = en.join_metadata_to_metadata(\n", " df1_name=\"clinical\",\n", " df2_name=\"derived_molecular\",\n", " cols1=\"Histologic_type\") # Note that we can omit the cols2 parameter, and it will by default select all of df2.\n", " # We could have also omitted cols1, if we wanted to select all of df1.\n", "\n", "hist_and_derived_molecular.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `join_omics_to_mutations`\n", "\n", "The `join_omics_to_mutations` function joins an -omics dataframe with the mutation data for a specified gene or genes. Because there may be multiple mutations for one gene in a single sample, the mutation type and location data are returned in lists by default, even if there is only one mutation. 
If a sample has no mutation in the gene, the mutation list contains either \"Wildtype_Tumor\" or \"Wildtype_Normal\", depending on whether it is a tumor or normal sample, and the location list contains \"No_mutation\". To help with parsing, the mutation status column contains one of \"Single_mutation\", \"Multiple_mutation\", \"Wildtype_Tumor\", or \"Wildtype_Normal\".\n", "\n", "(Note: You can hide the location columns by passing `False` to the optional `show_location` parameter.)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Name AURKA_proteomics TP53_proteomics \\\n", "Patient_ID \n", "C3L-00006 NaN 0.2950 \n", "C3L-00008 0.31100 0.2770 \n", "C3L-00032 NaN -0.8710 \n", "C3L-00090 -0.79800 -0.3430 \n", "C3L-00098 3.11000 3.0100 \n", "C3L-00136 -1.65000 -0.1480 \n", "C3L-00137 NaN 0.4410 \n", "C3L-00139 0.84800 -1.2200 \n", "C3L-00143 -1.73000 -0.0825 \n", "C3L-00145 -0.00513 -0.1810 \n", "\n", "Name PTEN_Mutation PTEN_Location \\\n", "Patient_ID \n", "C3L-00006 [Missense_Mutation, Nonsense_Mutation] [p.R130Q, p.R233*] \n", "C3L-00008 [Missense_Mutation] [p.G127R] \n", "C3L-00032 [Nonsense_Mutation] [p.W111*] \n", "C3L-00090 [Missense_Mutation] [p.R130G] \n", "C3L-00098 [Wildtype_Tumor] [No_mutation] \n", "C3L-00136 [Missense_Mutation, Missense_Mutation] [p.Y68C, p.R130G] \n", "C3L-00137 [Frame_Shift_Ins, Nonsense_Mutation] [p.H118Qfs*8, p.Y180*] \n", "C3L-00139 [Wildtype_Tumor] [No_mutation] \n", "C3L-00143 [Missense_Mutation] [p.R130G] \n", "C3L-00145 [Missense_Mutation, Frame_Shift_Ins] [p.H93R, p.E242*] \n", "\n", "Name PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 Multiple_mutation Tumor \n", "C3L-00008 Single_mutation Tumor \n", "C3L-00032 Single_mutation Tumor \n", "C3L-00090 Single_mutation Tumor \n", "C3L-00098 Wildtype_Tumor Tumor \n", "C3L-00136 Multiple_mutation Tumor \n", "C3L-00137 Multiple_mutation Tumor \n", "C3L-00139 Wildtype_Tumor Tumor \n", "C3L-00143 Single_mutation Tumor \n", "C3L-00145 Multiple_mutation Tumor " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selected_prot_and_PTEN_mut = en.join_omics_to_mutations(\n", " omics_df_name=\"proteomics\",\n", " mutations_genes=\"PTEN\", \n", " omics_genes=[\"AURKA\", \"TP53\"])\n", "\n", "selected_prot_and_PTEN_mut.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filtering multiple mutations\n", "\n", "The function can filter multiple mutations down to a single mutation per sample. You can specify particular mutation types or locations to prioritize, and a default sorting hierarchy is applied to all other mutations: it prefers truncation mutations over missense mutations and chooses silent mutations last. If there are multiple mutations of the same type, it chooses the one that occurs earlier in the peptide sequence.\n", "\n", "To filter all mutations using only this default hierarchy, pass an empty list to the optional `mutations_filter` parameter. In sample C3L-00006, the nonsense mutation (p.R233*) is chosen over the missense mutation (p.R130Q) because it is a type of truncation mutation, even though the missense mutation occurs earlier in the peptide sequence. In sample C3L-00137, both mutations (a frameshift insertion and a nonsense mutation) are truncation mutations, so the function chooses the earlier one, p.H118Qfs*8." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Name AURKA_proteomics TP53_proteomics PTEN_Mutation \\\n", "Patient_ID \n", "C3L-00006 NaN 0.295 Nonsense_Mutation \n", "C3L-00137 NaN 0.441 Frame_Shift_Ins \n", "\n", "Name PTEN_Location PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 p.R233* Multiple_mutation Tumor \n", "C3L-00137 p.H118Qfs*8 Multiple_mutation Tumor " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PTEN_default_filter = en.join_omics_to_mutations(omics_df_name=\"proteomics\", mutations_genes=\"PTEN\", \n", " omics_genes=[\"AURKA\", \"TP53\"],\n", " mutations_filter=[])\n", "PTEN_default_filter.loc[[\"C3L-00006\", \"C3L-00137\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To prioritize a particular mutation type or location, include it in the `mutations_filter` list. Below, we tell the function to prioritize nonsense mutations over all other mutations. In sample C3L-00137, the nonsense mutation (p.Y180*) is now selected instead of the frameshift insertion, even though the nonsense mutation occurs later in the peptide sequence." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Name AURKA_proteomics TP53_proteomics PTEN_Mutation \\\n", "Patient_ID \n", "C3L-00006 NaN 0.295 Nonsense_Mutation \n", "C3L-00137 NaN 0.441 Nonsense_Mutation \n", "\n", "Name PTEN_Location PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 p.R233* Multiple_mutation Tumor \n", "C3L-00137 p.Y180* Multiple_mutation Tumor " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PTEN_simple_filter = en.join_omics_to_mutations(omics_df_name=\"proteomics\", mutations_genes=\"PTEN\", \n", " omics_genes=[\"AURKA\", \"TP53\"], \n", " mutations_filter=[\"Nonsense_Mutation\"])\n", "PTEN_simple_filter.loc[[\"C3L-00006\", \"C3L-00137\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can include multiple mutation types and/or locations in the `mutations_filter` list; values earlier in the list are prioritized over values later in the list. With the filter specified below, the function selects sample C3L-00006's missense mutation (p.R130Q) over its nonsense mutation, because that missense mutation's location is the first value in the filter list. Nonsense_Mutation is still in the list, but it comes after p.R130Q, so C3L-00006's missense mutation takes priority. For all other samples, unless they also have a mutation at p.R130Q, the function continues to prioritize nonsense mutations, as we see in sample C3L-00137." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameAURKA_proteomicsTP53_proteomicsPTEN_MutationPTEN_LocationPTEN_Mutation_StatusSample_Status
Patient_ID
C3L-00006NaN0.295Missense_Mutationp.R130QMultiple_mutationTumor
C3L-00137NaN0.441Nonsense_Mutationp.Y180*Multiple_mutationTumor
\n", "
" ], "text/plain": [ "Name AURKA_proteomics TP53_proteomics PTEN_Mutation \\\n", "Patient_ID \n", "C3L-00006 NaN 0.295 Missense_Mutation \n", "C3L-00137 NaN 0.441 Nonsense_Mutation \n", "\n", "Name PTEN_Location PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 p.R130Q Multiple_mutation Tumor \n", "C3L-00137 p.Y180* Multiple_mutation Tumor " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PTEN_complex_filter = en.join_omics_to_mutations(omics_df_name=\"proteomics\", mutations_genes=\"PTEN\", \n", " omics_genes=[\"AURKA\", \"TP53\"], \n", " mutations_filter=[\"p.R130Q\", \"Nonsense_Mutation\"])\n", "PTEN_complex_filter.loc[[\"C3L-00006\", \"C3L-00137\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `join_metadata_to_mutations`\n", "\n", "The `join_metadata_to_mutations` function works exactly like `join_omics_to_mutations`, except that it works with metadata dataframes (e.g. clinical and derived molecular) instead of omics dataframes. It can also reduce samples with multiple mutations to a single mutation, which you control through the `mutations_filter` parameter, and it can hide the location columns." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameHistologic_typePTEN_MutationPTEN_LocationPTEN_Mutation_StatusSample_Status
Patient_ID
C3L-00006Endometrioid[Missense_Mutation, Nonsense_Mutation][p.R130Q, p.R233*]Multiple_mutationTumor
C3L-00008Endometrioid[Missense_Mutation][p.G127R]Single_mutationTumor
C3L-00032Endometrioid[Nonsense_Mutation][p.W111*]Single_mutationTumor
C3L-00090Endometrioid[Missense_Mutation][p.R130G]Single_mutationTumor
C3L-00098Serous[Wildtype_Tumor][No_mutation]Wildtype_TumorTumor
\n", "
" ], "text/plain": [ "Name Histologic_type PTEN_Mutation \\\n", "Patient_ID \n", "C3L-00006 Endometrioid [Missense_Mutation, Nonsense_Mutation] \n", "C3L-00008 Endometrioid [Missense_Mutation] \n", "C3L-00032 Endometrioid [Nonsense_Mutation] \n", "C3L-00090 Endometrioid [Missense_Mutation] \n", "C3L-00098 Serous [Wildtype_Tumor] \n", "\n", "Name PTEN_Location PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 [p.R130Q, p.R233*] Multiple_mutation Tumor \n", "C3L-00008 [p.G127R] Single_mutation Tumor \n", "C3L-00032 [p.W111*] Single_mutation Tumor \n", "C3L-00090 [p.R130G] Single_mutation Tumor \n", "C3L-00098 [No_mutation] Wildtype_Tumor Tumor " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist_and_PTEN = en.join_metadata_to_mutations(\n", " metadata_df_name=\"clinical\",\n", " mutations_genes=\"PTEN\",\n", " metadata_cols=\"Histologic_type\")\n", "\n", "hist_and_PTEN.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `mutations_filter` set, each sample with multiple PTEN mutations is reduced to a single mutation. Here, nonsense mutations are prioritized, so C3L-00006 shows only its p.R233* nonsense mutation:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 69 samples for the PTEN gene (, line 1)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameHistologic_typePTEN_MutationPTEN_LocationPTEN_Mutation_StatusSample_Status
Patient_ID
C3L-00006EndometrioidNonsense_Mutationp.R233*Multiple_mutationTumor
C3L-00008EndometrioidMissense_Mutationp.G127RSingle_mutationTumor
C3L-00032EndometrioidNonsense_Mutationp.W111*Single_mutationTumor
C3L-00090EndometrioidMissense_Mutationp.R130GSingle_mutationTumor
C3L-00098SerousWildtype_TumorNo_mutationWildtype_TumorTumor
\n", "
" ], "text/plain": [ "Name Histologic_type PTEN_Mutation PTEN_Location \\\n", "Patient_ID \n", "C3L-00006 Endometrioid Nonsense_Mutation p.R233* \n", "C3L-00008 Endometrioid Missense_Mutation p.G127R \n", "C3L-00032 Endometrioid Nonsense_Mutation p.W111* \n", "C3L-00090 Endometrioid Missense_Mutation p.R130G \n", "C3L-00098 Serous Wildtype_Tumor No_mutation \n", "\n", "Name PTEN_Mutation_Status Sample_Status \n", "Patient_ID \n", "C3L-00006 Multiple_mutation Tumor \n", "C3L-00008 Single_mutation Tumor \n", "C3L-00032 Single_mutation Tumor \n", "C3L-00090 Single_mutation Tumor \n", "C3L-00098 Wildtype_Tumor Tumor " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist_and_PTEN = en.join_metadata_to_mutations(\n", " metadata_df_name=\"clinical\",\n", " mutations_genes=\"PTEN\",\n", " metadata_cols=\"Histologic_type\",\n", " mutations_filter=[\"Nonsense_Mutation\"])\n", "\n", "hist_and_PTEN.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting dataframes\n", "\n", "To export a dataframe to a file, call the dataframe's `to_csv` method, passing the path to save the file to and the value separator you want. Here we use a tab separator to write a tab-separated (.tsv) file:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "hist_and_PTEN.to_csv(path_or_buf=\"histologic_type_and_PTEN_mutation.tsv\", sep='\\t')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }