{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Case 7: Trans genetic effects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trans genetic effects occur when a DNA mutation in one gene affects a different gene, for example by changing the abundance of that gene's protein. To understand these effects, we will look at proteins that may be influenced by mutations in two prominent cancer genes, ARID1A and TP53." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Part I: ARID1A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ARID1A is a chromatin remodeling protein, so a mutation in it may affect the transcription of many genes. We will analyze the proteins that interact with ARID1A to look for possible trans effects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Import Libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Begin by importing the libraries used in this analysis: pandas and numpy for data handling and numerical computation, scipy.stats for statistical tests, matplotlib and seaborn for plotting, and cptac for accessing data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). We also import cptac.utils, which provides helper functions." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy.stats\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import cptac\n", "import cptac.utils as ut\n", "\n", "# Load the CPTAC endometrial cancer (UCEC) dataset\n", "en = cptac.Ucec()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will conduct our analysis using endometrial cancer data, but the same methods can be applied to the other cancer types available in CPTAC."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2: Retrieve Interacting Proteins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we get a list of proteins reported to physically associate with ARID1A in BioPlex, a protein-protein interaction network. The cptac.utils module provides a function called get_interacting_proteins_bioplex, which returns a list of proteins that interact with a specified gene." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interacting Proteins:\n", "['SMARCC2', 'DPF2', 'DPF1', 'SS18L2', 'SMARCE1', 'TEX13B', 'SMARCD1', 'WWP2', 'BCL7A', 'BCL7C', 'SS18', 'SMARCB1', 'DPF3']\n" ] } ], "source": [ "gene = \"ARID1A\"\n", "omics = \"proteomics\"\n", "interacting_proteins = ut.get_interacting_proteins_bioplex(gene)\n", "print(\"Interacting Proteins:\")\n", "print(interacting_proteins)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 3: Obtain Omics Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, use the en.join_omics_to_mutations method to create a new dataframe that joins the protein measurements for the interacting proteins with the ARID1A mutation data. If a requested gene is not present in the proteomics data, the method issues a warning and fills that gene's column with NaN." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: The following columns were not found in the umich proteomics dataframe, so they were inserted into joined table, but filled with NaN: DPF1, TEX13B (C:\\Users\\sabme\\anaconda3\\lib\\site-packages\\cptac\\cancers\\cancer.py, line 525)\n", "cptac warning: Your version of cptac (1.5.1) is out-of-date. Latest is 1.5.0. Please run 'pip install --upgrade cptac' to update it. 
(C:\\Users\\sabme\\anaconda3\\lib\\threading.py, line 910)\n", "cptac warning: In joining the somatic_mutation table, no mutations were found for the following samples, so they were filled with Wildtype_Tumor or Wildtype_Normal: 107 samples for the ARID1A gene (C:\\Users\\sabme\\anaconda3\\lib\\site-packages\\cptac\\cancers\\cancer.py, line 325)\n" ] }, { "data": { "text/html": [ "
| Patient_ID | BCL7A_umich_proteomics | BCL7A_umich_proteomics | BCL7C_umich_proteomics | DPF1_umich_proteomics | DPF2_umich_proteomics | DPF3_umich_proteomics | SMARCB1_umich_proteomics | SMARCC2_umich_proteomics | SMARCC2_umich_proteomics | SMARCD1_umich_proteomics | SMARCE1_umich_proteomics | SS18_umich_proteomics | SS18L2_umich_proteomics | TEX13B_umich_proteomics | WWP2_umich_proteomics | ARID1A_Mutation | ARID1A_Location | ARID1A_Mutation_Status_washu_somatic_mutation | ARID1A_Mutation_Status_washu_somatic_mutation | Sample_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C3L-00006 | NaN | 0.342403 | -0.636899 | NaN | -0.341313 | NaN | -0.276047 | -0.518042 | -0.147223 | -0.180085 | -0.255385 | -0.184546 | NaN | NaN | 0.102838 | [Missense_Mutation] | [p.T2121P] | Single_mutation | Single_mutation | Tumor |
| C3L-00008 | 0.279261 | 0.043996 | -0.853702 | NaN | -0.613538 | NaN | -0.412291 | -0.575983 | -0.913941 | -0.242887 | -0.506047 | -0.045882 | -0.069976 | NaN | 0.283238 | [Nonsense_Mutation, Frame_Shift_Del] | [p.Q403*, p.D1850Tfs*33] | Multiple_mutation | Multiple_mutation | Tumor |
| C3L-00032 | NaN | 0.012216 | -0.405616 | NaN | -0.311407 | NaN | -0.227223 | -0.550890 | -0.500229 | -0.609800 | -0.394279 | -0.353417 | -0.524786 | NaN | 0.225262 | [Wildtype_Tumor] | [No_mutation] | NaN | NaN | Tumor |
| C3L-00084 | NaN | 0.006604 | 0.209216 | NaN | 0.453395 | NaN | 0.311589 | -0.041216 | -0.520971 | 1.456642 | 0.582811 | 0.309990 | NaN | NaN | -0.191736 | [Wildtype_Tumor] | [No_mutation] | NaN | NaN | Tumor |
| C3L-00090 | NaN | 0.548479 | -0.049807 | NaN | 0.201228 | NaN | 0.364734 | -0.063142 | 0.412197 | 0.210401 | 0.170752 | 0.061522 | 0.056844 | NaN | 0.071053 | [Wildtype_Tumor] | [No_mutation] | NaN | NaN | Tumor |