{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Case 3: Associating Clinical Variables with Acetylation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this use case, we show how to analyze acetylation data together with a clinical attribute. We will use the clinical attribute \"Histologic_type\", but you can apply the same process to many other clinical attributes. Our goal is to identify which acetylation sites differ significantly in frequency between non-tumor, serous, and endometrioid cells." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Import Packages and Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will start by importing the data analysis tools we need, importing the cptac package, and loading the Endometrial dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy.stats\n", "import statsmodels.stats.multitest\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import math\n", "import cptac\n", "import cptac.utils as ut\n", "\n", "cptac.download(dataset=\"endometrial\", version=\"latest\")\n", "en = cptac.Endometrial()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2: Choose Clinical Attribute and Join Dataframes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this use case, we will use the \"Histologic_type\" clinical attribute to find differences in acetylation sites between \"endometrioid\" and \"serous\" cancer cells.
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#Set desired attribute to variable 'clinical_attribute'\n", "clinical_attribute = \"Histologic_type\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we will join our desired clinical attribute with our acetylation dataframe using the `en.join_metadata_to_omics` method." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Patient_ID | Histologic_type | A2M_acetylproteomics_K1168 | A2M_acetylproteomics_K1176 | A2M_acetylproteomics_K135 | A2M_acetylproteomics_K145 | A2M_acetylproteomics_K516 | A2M_acetylproteomics_K664 | A2M_acetylproteomics_K682 | AACS_acetylproteomics_K391 | AAGAB_acetylproteomics_K290 | ... | ZSCAN31_acetylproteomics_K215 | ZSCAN32_acetylproteomics_K659 | ZW10_acetylproteomics_K634 | ZYX_acetylproteomics_K24 | ZYX_acetylproteomics_K25 | ZYX_acetylproteomics_K265 | ZYX_acetylproteomics_K272 | ZYX_acetylproteomics_K279 | ZYX_acetylproteomics_K533 | ZZZ3_acetylproteomics_K117 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C3L-00006 | Endometrioid | NaN | 1.080 | NaN | NaN | NaN | NaN | NaN | NaN | 0.461 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| C3L-00008 | Endometrioid | NaN | 0.477 | NaN | NaN | NaN | NaN | NaN | NaN | 1.770 | ... | -0.104 | -0.80300 | NaN | -0.988 | -0.343 | -0.307 | NaN | -0.0955 | NaN | NaN |
| C3L-00032 | Endometrioid | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -0.815 | ... | NaN | NaN | NaN | -0.459 | -1.170 | NaN | NaN | -0.7050 | 0.089 | NaN |
| C3L-00090 | Endometrioid | NaN | -0.608 | NaN | NaN | -0.919 | NaN | NaN | NaN | NaN | ... | -0.457 | -0.00175 | -0.33 | NaN | -0.537 | NaN | NaN | -0.3700 | NaN | NaN |
| C3L-00098 | Serous | NaN | 1.630 | NaN | 2.4 | NaN | NaN | 1.26 | NaN | 0.205 | ... | NaN | 0.41100 | NaN | NaN | -0.358 | NaN | NaN | -0.9700 | NaN | NaN |
5 rows × 10863 columns
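
To make the join step concrete, here is a minimal sketch in plain pandas on made-up data. The patient IDs, site names, and values are assumptions for illustration only. It shows an index-aligned join on `Patient_ID` that yields a table shaped like the one above; it is not a description of how cptac implements `join_metadata_to_omics` internally.

```python
import numpy as np
import pandas as pd

# Toy clinical table indexed by Patient_ID (values are made up for illustration)
patient_ids = pd.Index(["P1", "P2", "P3"], name="Patient_ID")
clinical = pd.DataFrame(
    {"Histologic_type": ["Endometrioid", "Endometrioid", "Serous"]},
    index=patient_ids,
)

# Toy acetylation table: one column per (hypothetical) site, same index
acetyl = pd.DataFrame(
    {
        "GENE1_acetylproteomics_K10": [0.5, np.nan, 1.2],
        "GENE2_acetylproteomics_K20": [-0.3, 0.1, np.nan],
    },
    index=patient_ids,
)

# Index-aligned join: the clinical column comes first, then every site column
joined = clinical[["Histologic_type"]].join(acetyl, how="inner")
print(joined)
```

Missing measurements stay as `NaN` after the join, which is why the real table above contains so many `NaN` entries.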
|  | Comparison | P_Value |
|---|---|---|
| 0 | TBL1XR1_acetylproteomics_K102 | 0.000114 |
| 1 | FOXA2_acetylproteomics_K280 | 0.000153 |
| 2 | SRRT_acetylproteomics_K720 | 0.000257 |
| 3 | TOP2A_acetylproteomics_K1433 | 0.000660 |
| 4 | NCL_acetylproteomics_K398 | 0.001211 |
| 5 | MEAF6_acetylproteomics_K69 | 0.002100 |
| 6 | JADE3_acetylproteomics_K32 | 0.003546 |
| 7 | NOP2_acetylproteomics_K91 | 0.008264 |
| 8 | TOP2A_acetylproteomics_K1422 | 0.010327 |
| 9 | MCRS1_acetylproteomics_K136 | 0.015518 |
| 10 | PRR15_acetylproteomics_K81 | 0.027040 |
| 11 | FUS_acetylproteomics_K332 | 0.035779 |
| 12 | SUPT16H_acetylproteomics_K674 | 0.036792 |
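
A table of this shape, one row per site with a p-value, can be produced by running a two-sample t-test on each site column. The sketch below is one possible way to do that on toy data. The helper name `ttest_per_site`, the site names, and the values are hypothetical, and the choice of Welch's t-test with a Bonferroni threshold is an assumption; it is not confirmed to be the method that generated the table above.

```python
import numpy as np
import pandas as pd
import scipy.stats


def ttest_per_site(df, group_col, group_a, group_b, alpha=0.05):
    """Hypothetical helper: Welch t-test per site, Bonferroni-filtered."""
    rows = []
    for col in df.columns.drop(group_col):
        a = df.loc[df[group_col] == group_a, col].dropna()
        b = df.loc[df[group_col] == group_b, col].dropna()
        # Skip sites with too few measurements in either group
        if len(a) < 2 or len(b) < 2:
            continue
        _, pval = scipy.stats.ttest_ind(a, b, equal_var=False)
        rows.append({"Comparison": col, "P_Value": pval})
    results = pd.DataFrame(rows, columns=["Comparison", "P_Value"])
    # Bonferroni: divide alpha by the number of tests actually run
    threshold = alpha / max(len(results), 1)
    significant = results[results["P_Value"] <= threshold]
    return significant.sort_values("P_Value").reset_index(drop=True)


# Toy data (made up): SITE_A differs between groups, SITE_B does not,
# and SITE_C has too few measurements to test.
base = [0.0, 0.1, 0.2, 0.3, 0.4]
toy = pd.DataFrame(
    {
        "Histologic_type": ["Endometrioid"] * 5 + ["Serous"] * 5,
        "SITE_A_K1": base + [v + 5.0 for v in base],
        "SITE_B_K2": base + base,
        "SITE_C_K3": [1.0] + [np.nan] * 4 + base,
    }
)

significant = ttest_per_site(toy, "Histologic_type", "Endometrioid", "Serous")
print(significant)
```

Dropping `NaN` values per site before testing matters here because many acetylation sites are measured in only a subset of patients.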