{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 4: Multi-level column indices (`MultiIndex`)\n", "\n", "A MultiIndex is an index with multiple hierarchical levels. We use a MultiIndex for the column headers of a dataframe. This allows each column to have multiple keys associated with it, one key for each level.\n", "\n", "The MultiIndex is a native part of the `pandas` package. For more documentation, see the pandas [MultiIndex / advanced indexing page](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html).\n", "\n", "MultiIndexes make it easy to associate multiple identifiers with a single column. In this tutorial, we'll work with phosphoproteomics data from the endometrial cancer dataset.\n", "\n", "Phosphoproteomics is the large-scale study of proteins that have undergone a process called phosphorylation. Phosphorylation is particularly important in the context of disease, because it is often disrupted in cancer cells.\n", "\n", "Let's start by importing the required package and loading the endometrial dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import cptac\n", "en = cptac.Ucec()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see the data sources available in our dataset." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Data type</th>\n", " <th>Available sources</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>CNV</td>\n", " <td>[bcm, washu]</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>circular_RNA</td>\n", " <td>[bcm]</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>miRNA</td>\n", " <td>[bcm, washu]</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>proteomics</td>\n", " <td>[bcm, umich]</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>transcriptomics</td>\n", " <td>[bcm, broad, washu]</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>ancestry_prediction</td>\n", " <td>[harmonized]</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>somatic_mutation</td>\n", " <td>[harmonized, washu]</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>clinical</td>\n", " <td>[mssm]</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>follow-up</td>\n", " <td>[mssm]</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>medical_history</td>\n", " <td>[mssm]</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>acetylproteomics</td>\n", " <td>[umich]</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>phosphoproteomics</td>\n", " <td>[umich]</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>cibersort</td>\n", " <td>[washu]</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>hla_typing</td>\n", " <td>[washu]</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>tumor_purity</td>\n", " <td>[washu]</td>\n", " </tr>\n", " 
<tr>\n", " <th>15</th>\n", " <td>xcell</td>\n", " <td>[washu]</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Data type Available sources\n", "0 CNV [bcm, washu]\n", "1 circular_RNA [bcm]\n", "2 miRNA [bcm, washu]\n", "3 proteomics [bcm, umich]\n", "4 transcriptomics [bcm, broad, washu]\n", "5 ancestry_prediction [harmonized]\n", "6 somatic_mutation [harmonized, washu]\n", "7 clinical [mssm]\n", "8 follow-up [mssm]\n", "9 medical_history [mssm]\n", "10 acetylproteomics [umich]\n", "11 phosphoproteomics [umich]\n", "12 cibersort [washu]\n", "13 hla_typing [washu]\n", "14 tumor_purity [washu]\n", "15 xcell [washu]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "en.list_data_sources()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we see different types of omics data (e.g., CNV, proteomics, phosphoproteomics) available from various sources (e.g., washu, umich).\n", "\n", "We will retrieve phosphoproteomics, proteomics and CNV data from the respective sources." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: Your version of cptac (1.5.1) is out-of-date. Latest is 1.5.0. Please run 'pip install --upgrade cptac' to update it. 
(C:\\Users\\sabme\\anaconda3\\lib\\threading.py, line 910)\n" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>A1BG</th>\n", " <th>A1CF</th>\n", " <th>A2M</th>\n", " <th>A2ML1</th>\n", " <th>A3GALT2</th>\n", " <th>A4GALT</th>\n", " <th>A4GNT</th>\n", " <th>AAAS</th>\n", " <th>AACS</th>\n", " <th>AADAC</th>\n", " <th>...</th>\n", " <th>ZW10</th>\n", " <th>ZWILCH</th>\n", " <th>ZWINT</th>\n", " <th>ZXDC</th>\n", " <th>ZYG11A</th>\n", " <th>ZYG11B</th>\n", " <th>ZYX</th>\n", " <th>ZZEF1</th>\n", " <th>ZZZ3</th>\n", " <th>pk</th>\n", " </tr>\n", " <tr>\n", " <th>Database_ID</th>\n", " <th>ENSG00000121410.10</th>\n", " <th>ENSG00000148584.13</th>\n", " <th>ENSG00000175899.13</th>\n", " <th>ENSG00000166535.18</th>\n", " <th>ENSG00000184389.9</th>\n", " <th>ENSG00000128274.14</th>\n", " <th>ENSG00000118017.3</th>\n", " <th>ENSG00000094914.11</th>\n", " <th>ENSG00000081760.15</th>\n", " <th>ENSG00000114771.12</th>\n", " <th>...</th>\n", " <th>ENSG00000086827.7</th>\n", " <th>ENSG00000174442.10</th>\n", " <th>ENSG00000122952.15</th>\n", " <th>ENSG00000070476.13</th>\n", " <th>ENSG00000203995.8</th>\n", " <th>ENSG00000162378.11</th>\n", " <th>ENSG00000159840.14</th>\n", " <th>ENSG00000074755.13</th>\n", " <th>ENSG00000036549.11</th>\n", " <th>ENSG00000091436.15</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " 
<th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.00659</td>\n", " <td>-0.01982</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01418</td>\n", " <td>-0.00839</td>\n", " <td>-0.01305</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01305</td>\n", " <td>...</td>\n", " <td>-0.01641</td>\n", " <td>-0.00963</td>\n", " <td>-0.01982</td>\n", " <td>-0.01305</td>\n", " <td>-0.01418</td>\n", " <td>-0.01418</td>\n", " <td>-0.01897</td>\n", " <td>-0.00529</td>\n", " <td>-0.01418</td>\n", " <td>-0.01480</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.02578</td>\n", " <td>0.00726</td>\n", " <td>0.01350</td>\n", " <td>0.01350</td>\n", " <td>0.00732</td>\n", " <td>0.01642</td>\n", " <td>0.01005</td>\n", " <td>0.01225</td>\n", " <td>0.01225</td>\n", " <td>0.01005</td>\n", " <td>...</td>\n", " <td>0.01583</td>\n", " <td>0.01844</td>\n", " <td>0.00726</td>\n", " <td>0.01005</td>\n", " <td>0.00732</td>\n", " <td>0.00732</td>\n", " <td>0.01200</td>\n", " <td>0.01969</td>\n", " <td>0.00732</td>\n", " <td>0.01121</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.01262</td>\n", " <td>0.00425</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>0.00166</td>\n", " <td>0.00549</td>\n", " <td>-0.00038</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>-0.00038</td>\n", " <td>...</td>\n", " <td>-0.00305</td>\n", " <td>0.00214</td>\n", " <td>0.00425</td>\n", " <td>-0.00038</td>\n", " <td>0.00166</td>\n", " <td>0.00166</td>\n", " <td>0.01408</td>\n", " <td>0.00683</td>\n", " <td>0.00166</td>\n", " <td>0.00208</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.00100</td>\n", " <td>0.41191</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.02436</td>\n", " <td>-0.01198</td>\n", " <td>-0.03307</td>\n", " <td>-0.02299</td>\n", " 
<td>-0.02299</td>\n", " <td>-0.03307</td>\n", " <td>...</td>\n", " <td>-0.01982</td>\n", " <td>-0.02071</td>\n", " <td>0.41191</td>\n", " <td>-0.61621</td>\n", " <td>-0.02436</td>\n", " <td>-0.02436</td>\n", " <td>-0.02182</td>\n", " <td>-0.00336</td>\n", " <td>-0.02436</td>\n", " <td>-0.02548</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00098</th>\n", " <td>1.01075</td>\n", " <td>0.27221</td>\n", " <td>-0.39802</td>\n", " <td>-0.39802</td>\n", " <td>0.00226</td>\n", " <td>0.31684</td>\n", " <td>0.31108</td>\n", " <td>-0.38507</td>\n", " <td>-0.41089</td>\n", " <td>0.31108</td>\n", " <td>...</td>\n", " <td>-0.39843</td>\n", " <td>-0.38591</td>\n", " <td>0.27221</td>\n", " <td>0.31108</td>\n", " <td>0.01711</td>\n", " <td>0.01711</td>\n", " <td>-0.01434</td>\n", " <td>-0.34344</td>\n", " <td>-0.01427</td>\n", " <td>0.53267</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>C3N-01520</th>\n", " <td>-0.05661</td>\n", " <td>-0.06508</td>\n", " <td>-0.06174</td>\n", " <td>-0.06174</td>\n", " <td>-0.05318</td>\n", " <td>-0.05054</td>\n", " <td>-0.06413</td>\n", " <td>-0.06174</td>\n", " <td>-0.06174</td>\n", " <td>-0.06413</td>\n", " <td>...</td>\n", " <td>-0.05110</td>\n", " <td>-0.05228</td>\n", " <td>-0.06508</td>\n", " <td>-0.06413</td>\n", " <td>-0.05318</td>\n", " <td>-0.05318</td>\n", " <td>0.37759</td>\n", " <td>-0.04914</td>\n", " <td>-0.05318</td>\n", " <td>-0.05533</td>\n", " </tr>\n", " <tr>\n", " <th>C3N-01521</th>\n", " <td>-0.36477</td>\n", " <td>-0.00244</td>\n", " <td>-0.06953</td>\n", " <td>-0.06953</td>\n", " <td>0.38241</td>\n", " 
<td>-0.34150</td>\n", " <td>0.31502</td>\n", " <td>-0.06953</td>\n", " <td>-0.04038</td>\n", " <td>0.31502</td>\n", " <td>...</td>\n", " <td>-0.05879</td>\n", " <td>-0.19572</td>\n", " <td>-0.00244</td>\n", " <td>0.79082</td>\n", " <td>0.38241</td>\n", " <td>0.38241</td>\n", " <td>0.58054</td>\n", " <td>-0.34785</td>\n", " <td>0.38241</td>\n", " <td>0.00842</td>\n", " </tr>\n", " <tr>\n", " <th>C3N-01537</th>\n", " <td>0.09203</td>\n", " <td>0.00535</td>\n", " <td>0.08807</td>\n", " <td>0.08807</td>\n", " <td>-0.19341</td>\n", " <td>-0.29275</td>\n", " <td>0.08943</td>\n", " <td>0.08807</td>\n", " <td>0.07883</td>\n", " <td>0.08943</td>\n", " <td>...</td>\n", " <td>-0.11330</td>\n", " <td>0.07121</td>\n", " <td>0.00535</td>\n", " <td>0.08943</td>\n", " <td>-0.10289</td>\n", " <td>-0.10289</td>\n", " <td>-0.01151</td>\n", " <td>-0.30164</td>\n", " <td>-0.10289</td>\n", " <td>0.02389</td>\n", " </tr>\n", " <tr>\n", " <th>C3N-01802</th>\n", " <td>-0.06298</td>\n", " <td>-0.04134</td>\n", " <td>0.05070</td>\n", " <td>0.05070</td>\n", " <td>-0.00420</td>\n", " <td>-0.18427</td>\n", " <td>0.17981</td>\n", " <td>0.08711</td>\n", " <td>-0.12670</td>\n", " <td>0.17981</td>\n", " <td>...</td>\n", " <td>0.14454</td>\n", " <td>0.05509</td>\n", " <td>-0.04134</td>\n", " <td>0.17981</td>\n", " <td>-0.12128</td>\n", " <td>-0.12128</td>\n", " <td>-0.06015</td>\n", " <td>0.14747</td>\n", " <td>-0.13738</td>\n", " <td>-0.01938</td>\n", " </tr>\n", " <tr>\n", " <th>C3N-01825</th>\n", " <td>0.12974</td>\n", " <td>0.03784</td>\n", " <td>0.11400</td>\n", " <td>0.11400</td>\n", " <td>0.04662</td>\n", " <td>0.01662</td>\n", " <td>0.13939</td>\n", " <td>0.11400</td>\n", " <td>-0.00039</td>\n", " <td>0.13939</td>\n", " <td>...</td>\n", " <td>0.02229</td>\n", " <td>-0.00842</td>\n", " <td>0.03784</td>\n", " <td>0.13939</td>\n", " <td>0.04662</td>\n", " <td>0.04662</td>\n", " <td>0.03222</td>\n", " <td>-0.02923</td>\n", " <td>0.04662</td>\n", " <td>0.02880</td>\n", " </tr>\n", " </tbody>\n", 
"</table>\n", "<p>95 rows × 18919 columns</p>\n", "</div>" ], "text/plain": [ "Name A1BG A1CF A2M \\\n", "Database_ID ENSG00000121410.10 ENSG00000148584.13 ENSG00000175899.13 \n", "Patient_ID \n", "C3L-00006 -0.00659 -0.01982 -0.01402 \n", "C3L-00008 0.02578 0.00726 0.01350 \n", "C3L-00032 0.01262 0.00425 -0.00275 \n", "C3L-00090 0.00100 0.41191 -0.02299 \n", "C3L-00098 1.01075 0.27221 -0.39802 \n", "... ... ... ... \n", "C3N-01520 -0.05661 -0.06508 -0.06174 \n", "C3N-01521 -0.36477 -0.00244 -0.06953 \n", "C3N-01537 0.09203 0.00535 0.08807 \n", "C3N-01802 -0.06298 -0.04134 0.05070 \n", "C3N-01825 0.12974 0.03784 0.11400 \n", "\n", "Name A2ML1 A3GALT2 A4GALT \\\n", "Database_ID ENSG00000166535.18 ENSG00000184389.9 ENSG00000128274.14 \n", "Patient_ID \n", "C3L-00006 -0.01402 -0.01418 -0.00839 \n", "C3L-00008 0.01350 0.00732 0.01642 \n", "C3L-00032 -0.00275 0.00166 0.00549 \n", "C3L-00090 -0.02299 -0.02436 -0.01198 \n", "C3L-00098 -0.39802 0.00226 0.31684 \n", "... ... ... ... \n", "C3N-01520 -0.06174 -0.05318 -0.05054 \n", "C3N-01521 -0.06953 0.38241 -0.34150 \n", "C3N-01537 0.08807 -0.19341 -0.29275 \n", "C3N-01802 0.05070 -0.00420 -0.18427 \n", "C3N-01825 0.11400 0.04662 0.01662 \n", "\n", "Name A4GNT AAAS AACS \\\n", "Database_ID ENSG00000118017.3 ENSG00000094914.11 ENSG00000081760.15 \n", "Patient_ID \n", "C3L-00006 -0.01305 -0.01402 -0.01402 \n", "C3L-00008 0.01005 0.01225 0.01225 \n", "C3L-00032 -0.00038 -0.00275 -0.00275 \n", "C3L-00090 -0.03307 -0.02299 -0.02299 \n", "C3L-00098 0.31108 -0.38507 -0.41089 \n", "... ... ... ... \n", "C3N-01520 -0.06413 -0.06174 -0.06174 \n", "C3N-01521 0.31502 -0.06953 -0.04038 \n", "C3N-01537 0.08943 0.08807 0.07883 \n", "C3N-01802 0.17981 0.08711 -0.12670 \n", "C3N-01825 0.13939 0.11400 -0.00039 \n", "\n", "Name AADAC ... ZW10 ZWILCH \\\n", "Database_ID ENSG00000114771.12 ... ENSG00000086827.7 ENSG00000174442.10 \n", "Patient_ID ... \n", "C3L-00006 -0.01305 ... -0.01641 -0.00963 \n", "C3L-00008 0.01005 ... 
0.01583 0.01844 \n", "C3L-00032 -0.00038 ... -0.00305 0.00214 \n", "C3L-00090 -0.03307 ... -0.01982 -0.02071 \n", "C3L-00098 0.31108 ... -0.39843 -0.38591 \n", "... ... ... ... ... \n", "C3N-01520 -0.06413 ... -0.05110 -0.05228 \n", "C3N-01521 0.31502 ... -0.05879 -0.19572 \n", "C3N-01537 0.08943 ... -0.11330 0.07121 \n", "C3N-01802 0.17981 ... 0.14454 0.05509 \n", "C3N-01825 0.13939 ... 0.02229 -0.00842 \n", "\n", "Name ZWINT ZXDC ZYG11A \\\n", "Database_ID ENSG00000122952.15 ENSG00000070476.13 ENSG00000203995.8 \n", "Patient_ID \n", "C3L-00006 -0.01982 -0.01305 -0.01418 \n", "C3L-00008 0.00726 0.01005 0.00732 \n", "C3L-00032 0.00425 -0.00038 0.00166 \n", "C3L-00090 0.41191 -0.61621 -0.02436 \n", "C3L-00098 0.27221 0.31108 0.01711 \n", "... ... ... ... \n", "C3N-01520 -0.06508 -0.06413 -0.05318 \n", "C3N-01521 -0.00244 0.79082 0.38241 \n", "C3N-01537 0.00535 0.08943 -0.10289 \n", "C3N-01802 -0.04134 0.17981 -0.12128 \n", "C3N-01825 0.03784 0.13939 0.04662 \n", "\n", "Name ZYG11B ZYX ZZEF1 \\\n", "Database_ID ENSG00000162378.11 ENSG00000159840.14 ENSG00000074755.13 \n", "Patient_ID \n", "C3L-00006 -0.01418 -0.01897 -0.00529 \n", "C3L-00008 0.00732 0.01200 0.01969 \n", "C3L-00032 0.00166 0.01408 0.00683 \n", "C3L-00090 -0.02436 -0.02182 -0.00336 \n", "C3L-00098 0.01711 -0.01434 -0.34344 \n", "... ... ... ... \n", "C3N-01520 -0.05318 0.37759 -0.04914 \n", "C3N-01521 0.38241 0.58054 -0.34785 \n", "C3N-01537 -0.10289 -0.01151 -0.30164 \n", "C3N-01802 -0.12128 -0.06015 0.14747 \n", "C3N-01825 0.04662 0.03222 -0.02923 \n", "\n", "Name ZZZ3 pk \n", "Database_ID ENSG00000036549.11 ENSG00000091436.15 \n", "Patient_ID \n", "C3L-00006 -0.01418 -0.01480 \n", "C3L-00008 0.00732 0.01121 \n", "C3L-00032 0.00166 0.00208 \n", "C3L-00090 -0.02436 -0.02548 \n", "C3L-00098 -0.01427 0.53267 \n", "... ... ... 
\n", "C3N-01520 -0.05318 -0.05533 \n", "C3N-01521 0.38241 0.00842 \n", "C3N-01537 -0.10289 0.02389 \n", "C3N-01802 -0.13738 -0.01938 \n", "C3N-01825 0.04662 0.02880 \n", "\n", "[95 rows x 18919 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Some data may take several minutes to load\n", "en.get_proteomics('umich')\n", "en.get_phosphoproteomics('umich')\n", "en.get_CNV('washu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that our data is loaded, let's take a look at the structure of the phosphoproteomics data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>ARF5</th>\n", " <th>M6PR</th>\n", " <th colspan=\"8\" halign=\"left\">ESRRA</th>\n", " <th>...</th>\n", " <th colspan=\"2\" halign=\"left\">SCRIB</th>\n", " <th colspan=\"6\" halign=\"left\">TSGA10</th>\n", " <th colspan=\"2\" halign=\"left\">SVIL</th>\n", " </tr>\n", " <tr>\n", " <th>Site</th>\n", " <th>S137</th>\n", " <th>S267</th>\n", " <th>S19</th>\n", " <th>S22</th>\n", " <th>S19S22</th>\n", " <th>T31</th>\n", " <th>S19S22</th>\n", " <th>S19S22S27</th>\n", " <th>S19S22T31</th>\n", " <th>S27</th>\n", " <th>...</th>\n", " <th>S1575T1588S1594</th>\n", " <th>S1594</th>\n", " <th>S11</th>\n", " <th>S173</th>\n", " <th>S213</th>\n", " <th>S391</th>\n", " <th>S779</th>\n", " <th>S101</th>\n", " <th>S296</th>\n", " <th>S459</th>\n", " </tr>\n", " <tr>\n", " <th>Peptide</th>\n", " <th>QDMPNAMPVsELTDK</th>\n", " 
<th>GVGDDQLGEEsEERDDHLLPM</th>\n", " <th>AEPAsPDSPK</th>\n", " <th>AEPASPDsPK</th>\n", " <th>AEPAsPDsPK</th>\n", " <th>AEPASPDSPKGSSETEtEPPVALAPGPAPTR</th>\n", " <th>AEPAsPDsPKGSSETETEPPVALAPGPAPTR</th>\n", " <th>AEPAsPDsPKGSsETETEPPVALAPGPAPTR</th>\n", " <th>AEPAsPDsPKGSSETEtEPPVALAPGPAPTR</th>\n", " <th>GSsETETEPPVALAPGPAPTR</th>\n", " <th>...</th>\n", " <th>LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR</th>\n", " <th>LsPDFAEELR</th>\n", " <th>sPGRDPELQVEAAEVTTK</th>\n", " <th>sPSRLDSFVK</th>\n", " <th>RPsPTAR</th>\n", " <th>AMDTEsELGR</th>\n", " <th>GLDRsLEENLCYR;GLDRsLEENLCYRDF</th>\n", " <th>EVVSSQVDDLTsHNEHLCK</th>\n", " <th>DSEGDTPsLINWPSSK</th>\n", " <th>LPsPTVAR</th>\n", " </tr>\n", " <tr>\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000000412.3</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>...</th>\n", " <th>ENSP00000501177.1</th>\n", " <th>ENSP00000501177.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501312.1</th>\n", " <th>ENSP00000501521.1</th>\n", " <th>ENSP00000501521.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>NaN</td>\n", " <td>0.573633</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.304721</td>\n", " <td>NaN</td>\n", " 
<td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.667426</td>\n", " <td>NaN</td>\n", " <td>0.905606</td>\n", " <td>NaN</td>\n", " <td>-0.069911</td>\n", " <td>-0.584774</td>\n", " <td>NaN</td>\n", " <td>-0.561657</td>\n", " <td>-0.652457</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.003632</td>\n", " <td>-0.393734</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.789193</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.488427</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.431599</td>\n", " <td>-1.079638</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>NaN</td>\n", " <td>-0.211020</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.131605</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.104862</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.439041</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>NaN</td>\n", " <td>0.220473</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.290506</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.399718</td>\n", " <td>-0.875016</td>\n", " <td>-0.579824</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.505807</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.521725</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>NaN</td>\n", " <td>0.161496</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.708453</td>\n", " <td>NaN</td>\n", " <td>0.405402</td>\n", " <td>1.253045</td>\n", " <td>NaN</td>\n", " 
<td>0.265813</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>1.069439</td>\n", " <td>NaN</td>\n", " <td>0.510268</td>\n", " <td>-1.889144</td>\n", " <td>NaN</td>\n", " <td>-0.592203</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.126482</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 85661 columns</p>\n", "</div>" ], "text/plain": [ "Name ARF5 M6PR ESRRA \\\n", "Site S137 S267 S19 \n", "Peptide QDMPNAMPVsELTDK GVGDDQLGEEsEERDDHLLPM AEPAsPDSPK \n", "Database_ID ENSP00000000233.5 ENSP00000000412.3 ENSP00000000442.6 \n", "Patient_ID \n", "C3L-00006 NaN 0.573633 NaN \n", "C3L-00008 0.003632 -0.393734 NaN \n", "C3L-00032 NaN -0.211020 NaN \n", "C3L-00084 NaN 0.220473 NaN \n", "C3L-00090 NaN 0.161496 NaN \n", "\n", "Name \\\n", "Site S22 S19S22 \n", "Peptide AEPASPDsPK AEPAsPDsPK \n", "Database_ID ENSP00000000442.6 ENSP00000000442.6 \n", "Patient_ID \n", "C3L-00006 NaN 0.304721 \n", "C3L-00008 NaN 0.789193 \n", "C3L-00032 NaN NaN \n", "C3L-00084 NaN -0.290506 \n", "C3L-00090 NaN 0.708453 \n", "\n", "Name \\\n", "Site T31 S19S22 \n", "Peptide AEPASPDSPKGSSETEtEPPVALAPGPAPTR AEPAsPDsPKGSSETETEPPVALAPGPAPTR \n", "Database_ID ENSP00000000442.6 ENSP00000000442.6 \n", "Patient_ID \n", "C3L-00006 NaN NaN \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 NaN NaN \n", "C3L-00090 NaN 0.405402 \n", "\n", "Name \\\n", "Site S19S22S27 S19S22T31 \n", "Peptide AEPAsPDsPKGSsETETEPPVALAPGPAPTR AEPAsPDsPKGSSETEtEPPVALAPGPAPTR \n", "Database_ID ENSP00000000442.6 ENSP00000000442.6 \n", "Patient_ID \n", "C3L-00006 NaN NaN \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 NaN NaN \n", "C3L-00090 1.253045 NaN \n", "\n", "Name ... \\\n", "Site S27 ... \n", "Peptide GSsETETEPPVALAPGPAPTR ... \n", "Database_ID ENSP00000000442.6 ... \n", "Patient_ID ... \n", "C3L-00006 NaN ... \n", "C3L-00008 NaN ... \n", "C3L-00032 0.131605 ... \n", "C3L-00084 NaN ... \n", "C3L-00090 0.265813 ... 
\n", "\n", "Name SCRIB \\\n", "Site S1575T1588S1594 S1594 \n", "Peptide LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR LsPDFAEELR \n", "Database_ID ENSP00000501177.1 ENSP00000501177.1 \n", "Patient_ID \n", "C3L-00006 NaN 0.667426 \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN 0.104862 \n", "C3L-00084 NaN 0.399718 \n", "C3L-00090 NaN 1.069439 \n", "\n", "Name TSGA10 \\\n", "Site S11 S173 S213 \n", "Peptide sPGRDPELQVEAAEVTTK sPSRLDSFVK RPsPTAR \n", "Database_ID ENSP00000501312.1 ENSP00000501312.1 ENSP00000501312.1 \n", "Patient_ID \n", "C3L-00006 NaN 0.905606 NaN \n", "C3L-00008 NaN -0.488427 NaN \n", "C3L-00032 NaN NaN NaN \n", "C3L-00084 -0.875016 -0.579824 NaN \n", "C3L-00090 NaN 0.510268 -1.889144 \n", "\n", "Name \\\n", "Site S391 S779 \n", "Peptide AMDTEsELGR GLDRsLEENLCYR;GLDRsLEENLCYRDF \n", "Database_ID ENSP00000501312.1 ENSP00000501312.1 \n", "Patient_ID \n", "C3L-00006 -0.069911 -0.584774 \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 NaN -0.505807 \n", "C3L-00090 NaN -0.592203 \n", "\n", "Name SVIL \n", "Site S101 S296 S459 \n", "Peptide EVVSSQVDDLTsHNEHLCK DSEGDTPsLINWPSSK LPsPTVAR \n", "Database_ID ENSP00000501312.1 ENSP00000501521.1 ENSP00000501521.1 \n", "Patient_ID \n", "C3L-00006 NaN -0.561657 -0.652457 \n", "C3L-00008 NaN -0.431599 -1.079638 \n", "C3L-00032 NaN NaN -1.439041 \n", "C3L-00084 NaN NaN -1.521725 \n", "C3L-00090 NaN NaN -1.126482 \n", "\n", "[5 rows x 85661 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display first 5 rows of phosphoproteomics data\n", "en.get_phosphoproteomics('umich').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Join functions with multiindices \n", "The join functions have been written to handle multiindices. More information on the join functions can be found in the joining_dataframes tutorial. 
\n", "An example of joining a multiindexed dataframe (in this case phosphoproteomics) with a non multiindexed dataframe (in this case CNV) is below. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>A1BG_washu_CNV</th>\n", " <th>A1CF_washu_CNV</th>\n", " <th>A2M_washu_CNV</th>\n", " <th>A2ML1_washu_CNV</th>\n", " <th>A3GALT2_washu_CNV</th>\n", " <th>A4GALT_washu_CNV</th>\n", " <th>A4GNT_washu_CNV</th>\n", " <th>AAAS_washu_CNV</th>\n", " <th>AACS_washu_CNV</th>\n", " <th>AADAC_washu_CNV</th>\n", " <th>...</th>\n", " <th colspan=\"2\" halign=\"left\">SCRIB_umich_phosphoproteomics</th>\n", " <th colspan=\"6\" halign=\"left\">TSGA10_umich_phosphoproteomics</th>\n", " <th colspan=\"2\" halign=\"left\">SVIL_umich_phosphoproteomics</th>\n", " </tr>\n", " <tr>\n", " <th>Site</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th>...</th>\n", " <th>S1575T1588S1594</th>\n", " <th>S1594</th>\n", " <th>S11</th>\n", " <th>S173</th>\n", " <th>S213</th>\n", " <th>S391</th>\n", " <th>S779</th>\n", " <th>S101</th>\n", " <th>S296</th>\n", " <th>S459</th>\n", " </tr>\n", " <tr>\n", " <th>Peptide</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th>...</th>\n", " <th>LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR</th>\n", " <th>LsPDFAEELR</th>\n", " 
<th>sPGRDPELQVEAAEVTTK</th>\n", " <th>sPSRLDSFVK</th>\n", " <th>RPsPTAR</th>\n", " <th>AMDTEsELGR</th>\n", " <th>GLDRsLEENLCYR;GLDRsLEENLCYRDF</th>\n", " <th>EVVSSQVDDLTsHNEHLCK</th>\n", " <th>DSEGDTPsLINWPSSK</th>\n", " <th>LPsPTVAR</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.00659</td>\n", " <td>-0.01982</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01418</td>\n", " <td>-0.00839</td>\n", " <td>-0.01305</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01305</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.667426</td>\n", " <td>NaN</td>\n", " <td>0.905606</td>\n", " <td>NaN</td>\n", " <td>-0.069911</td>\n", " <td>-0.584774</td>\n", " <td>NaN</td>\n", " <td>-0.561657</td>\n", " <td>-0.652457</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.02578</td>\n", " <td>0.00726</td>\n", " <td>0.01350</td>\n", " <td>0.01350</td>\n", " <td>0.00732</td>\n", " <td>0.01642</td>\n", " <td>0.01005</td>\n", " <td>0.01225</td>\n", " <td>0.01225</td>\n", " <td>0.01005</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.488427</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.431599</td>\n", " <td>-1.079638</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.01262</td>\n", " <td>0.00425</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>0.00166</td>\n", " <td>0.00549</td>\n", " <td>-0.00038</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>-0.00038</td>\n", " <td>...</td>\n", " 
<td>NaN</td>\n", " <td>0.104862</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.439041</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.399718</td>\n", " <td>-0.875016</td>\n", " <td>-0.579824</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.505807</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.521725</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.00100</td>\n", " <td>0.41191</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.02436</td>\n", " <td>-0.01198</td>\n", " <td>-0.03307</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.03307</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>1.069439</td>\n", " <td>NaN</td>\n", " <td>0.510268</td>\n", " <td>-1.889144</td>\n", " <td>NaN</td>\n", " <td>-0.592203</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.126482</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 104580 columns</p>\n", "</div>" ], "text/plain": [ "Name A1BG_washu_CNV A1CF_washu_CNV A2M_washu_CNV A2ML1_washu_CNV \\\n", "Site \n", "Peptide \n", "Patient_ID \n", "C3L-00006 -0.00659 -0.01982 -0.01402 -0.01402 \n", "C3L-00008 0.02578 0.00726 0.01350 0.01350 \n", "C3L-00032 0.01262 0.00425 -0.00275 -0.00275 \n", "C3L-00084 NaN NaN NaN NaN \n", "C3L-00090 0.00100 0.41191 -0.02299 -0.02299 \n", "\n", "Name A3GALT2_washu_CNV A4GALT_washu_CNV A4GNT_washu_CNV AAAS_washu_CNV \\\n", "Site \n", "Peptide \n", "Patient_ID \n", "C3L-00006 -0.01418 -0.00839 -0.01305 -0.01402 \n", "C3L-00008 0.00732 0.01642 0.01005 0.01225 \n", "C3L-00032 0.00166 0.00549 -0.00038 -0.00275 \n", "C3L-00084 NaN NaN NaN NaN \n", "C3L-00090 -0.02436 -0.01198 -0.03307 
-0.02299 \n", "\n", "Name AACS_washu_CNV AADAC_washu_CNV ... \\\n", "Site ... \n", "Peptide ... \n", "Patient_ID ... \n", "C3L-00006 -0.01402 -0.01305 ... \n", "C3L-00008 0.01225 0.01005 ... \n", "C3L-00032 -0.00275 -0.00038 ... \n", "C3L-00084 NaN NaN ... \n", "C3L-00090 -0.02299 -0.03307 ... \n", "\n", "Name SCRIB_umich_phosphoproteomics \\\n", "Site S1575T1588S1594 S1594 \n", "Peptide LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR LsPDFAEELR \n", "Patient_ID \n", "C3L-00006 NaN 0.667426 \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN 0.104862 \n", "C3L-00084 NaN 0.399718 \n", "C3L-00090 NaN 1.069439 \n", "\n", "Name TSGA10_umich_phosphoproteomics \\\n", "Site S11 S173 S213 S391 \n", "Peptide sPGRDPELQVEAAEVTTK sPSRLDSFVK RPsPTAR AMDTEsELGR \n", "Patient_ID \n", "C3L-00006 NaN 0.905606 NaN -0.069911 \n", "C3L-00008 NaN -0.488427 NaN NaN \n", "C3L-00032 NaN NaN NaN NaN \n", "C3L-00084 -0.875016 -0.579824 NaN NaN \n", "C3L-00090 NaN 0.510268 -1.889144 NaN \n", "\n", "Name \\\n", "Site S779 S101 \n", "Peptide GLDRsLEENLCYR;GLDRsLEENLCYRDF EVVSSQVDDLTsHNEHLCK \n", "Patient_ID \n", "C3L-00006 -0.584774 NaN \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 -0.505807 NaN \n", "C3L-00090 -0.592203 NaN \n", "\n", "Name SVIL_umich_phosphoproteomics \n", "Site S296 S459 \n", "Peptide DSEGDTPsLINWPSSK LPsPTVAR \n", "Patient_ID \n", "C3L-00006 -0.561657 -0.652457 \n", "C3L-00008 -0.431599 -1.079638 \n", "C3L-00032 NaN -1.439041 \n", "C3L-00084 NaN -1.521725 \n", "C3L-00090 NaN -1.126482 \n", "\n", "[5 rows x 104580 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "phospho_and_CNV = en.join_omics_to_omics(df1_name=\"CNV\", df2_name=\"phosphoproteomics\", df1_source='washu', df2_source='umich')\n", "phospho_and_CNV.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Patient C3L-00084 has no CNV data, so its CNV columns are filled with NaNs. This keeps the patient's row in the joined dataframe alongside its phosphoproteomics values.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How to select from a multiindex\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting based on all levels\n", "We can select a single column by passing a tuple with one key for each level of the multiindex. When the outermost key identifies only one column, as ARF5 does in the proteomics data, passing that key alone is enough; the result keeps the remaining `Database_ID` level. For example, to get the proteomics data for ARF5, we'd do the following:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.056513</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.549959</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.088681</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>-0.846555</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.539019</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00098</th>\n", " <td>-0.017370</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00136</th>\n", " <td>0.230347</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00137</th>\n", " <td>0.191915</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00139</th>\n", " <td>-0.410142</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00143</th>\n", " <td>-0.170514</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "Database_ID ENSP00000000233.5\n", "Patient_ID \n", "C3L-00006 -0.056513\n", "C3L-00008 0.549959\n", "C3L-00032 0.088681\n", "C3L-00084 -0.846555\n", "C3L-00090 0.539019\n", "C3L-00098 -0.017370\n", "C3L-00136 0.230347\n", 
"C3L-00137 0.191915\n", "C3L-00139 -0.410142\n", "C3L-00143 -0.170514" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prot = en.get_proteomics('umich')\n", "all_levels_selection = prot[\"ARF5\"]\n", "\n", "#Display the first 10 rows of the desired data\n", "all_levels_selection.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting based on one level\n", "We can easily select multiple columns from our multiindex dataframe, based on just the \"Name\" level of the multiindex:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>ARF5</th>\n", " <th>AKAP11</th>\n", " <th>ARHGEF5</th>\n", " <th>APPBP2</th>\n", " <th>AQR</th>\n", " <th>ACAP1</th>\n", " <th>ANO8</th>\n", " <th>AP3M2</th>\n", " <th>ASNS</th>\n", " <th>ALDH3A2</th>\n", " <th>...</th>\n", " <th>ABCC2</th>\n", " <th>ANKS1A</th>\n", " <th>AL034430.2</th>\n", " <th>AP5Z1</th>\n", " <th>ATP8B1</th>\n", " <th>AKR1B15</th>\n", " <th>AC004706.3</th>\n", " <th>ATAD3B</th>\n", " <th colspan=\"2\" halign=\"left\">ANK2</th>\n", " </tr>\n", " <tr>\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000025301.2</th>\n", " <th>ENSP00000056217.5</th>\n", " <th>ENSP00000083182.3</th>\n", " <th>ENSP00000156471.5</th>\n", " <th>ENSP00000158762.3</th>\n", " <th>ENSP00000159087.4</th>\n", " <th>ENSP00000174653.3</th>\n", " <th>ENSP00000175506.4</th>\n", " <th>ENSP00000176643.6</th>\n", " <th>...</th>\n", " <th>ENSP00000497274.1</th>\n", " 
<th>ENSP00000497393.1</th>\n", " <th>ENSP00000497510.1</th>\n", " <th>ENSP00000497815.1</th>\n", " <th>ENSP00000497896.1</th>\n", " <th>ENSP00000498877.1</th>\n", " <th>ENSP00000499350.1</th>\n", " <th>ENSP00000500094.1</th>\n", " <th>ENSP00000500102.1</th>\n", " <th>ENSP00000500937.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.056513</td>\n", " <td>-0.385278</td>\n", " <td>0.188877</td>\n", " <td>-0.059319</td>\n", " <td>0.276154</td>\n", " <td>-0.252270</td>\n", " <td>1.280740</td>\n", " <td>0.086567</td>\n", " <td>0.334008</td>\n", " <td>1.048464</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.454359</td>\n", " <td>1.346643</td>\n", " <td>-0.186762</td>\n", " <td>-0.361594</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.549959</td>\n", " <td>-0.491451</td>\n", " <td>0.277281</td>\n", " <td>0.225857</td>\n", " <td>0.400321</td>\n", " <td>-0.485365</td>\n", " <td>NaN</td>\n", " <td>-0.544367</td>\n", " <td>1.634042</td>\n", " <td>-0.848812</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.438931</td>\n", " <td>0.250021</td>\n", " <td>0.005658</td>\n", " <td>1.065706</td>\n", " <td>-0.310341</td>\n", " <td>0.060549</td>\n", " <td>NaN</td>\n", " <td>-0.62873</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.088681</td>\n", " <td>0.203899</td>\n", " <td>0.261918</td>\n", " <td>0.192734</td>\n", " <td>-0.244333</td>\n", " <td>0.169655</td>\n", " <td>NaN</td>\n", " <td>0.223638</td>\n", 
" <td>0.358561</td>\n", " <td>-0.314030</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.411541</td>\n", " <td>0.043151</td>\n", " <td>0.461451</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.300528</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>-0.846555</td>\n", " <td>-0.286751</td>\n", " <td>-0.468015</td>\n", " <td>0.249142</td>\n", " <td>0.013797</td>\n", " <td>-0.606966</td>\n", " <td>-0.303256</td>\n", " <td>-0.398076</td>\n", " <td>1.017079</td>\n", " <td>-0.385280</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>1.423702</td>\n", " <td>-0.524652</td>\n", " <td>0.111429</td>\n", " <td>0.172027</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.102475</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.539019</td>\n", " <td>-0.098589</td>\n", " <td>0.605331</td>\n", " <td>0.571185</td>\n", " <td>0.178541</td>\n", " <td>-0.567123</td>\n", " <td>NaN</td>\n", " <td>0.053186</td>\n", " <td>0.390269</td>\n", " <td>1.059128</td>\n", " <td>...</td>\n", " <td>-0.737756</td>\n", " <td>NaN</td>\n", " <td>0.580644</td>\n", " <td>-0.108808</td>\n", " <td>0.429643</td>\n", " <td>-0.218494</td>\n", " <td>NaN</td>\n", " <td>0.314156</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 1001 columns</p>\n", "</div>" ], "text/plain": [ "Name ARF5 AKAP11 ARHGEF5 \\\n", "Database_ID ENSP00000000233.5 ENSP00000025301.2 ENSP00000056217.5 \n", "Patient_ID \n", "C3L-00006 -0.056513 -0.385278 0.188877 \n", "C3L-00008 0.549959 -0.491451 0.277281 \n", "C3L-00032 0.088681 0.203899 0.261918 \n", "C3L-00084 -0.846555 -0.286751 -0.468015 \n", "C3L-00090 0.539019 -0.098589 0.605331 \n", "\n", "Name APPBP2 AQR ACAP1 \\\n", "Database_ID ENSP00000083182.3 ENSP00000156471.5 ENSP00000158762.3 \n", "Patient_ID \n", "C3L-00006 -0.059319 0.276154 -0.252270 \n", "C3L-00008 0.225857 0.400321 -0.485365 \n", 
"C3L-00032 0.192734 -0.244333 0.169655 \n", "C3L-00084 0.249142 0.013797 -0.606966 \n", "C3L-00090 0.571185 0.178541 -0.567123 \n", "\n", "Name ANO8 AP3M2 ASNS \\\n", "Database_ID ENSP00000159087.4 ENSP00000174653.3 ENSP00000175506.4 \n", "Patient_ID \n", "C3L-00006 1.280740 0.086567 0.334008 \n", "C3L-00008 NaN -0.544367 1.634042 \n", "C3L-00032 NaN 0.223638 0.358561 \n", "C3L-00084 -0.303256 -0.398076 1.017079 \n", "C3L-00090 NaN 0.053186 0.390269 \n", "\n", "Name ALDH3A2 ... ABCC2 ANKS1A \\\n", "Database_ID ENSP00000176643.6 ... ENSP00000497274.1 ENSP00000497393.1 \n", "Patient_ID ... \n", "C3L-00006 1.048464 ... NaN 0.454359 \n", "C3L-00008 -0.848812 ... NaN 0.438931 \n", "C3L-00032 -0.314030 ... NaN NaN \n", "C3L-00084 -0.385280 ... NaN 1.423702 \n", "C3L-00090 1.059128 ... -0.737756 NaN \n", "\n", "Name AL034430.2 AP5Z1 ATP8B1 \\\n", "Database_ID ENSP00000497510.1 ENSP00000497815.1 ENSP00000497896.1 \n", "Patient_ID \n", "C3L-00006 1.346643 -0.186762 -0.361594 \n", "C3L-00008 0.250021 0.005658 1.065706 \n", "C3L-00032 0.411541 0.043151 0.461451 \n", "C3L-00084 -0.524652 0.111429 0.172027 \n", "C3L-00090 0.580644 -0.108808 0.429643 \n", "\n", "Name AKR1B15 AC004706.3 ATAD3B \\\n", "Database_ID ENSP00000498877.1 ENSP00000499350.1 ENSP00000500094.1 \n", "Patient_ID \n", "C3L-00006 NaN NaN NaN \n", "C3L-00008 -0.310341 0.060549 NaN \n", "C3L-00032 NaN NaN 0.300528 \n", "C3L-00084 NaN NaN 0.102475 \n", "C3L-00090 -0.218494 NaN 0.314156 \n", "\n", "Name ANK2 \n", "Database_ID ENSP00000500102.1 ENSP00000500937.1 \n", "Patient_ID \n", "C3L-00006 NaN NaN \n", "C3L-00008 -0.62873 NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 NaN NaN \n", "C3L-00090 NaN NaN \n", "\n", "[5 rows x 1001 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene1_filter = prot.columns.get_level_values(\"Name\").str.startswith(\"A\") # Select all columns where the gene starts with \"A\". 
This grabs every column whose \"Name\" key starts with \"A\"\n", "gene1_data = prot.loc[:, gene1_filter]\n", "gene1_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting based on a different level of the multiindex\n", "We can also select based on one of the inner levels of the multiindex. For example, to get data for all tyrosine phosphorylation sites in the phosphoproteomics data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "phospho = en.get_phosphoproteomics('umich') # Site information lives in the phosphoproteomics dataframe, not in prot\n", "y_site_filter = phospho.columns.get_level_values(\"Site\").str.contains(\"Y\", na=False) # Create a boolean filter selecting all columns where the Site level contains a \"Y\"\n", "\n", "y_sites = phospho.loc[:, y_site_filter] # Select the columns\n", "y_sites.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How to use `cptac.utils.reduce_multiindex()`\n", "To make it easier to work with multi-level indices, we provide the `reduce_multiindex` function, available for import from the `cptac.utils` 
submodule. It can both drop levels from a multiindex, and \"flatten\" a multi-level index into a single-level index by concatenating the keys from multiple levels into a single key for each column." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import cptac.utils as ut" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dropping Levels\n", "We can drop levels based on index or name. We can also drop single or multiple levels at once. \n", "Note that it will warn you if duplicate column key combinations arise due to dropping levels. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dropping by index or name" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "cptac warning: Due to dropping the specified levels, dataframe now has 1299 duplicated column headers. (C:\\Users\\sabme\\AppData\\Local\\Temp\\ipykernel_21892\\2675409348.py, line 1)\n" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Name</th>\n", " <th>ARF5</th>\n", " <th>M6PR</th>\n", " <th>ESRRA</th>\n", " <th>FKBP4</th>\n", " <th>NDUFAF7</th>\n", " <th>FUCA2</th>\n", " <th>DBNDD1</th>\n", " <th>SEMA3F</th>\n", " <th>CFTR</th>\n", " <th>CYP51A1</th>\n", " <th>...</th>\n", " <th>SCRIB</th>\n", " <th>WIZ</th>\n", " <th>BPIFB4</th>\n", " <th>LDB1</th>\n", " <th>WIZ</th>\n", " <th>TSGA10</th>\n", " <th>RFX7</th>\n", " <th>SWSAP1</th>\n", " <th>MSANTD2</th>\n", " <th>SVIL</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " 
<th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.056513</td>\n", " <td>0.016557</td>\n", " <td>0.002569</td>\n", " <td>0.389819</td>\n", " <td>0.603610</td>\n", " <td>-0.332543</td>\n", " <td>-0.790426</td>\n", " <td>NaN</td>\n", " <td>0.822732</td>\n", " <td>0.039134</td>\n", " <td>...</td>\n", " <td>0.161720</td>\n", " <td>-0.884807</td>\n", " <td>NaN</td>\n", " <td>0.268247</td>\n", " <td>0.125392</td>\n", " <td>-0.880833</td>\n", " <td>0.108554</td>\n", " <td>0.107413</td>\n", " <td>-0.085833</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.549959</td>\n", " <td>-0.206129</td>\n", " <td>0.905784</td>\n", " <td>-0.303631</td>\n", " <td>0.018767</td>\n", " <td>0.503513</td>\n", " <td>0.950955</td>\n", " <td>0.080142</td>\n", " <td>NaN</td>\n", " <td>-0.063213</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.054284</td>\n", " <td>NaN</td>\n", " <td>-0.106450</td>\n", " <td>0.380557</td>\n", " <td>-0.756099</td>\n", " <td>0.264611</td>\n", " <td>0.044423</td>\n", " <td>-0.248319</td>\n", " <td>-1.206596</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.088681</td>\n", " <td>-0.154447</td>\n", " <td>-0.190515</td>\n", " <td>0.170753</td>\n", " <td>0.196356</td>\n", " <td>0.544194</td>\n", " <td>-0.179078</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.377405</td>\n", " <td>...</td>\n", " <td>-1.086905</td>\n", " <td>0.055991</td>\n", " <td>NaN</td>\n", " <td>-0.021986</td>\n", " <td>-0.229645</td>\n", " <td>1.923986</td>\n", " <td>NaN</td>\n", " <td>-0.176694</td>\n", " <td>-0.332384</td>\n", " <td>-1.330653</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>-0.846555</td>\n", " 
<td>0.027740</td>\n", " <td>NaN</td>\n", " <td>0.178700</td>\n", " <td>0.264054</td>\n", " <td>-0.183548</td>\n", " <td>0.077215</td>\n", " <td>-0.247164</td>\n", " <td>0.152277</td>\n", " <td>-0.279549</td>\n", " <td>...</td>\n", " <td>-0.125796</td>\n", " <td>0.944212</td>\n", " <td>NaN</td>\n", " <td>0.917409</td>\n", " <td>0.026862</td>\n", " <td>-0.885976</td>\n", " <td>-0.006510</td>\n", " <td>-0.014162</td>\n", " <td>0.365158</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.539019</td>\n", " <td>0.956619</td>\n", " <td>-0.039516</td>\n", " <td>0.323656</td>\n", " <td>0.064605</td>\n", " <td>0.173433</td>\n", " <td>-0.524325</td>\n", " <td>-0.038590</td>\n", " <td>-0.311486</td>\n", " <td>0.309905</td>\n", " <td>...</td>\n", " <td>0.853362</td>\n", " <td>-0.716947</td>\n", " <td>NaN</td>\n", " <td>-0.286277</td>\n", " <td>-0.046076</td>\n", " <td>0.089645</td>\n", " <td>-0.444506</td>\n", " <td>-0.072531</td>\n", " <td>-0.463495</td>\n", " <td>NaN</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 12662 columns</p>\n", "</div>" ], "text/plain": [ "Name ARF5 M6PR ESRRA FKBP4 NDUFAF7 FUCA2 \\\n", "Patient_ID \n", "C3L-00006 -0.056513 0.016557 0.002569 0.389819 0.603610 -0.332543 \n", "C3L-00008 0.549959 -0.206129 0.905784 -0.303631 0.018767 0.503513 \n", "C3L-00032 0.088681 -0.154447 -0.190515 0.170753 0.196356 0.544194 \n", "C3L-00084 -0.846555 0.027740 NaN 0.178700 0.264054 -0.183548 \n", "C3L-00090 0.539019 0.956619 -0.039516 0.323656 0.064605 0.173433 \n", "\n", "Name DBNDD1 SEMA3F CFTR CYP51A1 ... SCRIB WIZ \\\n", "Patient_ID ... \n", "C3L-00006 -0.790426 NaN 0.822732 0.039134 ... 0.161720 -0.884807 \n", "C3L-00008 0.950955 0.080142 NaN -0.063213 ... NaN 0.054284 \n", "C3L-00032 -0.179078 NaN NaN 0.377405 ... -1.086905 0.055991 \n", "C3L-00084 0.077215 -0.247164 0.152277 -0.279549 ... -0.125796 0.944212 \n", "C3L-00090 -0.524325 -0.038590 -0.311486 0.309905 ... 
0.853362 -0.716947 \n", "\n", "Name BPIFB4 LDB1 WIZ TSGA10 RFX7 SWSAP1 \\\n", "Patient_ID \n", "C3L-00006 NaN 0.268247 0.125392 -0.880833 0.108554 0.107413 \n", "C3L-00008 NaN -0.106450 0.380557 -0.756099 0.264611 0.044423 \n", "C3L-00032 NaN -0.021986 -0.229645 1.923986 NaN -0.176694 \n", "C3L-00084 NaN 0.917409 0.026862 -0.885976 -0.006510 -0.014162 \n", "C3L-00090 NaN -0.286277 -0.046076 0.089645 -0.444506 -0.072531 \n", "\n", "Name MSANTD2 SVIL \n", "Patient_ID \n", "C3L-00006 -0.085833 NaN \n", "C3L-00008 -0.248319 -1.206596 \n", "C3L-00032 -0.332384 -1.330653 \n", "C3L-00084 0.365158 NaN \n", "C3L-00090 -0.463495 NaN \n", "\n", "[5 rows x 12662 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ut.reduce_multiindex(df=prot, levels_to_drop=\"Database_ID\").head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dropping single or multiple levels at once\n", "By passing a list (or array-like) to `levels_to_drop`, we can drop multiple levels of the multiindex at the same time. Note that at least one level must remain after dropping. \n", "\n", "We will show this with the colon data." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>ARF5</th>\n", " <th>M6PR</th>\n", " <th>ESRRA</th>\n", " <th>FKBP4</th>\n", " <th>NDUFAF7</th>\n", " <th>FUCA2</th>\n", " <th>CFTR</th>\n", " <th>CYP51A1</th>\n", " <th>USP28</th>\n", " <th>TMEM176A</th>\n", " <th>...</th>\n", " <th>TMUB1</th>\n", " <th>CSNK1A1</th>\n", " <th>MICAL2</th>\n", " <th>ANK2</th>\n", " <th>SEPTIN7</th>\n", " <th>ATAD3B</th>\n", " <th>ETNK1</th>\n", " <th>MYO6</th>\n", " <th>WIZ</th>\n", " <th>HSPA12A</th>\n", " </tr>\n", " <tr>\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000000412.3</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000001008.4</th>\n", " <th>ENSP00000002125.4</th>\n", " <th>ENSP00000002165.5</th>\n", " <th>ENSP00000003084.6</th>\n", " <th>ENSP00000003100.8</th>\n", " <th>ENSP00000003302.4</th>\n", " <th>ENSP00000004103.3</th>\n", " <th>...</th>\n", " <th>ENSP00000499339.1</th>\n", " <th>ENSP00000499757.1</th>\n", " <th>ENSP00000499778.1</th>\n", " <th>ENSP00000499869.1</th>\n", " <th>ENSP00000499937.1</th>\n", " <th>ENSP00000500094.1</th>\n", " <th>ENSP00000500633.1</th>\n", " <th>ENSP00000500710.1</th>\n", " <th>ENSP00000501300.1</th>\n", " <th>ENSP00000501491.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " 
<th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " <td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " <td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " <td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " 
<td>0.683830</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " <td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ "Name ARF5 M6PR ESRRA \\\n", "Database_ID ENSP00000000233.5 ENSP00000000412.3 ENSP00000000442.6 \n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 -0.283633 \n", "01CO006 0.188931 0.544620 NaN \n", "01CO008 0.404810 -0.246523 -0.053940 \n", "01CO013 -0.276982 -0.017659 NaN \n", "01CO014 -0.160155 0.100022 0.259696 \n", "\n", "Name FKBP4 NDUFAF7 FUCA2 \\\n", "Database_ID ENSP00000001008.4 ENSP00000002125.4 ENSP00000002165.5 \n", "Patient_ID \n", "01CO005 -0.612614 0.514855 -0.824026 \n", "01CO006 -0.571640 -0.209734 0.799090 \n", "01CO008 0.252995 0.190861 0.101419 \n", "01CO013 -0.455055 0.500686 -0.350366 \n", "01CO014 0.341345 -0.310265 0.095461 \n", "\n", "Name CFTR CYP51A1 USP28 \\\n", "Database_ID ENSP00000003084.6 ENSP00000003100.8 ENSP00000003302.4 \n", "Patient_ID \n", "01CO005 NaN 0.045383 NaN \n", "01CO006 NaN -0.338493 -0.042567 \n", "01CO008 -0.502876 0.627060 0.089815 \n", "01CO013 NaN 0.263168 0.683830 \n", "01CO014 -0.745855 1.006614 NaN \n", "\n", "Name TMEM176A ... TMUB1 CSNK1A1 \\\n", "Database_ID ENSP00000004103.3 ... 
ENSP00000499339.1 ENSP00000499757.1 \n", "Patient_ID ... \n", "01CO005 -0.248511 ... NaN NaN \n", "01CO006 NaN ... -0.411664 -0.454109 \n", "01CO008 -0.106411 ... 0.192279 -0.558236 \n", "01CO013 NaN ... 0.220231 NaN \n", "01CO014 NaN ... -0.198671 0.226146 \n", "\n", "Name MICAL2 ANK2 SEPTIN7 \\\n", "Database_ID ENSP00000499778.1 ENSP00000499869.1 ENSP00000499937.1 \n", "Patient_ID \n", "01CO005 -0.042548 NaN NaN \n", "01CO006 -0.725892 NaN NaN \n", "01CO008 -0.093708 -1.874293 -0.248307 \n", "01CO013 0.241860 -3.939263 NaN \n", "01CO014 0.036229 NaN NaN \n", "\n", "Name ATAD3B ETNK1 MYO6 \\\n", "Database_ID ENSP00000500094.1 ENSP00000500633.1 ENSP00000500710.1 \n", "Patient_ID \n", "01CO005 0.925011 -0.173468 -0.180521 \n", "01CO006 -0.707588 -0.846624 0.329813 \n", "01CO008 -0.899186 -0.526260 0.668713 \n", "01CO013 0.514931 -0.078267 0.122032 \n", "01CO014 1.189468 0.117736 0.586529 \n", "\n", "Name WIZ HSPA12A \n", "Database_ID ENSP00000501300.1 ENSP00000501491.1 \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colon = cptac.Coad()\n", "prot = colon.get_proteomics('umich')\n", "prot.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000000412.3</th>\n", " <th>ENSP00000000442.6</th>\n", " 
<th>ENSP00000001008.4</th>\n", " <th>ENSP00000002125.4</th>\n", " <th>ENSP00000002165.5</th>\n", " <th>ENSP00000003084.6</th>\n", " <th>ENSP00000003100.8</th>\n", " <th>ENSP00000003302.4</th>\n", " <th>ENSP00000004103.3</th>\n", " <th>...</th>\n", " <th>ENSP00000499339.1</th>\n", " <th>ENSP00000499757.1</th>\n", " <th>ENSP00000499778.1</th>\n", " <th>ENSP00000499869.1</th>\n", " <th>ENSP00000499937.1</th>\n", " <th>ENSP00000500094.1</th>\n", " <th>ENSP00000500633.1</th>\n", " <th>ENSP00000500710.1</th>\n", " <th>ENSP00000501300.1</th>\n", " <th>ENSP00000501491.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " 
<td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " <td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " <td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " <td>0.683830</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " <td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ "Database_ID ENSP00000000233.5 ENSP00000000412.3 ENSP00000000442.6 \\\n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 -0.283633 \n", "01CO006 0.188931 0.544620 NaN \n", "01CO008 0.404810 -0.246523 -0.053940 \n", "01CO013 -0.276982 
-0.017659 NaN \n", "01CO014 -0.160155 0.100022 0.259696 \n", "\n", "Database_ID ENSP00000001008.4 ENSP00000002125.4 ENSP00000002165.5 \\\n", "Patient_ID \n", "01CO005 -0.612614 0.514855 -0.824026 \n", "01CO006 -0.571640 -0.209734 0.799090 \n", "01CO008 0.252995 0.190861 0.101419 \n", "01CO013 -0.455055 0.500686 -0.350366 \n", "01CO014 0.341345 -0.310265 0.095461 \n", "\n", "Database_ID ENSP00000003084.6 ENSP00000003100.8 ENSP00000003302.4 \\\n", "Patient_ID \n", "01CO005 NaN 0.045383 NaN \n", "01CO006 NaN -0.338493 -0.042567 \n", "01CO008 -0.502876 0.627060 0.089815 \n", "01CO013 NaN 0.263168 0.683830 \n", "01CO014 -0.745855 1.006614 NaN \n", "\n", "Database_ID ENSP00000004103.3 ... ENSP00000499339.1 ENSP00000499757.1 \\\n", "Patient_ID ... \n", "01CO005 -0.248511 ... NaN NaN \n", "01CO006 NaN ... -0.411664 -0.454109 \n", "01CO008 -0.106411 ... 0.192279 -0.558236 \n", "01CO013 NaN ... 0.220231 NaN \n", "01CO014 NaN ... -0.198671 0.226146 \n", "\n", "Database_ID ENSP00000499778.1 ENSP00000499869.1 ENSP00000499937.1 \\\n", "Patient_ID \n", "01CO005 -0.042548 NaN NaN \n", "01CO006 -0.725892 NaN NaN \n", "01CO008 -0.093708 -1.874293 -0.248307 \n", "01CO013 0.241860 -3.939263 NaN \n", "01CO014 0.036229 NaN NaN \n", "\n", "Database_ID ENSP00000500094.1 ENSP00000500633.1 ENSP00000500710.1 \\\n", "Patient_ID \n", "01CO005 0.925011 -0.173468 -0.180521 \n", "01CO006 -0.707588 -0.846624 0.329813 \n", "01CO008 -0.899186 -0.526260 0.668713 \n", "01CO013 0.514931 -0.078267 0.122032 \n", "01CO014 1.189468 0.117736 0.586529 \n", "\n", "Database_ID ENSP00000501300.1 ENSP00000501491.1 \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Drop level 'Name'\n", "ut.reduce_multiindex(df=prot, levels_to_drop='Name').head()\n", 
"# You can also pass a list to drop multiple levels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining levels (Flattening)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can combine the levels of a multiindexed dataframe into a single level. By default, the combined levels are separated by an underscore. We can specify a different separator using the `sep` parameter." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Name</th>\n", " <th>ARF5_ENSP00000000233.5</th>\n", " <th>M6PR_ENSP00000000412.3</th>\n", " <th>ESRRA_ENSP00000000442.6</th>\n", " <th>FKBP4_ENSP00000001008.4</th>\n", " <th>NDUFAF7_ENSP00000002125.4</th>\n", " <th>FUCA2_ENSP00000002165.5</th>\n", " <th>CFTR_ENSP00000003084.6</th>\n", " <th>CYP51A1_ENSP00000003100.8</th>\n", " <th>USP28_ENSP00000003302.4</th>\n", " <th>TMEM176A_ENSP00000004103.3</th>\n", " <th>...</th>\n", " <th>TMUB1_ENSP00000499339.1</th>\n", " <th>CSNK1A1_ENSP00000499757.1</th>\n", " <th>MICAL2_ENSP00000499778.1</th>\n", " <th>ANK2_ENSP00000499869.1</th>\n", " <th>SEPTIN7_ENSP00000499937.1</th>\n", " <th>ATAD3B_ENSP00000500094.1</th>\n", " <th>ETNK1_ENSP00000500633.1</th>\n", " <th>MYO6_ENSP00000500710.1</th>\n", " <th>WIZ_ENSP00000501300.1</th>\n", " <th>HSPA12A_ENSP00000501491.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " 
<th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " <td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " <td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " <td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " <td>0.683830</td>\n", " 
<td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " <td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ "Name ARF5_ENSP00000000233.5 M6PR_ENSP00000000412.3 \\\n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 \n", "01CO006 0.188931 0.544620 \n", "01CO008 0.404810 -0.246523 \n", "01CO013 -0.276982 -0.017659 \n", "01CO014 -0.160155 0.100022 \n", "\n", "Name ESRRA_ENSP00000000442.6 FKBP4_ENSP00000001008.4 \\\n", "Patient_ID \n", "01CO005 -0.283633 -0.612614 \n", "01CO006 NaN -0.571640 \n", "01CO008 -0.053940 0.252995 \n", "01CO013 NaN -0.455055 \n", "01CO014 0.259696 0.341345 \n", "\n", "Name NDUFAF7_ENSP00000002125.4 FUCA2_ENSP00000002165.5 \\\n", "Patient_ID \n", "01CO005 0.514855 -0.824026 \n", "01CO006 -0.209734 0.799090 \n", "01CO008 0.190861 0.101419 \n", "01CO013 0.500686 -0.350366 \n", "01CO014 -0.310265 0.095461 \n", "\n", "Name CFTR_ENSP00000003084.6 CYP51A1_ENSP00000003100.8 \\\n", "Patient_ID \n", "01CO005 NaN 0.045383 \n", "01CO006 NaN -0.338493 \n", "01CO008 -0.502876 0.627060 \n", "01CO013 NaN 0.263168 \n", "01CO014 -0.745855 1.006614 \n", "\n", "Name USP28_ENSP00000003302.4 TMEM176A_ENSP00000004103.3 ... \\\n", "Patient_ID ... \n", "01CO005 NaN -0.248511 ... 
\n", "01CO006 -0.042567 NaN ... \n", "01CO008 0.089815 -0.106411 ... \n", "01CO013 0.683830 NaN ... \n", "01CO014 NaN NaN ... \n", "\n", "Name TMUB1_ENSP00000499339.1 CSNK1A1_ENSP00000499757.1 \\\n", "Patient_ID \n", "01CO005 NaN NaN \n", "01CO006 -0.411664 -0.454109 \n", "01CO008 0.192279 -0.558236 \n", "01CO013 0.220231 NaN \n", "01CO014 -0.198671 0.226146 \n", "\n", "Name MICAL2_ENSP00000499778.1 ANK2_ENSP00000499869.1 \\\n", "Patient_ID \n", "01CO005 -0.042548 NaN \n", "01CO006 -0.725892 NaN \n", "01CO008 -0.093708 -1.874293 \n", "01CO013 0.241860 -3.939263 \n", "01CO014 0.036229 NaN \n", "\n", "Name SEPTIN7_ENSP00000499937.1 ATAD3B_ENSP00000500094.1 \\\n", "Patient_ID \n", "01CO005 NaN 0.925011 \n", "01CO006 NaN -0.707588 \n", "01CO008 -0.248307 -0.899186 \n", "01CO013 NaN 0.514931 \n", "01CO014 NaN 1.189468 \n", "\n", "Name ETNK1_ENSP00000500633.1 MYO6_ENSP00000500710.1 \\\n", "Patient_ID \n", "01CO005 -0.173468 -0.180521 \n", "01CO006 -0.846624 0.329813 \n", "01CO008 -0.526260 0.668713 \n", "01CO013 -0.078267 0.122032 \n", "01CO014 0.117736 0.586529 \n", "\n", "Name WIZ_ENSP00000501300.1 HSPA12A_ENSP00000501491.1 \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ut.reduce_multiindex(df=prot, flatten=True).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When flattening levels, NaNs and empty strings will automatically be dropped." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead tr th {\n", " text-align: left;\n", " }\n", "\n", " .dataframe thead tr:last-of-type th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th>Name</th>\n", " <th>A1BG_washu_CNV</th>\n", " <th>A1CF_washu_CNV</th>\n", " <th>A2M_washu_CNV</th>\n", " <th>A2ML1_washu_CNV</th>\n", " <th>A3GALT2_washu_CNV</th>\n", " <th>A4GALT_washu_CNV</th>\n", " <th>A4GNT_washu_CNV</th>\n", " <th>AAAS_washu_CNV</th>\n", " <th>AACS_washu_CNV</th>\n", " <th>AADAC_washu_CNV</th>\n", " <th>...</th>\n", " <th colspan=\"2\" halign=\"left\">SCRIB_umich_phosphoproteomics</th>\n", " <th colspan=\"6\" halign=\"left\">TSGA10_umich_phosphoproteomics</th>\n", " <th colspan=\"2\" halign=\"left\">SVIL_umich_phosphoproteomics</th>\n", " </tr>\n", " <tr>\n", " <th>Site</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th>...</th>\n", " <th>S1575T1588S1594</th>\n", " <th>S1594</th>\n", " <th>S11</th>\n", " <th>S173</th>\n", " <th>S213</th>\n", " <th>S391</th>\n", " <th>S779</th>\n", " <th>S101</th>\n", " <th>S296</th>\n", " <th>S459</th>\n", " </tr>\n", " <tr>\n", " <th>Peptide</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th>...</th>\n", " <th>LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR</th>\n", " <th>LsPDFAEELR</th>\n", " <th>sPGRDPELQVEAAEVTTK</th>\n", " <th>sPSRLDSFVK</th>\n", " <th>RPsPTAR</th>\n", " <th>AMDTEsELGR</th>\n", " <th>GLDRsLEENLCYR;GLDRsLEENLCYRDF</th>\n", " 
<th>EVVSSQVDDLTsHNEHLCK</th>\n", " <th>DSEGDTPsLINWPSSK</th>\n", " <th>LPsPTVAR</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.00659</td>\n", " <td>-0.01982</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01418</td>\n", " <td>-0.00839</td>\n", " <td>-0.01305</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01305</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.667426</td>\n", " <td>NaN</td>\n", " <td>0.905606</td>\n", " <td>NaN</td>\n", " <td>-0.069911</td>\n", " <td>-0.584774</td>\n", " <td>NaN</td>\n", " <td>-0.561657</td>\n", " <td>-0.652457</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.02578</td>\n", " <td>0.00726</td>\n", " <td>0.01350</td>\n", " <td>0.01350</td>\n", " <td>0.00732</td>\n", " <td>0.01642</td>\n", " <td>0.01005</td>\n", " <td>0.01225</td>\n", " <td>0.01225</td>\n", " <td>0.01005</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.488427</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.431599</td>\n", " <td>-1.079638</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.01262</td>\n", " <td>0.00425</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>0.00166</td>\n", " <td>0.00549</td>\n", " <td>-0.00038</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>-0.00038</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.104862</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", 
" <td>NaN</td>\n", " <td>-1.439041</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.399718</td>\n", " <td>-0.875016</td>\n", " <td>-0.579824</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.505807</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.521725</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " <td>0.00100</td>\n", " <td>0.41191</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.02436</td>\n", " <td>-0.01198</td>\n", " <td>-0.03307</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.03307</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>1.069439</td>\n", " <td>NaN</td>\n", " <td>0.510268</td>\n", " <td>-1.889144</td>\n", " <td>NaN</td>\n", " <td>-0.592203</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.126482</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 104580 columns</p>\n", "</div>" ], "text/plain": [ "Name A1BG_washu_CNV A1CF_washu_CNV A2M_washu_CNV A2ML1_washu_CNV \\\n", "Site \n", "Peptide \n", "Patient_ID \n", "C3L-00006 -0.00659 -0.01982 -0.01402 -0.01402 \n", "C3L-00008 0.02578 0.00726 0.01350 0.01350 \n", "C3L-00032 0.01262 0.00425 -0.00275 -0.00275 \n", "C3L-00084 NaN NaN NaN NaN \n", "C3L-00090 0.00100 0.41191 -0.02299 -0.02299 \n", "\n", "Name A3GALT2_washu_CNV A4GALT_washu_CNV A4GNT_washu_CNV AAAS_washu_CNV \\\n", "Site \n", "Peptide \n", "Patient_ID \n", "C3L-00006 -0.01418 -0.00839 -0.01305 -0.01402 \n", "C3L-00008 0.00732 0.01642 0.01005 0.01225 \n", "C3L-00032 0.00166 0.00549 -0.00038 -0.00275 \n", "C3L-00084 NaN NaN NaN NaN \n", "C3L-00090 -0.02436 -0.01198 -0.03307 -0.02299 \n", "\n", "Name AACS_washu_CNV AADAC_washu_CNV ... \\\n", "Site ... \n", "Peptide ... \n", "Patient_ID ... \n", "C3L-00006 -0.01402 -0.01305 ... 
\n", "C3L-00008 0.01225 0.01005 ... \n", "C3L-00032 -0.00275 -0.00038 ... \n", "C3L-00084 NaN NaN ... \n", "C3L-00090 -0.02299 -0.03307 ... \n", "\n", "Name SCRIB_umich_phosphoproteomics \\\n", "Site S1575T1588S1594 S1594 \n", "Peptide LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR LsPDFAEELR \n", "Patient_ID \n", "C3L-00006 NaN 0.667426 \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN 0.104862 \n", "C3L-00084 NaN 0.399718 \n", "C3L-00090 NaN 1.069439 \n", "\n", "Name TSGA10_umich_phosphoproteomics \\\n", "Site S11 S173 S213 S391 \n", "Peptide sPGRDPELQVEAAEVTTK sPSRLDSFVK RPsPTAR AMDTEsELGR \n", "Patient_ID \n", "C3L-00006 NaN 0.905606 NaN -0.069911 \n", "C3L-00008 NaN -0.488427 NaN NaN \n", "C3L-00032 NaN NaN NaN NaN \n", "C3L-00084 -0.875016 -0.579824 NaN NaN \n", "C3L-00090 NaN 0.510268 -1.889144 NaN \n", "\n", "Name \\\n", "Site S779 S101 \n", "Peptide GLDRsLEENLCYR;GLDRsLEENLCYRDF EVVSSQVDDLTsHNEHLCK \n", "Patient_ID \n", "C3L-00006 -0.584774 NaN \n", "C3L-00008 NaN NaN \n", "C3L-00032 NaN NaN \n", "C3L-00084 -0.505807 NaN \n", "C3L-00090 -0.592203 NaN \n", "\n", "Name SVIL_umich_phosphoproteomics \n", "Site S296 S459 \n", "Peptide DSEGDTPsLINWPSSK LPsPTVAR \n", "Patient_ID \n", "C3L-00006 -0.561657 -0.652457 \n", "C3L-00008 -0.431599 -1.079638 \n", "C3L-00032 NaN -1.439041 \n", "C3L-00084 NaN -1.521725 \n", "C3L-00090 NaN -1.126482 \n", "\n", "[5 rows x 104580 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "phospho_and_CNV = en.join_omics_to_omics(df1_name=\"CNV\", df2_name=\"phosphoproteomics\", df1_source = 'washu', df2_source = 'umich')\n", "phospho_and_CNV.head()\n", "\n", "# Note that the CNV columns all have empty strings in the \"Site\" level of the columns,\n", "# since the CNV data doesn't have any values for that." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Name</th>\n", " <th>A1BG_washu_CNV</th>\n", " <th>A1CF_washu_CNV</th>\n", " <th>A2M_washu_CNV</th>\n", " <th>A2ML1_washu_CNV</th>\n", " <th>A3GALT2_washu_CNV</th>\n", " <th>A4GALT_washu_CNV</th>\n", " <th>A4GNT_washu_CNV</th>\n", " <th>AAAS_washu_CNV</th>\n", " <th>AACS_washu_CNV</th>\n", " <th>AADAC_washu_CNV</th>\n", " <th>...</th>\n", " <th>SCRIB_umich_phosphoproteomics_S1575T1588S1594_LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR</th>\n", " <th>SCRIB_umich_phosphoproteomics_S1594_LsPDFAEELR</th>\n", " <th>TSGA10_umich_phosphoproteomics_S11_sPGRDPELQVEAAEVTTK</th>\n", " <th>TSGA10_umich_phosphoproteomics_S173_sPSRLDSFVK</th>\n", " <th>TSGA10_umich_phosphoproteomics_S213_RPsPTAR</th>\n", " <th>TSGA10_umich_phosphoproteomics_S391_AMDTEsELGR</th>\n", " <th>TSGA10_umich_phosphoproteomics_S779_GLDRsLEENLCYR;GLDRsLEENLCYRDF</th>\n", " <th>TSGA10_umich_phosphoproteomics_S101_EVVSSQVDDLTsHNEHLCK</th>\n", " <th>SVIL_umich_phosphoproteomics_S296_DSEGDTPsLINWPSSK</th>\n", " <th>SVIL_umich_phosphoproteomics_S459_LPsPTVAR</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>C3L-00006</th>\n", " <td>-0.00659</td>\n", " 
<td>-0.01982</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01418</td>\n", " <td>-0.00839</td>\n", " <td>-0.01305</td>\n", " <td>-0.01402</td>\n", " <td>-0.01402</td>\n", " <td>-0.01305</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.667426</td>\n", " <td>NaN</td>\n", " <td>0.905606</td>\n", " <td>NaN</td>\n", " <td>-0.069911</td>\n", " <td>-0.584774</td>\n", " <td>NaN</td>\n", " <td>-0.561657</td>\n", " <td>-0.652457</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00008</th>\n", " <td>0.02578</td>\n", " <td>0.00726</td>\n", " <td>0.01350</td>\n", " <td>0.01350</td>\n", " <td>0.00732</td>\n", " <td>0.01642</td>\n", " <td>0.01005</td>\n", " <td>0.01225</td>\n", " <td>0.01225</td>\n", " <td>0.01005</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.488427</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.431599</td>\n", " <td>-1.079638</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00032</th>\n", " <td>0.01262</td>\n", " <td>0.00425</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>0.00166</td>\n", " <td>0.00549</td>\n", " <td>-0.00038</td>\n", " <td>-0.00275</td>\n", " <td>-0.00275</td>\n", " <td>-0.00038</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.104862</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.439041</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00084</th>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>0.399718</td>\n", " <td>-0.875016</td>\n", " <td>-0.579824</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.505807</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.521725</td>\n", " </tr>\n", " <tr>\n", " <th>C3L-00090</th>\n", " 
<td>0.00100</td>\n", " <td>0.41191</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.02436</td>\n", " <td>-0.01198</td>\n", " <td>-0.03307</td>\n", " <td>-0.02299</td>\n", " <td>-0.02299</td>\n", " <td>-0.03307</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>1.069439</td>\n", " <td>NaN</td>\n", " <td>0.510268</td>\n", " <td>-1.889144</td>\n", " <td>NaN</td>\n", " <td>-0.592203</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-1.126482</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 104580 columns</p>\n", "</div>" ], "text/plain": [ "Name A1BG_washu_CNV A1CF_washu_CNV A2M_washu_CNV A2ML1_washu_CNV \\\n", "Patient_ID \n", "C3L-00006 -0.00659 -0.01982 -0.01402 -0.01402 \n", "C3L-00008 0.02578 0.00726 0.01350 0.01350 \n", "C3L-00032 0.01262 0.00425 -0.00275 -0.00275 \n", "C3L-00084 NaN NaN NaN NaN \n", "C3L-00090 0.00100 0.41191 -0.02299 -0.02299 \n", "\n", "Name A3GALT2_washu_CNV A4GALT_washu_CNV A4GNT_washu_CNV \\\n", "Patient_ID \n", "C3L-00006 -0.01418 -0.00839 -0.01305 \n", "C3L-00008 0.00732 0.01642 0.01005 \n", "C3L-00032 0.00166 0.00549 -0.00038 \n", "C3L-00084 NaN NaN NaN \n", "C3L-00090 -0.02436 -0.01198 -0.03307 \n", "\n", "Name AAAS_washu_CNV AACS_washu_CNV AADAC_washu_CNV ... \\\n", "Patient_ID ... \n", "C3L-00006 -0.01402 -0.01402 -0.01305 ... \n", "C3L-00008 0.01225 0.01225 0.01005 ... \n", "C3L-00032 -0.00275 -0.00275 -0.00038 ... \n", "C3L-00084 NaN NaN NaN ... \n", "C3L-00090 -0.02299 -0.02299 -0.03307 ... 
\n", "\n", "Name SCRIB_umich_phosphoproteomics_S1575T1588S1594_LAEAPSPAPTPsPTPVEDLGPQTStSPGRLsPDFAEELR \\\n", "Patient_ID \n", "C3L-00006 NaN \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 NaN \n", "C3L-00090 NaN \n", "\n", "Name SCRIB_umich_phosphoproteomics_S1594_LsPDFAEELR \\\n", "Patient_ID \n", "C3L-00006 0.667426 \n", "C3L-00008 NaN \n", "C3L-00032 0.104862 \n", "C3L-00084 0.399718 \n", "C3L-00090 1.069439 \n", "\n", "Name TSGA10_umich_phosphoproteomics_S11_sPGRDPELQVEAAEVTTK \\\n", "Patient_ID \n", "C3L-00006 NaN \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 -0.875016 \n", "C3L-00090 NaN \n", "\n", "Name TSGA10_umich_phosphoproteomics_S173_sPSRLDSFVK \\\n", "Patient_ID \n", "C3L-00006 0.905606 \n", "C3L-00008 -0.488427 \n", "C3L-00032 NaN \n", "C3L-00084 -0.579824 \n", "C3L-00090 0.510268 \n", "\n", "Name TSGA10_umich_phosphoproteomics_S213_RPsPTAR \\\n", "Patient_ID \n", "C3L-00006 NaN \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 NaN \n", "C3L-00090 -1.889144 \n", "\n", "Name TSGA10_umich_phosphoproteomics_S391_AMDTEsELGR \\\n", "Patient_ID \n", "C3L-00006 -0.069911 \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 NaN \n", "C3L-00090 NaN \n", "\n", "Name TSGA10_umich_phosphoproteomics_S779_GLDRsLEENLCYR;GLDRsLEENLCYRDF \\\n", "Patient_ID \n", "C3L-00006 -0.584774 \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 -0.505807 \n", "C3L-00090 -0.592203 \n", "\n", "Name TSGA10_umich_phosphoproteomics_S101_EVVSSQVDDLTsHNEHLCK \\\n", "Patient_ID \n", "C3L-00006 NaN \n", "C3L-00008 NaN \n", "C3L-00032 NaN \n", "C3L-00084 NaN \n", "C3L-00090 NaN \n", "\n", "Name SVIL_umich_phosphoproteomics_S296_DSEGDTPsLINWPSSK \\\n", "Patient_ID \n", "C3L-00006 -0.561657 \n", "C3L-00008 -0.431599 \n", "C3L-00032 NaN \n", "C3L-00084 NaN \n", "C3L-00090 NaN \n", "\n", "Name SVIL_umich_phosphoproteomics_S459_LPsPTVAR \n", "Patient_ID \n", "C3L-00006 -0.652457 \n", "C3L-00008 -1.079638 \n", "C3L-00032 -1.439041 \n", "C3L-00084 -1.521725 \n", 
"C3L-00090 -1.126482 \n", "\n", "[5 rows x 104580 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ut.reduce_multiindex(df=phospho_and_CNV, flatten=True).head()\n", "# Notice that the empty strings have been dropped" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting a single level index of tuples\n", "\n", "You can also use `reduce_multiindex` to turn the multi-level column index into a single level index of tuples, with each value in a column's tuple corresponding to the column's value for that level of the index:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>(ARF5, ENSP00000000233.5)</th>\n", " <th>(M6PR, ENSP00000000412.3)</th>\n", " <th>(ESRRA, ENSP00000000442.6)</th>\n", " <th>(FKBP4, ENSP00000001008.4)</th>\n", " <th>(NDUFAF7, ENSP00000002125.4)</th>\n", " <th>(FUCA2, ENSP00000002165.5)</th>\n", " <th>(CFTR, ENSP00000003084.6)</th>\n", " <th>(CYP51A1, ENSP00000003100.8)</th>\n", " <th>(USP28, ENSP00000003302.4)</th>\n", " <th>(TMEM176A, ENSP00000004103.3)</th>\n", " <th>...</th>\n", " <th>(TMUB1, ENSP00000499339.1)</th>\n", " <th>(CSNK1A1, ENSP00000499757.1)</th>\n", " <th>(MICAL2, ENSP00000499778.1)</th>\n", " <th>(ANK2, ENSP00000499869.1)</th>\n", " <th>(SEPTIN7, ENSP00000499937.1)</th>\n", " <th>(ATAD3B, ENSP00000500094.1)</th>\n", " <th>(ETNK1, ENSP00000500633.1)</th>\n", " <th>(MYO6, ENSP00000500710.1)</th>\n", " <th>(WIZ, ENSP00000501300.1)</th>\n", " <th>(HSPA12A, ENSP00000501491.1)</th>\n", " </tr>\n", " <tr>\n", " 
<th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " <td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " <td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " 
<td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " <td>0.683830</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " <td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ " (ARF5, ENSP00000000233.5) (M6PR, ENSP00000000412.3) \\\n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 \n", "01CO006 0.188931 0.544620 \n", "01CO008 0.404810 -0.246523 \n", "01CO013 -0.276982 -0.017659 \n", "01CO014 -0.160155 0.100022 \n", "\n", " (ESRRA, ENSP00000000442.6) (FKBP4, ENSP00000001008.4) \\\n", "Patient_ID \n", "01CO005 -0.283633 -0.612614 \n", "01CO006 NaN -0.571640 \n", "01CO008 -0.053940 0.252995 \n", "01CO013 NaN -0.455055 \n", "01CO014 0.259696 0.341345 \n", "\n", " (NDUFAF7, ENSP00000002125.4) (FUCA2, ENSP00000002165.5) \\\n", "Patient_ID \n", "01CO005 0.514855 -0.824026 \n", "01CO006 -0.209734 0.799090 \n", "01CO008 0.190861 0.101419 \n", "01CO013 0.500686 -0.350366 \n", "01CO014 -0.310265 0.095461 \n", "\n", " (CFTR, ENSP00000003084.6) (CYP51A1, ENSP00000003100.8) \\\n", 
"Patient_ID \n", "01CO005 NaN 0.045383 \n", "01CO006 NaN -0.338493 \n", "01CO008 -0.502876 0.627060 \n", "01CO013 NaN 0.263168 \n", "01CO014 -0.745855 1.006614 \n", "\n", " (USP28, ENSP00000003302.4) (TMEM176A, ENSP00000004103.3) ... \\\n", "Patient_ID ... \n", "01CO005 NaN -0.248511 ... \n", "01CO006 -0.042567 NaN ... \n", "01CO008 0.089815 -0.106411 ... \n", "01CO013 0.683830 NaN ... \n", "01CO014 NaN NaN ... \n", "\n", " (TMUB1, ENSP00000499339.1) (CSNK1A1, ENSP00000499757.1) \\\n", "Patient_ID \n", "01CO005 NaN NaN \n", "01CO006 -0.411664 -0.454109 \n", "01CO008 0.192279 -0.558236 \n", "01CO013 0.220231 NaN \n", "01CO014 -0.198671 0.226146 \n", "\n", " (MICAL2, ENSP00000499778.1) (ANK2, ENSP00000499869.1) \\\n", "Patient_ID \n", "01CO005 -0.042548 NaN \n", "01CO006 -0.725892 NaN \n", "01CO008 -0.093708 -1.874293 \n", "01CO013 0.241860 -3.939263 \n", "01CO014 0.036229 NaN \n", "\n", " (SEPTIN7, ENSP00000499937.1) (ATAD3B, ENSP00000500094.1) \\\n", "Patient_ID \n", "01CO005 NaN 0.925011 \n", "01CO006 NaN -0.707588 \n", "01CO008 -0.248307 -0.899186 \n", "01CO013 NaN 0.514931 \n", "01CO014 NaN 1.189468 \n", "\n", " (ETNK1, ENSP00000500633.1) (MYO6, ENSP00000500710.1) \\\n", "Patient_ID \n", "01CO005 -0.173468 -0.180521 \n", "01CO006 -0.846624 0.329813 \n", "01CO008 -0.526260 0.668713 \n", "01CO013 -0.078267 0.122032 \n", "01CO014 0.117736 0.586529 \n", "\n", " (WIZ, ENSP00000501300.1) (HSPA12A, ENSP00000501491.1) \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ut.reduce_multiindex(df=prot, tuples=True).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Turning off warnings\n", "\n", "If your multiindex operation creates duplicate column headers, or has no effect, `reduce_multiindex` 
will warn you. You can silence these warnings by passing `True` to the `quiet` parameter:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000000412.3</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000001008.4</th>\n", " <th>ENSP00000002125.4</th>\n", " <th>ENSP00000002165.5</th>\n", " <th>ENSP00000003084.6</th>\n", " <th>ENSP00000003100.8</th>\n", " <th>ENSP00000003302.4</th>\n", " <th>ENSP00000004103.3</th>\n", " <th>...</th>\n", " <th>ENSP00000499339.1</th>\n", " <th>ENSP00000499757.1</th>\n", " <th>ENSP00000499778.1</th>\n", " <th>ENSP00000499869.1</th>\n", " <th>ENSP00000499937.1</th>\n", " <th>ENSP00000500094.1</th>\n", " <th>ENSP00000500633.1</th>\n", " <th>ENSP00000500710.1</th>\n", " <th>ENSP00000501300.1</th>\n", " <th>ENSP00000501491.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " 
<td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " <td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " <td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " <td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " <td>0.683830</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " 
<td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ "Database_ID ENSP00000000233.5 ENSP00000000412.3 ENSP00000000442.6 \\\n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 -0.283633 \n", "01CO006 0.188931 0.544620 NaN \n", "01CO008 0.404810 -0.246523 -0.053940 \n", "01CO013 -0.276982 -0.017659 NaN \n", "01CO014 -0.160155 0.100022 0.259696 \n", "\n", "Database_ID ENSP00000001008.4 ENSP00000002125.4 ENSP00000002165.5 \\\n", "Patient_ID \n", "01CO005 -0.612614 0.514855 -0.824026 \n", "01CO006 -0.571640 -0.209734 0.799090 \n", "01CO008 0.252995 0.190861 0.101419 \n", "01CO013 -0.455055 0.500686 -0.350366 \n", "01CO014 0.341345 -0.310265 0.095461 \n", "\n", "Database_ID ENSP00000003084.6 ENSP00000003100.8 ENSP00000003302.4 \\\n", "Patient_ID \n", "01CO005 NaN 0.045383 NaN \n", "01CO006 NaN -0.338493 -0.042567 \n", "01CO008 -0.502876 0.627060 0.089815 \n", "01CO013 NaN 0.263168 0.683830 \n", "01CO014 -0.745855 1.006614 NaN \n", "\n", "Database_ID ENSP00000004103.3 ... ENSP00000499339.1 ENSP00000499757.1 \\\n", "Patient_ID ... \n", "01CO005 -0.248511 ... NaN NaN \n", "01CO006 NaN ... -0.411664 -0.454109 \n", "01CO008 -0.106411 ... 0.192279 -0.558236 \n", "01CO013 NaN ... 0.220231 NaN \n", "01CO014 NaN ... 
-0.198671 0.226146 \n", "\n", "Database_ID ENSP00000499778.1 ENSP00000499869.1 ENSP00000499937.1 \\\n", "Patient_ID \n", "01CO005 -0.042548 NaN NaN \n", "01CO006 -0.725892 NaN NaN \n", "01CO008 -0.093708 -1.874293 -0.248307 \n", "01CO013 0.241860 -3.939263 NaN \n", "01CO014 0.036229 NaN NaN \n", "\n", "Database_ID ENSP00000500094.1 ENSP00000500633.1 ENSP00000500710.1 \\\n", "Patient_ID \n", "01CO005 0.925011 -0.173468 -0.180521 \n", "01CO006 -0.707588 -0.846624 0.329813 \n", "01CO008 -0.899186 -0.526260 0.668713 \n", "01CO013 0.514931 -0.078267 0.122032 \n", "01CO014 1.189468 0.117736 0.586529 \n", "\n", "Database_ID ENSP00000501300.1 ENSP00000501491.1 \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ut.reduce_multiindex(df=prot, levels_to_drop=\"Name\").head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>Database_ID</th>\n", " <th>ENSP00000000233.5</th>\n", " <th>ENSP00000000412.3</th>\n", " <th>ENSP00000000442.6</th>\n", " <th>ENSP00000001008.4</th>\n", " <th>ENSP00000002125.4</th>\n", " <th>ENSP00000002165.5</th>\n", " <th>ENSP00000003084.6</th>\n", " <th>ENSP00000003100.8</th>\n", " <th>ENSP00000003302.4</th>\n", " <th>ENSP00000004103.3</th>\n", " <th>...</th>\n", " <th>ENSP00000499339.1</th>\n", " <th>ENSP00000499757.1</th>\n", " <th>ENSP00000499778.1</th>\n", " 
<th>ENSP00000499869.1</th>\n", " <th>ENSP00000499937.1</th>\n", " <th>ENSP00000500094.1</th>\n", " <th>ENSP00000500633.1</th>\n", " <th>ENSP00000500710.1</th>\n", " <th>ENSP00000501300.1</th>\n", " <th>ENSP00000501491.1</th>\n", " </tr>\n", " <tr>\n", " <th>Patient_ID</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>01CO005</th>\n", " <td>-0.203037</td>\n", " <td>-0.223341</td>\n", " <td>-0.283633</td>\n", " <td>-0.612614</td>\n", " <td>0.514855</td>\n", " <td>-0.824026</td>\n", " <td>NaN</td>\n", " <td>0.045383</td>\n", " <td>NaN</td>\n", " <td>-0.248511</td>\n", " <td>...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.042548</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>0.925011</td>\n", " <td>-0.173468</td>\n", " <td>-0.180521</td>\n", " <td>0.139707</td>\n", " <td>-0.882283</td>\n", " </tr>\n", " <tr>\n", " <th>01CO006</th>\n", " <td>0.188931</td>\n", " <td>0.544620</td>\n", " <td>NaN</td>\n", " <td>-0.571640</td>\n", " <td>-0.209734</td>\n", " <td>0.799090</td>\n", " <td>NaN</td>\n", " <td>-0.338493</td>\n", " <td>-0.042567</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.411664</td>\n", " <td>-0.454109</td>\n", " <td>-0.725892</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>-0.707588</td>\n", " <td>-0.846624</td>\n", " <td>0.329813</td>\n", " <td>-0.311147</td>\n", " <td>-0.446358</td>\n", " </tr>\n", " <tr>\n", " <th>01CO008</th>\n", " <td>0.404810</td>\n", " <td>-0.246523</td>\n", " <td>-0.053940</td>\n", " <td>0.252995</td>\n", " <td>0.190861</td>\n", " <td>0.101419</td>\n", " <td>-0.502876</td>\n", " <td>0.627060</td>\n", " <td>0.089815</td>\n", " <td>-0.106411</td>\n", " 
<td>...</td>\n", " <td>0.192279</td>\n", " <td>-0.558236</td>\n", " <td>-0.093708</td>\n", " <td>-1.874293</td>\n", " <td>-0.248307</td>\n", " <td>-0.899186</td>\n", " <td>-0.526260</td>\n", " <td>0.668713</td>\n", " <td>0.109366</td>\n", " <td>-1.125296</td>\n", " </tr>\n", " <tr>\n", " <th>01CO013</th>\n", " <td>-0.276982</td>\n", " <td>-0.017659</td>\n", " <td>NaN</td>\n", " <td>-0.455055</td>\n", " <td>0.500686</td>\n", " <td>-0.350366</td>\n", " <td>NaN</td>\n", " <td>0.263168</td>\n", " <td>0.683830</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>0.220231</td>\n", " <td>NaN</td>\n", " <td>0.241860</td>\n", " <td>-3.939263</td>\n", " <td>NaN</td>\n", " <td>0.514931</td>\n", " <td>-0.078267</td>\n", " <td>0.122032</td>\n", " <td>0.130764</td>\n", " <td>-1.146911</td>\n", " </tr>\n", " <tr>\n", " <th>01CO014</th>\n", " <td>-0.160155</td>\n", " <td>0.100022</td>\n", " <td>0.259696</td>\n", " <td>0.341345</td>\n", " <td>-0.310265</td>\n", " <td>0.095461</td>\n", " <td>-0.745855</td>\n", " <td>1.006614</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>...</td>\n", " <td>-0.198671</td>\n", " <td>0.226146</td>\n", " <td>0.036229</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>1.189468</td>\n", " <td>0.117736</td>\n", " <td>0.586529</td>\n", " <td>-0.006767</td>\n", " <td>-1.106068</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows × 9457 columns</p>\n", "</div>" ], "text/plain": [ "Database_ID ENSP00000000233.5 ENSP00000000412.3 ENSP00000000442.6 \\\n", "Patient_ID \n", "01CO005 -0.203037 -0.223341 -0.283633 \n", "01CO006 0.188931 0.544620 NaN \n", "01CO008 0.404810 -0.246523 -0.053940 \n", "01CO013 -0.276982 -0.017659 NaN \n", "01CO014 -0.160155 0.100022 0.259696 \n", "\n", "Database_ID ENSP00000001008.4 ENSP00000002125.4 ENSP00000002165.5 \\\n", "Patient_ID \n", "01CO005 -0.612614 0.514855 -0.824026 \n", "01CO006 -0.571640 -0.209734 0.799090 \n", "01CO008 0.252995 0.190861 0.101419 \n", "01CO013 -0.455055 0.500686 -0.350366 \n", "01CO014 
0.341345 -0.310265 0.095461 \n", "\n", "Database_ID ENSP00000003084.6 ENSP00000003100.8 ENSP00000003302.4 \\\n", "Patient_ID \n", "01CO005 NaN 0.045383 NaN \n", "01CO006 NaN -0.338493 -0.042567 \n", "01CO008 -0.502876 0.627060 0.089815 \n", "01CO013 NaN 0.263168 0.683830 \n", "01CO014 -0.745855 1.006614 NaN \n", "\n", "Database_ID ENSP00000004103.3 ... ENSP00000499339.1 ENSP00000499757.1 \\\n", "Patient_ID ... \n", "01CO005 -0.248511 ... NaN NaN \n", "01CO006 NaN ... -0.411664 -0.454109 \n", "01CO008 -0.106411 ... 0.192279 -0.558236 \n", "01CO013 NaN ... 0.220231 NaN \n", "01CO014 NaN ... -0.198671 0.226146 \n", "\n", "Database_ID ENSP00000499778.1 ENSP00000499869.1 ENSP00000499937.1 \\\n", "Patient_ID \n", "01CO005 -0.042548 NaN NaN \n", "01CO006 -0.725892 NaN NaN \n", "01CO008 -0.093708 -1.874293 -0.248307 \n", "01CO013 0.241860 -3.939263 NaN \n", "01CO014 0.036229 NaN NaN \n", "\n", "Database_ID ENSP00000500094.1 ENSP00000500633.1 ENSP00000500710.1 \\\n", "Patient_ID \n", "01CO005 0.925011 -0.173468 -0.180521 \n", "01CO006 -0.707588 -0.846624 0.329813 \n", "01CO008 -0.899186 -0.526260 0.668713 \n", "01CO013 0.514931 -0.078267 0.122032 \n", "01CO014 1.189468 0.117736 0.586529 \n", "\n", "Database_ID ENSP00000501300.1 ENSP00000501491.1 \n", "Patient_ID \n", "01CO005 0.139707 -0.882283 \n", "01CO006 -0.311147 -0.446358 \n", "01CO008 0.109366 -1.125296 \n", "01CO013 0.130764 -1.146911 \n", "01CO014 -0.006767 -1.106068 \n", "\n", "[5 rows x 9457 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# No warning will be issued\n", "ut.reduce_multiindex(df=prot, levels_to_drop=\"Name\", quiet=True).head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }