{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# X2K API Tutorial Notebook\n", "February 25th, 2019\n", "\n", "This Jupyter Notebook contains an interactive tutorial for **running the Expression2Kinases (X2K) API** using Python 3.\n", "\n", "### Table of Contents\n", "The notebook contains the following sections:\n", "1. **Using the X2K API** - shows how to programmatically analyze your gene list in Python.\n", "2. **X2K API Documentation** - gives an overview of the input parameters and output of the API.\n", "3. **Interpreting the Results** - gives an overview of the structure and meaning of the analysis results.\n", "    * **Transcription Factor Enrichment Analysis** (ChEA)\n", "    * **Protein-Protein Interaction Expansion** (G2N)\n", "    * **Kinase Enrichment Analysis** (KEA)\n", "    * **Expression2Kinases** (X2K)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Using the X2K API\n", "The X2K API allows for programmatic analysis of an input gene list.\n", "\n", "The `run_X2K()` function displayed below can be used to analyze a gene list and load the results into a Python dictionary by performing a **POST request**.\n", "\n", "The function requires only one input, `input_genes`, **a list of gene symbols** to be analyzed. Additional optional parameters can be specified with the `options` parameter."
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import modules\n", "import requests\n", "import json\n", "\n", "\n", "##### Function to run X2K\n", "### Input: a Python list of gene symbols, plus an optional dictionary of options\n", "### Output: a dictionary containing the results of X2K, ChEA, G2N, KEA.\n", "\n", "def run_X2K(input_genes, options=None):\n", "    # Set default options\n", "    all_options = {'included_organisms': 'both',\n", "                   'TF-target gene background database used for enrichment': 'ChEA & ENCODE Consensus',\n", "                   'sort transcription factors by': 'p-value',\n", "                   'min_network_size': 10,\n", "                   'number of top TFs': 10,\n", "                   'path_length': 2,\n", "                   'min_number_of_articles_supporting_interaction': 0,\n", "                   'max_number_of_interactions_per_protein': 200,\n", "                   'max_number_of_interactions_per_article': 100,\n", "                   'enable_BioGRID': True,\n", "                   'enable_IntAct': True,\n", "                   'enable_MINT': True,\n", "                   'enable_ppid': True,\n", "                   'enable_Stelzl': True,\n", "                   'kinase interactions to include': 'kea 2018',\n", "                   'sort kinases by': 'p-value'}\n", "\n", "    # Override defaults with user-supplied options, if any\n", "    if options:\n", "        all_options.update(options)\n", "    all_options['text-genes'] = '\\n'.join(input_genes)\n", "\n", "    # Perform request & get response (each option is sent as a multipart form field)\n", "    res = requests.post(\n", "        'https://maayanlab.cloud/X2K/api',\n", "        files=[(k, (None, v)) for k, v in all_options.items()],\n", "    )\n", "\n", "    # Fail early with a clear error if the server returned an HTTP error status\n", "    res.raise_for_status()\n", "\n", "    # Read response\n", "    data = res.json()\n", "\n", "    # Convert to dictionary: every value except 'input' is itself a JSON-encoded string\n", "    x2k_results = {key: json.loads(value) if key != 'input' else value for key, value in data.items()}\n", "\n", "    # Clean results\n", "    x2k_results['ChEA'] = x2k_results['ChEA']['tfs']\n", "    x2k_results['G2N'] = x2k_results['G2N']['network']\n", "    x2k_results['KEA'] = x2k_results['KEA']['kinases']\n", "    x2k_results['X2K'] = x2k_results['X2K']['network']\n", "\n", "    # Return results\n", "    return x2k_results\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [ { "data": 
{ "text/plain": [ "dict_keys(['X2K', 'ChEA', 'KEA', 'G2N', 'input'])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get input genes\n", "input_genes = ['Nsun3', 'Polrmt', 'Nlrx1', 'Sfxn5', 'Zc3h12c', 'Slc25a39', 'Arsg', 'Defb29', 'Ndufb6', 'Zfand1',\n", " 'Tmem77', '5730403B10Rik', 'Tlcd1', 'Psmc6', 'Slc30a6', 'LOC100047292', 'Lrrc40', 'Orc5l', 'Mpp7',\n", " 'Unc119b', 'Prkaca', 'Tcn2', 'Psmc3ip', 'Pcmtd2', 'Acaa1a', 'Lrrc1', '2810432D09Rik', 'Sephs2', 'Sac3d1',\n", " 'Tmlhe', 'LOC623451', 'Tsr2', 'Plekha7', 'Gys2', 'Arhgef12', 'Hibch', 'Lyrm2', 'Zbtb44', 'Entpd5',\n", " 'Rab11fip2', 'Lipt1', 'Intu', 'Anxa13', 'Klf12', 'Sat2', 'Gal3st2', 'Vamp8', 'Fkbpl', 'Aqp11', 'Trap1',\n", " 'Pmpcb', 'Tm7sf3', 'Rbm39', 'Bri3', 'Kdr', 'Zfp748', 'Nap1l1', 'Dhrs1', 'Lrrc56', 'Wdr20a', 'Stxbp2',\n", " 'Klf1', 'Ufc1', 'Ccdc16', '9230114K14Rik', 'Rwdd3', '2610528K11Rik', 'Aco1', 'Cables1', 'LOC100047214',\n", " 'Yars2', 'Lypla1', 'Kalrn', 'Gyk', 'Zfp787', 'Zfp655', 'Rabepk', 'Zfp650', '4732466D17Rik', 'Exosc4',\n", " 'Wdr42a', 'Gphn', '2610528J11Rik', '1110003E01Rik', 'Mdh1', '1200014M14Rik', 'AW209491', 'Mut',\n", " '1700123L14Rik', '2610036D13Rik', 'Cox15', 'Tmem30a', 'Nsmce4a', 'Tm2d2', 'Rhbdd3', 'Atxn2', 'Nfs1',\n", " '3110001I20Rik', 'BC038156', 'LOC100047782', '2410012H22Rik', 'Rilp', 'A230062G08Rik', 'Pttg1ip', 'Rab1',\n", " 'Afap1l1', 'Lyrm5', '2310026E23Rik', 'C330002I19Rik', 'Zfyve20', 'Poli', 'Tomm70a', 'Slc7a6os', 'Mat2b',\n", " '4932438A13Rik', 'Lrrc8a', 'Smo', 'Nupl2', 'Trpc2', 'Arsk', 'D630023B12Rik', 'Mtfr1', '5730414N17Rik',\n", " 'Scp2', 'Zrsr1', 'Nol7', 'C330018D20Rik', 'Ift122', 'LOC100046168', 'D730039F16Rik', 'Scyl1',\n", " '1700023B02Rik', '1700034H14Rik', 'Fbxo8', 'Paip1', 'Tmem186', 'Atpaf1', 'LOC100046254', 'LOC100047604',\n", " 'Coq10a', 'Fn3k', 'Sipa1l1', 'Slc25a16', 'Slc25a40', 'Rps6ka5', 'Trim37', 'Lrrc61', 'Abhd3', 'Gbe1',\n", " 'Parp16', 'Hsd3b2', 'Esm1', 'Dnajc18', 'Dolpp1', 'Lass2', 'Wdr34', 'Rfesd', 
'Cacnb4', '2310042D19Rik',\n", " 'Srr', 'Bpnt1', '6530415H11Rik', 'Clcc1', 'Tfb1m', '4632404H12Rik', 'D4Bwg0951e', 'Med14', 'Adhfe1',\n", " 'Thtpa', 'Cat', 'Ell3', 'Akr7a5', 'Mtmr14', 'Timm44', 'Sf1', 'Ipp', 'Iah1', 'Trim23', 'Wdr89', 'Gstz1',\n", " 'Cradd', '2510006D16Rik', 'Fbxl6', 'LOC100044400', 'Zfp106', 'Cd55', '0610013E23Rik', 'Afmid', 'Tmem86a',\n", " 'Aldh6a1', 'Dalrd3', 'Smyd4', 'Nme7', 'Fars2', 'Tasp1', 'Cldn10', 'A930005H10Rik', 'Slc9a6', 'Adk',\n", " 'Rbks', '2210016F16Rik', 'Vwce', '4732435N03Rik', 'Zfp11', 'Vldlr', '9630013D21Rik', '4933407N01Rik',\n", " 'Fahd1', 'Mipol1', '1810019D21Rik', '1810049H13Rik', 'Tfam', 'Paics', '1110032A03Rik', 'LOC100044139',\n", " 'Dnajc19', 'BC016495', 'A930041I02Rik', 'Rqcd1', 'Usp34', 'Zcchc3', 'H2afj', 'Phf7', '4921508D12Rik',\n", " 'Kmo', 'Prpf18', 'Mcat', 'Txndc4', '4921530L18Rik', 'Vps13b', 'Scrn3', 'Tor1a', 'AI316807', 'Acbd4',\n", " 'Fah', 'Apool', 'Col4a4', 'Lrrc19', 'Gnmt', 'Nr3c1', 'Sip1', 'Ascc1', 'Fech', 'Abhd14a', 'Arhgap18',\n", " '2700046G09Rik', 'Yme1l1', 'Gk5', 'Glo1', 'Sbk1', 'Cisd1', '2210011C24Rik', 'Nxt2', 'Notum', 'Ankrd42',\n", " 'Ube2e1', 'Ndufv1', 'Slc33a1', 'Cep68', 'Rps6kb1', 'Hyi', 'Aldh1a3', 'Mynn', '3110048L19Rik', 'Rdh14',\n", " 'Proz', 'Gorasp1', 'LOC674449', 'Zfp775', '5430437P03Rik', 'Npy', 'Adh5', 'Sybl1', '4930432O21Rik',\n", " 'Nat9', 'LOC100048387', 'Mettl8', 'Eny2', '2410018G20Rik', 'Pgm2', 'Fgfr4', 'Mobkl2b', 'Atad3a',\n", " '4932432K03Rik', 'Dhtkd1', 'Ubox5', 'A530050D06Rik', 'Zdhhc5', 'Mgat1', 'Nudt6', 'Tpmt', 'Wbscr18',\n", " 'LOC100041586', 'Cdk5rap1', '4833426J09Rik', 'Myo6', 'Cpt1a', 'Gadd45gip1', 'Tmbim4', '2010309E21Rik',\n", " 'Asb9', '2610019F03Rik', '7530414M10Rik', 'Atp6v1b2', '2310068J16Rik', 'Ddt', 'Klhdc4', 'Hpn', 'Lifr',\n", " 'Ovol1', 'Nudt12', 'Cdan1', 'Fbxo9', 'Fbxl3', 'Hoxa7', 'Aldh8a1', '3110057O12Rik', 'Abhd11', 'Psmb1',\n", " 'ENSMUSG00000074286', 'Chpt1', 'Oxsm', '2310009A05Rik', '1700001L05Rik', 'Zfp148', '39509', 'Mrpl9',\n", " 'Tmem80', 
'9030420J04Rik', 'Naglu', 'Plscr2', 'Agbl3', 'Pex1', 'Cno', 'Neo1', 'Asf1a', 'Tnfsf5ip1',\n", " 'Pkig', 'AI931714', 'D130020L05Rik', 'Cntd1', 'Clec2h', 'Zkscan1', '1810044D09Rik', 'Mettl7a', 'Siae',\n", " 'Fbxo3', 'Fzd5', 'Tmem166', 'Tmed4', 'Gpr155', 'Rnf167', 'Sptlc1', 'Riok2', 'Tgds', 'Pms1', 'Pitpnc1',\n", " 'Pcsk7', '4933403G14Rik', 'Ei24', 'Crebl2', 'Tln1', 'Mrpl35', '2700038C09Rik', 'Ubie', 'Osgepl1',\n", " '2410166I05Rik', 'Wdr24', 'Ap4s1', 'Lrrc44', 'B3bp', 'Itfg1', 'Dmxl1', 'C1d']\n", "\n", "# Run the X2K analysis\n", "x2k_results = run_X2K(input_genes)\n", "x2k_results.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. X2K API Documentation\n", "\n", "### 2.1 API Inputs\n", "A **full list of the input parameters** for the `run_X2K()` function is available below.\n", "\n", "The optional parameters can be provided to the function in the `options` dictionary." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
| Parameter | Step | Description | Notes |
| --- | --- | --- | --- |
| **input_genes** (required) | X2K | Contains the input gene set for the X2K analysis. | A list of strings representing the input gene symbols. |
| *included_organisms* (optional) | ChEA | The organism from which TF-target interaction data should be integrated. | One of `('human_only', 'mouse_only', 'both')`. Default `'both'`. |
| *TF-target gene background database used for enrichment* (optional) | ChEA | The database from which TF-target interaction data should be integrated. | One of `('ChEA 2015', 'ENCODE 2015', 'ChEA & ENCODE Consensus', 'Transfac and Jaspar', 'ChEA 2016', 'ARCHS4 TFs Coexp', 'CREEDS', 'Enrichr Submissions TF-Gene Coocurrence')`. Default `'ChEA & ENCODE Consensus'`. |
| *sort transcription factors by* (optional) | ChEA | The method used to sort the top transcription factors identified by ChEA. | One of `('p-value', 'rank', 'combined score')`. Default `'p-value'`. |
| *path_length* (optional) | G2N | The maximum Protein-Protein Interaction path length for the network expansion. | Integer, default `2`. |
| *min_network_size* (optional) | G2N | The minimum size of the Protein-Protein Interaction network generated using Genes2Networks. | Integer, default `10` in `run_X2K()`. |
| *min_number_of_articles_supporting_interaction* (optional) | G2N | The minimum number of published articles supporting a Protein-Protein Interaction for the expanded subnetwork. | Integer, default `0` in `run_X2K()`. |
| *max_number_of_interactions_per_protein* (optional) | G2N | The maximum number of physical interactions allowed for the proteins in the expanded subnetwork. | Integer, default `200`. |
| *max_number_of_interactions_per_article* (optional) | G2N | The maximum number of physical interactions reported in each published article. | Integer, default `100`. |
| *enable_Biocarta* (optional) | G2N | Whether to integrate the Biocarta Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_BioGRID* (optional) | G2N | Whether to integrate the BioGRID Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'true'`. |
| *enable_BioPlex* (optional) | G2N | Whether to integrate the BioPlex Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_DIP* (optional) | G2N | Whether to integrate the DIP Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_huMAP* (optional) | G2N | Whether to integrate the huMAP Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_InnateDB* (optional) | G2N | Whether to integrate the InnateDB Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_IntAct* (optional) | G2N | Whether to integrate the IntAct Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'true'`. |
| *enable_KEGG* (optional) | G2N | Whether to integrate the KEGG Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_MINT* (optional) | G2N | Whether to integrate the MINT Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'true'`. |
| *enable_ppid* (optional) | G2N | Whether to integrate the ppid Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'true'`. |
| *enable_SNAVI* (optional) | G2N | Whether to integrate the SNAVI Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_iREF* (optional) | G2N | Whether to integrate the iREF Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_Stelzl* (optional) | G2N | Whether to integrate the Stelzl Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'true'`. |
| *enable_vidal* (optional) | G2N | Whether to integrate the vidal Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_BIND* (optional) | G2N | Whether to integrate the BIND Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_figeys* (optional) | G2N | Whether to integrate the figeys Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *enable_HPRD* (optional) | G2N | Whether to integrate the HPRD Protein-Protein Interaction database when generating the expanded subnetwork. | Either `'true'` or `'false'`. Default `'false'`. |
| *number_of_results* (optional) | G2N | The maximum network size of the expanded network generated using Genes2Networks. | Integer, default `50`. |
| *kinase interactions to include* (optional) | KEA | The kinase interaction database to include. | One of `('kea 2018', 'ARCHS4', 'iPTMnet', 'NetworkIN', 'Phospho.ELM', 'Phosphopoint', 'PhosphoPlus', 'MINT')`. Default `'kea 2018'`. |
| *sort kinases by* (optional) | KEA | The method used to sort the top kinases identified by KEA. | One of `('p-value', 'rank', 'combined score')`. Default `'p-value'`. |
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 API Output\n", "The `run_X2K()` function returns the results as a `dict`. Its **four result keys** are described below; a fifth key, `input`, is passed through from the API response without further parsing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
| Key | Notes | Contents |
| --- | --- | --- |
| **ChEA** | Contains the results of the **Transcription Factor Enrichment Analysis**, generated using ChEA. | A `list` of `dict`s containing information on the top TFs predicted to regulate the input genes. |
| **G2N** | Contains the results of the **Protein-Protein Interaction Expansion**, generated using Genes2Networks (G2N). | A `dict` containing two keys: `nodes`, a `list` containing information on the nodes of the expanded subnetwork, and `interactions`, a `list` containing information on the edges of the expanded subnetwork. |
| **KEA** | Contains the results of the **Kinase Enrichment Analysis**, generated using KEA. | A `list` of `dict`s containing information on the top kinases predicted to regulate the subnetwork identified by G2N. |
| **X2K** | Contains the **Expression2Kinases network**, generated by integrating the results of ChEA, G2N and KEA. | A `dict` containing two keys: `nodes`, a `list` containing information on the nodes of the final X2K network, and `interactions`, a `list` containing information on the edges of the final X2K network. |
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Interpreting the Results\n", "\n", "### 3.1 ChEA Results\n", "The results of the ChEA analysis can be accessed in `x2k_results['ChEA']`. Here, the results are converted to a pandas DataFrame for easier interpretation." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | combinedScore | enrichedTargets | meta | name | pvalue | simpleName | zscore |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.0 | [NUDT6, ZFYVE20, YME1L1, ARSK, NOL7, TSR2, PMP... | {} | NR2C2_ENCODE | 0.000643 | NR2C2 | 0.0 |
| 1 | 0.0 | [MTMR14, MED14, POLRMT, ZKSCAN1, VPS13B, NUDT6... | {} | GABPA_ENCODE | 0.000643 | GABPA | 0.0 |
| 2 | 0.0 | [SF1, ENY2, ACO1, LYPLA1, MTFR1, TLCD1, ZBTB44... | {} | ERG_CHEA | 0.000679 | ERG | 0.0 |
| 3 | 0.0 | [SF1, CREBL2, AP4S1, ZKSCAN1, VPS13B, C1D, TGD... | {} | TAF1_ENCODE | 0.002002 | TAF1 | 0.0 |
| 4 | 0.0 | [SF1, MTMR14, TSR2, ZKSCAN1, PGM2, VPS13B, TMB... | {} | ELF1_ENCODE | 0.003928 | ELF1 | 0.0 |
\n", "
" ], "text/plain": [ " combinedScore enrichedTargets meta \\\n", "0 0.0 [NUDT6, ZFYVE20, YME1L1, ARSK, NOL7, TSR2, PMP... {} \n", "1 0.0 [MTMR14, MED14, POLRMT, ZKSCAN1, VPS13B, NUDT6... {} \n", "2 0.0 [SF1, ENY2, ACO1, LYPLA1, MTFR1, TLCD1, ZBTB44... {} \n", "3 0.0 [SF1, CREBL2, AP4S1, ZKSCAN1, VPS13B, C1D, TGD... {} \n", "4 0.0 [SF1, MTMR14, TSR2, ZKSCAN1, PGM2, VPS13B, TMB... {} \n", "\n", " name pvalue simpleName zscore \n", "0 NR2C2_ENCODE 0.000643 NR2C2 0.0 \n", "1 GABPA_ENCODE 0.000643 GABPA 0.0 \n", "2 ERG_CHEA 0.000679 ERG 0.0 \n", "3 TAF1_ENCODE 0.002002 TAF1 0.0 \n", "4 ELF1_ENCODE 0.003928 ELF1 0.0 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import pandas\n", "import pandas as pd\n", "\n", "# Read results\n", "chea_dataframe = pd.DataFrame(x2k_results['ChEA'])\n", "chea_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 1 | Results of the ChEA analysis. ** Each row represents a transcription factor predicted to regulate the input gene list.\n", "\n", "### 3.2 G2N Results\n", "The results for the G2N analysis can be accessed in x2k_results['G2N'].\n", "\n", "The results are stored in a dictionary containing two keys:\n", "* `edges`\n", "* `interactions`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | name | type |
| --- | --- | --- |
| 0 | NR2C2_ENCODE | tf |
| 1 | GABPA_ENCODE | tf |
| 2 | ERG_CHEA | tf |
| 3 | TAF1_ENCODE | tf |
| 4 | ELF1_ENCODE | tf |
\n", "
" ], "text/plain": [ " name type\n", "0 NR2C2_ENCODE tf\n", "1 GABPA_ENCODE tf\n", "2 ERG_CHEA tf\n", "3 TAF1_ENCODE tf\n", "4 ELF1_ENCODE tf" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# G2N nodes dataframe\n", "g2n_nodes_dataframe = pd.DataFrame(x2k_results['G2N']['nodes']).drop('pvalue', axis=1)\n", "g2n_nodes_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 2 | Nodes of the Genes2Networks expanded subnetwork. ** Each row represents a node in the expanded subnetwork. The type column indicates whether the node is a Transcription Factor identified by ChEA, or an intermediate protein." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | source | target |
| --- | --- | --- |
| 0 | 10 | 6 |
| 1 | 10 | 7 |
| 2 | 10 | 10 |
| 3 | 10 | 14 |
| 4 | 10 | 15 |
\n", "
" ], "text/plain": [ " source target\n", "0 10 6\n", "1 10 7\n", "2 10 10\n", "3 10 14\n", "4 10 15" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# G2N edges dataframe\n", "g2n_edges_dataframe = pd.DataFrame(x2k_results['G2N']['interactions'])\n", "g2n_edges_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 3 | Edges of the Genes2Networks expanded subnetwork. ** Each row represents an edge in the expanded subnetwork generated by G2N on the top transcription factors identified by ChEA.\n", "\n", "### 3.3 KEA Results\n", "The results for the KEA analysis can be accessed in x2k_results['KEA']." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | combinedScore | enrichedSubstrates | name | pvalue | zscore |
| --- | --- | --- | --- | --- | --- |
| 0 | 0.0 | [RB1, SUB1, JUN, RCOR1, YY1, SP1, HDAC4, HDAC2... | CK2ALPHA | 2.373774e-09 | 0.0 |
| 1 | 0.0 | [RB1, JUN, SMAD4, SMAD3, SP1, SREBF1, HDAC4] | ERK2 | 5.720848e-07 | 0.0 |
| 2 | 0.0 | [JUN, SP1, SP3, SREBF1, ZNF384, HDAC2, GABPA, ... | CDK2 | 2.061995e-06 | 0.0 |
| 3 | 0.0 | [RB1, MED1, JUN, EZH2, ERG, SP1, SP3, SREBF1, ... | MAPK1 | 7.049956e-06 | 0.0 |
| 4 | 0.0 | [JUN, SMAD4, SMAD3, SP1, SREBF1, HDAC4] | ERK1 | 1.296236e-05 | 0.0 |
\n", "
" ], "text/plain": [ " combinedScore enrichedSubstrates name \\\n", "0 0.0 [RB1, SUB1, JUN, RCOR1, YY1, SP1, HDAC4, HDAC2... CK2ALPHA \n", "1 0.0 [RB1, JUN, SMAD4, SMAD3, SP1, SREBF1, HDAC4] ERK2 \n", "2 0.0 [JUN, SP1, SP3, SREBF1, ZNF384, HDAC2, GABPA, ... CDK2 \n", "3 0.0 [RB1, MED1, JUN, EZH2, ERG, SP1, SP3, SREBF1, ... MAPK1 \n", "4 0.0 [JUN, SMAD4, SMAD3, SP1, SREBF1, HDAC4] ERK1 \n", "\n", " pvalue zscore \n", "0 2.373774e-09 0.0 \n", "1 5.720848e-07 0.0 \n", "2 2.061995e-06 0.0 \n", "3 7.049956e-06 0.0 \n", "4 1.296236e-05 0.0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# KEA Results\n", "kea_dataframe = pd.DataFrame(x2k_results['KEA'])\n", "kea_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 4 | Results of the KEA analysis. ** Each row represents a protein kinase predicted to regulate the expanded subnetwork generated by G2N.\n", "\n", "### 3.4 X2K Results\n", "The results for the X2K analysis can be accessed in x2k_results['X2K'].\n", "\n", "The results are stored in a dictionary containing two keys:\n", "* `nodes`\n", "* `interactions`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | name | type |
| --- | --- | --- |
| 0 | NR2C2_ENCODE | tf |
| 1 | GABPA_ENCODE | tf |
| 2 | ERG_CHEA | tf |
| 3 | TAF1_ENCODE | tf |
| 4 | ELF1_ENCODE | tf |
\n", "
" ], "text/plain": [ " name type\n", "0 NR2C2_ENCODE tf\n", "1 GABPA_ENCODE tf\n", "2 ERG_CHEA tf\n", "3 TAF1_ENCODE tf\n", "4 ELF1_ENCODE tf" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# X2K nodes dataframe\n", "x2k_nodes_dataframe = pd.DataFrame(x2k_results['X2K']['nodes']).drop('pvalue', axis=1)\n", "x2k_nodes_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 5 | Nodes of the final Expression2Kinases network. ** Each row represents a node in the final X2K network network. The type column indicates whether the node is a Transcription Factor identified by ChEA, an intermediate protein identified by G2N, or a protein kinase identified by KEA." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|   | source | target |
| --- | --- | --- |
| 0 | 10 | 33 |
| 1 | 10 | 30 |
| 2 | 10 | 25 |
| 3 | 10 | 9 |
| 4 | 10 | 6 |
\n", "
" ], "text/plain": [ " source target\n", "0 10 33\n", "1 10 30\n", "2 10 25\n", "3 10 9\n", "4 10 6" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# X2K edges dataframe\n", "x2k_edges_dataframe = pd.DataFrame(x2k_results['X2K']['interactions'])\n", "x2k_edges_dataframe.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Table 6 | Edges of the final Expression2Kinases subnetwork. ** Each row represents an edge in the final network identified by integrating the results of ChEA, G2N, and KEA." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 2 }