{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "How_to_use_the_Targetome_and_Reactome_BQ_datasets.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyO/sUEpmSiCwz+mCeeR/uBH",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qgwv4lK5V8Zc"
},
"source": [
"# How to use the Targetome and Reactome BigQuery datasets\n",
"Check out other notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!\n",
"\n",
"- **Title:** How to use the Targetome and Reactome BigQuery datasets\n",
"- **Author:** John Phan\n",
"- **Created:** 2021-07-23\n",
"- **Purpose:** Demonstrate basic usage of the Targetome and Reactome BigQuery datasets\n",
"- **URL:** https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_use_the_Targetome_and_Reactome_BQ_datasets.ipynb\n",
"\n",
"This notebook demonstrates basic usage of the Targetome and Reactome BigQuery datasets. Joint analysis of these datasets can provide a powerful tool for identifying pathways affected by cancer drugs via known drug targets (i.e., genes or proteins). \n",
"\n",
"The Cancer Targetome dataset is a curation of cancer drug and target (e.g., protein, RNA, DNA) interactions, supported by either experimental or literature evidence. More information about the Targetome study can be found here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5759325/. The original data can be found here: https://github.com/ablucher/The-Cancer-Targetome.\n",
"\n",
"The Reactome is an open-source, manually curated, and peer-reviewed pathway database. More information can be found here: https://reactome.org/. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1zVfEL8i8__O"
},
"source": [
"# Initialize Notebook Environment\n",
"\n",
"Before running the analysis, we need to load depdnencies, authenticate to BigQuery, and customize notebook parameters."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TJmCyMPjNwjw"
},
"source": [
"## Import Dependencies"
]
},
{
"cell_type": "code",
"metadata": {
"id": "LNlADvO-gken"
},
"source": [
"# GCP Libraries\n",
"from google.cloud import bigquery\n",
"from google.colab import auth\n",
"\n",
"# Data Analytics\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Visualization\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"import seaborn as sns"
],
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "LBHnu8bZDEcE"
},
"source": [
"## Authenticate\n",
"\n",
"Before using BigQuery, we need to get authorization for access to BigQuery and the Google Cloud. For more information see ['Quick Start Guide to ISB-CGC'](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowToGetStartedonISB-CGC.html). Alternative authentication methods can be found [here](https://googleapis.dev/python/google-api-core/latest/auth.html)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "M6BEFHj2dd5a"
},
"source": [
"# if you're using Google Colab, authenticate to gcloud with the following\n",
"auth.authenticate_user()\n",
"\n",
"# alternatively, use the gcloud SDK\n",
"#!gcloud auth application-default login"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "RWuECMsRptWJ"
},
"source": [
"## Parameters\n",
"\n",
"Customize the following parameters based on your notebook, execution environment, or project."
]
},
{
"cell_type": "code",
"metadata": {
"id": "veP6gPatu2iW"
},
"source": [
"# set the google project that will be billed for this notebook's computations\n",
"google_project = 'google-project' ## Change me"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "F6zTODbDq3g9"
},
"source": [
"## BigQuery Client\n",
"\n",
"Create the BigQuery client."
]
},
{
"cell_type": "code",
"metadata": {
"id": "HNkfSXedwzbC"
},
"source": [
"# Create a client to access the data within BigQuery\n",
"client = bigquery.Client(google_project)"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5OCZiJyNOF4i"
},
"source": [
"# Targetome Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i4jPuJnUYkju"
},
"source": [
"## Identify Drugs Interacting w/ Target\n",
"\n",
"The following query identifies all cancer drugs that have been found to interact with a specific target. In this example, the target is MS4A1, a gene that encodes a member of the membrane-spanning 4A gene family. Each identified drug is also associated with a literature evidence, identified by its PubMed ID. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "Q13ninrpb1hH"
},
"source": [
"# set parameters for query\n",
"target_name = 'MS4A1' # target for which we want to identify drugs"
],
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WYKK8nT7Rzpa"
},
"source": [
"# run query and put results in data frame\n",
"drug_interactions = client.query((\"\"\"\n",
" SELECT\n",
" inter.targetName,\n",
" inter.drugName,\n",
" src.PubMedID\n",
"\n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.sources_v1` AS src\n",
" -- link interactions to literature sources\n",
" ON inter.sourceID = src.sourceID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.target_synonyms_v1` AS targsyn\n",
" -- filter for interactions that match target\n",
" ON inter.targetID = targsyn.targetID\n",
" \n",
" WHERE\n",
" LOWER(targsyn.synonym) = LOWER('{target_name}')\n",
"\n",
" ORDER BY drugName ASC\n",
"\"\"\").format(\n",
" target_name=target_name\n",
")).result().to_dataframe()"
],
"execution_count": 6,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"id": "XKmbMwbic2sN",
"outputId": "ec7d73dc-9355-48a3-c807-6117a69a3584"
},
"source": [
"# display result data frame\n",
"drug_interactions"
],
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
targetName
\n",
"
drugName
\n",
"
PubMedID
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
NA_IUPHAR
\n",
"
\n",
"
\n",
"
1
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
11752352
\n",
"
\n",
"
\n",
"
2
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
12011122
\n",
"
\n",
"
\n",
"
3
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
15045033
\n",
"
\n",
"
\n",
"
4
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
20113680
\n",
"
\n",
"
\n",
"
5
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
10541376
\n",
"
\n",
"
\n",
"
6
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
11418316
\n",
"
\n",
"
\n",
"
7
\n",
"
MS4A1
\n",
"
Ibritumomab Tiuxetan
\n",
"
11879282
\n",
"
\n",
"
\n",
"
8
\n",
"
MS4A1
\n",
"
Obinutuzumab
\n",
"
23537278
\n",
"
\n",
"
\n",
"
9
\n",
"
MS4A1
\n",
"
Obinutuzumab
\n",
"
21378274
\n",
"
\n",
"
\n",
"
10
\n",
"
MS4A1
\n",
"
Obinutuzumab
\n",
"
19513948
\n",
"
\n",
"
\n",
"
11
\n",
"
MS4A1
\n",
"
Obinutuzumab
\n",
"
NA_TTD
\n",
"
\n",
"
\n",
"
12
\n",
"
MS4A1
\n",
"
Ofatumumab
\n",
"
17768100
\n",
"
\n",
"
\n",
"
13
\n",
"
MS4A1
\n",
"
Ofatumumab
\n",
"
19427037
\n",
"
\n",
"
\n",
"
14
\n",
"
MS4A1
\n",
"
Ofatumumab
\n",
"
18388516
\n",
"
\n",
"
\n",
"
15
\n",
"
MS4A1
\n",
"
Ofatumumab
\n",
"
NA_TTD
\n",
"
\n",
"
\n",
"
16
\n",
"
MS4A1
\n",
"
Rituximab
\n",
"
11752352
\n",
"
\n",
"
\n",
"
17
\n",
"
MS4A1
\n",
"
Rituximab
\n",
"
20350667
\n",
"
\n",
"
\n",
"
18
\n",
"
MS4A1
\n",
"
Rituximab
\n",
"
20350663
\n",
"
\n",
"
\n",
"
19
\n",
"
MS4A1
\n",
"
Rituximab
\n",
"
NA_TTD
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" targetName drugName PubMedID\n",
"0 MS4A1 Ibritumomab Tiuxetan NA_IUPHAR\n",
"1 MS4A1 Ibritumomab Tiuxetan 11752352\n",
"2 MS4A1 Ibritumomab Tiuxetan 12011122\n",
"3 MS4A1 Ibritumomab Tiuxetan 15045033\n",
"4 MS4A1 Ibritumomab Tiuxetan 20113680\n",
"5 MS4A1 Ibritumomab Tiuxetan 10541376\n",
"6 MS4A1 Ibritumomab Tiuxetan 11418316\n",
"7 MS4A1 Ibritumomab Tiuxetan 11879282\n",
"8 MS4A1 Obinutuzumab 23537278\n",
"9 MS4A1 Obinutuzumab 21378274\n",
"10 MS4A1 Obinutuzumab 19513948\n",
"11 MS4A1 Obinutuzumab NA_TTD\n",
"12 MS4A1 Ofatumumab 17768100\n",
"13 MS4A1 Ofatumumab 19427037\n",
"14 MS4A1 Ofatumumab 18388516\n",
"15 MS4A1 Ofatumumab NA_TTD\n",
"16 MS4A1 Rituximab 11752352\n",
"17 MS4A1 Rituximab 20350667\n",
"18 MS4A1 Rituximab 20350663\n",
"19 MS4A1 Rituximab NA_TTD"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nbxHgDDeBbA7"
},
"source": [
"## Identify Interactions Supported by Experiments w/ Exact Binding Values\n",
"\n",
"As reported in Blucher et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5759325/) Figure 2, Level III Exact interactions refer to interactions annotated with exact (“=”) binding values reported, rather than “<” or “>”. There are approximately 2000 of these interactions. The following query identifies these interactions. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "kpYwsxrICHwd"
},
"source": [
"# run query and put results in data frame\n",
"exact_interactions = client.query((\"\"\"\n",
" SELECT\n",
" DISTINCT\n",
" inter.interactionID,\n",
" inter.drugName,\n",
" inter.targetName\n",
"\n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- filter for interactions with experiments\n",
" ON inter.expID = exp.expID\n",
"\n",
" WHERE\n",
" -- filter for exact binding evidence\n",
" exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
" \n",
" -- filter for only human targets\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
"\n",
" ORDER BY\n",
" targetName ASC,\n",
" drugName ASC\n",
"\"\"\")).result().to_dataframe()"
],
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 419
},
"id": "-17ga4JskFIo",
"outputId": "60d50fbc-ad50-4d83-80f5-e33d82a9e975"
},
"source": [
"# display result data frame\n",
"exact_interactions"
],
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
interactionID
\n",
"
drugName
\n",
"
targetName
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
2130
\n",
"
Flutamide
\n",
"
AADAC
\n",
"
\n",
"
\n",
"
1
\n",
"
1921
\n",
"
Axitinib
\n",
"
AAK1
\n",
"
\n",
"
\n",
"
2
\n",
"
1538
\n",
"
Bosutinib
\n",
"
AAK1
\n",
"
\n",
"
\n",
"
3
\n",
"
1616
\n",
"
Crizotinib
\n",
"
AAK1
\n",
"
\n",
"
\n",
"
4
\n",
"
1061
\n",
"
Erlotinib Hydrochloride
\n",
"
AAK1
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
2023
\n",
"
865
\n",
"
Sorafenib Tosylate
\n",
"
ZAK
\n",
"
\n",
"
\n",
"
2024
\n",
"
866
\n",
"
Vandetanib
\n",
"
ZAK
\n",
"
\n",
"
\n",
"
2025
\n",
"
5936
\n",
"
Bosutinib
\n",
"
ZAP70
\n",
"
\n",
"
\n",
"
2026
\n",
"
6782
\n",
"
Ceritinib
\n",
"
ZAP70
\n",
"
\n",
"
\n",
"
2027
\n",
"
4879
\n",
"
Crizotinib
\n",
"
ZAP70
\n",
"
\n",
" \n",
"
\n",
"
2028 rows × 3 columns
\n",
"
"
],
"text/plain": [
" interactionID drugName targetName\n",
"0 2130 Flutamide AADAC\n",
"1 1921 Axitinib AAK1\n",
"2 1538 Bosutinib AAK1\n",
"3 1616 Crizotinib AAK1\n",
"4 1061 Erlotinib Hydrochloride AAK1\n",
"... ... ... ...\n",
"2023 865 Sorafenib Tosylate ZAK\n",
"2024 866 Vandetanib ZAK\n",
"2025 5936 Bosutinib ZAP70\n",
"2026 6782 Ceritinib ZAP70\n",
"2027 4879 Crizotinib ZAP70\n",
"\n",
"[2028 rows x 3 columns]"
]
},
"metadata": {},
"execution_count": 9
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z1Nr4RTDJbcY"
},
"source": [
"## Experimental Evidence Supporting Target Binding for Imatinib\n",
"\n",
"The Cancer Targetome database can help identify drugs with strong interactions supported by experimental data. Blucher et al. highlight the drub imatinib, a protein kinase inhibitor, as an example. Imatinib is involved in 87 interactions supported by experimental binding evidence under 100nM. The following query identifies all 87 interactions. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "4unk8cQ6oiVG"
},
"source": [
"# set parameters for query\n",
"drug_name = 'imatinib'"
],
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "0gKYgFLbJ1tj"
},
"source": [
"# run query and put results in data frame\n",
"drug_interactions = client.query((\"\"\"\n",
" SELECT\n",
" DISTINCT\n",
" inter.* EXCEPT (sourceID),\n",
" exp.exp_assayType,\n",
" exp.exp_assayValueMedian\n",
"\n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- filter for interactions with experimental evidence\n",
" ON inter.expID = exp.expID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID\n",
"\n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('{drug_name}')\n",
"\n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
"\n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
"\n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
"\n",
" ORDER BY inter.targetName ASC\n",
"\"\"\").format(\n",
" drug_name=drug_name\n",
")).result().to_dataframe()"
],
"execution_count": 11,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 609
},
"id": "SFue9YfqpDtZ",
"outputId": "e01575bf-ffaf-45cc-c446-099a64ff4595"
},
"source": [
"# display result data frame\n",
"drug_interactions"
],
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
"
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xqY0iJedOVv9"
},
"source": [
"# Integrated Targetome and Reactome Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7kVZ88dU85Vs"
},
"source": [
"## Identify all Reactome Pathways Targeted by Imatinib\n",
"\n",
"We can join tables from the Reactome BigQuery dataset to identy all pathways targeted by Imatinib. Targetome UniProt IDs map to Reactome \"physical entities\", which can then be mapped to Reactome pathways. We further filter the physical entity to pathway evidence codes to retain only interactions that have \"Traceable Author Statements\" (TAS) rather than just \"Inferred from Electronic Annotation\" (IEA) to avoid evidence that have not been manually curated. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "933QxR4tTzFc"
},
"source": [
"# set parameters for query\n",
"drug_name = 'imatinib'"
],
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ZwJGDXql9j6P"
},
"source": [
"# run query and put results in data frame\n",
"drug_pathways = client.query((\"\"\"\n",
" SELECT\n",
" DISTINCT pathway.*\n",
"\n",
" FROM\n",
" `isb-cgc-bq.reactome_versioned.pathway_v77` as pathway\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` as pe2pathway\n",
" -- link pathways to physical entities via intermediate table\n",
" ON pathway.stable_id = pe2pathway.pathway_stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- link pathways to physical entities\n",
" ON pe2pathway.pe_stable_id = pe.stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
" -- link physical entities to interactions\n",
" ON pe.uniprot_id = inter.target_uniprotID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- link interactions to experiments\n",
" ON inter.expID = exp.expID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID\n",
"\n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('{drug_name}')\n",
"\n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
"\n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
"\n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
" \n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" AND pe2pathway.evidence_code = 'TAS' \n",
" ORDER BY pathway.name ASC \n",
"\"\"\").format(\n",
" drug_name=drug_name\n",
")).result().to_dataframe()"
],
"execution_count": 15,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 589
},
"id": "khdvl_xPWWxM",
"outputId": "1fa3ec20-2107-47b3-92b8-893a298b20ce"
},
"source": [
"# display result data frame\n",
"drug_pathways"
],
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
stable_id
\n",
"
url
\n",
"
name
\n",
"
species
\n",
"
lowest_level
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
R-HSA-1280218
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-12...
\n",
"
Adaptive Immune System
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
1
\n",
"
R-HSA-422475
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-42...
\n",
"
Axon guidance
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
2
\n",
"
R-HSA-389356
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CD28 co-stimulation
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
3
\n",
"
R-HSA-389357
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CD28 dependent PI3K/Akt signaling
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
4
\n",
"
R-HSA-389359
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CD28 dependent Vav1 pathway
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
175
\n",
"
R-HSA-8864260
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-88...
\n",
"
Transcriptional regulation by the AP-2 (TFAP2)...
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
176
\n",
"
R-HSA-202430
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-20...
\n",
"
Translocation of ZAP-70 to Immunological synapse
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
177
\n",
"
R-HSA-425366
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-42...
\n",
"
Transport of bile salts and organic acids, met...
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
178
\n",
"
R-HSA-382551
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
Transport of small molecules
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
"
\n",
"
179
\n",
"
R-HSA-5653656
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-56...
\n",
"
Vesicle-mediated transport
\n",
"
Homo sapiens
\n",
"
False
\n",
"
\n",
" \n",
"
\n",
"
180 rows × 5 columns
\n",
"
"
],
"text/plain": [
" stable_id ... lowest_level\n",
"0 R-HSA-1280218 ... False\n",
"1 R-HSA-422475 ... False\n",
"2 R-HSA-389356 ... False\n",
"3 R-HSA-389357 ... True\n",
"4 R-HSA-389359 ... True\n",
".. ... ... ...\n",
"175 R-HSA-8864260 ... False\n",
"176 R-HSA-202430 ... True\n",
"177 R-HSA-425366 ... False\n",
"178 R-HSA-382551 ... False\n",
"179 R-HSA-5653656 ... False\n",
"\n",
"[180 rows x 5 columns]"
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4IT_WO6PO0bH"
},
"source": [
"If we're only interested in the lowest level pathways, i.e., pathways that are not parents of other pathways in the hierarchy, we can filter by the \"lowest_level\" field in the pathways table."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IQcGozM6OzpW"
},
"source": [
"# run query and put results in data frame\n",
"drug_lowest_pathways = client.query((\"\"\"\n",
" SELECT\n",
" DISTINCT pathway.*\n",
"\n",
" FROM\n",
" `isb-cgc-bq.reactome_versioned.pathway_v77` as pathway\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` as pe2pathway\n",
" -- link pathways to physical entities via intermediate table\n",
" ON pathway.stable_id = pe2pathway.pathway_stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- link pathways to physical entities\n",
" ON pe2pathway.pe_stable_id = pe.stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
" -- link physical entities to interactions\n",
" ON pe.uniprot_id = inter.target_uniprotID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- link interactions to experiments\n",
" ON inter.expID = exp.expID\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID\n",
"\n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('{drug_name}')\n",
"\n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
"\n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
"\n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
"\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" AND pe2pathway.evidence_code = 'TAS' \n",
" \n",
" -- filter to include just lowest level pathways\n",
" AND pathway.lowest_level = TRUE\n",
" ORDER BY pathway.name ASC \n",
"\"\"\").format(\n",
" drug_name=drug_name\n",
")).result().to_dataframe()"
],
"execution_count": 17,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 589
},
"id": "q5bZg8w6Pn7Q",
"outputId": "3cdecae6-892b-458f-e11e-dfe7df316de6"
},
"source": [
"# display data frame\n",
"drug_lowest_pathways"
],
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
stable_id
\n",
"
url
\n",
"
name
\n",
"
species
\n",
"
lowest_level
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
R-HSA-389357
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CD28 dependent PI3K/Akt signaling
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
1
\n",
"
R-HSA-389359
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CD28 dependent Vav1 pathway
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
2
\n",
"
R-HSA-389513
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-38...
\n",
"
CTLA4 inhibitory signaling
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
3
\n",
"
R-HSA-8856825
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-88...
\n",
"
Cargo recognition for clathrin-mediated endocy...
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
4
\n",
"
R-HSA-2219530
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-22...
\n",
"
Constitutive Signaling by Aberrant PI3K in Cancer
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
75
\n",
"
R-HSA-9669934
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-96...
\n",
"
Sunitinib-resistant KIT mutants
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
76
\n",
"
R-HSA-9674401
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-96...
\n",
"
Sunitinib-resistant PDGFR mutants
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
77
\n",
"
R-HSA-8866910
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-88...
\n",
"
TFAP2 (AP-2) family regulates transcription of...
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
78
\n",
"
R-HSA-8853884
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-88...
\n",
"
Transcriptional Regulation by VENTX
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
"
\n",
"
79
\n",
"
R-HSA-202430
\n",
"
https://reactome.org/PathwayBrowser/#/R-HSA-20...
\n",
"
Translocation of ZAP-70 to Immunological synapse
\n",
"
Homo sapiens
\n",
"
True
\n",
"
\n",
" \n",
"
\n",
"
80 rows × 5 columns
\n",
"
"
],
"text/plain": [
" stable_id ... lowest_level\n",
"0 R-HSA-389357 ... True\n",
"1 R-HSA-389359 ... True\n",
"2 R-HSA-389513 ... True\n",
"3 R-HSA-8856825 ... True\n",
"4 R-HSA-2219530 ... True\n",
".. ... ... ...\n",
"75 R-HSA-9669934 ... True\n",
"76 R-HSA-9674401 ... True\n",
"77 R-HSA-8866910 ... True\n",
"78 R-HSA-8853884 ... True\n",
"79 R-HSA-202430 ... True\n",
"\n",
"[80 rows x 5 columns]"
]
},
"metadata": {},
"execution_count": 18
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ybqSP2aVNE99"
},
"source": [
"## Pathway Enrichment Analysis\n",
"We can identify pathways that are \"enriched\" with the targets associated with a particular drug. In other words, we can answer the question: given a set of targets for a drug, which pathways contain those targets at a frequency higher than random chance? By calculating the probability that a number of target genes are contained in each pathway, we can identify pathways most likely to be targeted by the drug. \n",
"\n",
"To do this, we can use a **chi-squared** test to determine if there is a statistically significant difference between the expected frequency of targets in a pathway compared to the observed frequency. \n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sKLy2sxQNIu6"
},
"source": [
"# set query parameters\n",
"drug_name = 'sorafenib'\n",
"lowest_level = True # only show pathways at the lowest level"
],
"execution_count": 19,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jDdXWbOHa32V"
},
"source": [
"### Construct SQL Query\n",
"\n",
"A single query can be used to calculate the chi-squared statistic for all pathways. This query is rather lengthy, but can be broken up into a series of sub-queries to create temporary tables. We step through each query below.\n",
"\n",
"First, we write a query that simply gets a list of all targets of the drug: "
]
},
{
"cell_type": "code",
"metadata": {
"id": "r3NeGvfMbgBp"
},
"source": [
"target_list_query = \"\"\"\n",
" -- Table that contains a list of all distinct targets of the drug\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID\n",
"\n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID \n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- filter for interactions with experimental evidence\n",
" ON inter.expID = exp.expID \n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON inter.target_uniprotID = pe.uniprot_id\n",
"\n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('{drug_name}')\n",
"\n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
"\n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
"\n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
"\"\"\".format(\n",
" drug_name=drug_name\n",
").strip(\"\\n\")"
],
"execution_count": 20,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wTVNQFJjgQ1y",
"outputId": "b46c9057-8a2d-4a9c-ea35-cf0ae6dae03d"
},
"source": [
"print(target_list_query)"
],
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" -- Table that contains a list of all distinct targets of the drug\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID\n",
"\n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID \n",
"\n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- filter for interactions with experimental evidence\n",
" ON inter.expID = exp.expID \n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON inter.target_uniprotID = pe.uniprot_id\n",
"\n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('sorafenib')\n",
"\n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
"\n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
"\n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bY4QCtDfgwBq"
},
"source": [
"We then create a query that counts all targets associated with each Reactome pathway. This query depends on the previous \"target_list_query\" sub-query. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "H_3xAQdjhJ9s"
},
"source": [
"target_pp_query = \"\"\"\n",
" -- Table that maps pathways to the total number of drug targets within that pathway\n",
" SELECT\n",
" COUNT(DISTINCT target_list_query.target_uniprotID) as num_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
"\n",
" FROM\n",
" target_list_query\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON target_list_query.target_uniprotID = pe.uniprot_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" -- link physical entities to pathways via intermediate table\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" -- link physical entities to pathways\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
"\n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
"\n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_targets DESC\n",
"\"\"\".format(\n",
" drug_name=drug_name\n",
").strip(\"\\n\")"
],
"execution_count": 22,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "v29oHYEukrzy",
"outputId": "64c123fa-e9bd-46f1-f9cf-4d8214f90f08"
},
"source": [
"print(target_pp_query)"
],
"execution_count": 23,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" -- Table that maps pathways to the total number of drug targets within that pathway\n",
" SELECT\n",
" COUNT(DISTINCT target_list_query.target_uniprotID) as num_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
"\n",
" FROM\n",
" target_list_query\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON target_list_query.target_uniprotID = pe.uniprot_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" -- link physical entities to pathways via intermediate table\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" -- link physical entities to pathways\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
"\n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
"\n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_targets DESC\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_XXiAJZUlBxC"
},
"source": [
"Now we construct the same queries for proteins that are NOT targets of the drug. These are prefixed with \"not_target\". This query depends on the \"target_list_query\" sub-query."
]
},
{
"cell_type": "code",
"metadata": {
"id": "1KyR-_WqlVVn"
},
"source": [
"not_target_list_query = \"\"\"\n",
" -- Table that contains a list of all proteins that are NOT targets of the drug\n",
" -- This query depends on \"target_list_query\", which is created by a previous\n",
" -- query.\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID AS target_uniprotID\n",
"\n",
" FROM `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" WHERE\n",
" inter.targetSpecies = 'Homo sapiens'\n",
" \n",
" AND inter.target_uniprotID NOT IN (\n",
" SELECT target_uniprotID FROM target_list_query\n",
" )\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 24,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "92MaTzsAn3es",
"outputId": "24f49cc4-b6a7-4900-c762-c8a7f0afe38c"
},
"source": [
"print(not_target_list_query)"
],
"execution_count": 25,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" -- Table that contains a list of all proteins that are NOT targets of the drug\n",
" -- This query depends on \"target_list_query\", which is created by a previous\n",
" -- query.\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID AS target_uniprotID\n",
"\n",
" FROM `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
"\n",
" WHERE\n",
" inter.targetSpecies = 'Homo sapiens'\n",
" \n",
" AND inter.target_uniprotID NOT IN (\n",
" SELECT target_uniprotID FROM target_list_query\n",
" )\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "mI1SxkGtn-ow"
},
"source": [
"not_target_pp_query = \"\"\"\n",
" -- Table that maps pathways to the number of proteins that are NOT drug\n",
" -- targets in that pathway.\n",
" SELECT\n",
" COUNT(DISTINCT not_target_list_query.target_uniprotID) AS num_not_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
"\n",
" FROM not_target_list_query\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" ON not_target_list_query.target_uniprotID = pe.uniprot_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
"\n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
"\n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_not_targets DESC\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 26,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TgY33Z3ukByC",
"outputId": "ee298f11-03f4-43bf-8be8-958d73d3121f"
},
"source": [
"print(not_target_pp_query)"
],
"execution_count": 27,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" -- Table that maps pathways to the number of proteins that are NOT drug\n",
" -- targets in that pathway.\n",
" SELECT\n",
" COUNT(DISTINCT not_target_list_query.target_uniprotID) AS num_not_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
"\n",
" FROM not_target_list_query\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" ON not_target_list_query.target_uniprotID = pe.uniprot_id\n",
"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
"\n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
"\n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_not_targets DESC\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Z2eN5_UfVgcX"
},
"source": [
"For convenience, we create a sub-query to keep the counts of the number of proteins that are targets and the number of proteins that are not targets."
]
},
{
"cell_type": "code",
"metadata": {
"id": "VAYDIHsdWLFA"
},
"source": [
"target_count_query = \"\"\"\n",
" -- Table that contains the counts of # of proteins that are/are not targets\n",
" SELECT\n",
" target_count,\n",
" not_target_count,\n",
" target_count + not_target_count AS total_count\n",
"\n",
" FROM \n",
" (SELECT COUNT(*) AS target_count FROM target_list_query),\n",
" (SELECT COUNT(*) AS not_target_count FROM not_target_list_query)\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 28,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "qrt8DcUraJm3"
},
"source": [
"Now we can create more interesting queries for the contingency matrices that contain the observed and expected values. \n",
"\n",
"First, the observed contingency table counts for each pathway:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8Dc-wMSZakad"
},
"source": [
"observed_query = \"\"\"\n",
" -- Table with observed values per pathway in the contingency matrix\n",
" SELECT\n",
" target_pp_query.num_targets AS in_target_in_pathway,\n",
" not_target_pp_query.num_not_targets AS not_target_in_pathway,\n",
" target_count_query.target_count - target_pp_query.num_targets AS in_target_not_pathway,\n",
" target_count_query.not_target_count - not_target_pp_query.num_not_targets AS not_target_not_pathway,\n",
" target_pp_query.stable_id,\n",
" target_pp_query.name\n",
"\n",
" FROM \n",
" target_pp_query,\n",
" target_count_query\n",
"\n",
" INNER JOIN not_target_pp_query\n",
" ON target_pp_query.stable_id = not_target_pp_query.stable_id\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 29,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "STbl3Toxcq7B"
},
"source": [
"Then the observed row and column sums of the contingency table:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4kXNQ4SRcmPK"
},
"source": [
"sum_query = \"\"\"\n",
" -- Table with summed observed values per pathway in the contingency matrix\n",
" SELECT\n",
" observed_query.in_target_in_pathway + observed_query.not_target_in_pathway AS pathway_total,\n",
" observed_query.in_target_not_pathway + observed_query.not_target_not_pathway AS not_pathway_total,\n",
" observed_query.in_target_in_pathway + observed_query.in_target_not_pathway AS target_total,\n",
" observed_query.not_target_in_pathway + observed_query.not_target_not_pathway AS not_target_total,\n",
" observed_query.stable_id,\n",
" observed_query.name\n",
"\n",
" FROM\n",
" observed_query\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 30,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6HiQlxAudQEk"
},
"source": [
"And the expected contingency table values for each pathway:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "V0Rj098QdWe6"
},
"source": [
"expected_query = \"\"\" \n",
" -- Table with the expected values per pathway in the contingency matrix\n",
" SELECT \n",
" sum_query.target_total * sum_query.pathway_total / target_count_query.total_count AS exp_in_target_in_pathway,\n",
" sum_query.not_target_total * sum_query.pathway_total / target_count_query.total_count AS exp_not_target_in_pathway,\n",
" sum_query.target_total * sum_query.not_pathway_total / target_count_query.total_count AS exp_in_target_not_pathway,\n",
" sum_query.not_target_total * sum_query.not_pathway_total / target_count_query.total_count AS exp_not_target_not_pathway,\n",
" sum_query.stable_id,\n",
" sum_query.name\n",
"\n",
" FROM \n",
" sum_query, target_count_query\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 31,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wC57CDA6d_4H"
},
"source": [
"Finally, we can calculate the chi-squared statistic for each pathway:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "kxlrSyiTd_Ia"
},
"source": [
"chi_squared_query = \"\"\"\n",
" -- Table with the chi-squared statistic for each pathway\n",
" SELECT\n",
" -- Chi squared statistic with Yates' correction\n",
" POW(ABS(observed_query.in_target_in_pathway - expected_query.exp_in_target_in_pathway) - 0.5, 2) / expected_query.exp_in_target_in_pathway \n",
" + POW(ABS(observed_query.not_target_in_pathway - expected_query.exp_not_target_in_pathway) - 0.5, 2) / expected_query.exp_not_target_in_pathway\n",
" + POW(ABS(observed_query.in_target_not_pathway - expected_query.exp_in_target_not_pathway) - 0.5, 2) / expected_query.exp_in_target_not_pathway\n",
" + POW(ABS(observed_query.not_target_not_pathway - expected_query.exp_not_target_not_pathway) - 0.5, 2) / expected_query.exp_not_target_not_pathway\n",
" AS chi_squared_stat,\n",
" observed_query.stable_id,\n",
" observed_query.name\n",
"\n",
" FROM observed_query\n",
"\n",
" INNER JOIN expected_query\n",
" ON observed_query.stable_id = expected_query.stable_id\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 32,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "lTRfNbLKuarY"
},
"source": [
"The final query optionally adds a filter that removes all pathways that are not at the lowest level of the hierarchy. This helps remove non-specific \"pathways\" such as the generic \"Disease\" pathway."
]
},
{
"cell_type": "code",
"metadata": {
"id": "IBpHKWxjvCSx"
},
"source": [
"lowest_level_filter = \"\"\"\n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON chi_squared_query.stable_id = pathway.stable_id\n",
"\n",
" WHERE pathway.lowest_level = TRUE\n",
"\"\"\".strip(\"\\n\")"
],
"execution_count": 33,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "V_Qx1J2IvrBT",
"outputId": "369415dd-4562-47a1-afa3-f340eb36e7c6"
},
"source": [
"print(lowest_level_filter)"
],
"execution_count": 34,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON chi_squared_query.stable_id = pathway.stable_id\n",
"\n",
" WHERE pathway.lowest_level = TRUE\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QsJQyREafSZE"
},
"source": [
"Now we can combine all sub-queries into the final query:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "dOXXXAmkfcnb"
},
"source": [
"final_query = \"\"\"\n",
" WITH\n",
" target_list_query AS (\n",
" {target_list_query}\n",
" ),\n",
"\n",
" target_pp_query AS (\n",
" {target_pp_query}\n",
" ),\n",
"\n",
" not_target_list_query AS (\n",
" {not_target_list_query}\n",
" ),\n",
"\n",
" not_target_pp_query AS (\n",
" {not_target_pp_query}\n",
" ),\n",
"\n",
" target_count_query AS (\n",
" {target_count_query}\n",
" ),\n",
"\n",
" observed_query AS (\n",
" {observed_query}\n",
" ),\n",
"\n",
" sum_query AS (\n",
" {sum_query}\n",
" ),\n",
"\n",
" expected_query AS (\n",
" {expected_query}\n",
" ),\n",
" \n",
" chi_squared_query AS (\n",
" {chi_squared_query}\n",
" )\n",
"\n",
" SELECT\n",
" observed_query.in_target_in_pathway,\n",
" observed_query.in_target_not_pathway,\n",
" observed_query.not_target_in_pathway,\n",
" observed_query.not_target_not_pathway,\n",
" chi_squared_query.chi_squared_stat,\n",
" chi_squared_query.stable_id,\n",
" chi_squared_query.name\n",
"\n",
" FROM chi_squared_query\n",
"\n",
" INNER JOIN observed_query\n",
" ON chi_squared_query.stable_id = observed_query.stable_id\n",
" {lowest_level_filter}\n",
" ORDER BY chi_squared_stat DESC\n",
"\"\"\".format(\n",
" # make final query a little easier to read by removing/adding some white space \n",
" target_list_query=\"\\n \".join(target_list_query.strip().splitlines()),\n",
" target_pp_query=\"\\n \".join(target_pp_query.strip().splitlines()),\n",
" not_target_list_query=\"\\n \".join(not_target_list_query.strip().splitlines()),\n",
" not_target_pp_query=\"\\n \".join(not_target_pp_query.strip().splitlines()),\n",
" target_count_query=\"\\n \".join(target_count_query.strip().splitlines()),\n",
" observed_query=\"\\n \".join(observed_query.strip().splitlines()),\n",
" sum_query=\"\\n \".join(sum_query.strip().splitlines()),\n",
" expected_query=\"\\n \".join(expected_query.strip().splitlines()),\n",
" chi_squared_query=\"\\n \".join(chi_squared_query.strip().splitlines()),\n",
" lowest_level_filter=(\n",
" \"\\n \"+\"\\n\".join(lowest_level_filter.strip().splitlines())+\"\\n\" if lowest_level else \"\"\n",
" )\n",
").strip(\"\\n\")"
],
"execution_count": 35,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "fSUidOyTOwMu"
},
"source": [
"### Display Final Query"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MfitLyzmgq_P",
"outputId": "ea9f957b-ff15-4795-c00f-067d1719ca6c"
},
"source": [
"# print the formatted final query\n",
"print(final_query)"
],
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" WITH\n",
" target_list_query AS (\n",
" -- Table that contains a list of all distinct targets of the drug\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID\n",
" \n",
" FROM\n",
" `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
" \n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.drug_synonyms_v1` AS drugsyn\n",
" -- filter for interactions matching drug id\n",
" ON inter.drugID = drugsyn.drugID \n",
" \n",
" INNER JOIN `isb-cgc-bq.targetome_versioned.experiments_v1` AS exp\n",
" -- filter for interactions with experimental evidence\n",
" ON inter.expID = exp.expID \n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON inter.target_uniprotID = pe.uniprot_id\n",
" \n",
" WHERE\n",
" -- filter by drug name\n",
" LOWER(drugsyn.synonym) = LOWER('sorafenib')\n",
" \n",
" -- make sure that all assay ranges are at or below 100nM\n",
" AND exp.exp_assayValueMedian <= 100\n",
" AND (exp.exp_assayValueLow <= 100 OR exp.exp_assayValueLow is null)\n",
" AND (exp.exp_assayValueHigh <= 100 OR exp.exp_assayValueHigh is null)\n",
" \n",
" -- make sure the assay type is known (KD, Ki, IC50, or EC50)\n",
" AND exp.exp_assayType IS NOT NULL\n",
" AND exp.exp_assayRelation = '='\n",
" \n",
" -- limit to just experiments in humans\n",
" AND inter.targetSpecies = 'Homo sapiens'\n",
" ),\n",
"\n",
" target_pp_query AS (\n",
" -- Table that maps pathways to the total number of drug targets within that pathway\n",
" SELECT\n",
" COUNT(DISTINCT target_list_query.target_uniprotID) as num_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
" \n",
" FROM\n",
" target_list_query\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" -- filter for interactions with targets that match a reactome\n",
" -- physical entity\n",
" ON target_list_query.target_uniprotID = pe.uniprot_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" -- link physical entities to pathways via intermediate table\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" -- link physical entities to pathways\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
" \n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
" \n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_targets DESC\n",
" ),\n",
"\n",
" not_target_list_query AS (\n",
" -- Table that contains a list of all proteins that are NOT targets of the drug\n",
" -- This query depends on \"target_list_query\", which is created by a previous\n",
" -- query.\n",
" SELECT\n",
" DISTINCT inter.target_uniprotID AS target_uniprotID\n",
" \n",
" FROM `isb-cgc-bq.targetome_versioned.interactions_v1` AS inter\n",
" \n",
" WHERE\n",
" inter.targetSpecies = 'Homo sapiens'\n",
" \n",
" AND inter.target_uniprotID NOT IN (\n",
" SELECT target_uniprotID FROM target_list_query\n",
" )\n",
" ),\n",
"\n",
" not_target_pp_query AS (\n",
" -- Table that maps pathways to the number of proteins that are NOT drug\n",
" -- targets in that pathway.\n",
" SELECT\n",
" COUNT(DISTINCT not_target_list_query.target_uniprotID) AS num_not_targets,\n",
" pathway.stable_id,\n",
" pathway.name\n",
" \n",
" FROM not_target_list_query\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.physical_entity_v77` AS pe\n",
" ON not_target_list_query.target_uniprotID = pe.uniprot_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pe_to_pathway_v77` AS pe2pathway\n",
" ON pe.stable_id = pe2pathway.pe_stable_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON pe2pathway.pathway_stable_id = pathway.stable_id\n",
" \n",
" WHERE\n",
" -- filter by stronger evidence: \"Traceable Author Statement\" \n",
" pe2pathway.evidence_code = 'TAS'\n",
" \n",
" GROUP BY pathway.stable_id, pathway.name\n",
" ORDER BY num_not_targets DESC\n",
" ),\n",
"\n",
" target_count_query AS (\n",
" -- Table that contains the counts of # of proteins that are/are not targets\n",
" SELECT\n",
" target_count,\n",
" not_target_count,\n",
" target_count + not_target_count AS total_count\n",
" \n",
" FROM \n",
" (SELECT COUNT(*) AS target_count FROM target_list_query),\n",
" (SELECT COUNT(*) AS not_target_count FROM not_target_list_query)\n",
" ),\n",
"\n",
" observed_query AS (\n",
" -- Table with observed values per pathway in the contingency matrix\n",
" SELECT\n",
" target_pp_query.num_targets AS in_target_in_pathway,\n",
" not_target_pp_query.num_not_targets AS not_target_in_pathway,\n",
" target_count_query.target_count - target_pp_query.num_targets AS in_target_not_pathway,\n",
" target_count_query.not_target_count - not_target_pp_query.num_not_targets AS not_target_not_pathway,\n",
" target_pp_query.stable_id,\n",
" target_pp_query.name\n",
" \n",
" FROM \n",
" target_pp_query,\n",
" target_count_query\n",
" \n",
" INNER JOIN not_target_pp_query\n",
" ON target_pp_query.stable_id = not_target_pp_query.stable_id\n",
" ),\n",
"\n",
" sum_query AS (\n",
" -- Table with summed observed values per pathway in the contingency matrix\n",
" SELECT\n",
" observed_query.in_target_in_pathway + observed_query.not_target_in_pathway AS pathway_total,\n",
" observed_query.in_target_not_pathway + observed_query.not_target_not_pathway AS not_pathway_total,\n",
" observed_query.in_target_in_pathway + observed_query.in_target_not_pathway AS target_total,\n",
" observed_query.not_target_in_pathway + observed_query.not_target_not_pathway AS not_target_total,\n",
" observed_query.stable_id,\n",
" observed_query.name\n",
" \n",
" FROM\n",
" observed_query\n",
" ),\n",
"\n",
" expected_query AS (\n",
" -- Table with the expected values per pathway in the contingency matrix\n",
" SELECT \n",
" sum_query.target_total * sum_query.pathway_total / target_count_query.total_count AS exp_in_target_in_pathway,\n",
" sum_query.not_target_total * sum_query.pathway_total / target_count_query.total_count AS exp_not_target_in_pathway,\n",
" sum_query.target_total * sum_query.not_pathway_total / target_count_query.total_count AS exp_in_target_not_pathway,\n",
" sum_query.not_target_total * sum_query.not_pathway_total / target_count_query.total_count AS exp_not_target_not_pathway,\n",
" sum_query.stable_id,\n",
" sum_query.name\n",
" \n",
" FROM \n",
" sum_query, target_count_query\n",
" ),\n",
" \n",
" chi_squared_query AS (\n",
" -- Table with the chi-squared statistic for each pathway\n",
" SELECT\n",
" -- Chi squared statistic with Yates' correction\n",
" POW(ABS(observed_query.in_target_in_pathway - expected_query.exp_in_target_in_pathway) - 0.5, 2) / expected_query.exp_in_target_in_pathway \n",
" + POW(ABS(observed_query.not_target_in_pathway - expected_query.exp_not_target_in_pathway) - 0.5, 2) / expected_query.exp_not_target_in_pathway\n",
" + POW(ABS(observed_query.in_target_not_pathway - expected_query.exp_in_target_not_pathway) - 0.5, 2) / expected_query.exp_in_target_not_pathway\n",
" + POW(ABS(observed_query.not_target_not_pathway - expected_query.exp_not_target_not_pathway) - 0.5, 2) / expected_query.exp_not_target_not_pathway\n",
" AS chi_squared_stat,\n",
" observed_query.stable_id,\n",
" observed_query.name\n",
" \n",
" FROM observed_query\n",
" \n",
" INNER JOIN expected_query\n",
" ON observed_query.stable_id = expected_query.stable_id\n",
" )\n",
"\n",
" SELECT\n",
" observed_query.in_target_in_pathway,\n",
" observed_query.in_target_not_pathway,\n",
" observed_query.not_target_in_pathway,\n",
" observed_query.not_target_not_pathway,\n",
" chi_squared_query.chi_squared_stat,\n",
" chi_squared_query.stable_id,\n",
" chi_squared_query.name\n",
"\n",
" FROM chi_squared_query\n",
"\n",
" INNER JOIN observed_query\n",
" ON chi_squared_query.stable_id = observed_query.stable_id\n",
" \n",
" INNER JOIN `isb-cgc-bq.reactome_versioned.pathway_v77` AS pathway\n",
" ON chi_squared_query.stable_id = pathway.stable_id\n",
"\n",
" WHERE pathway.lowest_level = TRUE\n",
"\n",
" ORDER BY chi_squared_stat DESC\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Fjkbq4FR19in"
},
"source": [
"### Execute the Query\n",
"\n",
"Now execute the query to calculate a chi-squared statistic for each pathway:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ua5oYRFY1aVu"
},
"source": [
"# run query and put results in data frame\n",
"chi_squared_pathways = client.query(final_query).result().to_dataframe()"
],
"execution_count": 37,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "9Yt-GatY2awc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "ee40e93b-9f65-4091-da0e-166ef56a8b52"
},
"source": [
"# display the data frame\n",
"chi_squared_pathways"
],
"execution_count": 38,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
in_target_in_pathway
\n",
"
in_target_not_pathway
\n",
"
not_target_in_pathway
\n",
"
not_target_not_pathway
\n",
"
chi_squared_stat
\n",
"
stable_id
\n",
"
name
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
3
\n",
"
18
\n",
"
3
\n",
"
678
\n",
"
31.192658
\n",
"
R-HSA-195399
\n",
"
VEGF binds to VEGFR leading to receptor dimeri...
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
19
\n",
"
1
\n",
"
680
\n",
"
22.942602
\n",
"
R-HSA-194306
\n",
"
Neurophilin interactions with VEGF and VEGFR
\n",
"
\n",
"
\n",
"
2
\n",
"
4
\n",
"
17
\n",
"
14
\n",
"
667
\n",
"
17.232611
\n",
"
R-HSA-2219530
\n",
"
Constitutive Signaling by Aberrant PI3K in Cancer
\n",
"
\n",
"
\n",
"
3
\n",
"
2
\n",
"
19
\n",
"
2
\n",
"
679
\n",
"
16.508319
\n",
"
R-HSA-525793
\n",
"
Myogenesis
\n",
"
\n",
"
\n",
"
4
\n",
"
2
\n",
"
19
\n",
"
4
\n",
"
677
\n",
"
10.101124
\n",
"
R-HSA-5674499
\n",
"
Negative feedback regulation of MAPK pathway
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
62
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-1989781
\n",
"
PPARA activates gene expression
\n",
"
\n",
"
\n",
"
63
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2644606
\n",
"
Constitutive Signaling by NOTCH1 PEST Domain M...
\n",
"
\n",
"
\n",
"
64
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2894862
\n",
"
Constitutive Signaling by NOTCH1 HD+PEST Domai...
\n",
"
\n",
"
\n",
"
65
\n",
"
1
\n",
"
20
\n",
"
20
\n",
"
661
\n",
"
0.027803
\n",
"
R-HSA-8939236
\n",
"
RUNX1 regulates transcription of genes involve...
\n",
"
\n",
"
\n",
"
66
\n",
"
1
\n",
"
20
\n",
"
19
\n",
"
662
\n",
"
0.017134
\n",
"
R-HSA-2559580
\n",
"
Oxidative Stress Induced Senescence
\n",
"
\n",
" \n",
"
\n",
"
67 rows × 7 columns
\n",
"
"
],
"text/plain": [
" in_target_in_pathway ... name\n",
"0 3 ... VEGF binds to VEGFR leading to receptor dimeri...\n",
"1 2 ... Neurophilin interactions with VEGF and VEGFR\n",
"2 4 ... Constitutive Signaling by Aberrant PI3K in Cancer\n",
"3 2 ... Myogenesis\n",
"4 2 ... Negative feedback regulation of MAPK pathway\n",
".. ... ... ...\n",
"62 1 ... PPARA activates gene expression\n",
"63 1 ... Constitutive Signaling by NOTCH1 PEST Domain M...\n",
"64 1 ... Constitutive Signaling by NOTCH1 HD+PEST Domai...\n",
"65 1 ... RUNX1 regulates transcription of genes involve...\n",
"66 1 ... Oxidative Stress Induced Senescence\n",
"\n",
"[67 rows x 7 columns]"
]
},
"metadata": {},
"execution_count": 38
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qM42ztRJDgbM"
},
"source": [
"### Calculate P-Values\n",
"\n",
"BigQuery does not have statistical functions to calculate p-values, so we use the SciPy stats library and update the data frame with a new p-value column:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Beim_ePN2cs7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "68edcc6b-f59b-4416-af2e-30f566575c7f"
},
"source": [
"chi_squared_pathways['p_value'] = 1-stats.chi2.cdf(chi_squared_pathways['chi_squared_stat'], 1)\n",
"chi_squared_pathways"
],
"execution_count": 39,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
in_target_in_pathway
\n",
"
in_target_not_pathway
\n",
"
not_target_in_pathway
\n",
"
not_target_not_pathway
\n",
"
chi_squared_stat
\n",
"
stable_id
\n",
"
name
\n",
"
p_value
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
3
\n",
"
18
\n",
"
3
\n",
"
678
\n",
"
31.192658
\n",
"
R-HSA-195399
\n",
"
VEGF binds to VEGFR leading to receptor dimeri...
\n",
"
2.336481e-08
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
19
\n",
"
1
\n",
"
680
\n",
"
22.942602
\n",
"
R-HSA-194306
\n",
"
Neurophilin interactions with VEGF and VEGFR
\n",
"
1.669114e-06
\n",
"
\n",
"
\n",
"
2
\n",
"
4
\n",
"
17
\n",
"
14
\n",
"
667
\n",
"
17.232611
\n",
"
R-HSA-2219530
\n",
"
Constitutive Signaling by Aberrant PI3K in Cancer
\n",
"
3.307106e-05
\n",
"
\n",
"
\n",
"
3
\n",
"
2
\n",
"
19
\n",
"
2
\n",
"
679
\n",
"
16.508319
\n",
"
R-HSA-525793
\n",
"
Myogenesis
\n",
"
4.843709e-05
\n",
"
\n",
"
\n",
"
4
\n",
"
2
\n",
"
19
\n",
"
4
\n",
"
677
\n",
"
10.101124
\n",
"
R-HSA-5674499
\n",
"
Negative feedback regulation of MAPK pathway
\n",
"
1.481790e-03
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
62
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-1989781
\n",
"
PPARA activates gene expression
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
63
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2644606
\n",
"
Constitutive Signaling by NOTCH1 PEST Domain M...
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
64
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2894862
\n",
"
Constitutive Signaling by NOTCH1 HD+PEST Domai...
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
65
\n",
"
1
\n",
"
20
\n",
"
20
\n",
"
661
\n",
"
0.027803
\n",
"
R-HSA-8939236
\n",
"
RUNX1 regulates transcription of genes involve...
\n",
"
8.675732e-01
\n",
"
\n",
"
\n",
"
66
\n",
"
1
\n",
"
20
\n",
"
19
\n",
"
662
\n",
"
0.017134
\n",
"
R-HSA-2559580
\n",
"
Oxidative Stress Induced Senescence
\n",
"
8.958574e-01
\n",
"
\n",
" \n",
"
\n",
"
67 rows × 8 columns
\n",
"
"
],
"text/plain": [
" in_target_in_pathway ... p_value\n",
"0 3 ... 2.336481e-08\n",
"1 2 ... 1.669114e-06\n",
"2 4 ... 3.307106e-05\n",
"3 2 ... 4.843709e-05\n",
"4 2 ... 1.481790e-03\n",
".. ... ... ...\n",
"62 1 ... 8.551126e-01\n",
"63 1 ... 8.551126e-01\n",
"64 1 ... 8.551126e-01\n",
"65 1 ... 8.675732e-01\n",
"66 1 ... 8.958574e-01\n",
"\n",
"[67 rows x 8 columns]"
]
},
"metadata": {},
"execution_count": 39
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TqPYE1JZDq5L"
},
"source": [
"### Adjust for Multiple Testing\n",
"\n",
"Since we're testing multiple pathways, we need to adjust the p-value threshold for significance accordingly. The number of pathways tested depends on whether or not we're considering all pathways, or just the lowest level pathways. That count can be obtained with the following query."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7KziIxfED8kB"
},
"source": [
"# run query and put results in data frame\n",
"num_pathways_result = client.query((\"\"\"\n",
" SELECT\n",
" COUNT (*) AS num_pathways\n",
" FROM\n",
" `isb-cgc-bq.reactome_versioned.pathway_v77` as pathway\n",
" {lowest_level_filter}\n",
"\"\"\").format(\n",
" lowest_level_filter=(\"WHERE lowest_level = TRUE\" if lowest_level else \"\")\n",
")).result().to_dataframe()"
],
"execution_count": 40,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 80
},
"id": "E_7ncwlOEmuG",
"outputId": "e3d92d7d-88ed-40c6-9eed-22231db81c38"
},
"source": [
"# display data frame\n",
"num_pathways_result"
],
"execution_count": 41,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
num_pathways
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1773
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" num_pathways\n",
"0 1773"
]
},
"metadata": {},
"execution_count": 41
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7B9__TlA4M9u",
"outputId": "5189244f-d1ae-4668-df2f-7d71fe3d98b0"
},
"source": [
"# adjust significance threshold for multiple testing, using a p-value of 0.01\n",
"num_pathways = num_pathways_result['num_pathways'][0]\n",
"significance_threshold = 0.01/num_pathways\n",
"print('Significance Threshold: {}'.format(significance_threshold))\n",
"\n",
"# find all pathways that meet the significance criteria after adjusting for\n",
"# multiple testing\n",
"significant_pathway_index = chi_squared_pathways['p_value']\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
in_target_in_pathway
\n",
"
in_target_not_pathway
\n",
"
not_target_in_pathway
\n",
"
not_target_not_pathway
\n",
"
chi_squared_stat
\n",
"
stable_id
\n",
"
name
\n",
"
p_value
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
3
\n",
"
18
\n",
"
3
\n",
"
678
\n",
"
31.192658
\n",
"
R-HSA-195399
\n",
"
VEGF binds to VEGFR leading to receptor dimeri...
\n",
"
2.336481e-08
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
19
\n",
"
1
\n",
"
680
\n",
"
22.942602
\n",
"
R-HSA-194306
\n",
"
Neurophilin interactions with VEGF and VEGFR
\n",
"
1.669114e-06
\n",
"
\n",
" \n",
"
\n",
""
],
"text/plain": [
" in_target_in_pathway ... p_value\n",
"0 3 ... 2.336481e-08\n",
"1 2 ... 1.669114e-06\n",
"\n",
"[2 rows x 8 columns]"
]
},
"metadata": {},
"execution_count": 43
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "if9v8H2T12cp"
},
"source": [
"The results of this analysis suggest that at least two pathways may be affected by the drug sorafenib, based on the known targets of the sorafenib."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s31t8Ed-7cfK"
},
"source": [
"### Verify Results by Comparing to SciPy Chi-Squared Function\n",
"\n",
"We verify these BigQuery results by calculating the same chi-squared statistic using the SciPy package. Comparing the SciPy p-values and statistics to those of the BigQuery-derived results confirms that they are identical."
]
},
{
"cell_type": "code",
"metadata": {
"id": "-BPVmeMF1CGO"
},
"source": [
"# extract observed values from bigquery result\n",
"observed = chi_squared_pathways[[\n",
" 'in_target_in_pathway',\n",
" 'in_target_not_pathway',\n",
" 'not_target_in_pathway',\n",
" 'not_target_not_pathway'\n",
"]]\n",
"\n",
"# calculate the chi-squared statistic using the scipy stats package \n",
"chi2_stat = []\n",
"chi2_pvalue = []\n",
"for index, row in observed.iterrows():\n",
" stat, pvalue, _, _ = stats.chi2_contingency(\n",
" np.reshape(np.matrix(row), (2,2)), correction=True\n",
" )\n",
" chi2_stat.append(stat)\n",
" chi2_pvalue.append(pvalue)\n",
"\n",
"# add columns to the original data frame\n",
"chi_squared_pathways['scipy_stat'] = chi2_stat\n",
"chi_squared_pathways['scipy_p_value'] = chi2_pvalue"
],
"execution_count": 44,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WWgNFm6w36s9",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "7d2e6fb9-35e4-425b-cc81-7a3c5ee668c7"
},
"source": [
"# display the updated data frame\n",
"chi_squared_pathways"
],
"execution_count": 45,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
in_target_in_pathway
\n",
"
in_target_not_pathway
\n",
"
not_target_in_pathway
\n",
"
not_target_not_pathway
\n",
"
chi_squared_stat
\n",
"
stable_id
\n",
"
name
\n",
"
p_value
\n",
"
scipy_stat
\n",
"
scipy_p_value
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
3
\n",
"
18
\n",
"
3
\n",
"
678
\n",
"
31.192658
\n",
"
R-HSA-195399
\n",
"
VEGF binds to VEGFR leading to receptor dimeri...
\n",
"
2.336481e-08
\n",
"
31.192658
\n",
"
2.336481e-08
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
19
\n",
"
1
\n",
"
680
\n",
"
22.942602
\n",
"
R-HSA-194306
\n",
"
Neurophilin interactions with VEGF and VEGFR
\n",
"
1.669114e-06
\n",
"
22.942602
\n",
"
1.669114e-06
\n",
"
\n",
"
\n",
"
2
\n",
"
4
\n",
"
17
\n",
"
14
\n",
"
667
\n",
"
17.232611
\n",
"
R-HSA-2219530
\n",
"
Constitutive Signaling by Aberrant PI3K in Cancer
\n",
"
3.307106e-05
\n",
"
17.232611
\n",
"
3.307106e-05
\n",
"
\n",
"
\n",
"
3
\n",
"
2
\n",
"
19
\n",
"
2
\n",
"
679
\n",
"
16.508319
\n",
"
R-HSA-525793
\n",
"
Myogenesis
\n",
"
4.843709e-05
\n",
"
16.508319
\n",
"
4.843709e-05
\n",
"
\n",
"
\n",
"
4
\n",
"
2
\n",
"
19
\n",
"
4
\n",
"
677
\n",
"
10.101124
\n",
"
R-HSA-5674499
\n",
"
Negative feedback regulation of MAPK pathway
\n",
"
1.481790e-03
\n",
"
10.101124
\n",
"
1.481790e-03
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
62
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-1989781
\n",
"
PPARA activates gene expression
\n",
"
8.551126e-01
\n",
"
0.033342
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
63
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2644606
\n",
"
Constitutive Signaling by NOTCH1 PEST Domain M...
\n",
"
8.551126e-01
\n",
"
0.033342
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
64
\n",
"
1
\n",
"
20
\n",
"
12
\n",
"
669
\n",
"
0.033342
\n",
"
R-HSA-2894862
\n",
"
Constitutive Signaling by NOTCH1 HD+PEST Domai...
\n",
"
8.551126e-01
\n",
"
0.033342
\n",
"
8.551126e-01
\n",
"
\n",
"
\n",
"
65
\n",
"
1
\n",
"
20
\n",
"
20
\n",
"
661
\n",
"
0.027803
\n",
"
R-HSA-8939236
\n",
"
RUNX1 regulates transcription of genes involve...
\n",
"
8.675732e-01
\n",
"
0.027803
\n",
"
8.675732e-01
\n",
"
\n",
"
\n",
"
66
\n",
"
1
\n",
"
20
\n",
"
19
\n",
"
662
\n",
"
0.017134
\n",
"
R-HSA-2559580
\n",
"
Oxidative Stress Induced Senescence
\n",
"
8.958574e-01
\n",
"
0.017134
\n",
"
8.958574e-01
\n",
"
\n",
" \n",
"
\n",
"
67 rows × 10 columns
\n",
"
"
],
"text/plain": [
" in_target_in_pathway in_target_not_pathway ... scipy_stat scipy_p_value\n",
"0 3 18 ... 31.192658 2.336481e-08\n",
"1 2 19 ... 22.942602 1.669114e-06\n",
"2 4 17 ... 17.232611 3.307106e-05\n",
"3 2 19 ... 16.508319 4.843709e-05\n",
"4 2 19 ... 10.101124 1.481790e-03\n",
".. ... ... ... ... ...\n",
"62 1 20 ... 0.033342 8.551126e-01\n",
"63 1 20 ... 0.033342 8.551126e-01\n",
"64 1 20 ... 0.033342 8.551126e-01\n",
"65 1 20 ... 0.027803 8.675732e-01\n",
"66 1 20 ... 0.017134 8.958574e-01\n",
"\n",
"[67 rows x 10 columns]"
]
},
"metadata": {},
"execution_count": 45
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6dXWAZhLPEcH"
},
"source": [
"# Conclusion\n",
"\n",
"This notebook demonstrated an integrated analysis of the Targetome and Reactome BigQuery datasets for identification of drug targets, identification of cancer pathways targeted by a drug, and pathway enrichment analysis using a chi-squared statistic."
]
}
]
}