{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ck3BpShLQ-zl"
},
"source": [
"# ISB-CGC Community Notebooks\n",
"\n",
"Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!\n",
"\n",
"```\n",
"Title: Gene Set Scoring using hg38 mutation tables\n",
"Author: Lauren Hagen\n",
"Created: 2019-07-22\n",
"Purpose: To demonstrate how to join two tables in BigQuery to compare the expression of a set of genes between two groups.\n",
"URL: https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_score_gene_sets_with_BigQuery.ipynb\n",
"Notes: This notebook was adapted from work by David l Gibbs, January 2018 Query of the Month.\n",
"```\n",
"***\n",
"\n",
"In this notebook, we’re going to implement a common bioinformatics task: gene set scoring. In this procedure, we will compare the *joint* expression of a set of genes between two groups.\n",
"\n",
"Gene sets frequently result from experiments in which, for example, an expression is compared between two groups (e.g. control and treatment), and the genes that are significantly differentially expressed between the two groups form the “gene set”. Given a gene set, numerous approaches have been proposed to assign functional interpretations to a particular list of genes.\n",
"\n",
"A related approach is to consider pathways as gene sets: each gene set will, therefore, be a canonical representation of a biological process compiled by domain experts. We will use the [WikiPathways](https://www.wikipathways.org/index.php/Download_Pathways) gene sets that have been previously assembled and stored in BigQuery. In total there were 381 pathways downloaded from WikiPathways. In the BQ table, each row contains a pathway and a gene associated with that pathway.\n",
"\n",
"Our task will be to determine which genesets are differentially expressed when comparing two groups of samples. We will define groups of samples using the new hg38 Somatic Mutations table from the NCI Genomic Data Commons.\n",
"\n",
"The BigQuery table at ISB-CGC is found at: isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation_DR10\n",
"\n",
"The BigQuery pathway table is found at: isb-cgc.QotM.WikiPathways_20170425_Annotated\n",
"\n",
"We will implement a method found in a paper titled “Gene set enrichment analysis made simple” (Rafael A Irrazary et al, PMID 20048385). They propose a simple solution that outperforms a popular and more complex method known as GSEA.\n",
"\n",
"In short, (I’ll explain later), the gene set score comes from an average of T-tests, where the T-statistics come from testing each gene for differential expression between the two groups. The statistic is then weighted by the square root of the sample size (number of genes in the set), so that with larger gene sets, the ‘significant’ effect size can get pretty small.\n",
"\n",
"At the end of it, the full query processes 29.8GB in 10.1s and costs ~$0.15.\n",
"\n",
"First, we will need to import bigquery module, autheticate ourselves, and create a client variable. For more information see ['Quick Start Guide to ISB-CGC'](INSERT LINK) and alternative authentication methods can be found [here](https://googleapis.github.io/google-cloud-python/latest/core/auth.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "nEHzCZ1MQzEi"
},
"outputs": [],
"source": [
"# Import BigQuery Module\n",
"from google.cloud import bigquery"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 360
},
"colab_type": "code",
"id": "6JM4sBB6Tfsk",
"outputId": "ad3a5101-9c83-4004-8d73-31634a31da1d"
},
"outputs": [],
"source": [
"# Autheticate ourselves\n",
"!gcloud auth application-default login"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "H_ZoX1U8TqF4"
},
"outputs": [],
"source": [
"# Create a variable for which client to use with BigQuery\n",
"project_num = 'your_project_number'# Replace with your project ID\n",
"if project_num == 'your_project_number':\n",
" print('Please update the project number with your Google Cloud Project')\n",
"else:\n",
" client = bigquery.Client(project) "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "wi3LaliGSuhf"
},
"source": [
"Let’s get started on creating the full query. First, which tissue type should we focus on? Let’s choose PARP1 as an interesting gene – it encodes an enzyme involved in DNA damage repair and is also the target of some therapeutic drugs – and use the Somatic Mutation table to choose a cancer type:\n",
"\n",
"** Note: you will need to update 'your_project_number' with your project number before continuing with the notebook **"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 731
},
"colab_type": "code",
"id": "AQxiYAEpUFua",
"outputId": "ac5e1d42-cd76-43d8-c619-d60f6d275aac"
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
project_short_name
\n",
"
n
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
TCGA-UCEC
\n",
"
46
\n",
"
\n",
"
\n",
"
1
\n",
"
TCGA-COAD
\n",
"
22
\n",
"
\n",
"
\n",
"
2
\n",
"
TCGA-STAD
\n",
"
22
\n",
"
\n",
"
\n",
"
3
\n",
"
TCGA-BRCA
\n",
"
17
\n",
"
\n",
"
\n",
"
4
\n",
"
TCGA-SKCM
\n",
"
16
\n",
"
\n",
"
\n",
"
5
\n",
"
TCGA-BLCA
\n",
"
15
\n",
"
\n",
"
\n",
"
6
\n",
"
TCGA-LUSC
\n",
"
12
\n",
"
\n",
"
\n",
"
7
\n",
"
TCGA-CESC
\n",
"
9
\n",
"
\n",
"
\n",
"
8
\n",
"
TCGA-GBM
\n",
"
8
\n",
"
\n",
"
\n",
"
9
\n",
"
TCGA-LUAD
\n",
"
6
\n",
"
\n",
"
\n",
"
10
\n",
"
TCGA-HNSC
\n",
"
5
\n",
"
\n",
"
\n",
"
11
\n",
"
TCGA-OV
\n",
"
5
\n",
"
\n",
"
\n",
"
12
\n",
"
TCGA-KIRC
\n",
"
4
\n",
"
\n",
"
\n",
"
13
\n",
"
TCGA-LIHC
\n",
"
4
\n",
"
\n",
"
\n",
"
14
\n",
"
TCGA-PRAD
\n",
"
3
\n",
"
\n",
"
\n",
"
15
\n",
"
TCGA-KIRP
\n",
"
3
\n",
"
\n",
"
\n",
"
16
\n",
"
TCGA-ACC
\n",
"
3
\n",
"
\n",
"
\n",
"
17
\n",
"
TCGA-LGG
\n",
"
2
\n",
"
\n",
"
\n",
"
18
\n",
"
TCGA-ESCA
\n",
"
2
\n",
"
\n",
"
\n",
"
19
\n",
"
TCGA-SARC
\n",
"
2
\n",
"
\n",
"
\n",
"
20
\n",
"
TCGA-READ
\n",
"
2
\n",
"
\n",
"
\n",
"
21
\n",
"
TCGA-PAAD
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" project_short_name n\n",
"0 TCGA-UCEC 46\n",
"1 TCGA-COAD 22\n",
"2 TCGA-STAD 22\n",
"3 TCGA-BRCA 17\n",
"4 TCGA-SKCM 16\n",
"5 TCGA-BLCA 15\n",
"6 TCGA-LUSC 12\n",
"7 TCGA-CESC 9\n",
"8 TCGA-GBM 8\n",
"9 TCGA-LUAD 6\n",
"10 TCGA-HNSC 5\n",
"11 TCGA-OV 5\n",
"12 TCGA-KIRC 4\n",
"13 TCGA-LIHC 4\n",
"14 TCGA-PRAD 3\n",
"15 TCGA-KIRP 3\n",
"16 TCGA-ACC 3\n",
"17 TCGA-LGG 2\n",
"18 TCGA-ESCA 2\n",
"19 TCGA-SARC 2\n",
"20 TCGA-READ 2\n",
"21 TCGA-PAAD 1"
]
},
"execution_count": 226,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"# Magic command of bigquery with the project id as your project number and create a Pandas Dataframe\n",
"# Change your_project_number to your project ID\n",
"%%bigquery --project your_project_number\n",
"SELECT\n",
" project_short_name,\n",
" COUNT(DISTINCT(sample_barcode_tumor)) AS n\n",
"FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation_DR10`\n",
"WHERE\n",
" Hugo_Symbol = 'PARP1'\n",
"GROUP BY\n",
" project_short_name\n",
"ORDER BY\n",
" n DESC"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "apFRdmeMUsAW"
},
"source": [
"The result of that query shows that 46 tumor samples have PARP1 mutations in UCEC, followed by COAD and STAD with 22 each. That’s a big lead by UCEC, so let’s focus our work there.\n",
"\n",
"Here’s where our main query will begin. Since this is standard SQL, we’ll be naming each subtable, and the full query can be constructed by concatenating each of the following sub-queries. Many of the code lines that run the query at each step will be marked with '%%script false' so that extra queries are not charged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "LNFIqrBoJx7j"
},
"outputs": [],
"source": [
"# Create a variable for the first subquery\n",
"# This subquery will return sample barcodes with at least one known somatic mutation\n",
"s1 = \"\"\"\n",
"WITH\n",
"s1 AS (\n",
" SELECT\n",
" sample_barcode_tumor AS sample_barcode\n",
" FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation_DR10`\n",
" WHERE\n",
" project_short_name = 'TCGA-UCEC'\n",
" GROUP BY\n",
" 1\n",
")\n",
"\"\"\"\n",
"# Create a variable to return the values from the first subquery\n",
"query1 = s1 + \"SELECT * FROM s1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "3XWOXPMRKpeK"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the subquery \n",
"s1_df = client.query(query1).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(s1_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8wvwqWKKPYH5"
},
"source": [
"This query returns 530 tumor sample barcodes with at least one known somatic mutation. (The TCGA Biospecimen table includes information for a total of 553 UCEC tumor samples, but some may have not been sequenced or may have no somatic mutations – the former being more likely than the latter.) Recall that somatic mutations are variants in the DNA that are found when comparing the tumor sequence to the ‘matched normal’ sequence (typically from a blood sample, but sometimes from adjacent tissue). Next, for all these samples, we’ll want to restrict our analysis to samples for which we also have mRNA expression data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "tzDkfWj5Qli_"
},
"outputs": [],
"source": [
"sampleGroup = \"\"\"\n",
"sampleGroup AS (\n",
" SELECT\n",
" sample_barcode\n",
" FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.RNAseq_Gene_Expression`\n",
" WHERE\n",
" project_short_name = 'TCGA-UCEC'\n",
" AND sample_barcode IN\n",
" (select sample_barcode from s1)\n",
" GROUP BY\n",
" 1\n",
")\"\"\"\n",
"# Combine previous subquery with sampleGroup\n",
"query2 = s1 + \",\" + sampleGroup + \"SELECT * FROM sampleGroup\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "TSIrnwbRQrxq"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the query2 subquery \n",
"sampleGroup_df = client.query(query2).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(sampleGroup_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dIKwJ1qISQ_1"
},
"source": [
"Now we have 526 samples for which we have gene expression and somatic mutation calls. We are interested in partitioning this group into two parts: one with a mutation of interest, and one without. So let’s gather barcodes for tumors with non-silent mutations in PARP1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "wpXfg27pTGYW"
},
"outputs": [],
"source": [
"groups = \"\"\"\n",
"--\n",
"-- The first group has non-synonymous mutations in PARP1\n",
"--\n",
"grp1 AS (\n",
" SELECT\n",
" sample_barcode_tumor AS sample_barcode\n",
" FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation_DR10`\n",
" WHERE\n",
" Hugo_Symbol = 'PARP1'\n",
" AND One_Consequence <> 'synonymous_variant'\n",
" AND sample_barcode_tumor IN (\n",
" SELECT\n",
" sample_barcode\n",
" FROM\n",
" sampleGroup )\n",
" GROUP BY sample_barcode\n",
" ),\n",
"--\n",
"-- group 2 is the rest of the samples\n",
"--\n",
"grp2 AS (\n",
" SELECT\n",
" sample_barcode\n",
" FROM\n",
" sampleGroup\n",
" WHERE\n",
" sample_barcode NOT IN (\n",
" SELECT\n",
" sample_barcode\n",
" FROM\n",
" grp1)\n",
")\n",
"\"\"\"\n",
"query3_grp1 = s1 + \",\" + sampleGroup + \",\" + groups + \"SELECT * FROM grp1\"\n",
"query3_grp2 = s1 + \",\" + sampleGroup + \",\" + groups + \"SELECT * FROM grp2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "TYVp--Q2T7ud"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the query3 subquery \n",
"grp1_df = client.query(query3_grp1).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(grp1_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "ZsCFeBRCUAmD"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the query4 subquery \n",
"grp2_df = client.query(query3_grp2).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(grp2_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "7CJHzvzWUPew"
},
"source": [
"This results in 41 tumor samples with non-synonymous PARP1 variants and 485 samples without.\n",
"\n",
"Next we’re going to summarize the gene expression within each of these groups. This will be used for calulating T-statistics in the following portion of the query we are constructing. For each gene, we’ll take the mean, variance, and count of samples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "M57Qsd2AUVEp"
},
"outputs": [],
"source": [
"summaries = \"\"\"\n",
"-- Summaries for Group 1 (with mutation)\n",
"--\n",
"summaryGrp1 AS (\n",
" select\n",
" gene_name as symbol,\n",
" AVG(LOG10( HTSeq__FPKM_UQ +1)) as genemean,\n",
" VAR_SAMP(LOG10( HTSeq__FPKM_UQ +1)) as genevar,\n",
" count(sample_barcode) as genen\n",
" FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.RNAseq_Gene_Expression`\n",
" WHERE\n",
" sample_barcode IN (select sample_barcode FROM grp1)\n",
" AND gene_name IN (\n",
" SELECT\n",
" Symbol as gene_name\n",
" FROM\n",
" `isb-cgc.QotM.WikiPathways_20170425_Annotated`\n",
" )\n",
" GROUP BY\n",
" gene_name\n",
"),\n",
"--\n",
"-- Summaries for Group 2 (without mutation)\n",
"--\n",
"summaryGrp2 AS (\n",
" select\n",
" gene_name as symbol,\n",
" AVG(LOG10( HTSeq__FPKM_UQ +1)) as genemean,\n",
" VAR_SAMP(LOG10( HTSeq__FPKM_UQ +1)) as genevar,\n",
" count(sample_barcode) as genen\n",
" FROM\n",
" `isb-cgc.TCGA_hg38_data_v0.RNAseq_Gene_Expression`\n",
" WHERE\n",
" sample_barcode IN (select sample_barcode FROM grp2)\n",
" AND gene_name IN (\n",
" SELECT\n",
" Symbol as gene_name\n",
" FROM\n",
" `isb-cgc.QotM.WikiPathways_20170425_Annotated`\n",
" )\n",
" GROUP BY\n",
" gene_name\n",
")\n",
"--\n",
"\"\"\"\n",
"query4_grp1 = s1 + \",\" + sampleGroup + \",\" + groups + \",\" + summaries + \"SELECT * FROM summaryGrp1\"\n",
"query4_grp2 = s1 + \",\" + sampleGroup + \",\" + groups + \",\" + summaries + \"SELECT * FROM summaryGrp2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "cuOjZFc0UoX9",
"outputId": "3061641d-856a-48e7-e2fe-745b63c78965"
},
"outputs": [
{
"data": {
"text/plain": [
"6505"
]
},
"execution_count": 11,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"%%script false\n",
"# Run the query5 subquery \n",
"summaryGrp1_df = client.query(query4_grp1).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(summaryGrp1_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "2IwH35lxUzKG"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the query6 subquery \n",
"summaryGrp2_df = client.query(query4_grp2).to_dataframe()\n",
"# Print the number of rows in the dataframe\n",
"len(summaryGrp2_df)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "NJUjQlhPVS_I"
},
"source": [
"This results in two sets of summaries for 4,822 genes. (There are 4,962 unique gene symbols in the WikiPathways table, but 140 of them do not match any of the symbols in the TCGA hg38 expression table.) With this, we are ready to calculate T-statistics. Here we’re going to use a two sample T-test assuming independent variance (and that we have enough samples to assume that). The T-statistic is found by taking the difference in means (of gene expression between our two groups), and normalizing it by measures of variance and sample size. Here, we want to keep T-statistics that are zero, which might come from having zero variance, because having a T-stat for each gene is important in the gene set score, even if it’s a zero. To do that, you’ll see the use of an IF statement below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "2wPLVTNBVZtq"
},
"outputs": [],
"source": [
"tStatsPerGene = \"\"\"\n",
"tStatsPerGene AS (\n",
"SELECT\n",
" grp1.symbol as symbol,\n",
" grp1.genen as grp1_n,\n",
" grp2.genen AS grp2_n,\n",
" grp1.genemean AS grp1_mean,\n",
" grp2.genemean AS grp2_mean,\n",
" grp1.genemean - grp2.genemean as meandiff,\n",
" IF ((grp1.genevar > 0\n",
" AND grp2.genevar > 0\n",
" AND grp1.genen > 0\n",
" AND grp2.genen > 0),\n",
" (grp1.genemean - grp2.genemean) / SQRT( (POW(grp1.genevar,2)/grp1.genen)+ (POW(grp2.genevar,2)/grp2.genen) ),\n",
" 0.0) AS tstat\n",
"FROM\n",
" summaryGrp1 as grp1\n",
" JOIN\n",
" summaryGrp2 AS grp2\n",
" ON\n",
" grp1.symbol = grp2.symbol\n",
"GROUP BY\n",
" grp1.symbol,\n",
" grp1.genemean,\n",
" grp2.genemean,\n",
" grp1.genevar,\n",
" grp2.genevar,\n",
" grp1.genen,\n",
" grp2.genen\n",
")\n",
"\"\"\"\n",
"query5 = s1 + \",\" + sampleGroup + \",\" + groups + \",\" + summaries + \",\" + tStatsPerGene + \"SELECT * FROM tStatsPerGene\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"id": "PzVFFZgFVpwL",
"outputId": "c8e50299-37c8-4a55-c278-6187b217fd10"
},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {
"tags": []
},
"output_type": "display_data"
}
],
"source": [
"# Set x as the t statistic for each gene\n",
"x1 = tStatsPerGene_df.tstat\n",
"# Set y as the absolute value of the mean difference\n",
"y1 = abs(tStatsPerGene_df.meandiff)\n",
"# Set the color of the dots of the absolute value of the t statistic \n",
"color = abs(tStatsPerGene_df.tstat)\n",
"# Create the scatter plot\n",
"plt.scatter(x1,y1,c=color)\n",
"# Add a legend for the colors\n",
"plt.colorbar()\n",
"# Add labels\n",
"plt.xlabel('T Stat')\n",
"plt.ylabel('Mean Absolute Difference')\n",
"# Display the scatter plot\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"colab_type": "code",
"id": "KN4HYAkxKMqS",
"outputId": "0f81ece1-5f49-40e0-8db7-13cba6464445"
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 243,
"metadata": {
"tags": []
},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEKCAYAAAD+XoUoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xt0XOV57/HvMzO6S5Z8ka8y2FyD\nAJuLMaQ0txIIkBSSFWggaUMuDUla2p6VdKUk7aI5abJOaJPQnIbTlgIpSUiB0jZxg1PCJW0IAWIb\njMEYg3GwLeO7JVnSSJrbc/7Ye4yY6DKSJc1sze+zlhcze96ZeTRIP7169t7vNndHREQqQ6zUBYiI\nyPRR6IuIVBCFvohIBVHoi4hUEIW+iEgFUeiLiFQQhb6ISAVR6IuIVBCFvohIBUmUuoBC8+bN82XL\nlpW6DBGRSNmwYcNBd28da1zZhf6yZctYv359qcsQEYkUM9tRzDi1d0REKohCX0Skgij0RUQqiEJf\nRKSCKPRFRCqIQl9EpIIo9EVEKohCX0Skgij0Rcbh09/bwFceeKHUZYhMmEJfZBw2dXTz4+f3lroM\nkQlT6IuMQ89Amo7OfvYdGSh1KSITotAXKZK70zuYAeDpHZ0lrkZkYhT6IkVKprLkPLi9QaEvEaXQ\nFylSfpYPsGGnQl+iSaEvUqSegTQAS1rqeH53NwPpbIkrEhk/hb5IkY4MBDP9t53aSjrrPL+7u8QV\niYyfQl+kSL350D8luDiR+voSRQp9kSL1hKG/bG4Dy+bWK/QlkhT6IkXqHQx6+k21Cc45fjZP7+zE\n3Utclcj4FBX6ZnapmW01s21mduMwj7/VzJ42s4yZXVXw2HVm9nL477rJKlxkuuVn+o21CU5f3MzB\n3hSdyXSJqxIZnzFD38ziwK3AZUA7cK2ZtRcM2wl8BPh+wXPnAH8JnA+sBv7SzGYfe9ki0+/IQAYz\naKxO0FSbAKBvyGGcIlFQzEx/NbDN3be7ewq4B7hy6AB3f9XdNwG5gue+C3jI3Q+7eyfwEHDpJNQt\nMu16BzI0VieIxYz66jgA/TpsUyKmmNBfAuwacr8j3FaMop5rZteb2XozW3/gwIEiX1pkevUMpGkM\nZ/gN1ZrpSzSVxY5cd7/N3Ve5+6rW1tZSlyMyrJ6BzNG2Tl1+pp/STF+ipZjQ3w0sHXK/LdxWjGN5\nrkhZ6R3M0FgThH6+vZNU6EvEFBP664CTzWy5mVUD1wBrinz9B4FLzGx2uAP3knCbSOT0DKRpqq0C\nXg/9vpTaOxItY4a+u2eAGwjCegtwn7tvNrMvmdkVAGZ2npl1AFcD/2hmm8PnHgb+iuAXxzrgS+E2\nkcjpGcwc7enXhz19tXckahLFDHL3tcDagm03Dbm9jqB1M9xz7wTuPIYaRcpCz0CGWbVq70i0lcWO\nXJEoGNreqTsa+mrvSLQo9EWKkM7mGEjnju7IrY7HSMRMM32JHIW+SBHyK2zmD9k0M+qq4wp9iRyF\nvkgR8lfNys/0Iejra0euRI1CX6QIRwbyK2xWHd3WUJ3QIZsSOQp9kSLkV9jMH70Dwc5czfQlahT6\nIkXoHbKscl69evoSQQp9kSL0DP56e6e+OqFDNiVyFPoiRTh6AZUazfQl2hT6IkXoKThkE9AhmxJJ\nCn2RIvQMZKiKGzWJ139kGtTekQhS6IsUoXcwWILBzI5uU3tHokihL1KEoRdQyaurjjOYyZHNeYmq\nEhk/hb5IEXoGMm/YiQuvXzJRLR6JEoW+SBF6R5jpg9bUl2hR6IsU4chAmsaaqjds05r6EkUKfZEi\n9A5m3rAEA7x+9SytvyNRotAXKcJwO3Lr1d6RCFLoi4zB3ekdcn3cPLV3JIoU+iJj6E9nyeb8Devu\nwOvtHR29I1Gi0BcZw3Dr7oBm+hJNCn2RMfQcvYCKQl+iT6EvMobhFlsDqA9n/tqRK1Gi0BcZQ34m\nnz8DN6+uKpjp65BNiRKFvsgY+sKLojcU9PTjsWDVTc30JUoU+iJjyM/08z38oRpqEurpS6Qo9EXG\nkG/fFM70IWjxqL0jUaLQFxlDvn1TN8xMv746rvaORIpCX2QMfYNhe6dqmNBXe0cipqjQN7NLzWyr\nmW0zsxuHebzGzO4NH3/KzJaF26vM7C4ze87MtpjZ5ye3fJGpl0xlqEnESMR//celviquM3IlUsYM\nfTOLA7cClwHtwLVm1l4w7ONAp7ufBNwC3BxuvxqocfczgXOBT+Z/IYhERV8qM2w/H3TJRImeYmb6\nq4Ft7r7d3VPAPcCVBWOuBO4Kb98PXGTBxUQdaDCzBFAHpIAjk1K5yDRJDmaHPXIHgvaOevoSJcWE\n/hJg15D7HeG2Yce4ewboBuYS/ALoA/YAO4GvufvhY6xZZFolU6OEvo7ekYiZ6h25q4EssBhYDnzW\nzE4oHGRm15vZejNbf+DAgSkuSWR8+lKZoytqFqpTe0cippjQ3w0sHXK/Ldw27JiwldMMHAI+CPyX\nu6fdfT/wOLCq8A3c/TZ3X+Xuq1pbW8f/VYhMoWQqS0PN8DP9hprgkE13n+aqRCammNBfB5xsZsvN\nrBq4BlhTMGYNcF14+yrgUQ9+CnYCvwVgZg3ABcCLk1G4yHTpGxx5pl9fnSCTc1LZ3DRXJTIxY4Z+\n2KO/AXgQ2ALc5+6bzexLZnZFOOwOYK6ZbQM+A+QP67wVaDSzzQS/PL7t7psm+4sQmUrJVJaGEXr6\n+UXXtDNXomL46UsBd18LrC3YdtOQ2wMEh2cWPq93uO0iUZJMZakbYaafb/skU1la6qezKpGJ0Rm5\nImNIpjIjz/SPXjJRM32JBoW+yChyOQ8O2Rzp5Kyq/Exfh21KNCj0RUbRn85fQGWkk7N0yUSJFoW+\nyCiOrqU/4jIMumSiRItCX2QU+bbNcCtswusXVtFZuRIVCn2RUeSXVR7p5Kx86Ku9I1Gh0BcZxdGZ\n/ignZ4HaOxIdCn2RUfSlipvpq70jUaHQFxlF/xgz/ZpEjHjM6BtU6Es0KPRFRnH0UokjHLJpZjTW\nJI6OEyl3Cn2RUYzV0wdorEnQM6CZvkSDQl9kFGP19POPqb0jUaHQFxlFcjCDGdQmRg79xpqEduRK\nZCj0RUaRTGWpr4oTi9mIYxrU3pEIUeiLjKJvlGWV84IduQp9iQaFvsgokqnMqP18CGb6Cn2JCoW+\nyCj6BrOjHrkD4dE7Cn2JCIW+yCj60yNfQCUv397RxdElChT6IqPoGxz5Aip5DTUJcg4DaV0cXcqf\nQl9kFKNdKjGvsTb4pdCrFo9EgEJfZBR9g1nqxmzvBI8r9CUKFPoiowhm+mO0d8LHdQSPRIFCX2QU\nwUXRx96RC5rpSzQo9EVGkMnmGMzkxpzpH+3p66xciQCFvsgIkunRl1XOawhn+lp/R6JAoS8yguTR\ntfTHPjkL1N6RaFDoi4wgP3MfaxmGo6Gv9o5EgEJfZAT5i52PNdOvr45jpqN3JBoU+iIjyIf4WCdn\nmRkN1Ql6dclEiQCFvsgIkvmZ/hjLMICuniXRUVTom9mlZrbVzLaZ2Y3DPF5jZveGjz9lZsuGPLbC\nzJ4ws81m9pyZ1U5e+SJTp+/o9XFHn+lD0NfXjlyJgjFD38ziwK3AZUA7cK2ZtRcM+zjQ6e4nAbcA\nN4fPTQDfAz7l7qcDbwfSk1a9yBQ6OtNX6MsMUsxMfzWwzd23u3sKuAe4smDMlcBd4e37gYvMzIBL\ngE3u/iyAux9ydzU+JRKSR3v6xbR3dCEViYZiQn8JsGvI/Y5w27Bj3D0DdANzgVMAN7MHzexpM/vc\nsZcsMj36jvb0NdOXmWPsKcyxv/5vAucBSeARM9vg7o8MHWRm1wPXAxx33HFTXJJIcZKpDImYUR0f\ne26k0JeoKGamvxtYOuR+W7ht2DFhH78ZOETwV8HP3P2guyeBtcA5hW/g7re5+yp3X9Xa2jr+r0Jk\nCuSXVQ46laNTe0eiopjQXwecbGbLzawauAZYUzBmDXBdePsq4FEPrh33IHCmmdWHvwzeBrwwOaWL\nTK1illXOa9BMXyJizO9od8+Y2Q0EAR4H7nT3zWb2JWC9u68B7gC+a2bbgMMEvxhw904z+wbBLw4H\n1rr7A1P0tYhMqu7+NM11VUWNbapNkM46g5ksNYmx9wGIlEpR0xh3X0vQmhm67aYhtweAq0d47vcI\nDtsUiZSuZJrm+uJCP3/Wbt+gQl/Km87IFRlBd3+aliJn+keXV1aLR8qcQl9kBJ3JFLPrq4sa2xRe\nSKVHK21KmVPoi4ygK5mmpdj2ji6kIhGh0BcZRn8qy2AmR0uRM/0GXUhFIkKhLzKMrv4UQNEzfV1I\nRaJCoS8yjK5ksC5gsTtyG7UjVyJCoS8yjM5kfqav9o7MLAp9kWF052f64zxOX6Ev5U6hLzKMzjD0\niz1kMxGPUVelq2dJ+VPoiwxjvDtyIb/+ji4XIeVNoS8yjO5kmppEjNqq4pdUaNR1ciUCFPoiwxjP\n2bh5WmlTokChLzKM8ZyNm6cLqUgUKPRFhtGVLH5Z5bxGXUhFIkChLzKMrv7xt3caaxMcGUhPUUUi\nk0OhLzKMibR35jfVsP/IIMFF40TKk0JfpIC7h6E/vpn+glm1DGZydPdrti/lS6EvUqA/nSWVzY17\npr+ouQ6APd0DU1GWyKRQ6IsU6BznYmt5C5trANh7RKEv5UuhL1Kga5yLreUtDGf6+zTTlzKm0Bcp\nMN7F1vLmN9VgpvaOlDeFvkiB8S62llcVjzGvsYZ9au9IGVPoixSYyGJreQtn1WqmL2VNoS9SIH/V\nrPGekQuwsLlWM30pawp9kQJdyRR1VfFxrbCZp5m+lDuFvkiBiZyNm7ewuZbu/jQDaa2rL+VJoS9S\noHMCi63lLZxVC8BezfalTCn0RQp0T2CxtbyFzUHoq8Uj5UqhL1Kg8xjbO4B25krZUuiLFJjIYmt5\n+faOZvpSrooKfTO71My2mtk2M7txmMdrzOze8PGnzGxZwePHmVmvmf3p5JQtMjXcne7+1IRn+g01\nCZpqE5rpS9kaM/TNLA7cClwGtAPXmll7wbCPA53ufhJwC3BzwePfAH587OWKTK3DfSnSWae1sWbC\nr7FwVq125ErZKmamvxrY5u7b3T0F3ANcWTDmSuCu8Pb9wEVmZgBm9l7gV8DmySlZZOq8eigJwLJ5\n9RN+jYXNtezRTF/KVDGhvwTYNeR+R7ht2DHungG6gblm1gj8GfC/R3sDM7vezNab2foDBw4UW7vI\npNt5uA+A4+Y0TPg1Fs6q1UqbUramekfuF4Fb3L13tEHufpu7r3L3Va2trVNcksjIXj2YxAyWzqmb\n8Gssaq5lf88AmWxuEisTmRyJIsbsBpYOud8WbhtuTIeZJYBm4BBwPnCVmf010ALkzGzA3b91zJWL\nTIEdh/pY3FxHTWL8SzDkLWiuJedwsDd19BBOkXJRTOivA042s+UE4X4N8MGCMWuA64AngKuARz24\nOvRb8gPM7ItArwJfytmOw0mOnzvxfj4EM32A17r7FfpSdsZs74Q9+huAB4EtwH3uvtnMvmRmV4TD\n7iDo4W8DPgP82mGdIlGw41CS4+dOvJ8PsCx8/sv7eiajJJFJVcxMH3dfC6wt2HbTkNsDwNVjvMYX\nJ1CfyLQ5MpDmcF+KZcc40182t4Gm2gTPdnTzgfMmqTiRSaIzckVCO8PDNY+1vROLGSvbWnh2V9dk\nlCUyqRT6IqFXDwWHax5rewdgRVszW/f2aIllKTsKfZHQjkma6QOsaGshk3Ne2HPkmF9LZDIp9EVC\nOw710dpUQ311Ubu6RrVyaTMAm9TikTKj0BcJvXooecw7cfMWzqplflMNz3Z0T8rriUwWhb5IaOck\nHK6ZZ2asaGvh2Q7N9KW8KPRFgP5Ulr1HBjh+zuTM9AFWtjWz/UAfRwbSk/aaIsdKoS8C7Dwc7sSd\nNzkzfYAVS1sAeF4tHikjCn0Rgp24wKT19CGY6QPq60tZUeiLAC/vDxaCnayePkBLfTXL5taz/tXD\nk/aaIsdKoS8CbNzVxQnzGmium9hlEkfy9lPn8/NtB0mmMpP6uiITpdCXiufuPLOzi7PCHvxkurh9\nAYOZHI+9fHDSX1tkIhT6UvF2d/VzsHeQs46b/NBfvXwOTbUJHnph36S/tshEKPSl4m0Mz5qdipl+\nVTzGO06dz6Mv7ieb80l/fZHxUuhLxdu4s4vqRIw3LZw1Ja9/cfsCDvel2LCjc0peX2Q8FPpS8Tbu\n6uKMxbOoTkzNj8PbT22lKm489MLeKXl9kfFQ6EtFS2dzPLe7m7OWzp6y92iqreKCE+by0Av7CK4i\nKlI6Cn2paFv39jCYyU3JTtyhLmlfwKuHkrxyoHdK30dkLAp9qWjPhDtxz56CnbhDvbN9AQA/0VE8\nUmIKfaloG3d2MbehmrbZdVP6Poua6zhzSbMO3ZSSU+hLRdu4q5OzlrZgZlP+Xhe3L2Djri729wxM\n+XuJjEShLxWruz/NKwf6OHuK+/l5F7cvwB0e2bJ/Wt5PZDgKfalYmzryJ2VN3ZE7Q71pYRNts+vU\n4pGSUuhLxdq4swszWBFez3aqmRkXty/g59sO0jeoBdikNBT6UrE27urixNZGZtVO7sqao7m4fQGp\nTI7HXj4wbe8pMpRCXyqSu7Nx19SsrDma1cvmMLu+ih8/r7NzpTQU+lKROjr7OdSXmvbQT8RjXHrG\nQh5+YR8D6ey0vrcIKPSlQj0zhStrjuXyMxfRl8ry31vV4pHpp9CXivTMzk5qq2K8aWHTtL/3m0+Y\ny+z6Kh54bs+0v7dIUaFvZpea2VYz22ZmNw7zeI2Z3Rs+/pSZLQu3X2xmG8zsufC/vzW55YtMzMZd\nXZy5pJlEfPrnPUGLZxGPbNlHf0otHpleY37Hm1kcuBW4DGgHrjWz9oJhHwc63f0k4Bbg5nD7QeC3\n3f1M4Drgu5NVuMhEpTI5Nr92pCStnbz3rFhEMpXlv7fqRC2ZXsVMc1YD29x9u7ungHuAKwvGXAnc\nFd6+H7jIzMzdn3H318Ltm4E6M6uZjMJFJur517pJZXLTdlLWcM5fPoe5DdX8SC0emWbFhP4SYNeQ\n+x3htmHHuHsG6AbmFox5P/C0uw9OrFSRyfHolv3EY8aFJxV+i06f/FE8j27ZrxaPTKtpaWia2ekE\nLZ9PjvD49Wa23szWHzigIxpkaj28ZR+rjp9NS311Set494pF9Kez/FQtHplGxYT+bmDpkPtt4bZh\nx5hZAmgGDoX324D/AD7s7q8M9wbufpu7r3L3Va2treP7CkTGoaMzyYt7e3jnaQtKXQrnL5/LvMZq\nHtikFo9Mn2JCfx1wspktN7Nq4BpgTcGYNQQ7agGuAh51dzezFuAB4EZ3f3yyihaZqPwKl/mLmpRS\nPGZcdsYiHnlxH8mU1uKR6TFm6Ic9+huAB4EtwH3uvtnMvmRmV4TD7gDmmtk24DNA/rDOG4CTgJvM\nbGP4b/6kfxUiRXp4yz5OaG1g+byGUpcCBC2egXSOR19Ui0emR6KYQe6+FlhbsO2mIbcHgKuHed6X\ngS8fY40ik6JnIM2T2w/xsQuXl7qUo85bNofWphoe2LSH96xYXOpypALojFypGD976SDprHNRGfTz\n8+Ix4/IzFvLoi/u13LJMC4W+VIy1z+9hdn0V50zTlbKK9dsrFzOYybHm2dfGHixyjBT6UhEO9g7y\nk817ee/ZS0qy9MJozj1+NmcsmcXtj20nl/NSlyMzXHl994tMkX9d30E663zo/ONKXcqvMTM+8ZYT\neOVAn47Zlymn0JcZL5dz/uWXO1m9fA4nzZ/+VTWLcfmZi1jcXMs/Pba91KXIDKfQlxnv8VcOsvNw\nsixn+XlV8RgfvXA5T24/zHMd3aUuR2Ywhb7MeHc/uZM5DdVcesbCUpcyqmtWL6WpJsFNa56nV0fy\nyBRR6MuMtrd7gIe37OOqc9uoScRLXc6ommqr+OurVrCpo5sP3/EUPQPpUpckM5BCX2a07z25g6w7\nv3fB8aUupSiXnbmIWz94Nps6uvnQ7U+x/UBvqUuSGUahLzPWQDrL93+5k3eetoClc+pLXU7RLj1j\nEX//u+fyq4N9XPrNx/i7R14mlcmVuiyZIRT6MmOtefY1Dvel+OiFy0pdyrhd3L6ARz7zNi5uX8DX\nH3qJj3z7l3T3q90jx06hLzOSu/Ptx1/l1AVNvPmE0l0s5VjMn1XLrR88h69fvZJ1rx7m6n/4BR2d\nyVKXJRGn0JcZ6cnth9my5wgfvXAZZlbqco7J+89t466PrmZP9wAXff1/+Mx9G9mwo7PUZUlEKfRl\nxnF3bv6vF5nfVMOVZxVe2TOafuOkeay54Te56tw2frJ5H+//+19w91M7Sl2WRJBCX2ac/9y0h427\nuvjTS06lrrq8D9Mcj+XzGvjK+87kqS9cxDtObeWmH27mf17S5UVlfBT6MqMMpLPc/OMXOW3RLN5/\nblupy5kSDTUJ/u6D53DKgib+8O6neXHvkVKXJBGi0JcZ5c7Hf8Xurn7+4t2nEY9Fu5c/msaaBHd+\nZBX11XH+9F+f1eqcUjSFvswYP3vpAN/4yUtc3L6AC0+aV+pyptyi5jq+cPlpPL/7CD/YuLvU5UhE\nKPRlRniuo5tPf28DJ81v5GtXryx1OdPmipWLWdHWzN88uJX+VLbU5UgEKPQl8nYdTvKRb/+Slvpq\n7vrYaprrqkpd0rSJxYw/v/w09nQPcMfPtSyzjE2hL5HWO5jh9+9aTzqb466PrWbBrNpSlzTtzj9h\nLpe0L+BbP92mo3lkTAp9iaxczvnMvRt5eX8Pt37oHE6a31jqkkrmy+87g+XzGvnYP6/jvvW7Sl2O\nlDGFvkTWLQ+/xE9e2MdfvLudt5zcWupySmp+Uy33ffICfuPEuXzu/k184jvreeKVQ7jrqB55o0Sp\nCxCZiP989jX+7tFt/M6qtkguqDYVmmqruOO68/jWT7fx3Sde5aEX9rG4uZb2xc2c0NpAfypLd3+a\ns49r4drVx1FbNXNOXJPiWbnNBFatWuXr168vdRlSxp7f3c1V//ALzljczN2fOL/sL45SCgPpLGs2\nvsZj2w7y4p4j7DicpLEmQV1VnN1d/SxqruUP3n4i7z17CU21VeRyzrpXD9M7mGHVsjkVtTN8pjCz\nDe6+asxxCn2Jkh2H+vjAPz5JzOCHN/wmrU01pS4pcn7xykG+9uBWnt7ZRW1VjLec3Mrm3d281j0A\nQMzg7ONm8/WrV7JsXkOJq5ViKfRlxtlxqI9rb3uSZDrLv3ziAk5bNKvUJUWWu7NxVxf/9nQHD7+w\nn/bFs3jv2UuY31TDE68c4jtPvEp9dYJ7P3kBbbOjcwGaSqbQlxlj1+EkT7xyiL99+CWS6Szf//0L\naF+swJ9Km1/r5trbnqS5vorv//4FkbryWKVS6EtkdSfTPLp1H49vO8QTrxxid1c/AEta6vinD69S\n4E+TZ3d18bu3P0V/OssVZy3mt1cu5kh/msN9KVrqq1jUXMdJ8xuZ16gWWzlQ6Etk9KeyPP9aN8/u\n6uKxlw/y+LaDZHJOS30VFyyfy5tPDP6dPL8x8hdEiZqOziS3P/Yr7l23i/708Ms8LJ/XwPnL5/Cu\n0xdy4UnzqE7oSPBSmNTQN7NLgW8CceB2d/9qweM1wHeAc4FDwAfc/dXwsc8DHweywB+7+4OjvZdC\nf+Zyd/rTWbYf6GNTRxDyz3Z08fL+XrLhKpHHzannsjMWcvmZizhzSTOxGbxSZpR0JVO8uLeHeY3V\nzK6vpqs/zWtd/bzw2hHWvXqYp7Yfpmcww6zaBFectZjfu2AZpy5sKnXZFWXSQt/M4sBLwMVAB7AO\nuNbdXxgy5g+AFe7+KTO7Bnifu3/AzNqBfwFWA4uBh4FT3H3ElaEU+tHXN5jh5f29bNlzhBf3HGHL\n3h5e3tdDd3+aoSsAt9RXsaKthZVtzaxsa2HF0mbmN1XeMgozwWAmy89fPsiPNu3hgef2kMrkOH3x\nLE5d2MRxc+rpSqbZeThJ70CGWAxqEnEWt9TSNruexpoE8ZhRFTfisRhVcWNxSx3Hz62ntbFGf90V\naTJD/83AF939XeH9zwO4+/8ZMubBcMwTZpYA9gKtwI1Dxw4dN9L7TVXouzs5h1z49SZiNmXfTO5O\n/mP18P7rt/PbXx/DMNvT2Rz96SzpjFOVMKrjMaoTMariMarjsVFnwLmck3MnG9aRcyebczJZpyvs\nyaazuV9733yR/eksh/tSdCZTHO5L09mXIhYzGmviVCdipDI5BtI5BjNZBtJBnf2pLEcG0uzu7OdQ\nX+poLQ3Vcd60aBanLGhibkM1jbUJFrfUsbKtmePm1OsHegbq7Evxrxt28d9bD7D9QB97jwzQWJPg\nuDn1zKpLkMsF32Ovdb3xe2UkdVVxaqtiwX+r49RVhf+q49Qkgv/WFTxeWxWnNhGjpip+9KS07v40\nR/rT9AxmqKuKM6suQVNtFbNqq2iqTTCrropZtcG25iGP1VbFMDPcnUwu+FnK5l6/HY8ZNeHPZuE1\nHFKZ3OvvPZCmZyBDdTxGc10Vs+oSNNdV0ViTmJSfg2JDv5gzcpcAQxfz6ADOH2mMu2fMrBuYG25/\nsuC5U3LR0uc6uvnAbU+QC8N9aMiP9HstZpCIBf+jzIIAzIdfcDs0wnZ3H3J7Kr6qkVXFjUQshhN+\nnWHQT/a1NKrixuz6anIezOBT2Rw1iRg1iRi1VfGj/62vjtNSX83pi5tpmx3s4Dtt4SzaZtepRVNh\nZjdUc/1bT+T6t54IBH8FVMdjwwZbfypLMpV5Q4gOZrJ0dPaz41CSQ30pBsJJRX86y0D4rz+dpW8w\nw8HeNz7en86SyuSGraup9vWQHUhnOTKQoWcgTTo7+g9NPGZH82Qs8ZiRiBlO8DOZKeJJZlAVixGL\nweVnLuIbv3PW2G90DMpiGQYzux64Przba2ZbS1DGPOBgCd53Mqj20lDtpTFja98K3PKBCb/28cUM\nKib0dwNLh9xvC7cNN6YjbO80E+zQLea5uPttwG3FFDxVzGx9MX8alSPVXhqqvTRU+7Ep5tiqdcDJ\nZrbczKqBa4A1BWPWANeFt68CHvWgkb0GuMbMasxsOXAy8MvJKV1ERMZrzJl+2KO/AXiQ4JDNO919\ns5l9CVjv7muAO4Dvmtk24DDsU4chAAAFx0lEQVTBLwbCcfcBLwAZ4A9HO3JHRESmVlE9fXdfC6wt\n2HbTkNsDwNUjPPcrwFeOocbpUtL20jFS7aWh2ktDtR+DsjsjV0REpo7OlxYRqSAKfcDM/sjMXjSz\nzWb210O2f97MtpnZVjN7VylrHI2ZfdbM3MzmhffNzP5vWPsmMzun1DUWMrO/CT/zTWb2H2bWMuSx\nsv7czezSsLZtZnZjqesZjZktNbOfmtkL4ff3n4Tb55jZQ2b2cvjf2aWudSRmFjezZ8zsR+H95Wb2\nVPj53xseYFJ2zKzFzO4Pv8+3mNmby+Fzr/jQN7N3AFcCK939dOBr4fZ2gh3SpwOXAv8vXJKirJjZ\nUuASYOeQzZcRHCl1MsH5D39fgtLG8hBwhruvIFjm4/NQ/p97WMutBJ9xO3BtWHO5ygCfdfd24ALg\nD8N6bwQecfeTgUfC++XqT4AtQ+7fDNzi7icBnQRre5WjbwL/5e5vAlYSfA0l/9wrPvSBTwNfdfdB\nAHffH26/ErjH3Qfd/VfANoI1hMrNLcDnGHICMUHt3/HAk0CLmS0qSXUjcPefuHsmvPskwTkcUP6f\n+2pgm7tvd/cUcA9BzWXJ3fe4+9Ph7R6C4FlCUPNd4bC7gPeWpsLRmVkb8G7g9vC+Ab8F3B8OKcva\nzawZeCvBkY24e8rduyiDz12hD6cAbwn/XPwfMzsv3D7c8hNTsoTERJnZlcBud3+24KGyr73Ax4Af\nh7fLvfZyr29EZrYMOBt4Cljg7nvCh/YCC0pU1lj+lmBSk19bYS7QNWTCUK6f/3LgAPDtsDV1u5k1\nUAafe1kswzDVzOxhYOEwD/05wWcwh+BP3/OA+8zshGksb1Rj1P4FgtZOWRqtdnf/YTjmzwlaEHdP\nZ22VxswagX8D/pe7Hxm6Do67u5mV3WF8ZvYeYL+7bzCzt5e6nnFKAOcAf+TuT5nZNylo5ZTqc6+I\n0Hf3d470mJl9Gvj38AziX5pZjmB9jKKWkJhqI9VuZmcSzCaeDX+A24CnzWw1ZV57npl9BHgPcJG/\nfuxwWdQ+inKv79eYWRVB4N/t7v8ebt5nZovcfU/Y+ts/8iuUzIXAFWZ2OVALzCLok7eYWSKc7Zfr\n598BdLj7U+H9+wlCv+Sfu9o78APgHQBmdgpQTbAgUlkvIeHuz7n7fHdf5u7LCL7JznH3vQS1fzg8\niucCoHvIn5RlwYIL83wOuMLdk0MeKuvPneKWJSkbYQ/8DmCLu39jyENDl065DvjhdNc2Fnf/vLu3\nhd/f1xAs7/Ih4KcEy71A+da+F9hlZqeGmy4iWJmg5J97Rcz0x3AncKeZPQ+kgOvCWWeUl5BYC1xO\nsBM0CXy0tOUM61tADfBQ+JfKk+7+qXJfumOkZUlKXNZoLgR+D3jOzDaG274AfJWglflxYAfwOyWq\nbyL+DLjHzL4MPEO4s7QM/RFwdzg52E7wcxijxJ+7zsgVEakgau+IiFQQhb6ISAVR6IuIVBCFvohI\nBVHoi4hUEIW+VLRwJcQ/GGPMF4p8raLGiZSSDtmUihauR/Mjdz9jlDG97t5YxGsVNU6klHRyllS6\nrwInhicurQNOJTjdP0GwAuu7gbrw8c3u/iEz+wHBUgy1wDfd/TYz+2rhuFJ8MSJj0UxfKtrQmb6Z\nfRaodfevhOvm17t7T+EM3szmuPthM6sj+EXxNnc/pJm+RIFm+iKvW0ewJEcV8AN33zjCuD82s/eF\nt5cSrA90aDoKFDlW2pErEnL3nxFc+GI38M9m9uHCMeESv+8E3uzuKwnWfqmdzjpFjoVCXypdD9AE\nYGbHA/vc/Z8IrtSUv7ZwOpz9AzQDne6eNLM3EVyHgWHGiZQltXekooW9+MfDVVYbgD4zSwO9QH6m\nfxuwycyeJrjK16fMbAuwleBSjxSO045cKVfakSsiUkHU3hERqSAKfRGRCqLQFxGpIAp9EZEKotAX\nEakgCn0RkQqi0BcRqSAKfRGRCvL/AciuJI5rUn2XAAAAAElFTkSuQmCC\n",
"text/plain": [
"
"
]
},
"metadata": {
"tags": []
},
"output_type": "display_data"
}
],
"source": [
"# Create a Density Plot of the Data \n",
"sns.distplot(x1, hist=False, kde=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "4WWmmXYUNum6"
},
"source": [
"OK! We have a distribution of T statistics. The question is whether there’s some hidden structure to these values. Are there gene sets with unusually high T-statistics? And do these gene sets make any sort of biological sense? Let’s find out!\n",
"\n",
"Now we are going to integrate our gene set table. This is as easy as doing a table join."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "V2y-tOXyPKg1"
},
"outputs": [],
"source": [
"geneSetTable = \"\"\"geneSetTable AS (\n",
"SELECT\n",
" gs.pathway,\n",
" gs.wikiID,\n",
" gs.Symbol,\n",
" st.grp1_n,\n",
" st.grp2_n,\n",
" st.grp1_mean,\n",
" st.grp2_mean,\n",
" st.meandiff,\n",
" st.tstat\n",
"FROM\n",
" `isb-cgc.QotM.WikiPathways_20170425_Annotated` as gs\n",
"JOIN\n",
" tStatsPerGene as st\n",
"ON\n",
" st.symbol = gs.symbol\n",
"GROUP BY\n",
" gs.pathway,\n",
" gs.wikiID,\n",
" gs.Symbol,\n",
" st.grp1_n,\n",
" st.grp2_n,\n",
" st.grp1_mean,\n",
" st.grp2_mean,\n",
" st.meandiff,\n",
" st.tstat\n",
")\n",
"\"\"\"\n",
"query6 = s1 + \",\" + sampleGroup + \",\" + groups + \",\" + summaries + \",\" + tStatsPerGene + \",\" + geneSetTable + \"SELECT * FROM geneSetTable\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "UaqjMIK7PPFE"
},
"outputs": [],
"source": [
"%%script false\n",
"# Run the query6 subquery \n",
"geneSetTable_df = client.query(query6).to_dataframe()\n",
"geneSetTable_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "oGxo1kTDP39E"
},
"source": [
"That’s it! For each gene in the pathways (gene sets) table, we have joined in the T-statistic comparing our two groups. Now for the gene set score! To get this, we’re going to simply average over the T’s within each pathway, and scale the result by the square root of the number of genes. When the number of genes gets large (reasonably so), the value approximates a Z-score. In this way, using R for example, we could get a p-value and perform multiple testing correction in order to control the false discovery rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "ePhAsbFyQD6w"
},
"outputs": [],
"source": [
"geneSetScores = \"\"\"geneSetScores AS (\n",
"SELECT\n",
" pathway,\n",
" wikiID,\n",
" COUNT(symbol) AS n_genes,\n",
" AVG(ABS(meandiff)) AS avgAbsDiff,\n",
" (SQRT(COUNT(symbol))/COUNT(symbol)) * SUM(tstat) AS score\n",
"FROM\n",
" geneSetTable\n",
"GROUP BY\n",
" pathway,\n",
" wikiID )\n",
"--\n",
"--\n",
"SELECT\n",
" *\n",
"FROM\n",
" geneSetScores\n",
"ORDER BY\n",
" score DESC \"\"\"\n",
"query7 = s1 + \",\" + sampleGroup + \",\" + groups + \",\" + summaries + \",\" + tStatsPerGene + \",\" + geneSetTable + \",\" + geneSetScores"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"id": "HfzRaVEYQXTM",
"outputId": "a1ed8260-db99-4da9-df68-59443702b299"
},
"outputs": [
{
"data": {
"text/html": [
"