{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Regulome Explorer Kruskal-Wallis test for numerical and categorical data\n",
"CCheck out more notebooks at our ['Regulome Explorer Repository'](https://github.com/isb-cgc/Community-Notebooks/tree/master/RegulomeExplorer)!\n",
"\n",
"In this notebook we describe how Regulome Explorer uses Kruskal-Wallis test to compute the significance of associations between a numerical feature (Gene expression, Somatic copy number, etc.) and a categorical feature. Details of the Kruskal-Wallist test can be found in the following link: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance \n",
"\n",
"To describe the implementation of the test using BigQuery, we will use Gene expresion data of a user defined gene and a user defined clinical feature. This data is read from a BigQuery table in the pancancer-atlas dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Authenticate with Google (IMPORTANT)\n",
"The first step is to authorize access to BigQuery and the Google Cloud. For more information see ['Quick Start Guide to ISB-CGC'](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowToGetStartedonISB-CGC.html) and alternative authentication methods can be found [here](https://googleapis.github.io/google-cloud-python/latest/core/auth.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import Python libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from google.cloud import bigquery\n",
"import numpy as np\n",
"import pandas as pd\n",
"from scipy import stats\n",
"from scipy.stats import mstats\n",
"import seaborn as sns\n",
"import re_module.bq_functions as regulome"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## User defined Parameters\n",
"The parameters for this experiment are the cancer type, the name of gene for which gene expression data will be obtained, and the clinical feature name. Categorical groups with number of samples smaller than 'MinSampleSize' will be ignored in the test. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"cancer_type = 'LGG'\n",
"gene_name = 'IGF2'\n",
"clinical_feature = 'icd_o_3_histology'\n",
"MinSampleSize = 26\n",
"\n",
"bqclient = bigquery.Client()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data from BigQuery tables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Gene expression data from the BigQuery:** The following query string retrieves the gene expression data of the user specified gene ('gene_name') from the 'Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered' table available in pancancer-atlas dataset."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"query_table1 = \"\"\"table1 AS (\n",
"SELECT symbol, data, ParticipantBarcode\n",
"FROM ( \n",
" SELECT \n",
" Symbol AS symbol, AVG( LOG10( normalized_count + 1 )) AS data, ParticipantBarcode\n",
" FROM `pancancer-atlas.Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered` \n",
" WHERE Study = '{0}' AND Symbol ='{1}' AND normalized_count IS NOT NULL\n",
" GROUP BY \n",
" ParticipantBarcode, symbol\n",
" )\n",
")\n",
"\"\"\".format(cancer_type, gene_name )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Clinical data from the BigQuery:** The following string query will retrieve clinical data fromthe 'pancancer-atlas.Filtered.clinical_PANCAN_patient_with_followup_filtered' table available in pancancer-atlas dataset. It is worth noting that some of the values of the clinical feature may be 'indetermined' or 'not-evaluated'; typically these values are inside square brackets. The 'REGEXP_CONTAINS' command is used to avoid using those values in the test."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"query_table2 = \"\"\"table2 AS (\n",
"SELECT\n",
" symbol,\n",
" avgdata AS data,\n",
" ParticipantBarcode\n",
"FROM (\n",
" SELECT\n",
" '{0}' AS symbol, \n",
" {0} AS avgdata,\n",
" bcr_patient_barcode AS ParticipantBarcode\n",
" FROM `pancancer-atlas.Filtered.clinical_PANCAN_patient_with_followup_filtered`\n",
" WHERE acronym = '{1}' AND {0} IS NOT NULL \n",
" AND NOT REGEXP_CONTAINS({0},r\"^(\\[.*\\]$)\") \n",
" )\n",
")\n",
"\"\"\".format(clinical_feature, cancer_type)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following query combines the two tables based on Participant barcodes. T"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"table_data = \"\"\"table_data AS (\n",
"SELECT \n",
" n1.data as data1,\n",
" n2.data as data2,\n",
" n1.ParticipantBarcode\n",
"FROM\n",
" table1 AS n1\n",
"INNER JOIN\n",
" table2 AS n2\n",
"ON\n",
" n1.ParticipantBarcode = n2.ParticipantBarcode\n",
") \n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point we can take a look at output table"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" in runQuery ... \n",
" the results for this query were previously cached \n"
]
},
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" data1 | \n",
" data2 | \n",
" ParticipantBarcode | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 2.359532 | \n",
" 9382/3 | \n",
" TCGA-E1-A7YW | \n",
"
\n",
" \n",
" 2 | \n",
" 2.868692 | \n",
" 9382/3 | \n",
" TCGA-FG-7637 | \n",
"
\n",
" \n",
" 3 | \n",
" 2.713119 | \n",
" 9382/3 | \n",
" TCGA-TQ-A7RG | \n",
"
\n",
" \n",
" 4 | \n",
" 3.064997 | \n",
" 9382/3 | \n",
" TCGA-DB-5280 | \n",
"
\n",
" \n",
" 5 | \n",
" 2.554518 | \n",
" 9382/3 | \n",
" TCGA-IK-8125 | \n",
"
\n",
" \n",
" 6 | \n",
" 2.799724 | \n",
" 9382/3 | \n",
" TCGA-DU-8163 | \n",
"
\n",
" \n",
" 7 | \n",
" 2.800062 | \n",
" 9382/3 | \n",
" TCGA-HT-8018 | \n",
"
\n",
" \n",
" 8 | \n",
" 2.558207 | \n",
" 9382/3 | \n",
" TCGA-DU-7019 | \n",
"
\n",
" \n",
" 9 | \n",
" 2.793514 | \n",
" 9382/3 | \n",
" TCGA-DU-5852 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" data1 data2 ParticipantBarcode\n",
"1 2.359532 9382/3 TCGA-E1-A7YW\n",
"2 2.868692 9382/3 TCGA-FG-7637\n",
"3 2.713119 9382/3 TCGA-TQ-A7RG\n",
"4 3.064997 9382/3 TCGA-DB-5280\n",
"5 2.554518 9382/3 TCGA-IK-8125\n",
"6 2.799724 9382/3 TCGA-DU-8163\n",
"7 2.800062 9382/3 TCGA-HT-8018\n",
"8 2.558207 9382/3 TCGA-DU-7019\n",
"9 2.793514 9382/3 TCGA-DU-5852"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql_data = 'WITH\\n' +query_table1+','+query_table2+','+table_data \n",
"\n",
"sql = (sql_data + '\\n' +\n",
"\"\"\"SELECT * FROM table_data \n",
" ORDER BY data2\n",
"\"\"\")\n",
"\n",
"\n",
"df_data = regulome.runQuery ( bqclient, sql, [] , dryRun=False )\n",
"df_data[1:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use a 'violinplot' to visualize the populations in each category. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEQCAYAAACk818iAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd0W9ed6PvvRiMBEGDvlKhuFapYVrFKbDXLkrsT2yku\nsR1PeTNjZzLzptxZs17sO3PffTPrvUmdOMUtdmLHsZ3EvduUJVnF6qK6RFGkKLF3EkTd7w8AFCmz\nijgHh8D+rMVFEDg4Z/Pw8Id9dvltIaVEURRFSRymeBdAURRFiS0V2BVFURKMCuyKoigJRgV2RVGU\nBKMCu6IoSoJRgV1RFCXBaB7YhRDpQohXhBDHhBBHhBDLtT6moihKMrPocIwfAe9IKe8WQlgAhw7H\nVBRFSVpCywlKQgg3sF9KOV2zgyiKoigDaN0UMxVoEkI8K4TYJ4T4pRDCrvExFUVRkprWgd0CLAb+\nW0q5GOgB/lnjYyqKoiQ1rdvYzwM1Uso9kZ9fBf7p8o2EECphjaIoyhhJKcVgz2taY5dS1gM1QohZ\nkafWA0eH2DauX9///vfjXgajfKlzoc6FOhfGPxfD0WNUzGPAb4UQVqASeEiHYyqKoiQtzQO7lPIg\nsFTr4yiKoihhauZpxJo1a+JdBMNQ5+ISdS4uUefiEqOfC03HsY+6EEJII5RDURRlohBCIOPReaoo\niqLoTwV2RVGUBKMCu6IoSoJRgV1RFCXBqMCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVESjArsiqIo\nCUYFdkVRlASjAruiKEqCUYFdURQlwajAriiKMgZdXV289NJL8S7GsFRgVxRFGYPW1lYqKiriXYxh\nqcCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVHGIBQKDfhuRCqwK4qijEE0oAeDwTiXZGgqsCuKooyB\nqrEriqIkmEAgAKgau6IoSsKIBnQV2BVFURJENKBHa+5GpAK7oijKGKgau6IoSoKJ1tRVjV1RFCVB\nqMCuKIqSYFRgVxRFSTB+vx9QgV1RFCVhRAN6NMAbkQrsiqIoY+D3+wBVY1eUCcvItTIlPgK+cGA3\n8rWhAruiDKG3t5fHH3/c0DUzRX8+vw+TSajArigTkS9SMzNysidFf36/H3uKRQV2RZmIJkJ6VkV/\nfp8Ph8EDu0XrAwghqoB2IAT4pZTLtD6mosTCREjPqujP7/fjSLX23dEZkeaBnXBAXyOlbNXhWIoS\nMxNhIoqiP5/fR1qKBb+vN95FGZIeTTFCp+MoSkxNhGRPiv78PuPX2PUIuBL4UAjxhRDiz3Q4nqLE\nxESYYajoz+f340ixGDqw69EUs0pKeVEIkUs4wB+TUm67fKPHH3+87/GaNWtYs2aNDkVTlKFNhBmG\niv6ibexNbfoG9vLycsrLy0e1reaBXUp5MfK9UQjxR2AZMGxgVxQj8E2AiSiK/nz+AE67FV+DV9fj\nXl7hfeKJJ4bcVtOmGCGEQwiRFnnsBDYCFVoeU1FiJRrQjXzLrehLSonfH8CRYsXnN+51oXWNPR/4\noxBCRo71WynlBxofU1Fiwuv1DviuKIFAAJNJkGIz4/cZ905O08AupTwLLNLyGIqiFRXYlcv5fD5s\nVgtWixmfgZvo1DBERRlCb294nLIK7EqUz+fDajFjs5rwGbjGrgK7ogzB0+vBYrXg8XjiXRTFIPx+\nP1arWdXYFWWi8ng8OFwOPL0qsCthXq8Xm8WMzWLC7w8gpYx3kQalAruiDMHj6cHpdtDT0xPvoigG\n4ff7sVrMCCGwmM2GHQqrAruiDKHH4yEt3aGaYpQ+4Tb2cNi0Ws2GHQqrAruiDKHX04vT7VSBXenT\nP7DbrMZN3asCu6IMobe3F6fbQa9qY1ci/H4/VnOkxm5RTTGKMuF4e8M19t5eNdxRCfP7/VjMAgCL\n2aSaYhRlIgmFQuGp42l2vCqwKxHhGnsksFtMhs38qQK7ogzC5/NhtVqx2CwEAgG1ipIChFMKRGvs\nVrNJNcUoykTS29uLxWYJD2uzGjv3tqIfv9+PORLYzWZVY1eUCcXv92OxhlMpqcCuRAWDAcymSGA3\nCcOurqUCu6IMwu/3Y7GYAbBYLIatmSn6CgYCmE3hsGkSxl02UQV2RRlEIBDAZA4HdiPfciv6CoVC\nmMSlGrtR+15UYFeUQYRCIUyRW25h4H9gRV9SSiJxHSGEyhWjKBNJ+B84EtiFCuxKmJShfoEdw14X\nKrAriqKMibj0SIhhtosfFdgVZRAmk6nvNltKicmk/lUUANHvuohzUYahrlZFGYTJZCIUDN9mh0Iq\nsCthJiGIxnMjf+Abs1SKEmcWi4VgJLAHA0EsFq3XfVcmApPZRCgUDu0hiQrsijKRWK1WgoHwGOVg\nIIjVao1ziRQjMJktlwK7ge/kjFkqRYkzm81GwB8eu+73+7HZbHEukWIEZrOFYGQkTDAkMUfmOhiN\nCuyKMgibzYbf70dKqWrsSh+LxUIgGK6xB0Mhw14XKrAryiCiNXa/L5wzxqi33Iq+rFYrwUhTTCAo\nDdv3oq5WRRmEyWTCYrXQ292rmmGUPlarFX+kxu4PhFRgV5SJJiUlhZ5ODykpKfEuimIQVqu1rykm\nEDRuE50K7IoyBFuKDU93LympKrArYVarFX8g3HnqD4QMezenAruiDCHFloKn20OKQf95Ff3ZbDb8\nkfkNfn9ABXZFmWhsKTZ6e7zYbKrGroTZbDb8gXBTjM/Ao6VUYFeUIVitVrwer2FrZYr+wk0xQaSU\nBFRgV5SJx2qx4vNeWiJPUcI19iCBYAiz2WTYYbDGLFUc9Pb2xrsIisFYLGaC/iAWswrsSlg4sIfC\nHacGra2DCuxAOKj/27/9G42NjfEuimIgQpgIBoN9KykpitVqxe8P4DN4YjgV2IGenh4Auru741wS\nRVGMLNrGHlA1duOLBvbod0WB8DJoJtOlNK2KEg3s/kAIq4H7XnQJ7EIIkxBinxDiDT2ON1bRmnpX\nV1ecS6IYSTAYwmI1EwwG410UxSDMZjNCCLx+446IAf1q7N8Fjup0rDHr7Owc8F1RIJKuNzUFv98f\n76IoBmK1mOn1BZK7jV0IUQLcBDyl9bGuVHt7O3arlfbW1ngXRTEQr9dLqiMFr88b76IoBmI2m/H6\ngliSvMb+A+AfAMM2VLa3tFCc7qK9rS3eRVEMpNfbi8Nlp9ejhsIql1gslkiN3ZiLbIDGgV0IcTNQ\nL6U8AIjIl+G0t7dT5HbT0dER76IoBuLp8ZCWnobH44l3URQDMZtN+AycshdA65KtAm4TQtwE2AGX\nEOJ5KeUDl2/4+OOP9z1es2YNa9as0bhol3R1dZFXlM+eC3W6HVMxPo/HgzvTpUZLKQOYTSb8gSAm\nnWvs5eXllJeXj2pbTQO7lPJfgH8BEEJcD/z9YEEdBgZ2vXk8HrLsdjxeL1JKhDDkjYVuGhoayM3N\nTerz4PWG29Wdbge9nl5CoZBhp48r+jKZTQSDErNN38B+eYX3iSeeGHJbdaUCvoCfFIsZkxAEAoF4\nFyeuurq6+NGPfkRbkvc3dHZ24nDaMZlN2FJtaiis0sckTARDIYSBZyTrFtillFuklLfpdbyxkBKE\nEAghkNKwfby6iObM8fl8cS5JfHV0dGB32gFwOB2q/0UZIBwmVGA3NLMp/AkcDIUwm43b062HaECP\nNkUkq/b2duxpqQA40lJVYFf6hCuAYOSWShXYgZQUG10+H2aTKekDe7TGnuyBva2tjVRneIGN1LTU\npG+aUi6RUmIyRWvtxqQCO+C0O2jq6sFpt8e7KHEXDezJPsSvta0Vuyt8PdhVYCcUCnHixIl4F8MQ\nQn05hIybakIFdsDldlHX2YUrLS3eRYm76NA+FdhbcaQ5AHC4HLS2Jfes5KamJp5//nmVXoFwDiGr\n2WToHEIqsAPpGZlc6OggPSMj3kWJO5XCOKy9rQ1HpMbucNmTPrBHm+aSvVMdIBgMkmIzdnI4FdiB\n9MxMLnZ0qsAOdHV1Y0u105nEw/uklHS0d+J0OXjlp3/C4XLQ0Z7cnafRwJ7sfS8AgUCAVJuZgIHv\nXlRgB9xud/i7Cux0dnXicGXS2Zm8gd3j8SBMAmtKOMmT3ZGKx+NJ6jkO0aa5ZG+iA/D5AzhTbYa+\ne1GBHXC5XAO+J7Ourm6c7oykboppbW0lze3s+1mYBM40B+3t7XEsVXypxWjCpJT4/QGcdquh715U\nYAeczvA/scPhiHNJ4q+7uxuHOzOpA3tbWxsO18BrweFyJPXIGBXYw7xeL1armVSbhV4V2I0tNTU8\nEcWuhjvS09ODw5VObxLfcre1tfVNTopyuOy0JnG+/p5uFdgh/PvbU2zYU8x4DJzOedjALoRwCyH+\ntxDiBSHEty577WfaFk0/KSkpA74ns95eD/Y0N729nqRNr9Da2vqlwJ7sY9k9Hg/2FHvSt7F3d3fj\nSLVhT7HS6/USCoXiXaRBjVRjf5ZwQoTXgG8IIV4TQkSj37WalkxH0bULjZxfWQ/BYJCA348t1YGE\npB2z3NLajMP95aaY5pbmOJUo/rxeL3a7A6/XuB2Geujq6sKRasFkEqTYrIa9gxkpsE+XUv6zlPJP\nkQRe+4BPhBDZOpRNN9F0rMmeTsDr9WK12RBCYLOl9M1CTTaXd55COH1vS0tLnEoUf4FAEJvFRjCY\nvCODIJz102UPVwBdDuPmEBopsKcIIfq2kVL+L+BXwGdAwgT3ZM473p/H48FqCzdBWG0pSXnbLaWk\nra0d52U1dqfbmdRt7BBep0CGkrN5Lqq9vY201HAF0OW0TdjA/iawrv8TUsrngL8HEu6eLFnblKPC\ngd0GgDUlOQN7T09P+I4l1TbgeUdaeO1TI49d1pLJZCIQDGAyJ/d4i9aWFtLTwq3RbofVsP0uw/6V\npJT/KKX8aJDn35NSztSuWPGR7DX3cGAPX7QWa4ph2w+11NTUhCvjy/MZhEngSk9L2lq7xWLB5/f1\n9Uclq9aW5r7AnuG00tLcFOcSDW6kUTHP9Xv8bc1Lo8RVd3c3lkhTjMWWvIE9LcM56GtpGWk0Njbq\nXCJjsNls9Ho92Gy2kTdOYM0trWS5w8OiM92pNDU2xLlEgxvpvmphv8ff1bIgSvx1dnb2C+ypdHZ2\nxrlE+mtobCAtY/CJas4MBw0NxvxH1prNZsPr8yZ1YPd4PPj8ftLs4buWbHcqjU0TsMYOJHejc5Jp\nbWvDmhoOaja7k9ZWY7Yfaqm+vh5X5uCpJdyZLuob6nQukTFEA3oyN8U0NDSQm5nW12Sb6Uqlo6PL\nkMOCRxq4XSKE+DHhsezRx32klI9pVjJFd83NLaRmTwIg1ZFG88UzcS6R/hoa6pm+tHTQ19Kz3VQd\nPqJziYwh2mmazEOC6+rqyE2/NHHNbDaRme6koaGB4uLiOJbsy0YK7P/Q7/EeLQuixF9zczPTSucB\nYE9LpzrJJuR4vV66u3twpg/eFOPKTKO1pZVgMJi0AS6ZR45dqD1PXubAGcn5mXYuXrw4sQK7lPLX\nehUknqIXazJftIFAgI72dhxp6QDYnS66u7rx+XxJ065aV1dHelZ634S1y1msFpxuJ01NTeTn5+tc\nuvjy+8LNDUZsdtDLhQvnmbMwd8Bz+ZmpXDhfA0uWxKlUgxtpVMxqIcQD/X5+VQjxSeRr3XDvnUii\nK6EYeUUUrTU2NuJwuTBFaqLCZCItPYP6+vo4l0w/Fy9eJD17+NTNGdluLl68qFOJjMPT68FmTd7Z\nyH6/n4bGZvKzBt7NFeWkUVNTHadSDW2kztMnGNgEcxXh5pnHgX/UqEy6U6vDQG1tLWnpAycTp6Vn\nc+HChTiVSH+1F2pJz3EPu407x8X52vM6lcg4ujq7yXRn0pWkC7BcuHCBnIw0rJaBTXD5WU4am5oN\nN3FtpMDullIe7ffzKSnlXinlZ0DCrEoRzT2ezDnIq6trcFwW2B0Z2ZyrNl5tRCu1tbVk5g2/ilZG\nbga1SRjY29vbyM3KM+xMS61VVVVRkvvlvherxURelovz5411TYwU2Adc5VLKr/b7MWEaGaOzCZP1\nogWorqnBnZU34DlXVi7V1TVxKpG+AoEAzU3NpGcPX2PPzE2nrq7esOlatdLe3k5hbmHSzrytPHOK\nSXlpg742Kc9BZaWxRpCNFNiPCyFuvvxJIcQtwAltiqS/aDtyfRK2nUK4Caq1pZm0jMuaYtxZdHV2\nJkXOmLq6OtyZLizW4QeKpdhTSEmx0dycPCOGenp6CIVC5GXn09rWmnSDDAKBANU155mcP/iHfmm+\nmzOnT+pcquGNFNi/B/yXEOJZIcSjka/ngP+KvJYQzp4+zTUlRZytrIx3UeLi/PnzuLNy+zpOo4TJ\nRHp2LtVJ0BxTU1MzYjNMVFZ+puFuvbXU3NxMujuDFFsKyORbRam6upqcdAeO1EuTs/738zv7Hk/K\nd1FX12CojuWRkoCdBhYAW4Epka/PgAVSSmN9RF2h3t5eztXUsGLKZDo6OpIy5/bZs1WkZVwaxrXl\ntaf7HjszczlbVRWHUumr5nw1Gbnpo9o2Pc9NtQFHQmilubkZd5obIQQZ7gyaDDqNXisnThxnauHQ\nXYpWi5mS/HROnz6tY6mGN2IOTimlV0r5jJTy7yNfz0gpjfPRNE579+5lSlYmTpuNufl57NqxI95F\n0t2Zykpc2YN3mbizC6isPKtzifRXXV1DdkHmqLbNLsiiuvqcxiUyjsaGRtyOcDOEOy09qQK7lJJj\nRyqYWTL83dyMIhdHjxzWqVQjG2kce6cQomOQr04hhDEzzI+B1+tl65YtLC8pAmBJcSF79+41bPJ8\nLQQCAS5eqCU9Z/DAnp6dR319neGGc8VSd3c33V3duLOG7ziNyszNoMmAQ9y00tDQQLorHNhcDndS\nZbhsaGjA7/dRkD14xs+oWZOzOHHyJIGAMVaYGqkpxiWldA/y5ZJSju6/wMA+/uhDJqe7KXSHb7PS\n7aksKMzn7TffjHPJ9FNdXY3TndmXh/1yZosVd2YO584lbg313LlzZBdmIUyjy8dvtpjJzMlImnb2\npqYm0l3hZqp0VzqNDckT2A8ePMDsyZkjrtXgctjIzXBy6tQpnUo2vKRdDuXMmTMc3L+fddOmDHh+\nVekkLtRUs3///vgUTGfHT5wgPbdo2G3cOYUcO35cpxLp7+zZs2QXjq4ZJiqrMIvKs4nf2R4KhWhp\nbemrsWe4M5MmdXEoFOLggQPMnZo1qu3nlqZzYP8+jUs1OkkZ2FtbW/n9737HzVfNxGEbmIbUajZz\n++xZvPPWmwk/dVxKyZEjR8gumjzsdtlFpRw9cjRhx26fqTxDblHOmN6TW5zNmTPG6SzTSnNzMw67\nE6vFytOv/pJ0VzodnR1J0QxVVVWFzSwpyBq+GSZqzpQcTp0+ZYhRQ0kX2D0eD88/9xzLiguZkjV4\nLS3PlcaG6dN4/rnnaG9v17mE+jl//jyBYIi0jOGDmtOdiTBbErI5pru7m9aWVrLyx1Zjzy3M5uLF\nxO57gPBs3JzMS9eH2WQmMz0z4Ss9ALt37WT+tKxRL5lpT7EwozjTEHf7SRXYfT4fzz/3HJMcqSwp\nGb75YU5+LosL8nj26acN8QmshR07d5I7eeaIF64QgtzJM/l8x85ht5uITp8+TV5J7pgXabbYLGTl\nZVKZ4HMfzpypJDdz4IzkvKz8hP+9Ozo6OHXqFAtm5I68cT+LZ+Wyc8f2uN/dahrYhRApQohdQoj9\nQojDQojva3m84fj9fn7zwvO4QgHWT5/6pWD2H5989qX3LJ9cwjSXk2eeeirhZl+2trZy/NhxCqdc\nNartC6bM4syZ0wk34/LY8WPkTR5bM0xU3uRcjp84FuMSGUcoFOLkiRNMKpg04PmS/EkcP5a4fS4A\nu3buYM6ULFJtIy1ZMVBxbho2s+TkyfhO89E0sEspvcBaKeXVwCJgsxBimZbHHIzf7+e3L7yAtaeH\nm2aNXEPt7/qppRSlWHn26acTKri//8EHFEybjTUldeSNAYvVRtG0ebz73nsal0w/wWCQU6dOUTSl\n4IreXzS1gGPHjse9dqaVM2fOkJpi7+s4jSrMK6KlpSVhx7N7vV52797Nsjljvy6EECyfk8eW8k80\nKNnoad4UI6WMtmOkEF7YQ9dEEz6fjxd+/Rzm7i5unT0L0yiHtEUJIVg/fSoFVjPP/OpXCdEsU1lZ\nyenTZ5g0a8GY3lc8s4xz1efjXhuJlcrKSlzpadjT7Ff0/nBuGXNCDnuUUlL+aTlzps390mtmk5lZ\nU2azpXxLHEqmvV07d1Ba4CLLPbpKz+Vml2bT2dEW1+YqzQO7EMIkhNgP1AEfSim/0PqYUR6Ph2ee\negq713tFQT0qGtwn21P45ZNPTugOVY/HwyuvvMr0RSsxW8a2MLHZYmHGopW89oc/JsQH3IGDByia\ncWW19ajiGYUcPHQwRiUyjoqKCtrb2pkxeeagr5fNnM/x48cT7kPN6/Wybds2Vs8vvOJ9mEyCVWUF\nfPTh+3FLmKZHjT0UaYopAZYLIb5cBdBAW1sbv3zySfLNJm66asYVB/UoIQTXT5vCvKwMfvHkkxNy\nZaFQKMTLv38Fd14J2YXDD3EcSmZ+MVmFpbz0u5cndBOE3+/n2LFjTJoxvrUqJ80s4dChQwm1+lZb\nWxtvvvEmq67+ypDLBKbYUli24Fpe/t3Lhkp+NV7btn7GlEIXORmDr3s7WvOm5tDV0Rq3CUtj6xkY\nByllhxDiU2ATcPTy1x9//PG+x2vWrGHNmjVXfKyamhp++8ILLC0qYOmk2C4yu3xyCS6bjad++Uvu\nuucerrpqdJ2PRvDBBx/S1NpO2apN49pP6bwlHP38fd55911uuflLWZ0nhKNHj5KVl3nFzTBRrsw0\nnG4Hp06dYvbs2TEqXfz09PTw7LPPUjZzAfk5w9/NTJ80g8aWBl54/gUefOhBrNax3QEaTVdXFzt2\n7ODBzXPGvS+TSXD9omLee/dtZsyYMeQH5FiUl5dTXl4+qm01DexCiBzAL6VsF0LYgRuA/2ewbfsH\n9vHYu3cv773zDptnTWdGTvbIb7gCcwvycKem8Iff/55V113HV667bkwdsvGwc9cu9h04yILrb/5S\net6xMplMXLVsHYe2vEVmRgarVq2KUSn1s/uL3UyeXTLsNq/89E993+/+mzuG3G7y7BJ2f7F7wgf2\n9vZ2nn3mWYpzSiibOX9U71m24Fq27tnCc88+x/0P3E9q6pW1SxvBhx+8T9m0bDJcsfkdZk3KZPex\nBvbt28eSGCx2fXmF94knnhhyW62bYgqBT4UQB4BdwPtSyne0OJDf7+dPf/gD5R98wDcXztMsqEeV\nZKRz/+IFHNq9ixd/8xtD344eOHCAjz76mHkrN2JLGV8NNcpqS2Heyo2Ub/mMPXv3xmSfemlqaqKu\nvo7i6Vfejtrf5JklVFVVTei+l+rqap782ZNMKZzKkrLRD1wzCRNfWXI9DquTJ3/25IQdKVNXV8ex\no0dYNX/4+S1jIYRg/TUlfPjB+7qvp6z1cMfDUsrFUspFUsoFUsr/pcVxmpub+cWTT9JxvoYHFi8g\nxzm6KcDj5U5N5VsL55PS1cl//+Qnhlz4+fDhw7z51tvMW7kRe1ps87alOl3MW3Uj7773PgcOHIjp\nvrW0Y+cOpsyejHmcdy5RFpuFSTOL2f3F7pjsT0+hUIjy8i288PwLXLtgJQuuWjTmfZiEiRWLVnFV\n6Rx+/uTP2bt374RaZUlKyZtvvM6q+YXYU2LbiFGUk8bUAheffqrv8McJP/O0oqKCn//sZ8xNT+P2\nuVeRYtGt2wAAi9nExlnTWVVUwLNPPcXuXbsMc1EfPnyYP73+BvNWbsSZPrpERmPlcGVQtupG3nr7\nnQkR3L1eLwcOHGBaWWlM9zt9/hS+2L3bMGlbR6OhoYFf/PwXHDl0hFvW3M7kovGdk9nT5rBp9U1s\n+XQLzz///IS5g6moqKCns5WrZ2mzjPOaq4vZ88UXuk7um7CBPRgM8s5bb/HO66/ztbLZLCkpjms7\n99yCPL61aD7byz/l1d//Hr/fH7eyABw6dKgvqF++lmmsOd2ZzFt1I2++/Q779hkju91Q9u3bR25R\nDk53bO/q3Flu3NluDh06FNP9asHn8/HB+x/wy1/8kkm5pWxafRMu59ArBI1FVkY2t669A6cljZ/8\n+Cds377d0COG/H4/773zNhuWlIx75NxQ0hw2rp2bzztv65cOfEIGdo/Hw6+ffYYLp07y7WsWUuQ2\nRmr4bKeD+69egK+xIa7j3ffv388bb77FvFU3ah7Uo5zuTMpWbeKdd99jz549uhxzrEKhENs/3870\nBVM02f/0BVPYtn2bYe7YLielpKKigh/+4IfUVNVy+/qvMnfGvJhXiMwmM4vnLuGm627l4L5D/PQn\nP+XsWWOuwrV162fkZ6VSWjC6ZRGv1JI5BdRfvKDb8McJF9g7Ojr41c9/TnogwF1lc7EbbIiVzWzm\nltkzmZHm4BdP6t+ZtHfvXt5+591wUNeo+WUoTncGZas3894HH7J7t/Ham0+ePInJIsgp0ubDrqA0\nn16vhyoDrhFbV1fH0089zfvvfsDKRatZu2wdTru2fVEZ7gxuXLWZsukLePl3L/Pb375Ia2urpscc\ni87OTj7fvp21V49tSHR0Iev+C1qPxGI2sXZxMe++/ZYu8z8mVGDv6enh2aeeYqY7jQ0zpsbs1ima\nAGywRGBXQgjBitJJrCgu4Olf/Yq2traY7Hcke/fu5d333qds1Sac7rGloY2KLmTdf0HrsXC40pm/\nejMffvQxuwwW3Ldu+4zpC7+cAC5WhBBMXzCVrdu2arL/K+HxeHjj9Td46ldPUZBRyO3r7qQoL7Zz\nO4YjhGBqyTS+uuFuUkjlpz/5KR9//HHcmyoBPvnoQ+ZPzyYzRsMbRzJrUiY2U0CX5soJE9hDoRAv\nv/QSk512Vk2ZbPhx4wALiwq5piCPF379a8071fbv398X1B3u4Rfe1Zo9zU1ZJLh/YZBmmfr6ehoa\nGsc903QkpbMnce5cFS0tLZoeZyRSSvbv388P/usHtDd38tUb7mbujLKYTJS5EhaLhcVzr+G2dXdy\n9lQVP/xFA6R9AAAgAElEQVTBD+O6jFxrayuHKw6zYl5shryOhhCC6xcV8eknH2keDyZMYN+/fz/d\nLc2suWwpO6NbOqmYNCRbP4vN3cBgDh8+3Nf8Eu+gHmVPc1O2ahPvv/+BIUbLbP98O1PnlY457/pY\nWawWSmdPZsfOHZoeZzjt7e08++yzfPpxOeuvvYFVV68mdZRZPLXmcrpYt3wDy+ev4A+v/YGXX345\nLllTyz/9hKtn5uFI1bcpd1K+m8w0q+aLcUyIwC6lpPyTT1g7tRRznGocV0oIwdppU9i+fZsmt5+n\nTp3ij396nbkrbrji5hetOFzpzFu5kTffepvjcVwztbe3l4qKCqbNi+0Qx6FMK5vC/n374tLccPz4\ncX7605+SnprJbWvvIDcrb+Q3xUFJwSTuWP81Aj1BfvzjH1NTU6Pbsbu7u6moqGDpHG2GN47k2rn5\nbN/2maad7BMiSjY0NBDy+ynJ0LbnWitZDjvZTmfMRwZcuHCB3738MrOXr9Nt9MtYOdOzmLN8Pa+8\n8mrcMgEePHiQ/JJcUp361FrT0p1k5GZQUVGhy/Gitm/fzh9e+wPrlm3g6jmLY9rs8vSrvxzwPRas\nFisrFq1i6bxr+fVzv9btfO3Z8wVXTc7UvbYeVVrghpBf05FCEyKw19XVUeB2TYh29aEUOB0xzQjZ\n1dXF88+/wNQF15IxQrKmeHNn5zF90Uqef+E3cRkCumfvnhHzwsTa5Nkl7N2nX6qFnTt2sn3rdm5Z\nc/uIybuMZkrxFG5cvZnXX3+DY8e0X5Hq4P59lE2LX0VICEHZlEwO7NeuE3VCBHav10tqjKZ/x0uq\n2RyzfDKhUIgXX/odWSXTyCuZFpN9ai2neAp5pbN48aWXdJ2w0tLSQktrC/mT9W2SKJ5WSG1tLV1d\nXZof6/z583z88cdsXLWZNEea5sfTQnZGDuuX38AfXvuDph/+TU1NdHd3MykvNhOyrtScKdkcPXZU\ns6GPEyKwW61W/BqO/bRYLAghsGiYjsAfCsUsren27Z/T2dNL6ZyrY7I/vUy6aiEef4jPtuo3HPDI\nkSMUTS3UfTSI2WKmsDSfo0e/lKE65t55+x2umbsEd4xzAektLzuPWVNm8+EHH2p2jKqqKkoL0+N+\n95+elkKq1UJjY6Mm+58QgT09PZ12DbOjBYNBHn74YU1rkh0+PxkZ4x+x0tnZyaflnzLj6tUIMSH+\nfH2EEMxYtIqtn23VrUnm+MnjFJReWW19vB/4+aV5nDh54oreO1rNzc00NjYyo3SWpsfRy7wZZRw5\negSfz6fJ/s/XVFOUPb4Mp7GqCBbnpmnWaTwhIkN+fj6NHZ2ENOpFNpvNPPPMMzHL9jeYhq5u8vPH\n3wu/dds2ckumxzxTo15SnS7yS2eyRcPhn1HBYJDa87XkFl9Ze+p4P/DzinOoqqrSdPRDZWUlxfkl\ncRufHmupKalkZWRrFvBaW5rJSEsZ1z5iVRFMd1poa9NmJu6EuBqcTieutDTqO7VprwwEAkgpNZs0\n0OX10u3zjTuwSyk5cOAgBVO1W9BBj2apgqmzOXjwkOZTqxsbG3GkObCl2K7o/eP9wLen2TGZTZpO\no2+obyDDZYy5C7GS4cqkoaFBk31393TjHOdomFhVBJ2pVro6O8e1j6FMiMAOMHvuXE426Zf2MpZO\nNjYzKwbLY3V2dhIIBHBqOAlJj2Ype5obk8mkeaqFhoYG0rOu/M4mFh/4GdnpmgUpCK9PmuaIb0dg\nrKXZnZpdG4Lxt63HsiJo0qitf8IE9muWLKGiroFAcGItoCyl5EBdA9csG/2qNEPxer1YbeO7jRyJ\nHs1SABZbimbtqFEdHR2kjvO2e7zszlQ6NaqVAXR0dmBPjc2qWEZhT3XQ2aHNObPZbHj9xkgj7PUH\nsaVoc31OmMCel5dHyaRJ7Ks13ipFwznW0IjVbmf69Onj3pfL5aLX000oqF2eCa2bpSA8XNPT3YnL\npW1N0+PxYLHpu/DK5SwpFk2XTez19JJi1f7DS48muqgUW4pmaQaysrJp7TTGMpZtXX6ysnM02feE\nCewAm2+5hZ01tbTFIbfElejx+SmvPMfNt90Wk+FVqampFBQU0nShOgali5+Wi9Xk5OTi1HgJQykl\nRpjTpmXnafh31P6X1KOJLkogNOt/yS8sor7VGPGjvrUnJgMqBjOhAntOTg5r163j9WMn8Rt4VRaA\nkJS8c+I0CxYtYsqUKTHb74b166g5vo9gIP5pT69EMBCg+tg+1q9bq/mxUlNTCfjie50E/AFSNLrd\nBkhzuej2dGu2/yi9mugAuj3dmt3NTZs2jXP12k8aG4nHG6Clw0NJiTYzoidUYAdYuWoVucUlvHvi\ntGFXqgEoP1NF0G5n46ZNMd3vzJkzmTFtGqf2GXelnqFIKTlz8HMmTyph9mztRvZEuVwuerv1XR3+\ncr3dXk2bnKZNm0r1xXOa7T9Kjya6qJq6aqbPGH/T5WAKCgoIBCWNrT2a7H+0Tla3MH3aVM0+KCdc\nYBdC8LW776bbYuXjM2cNGdx2VZ+nqruHe++/X5M2ydtvvw0rAc4c3GHI338wUkrOHt6F8Pfwta/e\nqUvzQV5eHh0tHZofZzjtzR2a3W4DLF++nKraszS36btSl1aqL1bT0d3O/PnzNdm/yWRi4cJFVJy9\n8hF2/+OBawd8vxIVVa1cvXjJFb9/JBMusEM4xcADDz5IrcfL9qrxtzf/07rrBnwfjwMX6jjQ0MRD\n3/kODodj3PsbjM1m46EHv43wdXNyzxZNO1NjIRQMcmrfVoI9bTz04IOaNk30l5+fT1dHNz6vtqNv\nhuLp8hDwB8jM1C6dssvl4rbbb+OjHR/S3qnPSl1aqW+qY9u+LXz9G1/XtJN2ydJlHDrdhC9Oo2Pq\nW7pp6fBy1VVXaXaMCRnYAex2Ow898gjHW9vZc7423sUB4HhDI5/X1PLwI4+Qnq5tiuHU1FQe+c7D\nZLlSObz1Xbw9sWlnvf5r3xnwfby8nh4qtr+HO8XMnz3yiGYfdoOxWCxMmjSJhhpt8nGMpK66galT\np2o+K3TBggXcsHED73z2Fufr9MtrHitSSk5WneDjnR9yzz33UFqqbd78nJwcJpdO5vCZ+FwXu481\nsGLFSk0/vCZsYAdIS0vjoUceYXdtHccb4vNHiqpubePD02d54MEHyc7WJyWo1WrlW9/8JksWL+RA\n+Ru01Mcn3/lQWhsucLD8DRaVzeW+++7FZruyGaDjMb9sPrVn6nQ/LsCFMxeZX6ZNk8LllixZwr33\n3cvnB7ex48B2fP743KWMlae3h093f8yxs0f487/4c2bN0ifnzdp1G9hxpB5/QN95Mc3tHiovtLP8\n2itvxhmNCR3YATIzM3ngwQf58PRZ6jSa1DCSNo+HN46d5J5vfIOioiJdjy2EYO2aNdz7rW9SeeBz\nzlZ8ocsq6MORoRBVR/dxet9Wvn7P3WzYsD5uuUzKysqoq67H69E30Hm6PDTXtzJnzhzdjjllyhS+\n+93vkpJm448fvcrZ85WG7YMJyRDHzhzljx+9RnFpEX/z6N+Ql6dfauWSkhKKSyax70Ts1kgYjc8O\nXmDVqtXY7dpOKpvwgR2gqKiI2++8kz8dPUGvzsuRBYIh/nT0BGvWrWPmzJm6Hru/adOm8d3HHsUa\n9HB469t4uuPzIdfb3cnhbe8iett57NG/ies5AXA4HMyZM4ezR8c+cuTuv7ljwPexOFNRxaJFi3S/\nS7Hb7dx1911845vf4PCZg7y//V1a28e/sPZ37vrzAd/Ho76pjjc/fZ3zTdU88mePsHnz5piltB6L\nGzdtZueROnp69YkZ5xs6udDcw8pVqzQ/VkIEdgjXzObOn8+7J8/oWkspP1tFdmERK1au1O2YQ3E6\nnTz04LdZsfQaDpW/SeN57ZbeGkxTbRUHt7zF0sUL+c7DD2k+s3S0Vq9azZnDZwkG9Oks8/v8nD1y\njpUr4ndNTJ06lUcffZRF1yzk3W1vs+vQzrg3z3h6e/hsTzlb9nzK2vVr+PO/+HMKCuK32lNeXh4L\nFi5g6yHtZ7NLKflo73lu2LhJlw/7hAnsADdu3kxbMERFnT63V2dbWjnV0sadX/ta3BP3RwkhWL16\nNQ8//BC1x/dx5sAOQhpP5gqFglQe2kXN0T08+O0HuP666wyVRraoqIji4mIqj1TpcrzTh84yc+ZM\ncnK0mS4+WmazmZUrV/K3f/u3WB1m/vjRa5y7UKV7OaSUnDh7nD9+9Br5xXl87+++x6JFiwzxP7N+\nw0ZOVLdR36LtJK9DpxuxpDhZtGiRpseJMs5/XwxYrVbu+cY3KK88R7tH23wQHr+f906e4at33aXr\nSI/RKi4u5rHHHsVpDXF42zt4NZqd6PX0ULHtPVLw8dhjjzJp0iRNjjNeGzds5MTe0/h92t52ez0+\nTh+sZMP6DZoeZyzS0tK46+67+Po37mHv0S/YuncLfp1mLnt6PXy0431O157kO498h803bdZtuOto\nOBwONtywkQ++qNHsTt/jDbDl4AVuve0O3So8CRXYAQoLC/nKddfx1olThELa/KGklLx/qpJ58+fH\nvQ15OKmpqdx/330sWbSAg+Vv0tka25FDXW3NHNryJovK5vDtB+7XvENoPIqKipgxYwYn95/R9DjH\n956krKws7rX1wUybNo1HH3sUR7qDt8pfp6NL28lbjS0NvPHpH5kyfQp/9Vd/RWFhoabHu1JLlixB\nmlM5XKnNJK8tB2opKyujuLhYk/0PJuECO8Dq664jJT2Dz6q0mWq9v/Yi7SHJjZs3a7L/WBJCsHbt\nWu6843aO7viQ5hiNc26pr+XI5+9z2623sGHDBkM1vQxl042bOHP4LD2d2kwn72ztpPrEeW7YcIMm\n+4+FlJQU7rrra6xcvZJ3PnuTlhh0rA7mQn0tH37+Prfdfhs3brpRlxwzV8pkMnHb7XeyZX8tvb7Y\nTva72NzFyZo2btgY29QiIzH+f+MVMJlMfP2b3+R4cyvH6mNbS61ubePzmlq+dd99cenJv1Lz5s3j\n2w88wJn928fdqdpUW8WpvZ9x/333aTb1WwsZGRmsuHYFhz8/FvN9Syk5tO0o119/PWlpaTHffywJ\nIVixYgU33XwTH2x/l84Yj6BqbGmk/ItPuPe+e5k3b15M962VkpISZs+Zy9aDsetIlVLy4Rc1bLxx\nk+53swkZ2CE8QuS+Bx7gozNnqW2PzS1nS08Pbxw7yd1f/7puk5BiafLkyXzn4Yc4e3gXjbVVV7SP\n5ovVVB7cwcMPPRjTrJV6uf7662mtb6OxNra33Rer6unt8sZ1JMxYLVq0iK9c9xU+3f0xwVBsOti9\nvl4+3fURd9x5B1OnTo3JPvWy8cZNHK1qiVmCsIrKJrDYWbx4cUz2NxYJG9gh3K76tbvv5k9HjtPS\nM74/VrfPxyuHj3HDpk2GblcfSWFhIQ8/9CCVB3fQ1nhxTO9tb6rn9P5tfPvbD+jaXhhLNpuNm2+6\nmQOfVcRsIlcwEOTQtiPcduttuixEEUurV68mIyudwycOxmR/uw/vYs68OZSVlcVkf3pyOp2sXbee\nj/eeH3dHqs8fZMuBWm659fa4NFMmdGAHmD17Nhs2beKVw0fp8l5ZCldvIMArh4+yeNkyli5dGuMS\n6q+oqIhvffMbnPiiHM8gHWiD5Ynp7e7k+O5PuOfuuw078mW0ysrKyHBncOZwbMb5n9x/mqLCogn5\ngS+E4I477uDI6Qq6esaXp7yhuYELDbVsinGqaj0tX76cLi+cqR1fQrWdRy8yddoMJk+eHKOSjY2m\ngV0IUSKE+EQIcUQIcVgI8ZiWxxvK0qVLuWb5tbxacQzvGPNJB0PhmaWTZ8xk3fr1GpVQf9OnT2f9\n+nUc3/3piNkhQ6EgJ74o5/rrrtM0I51ehBDcftvtHN9zCq9nfPnaezp7OH3oLLfecmuMSqe/jIwM\nli9fzv5je694H1JK9hzZxQ0bbzDUcMaxMpvN3Lj5Jj7dX3vFo+q6enzsO9HAxhvj9wGndY09APyd\nlHIesAL4ayGE9issDGLtunVMmj6DN4+fIjTK2ywpJR+eriQlM5Nbb7/dEBMqYmnFtddSlJ/LuWP7\nh92u+vgBcrLS+cpXVutUMu3l5eWxaNEiju46Ma79VOw8zvJlyzVNzauH666/jvN1NbR2tF7R+2vr\nz+MP+uPSnhxrs2fPxunKoOLslfXDbK+4yOLF18T1mtA0sEsp66SUByKPu4BjQFwaZ4UQ3HbHHYTs\nDraeHd0wyAMX6qjr9fH1b37L0MO1rpQQgjvvvIPGmtN0tQ2+8EB3Ryv1Z0/wta9+NeE+2Das38CF\nyou0N19Z53pzXQvNF1q4/vrrY1wy/aWmpnLdddex7+ieMb83JEPsOfIFN2y8YUIMex2JEIIbNm5i\n++GLBMfYD9Pe5eVYVQvXr9F+6cfh6PZXEEJMARYBu/Q65uXMZjPfvPdejja1UNk8/Pjd+s4utp2r\n4d4HHpjQt5YjSUtL44YNG6iq2D3o6+cqvmDd+nW43W6dS6Y9u93O2rVrqdgx9uGPUkoqPj/GDRsm\ndtNDf9euuJbWjhYuNo5tyN/pc6ewO1InzNDG0Zg6dSqZWTkcGeOkpR1H6li6bJnmC7WPRJcufCFE\nGvAq8N1Izf1LHn/88b7Ha9asYc2aNZqUxel0ctc99/DKSy/xnaVuUgYZxRAMhXj7xCluuuWWCTms\ncayWLFlC+ZbPaGuqIyPnUlKmjpZGervaWb5sWRxLp61ly5azbft2Gs43kleSO+r3XayqQwZkQjQ9\nRFmtVjZt3sRHH3zMbWtHN/3d5/ex7+ge7n/g/oS7o1u7bgN/eu1lyqblYjKN/Lt19fg4VtXM9776\noCblKS8vp7y8fFTbah7YhRAWwkH9BSnl60Nt1z+wa2369OlcNWcOW6uq2TBj2pde33O+lozcPN0S\n9sSb2WzmK6tXsffw8QGBve7sMVau0nall3izWCzcsOEGtny+hdzinFEFJyklR3ed5OZNNydE00N/\n8+fPZ9fOXRw9c4SymSNPPtt/dA9XXXXVhB8pNZhp06aR6kjjTG0bMycNbC8fbL3TvScbWLhwgWYT\n1C6v8D7xxBNDbqvHVfkMcFRK+SMdjjVqGzdt4mh9I609ngHPe/x+dtVc4Jbbbku4Gshwrr76apov\n1hCMJIcKBQM0Xahm8dVXx7lk2lu4cCFBb3DUS+jVVl4kxZbC7NlxGQegKSEEd9x5BwdPHKB7hMRx\nTa1NVNZWsmnzxB3eOBwhBKtWX8eeEyNfF4FgiAOnGlm56is6lGxkWg93XAXcC6wTQuwXQuwTQhji\nKnA6nVy7YgU7a8LrpUYXst57/gJz5841ZBInLdntdvILC2lrCi8j197cQE5OjuGnx8eCyWRi7Zq1\nnNh7esRtpZSc3Hua9WvXJ+wHf25uLtcuX86uQzuG3CYkQ3x+YCubNm2Ke3uylsrKymhs89Dc4Rl2\nu+PnmiksLDRM3NB6VMx2KaVZSrlISnm1lHKxlPI9LY85FitWruRkYxOeyKpLwVCIAxfr+UoCjHK4\nElOnlNLZEq6ddLY2MkXjRYWNZOHChfR0emipH364X2NtE4REQtbW+1uzdg1tna3UXKwe9PXjlcew\nOx0J1ccwGIvFwuJrruHgqeFr7QdOt7D8WuOkk0isBsIxcjqdzJo1k6P1DQCcaW4hJyeH3NzRd6Il\nkrzcXHw94YRQvu5O8vKS5zyYzWZWrVzF6UPDz0Y9ffAsq1etTri29ctZrVZuufUWdh/e2Zd6Ibos\nntfXy4Fj+7j99uRorly8+BqOnG0ZcsJSW2cvze0eQ03eS+yrcxQWXr2YE83h6cMnmlpYmOA1kOG4\nXC4CvvACJX5fb1I0w/S3ZMkS6qrq6e0ZfDZqd0cPzXUtSdOpPnv2bLKyszh+duBw0IPHDzBv3ry4\nLmunp7y8PNxuN+fq2gd9/WhVM2VlZYYaZJD0gX369OnUd7TT6w9wtqU14W+xh2OxWPqW0QsFg7ov\nxBxvdrudOXPncO7EpeaH/gtZVx2rZuHChUl1Xm7cdCOHTx7sy/7o8Xo4de4k69avi3PJ9DV/4SJO\n1AyeP+ZETTsLFhrrwz7pA7vVaqWooIBDF+uwp6aSnp4e7yLFjZQSIrfWQghdFwU3iiXXLKHmxJcn\n6EgpqTl5nmsWXxOHUsVPSUkJOTk5VEVy+J84e4y5c+cm3f/JvHllnKxp/dL/RFtXLx09PkoN1h+V\n9IEdoGTyZA5euDhhU9HGis/nw2QO306aLBZ8vviuah8PpaWl+L1+OloGphloqW/FarFRVFQUp5LF\nz/Jrl3O65hRSSk5Xn2L5tcvjXSTdZWdnk5pqp75lYPrvyto2Zs6Yabg+F2OVJk7y8gto6fGQZ9A1\nGfXi9XqxWMKrQpktVnp7tV0Q3IhMJhNlZWWcPzMwV/2FyovMnz8/KToLLzd79mzqm+poaGlAQtJW\ngGbNuorKCwObY87WdXPV7DlxKtHQVGCHvixsWVlZcS5JfHk8HszWcPuxyWLF4xl+7G6imjtnLg3V\nA4e31Z1rZO6cuXEqUXzZbDaKi4o5cGwfM6ZPT8oPN4Bp02dQ03ipxi6lpKa+3ZArRanATng0SP/v\nycrj8WCyhAO72WKjJ0kDe2lpKW3N7fh6w01RPZ099Pb0JmUzTFTJpBLO19UwaXLipQ4YrSlTpnC+\nob1v2GNjmwe73W7IBHkqsBNuP/v6178et9VOjKK3txdzpCnGYrUlZVMMhEcHlZQU03ghnMq48UIz\nU6ZOMVw7qp7y8vIAknaOB4DD4SDN6aS5PVzhudjUZdgcOcl7pfZjMplYsGABVqs13kWJK78/gCmS\nd16YTATGuNpUIpk6ZRotdeHUzi11rUwtNd7ttp4KCgqwWCxJHdghPEroYnM4Qe3FFg/FJcasDKrA\nrvSRXBrKlazDHaMmTZpEW2N4ZExbQ7tha2Z6KS4u5oknnki6SWuXKywqobEtfCfb2N5LoUEHXKjA\nrvSxmC3IyPTxUCiUkKtGjVZhYSFtTW3IkKStuT1pZlkqw8vLy6Opw4uUkqbWLvLz8+NdpEGpwK70\nSU1NIRBJ2xsM+ElNSY1zieInXDMVNNe3YHfYE2aVJGV8cnNzaenw4PEGQJhwOBzxLtKgVGBX+jid\nToL+cJ6UoM9LWlripmMdiRCCzMxM6qobyM5J/FW0lNHJyMigoyuc9CsrI92wQz9VYFf6uFwuAt5w\nj7/f25P0wz8zszJprG0iKzO55zcol5jNZtKcDs43dvbNfzEiFdiVPhkZGfT2hHv8vT3dZGRkxLlE\n8ZWZkUlzXQsZ6cl9HpSB0t0uLjR2kZ6hArsyAWRmZtLT1YGUkp6ujqSfiet2uZEhmfR3LspALpeL\n+pYeXG7jJkJTgV3p43A4MAkTvd2dBAOBpB/aFl3yLZGXflPGzpnmor3ba+j/DxXYlQEyMjNpbagl\nPSPDsB1DeklNDY8KstvtcS6JYiQOZzigG/m6UIFdGSArK4vWhguG7hjSSzSwR78rClwK6CqwKxNG\nVlYm7U11ZCd5+zrQt1KSGsOu9DcRPvBVYFcGyEhPx+/tJSPDuB1DeiksLOThhx9O+k5kZaDoB76R\nl0g0zuqriiGoFMaXmEwmpk+fHu9iKAYzEQK7qrErA0RHgBh1qrSixFs0C6zFYtx6sQrsygDRdkMV\n2BVlcNH/ESOn+VaBXRkgPT0ds9mcdKvQK8polZSU8Pjjjxs6+6kwQs5tIYQ0QjkURVEmisiaCYNO\nNlE1dkVRlASjAruiKEqCUYFdURQlwajAriiKkmBUYFcURUkwKrAriqIkGBXYFUVREowK7IqiKAlG\n08AuhHhaCFEvhDik5XEURVGUS7SusT8L3KjxMWKivLw83kUwDHUuLlHn4hJ1Li4x+rnQNLBLKbcB\nrVoeI1aM/ofSkzoXl6hzcYk6F5cY/VyoNnZFUZQEowK7oihKgtE8u6MQohR4U0q5YJhtVGpHRVGU\nMRoqu6MeS4CIyNeQhiqcoiiKMnZaD3d8EfgcmCWEqBZCPKTl8RRFURSDLLShKIqixE7CdJ4KIb4r\nhDgc+Xos8tz/FEIcFELsF0K8J4QoiDxvEUI8J4Q4JIQ4IoT458jzdiHEW0KIY5H9/N+XHaNACPG+\nEGKyEGKvEGJfZLu/0P83Htpg56Lfa38vhAgJIbL6Pfc/hBCnIr/3xn7PL46co5NCiB9etp9kOxf/\nHrnr7BjkGAl3LoQQpUKInsjvsk8I8bN+2ybVdTHCuTDmdSGlnPBfwDzgEJACmIEPgGlAWr9tHgWe\njDz+JvBi5LEdOAtMjjy+PvK8BfgMuLHfPh4Evhd5zRp5zhF5f0G8z8MQ5+JDYFrktRLgvUh5syLP\nzQH2R36nKcBpLt3J7QKWRh6/k+TnYhmQD3QMcpxEPBelwKEh9pVs18Vw58KQ10Wi1NjnALuklF4p\nZZBwQP6qlLKr3zZOIBR5LAGnEMJM+ER7Cf9hPFLKLQBSygCwj/AfOmoT8K6UMiCl9EeeszNC57DO\nLj8XW4CvRl77AfAPl21/O/C7yO9UBZwClkXublxSyi8i2z0P3NHvfUlzLgCklLullPVDHCcRzwUM\nUv4kvS5giPIb9bpIlMBeAXxFCJEphHAANwGT4NKtEvAt4P+KbP8q0ANcBKqA/1dK2dZ/h0KIDOBW\n4OPIzyZglpTyeOTnEiHEQeAc8B9Syjptf8VRG/RcCCFuA85LKQ9ftn0xUNPv59rIc8XA+X7Pn488\nl4znYkgJfC4ApkSaDz4VQqyOPJeM1wUMfi6GFO9zocdwR81JKY8LIf6D8C1VF+Hb6WDktX8F/lUI\n8U+Em2MeJ1wLCwAFQDawVQjxUaSWRqQm/yLww+hzwHLCt6DRY54HFkZqMK8LIV6VUjZq/KuOaIhz\nkQr8C3BDjA6jzsUliXYuorXJC8BkKWWrEGIx8CchxNwRDpNU5+KyFoHLxfVcJEqNHSnls1LKJVLK\nNVZ3f74AAAW7SURBVEAbcPKyTV7k0u3Wt4D3pJShyMndDizpt+0vgRNSyp/0e24z4ba3y49bR6QG\nEJNfJAYGORcVhNuMDwohzhJuXtonhMgjXCud3O/tJZHnaonc9Vz2PCTfuRhOop2LvUKIPCmlX0rZ\nGnnfPuAMMIvkui5GOhfDie+50LIBX88vIDfyfTJwFHADM/q9/ijw+8jjfwSejjx2AkeAssjP/w68\nMsj+twPOyONiIDXyOBM4AcyL9zkY7lxc9vpZIDPyeC7hGosNmMrADsOdhO9uBP06yZLxXPTbvjNJ\nroscwBR5PI1wE1VGkl4XQ54Lo14XCdEUE/FaZHiSH/grKWWHEOIZIcQswp2m54C/jGz738CzQoiK\nyM9PSykrhBDFhG/Hjgkh9hPuZP0p8AbgkVJ2R7afA/x/QogQ4Yv7P6WUR/T4JUfpS+fistclkdtM\nKeVRIcTvCV/c0e2jkxv+GniO8G3qO1LK94UQOSThuYjcun8LsEf6bJ4CfkaCngvgOuB/CiF8hP9/\n/kJe6odKquuCYc6FUa8LNUFpFIQQ9wLFUsr/jHdZ4k2di0vUubhEnYtLjHAuVGBXFEVJMAnTeaoo\niqKEqcCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVESjArsiqIoCUYFdkV3QohtY9i2VAgxWFKmKznu\nUhHOzR/9umOE7TuHeP4vhBD3DfO+64UQK0ZRnu8LIf5u5JIrytgk0sxTZYKQUo6YHe/yt8To0IeB\na6SUoUgypoNCiDeklKEhth/0uFLKX4xwnDWEk0vtuOKSKso4qBq7orv+NWEhxD+J8Go8+0VkxSoh\nxDVCiAORtA5/PcK+UiKpIw6J8Co1a4baVkrZ2y+I27mUn3+Y3Yt/j5TlcyFEbuTJvpq2EOIxEV6F\n64AQ4kUhRCnh1BV/G0nzuipy1/FxZJsPhRAlgxxokRBiR2Sb14QQ6ZHnl4rwKmD7hBD/Gb17EUJs\nEUIs6Pf+rUKI+SP8PkqSUIFdiYdo/pXNhHPeL5VSXg1Ep2A/A/x15LmR/DUQklIuIJyz49dCCNtQ\nGwshlkVyBB0E/nKY2jqEE8R9LqVcBGwF/myQbf4JWBTZ5i+llOeAnwM/kFIullJuB34CPBvZ5sXI\nz5f7NfAPkW0qgO9Hnn8G+DMp5WLCqaijdxFPAw9FfqeZQIocPI+4koRUYFfiaT3hgOcFkFK2RWqq\n6ZGACPDCCPtYDfwm8v4ThBdOGTKlqgyveFMGLAX+ZbgPAcArpXwn8ngv4bSulzsIvBjJDxIcYj8r\ngJcij18AVvV/UQjhJvw7R/sefg1cFzkXaVLK3ZHnX+z3tleAm0V47YCHCSflUhRABXYl8Yxq2bHI\nh0AXUDbMZv5+j4MM3id1M+EMoIuBL0R45ZwvHW4URRqq3EMtyeYhvFDEHcDdwG9HcQwlSajArsRD\nNFh9CDwkhLADCCEypZTtQJsQYmVkm3tH2NfW6DaRFM2TCOe7/vJBhZgSqeESaQu/inANf6RyDmey\nDK+T+8+E1wBIAzojj6M+J7yAOsB9kTL3iaSMbRFCRGvy9wNbIueiQwixNPL8Ny479tPAj4HdkW0V\nBVCjYpT4kACRPN4LgT1CCC/hRRv+lXDTwjOR/NUfjLCvnwFPCiEOEa5hf1teWjj4cquBf+6XV/v/\nkFK2jFTOoQghLMBvIk0pAvhRZB2AN4FXRXgNzUcjX88JIf5PoJFI2/hlHgR+HvmQq+y3zXeAp4QQ\n0UWX+wK4lHKfEKIDeHa4cirJR6XtVRQDE0I4ows2iPC6vQVSyu9Ffi4CPpFSzo5nGRXjUU0ximJs\nN0eGgh4mfMfx7wBCiPsJj5P/l3gWTjEmVWNXJgQhxEbgP7jUPCKASinl18azbWT7nYTXOY1uK4H7\nDbaUm6KMmgrsiqIoCUY1xSiKoiQYFdgVRVESjArsiqIoCUYFdkVRlASjAruiKEqC+f8BVw1FbAlK\nBu8AAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"new_data = df_data[ df_data.data2.str.contains('^\\[.*\\]$',na=True,regex=True) == False ]\n",
"new_data.rename(columns={ \"data1\": gene_name , \"data2\": clinical_feature }, inplace=True)\n",
" \n",
"sns.violinplot( x=new_data[clinical_feature], y=new_data[gene_name], palette=\"Pastel1\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BigQuery to Compute statistical association \n",
"\n",
"The Kruskal-Wallis score (H) is computed by using the following equation: \n",
"\n",
"$$H = \\frac{(N-1)\\sum_{i=1}^{g} n_i (\\bar{r_{i}} -\\bar{r} )^2 }{ \\sum_{i=1}^{g} \\sum_{j=1}^{n_i} (r_{ij}-\\bar{r})^2 }$$\n",
"where\n",
"\n",
"- $n_i$ is the number of observations in category $i$\n",
"- $r_{ij}$ is the rank (among all participants) of the gene expression of participant $j$ that belongs to category $i$\n",
"- $N$ is the total number of participants considered in the test\n",
"- $\\bar{r_i}$ is the averange rank of gene expression values for particpants in category $i$\n",
"- $\\bar{r}$ is the average of all $r_{ij}$\n",
"\n",
"To avoid reading that table multiple times, we rearranged the equations above as :\n",
"\n",
"$$H = (N-1)\\frac{ \\sum_{i=1}^{g}S_i^2/n_i - (\\sum_{i=1}^{g}S_i)^2 / N }{ \\sum_{i=1}^{g}Q_i - (\\sum_{i=1}^{g}S_i)^2 / N }$$\n",
"\n",
"Where $S_i = \\sum_{j=1}^{n_i}r_{ij}$ and $Q_i = \\sum_{j=1}^{n_i}r_{ij}^2$\n",
"\n",
"The following query string computes $S_i$ and $Q_i$:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"summ_table = \"\"\"\n",
"summ_table AS (\n",
"SELECT \n",
" COUNT( ParticipantBarcode) AS ni,\n",
" SUM( rnkdata ) AS Si,\n",
" SUM( rnkdata * rnkdata ) AS Qi,\n",
" data2\n",
"FROM ( \n",
" SELECT \n",
" (RANK() OVER (ORDER BY data1 ASC)) + (COUNT(*) OVER ( PARTITION BY CAST(data1 as STRING)) - 1)/2.0 AS rnkdata,\n",
" data2, ParticipantBarcode\n",
" FROM\n",
" table_data \n",
" WHERE data2 IN ( SELECT data2 from table_data GROUP BY data2 HAVING count(data2)>{0} ) \n",
")\n",
"GROUP BY\n",
" data2\n",
")\n",
"\"\"\".format( str(MinSampleSize) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The query above ingnores the categories that have a number of participants smaller or equal than 'MinSampleSize'. Moreover, the gene expression is ranked, assigning **average** of ranks to the similar values( https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rankdata.html). Finally, The Kruskall-Wallis score ($H$) is computed by the following BigQuery string. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" in runQuery ... \n",
" the results for this query were previously cached \n"
]
},
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Ngroups | \n",
" Nsamples | \n",
" Hscore | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 5 | \n",
" 513 | \n",
" 7.857342 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Ngroups Nsamples Hscore\n",
"0 5 513 7.857342"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_hscore = \"\"\"\n",
"SELECT \n",
" Ngroups,\n",
" N as Nsamples, \n",
" (N-1)*( sumSi2overni - (sumSi *sumSi)/N ) / ( sumQi - (sumSi *sumSi)/N ) AS Hscore \n",
"FROM (\n",
" SELECT \n",
" SUM( ni ) As N, \n",
" SUM( Si ) AS sumSi,\n",
" SUM( Qi ) AS sumQi,\n",
" SUM( Si * Si / ni ) AS sumSi2overni,\n",
" COUNT ( data2 ) AS Ngroups \n",
" FROM summ_table\n",
" )\n",
"WHERE \n",
" Ngroups > 1\n",
"ORDER BY Hscore DESC\n",
"\"\"\"\n",
"\n",
"sql = ( sql_data + ',\\n' + summ_table + query_hscore )\n",
"df_hscore = regulome.runQuery ( bqclient, sql, [] , dryRun=False )\n",
"df_hscore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test our implementation we can use the 'kruskalwallis' function available in python:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"KruskalResult(statistic=7.8573421918435997, pvalue=0.096945947430261636)\n"
]
}
],
"source": [
"CategoryData = []\n",
"CategoryNames = [] \n",
"\n",
"for name, group in new_data.groupby( clinical_feature ) :\n",
" data = group[ gene_name ].values \n",
" \n",
" if ( len( data ) > MinSampleSize ) :\n",
" \n",
" CategoryData.append( data )\n",
" CategoryNames.append( name )\n",
" \n",
"if len( CategoryData ) > 1 :\n",
" print( mstats.kruskalwallis( *[ mydata for mydata in CategoryData ] ) )\n",
" \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}