{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Regulome Explorer Kruskal-Wallis test for numerical and categorical data\n", "CCheck out more notebooks at our ['Regulome Explorer Repository'](https://github.com/isb-cgc/Community-Notebooks/tree/master/RegulomeExplorer)!\n", "\n", "In this notebook we describe how Regulome Explorer uses Kruskal-Wallis test to compute the significance of associations between a numerical feature (Gene expression, Somatic copy number, etc.) and a categorical feature. Details of the Kruskal-Wallist test can be found in the following link: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance \n", "\n", "To describe the implementation of the test using BigQuery, we will use Gene expresion data of a user defined gene and a user defined clinical feature. This data is read from a BigQuery table in the pancancer-atlas dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Authenticate with Google (IMPORTANT)\n", "The first step is to authorize access to BigQuery and the Google Cloud. For more information see ['Quick Start Guide to ISB-CGC'](https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/HowToGetStartedonISB-CGC.html) and alternative authentication methods can be found [here](https://googleapis.github.io/google-cloud-python/latest/core/auth.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import Python libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "from google.cloud import bigquery\n", "import numpy as np\n", "import pandas as pd\n", "from scipy import stats\n", "from scipy.stats import mstats\n", "import seaborn as sns\n", "import re_module.bq_functions as regulome" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## User defined Parameters\n", "The parameters for this experiment are the cancer type, the name of gene for which gene expression data will be obtained, and the clinical feature name. Categorical groups with number of samples smaller than 'MinSampleSize' will be ignored in the test. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cancer_type = 'LGG'\n", "gene_name = 'IGF2'\n", "clinical_feature = 'icd_o_3_histology'\n", "MinSampleSize = 26\n", "\n", "bqclient = bigquery.Client()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data from BigQuery tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Gene expression data from the BigQuery:** The following query string retrieves the gene expression data of the user specified gene ('gene_name') from the 'Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered' table available in pancancer-atlas dataset." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "query_table1 = \"\"\"table1 AS (\n", "SELECT symbol, data, ParticipantBarcode\n", "FROM ( \n", " SELECT \n", " Symbol AS symbol, AVG( LOG10( normalized_count + 1 )) AS data, ParticipantBarcode\n", " FROM `pancancer-atlas.Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered` \n", " WHERE Study = '{0}' AND Symbol ='{1}' AND normalized_count IS NOT NULL\n", " GROUP BY \n", " ParticipantBarcode, symbol\n", " )\n", ")\n", "\"\"\".format(cancer_type, gene_name )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Clinical data from the BigQuery:** The following string query will retrieve clinical data fromthe 'pancancer-atlas.Filtered.clinical_PANCAN_patient_with_followup_filtered' table available in pancancer-atlas dataset. It is worth noting that some of the values of the clinical feature may be 'indetermined' or 'not-evaluated'; typically these values are inside square brackets. The 'REGEXP_CONTAINS' command is used to avoid using those values in the test." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "query_table2 = \"\"\"table2 AS (\n", "SELECT\n", " symbol,\n", " avgdata AS data,\n", " ParticipantBarcode\n", "FROM (\n", " SELECT\n", " '{0}' AS symbol, \n", " {0} AS avgdata,\n", " bcr_patient_barcode AS ParticipantBarcode\n", " FROM `pancancer-atlas.Filtered.clinical_PANCAN_patient_with_followup_filtered`\n", " WHERE acronym = '{1}' AND {0} IS NOT NULL \n", " AND NOT REGEXP_CONTAINS({0},r\"^(\\[.*\\]$)\") \n", " )\n", ")\n", "\"\"\".format(clinical_feature, cancer_type)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following query combines the two tables based on Participant barcodes. T" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "table_data = \"\"\"table_data AS (\n", "SELECT \n", " n1.data as data1,\n", " n2.data as data2,\n", " n1.ParticipantBarcode\n", "FROM\n", " table1 AS n1\n", "INNER JOIN\n", " table2 AS n2\n", "ON\n", " n1.ParticipantBarcode = n2.ParticipantBarcode\n", ") \n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point we can take a look at output table" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " in runQuery ... \n", " the results for this query were previously cached \n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
data1data2ParticipantBarcode
12.3595329382/3TCGA-E1-A7YW
22.8686929382/3TCGA-FG-7637
32.7131199382/3TCGA-TQ-A7RG
43.0649979382/3TCGA-DB-5280
52.5545189382/3TCGA-IK-8125
62.7997249382/3TCGA-DU-8163
72.8000629382/3TCGA-HT-8018
82.5582079382/3TCGA-DU-7019
92.7935149382/3TCGA-DU-5852
\n", "
" ], "text/plain": [ " data1 data2 ParticipantBarcode\n", "1 2.359532 9382/3 TCGA-E1-A7YW\n", "2 2.868692 9382/3 TCGA-FG-7637\n", "3 2.713119 9382/3 TCGA-TQ-A7RG\n", "4 3.064997 9382/3 TCGA-DB-5280\n", "5 2.554518 9382/3 TCGA-IK-8125\n", "6 2.799724 9382/3 TCGA-DU-8163\n", "7 2.800062 9382/3 TCGA-HT-8018\n", "8 2.558207 9382/3 TCGA-DU-7019\n", "9 2.793514 9382/3 TCGA-DU-5852" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql_data = 'WITH\\n' +query_table1+','+query_table2+','+table_data \n", "\n", "sql = (sql_data + '\\n' +\n", "\"\"\"SELECT * FROM table_data \n", " ORDER BY data2\n", "\"\"\")\n", "\n", "\n", "df_data = regulome.runQuery ( bqclient, sql, [] , dryRun=False )\n", "df_data[1:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use a 'violinplot' to visualize the populations in each category. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXYAAAEQCAYAAACk818iAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd0W9ed6PvvRiMBEGDvlKhuFapYVrFKbDXLkrsT2yku\nsR1PeTNjZzLzptxZs17sO3PffTPrvUmdOMUtdmLHsZ3EvduUJVnF6qK6RFGkKLF3EkTd7w8AFCmz\nijgHh8D+rMVFEDg4Z/Pw8Id9dvltIaVEURRFSRymeBdAURRFiS0V2BVFURKMCuyKoigJRgV2RVGU\nBKMCu6IoSoJRgV1RFCXBaB7YhRDpQohXhBDHhBBHhBDLtT6moihKMrPocIwfAe9IKe8WQlgAhw7H\nVBRFSVpCywlKQgg3sF9KOV2zgyiKoigDaN0UMxVoEkI8K4TYJ4T4pRDCrvExFUVRkprWgd0CLAb+\nW0q5GOgB/lnjYyqKoiQ1rdvYzwM1Uso9kZ9fBf7p8o2EECphjaIoyhhJKcVgz2taY5dS1gM1QohZ\nkafWA0eH2DauX9///vfjXgajfKlzoc6FOhfGPxfD0WNUzGPAb4UQVqASeEiHYyqKoiQtzQO7lPIg\nsFTr4yiKoihhauZpxJo1a+JdBMNQ5+ISdS4uUefiEqOfC03HsY+6EEJII5RDURRlohBCIOPReaoo\niqLoTwV2RVGUBKMCu6IoSoJRgV1RFCXBqMCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVESjArsiqIo\nCUYFdkVRlASjAruiKEqCUYFdURQlwajAriiKMgZdXV289NJL8S7GsFRgVxRFGYPW1lYqKiriXYxh\nqcCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVHGIBQKDfhuRCqwK4qijEE0oAeDwTiXZGgqsCuKooyB\nqrEriqIkmEAgAKgau6IoSsKIBnQV2BVFURJENKBHa+5GpAK7oijKGKgau6IoSoKJ1tRVjV1RFCVB\nqMCuKIqSYFRgVxRFSTB+vx9QgV1RFCVhRAN6NMAbkQrsiqIoY+D3+wBVY1eUCcvItTIlPgK+cGA3\n8rWhAruiDKG3t5fHH3/c0DUzRX8+vw+TSajArigTkS9SMzNysidFf36/H3uKRQV2RZmIJkJ6VkV/\nfp8Ph8EDu0XrAwghqoB2IAT4pZTLtD6mosTCREjPqujP7/fjSLX23dEZkeaBnXBAXyOlbNXhWIoS\nMxNhIoqiP5/fR1qKBb+vN95FGZIeTTFCp+MoSkxNhGRPiv78PuPX2PUIuBL4UAjxhRDiz3Q4nqLE\nxESYYajoz+f340ixGDqw69EUs0pKeVEIkUs4wB+TUm67fKPHH3+87/GaNWtYs2aNDkVTlKFNhBmG\niv6ibexNbfoG9vLycsrLy0e1reaBXUp5MfK9UQjxR2AZMGxgVxQj8E2AiSiK/nz+AE67FV+DV9fj\nXl7hfeKJJ4bcVtOmGCGEQwiRFnnsBDYCFVoeU1FiJRrQjXzLrehLSonfH8CRYsXnN+51oXWNPR/4\noxBCRo71WynlBxofU1Fiwuv1DviuKIFAAJNJkGIz4/cZ905O08AupTwLLNLyGIqiFRXYlcv5fD5s\nVgtWixmfgZvo1DBERRlCb294nLIK7EqUz+fDajFjs5rwGbjGrgK7ogzB0+vBYrXg8XjiXRTFIPx+\nP1arWdXYFWWi8ng8OFwOPL0qsCthXq8Xm8WMzWLC7w8gpYx3kQalAruiDMHj6cHpdtDT0xPvoigG\n4ff7sVrMCCGwmM2GHQqrAruiDKHH4yEt3aGaYpQ+4Tb2cNi0Ws2GHQqrAruiDKHX04vT7VSBXenT\nP7DbrMZN3asCu6IMobe3F6fbQa9qY1ci/H4/VnOkxm5RTTGKMuF4e8M19t5eNdxRCfP7/VjMAgCL\n2aSaYhRlIgmFQuGp42l2vCqwKxHhGnsksFtMhs38qQK7ogzC5/NhtVqx2CwEAgG1ipIChFMKRGvs\nVrNJNcUoykTS29uLxWYJD2uzGjv3tqIfv9+PORLYzWZVY1eUCcXv92OxhlMpqcCuRAWDAcymSGA3\nCcOurqUCu6IMwu/3Y7GYAbBYLIatmSn6CgYCmE3hsGkSxl02UQV2RRlEIBDAZA4HdiPfciv6CoVC\nmMSlGrtR+15UYFeUQYRCIUyRW25h4H9gRV9SSiJxHSGEyhWjKBNJ+B84EtiFCuxKmJShfoEdw14X\nKrAriqKMibj0SIhhtosfFdgVZRAmk6nvNltKicmk/lUUANHvuohzUYahrlZFGYTJZCIUDN9mh0Iq\nsCthJiGIxnMjf+Abs1SKEmcWi4VgJLAHA0EsFq3XfVcmApPZRCgUDu0hiQrsijKRWK1WgoHwGOVg\nIIjVao1ziRQjMJktlwK7ge/kjFkqRYkzm81GwB8eu+73+7HZbHEukWIEZrOFYGQkTDAkMUfmOhiN\nCuyKMgibzYbf70dKqWrsSh+LxUIgGK6xB0Mhw14XKrAryiCiNXa/L5wzxqi33Iq+rFYrwUhTTCAo\nDdv3oq5WRRmEyWTCYrXQ292rmmGUPlarFX+kxu4PhFRgV5SJJiUlhZ5ODykpKfEuimIQVqu1rykm\nEDRuE50K7IoyBFuKDU93LympKrArYVarFX8g3HnqD4QMezenAruiDCHFloKn20OKQf95Ff3ZbDb8\nkfkNfn9ABXZFmWhsKTZ6e7zYbKrGroTZbDb8gXBTjM/Ao6VUYFeUIVitVrwer2FrZYr+wk0xQaSU\nBFRgV5SJx2qx4vNeWiJPUcI19iCBYAiz2WTYYbDGLFUc9Pb2xrsIisFYLGaC/iAWswrsSlg4sIfC\nHacGra2DCuxAOKj/27/9G42NjfEuimIgQpgIBoN9KykpitVqxe8P4DN4YjgV2IGenh4Auru741wS\nRVGMLNrGHlA1duOLBvbod0WB8DJoJtOlNK2KEg3s/kAIq4H7XnQJ7EIIkxBinxDiDT2ON1bRmnpX\nV1ecS6IYSTAYwmI1EwwG410UxSDMZjNCCLx+446IAf1q7N8Fjup0rDHr7Owc8F1RIJKuNzUFv98f\n76IoBmK1mOn1BZK7jV0IUQLcBDyl9bGuVHt7O3arlfbW1ngXRTEQr9dLqiMFr88b76IoBmI2m/H6\ngliSvMb+A+AfAMM2VLa3tFCc7qK9rS3eRVEMpNfbi8Nlp9ejhsIql1gslkiN3ZiLbIDGgV0IcTNQ\nL6U8AIjIl+G0t7dT5HbT0dER76IoBuLp8ZCWnobH44l3URQDMZtN+AycshdA65KtAm4TQtwE2AGX\nEOJ5KeUDl2/4+OOP9z1es2YNa9as0bhol3R1dZFXlM+eC3W6HVMxPo/HgzvTpUZLKQOYTSb8gSAm\nnWvs5eXllJeXj2pbTQO7lPJfgH8BEEJcD/z9YEEdBgZ2vXk8HrLsdjxeL1JKhDDkjYVuGhoayM3N\nTerz4PWG29Wdbge9nl5CoZBhp48r+jKZTQSDErNN38B+eYX3iSeeGHJbdaUCvoCfFIsZkxAEAoF4\nFyeuurq6+NGPfkRbkvc3dHZ24nDaMZlN2FJtaiis0sckTARDIYSBZyTrFtillFuklLfpdbyxkBKE\nEAghkNKwfby6iObM8fl8cS5JfHV0dGB32gFwOB2q/0UZIBwmVGA3NLMp/AkcDIUwm43b062HaECP\nNkUkq/b2duxpqQA40lJVYFf6hCuAYOSWShXYgZQUG10+H2aTKekDe7TGnuyBva2tjVRneIGN1LTU\npG+aUi6RUmIyRWvtxqQCO+C0O2jq6sFpt8e7KHEXDezJPsSvta0Vuyt8PdhVYCcUCnHixIl4F8MQ\nQn05hIybakIFdsDldlHX2YUrLS3eRYm76NA+FdhbcaQ5AHC4HLS2Jfes5KamJp5//nmVXoFwDiGr\n2WToHEIqsAPpGZlc6OggPSMj3kWJO5XCOKy9rQ1HpMbucNmTPrBHm+aSvVMdIBgMkmIzdnI4FdiB\n9MxMLnZ0qsAOdHV1Y0u105nEw/uklHS0d+J0OXjlp3/C4XLQ0Z7cnafRwJ7sfS8AgUCAVJuZgIHv\nXlRgB9xud/i7Cux0dnXicGXS2Zm8gd3j8SBMAmtKOMmT3ZGKx+NJ6jkO0aa5ZG+iA/D5AzhTbYa+\ne1GBHXC5XAO+J7Ourm6c7oykboppbW0lze3s+1mYBM40B+3t7XEsVXypxWjCpJT4/QGcdquh715U\nYAeczvA/scPhiHNJ4q+7uxuHOzOpA3tbWxsO18BrweFyJPXIGBXYw7xeL1armVSbhV4V2I0tNTU8\nEcWuhjvS09ODw5VObxLfcre1tfVNTopyuOy0JnG+/p5uFdgh/PvbU2zYU8x4DJzOedjALoRwCyH+\ntxDiBSHEty577WfaFk0/KSkpA74ns95eD/Y0N729nqRNr9Da2vqlwJ7sY9k9Hg/2FHvSt7F3d3fj\nSLVhT7HS6/USCoXiXaRBjVRjf5ZwQoTXgG8IIV4TQkSj37WalkxH0bULjZxfWQ/BYJCA348t1YGE\npB2z3NLajMP95aaY5pbmOJUo/rxeL3a7A6/XuB2Geujq6sKRasFkEqTYrIa9gxkpsE+XUv6zlPJP\nkQRe+4BPhBDZOpRNN9F0rMmeTsDr9WK12RBCYLOl9M1CTTaXd55COH1vS0tLnEoUf4FAEJvFRjCY\nvCODIJz102UPVwBdDuPmEBopsKcIIfq2kVL+L+BXwGdAwgT3ZM473p/H48FqCzdBWG0pSXnbLaWk\nra0d52U1dqfbmdRt7BBep0CGkrN5Lqq9vY201HAF0OW0TdjA/iawrv8TUsrngL8HEu6eLFnblKPC\ngd0GgDUlOQN7T09P+I4l1TbgeUdaeO1TI49d1pLJZCIQDGAyJ/d4i9aWFtLTwq3RbofVsP0uw/6V\npJT/KKX8aJDn35NSztSuWPGR7DX3cGAPX7QWa4ph2w+11NTUhCvjy/MZhEngSk9L2lq7xWLB5/f1\n9Uclq9aW5r7AnuG00tLcFOcSDW6kUTHP9Xv8bc1Lo8RVd3c3lkhTjMWWvIE9LcM56GtpGWk0Njbq\nXCJjsNls9Ho92Gy2kTdOYM0trWS5w8OiM92pNDU2xLlEgxvpvmphv8ff1bIgSvx1dnb2C+ypdHZ2\nxrlE+mtobCAtY/CJas4MBw0NxvxH1prNZsPr8yZ1YPd4PPj8ftLs4buWbHcqjU0TsMYOJHejc5Jp\nbWvDmhoOaja7k9ZWY7Yfaqm+vh5X5uCpJdyZLuob6nQukTFEA3oyN8U0NDSQm5nW12Sb6Uqlo6PL\nkMOCRxq4XSKE+DHhsezRx32klI9pVjJFd83NLaRmTwIg1ZFG88UzcS6R/hoa6pm+tHTQ19Kz3VQd\nPqJziYwh2mmazEOC6+rqyE2/NHHNbDaRme6koaGB4uLiOJbsy0YK7P/Q7/EeLQuixF9zczPTSucB\nYE9LpzrJJuR4vV66u3twpg/eFOPKTKO1pZVgMJi0AS6ZR45dqD1PXubAGcn5mXYuXrw4sQK7lPLX\nehUknqIXazJftIFAgI72dhxp6QDYnS66u7rx+XxJ065aV1dHelZ634S1y1msFpxuJ01NTeTn5+tc\nuvjy+8LNDUZsdtDLhQvnmbMwd8Bz+ZmpXDhfA0uWxKlUgxtpVMxqIcQD/X5+VQjxSeRr3XDvnUii\nK6EYeUUUrTU2NuJwuTBFaqLCZCItPYP6+vo4l0w/Fy9eJD17+NTNGdluLl68qFOJjMPT68FmTd7Z\nyH6/n4bGZvKzBt7NFeWkUVNTHadSDW2kztMnGNgEcxXh5pnHgX/UqEy6U6vDQG1tLWnpAycTp6Vn\nc+HChTiVSH+1F2pJz3EPu407x8X52vM6lcg4ujq7yXRn0pWkC7BcuHCBnIw0rJaBTXD5WU4am5oN\nN3FtpMDullIe7ffzKSnlXinlZ0DCrEoRzT2ezDnIq6trcFwW2B0Z2ZyrNl5tRCu1tbVk5g2/ilZG\nbga1SRjY29vbyM3KM+xMS61VVVVRkvvlvherxURelovz5411TYwU2Adc5VLKr/b7MWEaGaOzCZP1\nogWorqnBnZU34DlXVi7V1TVxKpG+AoEAzU3NpGcPX2PPzE2nrq7esOlatdLe3k5hbmHSzrytPHOK\nSXlpg742Kc9BZaWxRpCNFNiPCyFuvvxJIcQtwAltiqS/aDtyfRK2nUK4Caq1pZm0jMuaYtxZdHV2\nJkXOmLq6OtyZLizW4QeKpdhTSEmx0dycPCOGenp6CIVC5GXn09rWmnSDDAKBANU155mcP/iHfmm+\nmzOnT+pcquGNFNi/B/yXEOJZIcSjka/ngP+KvJYQzp4+zTUlRZytrIx3UeLi/PnzuLNy+zpOo4TJ\nRHp2LtVJ0BxTU1MzYjNMVFZ+puFuvbXU3NxMujuDFFsKyORbRam6upqcdAeO1EuTs/738zv7Hk/K\nd1FX12CojuWRkoCdBhYAW4Epka/PgAVSSmN9RF2h3t5eztXUsGLKZDo6OpIy5/bZs1WkZVwaxrXl\ntaf7HjszczlbVRWHUumr5nw1Gbnpo9o2Pc9NtQFHQmilubkZd5obIQQZ7gyaDDqNXisnThxnauHQ\nXYpWi5mS/HROnz6tY6mGN2IOTimlV0r5jJTy7yNfz0gpjfPRNE579+5lSlYmTpuNufl57NqxI95F\n0t2Zykpc2YN3mbizC6isPKtzifRXXV1DdkHmqLbNLsiiuvqcxiUyjsaGRtyOcDOEOy09qQK7lJJj\nRyqYWTL83dyMIhdHjxzWqVQjG2kce6cQomOQr04hhDEzzI+B1+tl65YtLC8pAmBJcSF79+41bPJ8\nLQQCAS5eqCU9Z/DAnp6dR319neGGc8VSd3c33V3duLOG7ziNyszNoMmAQ9y00tDQQLorHNhcDndS\nZbhsaGjA7/dRkD14xs+oWZOzOHHyJIGAMVaYGqkpxiWldA/y5ZJSju6/wMA+/uhDJqe7KXSHb7PS\n7aksKMzn7TffjHPJ9FNdXY3TndmXh/1yZosVd2YO584lbg313LlzZBdmIUyjy8dvtpjJzMlImnb2\npqYm0l3hZqp0VzqNDckT2A8ePMDsyZkjrtXgctjIzXBy6tQpnUo2vKRdDuXMmTMc3L+fddOmDHh+\nVekkLtRUs3///vgUTGfHT5wgPbdo2G3cOYUcO35cpxLp7+zZs2QXjq4ZJiqrMIvKs4nf2R4KhWhp\nbemrsWe4M5MmdXEoFOLggQPMnZo1qu3nlqZzYP8+jUs1OkkZ2FtbW/n9737HzVfNxGEbmIbUajZz\n++xZvPPWmwk/dVxKyZEjR8gumjzsdtlFpRw9cjRhx26fqTxDblHOmN6TW5zNmTPG6SzTSnNzMw67\nE6vFytOv/pJ0VzodnR1J0QxVVVWFzSwpyBq+GSZqzpQcTp0+ZYhRQ0kX2D0eD88/9xzLiguZkjV4\nLS3PlcaG6dN4/rnnaG9v17mE+jl//jyBYIi0jOGDmtOdiTBbErI5pru7m9aWVrLyx1Zjzy3M5uLF\nxO57gPBs3JzMS9eH2WQmMz0z4Ss9ALt37WT+tKxRL5lpT7EwozjTEHf7SRXYfT4fzz/3HJMcqSwp\nGb75YU5+LosL8nj26acN8QmshR07d5I7eeaIF64QgtzJM/l8x85ht5uITp8+TV5J7pgXabbYLGTl\nZVKZ4HMfzpypJDdz4IzkvKz8hP+9Ozo6OHXqFAtm5I68cT+LZ+Wyc8f2uN/dahrYhRApQohdQoj9\nQojDQojva3m84fj9fn7zwvO4QgHWT5/6pWD2H5989qX3LJ9cwjSXk2eeeirhZl+2trZy/NhxCqdc\nNartC6bM4syZ0wk34/LY8WPkTR5bM0xU3uRcjp84FuMSGUcoFOLkiRNMKpg04PmS/EkcP5a4fS4A\nu3buYM6ULFJtIy1ZMVBxbho2s+TkyfhO89E0sEspvcBaKeXVwCJgsxBimZbHHIzf7+e3L7yAtaeH\nm2aNXEPt7/qppRSlWHn26acTKri//8EHFEybjTUldeSNAYvVRtG0ebz73nsal0w/wWCQU6dOUTSl\n4IreXzS1gGPHjse9dqaVM2fOkJpi7+s4jSrMK6KlpSVhx7N7vV52797Nsjljvy6EECyfk8eW8k80\nKNnoad4UI6WMtmOkEF7YQ9dEEz6fjxd+/Rzm7i5unT0L0yiHtEUJIVg/fSoFVjPP/OpXCdEsU1lZ\nyenTZ5g0a8GY3lc8s4xz1efjXhuJlcrKSlzpadjT7Ff0/nBuGXNCDnuUUlL+aTlzps390mtmk5lZ\nU2azpXxLHEqmvV07d1Ba4CLLPbpKz+Vml2bT2dEW1+YqzQO7EMIkhNgP1AEfSim/0PqYUR6Ph2ee\negq713tFQT0qGtwn21P45ZNPTugOVY/HwyuvvMr0RSsxW8a2MLHZYmHGopW89oc/JsQH3IGDByia\ncWW19ajiGYUcPHQwRiUyjoqKCtrb2pkxeeagr5fNnM/x48cT7kPN6/Wybds2Vs8vvOJ9mEyCVWUF\nfPTh+3FLmKZHjT0UaYopAZYLIb5cBdBAW1sbv3zySfLNJm66asYVB/UoIQTXT5vCvKwMfvHkkxNy\nZaFQKMTLv38Fd14J2YXDD3EcSmZ+MVmFpbz0u5cndBOE3+/n2LFjTJoxvrUqJ80s4dChQwm1+lZb\nWxtvvvEmq67+ypDLBKbYUli24Fpe/t3Lhkp+NV7btn7GlEIXORmDr3s7WvOm5tDV0Rq3CUtj6xkY\nByllhxDiU2ATcPTy1x9//PG+x2vWrGHNmjVXfKyamhp++8ILLC0qYOmk2C4yu3xyCS6bjad++Uvu\nuucerrpqdJ2PRvDBBx/S1NpO2apN49pP6bwlHP38fd55911uuflLWZ0nhKNHj5KVl3nFzTBRrsw0\nnG4Hp06dYvbs2TEqXfz09PTw7LPPUjZzAfk5w9/NTJ80g8aWBl54/gUefOhBrNax3QEaTVdXFzt2\n7ODBzXPGvS+TSXD9omLee/dtZsyYMeQH5FiUl5dTXl4+qm01DexCiBzAL6VsF0LYgRuA/2ewbfsH\n9vHYu3cv773zDptnTWdGTvbIb7gCcwvycKem8Iff/55V113HV667bkwdsvGwc9cu9h04yILrb/5S\net6xMplMXLVsHYe2vEVmRgarVq2KUSn1s/uL3UyeXTLsNq/89E993+/+mzuG3G7y7BJ2f7F7wgf2\n9vZ2nn3mWYpzSiibOX9U71m24Fq27tnCc88+x/0P3E9q6pW1SxvBhx+8T9m0bDJcsfkdZk3KZPex\nBvbt28eSGCx2fXmF94knnhhyW62bYgqBT4UQB4BdwPtSyne0OJDf7+dPf/gD5R98wDcXztMsqEeV\nZKRz/+IFHNq9ixd/8xtD344eOHCAjz76mHkrN2JLGV8NNcpqS2Heyo2Ub/mMPXv3xmSfemlqaqKu\nvo7i6Vfejtrf5JklVFVVTei+l+rqap782ZNMKZzKkrLRD1wzCRNfWXI9DquTJ3/25IQdKVNXV8ex\no0dYNX/4+S1jIYRg/TUlfPjB+7qvp6z1cMfDUsrFUspFUsoFUsr/pcVxmpub+cWTT9JxvoYHFi8g\nxzm6KcDj5U5N5VsL55PS1cl//+Qnhlz4+fDhw7z51tvMW7kRe1ps87alOl3MW3Uj7773PgcOHIjp\nvrW0Y+cOpsyejHmcdy5RFpuFSTOL2f3F7pjsT0+hUIjy8i288PwLXLtgJQuuWjTmfZiEiRWLVnFV\n6Rx+/uTP2bt374RaZUlKyZtvvM6q+YXYU2LbiFGUk8bUAheffqrv8McJP/O0oqKCn//sZ8xNT+P2\nuVeRYtGt2wAAi9nExlnTWVVUwLNPPcXuXbsMc1EfPnyYP73+BvNWbsSZPrpERmPlcGVQtupG3nr7\nnQkR3L1eLwcOHGBaWWlM9zt9/hS+2L3bMGlbR6OhoYFf/PwXHDl0hFvW3M7kovGdk9nT5rBp9U1s\n+XQLzz///IS5g6moqKCns5WrZ2mzjPOaq4vZ88UXuk7um7CBPRgM8s5bb/HO66/ztbLZLCkpjms7\n99yCPL61aD7byz/l1d//Hr/fH7eyABw6dKgvqF++lmmsOd2ZzFt1I2++/Q779hkju91Q9u3bR25R\nDk53bO/q3Flu3NluDh06FNP9asHn8/HB+x/wy1/8kkm5pWxafRMu59ArBI1FVkY2t669A6cljZ/8\n+Cds377d0COG/H4/773zNhuWlIx75NxQ0hw2rp2bzztv65cOfEIGdo/Hw6+ffYYLp07y7WsWUuQ2\nRmr4bKeD+69egK+xIa7j3ffv388bb77FvFU3ah7Uo5zuTMpWbeKdd99jz549uhxzrEKhENs/3870\nBVM02f/0BVPYtn2bYe7YLielpKKigh/+4IfUVNVy+/qvMnfGvJhXiMwmM4vnLuGm627l4L5D/PQn\nP+XsWWOuwrV162fkZ6VSWjC6ZRGv1JI5BdRfvKDb8McJF9g7Ojr41c9/TnogwF1lc7EbbIiVzWzm\nltkzmZHm4BdP6t+ZtHfvXt5+591wUNeo+WUoTncGZas3894HH7J7t/Ham0+ePInJIsgp0ubDrqA0\nn16vhyoDrhFbV1fH0089zfvvfsDKRatZu2wdTru2fVEZ7gxuXLWZsukLePl3L/Pb375Ia2urpscc\ni87OTj7fvp21V49tSHR0Iev+C1qPxGI2sXZxMe++/ZYu8z8mVGDv6enh2aeeYqY7jQ0zpsbs1ima\nAGywRGBXQgjBitJJrCgu4Olf/Yq2traY7Hcke/fu5d333qds1Sac7rGloY2KLmTdf0HrsXC40pm/\nejMffvQxuwwW3Ldu+4zpC7+cAC5WhBBMXzCVrdu2arL/K+HxeHjj9Td46ldPUZBRyO3r7qQoL7Zz\nO4YjhGBqyTS+uuFuUkjlpz/5KR9//HHcmyoBPvnoQ+ZPzyYzRsMbRzJrUiY2U0CX5soJE9hDoRAv\nv/QSk512Vk2ZbPhx4wALiwq5piCPF379a8071fbv398X1B3u4Rfe1Zo9zU1ZJLh/YZBmmfr6ehoa\nGsc903QkpbMnce5cFS0tLZoeZyRSSvbv388P/usHtDd38tUb7mbujLKYTJS5EhaLhcVzr+G2dXdy\n9lQVP/xFA6R9AAAgAElEQVTBD+O6jFxrayuHKw6zYl5shryOhhCC6xcV8eknH2keDyZMYN+/fz/d\nLc2suWwpO6NbOqmYNCRbP4vN3cBgDh8+3Nf8Eu+gHmVPc1O2ahPvv/+BIUbLbP98O1PnlY457/pY\nWawWSmdPZsfOHZoeZzjt7e08++yzfPpxOeuvvYFVV68mdZRZPLXmcrpYt3wDy+ev4A+v/YGXX345\nLllTyz/9hKtn5uFI1bcpd1K+m8w0q+aLcUyIwC6lpPyTT1g7tRRznGocV0oIwdppU9i+fZsmt5+n\nTp3ij396nbkrbrji5hetOFzpzFu5kTffepvjcVwztbe3l4qKCqbNi+0Qx6FMK5vC/n374tLccPz4\ncX7605+SnprJbWvvIDcrb+Q3xUFJwSTuWP81Aj1BfvzjH1NTU6Pbsbu7u6moqGDpHG2GN47k2rn5\nbN/2maad7BMiSjY0NBDy+ynJ0LbnWitZDjvZTmfMRwZcuHCB3738MrOXr9Nt9MtYOdOzmLN8Pa+8\n8mrcMgEePHiQ/JJcUp361FrT0p1k5GZQUVGhy/Gitm/fzh9e+wPrlm3g6jmLY9rs8vSrvxzwPRas\nFisrFq1i6bxr+fVzv9btfO3Z8wVXTc7UvbYeVVrghpBf05FCEyKw19XVUeB2TYh29aEUOB0xzQjZ\n1dXF88+/wNQF15IxQrKmeHNn5zF90Uqef+E3cRkCumfvnhHzwsTa5Nkl7N2nX6qFnTt2sn3rdm5Z\nc/uIybuMZkrxFG5cvZnXX3+DY8e0X5Hq4P59lE2LX0VICEHZlEwO7NeuE3VCBHav10tqjKZ/x0uq\n2RyzfDKhUIgXX/odWSXTyCuZFpN9ai2neAp5pbN48aWXdJ2w0tLSQktrC/mT9W2SKJ5WSG1tLV1d\nXZof6/z583z88cdsXLWZNEea5sfTQnZGDuuX38AfXvuDph/+TU1NdHd3MykvNhOyrtScKdkcPXZU\ns6GPEyKwW61W/BqO/bRYLAghsGiYjsAfCsUsren27Z/T2dNL6ZyrY7I/vUy6aiEef4jPtuo3HPDI\nkSMUTS3UfTSI2WKmsDSfo0e/lKE65t55+x2umbsEd4xzAektLzuPWVNm8+EHH2p2jKqqKkoL0+N+\n95+elkKq1UJjY6Mm+58QgT09PZ12DbOjBYNBHn74YU1rkh0+PxkZ4x+x0tnZyaflnzLj6tUIMSH+\nfH2EEMxYtIqtn23VrUnm+MnjFJReWW19vB/4+aV5nDh54oreO1rNzc00NjYyo3SWpsfRy7wZZRw5\negSfz6fJ/s/XVFOUPb4Mp7GqCBbnpmnWaTwhIkN+fj6NHZ2ENOpFNpvNPPPMMzHL9jeYhq5u8vPH\n3wu/dds2ckumxzxTo15SnS7yS2eyRcPhn1HBYJDa87XkFl9Ze+p4P/DzinOoqqrSdPRDZWUlxfkl\ncRufHmupKalkZWRrFvBaW5rJSEsZ1z5iVRFMd1poa9NmJu6EuBqcTieutDTqO7VprwwEAkgpNZs0\n0OX10u3zjTuwSyk5cOAgBVO1W9BBj2apgqmzOXjwkOZTqxsbG3GkObCl2K7o/eP9wLen2TGZTZpO\no2+obyDDZYy5C7GS4cqkoaFBk31393TjHOdomFhVBJ2pVro6O8e1j6FMiMAOMHvuXE426Zf2MpZO\nNjYzKwbLY3V2dhIIBHBqOAlJj2Ype5obk8mkeaqFhoYG0rOu/M4mFh/4GdnpmgUpCK9PmuaIb0dg\nrKXZnZpdG4Lxt63HsiJo0qitf8IE9muWLKGiroFAcGItoCyl5EBdA9csG/2qNEPxer1YbeO7jRyJ\nHs1SABZbimbtqFEdHR2kjvO2e7zszlQ6NaqVAXR0dmBPjc2qWEZhT3XQ2aHNObPZbHj9xkgj7PUH\nsaVoc31OmMCel5dHyaRJ7Ks13ipFwznW0IjVbmf69Onj3pfL5aLX000oqF2eCa2bpSA8XNPT3YnL\npW1N0+PxYLHpu/DK5SwpFk2XTez19JJi1f7DS48muqgUW4pmaQaysrJp7TTGMpZtXX6ysnM02feE\nCewAm2+5hZ01tbTFIbfElejx+SmvPMfNt90Wk+FVqampFBQU0nShOgali5+Wi9Xk5OTi1HgJQykl\nRpjTpmXnafh31P6X1KOJLkogNOt/yS8sor7VGPGjvrUnJgMqBjOhAntOTg5r163j9WMn8Rt4VRaA\nkJS8c+I0CxYtYsqUKTHb74b166g5vo9gIP5pT69EMBCg+tg+1q9bq/mxUlNTCfjie50E/AFSNLrd\nBkhzuej2dGu2/yi9mugAuj3dmt3NTZs2jXP12k8aG4nHG6Clw0NJiTYzoidUYAdYuWoVucUlvHvi\ntGFXqgEoP1NF0G5n46ZNMd3vzJkzmTFtGqf2GXelnqFIKTlz8HMmTyph9mztRvZEuVwuerv1XR3+\ncr3dXk2bnKZNm0r1xXOa7T9Kjya6qJq6aqbPGH/T5WAKCgoIBCWNrT2a7H+0Tla3MH3aVM0+KCdc\nYBdC8LW776bbYuXjM2cNGdx2VZ+nqruHe++/X5M2ydtvvw0rAc4c3GHI338wUkrOHt6F8Pfwta/e\nqUvzQV5eHh0tHZofZzjtzR2a3W4DLF++nKraszS36btSl1aqL1bT0d3O/PnzNdm/yWRi4cJFVJy9\n8hF2/+OBawd8vxIVVa1cvXjJFb9/JBMusEM4xcADDz5IrcfL9qrxtzf/07rrBnwfjwMX6jjQ0MRD\n3/kODodj3PsbjM1m46EHv43wdXNyzxZNO1NjIRQMcmrfVoI9bTz04IOaNk30l5+fT1dHNz6vtqNv\nhuLp8hDwB8jM1C6dssvl4rbbb+OjHR/S3qnPSl1aqW+qY9u+LXz9G1/XtJN2ydJlHDrdhC9Oo2Pq\nW7pp6fBy1VVXaXaMCRnYAex2Ow898gjHW9vZc7423sUB4HhDI5/X1PLwI4+Qnq5tiuHU1FQe+c7D\nZLlSObz1Xbw9sWlnvf5r3xnwfby8nh4qtr+HO8XMnz3yiGYfdoOxWCxMmjSJhhpt8nGMpK66galT\np2o+K3TBggXcsHED73z2Fufr9MtrHitSSk5WneDjnR9yzz33UFqqbd78nJwcJpdO5vCZ+FwXu481\nsGLFSk0/vCZsYAdIS0vjoUceYXdtHccb4vNHiqpubePD02d54MEHyc7WJyWo1WrlW9/8JksWL+RA\n+Ru01Mcn3/lQWhsucLD8DRaVzeW+++7FZruyGaDjMb9sPrVn6nQ/LsCFMxeZX6ZNk8LllixZwr33\n3cvnB7ex48B2fP743KWMlae3h093f8yxs0f487/4c2bN0ifnzdp1G9hxpB5/QN95Mc3tHiovtLP8\n2itvxhmNCR3YATIzM3ngwQf58PRZ6jSa1DCSNo+HN46d5J5vfIOioiJdjy2EYO2aNdz7rW9SeeBz\nzlZ8ocsq6MORoRBVR/dxet9Wvn7P3WzYsD5uuUzKysqoq67H69E30Hm6PDTXtzJnzhzdjjllyhS+\n+93vkpJm448fvcrZ85WG7YMJyRDHzhzljx+9RnFpEX/z6N+Ql6dfauWSkhKKSyax70Ts1kgYjc8O\nXmDVqtXY7dpOKpvwgR2gqKiI2++8kz8dPUGvzsuRBYIh/nT0BGvWrWPmzJm6Hru/adOm8d3HHsUa\n9HB469t4uuPzIdfb3cnhbe8iett57NG/ies5AXA4HMyZM4ezR8c+cuTuv7ljwPexOFNRxaJFi3S/\nS7Hb7dx1911845vf4PCZg7y//V1a28e/sPZ37vrzAd/Ho76pjjc/fZ3zTdU88mePsHnz5piltB6L\nGzdtZueROnp69YkZ5xs6udDcw8pVqzQ/VkIEdgjXzObOn8+7J8/oWkspP1tFdmERK1au1O2YQ3E6\nnTz04LdZsfQaDpW/SeN57ZbeGkxTbRUHt7zF0sUL+c7DD2k+s3S0Vq9azZnDZwkG9Oks8/v8nD1y\njpUr4ndNTJ06lUcffZRF1yzk3W1vs+vQzrg3z3h6e/hsTzlb9nzK2vVr+PO/+HMKCuK32lNeXh4L\nFi5g6yHtZ7NLKflo73lu2LhJlw/7hAnsADdu3kxbMERFnT63V2dbWjnV0sadX/ta3BP3RwkhWL16\nNQ8//BC1x/dx5sAOQhpP5gqFglQe2kXN0T08+O0HuP666wyVRraoqIji4mIqj1TpcrzTh84yc+ZM\ncnK0mS4+WmazmZUrV/K3f/u3WB1m/vjRa5y7UKV7OaSUnDh7nD9+9Br5xXl87+++x6JFiwzxP7N+\nw0ZOVLdR36LtJK9DpxuxpDhZtGiRpseJMs5/XwxYrVbu+cY3KK88R7tH23wQHr+f906e4at33aXr\nSI/RKi4u5rHHHsVpDXF42zt4NZqd6PX0ULHtPVLw8dhjjzJp0iRNjjNeGzds5MTe0/h92t52ez0+\nTh+sZMP6DZoeZyzS0tK46+67+Po37mHv0S/YuncLfp1mLnt6PXy0431O157kO498h803bdZtuOto\nOBwONtywkQ++qNHsTt/jDbDl4AVuve0O3So8CRXYAQoLC/nKddfx1olThELa/KGklLx/qpJ58+fH\nvQ15OKmpqdx/330sWbSAg+Vv0tka25FDXW3NHNryJovK5vDtB+7XvENoPIqKipgxYwYn95/R9DjH\n956krKws7rX1wUybNo1HH3sUR7qDt8pfp6NL28lbjS0NvPHpH5kyfQp/9Vd/RWFhoabHu1JLlixB\nmlM5XKnNJK8tB2opKyujuLhYk/0PJuECO8Dq664jJT2Dz6q0mWq9v/Yi7SHJjZs3a7L/WBJCsHbt\nWu6843aO7viQ5hiNc26pr+XI5+9z2623sGHDBkM1vQxl042bOHP4LD2d2kwn72ztpPrEeW7YcIMm\n+4+FlJQU7rrra6xcvZJ3PnuTlhh0rA7mQn0tH37+Prfdfhs3brpRlxwzV8pkMnHb7XeyZX8tvb7Y\nTva72NzFyZo2btgY29QiIzH+f+MVMJlMfP2b3+R4cyvH6mNbS61ubePzmlq+dd99cenJv1Lz5s3j\n2w88wJn928fdqdpUW8WpvZ9x/333aTb1WwsZGRmsuHYFhz8/FvN9Syk5tO0o119/PWlpaTHffywJ\nIVixYgU33XwTH2x/l84Yj6BqbGmk/ItPuPe+e5k3b15M962VkpISZs+Zy9aDsetIlVLy4Rc1bLxx\nk+53swkZ2CE8QuS+Bx7gozNnqW2PzS1nS08Pbxw7yd1f/7puk5BiafLkyXzn4Yc4e3gXjbVVV7SP\n5ovVVB7cwcMPPRjTrJV6uf7662mtb6OxNra33Rer6unt8sZ1JMxYLVq0iK9c9xU+3f0xwVBsOti9\nvl4+3fURd9x5B1OnTo3JPvWy8cZNHK1qiVmCsIrKJrDYWbx4cUz2NxYJG9gh3K76tbvv5k9HjtPS\nM74/VrfPxyuHj3HDpk2GblcfSWFhIQ8/9CCVB3fQ1nhxTO9tb6rn9P5tfPvbD+jaXhhLNpuNm2+6\nmQOfVcRsIlcwEOTQtiPcduttuixEEUurV68mIyudwycOxmR/uw/vYs68OZSVlcVkf3pyOp2sXbee\nj/eeH3dHqs8fZMuBWm659fa4NFMmdGAHmD17Nhs2beKVw0fp8l5ZCldvIMArh4+yeNkyli5dGuMS\n6q+oqIhvffMbnPiiHM8gHWiD5Ynp7e7k+O5PuOfuuw078mW0ysrKyHBncOZwbMb5n9x/mqLCogn5\ngS+E4I477uDI6Qq6esaXp7yhuYELDbVsinGqaj0tX76cLi+cqR1fQrWdRy8yddoMJk+eHKOSjY2m\ngV0IUSKE+EQIcUQIcVgI8ZiWxxvK0qVLuWb5tbxacQzvGPNJB0PhmaWTZ8xk3fr1GpVQf9OnT2f9\n+nUc3/3piNkhQ6EgJ74o5/rrrtM0I51ehBDcftvtHN9zCq9nfPnaezp7OH3oLLfecmuMSqe/jIwM\nli9fzv5je694H1JK9hzZxQ0bbzDUcMaxMpvN3Lj5Jj7dX3vFo+q6enzsO9HAxhvj9wGndY09APyd\nlHIesAL4ayGE9issDGLtunVMmj6DN4+fIjTK2ywpJR+eriQlM5Nbb7/dEBMqYmnFtddSlJ/LuWP7\nh92u+vgBcrLS+cpXVutUMu3l5eWxaNEiju46Ma79VOw8zvJlyzVNzauH666/jvN1NbR2tF7R+2vr\nz+MP+uPSnhxrs2fPxunKoOLslfXDbK+4yOLF18T1mtA0sEsp66SUByKPu4BjQFwaZ4UQ3HbHHYTs\nDraeHd0wyAMX6qjr9fH1b37L0MO1rpQQgjvvvIPGmtN0tQ2+8EB3Ryv1Z0/wta9+NeE+2Das38CF\nyou0N19Z53pzXQvNF1q4/vrrY1wy/aWmpnLdddex7+ieMb83JEPsOfIFN2y8YUIMex2JEIIbNm5i\n++GLBMfYD9Pe5eVYVQvXr9F+6cfh6PZXEEJMARYBu/Q65uXMZjPfvPdejja1UNk8/Pjd+s4utp2r\n4d4HHpjQt5YjSUtL44YNG6iq2D3o6+cqvmDd+nW43W6dS6Y9u93O2rVrqdgx9uGPUkoqPj/GDRsm\ndtNDf9euuJbWjhYuNo5tyN/pc6ewO1InzNDG0Zg6dSqZWTkcGeOkpR1H6li6bJnmC7WPRJcufCFE\nGvAq8N1Izf1LHn/88b7Ha9asYc2aNZqUxel0ctc99/DKSy/xnaVuUgYZxRAMhXj7xCluuuWWCTms\ncayWLFlC+ZbPaGuqIyPnUlKmjpZGervaWb5sWRxLp61ly5azbft2Gs43kleSO+r3XayqQwZkQjQ9\nRFmtVjZt3sRHH3zMbWtHN/3d5/ex7+ge7n/g/oS7o1u7bgN/eu1lyqblYjKN/Lt19fg4VtXM9776\noCblKS8vp7y8fFTbah7YhRAWwkH9BSnl60Nt1z+wa2369OlcNWcOW6uq2TBj2pde33O+lozcPN0S\n9sSb2WzmK6tXsffw8QGBve7sMVau0nall3izWCzcsOEGtny+hdzinFEFJyklR3ed5OZNNydE00N/\n8+fPZ9fOXRw9c4SymSNPPtt/dA9XXXXVhB8pNZhp06aR6kjjTG0bMycNbC8fbL3TvScbWLhwgWYT\n1C6v8D7xxBNDbqvHVfkMcFRK+SMdjjVqGzdt4mh9I609ngHPe/x+dtVc4Jbbbku4Gshwrr76apov\n1hCMJIcKBQM0Xahm8dVXx7lk2lu4cCFBb3DUS+jVVl4kxZbC7NlxGQegKSEEd9x5BwdPHKB7hMRx\nTa1NVNZWsmnzxB3eOBwhBKtWX8eeEyNfF4FgiAOnGlm56is6lGxkWg93XAXcC6wTQuwXQuwTQhji\nKnA6nVy7YgU7a8LrpUYXst57/gJz5841ZBInLdntdvILC2lrCi8j197cQE5OjuGnx8eCyWRi7Zq1\nnNh7esRtpZSc3Hua9WvXJ+wHf25uLtcuX86uQzuG3CYkQ3x+YCubNm2Ke3uylsrKymhs89Dc4Rl2\nu+PnmiksLDRM3NB6VMx2KaVZSrlISnm1lHKxlPI9LY85FitWruRkYxOeyKpLwVCIAxfr+UoCjHK4\nElOnlNLZEq6ddLY2MkXjRYWNZOHChfR0emipH364X2NtE4REQtbW+1uzdg1tna3UXKwe9PXjlcew\nOx0J1ccwGIvFwuJrruHgqeFr7QdOt7D8WuOkk0isBsIxcjqdzJo1k6P1DQCcaW4hJyeH3NzRd6Il\nkrzcXHw94YRQvu5O8vKS5zyYzWZWrVzF6UPDz0Y9ffAsq1etTri29ctZrVZuufUWdh/e2Zd6Ibos\nntfXy4Fj+7j99uRorly8+BqOnG0ZcsJSW2cvze0eQ03eS+yrcxQWXr2YE83h6cMnmlpYmOA1kOG4\nXC4CvvACJX5fb1I0w/S3ZMkS6qrq6e0ZfDZqd0cPzXUtSdOpPnv2bLKyszh+duBw0IPHDzBv3ry4\nLmunp7y8PNxuN+fq2gd9/WhVM2VlZYYaZJD0gX369OnUd7TT6w9wtqU14W+xh2OxWPqW0QsFg7ov\nxBxvdrudOXPncO7EpeaH/gtZVx2rZuHChUl1Xm7cdCOHTx7sy/7o8Xo4de4k69avi3PJ9DV/4SJO\n1AyeP+ZETTsLFhrrwz7pA7vVaqWooIBDF+uwp6aSnp4e7yLFjZQSIrfWQghdFwU3iiXXLKHmxJcn\n6EgpqTl5nmsWXxOHUsVPSUkJOTk5VEVy+J84e4y5c+cm3f/JvHllnKxp/dL/RFtXLx09PkoN1h+V\n9IEdoGTyZA5euDhhU9HGis/nw2QO306aLBZ8vviuah8PpaWl+L1+OloGphloqW/FarFRVFQUp5LF\nz/Jrl3O65hRSSk5Xn2L5tcvjXSTdZWdnk5pqp75lYPrvyto2Zs6Yabg+F2OVJk7y8gto6fGQZ9A1\nGfXi9XqxWMKrQpktVnp7tV0Q3IhMJhNlZWWcPzMwV/2FyovMnz8/KToLLzd79mzqm+poaGlAQtJW\ngGbNuorKCwObY87WdXPV7DlxKtHQVGCHvixsWVlZcS5JfHk8HszWcPuxyWLF4xl+7G6imjtnLg3V\nA4e31Z1rZO6cuXEqUXzZbDaKi4o5cGwfM6ZPT8oPN4Bp02dQ03ipxi6lpKa+3ZArRanATng0SP/v\nycrj8WCyhAO72WKjJ0kDe2lpKW3N7fh6w01RPZ099Pb0JmUzTFTJpBLO19UwaXLipQ4YrSlTpnC+\nob1v2GNjmwe73W7IBHkqsBNuP/v6178et9VOjKK3txdzpCnGYrUlZVMMhEcHlZQU03ghnMq48UIz\nU6ZOMVw7qp7y8vIAknaOB4DD4SDN6aS5PVzhudjUZdgcOcl7pfZjMplYsGABVqs13kWJK78/gCmS\nd16YTATGuNpUIpk6ZRotdeHUzi11rUwtNd7ttp4KCgqwWCxJHdghPEroYnM4Qe3FFg/FJcasDKrA\nrvSRXBrKlazDHaMmTZpEW2N4ZExbQ7tha2Z6KS4u5oknnki6SWuXKywqobEtfCfb2N5LoUEHXKjA\nrvSxmC3IyPTxUCiUkKtGjVZhYSFtTW3IkKStuT1pZlkqw8vLy6Opw4uUkqbWLvLz8+NdpEGpwK70\nSU1NIRBJ2xsM+ElNSY1zieInXDMVNNe3YHfYE2aVJGV8cnNzaenw4PEGQJhwOBzxLtKgVGBX+jid\nToL+cJ6UoM9LWlripmMdiRCCzMxM6qobyM5J/FW0lNHJyMigoyuc9CsrI92wQz9VYFf6uFwuAt5w\nj7/f25P0wz8zszJprG0iKzO55zcol5jNZtKcDs43dvbNfzEiFdiVPhkZGfT2hHv8vT3dZGRkxLlE\n8ZWZkUlzXQsZ6cl9HpSB0t0uLjR2kZ6hArsyAWRmZtLT1YGUkp6ujqSfiet2uZEhmfR3LspALpeL\n+pYeXG7jJkJTgV3p43A4MAkTvd2dBAOBpB/aFl3yLZGXflPGzpnmor3ba+j/DxXYlQEyMjNpbagl\nPSPDsB1DeklNDY8KstvtcS6JYiQOZzigG/m6UIFdGSArK4vWhguG7hjSSzSwR78rClwK6CqwKxNG\nVlYm7U11ZCd5+zrQt1KSGsOu9DcRPvBVYFcGyEhPx+/tJSPDuB1DeiksLOThhx9O+k5kZaDoB76R\nl0g0zuqriiGoFMaXmEwmpk+fHu9iKAYzEQK7qrErA0RHgBh1qrSixFs0C6zFYtx6sQrsygDRdkMV\n2BVlcNH/ESOn+VaBXRkgPT0ds9mcdKvQK8polZSU8Pjjjxs6+6kwQs5tIYQ0QjkURVEmisiaCYNO\nNlE1dkVRlASjAruiKEqCUYFdURQlwajAriiKkmBUYFcURUkwKrAriqIkGBXYFUVREowK7IqiKAlG\n08AuhHhaCFEvhDik5XEURVGUS7SusT8L3KjxMWKivLw83kUwDHUuLlHn4hJ1Li4x+rnQNLBLKbcB\nrVoeI1aM/ofSkzoXl6hzcYk6F5cY/VyoNnZFUZQEowK7oihKgtE8u6MQohR4U0q5YJhtVGpHRVGU\nMRoqu6MeS4CIyNeQhiqcoiiKMnZaD3d8EfgcmCWEqBZCPKTl8RRFURSDLLShKIqixE7CdJ4KIb4r\nhDgc+Xos8tz/FEIcFELsF0K8J4QoiDxvEUI8J4Q4JIQ4IoT458jzdiHEW0KIY5H9/N+XHaNACPG+\nEGKyEGKvEGJfZLu/0P83Htpg56Lfa38vhAgJIbL6Pfc/hBCnIr/3xn7PL46co5NCiB9etp9kOxf/\nHrnr7BjkGAl3LoQQpUKInsjvsk8I8bN+2ybVdTHCuTDmdSGlnPBfwDzgEJACmIEPgGlAWr9tHgWe\njDz+JvBi5LEdOAtMjjy+PvK8BfgMuLHfPh4Evhd5zRp5zhF5f0G8z8MQ5+JDYFrktRLgvUh5syLP\nzQH2R36nKcBpLt3J7QKWRh6/k+TnYhmQD3QMcpxEPBelwKEh9pVs18Vw58KQ10Wi1NjnALuklF4p\nZZBwQP6qlLKr3zZOIBR5LAGnEMJM+ER7Cf9hPFLKLQBSygCwj/AfOmoT8K6UMiCl9EeeszNC57DO\nLj8XW4CvRl77AfAPl21/O/C7yO9UBZwClkXublxSyi8i2z0P3NHvfUlzLgCklLullPVDHCcRzwUM\nUv4kvS5giPIb9bpIlMBeAXxFCJEphHAANwGT4NKtEvAt4P+KbP8q0ANcBKqA/1dK2dZ/h0KIDOBW\n4OPIzyZglpTyeOTnEiHEQeAc8B9Syjptf8VRG/RcCCFuA85LKQ9ftn0xUNPv59rIc8XA+X7Pn488\nl4znYkgJfC4ApkSaDz4VQqyOPJeM1wUMfi6GFO9zocdwR81JKY8LIf6D8C1VF+Hb6WDktX8F/lUI\n8U+Em2MeJ1wLCwAFQDawVQjxUaSWRqQm/yLww+hzwHLCt6DRY54HFkZqMK8LIV6VUjZq/KuOaIhz\nkQr8C3BDjA6jzsUliXYuorXJC8BkKWWrEGIx8CchxNwRDpNU5+KyFoHLxfVcJEqNHSnls1LKJVLK\nNVZ3f74AAAW7SURBVEAbcPKyTV7k0u3Wt4D3pJShyMndDizpt+0vgRNSyp/0e24z4ba3y49bR6QG\nEJNfJAYGORcVhNuMDwohzhJuXtonhMgjXCud3O/tJZHnaonc9Vz2PCTfuRhOop2LvUKIPCmlX0rZ\nGnnfPuAMMIvkui5GOhfDie+50LIBX88vIDfyfTJwFHADM/q9/ijw+8jjfwSejjx2AkeAssjP/w68\nMsj+twPOyONiIDXyOBM4AcyL9zkY7lxc9vpZIDPyeC7hGosNmMrADsOdhO9uBP06yZLxXPTbvjNJ\nroscwBR5PI1wE1VGkl4XQ54Lo14XCdEUE/FaZHiSH/grKWWHEOIZIcQswp2m54C/jGz738CzQoiK\nyM9PSykrhBDFhG/Hjgkh9hPuZP0p8AbgkVJ2R7afA/x/QogQ4Yv7P6WUR/T4JUfpS+fistclkdtM\nKeVRIcTvCV/c0e2jkxv+GniO8G3qO1LK94UQOSThuYjcun8LsEf6bJ4CfkaCngvgOuB/CiF8hP9/\n/kJe6odKquuCYc6FUa8LNUFpFIQQ9wLFUsr/jHdZ4k2di0vUubhEnYtLjHAuVGBXFEVJMAnTeaoo\niqKEqcCuKIqSYFRgVxRFSTAqsCuKoiQYFdgVRVESjArsiqIoCUYFdkV3QohtY9i2VAgxWFKmKznu\nUhHOzR/9umOE7TuHeP4vhBD3DfO+64UQK0ZRnu8LIf5u5JIrytgk0sxTZYKQUo6YHe/yt8To0IeB\na6SUoUgypoNCiDeklKEhth/0uFLKX4xwnDWEk0vtuOKSKso4qBq7orv+NWEhxD+J8Go8+0VkxSoh\nxDVCiAORtA5/PcK+UiKpIw6J8Co1a4baVkrZ2y+I27mUn3+Y3Yt/j5TlcyFEbuTJvpq2EOIxEV6F\n64AQ4kUhRCnh1BV/G0nzuipy1/FxZJsPhRAlgxxokRBiR2Sb14QQ6ZHnl4rwKmD7hBD/Gb17EUJs\nEUIs6Pf+rUKI+SP8PkqSUIFdiYdo/pXNhHPeL5VSXg1Ep2A/A/x15LmR/DUQklIuIJyz49dCCNtQ\nGwshlkVyBB0E/nKY2jqEE8R9LqVcBGwF/myQbf4JWBTZ5i+llOeAnwM/kFIullJuB34CPBvZ5sXI\nz5f7NfAPkW0qgO9Hnn8G+DMp5WLCqaijdxFPAw9FfqeZQIocPI+4koRUYFfiaT3hgOcFkFK2RWqq\n6ZGACPDCCPtYDfwm8v4ThBdOGTKlqgyveFMGLAX+ZbgPAcArpXwn8ngv4bSulzsIvBjJDxIcYj8r\ngJcij18AVvV/UQjhJvw7R/sefg1cFzkXaVLK3ZHnX+z3tleAm0V47YCHCSflUhRABXYl8Yxq2bHI\nh0AXUDbMZv5+j4MM3id1M+EMoIuBL0R45ZwvHW4URRqq3EMtyeYhvFDEHcDdwG9HcQwlSajArsRD\nNFh9CDwkhLADCCEypZTtQJsQYmVkm3tH2NfW6DaRFM2TCOe7/vJBhZgSqeESaQu/inANf6RyDmey\nDK+T+8+E1wBIAzojj6M+J7yAOsB9kTL3iaSMbRFCRGvy9wNbIueiQwixNPL8Ny479tPAj4HdkW0V\nBVCjYpT4kACRPN4LgT1CCC/hRRv+lXDTwjOR/NUfjLCvnwFPCiEOEa5hf1teWjj4cquBf+6XV/v/\nkFK2jFTOoQghLMBvIk0pAvhRZB2AN4FXRXgNzUcjX88JIf5PoJFI2/hlHgR+HvmQq+y3zXeAp4QQ\n0UWX+wK4lHKfEKIDeHa4cirJR6XtVRQDE0I4ows2iPC6vQVSyu9Ffi4CPpFSzo5nGRXjUU0ximJs\nN0eGgh4mfMfx7wBCiPsJj5P/l3gWTjEmVWNXJgQhxEbgP7jUPCKASinl18azbWT7nYTXOY1uK4H7\nDbaUm6KMmgrsiqIoCUY1xSiKoiQYFdgVRVESjArsiqIoCUYFdkVRlASjAruiKEqC+f8BVw1FbAlK\nBu8AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_data = df_data[ df_data.data2.str.contains('^\\[.*\\]$',na=True,regex=True) == False ]\n", "new_data.rename(columns={ \"data1\": gene_name , \"data2\": clinical_feature }, inplace=True)\n", " \n", "sns.violinplot( x=new_data[clinical_feature], y=new_data[gene_name], palette=\"Pastel1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BigQuery to Compute statistical association \n", "\n", "The Kruskal-Wallis score (H) is computed by using the following equation: \n", "\n", "$$H = \\frac{(N-1)\\sum_{i=1}^{g} n_i (\\bar{r_{i}} -\\bar{r} )^2 }{ \\sum_{i=1}^{g} \\sum_{j=1}^{n_i} (r_{ij}-\\bar{r})^2 }$$\n", "where\n", "\n", "- $n_i$ is the number of observations in category $i$\n", "- $r_{ij}$ is the rank (among all participants) of the gene expression of participant $j$ that belongs to category $i$\n", "- $N$ is the total number of participants considered in the test\n", "- $\\bar{r_i}$ is the averange rank of gene expression values for particpants in category $i$\n", "- $\\bar{r}$ is the average of all $r_{ij}$\n", "\n", "To avoid reading that table multiple times, we rearranged the equations above as :\n", "\n", "$$H = (N-1)\\frac{ \\sum_{i=1}^{g}S_i^2/n_i - (\\sum_{i=1}^{g}S_i)^2 / N }{ \\sum_{i=1}^{g}Q_i - (\\sum_{i=1}^{g}S_i)^2 / N }$$\n", "\n", "Where $S_i = \\sum_{j=1}^{n_i}r_{ij}$ and $Q_i = \\sum_{j=1}^{n_i}r_{ij}^2$\n", "\n", "The following query string computes $S_i$ and $Q_i$:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "summ_table = \"\"\"\n", "summ_table AS (\n", "SELECT \n", " COUNT( ParticipantBarcode) AS ni,\n", " SUM( rnkdata ) AS Si,\n", " SUM( rnkdata * rnkdata ) AS Qi,\n", " data2\n", "FROM ( \n", " SELECT \n", " (RANK() OVER (ORDER BY data1 ASC)) + (COUNT(*) OVER ( PARTITION BY CAST(data1 as STRING)) - 1)/2.0 AS rnkdata,\n", " data2, ParticipantBarcode\n", " FROM\n", " table_data \n", " WHERE data2 IN ( SELECT data2 from table_data GROUP BY data2 HAVING count(data2)>{0} ) \n", ")\n", "GROUP BY\n", " data2\n", ")\n", "\"\"\".format( str(MinSampleSize) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The query above ingnores the categories that have a number of participants smaller or equal than 'MinSampleSize'. Moreover, the gene expression is ranked, assigning **average** of ranks to the similar values( https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rankdata.html). Finally, The Kruskall-Wallis score ($H$) is computed by the following BigQuery string. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " in runQuery ... \n", " the results for this query were previously cached \n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NgroupsNsamplesHscore
055137.857342
\n", "
" ], "text/plain": [ " Ngroups Nsamples Hscore\n", "0 5 513 7.857342" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query_hscore = \"\"\"\n", "SELECT \n", " Ngroups,\n", " N as Nsamples, \n", " (N-1)*( sumSi2overni - (sumSi *sumSi)/N ) / ( sumQi - (sumSi *sumSi)/N ) AS Hscore \n", "FROM (\n", " SELECT \n", " SUM( ni ) As N, \n", " SUM( Si ) AS sumSi,\n", " SUM( Qi ) AS sumQi,\n", " SUM( Si * Si / ni ) AS sumSi2overni,\n", " COUNT ( data2 ) AS Ngroups \n", " FROM summ_table\n", " )\n", "WHERE \n", " Ngroups > 1\n", "ORDER BY Hscore DESC\n", "\"\"\"\n", "\n", "sql = ( sql_data + ',\\n' + summ_table + query_hscore )\n", "df_hscore = regulome.runQuery ( bqclient, sql, [] , dryRun=False )\n", "df_hscore" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test our implementation we can use the 'kruskalwallis' function available in python:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "KruskalResult(statistic=7.8573421918435997, pvalue=0.096945947430261636)\n" ] } ], "source": [ "CategoryData = []\n", "CategoryNames = [] \n", "\n", "for name, group in new_data.groupby( clinical_feature ) :\n", " data = group[ gene_name ].values \n", " \n", " if ( len( data ) > MinSampleSize ) :\n", " \n", " CategoryData.append( data )\n", " CategoryNames.append( name )\n", " \n", "if len( CategoryData ) > 1 :\n", " print( mstats.kruskalwallis( *[ mydata for mydata in CategoryData ] ) )\n", " \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }