{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating TCGA cohorts (part 1)\n", "\n", "This notebook will show you how to create a TCGA cohort using the publicly available TCGA BigQuery tables that the [ISB-CGC](http://isb-cgc.org) project has produced based on the open-access [TCGA](http://cancergenome.nih.gov/) data available at the [Data Portal](https://tcga-data.nci.nih.gov/tcga/). You will need to have access to a Google Cloud Platform (GCP) project in order to use BigQuery. If you don't already have one, you can sign up for a [free-trial](https://cloud.google.com/free-trial/) or contact [us](mailto://info@isb-cgc.org) and become part of the community evaluation phase of our [Cancer Genomics Cloud pilot](https://cbiit.nci.nih.gov/ncip/nci-cancer-genomics-cloud-pilots).\n", "\n", "We are not attempting to provide a thorough BigQuery or IPython tutorial here, as a wealth of such information already exists. Here are some links to some resources that you might find useful: \n", "* [BigQuery](https://cloud.google.com/bigquery/what-is-bigquery), \n", "* the BigQuery [web UI](https://bigquery.cloud.google.com/) where you can run queries interactively, \n", "* [IPython](http://ipython.org/) (now known as [Jupyter](http://jupyter.org/)), and \n", "* [Cloud Datalab](https://cloud.google.com/datalab/) the recently announced interactive cloud-based platform that this notebook is being developed on. \n", "\n", "There are also many tutorials and samples available on github (see, in particular, the [datalab](https://github.com/GoogleCloudPlatform/datalab) repo and the [Google Genomics]( https://github.com/googlegenomics) project).\n", "\n", "OK then, let's get started! In order to work with BigQuery, the first thing you need to do is import the bigquery module:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import gcp.bigquery as bq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next thing you need to know is how to access the specific tables you are interested in. BigQuery tables are organized into datasets, and datasets are owned by a specific GCP project. The tables we will be working with in this notebook are in a dataset called **`tcga_201607_beta`**, owned by the **`isb-cgc`** project. A full table identifier is of the form `:.`. Let's start by getting some basic information about the tables in this dataset:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 6322 rows 1729204 bytes Annotations\n", " 23797 rows 6382147 bytes Biospecimen_data\n", " 11160 rows 4201379 bytes Clinical_data\n", " 2646095 rows 333774244 bytes Copy_Number_segments\n", "3944304319 rows 445303830985 bytes DNA_Methylation_betas\n", " 382335670 rows 43164264006 bytes DNA_Methylation_chr1\n", " 197519895 rows 22301345198 bytes DNA_Methylation_chr10\n", " 235823572 rows 26623975945 bytes DNA_Methylation_chr11\n", " 198050739 rows 22359642619 bytes DNA_Methylation_chr12\n", " 97301675 rows 10986815862 bytes DNA_Methylation_chr13\n", " 123239379 rows 13913712352 bytes DNA_Methylation_chr14\n", " 124566185 rows 14064712239 bytes DNA_Methylation_chr15\n", " 179772812 rows 20296128173 bytes DNA_Methylation_chr16\n", " 234003341 rows 26417830751 bytes DNA_Methylation_chr17\n", " 50216619 rows 5669139362 bytes DNA_Methylation_chr18\n", " 211386795 rows 23862583107 bytes DNA_Methylation_chr19\n", " 279668485 rows 31577200462 bytes DNA_Methylation_chr2\n", " 86858120 rows 9805923353 bytes DNA_Methylation_chr20\n", " 35410447 rows 3997986812 bytes DNA_Methylation_chr21\n", " 70676468 rows 7978947938 bytes DNA_Methylation_chr22\n", " 201119616 rows 22705358910 bytes DNA_Methylation_chr3\n", " 159148744 rows 17968482285 bytes DNA_Methylation_chr4\n", " 195864180 rows 22113162401 bytes DNA_Methylation_chr5\n", " 290275524 rows 32772371379 bytes DNA_Methylation_chr6\n", " 240010275 rows 27097948808 bytes DNA_Methylation_chr7\n", " 164810092 rows 18607886221 bytes DNA_Methylation_chr8\n", " 81260723 rows 9173717922 bytes DNA_Methylation_chr9\n", " 98082681 rows 11072059468 bytes DNA_Methylation_chrX\n", " 2330426 rows 263109775 bytes DNA_Methylation_chrY\n", " 1867233 rows 207365611 bytes Protein_RPPA_data\n", " 5356089 rows 5715538107 bytes Somatic_Mutation_calls\n", " 5738048 rows 657855993 bytes mRNA_BCGSC_GA_RPKM\n", " 38299138 rows 4459086535 bytes mRNA_BCGSC_HiSeq_RPKM\n", " 44037186 rows 5116942528 bytes mRNA_BCGSC_RPKM\n", " 16794358 rows 1934755686 bytes mRNA_UNC_GA_RSEM\n", " 211284521 rows 24942992190 bytes mRNA_UNC_HiSeq_RSEM\n", " 228078879 rows 26877747876 bytes mRNA_UNC_RSEM\n", " 11997545 rows 2000881026 bytes miRNA_BCGSC_GA_isoform\n", " 4503046 rows 527101917 bytes miRNA_BCGSC_GA_mirna\n", " 90237323 rows 15289326462 bytes miRNA_BCGSC_HiSeq_isoform\n", " 28207741 rows 3381212265 bytes miRNA_BCGSC_HiSeq_mirna\n", " 102234868 rows 17290207488 bytes miRNA_BCGSC_isoform\n", " 32710787 rows 3908314182 bytes miRNA_BCGSC_mirna\n", " 26763022 rows 3265303352 bytes miRNA_Expression\n" ] } ], "source": [ "d = bq.DataSet('isb-cgc:tcga_201607_beta')\n", "for t in d.tables():\n", " print '%10d rows %12d bytes %s' \\\n", " % (t.metadata.rows, t.metadata.size, t.name.table_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we are going to look at a few different ways that we can use the information in these tables to create cohorts. Now, you maybe asking what we mean by \"cohort\" and why you might be interested in *creating* one, or maybe what it even means to \"create\" a cohort. The TCGA dataset includes clinical, biospecimen, and molecular data from over 10,000 cancer patients who agreed to be a part of this landmark research project to build [The Cancer Genome Atlas](http://cancergenome.nih.gov/). This large dataset was originally organized and studied according to [cancer type](http://cancergenome.nih.gov/cancersselected) but now that this multi-year project is nearing completion, with over 30 types of cancer and over 10,000 tumors analyzed, **you** have the opportunity to look at this dataset from whichever angle most interests you. Maybe you are particularly interested in early-onset cancers, or gastro-intestinal cancers, or a specific type of genetic mutation. This is where the idea of a \"cohort\" comes in. The original TCGA \"cohorts\" were based on cancer type (aka \"study\"), but now you can define a cohort based on virtually any clinical or molecular feature by querying these BigQuery tables. A cohort is simply a list of samples, using the [TCGA barcode](https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode) system. Once you have created a cohort you can use it in any number of ways: you could further explore the data available for one cohort, or compare one cohort to another, for example.\n", "\n", "In the rest of this tutorial, we will create several different cohorts based on different motivating research questions. We hope that these examples will provide you with a starting point from which you can build, to answer your own research questions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploring the Clinical data table\n", "Let's start by looking at the clinical data table. The TCGA dataset contains a few very basic clinical data elements for almost all patients, and contains additional information for some tumor types only. For example smoking history information is generally available only for lung cancer patients, and BMI (body mass index) is only available for tumor types where that is a known significant risk factor. Let's take a look at the clinical data table and see how many different pieces of information are available to us:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%bigquery schema --table isb-cgc:tcga_201607_beta.Clinical_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's a lot of fields! We can also get at the schema programmatically:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " This table has 70 fields. \n", " The first few field names and types are: \n", " [u'ParticipantBarcode', u'Study', u'Project', u'ParticipantUUID', u'TSSCode']\n", " [u'STRING', u'STRING', u'STRING', u'STRING', u'STRING']\n" ] } ], "source": [ "table = bq.Table('isb-cgc:tcga_201607_beta.Clinical_data')\n", "if ( table.exists() ):\n", " fieldNames = map(lambda tsf: tsf.name, table.schema)\n", " fieldTypes = map(lambda tsf: tsf.data_type, table.schema)\n", " print \" This table has %d fields. \" % ( len(fieldNames) )\n", " print \" The first few field names and types are: \" \n", " print \" \", fieldNames[:5]\n", " print \" \", fieldTypes[:5]\n", "else: \n", " print \" There is no existing table called %s:%s.%s\" % ( table.name.project_id, table.name.dataset_id, table.name.table_id )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at these fields and see which ones might be the most \"interesting\", by looking at how many times they are filled-in (not NULL), or how much variation exists in the values. If we wanted to look at just a single field, \"tobacco_smoking_history\" for example, we could use a very simple query to get a basic summary:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
tobacco_smoking_historyn
 8161
1865
4799
2710
3568
557
\n", "
(rows: 6, time: 1.3s, 8KB processed, job: job_Tp9_oGNMvJ7f0ZcTwdt2TirZePk)
\n", " \n", " " ], "text/plain": [ "QueryResultsTable job_Tp9_oGNMvJ7f0ZcTwdt2TirZePk" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql \n", "\n", "SELECT tobacco_smoking_history, COUNT(*) AS n\n", "FROM [isb-cgc:tcga_201607_beta.Clinical_data]\n", "GROUP BY tobacco_smoking_history\n", "ORDER BY n DESC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But if we want to loop over *all* fields and get a sense of which fields might provide us with useful criteria for specifying a cohort, we'll want to automate that. We'll put a threshold on the minimum number of patients that we expect information for, and the maximum number of unique values (since fields such as the \"ParticipantBarcode\" will be unique for every patient and, although we will need that field later, it's probably not useful for defining a cohort)." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " The Clinical_data table describes a total of 11160 patients. \n", " > Study has 11160 values with 33 unique (BRCA occurs 1097 times) \n", " > age_at_initial_pathologic_diagnosis has 11109 values (mean=59, sigma=14) \n", " > batch_number has 11160 values (mean=203, sigma=135) \n", " > vital_status has 11156 values with 2 unique (Alive occurs 7534 times) \n", " > days_to_birth has 11041 values (mean=-21763, sigma=5266) \n", " > days_to_last_known_alive has 11102 values (mean=1037, sigma=1041) \n", " > gender has 11160 values with 2 unique (FEMALE occurs 5815 times) \n", " > year_of_initial_pathologic_diagnosis has 11030 values (mean=2008, sigma=4) \n", " > person_neoplasm_cancer_status has 10236 values with 2 unique (TUMOR FREE occurs 6507 times) \n", " > race has 9835 values with 5 unique (WHITE occurs 8186 times) \n", " \n", " Found 10 potentially interesting features: \n", " [u'Study', u'age_at_initial_pathologic_diagnosis', u'batch_number', u'vital_status', u'days_to_birth', u'days_to_last_known_alive', u'gender', u'year_of_initial_pathologic_diagnosis', u'person_neoplasm_cancer_status', u'race']\n" ] } ], "source": [ "numPatients = table.metadata.rows\n", "print \" The %s table describes a total of %d patients. \" % ( table.name.table_id, numPatients )\n", "\n", "# let's set a threshold for the minimum number of values that a field should have,\n", "# and also the maximum number of unique values\n", "minNumPatients = int(numPatients*0.80)\n", "maxNumValues = 50\n", "\n", "numInteresting = 0\n", "iList = []\n", "for iField in range(len(fieldNames)):\n", " aField = fieldNames[iField]\n", " aType = fieldTypes[iField]\n", " try:\n", " qString = \"SELECT {0} FROM [{1}]\".format(aField,table)\n", " query = bq.Query(qString)\n", " df = query.to_dataframe()\n", " summary = df[str(aField)].describe()\n", " if ( aType == \"STRING\" ):\n", " topFrac = float(summary['freq'])/float(summary['count'])\n", " if ( summary['count'] >= minNumPatients ):\n", " if ( summary['unique'] <= maxNumValues and summary['unique'] > 1 ):\n", " if ( topFrac < 0.90 ):\n", " numInteresting += 1\n", " iList += [aField]\n", " print \" > %s has %d values with %d unique (%s occurs %d times) \" \\\n", " % (str(aField), summary['count'], summary['unique'], summary['top'], summary['freq'])\n", " else:\n", " if ( summary['count'] >= minNumPatients ):\n", " if ( summary['std'] > 0.1 ):\n", " numInteresting += 1\n", " iList += [aField]\n", " print \" > %s has %d values (mean=%.0f, sigma=%.0f) \" \\\n", " % (str(aField), summary['count'], summary['mean'], summary['std'])\n", " except:\n", " pass\n", "\n", "print \" \"\n", "print \" Found %d potentially interesting features: \" % numInteresting\n", "print \" \", iList" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above helps us narrow down on which fields are likely to be the most useful, but if you have a specific interest, for example in menopause or HPV status, you can still look at those in more detail very easily: " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
menopause_statusn
Post (prior bilateral ovariectomy OR >12 mo since LMP with no prior hysterectomy)1291
Pre (<6 months since LMP AND no prior bilateral ovariectomy AND not on estrogen replacement)389
Peri (6-12 months since last menstrual period)82
Indeterminate (neither Pre or Postmenopausal)54
\n", "
(rows: 4, time: 0.8s, cached, job: job_R6HF_RzYdXcntPKVIy06TNbDi8Q)
\n", " \n", " " ], "text/plain": [ "QueryResultsTable job_R6HF_RzYdXcntPKVIy06TNbDi8Q" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT menopause_status, COUNT(*) AS n\n", "FROM [isb-cgc:tcga_201607_beta.Clinical_data]\n", "WHERE menopause_status IS NOT NULL\n", "GROUP BY menopause_status\n", "ORDER BY n DESC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We might wonder which specific tumor types have menopause information:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
Studyn
BRCA1007
UCEC517
CESC237
UCS55
\n", "
(rows: 4, time: 0.5s, cached, job: job_WIGSHA6zqYL1yChZi9jnwh-ZL-A)
\n", " \n", " " ], "text/plain": [ "QueryResultsTable job_WIGSHA6zqYL1yChZi9jnwh-ZL-A" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT Study, COUNT(*) AS n\n", "FROM [isb-cgc:tcga_201607_beta.Clinical_data]\n", "WHERE menopause_status IS NOT NULL\n", "GROUP BY Study\n", "ORDER BY n DESC" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
hpv_statushpv_callsn
Negative 664
PositiveHPV16238
PositiveHPV1841
PositiveHPV3325
PositiveHPV4524
\n", "
(rows: 5, time: 0.6s, cached, job: job_4BS0EXKldZBN8Y2QcbPA--vK6_g)
\n", " \n", " " ], "text/plain": [ "QueryResultsTable job_4BS0EXKldZBN8Y2QcbPA--vK6_g" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT hpv_status, hpv_calls, COUNT(*) AS n\n", "FROM [isb-cgc:tcga_201607_beta.Clinical_data]\n", "WHERE hpv_status IS NOT NULL\n", "GROUP BY hpv_status, hpv_calls\n", "HAVING n > 20\n", "ORDER BY n DESC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### TCGA Annotations\n", "\n", "An additional factor to consider, when creating a cohort is that there may be additional information that might lead one to exclude a particular patient from a cohort. In certain instances, patients have been redacted or excluded from analyses for reasons such as prior treatment, etc, but since different researchers may have different criteria for using or excluding certain patients or certain samples from their analyses, in many cases the data is still available while at the same time \"annotations\" may have been entered into a searchable [database](https://tcga-data.nci.nih.gov/annotations/). These annotations have also been uploaded into a BigQuery table and can be used in conjuction with the other BigQuery tables." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Early-onset Breast Cancer\n", "\n", "Now that we have a better idea of what types of information is available in the Clinical data table, let's create a cohort consisting of female breast-cancer patients, diagnosed at the age of 50 or younger." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this next code cell, we define several queries within a **`module`** which allows us to use them both individually and by reference in the final, main query. \n", "+ the first query, called **`select_on_annotations`**, finds all patients in the Annotations table which have either been 'redacted' or had 'unacceptable prior treatment'; \n", "+ the second query, **`select_on_clinical`** selects all female breast-cancer patients who were diagnosed at age 50 or younger, while also pulling out a few additional fields that might be of interest; and\n", "+ the final query joins these two together and returns just those patients that meet the clinical-criteria and do **not** meet the exclusion-criteria." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%sql --module createCohort_and_checkAnnotations\n", "\n", "DEFINE QUERY select_on_annotations\n", "SELECT\n", " ParticipantBarcode,\n", " annotationCategoryName AS categoryName,\n", " annotationClassification AS classificationName\n", "FROM\n", " [isb-cgc:tcga_201607_beta.Annotations]\n", "WHERE\n", " ( itemTypeName=\"Patient\"\n", " AND (annotationCategoryName=\"History of unacceptable prior treatment related to a prior/other malignancy\"\n", " OR annotationClassification=\"Redaction\" ) )\n", "GROUP BY\n", " ParticipantBarcode,\n", " categoryName,\n", " classificationName\n", "\n", "DEFINE QUERY select_on_clinical\n", "SELECT\n", " ParticipantBarcode,\n", " vital_status,\n", " days_to_last_known_alive,\n", " ethnicity,\n", " histological_type,\n", " menopause_status,\n", " race\n", "FROM\n", " [isb-cgc:tcga_201607_beta.Clinical_data]\n", "WHERE\n", " ( Study=\"BRCA\"\n", " AND age_at_initial_pathologic_diagnosis<=50\n", " AND gender=\"FEMALE\" )\n", "\n", "SELECT\n", " c.ParticipantBarcode AS ParticipantBarcode\n", "FROM (\n", " SELECT\n", " a.categoryName,\n", " a.classificationName,\n", " a.ParticipantBarcode,\n", " c.ParticipantBarcode,\n", " FROM ( $select_on_annotations ) AS a\n", " OUTER JOIN EACH \n", " ( $select_on_clinical ) AS c\n", " ON\n", " a.ParticipantBarcode = c.ParticipantBarcode\n", " WHERE\n", " (a.ParticipantBarcode IS NOT NULL\n", " OR c.ParticipantBarcode IS NOT NULL)\n", " ORDER BY\n", " a.classificationName,\n", " a.categoryName,\n", " a.ParticipantBarcode,\n", " c.ParticipantBarcode )\n", "WHERE\n", " ( a.categoryName IS NULL\n", " AND a.classificationName IS NULL\n", " AND c.ParticipantBarcode IS NOT NULL )\n", "ORDER BY\n", " c.ParticipantBarcode" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we explicitly call just the first query in the module, and we get a list of 212 patients with one of these disqualifying annotations:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ParticipantBarcodecategoryNameclassificationName
0TCGA-01-0629Tumor tissue origin incorrectRedaction
1TCGA-13-1479Tumor tissue origin incorrectRedaction
2TCGA-01-0638Tumor tissue origin incorrectRedaction
3TCGA-33-4579Tumor tissue origin incorrectRedaction
4TCGA-GN-A261Tumor tissue origin incorrectRedaction
5TCGA-66-2751Genotype mismatchRedaction
6TCGA-66-2752Genotype mismatchRedaction
7TCGA-66-2750Genotype mismatchRedaction
8TCGA-66-2746Genotype mismatchRedaction
9TCGA-66-2747Genotype mismatchRedaction
10TCGA-35-3621Genotype mismatchRedaction
11TCGA-02-0002Genotype mismatchRedaction
12TCGA-02-0117Genotype mismatchRedaction
13TCGA-08-0384Genotype mismatchRedaction
14TCGA-E2-A1IPGenotype mismatchRedaction
15TCGA-14-0784Genotype mismatchRedaction
16TCGA-14-1036Genotype mismatchRedaction
17TCGA-06-0748Genotype mismatchRedaction
18TCGA-02-2488Genotype mismatchRedaction
19TCGA-14-1824Genotype mismatchRedaction
20TCGA-PN-A8M9Genotype mismatchRedaction
21TCGA-12-1601Subject withdrew consentRedaction
22TCGA-12-0653Subject withdrew consentRedaction
23TCGA-32-2498Subject withdrew consentRedaction
24TCGA-AF-3912Subject withdrew consentRedaction
25TCGA-A6-2670Subject withdrew consentRedaction
26TCGA-06-0131Subject withdrew consentRedaction
27TCGA-AN-A0FGSubject identity unknownRedaction
28TCGA-AN-A0FESubject identity unknownRedaction
29TCGA-F4-6857Subject identity unknownRedaction
............
182TCGA-AP-A053History of unacceptable prior treatment relate...Notification
183TCGA-AX-A06DHistory of unacceptable prior treatment relate...Notification
184TCGA-AX-A1CPHistory of unacceptable prior treatment relate...Notification
185TCGA-AX-A1CRHistory of unacceptable prior treatment relate...Notification
186TCGA-AX-A2H8History of unacceptable prior treatment relate...Notification
187TCGA-AX-A2HFHistory of unacceptable prior treatment relate...Notification
188TCGA-AX-A3G3History of unacceptable prior treatment relate...Notification
189TCGA-B5-A0KBHistory of unacceptable prior treatment relate...Notification
190TCGA-BG-A221History of unacceptable prior treatment relate...Notification
191TCGA-D1-A3JPHistory of unacceptable prior treatment relate...Notification
192TCGA-EY-A1G8History of unacceptable prior treatment relate...Notification
193TCGA-L5-A88THistory of unacceptable prior treatment relate...Notification
194TCGA-WB-A820History of unacceptable prior treatment relate...Notification
195TCGA-XK-AAJ3History of unacceptable prior treatment relate...Notification
196TCGA-EJ-7312History of unacceptable prior treatment relate...Notification
197TCGA-96-A4JKHistory of unacceptable prior treatment relate...Notification
198TCGA-2G-AAFEHistory of unacceptable prior treatment relate...Notification
199TCGA-IC-A6RFHistory of unacceptable prior treatment relate...Notification
200TCGA-BA-4075History of unacceptable prior treatment relate...Notification
201TCGA-06-6391History of unacceptable prior treatment relate...Notification
202TCGA-5L-AAT1History of unacceptable prior treatment relate...Notification
203TCGA-HT-A619History of unacceptable prior treatment relate...Notification
204TCGA-T1-A6J8History of unacceptable prior treatment relate...Notification
205TCGA-BG-A0M8History of unacceptable prior treatment relate...Notification
206TCGA-XK-AAK1History of unacceptable prior treatment relate...Notification
207TCGA-BH-A0B6History of unacceptable prior treatment relate...Notification
208TCGA-BG-A0MSHistory of unacceptable prior treatment relate...Notification
209TCGA-AR-A2LRHistory of unacceptable prior treatment relate...Notification
210TCGA-BH-A1F5History of unacceptable prior treatment relate...Notification
211TCGA-DM-A286Inadvertently shippedRedaction
\n", "

212 rows × 3 columns

\n", "
" ], "text/plain": [ " ParticipantBarcode categoryName \\\n", "0 TCGA-01-0629 Tumor tissue origin incorrect \n", "1 TCGA-13-1479 Tumor tissue origin incorrect \n", "2 TCGA-01-0638 Tumor tissue origin incorrect \n", "3 TCGA-33-4579 Tumor tissue origin incorrect \n", "4 TCGA-GN-A261 Tumor tissue origin incorrect \n", "5 TCGA-66-2751 Genotype mismatch \n", "6 TCGA-66-2752 Genotype mismatch \n", "7 TCGA-66-2750 Genotype mismatch \n", "8 TCGA-66-2746 Genotype mismatch \n", "9 TCGA-66-2747 Genotype mismatch \n", "10 TCGA-35-3621 Genotype mismatch \n", "11 TCGA-02-0002 Genotype mismatch \n", "12 TCGA-02-0117 Genotype mismatch \n", "13 TCGA-08-0384 Genotype mismatch \n", "14 TCGA-E2-A1IP Genotype mismatch \n", "15 TCGA-14-0784 Genotype mismatch \n", "16 TCGA-14-1036 Genotype mismatch \n", "17 TCGA-06-0748 Genotype mismatch \n", "18 TCGA-02-2488 Genotype mismatch \n", "19 TCGA-14-1824 Genotype mismatch \n", "20 TCGA-PN-A8M9 Genotype mismatch \n", "21 TCGA-12-1601 Subject withdrew consent \n", "22 TCGA-12-0653 Subject withdrew consent \n", "23 TCGA-32-2498 Subject withdrew consent \n", "24 TCGA-AF-3912 Subject withdrew consent \n", "25 TCGA-A6-2670 Subject withdrew consent \n", "26 TCGA-06-0131 Subject withdrew consent \n", "27 TCGA-AN-A0FG Subject identity unknown \n", "28 TCGA-AN-A0FE Subject identity unknown \n", "29 TCGA-F4-6857 Subject identity unknown \n", ".. ... ... \n", "182 TCGA-AP-A053 History of unacceptable prior treatment relate... \n", "183 TCGA-AX-A06D History of unacceptable prior treatment relate... \n", "184 TCGA-AX-A1CP History of unacceptable prior treatment relate... \n", "185 TCGA-AX-A1CR History of unacceptable prior treatment relate... \n", "186 TCGA-AX-A2H8 History of unacceptable prior treatment relate... \n", "187 TCGA-AX-A2HF History of unacceptable prior treatment relate... \n", "188 TCGA-AX-A3G3 History of unacceptable prior treatment relate... \n", "189 TCGA-B5-A0KB History of unacceptable prior treatment relate... \n", "190 TCGA-BG-A221 History of unacceptable prior treatment relate... \n", "191 TCGA-D1-A3JP History of unacceptable prior treatment relate... \n", "192 TCGA-EY-A1G8 History of unacceptable prior treatment relate... \n", "193 TCGA-L5-A88T History of unacceptable prior treatment relate... \n", "194 TCGA-WB-A820 History of unacceptable prior treatment relate... \n", "195 TCGA-XK-AAJ3 History of unacceptable prior treatment relate... \n", "196 TCGA-EJ-7312 History of unacceptable prior treatment relate... \n", "197 TCGA-96-A4JK History of unacceptable prior treatment relate... \n", "198 TCGA-2G-AAFE History of unacceptable prior treatment relate... \n", "199 TCGA-IC-A6RF History of unacceptable prior treatment relate... \n", "200 TCGA-BA-4075 History of unacceptable prior treatment relate... \n", "201 TCGA-06-6391 History of unacceptable prior treatment relate... \n", "202 TCGA-5L-AAT1 History of unacceptable prior treatment relate... \n", "203 TCGA-HT-A619 History of unacceptable prior treatment relate... \n", "204 TCGA-T1-A6J8 History of unacceptable prior treatment relate... \n", "205 TCGA-BG-A0M8 History of unacceptable prior treatment relate... \n", "206 TCGA-XK-AAK1 History of unacceptable prior treatment relate... \n", "207 TCGA-BH-A0B6 History of unacceptable prior treatment relate... \n", "208 TCGA-BG-A0MS History of unacceptable prior treatment relate... \n", "209 TCGA-AR-A2LR History of unacceptable prior treatment relate... \n", "210 TCGA-BH-A1F5 History of unacceptable prior treatment relate... \n", "211 TCGA-DM-A286 Inadvertently shipped \n", "\n", " classificationName \n", "0 Redaction \n", "1 Redaction \n", "2 Redaction \n", "3 Redaction \n", "4 Redaction \n", "5 Redaction \n", "6 Redaction \n", "7 Redaction \n", "8 Redaction \n", "9 Redaction \n", "10 Redaction \n", "11 Redaction \n", "12 Redaction \n", "13 Redaction \n", "14 Redaction \n", "15 Redaction \n", "16 Redaction \n", "17 Redaction \n", "18 Redaction \n", "19 Redaction \n", "20 Redaction \n", "21 Redaction \n", "22 Redaction \n", "23 Redaction \n", "24 Redaction \n", "25 Redaction \n", "26 Redaction \n", "27 Redaction \n", "28 Redaction \n", "29 Redaction \n", ".. ... \n", "182 Notification \n", "183 Notification \n", "184 Notification \n", "185 Notification \n", "186 Notification \n", "187 Notification \n", "188 Notification \n", "189 Notification \n", "190 Notification \n", "191 Notification \n", "192 Notification \n", "193 Notification \n", "194 Notification \n", "195 Notification \n", "196 Notification \n", "197 Notification \n", "198 Notification \n", "199 Notification \n", "200 Notification \n", "201 Notification \n", "202 Notification \n", "203 Notification \n", "204 Notification \n", "205 Notification \n", "206 Notification \n", "207 Notification \n", "208 Notification \n", "209 Notification \n", "210 Notification \n", "211 Redaction \n", "\n", "[212 rows x 3 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bq.Query(createCohort_and_checkAnnotations.select_on_annotations).results().to_dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and here we explicitly call just the second query, resulting in 329 patients:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ParticipantBarcodevital_statusdays_to_last_known_aliveethnicityhistological_typemenopause_statusrace
0TCGA-BH-A18MDead2207NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneWHITE
1TCGA-BH-A18VDead1556NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneWHITE
2TCGA-Z7-A8R6Alive3256NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
3TCGA-3C-AALIAlive4005NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
4TCGA-4H-AAAKAlive348NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
5TCGA-5L-AAT0Alive1477HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
6TCGA-A1-A0SNAlive1196NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
7TCGA-A1-A0SJAlive416NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneBLACK OR AFRICAN AMERICAN
8TCGA-A1-A0SQAlive554NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
9TCGA-A1-A0SPAlive584NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneNone
10TCGA-A1-A0SHAlive1437NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
11TCGA-A2-A25EAlive3204NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
12TCGA-A2-A25BAlive1291NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
13TCGA-A2-A0SXAlive1534NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
14TCGA-A2-A0YLAlive1474NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
15TCGA-A2-A0D4Alive767NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
16TCGA-A2-A3XZAlive1532NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
17TCGA-A2-A0D2Alive1027NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
18TCGA-A2-A04UAlive2654NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPeri (6-12 months since last menstrual period)WHITE
19TCGA-A2-A04VDead1920NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
20TCGA-A2-A0CVAlive3011NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...ASIAN
21TCGA-A2-A0T3Alive1516NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
22TCGA-A2-A3XUDead912NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
23TCGA-A2-A3XVAlive996NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
24TCGA-A2-A0T6Alive575NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
25TCGA-A2-A0D3Alive1873NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
26TCGA-A2-A0EXAlive752NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
27TCGA-A2-A25AAlive3276NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
28TCGA-A2-A3XTAlive2770NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
29TCGA-A2-A1G0Alive616NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
........................
299TCGA-GM-A2DADead6593NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
300TCGA-GM-A2DLAlive3519NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPeri (6-12 months since last menstrual period)WHITE
301TCGA-GM-A3XNAlive2019NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
302TCGA-GM-A3XLAlive2108NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
303TCGA-GM-A3XGAlive1330NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
304TCGA-HN-A2OBDead1900NoneInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...None
305TCGA-JL-A3YXAlive352NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...ASIAN
306TCGA-JL-A3YWAlive360NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...ASIAN
307TCGA-LD-A66UAlive646NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaNoneWHITE
308TCGA-LL-A5YPAlive450NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
309TCGA-LL-A7SZAlive594NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
310TCGA-LL-A6FRAlive489NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
311TCGA-LL-A5YNAlive447NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
312TCGA-LL-A5YOAlive440NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
313TCGA-MS-A51UAlive681NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaNoneWHITE
314TCGA-OL-A5RVAlive1062NoneInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
315TCGA-OL-A5RWAlive1106NoneInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
316TCGA-OL-A5D8Alive973NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
317TCGA-OL-A6VQAlive600NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
318TCGA-OL-A6VOAlive858NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneBLACK OR AFRICAN AMERICAN
319TCGA-OL-A66IAlive714NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneBLACK OR AFRICAN AMERICAN
320TCGA-OL-A6VRAlive1220NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneBLACK OR AFRICAN AMERICAN
321TCGA-OL-A66OAlive528NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
322TCGA-PE-A5DEAlive2645NoneInfiltrating Lobular CarcinomaPre (<6 months since LMP AND no prior bilatera...WHITE
323TCGA-PL-A8LZAlive302NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
324TCGA-PL-A8LYAlive8NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
325TCGA-PL-A8LXAlive5NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPre (<6 months since LMP AND no prior bilatera...BLACK OR AFRICAN AMERICAN
326TCGA-S3-AA14Alive529NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...BLACK OR AFRICAN AMERICAN
327TCGA-S3-A6ZHAlive641NOT HISPANIC OR LATINOInfiltrating Ductal CarcinomaNoneBLACK OR AFRICAN AMERICAN
328TCGA-XX-A899Alive467NOT HISPANIC OR LATINOInfiltrating Lobular CarcinomaPost (prior bilateral ovariectomy OR >12 mo si...WHITE
\n", "

329 rows × 7 columns

\n", "
" ], "text/plain": [ " ParticipantBarcode vital_status days_to_last_known_alive \\\n", "0 TCGA-BH-A18M Dead 2207 \n", "1 TCGA-BH-A18V Dead 1556 \n", "2 TCGA-Z7-A8R6 Alive 3256 \n", "3 TCGA-3C-AALI Alive 4005 \n", "4 TCGA-4H-AAAK Alive 348 \n", "5 TCGA-5L-AAT0 Alive 1477 \n", "6 TCGA-A1-A0SN Alive 1196 \n", "7 TCGA-A1-A0SJ Alive 416 \n", "8 TCGA-A1-A0SQ Alive 554 \n", "9 TCGA-A1-A0SP Alive 584 \n", "10 TCGA-A1-A0SH Alive 1437 \n", "11 TCGA-A2-A25E Alive 3204 \n", "12 TCGA-A2-A25B Alive 1291 \n", "13 TCGA-A2-A0SX Alive 1534 \n", "14 TCGA-A2-A0YL Alive 1474 \n", "15 TCGA-A2-A0D4 Alive 767 \n", "16 TCGA-A2-A3XZ Alive 1532 \n", "17 TCGA-A2-A0D2 Alive 1027 \n", "18 TCGA-A2-A04U Alive 2654 \n", "19 TCGA-A2-A04V Dead 1920 \n", "20 TCGA-A2-A0CV Alive 3011 \n", "21 TCGA-A2-A0T3 Alive 1516 \n", "22 TCGA-A2-A3XU Dead 912 \n", "23 TCGA-A2-A3XV Alive 996 \n", "24 TCGA-A2-A0T6 Alive 575 \n", "25 TCGA-A2-A0D3 Alive 1873 \n", "26 TCGA-A2-A0EX Alive 752 \n", "27 TCGA-A2-A25A Alive 3276 \n", "28 TCGA-A2-A3XT Alive 2770 \n", "29 TCGA-A2-A1G0 Alive 616 \n", ".. ... ... ... \n", "299 TCGA-GM-A2DA Dead 6593 \n", "300 TCGA-GM-A2DL Alive 3519 \n", "301 TCGA-GM-A3XN Alive 2019 \n", "302 TCGA-GM-A3XL Alive 2108 \n", "303 TCGA-GM-A3XG Alive 1330 \n", "304 TCGA-HN-A2OB Dead 1900 \n", "305 TCGA-JL-A3YX Alive 352 \n", "306 TCGA-JL-A3YW Alive 360 \n", "307 TCGA-LD-A66U Alive 646 \n", "308 TCGA-LL-A5YP Alive 450 \n", "309 TCGA-LL-A7SZ Alive 594 \n", "310 TCGA-LL-A6FR Alive 489 \n", "311 TCGA-LL-A5YN Alive 447 \n", "312 TCGA-LL-A5YO Alive 440 \n", "313 TCGA-MS-A51U Alive 681 \n", "314 TCGA-OL-A5RV Alive 1062 \n", "315 TCGA-OL-A5RW Alive 1106 \n", "316 TCGA-OL-A5D8 Alive 973 \n", "317 TCGA-OL-A6VQ Alive 600 \n", "318 TCGA-OL-A6VO Alive 858 \n", "319 TCGA-OL-A66I Alive 714 \n", "320 TCGA-OL-A6VR Alive 1220 \n", "321 TCGA-OL-A66O Alive 528 \n", "322 TCGA-PE-A5DE Alive 2645 \n", "323 TCGA-PL-A8LZ Alive 302 \n", "324 TCGA-PL-A8LY Alive 8 \n", "325 TCGA-PL-A8LX Alive 5 \n", "326 TCGA-S3-AA14 Alive 529 \n", "327 TCGA-S3-A6ZH Alive 641 \n", "328 TCGA-XX-A899 Alive 467 \n", "\n", " ethnicity histological_type \\\n", "0 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "1 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "2 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "3 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "4 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "5 HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "6 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "7 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "8 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "9 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "10 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "11 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "12 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "13 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "14 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "15 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "16 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "17 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "18 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "19 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "20 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "21 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "22 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "23 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "24 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "25 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "26 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "27 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "28 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "29 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", ".. ... ... \n", "299 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "300 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "301 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "302 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "303 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "304 None Infiltrating Lobular Carcinoma \n", "305 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "306 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "307 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "308 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "309 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "310 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "311 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "312 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "313 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "314 None Infiltrating Ductal Carcinoma \n", "315 None Infiltrating Ductal Carcinoma \n", "316 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "317 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "318 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "319 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "320 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "321 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "322 None Infiltrating Lobular Carcinoma \n", "323 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "324 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "325 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "326 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "327 NOT HISPANIC OR LATINO Infiltrating Ductal Carcinoma \n", "328 NOT HISPANIC OR LATINO Infiltrating Lobular Carcinoma \n", "\n", " menopause_status \\\n", "0 None \n", "1 None \n", "2 Pre (<6 months since LMP AND no prior bilatera... \n", "3 Post (prior bilateral ovariectomy OR >12 mo si... \n", "4 Post (prior bilateral ovariectomy OR >12 mo si... \n", "5 Post (prior bilateral ovariectomy OR >12 mo si... \n", "6 Post (prior bilateral ovariectomy OR >12 mo si... \n", "7 None \n", "8 Pre (<6 months since LMP AND no prior bilatera... \n", "9 None \n", "10 Pre (<6 months since LMP AND no prior bilatera... \n", "11 Pre (<6 months since LMP AND no prior bilatera... \n", "12 Pre (<6 months since LMP AND no prior bilatera... \n", "13 Pre (<6 months since LMP AND no prior bilatera... \n", "14 Pre (<6 months since LMP AND no prior bilatera... \n", "15 Pre (<6 months since LMP AND no prior bilatera... \n", "16 Pre (<6 months since LMP AND no prior bilatera... \n", "17 Pre (<6 months since LMP AND no prior bilatera... \n", "18 Peri (6-12 months since last menstrual period) \n", "19 Pre (<6 months since LMP AND no prior bilatera... \n", "20 Post (prior bilateral ovariectomy OR >12 mo si... \n", "21 Pre (<6 months since LMP AND no prior bilatera... \n", "22 Post (prior bilateral ovariectomy OR >12 mo si... \n", "23 Pre (<6 months since LMP AND no prior bilatera... \n", "24 Pre (<6 months since LMP AND no prior bilatera... \n", "25 Post (prior bilateral ovariectomy OR >12 mo si... \n", "26 Post (prior bilateral ovariectomy OR >12 mo si... \n", "27 Pre (<6 months since LMP AND no prior bilatera... \n", "28 Pre (<6 months since LMP AND no prior bilatera... \n", "29 Pre (<6 months since LMP AND no prior bilatera... \n", ".. ... \n", "299 Post (prior bilateral ovariectomy OR >12 mo si... \n", "300 Peri (6-12 months since last menstrual period) \n", "301 Post (prior bilateral ovariectomy OR >12 mo si... \n", "302 Post (prior bilateral ovariectomy OR >12 mo si... \n", "303 Pre (<6 months since LMP AND no prior bilatera... \n", "304 Pre (<6 months since LMP AND no prior bilatera... \n", "305 Post (prior bilateral ovariectomy OR >12 mo si... \n", "306 Post (prior bilateral ovariectomy OR >12 mo si... \n", "307 None \n", "308 Post (prior bilateral ovariectomy OR >12 mo si... \n", "309 Pre (<6 months since LMP AND no prior bilatera... \n", "310 Post (prior bilateral ovariectomy OR >12 mo si... \n", "311 Post (prior bilateral ovariectomy OR >12 mo si... \n", "312 Pre (<6 months since LMP AND no prior bilatera... \n", "313 None \n", "314 Pre (<6 months since LMP AND no prior bilatera... \n", "315 Pre (<6 months since LMP AND no prior bilatera... \n", "316 Pre (<6 months since LMP AND no prior bilatera... \n", "317 Pre (<6 months since LMP AND no prior bilatera... \n", "318 None \n", "319 None \n", "320 None \n", "321 Pre (<6 months since LMP AND no prior bilatera... \n", "322 Pre (<6 months since LMP AND no prior bilatera... \n", "323 Pre (<6 months since LMP AND no prior bilatera... \n", "324 Pre (<6 months since LMP AND no prior bilatera... \n", "325 Pre (<6 months since LMP AND no prior bilatera... \n", "326 Post (prior bilateral ovariectomy OR >12 mo si... \n", "327 None \n", "328 Post (prior bilateral ovariectomy OR >12 mo si... \n", "\n", " race \n", "0 WHITE \n", "1 WHITE \n", "2 WHITE \n", "3 BLACK OR AFRICAN AMERICAN \n", "4 WHITE \n", "5 WHITE \n", "6 WHITE \n", "7 BLACK OR AFRICAN AMERICAN \n", "8 WHITE \n", "9 None \n", "10 WHITE \n", "11 BLACK OR AFRICAN AMERICAN \n", "12 BLACK OR AFRICAN AMERICAN \n", "13 WHITE \n", "14 WHITE \n", "15 BLACK OR AFRICAN AMERICAN \n", "16 BLACK OR AFRICAN AMERICAN \n", "17 WHITE \n", "18 WHITE \n", "19 WHITE \n", "20 ASIAN \n", "21 WHITE \n", "22 BLACK OR AFRICAN AMERICAN \n", "23 BLACK OR AFRICAN AMERICAN \n", "24 WHITE \n", "25 WHITE \n", "26 WHITE \n", "27 WHITE \n", "28 BLACK OR AFRICAN AMERICAN \n", "29 WHITE \n", ".. ... \n", "299 WHITE \n", "300 WHITE \n", "301 WHITE \n", "302 BLACK OR AFRICAN AMERICAN \n", "303 WHITE \n", "304 None \n", "305 ASIAN \n", "306 ASIAN \n", "307 WHITE \n", "308 BLACK OR AFRICAN AMERICAN \n", "309 BLACK OR AFRICAN AMERICAN \n", "310 BLACK OR AFRICAN AMERICAN \n", "311 BLACK OR AFRICAN AMERICAN \n", "312 BLACK OR AFRICAN AMERICAN \n", "313 WHITE \n", "314 BLACK OR AFRICAN AMERICAN \n", "315 BLACK OR AFRICAN AMERICAN \n", "316 BLACK OR AFRICAN AMERICAN \n", "317 WHITE \n", "318 BLACK OR AFRICAN AMERICAN \n", "319 BLACK OR AFRICAN AMERICAN \n", "320 BLACK OR AFRICAN AMERICAN \n", "321 BLACK OR AFRICAN AMERICAN \n", "322 WHITE \n", "323 BLACK OR AFRICAN AMERICAN \n", "324 BLACK OR AFRICAN AMERICAN \n", "325 BLACK OR AFRICAN AMERICAN \n", "326 BLACK OR AFRICAN AMERICAN \n", "327 BLACK OR AFRICAN AMERICAN \n", "328 WHITE \n", "\n", "[329 rows x 7 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bq.Query(createCohort_and_checkAnnotations.select_on_clinical).results().to_dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and finally we call the main query:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ParticipantBarcode
0TCGA-3C-AALI
1TCGA-4H-AAAK
2TCGA-5L-AAT0
3TCGA-A1-A0SH
4TCGA-A1-A0SJ
5TCGA-A1-A0SN
6TCGA-A1-A0SP
7TCGA-A1-A0SQ
8TCGA-A2-A04P
9TCGA-A2-A04Q
10TCGA-A2-A04R
11TCGA-A2-A04U
12TCGA-A2-A04V
13TCGA-A2-A04W
14TCGA-A2-A04X
15TCGA-A2-A0CL
16TCGA-A2-A0CM
17TCGA-A2-A0CV
18TCGA-A2-A0CZ
19TCGA-A2-A0D2
20TCGA-A2-A0D3
21TCGA-A2-A0D4
22TCGA-A2-A0EX
23TCGA-A2-A0SX
24TCGA-A2-A0T3
25TCGA-A2-A0T5
26TCGA-A2-A0T6
27TCGA-A2-A0YE
28TCGA-A2-A0YJ
29TCGA-A2-A0YL
......
297TCGA-GM-A2DL
298TCGA-GM-A3XG
299TCGA-GM-A3XL
300TCGA-GM-A3XN
301TCGA-HN-A2OB
302TCGA-JL-A3YW
303TCGA-JL-A3YX
304TCGA-LD-A66U
305TCGA-LL-A5YN
306TCGA-LL-A5YO
307TCGA-LL-A5YP
308TCGA-LL-A6FR
309TCGA-LL-A7SZ
310TCGA-MS-A51U
311TCGA-OL-A5D8
312TCGA-OL-A5RV
313TCGA-OL-A5RW
314TCGA-OL-A66I
315TCGA-OL-A66O
316TCGA-OL-A6VO
317TCGA-OL-A6VQ
318TCGA-OL-A6VR
319TCGA-PE-A5DE
320TCGA-PL-A8LX
321TCGA-PL-A8LY
322TCGA-PL-A8LZ
323TCGA-S3-A6ZH
324TCGA-S3-AA14
325TCGA-XX-A899
326TCGA-Z7-A8R6
\n", "

327 rows × 1 columns

\n", "
" ], "text/plain": [ " ParticipantBarcode\n", "0 TCGA-3C-AALI\n", "1 TCGA-4H-AAAK\n", "2 TCGA-5L-AAT0\n", "3 TCGA-A1-A0SH\n", "4 TCGA-A1-A0SJ\n", "5 TCGA-A1-A0SN\n", "6 TCGA-A1-A0SP\n", "7 TCGA-A1-A0SQ\n", "8 TCGA-A2-A04P\n", "9 TCGA-A2-A04Q\n", "10 TCGA-A2-A04R\n", "11 TCGA-A2-A04U\n", "12 TCGA-A2-A04V\n", "13 TCGA-A2-A04W\n", "14 TCGA-A2-A04X\n", "15 TCGA-A2-A0CL\n", "16 TCGA-A2-A0CM\n", "17 TCGA-A2-A0CV\n", "18 TCGA-A2-A0CZ\n", "19 TCGA-A2-A0D2\n", "20 TCGA-A2-A0D3\n", "21 TCGA-A2-A0D4\n", "22 TCGA-A2-A0EX\n", "23 TCGA-A2-A0SX\n", "24 TCGA-A2-A0T3\n", "25 TCGA-A2-A0T5\n", "26 TCGA-A2-A0T6\n", "27 TCGA-A2-A0YE\n", "28 TCGA-A2-A0YJ\n", "29 TCGA-A2-A0YL\n", ".. ...\n", "297 TCGA-GM-A2DL\n", "298 TCGA-GM-A3XG\n", "299 TCGA-GM-A3XL\n", "300 TCGA-GM-A3XN\n", "301 TCGA-HN-A2OB\n", "302 TCGA-JL-A3YW\n", "303 TCGA-JL-A3YX\n", "304 TCGA-LD-A66U\n", "305 TCGA-LL-A5YN\n", "306 TCGA-LL-A5YO\n", "307 TCGA-LL-A5YP\n", "308 TCGA-LL-A6FR\n", "309 TCGA-LL-A7SZ\n", "310 TCGA-MS-A51U\n", "311 TCGA-OL-A5D8\n", "312 TCGA-OL-A5RV\n", "313 TCGA-OL-A5RW\n", "314 TCGA-OL-A66I\n", "315 TCGA-OL-A66O\n", "316 TCGA-OL-A6VO\n", "317 TCGA-OL-A6VQ\n", "318 TCGA-OL-A6VR\n", "319 TCGA-PE-A5DE\n", "320 TCGA-PL-A8LX\n", "321 TCGA-PL-A8LY\n", "322 TCGA-PL-A8LZ\n", "323 TCGA-S3-A6ZH\n", "324 TCGA-S3-AA14\n", "325 TCGA-XX-A899\n", "326 TCGA-Z7-A8R6\n", "\n", "[327 rows x 1 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bq.Query(createCohort_and_checkAnnotations).results().to_dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we didn't need to call each sub-query individually, we could have just called the main query and gotten the same result. As you can see, two patients that met the clinical select criteria (which returned 329 patients) were excluded from the final result (which returned 327 patients)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we leave off, here are a few useful tricks for working with BigQuery in Cloud Datalab:\n", "+ if you want to see the raw SQL, you can just build the query and then print it out (this might be useful, for example, in debugging a query -- you can copy and paste the SQL directly into the BigQuery Web UI);\n", "+ if you want to see how much data and which tables are going to be touched by this data, you can use the \"dry run\" option. (Notice the \"cacheHit\" flag -- if you have recently done a particular query, you will not be charged to repeat it since it will have been cached.)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
SELECT\n",
       "  c.ParticipantBarcode AS ParticipantBarcode\n",
       "FROM (\n",
       "  SELECT\n",
       "    a.categoryName,\n",
       "    a.classificationName,\n",
       "    a.ParticipantBarcode,\n",
       "    c.ParticipantBarcode,\n",
       "  FROM ( (SELECT\n",
       "  ParticipantBarcode,\n",
       "  annotationCategoryName AS categoryName,\n",
       "  annotationClassification AS classificationName\n",
       "FROM\n",
       "  [isb-cgc:tcga_201607_beta.Annotations]\n",
       "WHERE\n",
       "  ( itemTypeName=\"Patient\"\n",
       "    AND (annotationCategoryName=\"History of unacceptable prior treatment related to a prior/other malignancy\"\n",
       "      OR annotationClassification=\"Redaction\" ) )\n",
       "GROUP BY\n",
       "  ParticipantBarcode,\n",
       "  categoryName,\n",
       "  classificationName) ) AS a\n",
       "  OUTER JOIN EACH \n",
       "       ( (SELECT\n",
       "  ParticipantBarcode,\n",
       "  vital_status,\n",
       "  days_to_last_known_alive,\n",
       "  ethnicity,\n",
       "  histological_type,\n",
       "  menopause_status,\n",
       "  race\n",
       "FROM\n",
       "  [isb-cgc:tcga_201607_beta.Clinical_data]\n",
       "WHERE\n",
       "  ( Study=\"BRCA\"\n",
       "    AND age_at_initial_pathologic_diagnosis<=50\n",
       "    AND gender=\"FEMALE\" )) ) AS c\n",
       "  ON\n",
       "    a.ParticipantBarcode = c.ParticipantBarcode\n",
       "  WHERE\n",
       "    (a.ParticipantBarcode IS NOT NULL\n",
       "      OR c.ParticipantBarcode IS NOT NULL)\n",
       "  ORDER BY\n",
       "    a.classificationName,\n",
       "    a.categoryName,\n",
       "    a.ParticipantBarcode,\n",
       "    c.ParticipantBarcode )\n",
       "WHERE\n",
       "  ( a.categoryName IS NULL\n",
       "    AND a.classificationName IS NULL\n",
       "    AND c.ParticipantBarcode IS NOT NULL )\n",
       "ORDER BY\n",
       "  c.ParticipantBarcode
" ], "text/plain": [ "SELECT\n", " c.ParticipantBarcode AS ParticipantBarcode\n", "FROM (\n", " SELECT\n", " a.categoryName,\n", " a.classificationName,\n", " a.ParticipantBarcode,\n", " c.ParticipantBarcode,\n", " FROM ( (SELECT\n", " ParticipantBarcode,\n", " annotationCategoryName AS categoryName,\n", " annotationClassification AS classificationName\n", "FROM\n", " [isb-cgc:tcga_201607_beta.Annotations]\n", "WHERE\n", " ( itemTypeName=\"Patient\"\n", " AND (annotationCategoryName=\"History of unacceptable prior treatment related to a prior/other malignancy\"\n", " OR annotationClassification=\"Redaction\" ) )\n", "GROUP BY\n", " ParticipantBarcode,\n", " categoryName,\n", " classificationName) ) AS a\n", " OUTER JOIN EACH \n", " ( (SELECT\n", " ParticipantBarcode,\n", " vital_status,\n", " days_to_last_known_alive,\n", " ethnicity,\n", " histological_type,\n", " menopause_status,\n", " race\n", "FROM\n", " [isb-cgc:tcga_201607_beta.Clinical_data]\n", "WHERE\n", " ( Study=\"BRCA\"\n", " AND age_at_initial_pathologic_diagnosis<=50\n", " AND gender=\"FEMALE\" )) ) AS c\n", " ON\n", " a.ParticipantBarcode = c.ParticipantBarcode\n", " WHERE\n", " (a.ParticipantBarcode IS NOT NULL\n", " OR c.ParticipantBarcode IS NOT NULL)\n", " ORDER BY\n", " a.classificationName,\n", " a.categoryName,\n", " a.ParticipantBarcode,\n", " c.ParticipantBarcode )\n", "WHERE\n", " ( a.categoryName IS NULL\n", " AND a.classificationName IS NULL\n", " AND c.ParticipantBarcode IS NOT NULL )\n", "ORDER BY\n", " c.ParticipantBarcode" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q = bq.Query(createCohort_and_checkAnnotations)\n", "q" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'cacheHit': True,\n", " u'referencedTables': [{u'datasetId': u'tcga_201607_beta',\n", " u'projectId': u'isb-cgc',\n", " u'tableId': u'Annotations'},\n", " {u'datasetId': u'tcga_201607_beta',\n", " u'projectId': u'isb-cgc',\n", " u'tableId': u'Clinical_data'}],\n", " u'totalBytesBilled': u'0',\n", " u'totalBytesProcessed': u'785007'}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q.execute_dry_run()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }