# Intermine-Python: Tutorial 5: Query Results

This tutorial covers working with the results of a query. You can either store the results in a file (using a library such as csv) or process them as soon as you retrieve them.

We will write a short query and then explore the Results class of intermine.

In [1]:
from intermine.webservice import Service

In [2]:
service = Service("www.flymine.org/flymine/service")
query = service.new_query("Gene")

In [3]:
query.select("publications.*")

<intermine.query.Query at 0x7f9e100fd080>

In [4]:
query.add_constraint("Gene","LOOKUP","zen",extra_value="D. melanogaster")

<TernaryConstraint: Gene LOOKUP zen IN D. melanogaster>

Once we have added our views and constraints, we are ready to look at the results. Rows can be returned as dictionaries, as lists, as ResultRow objects (the most common choice), or as strings in CSV or TSV format.

In [5]:
for gene in query.results(row="rr"):
    print(gene)

Gene: publications.abstractText="A 631-bp fragment containing the 5'-flanking region of the Drosophila melanogaster proliferating-cell nuclear antigen (PCNA) gene was placed upstream of the chloramphenicol acetyltransferase (CAT) gene of a CAT vector. A transient expression assay of CAT activity in Drosophila Kc cells transfected with this plasmid and a set of 5'-deletion derivatives revealed that the promoter function resided within a 192-bp region (-168 to +24 with respect to the transcription initiation site). Cotransfection with a zerknüllt (zen)-expressing plasmid specifically repressed CAT expression. However, cotransfection with expression plasmids for a nonfunctional zen mutation, even-skipped, or bicoid showed no significant effect on CAT expression. RNase protection analysis revealed that the repression by zen was at the transcription step. The target sequence of zen was mapped within the 34-bp region (-119 to -86) of the PCNA gene promoter, even though it lacked zen protein-

In [6]:
for gene in query.rows():
    print(gene)

Gene: publications.abstractText="A 631-bp fragment containing the 5'-flanking region of the Drosophila melanogaster proliferating-cell nuclear antigen (PCNA) gene was placed upstream of the chloramphenicol acetyltransferase (CAT) gene of a CAT vector. A transient expression assay of CAT activity in Drosophila Kc cells transfected with this plasmid and a set of 5'-deletion derivatives revealed that the promoter function resided within a 192-bp region (-168 to +24 with respect to the transcription initiation site). Cotransfection with a zerknüllt (zen)-expressing plasmid specifically repressed CAT expression. However, cotransfection with expression plasmids for a nonfunctional zen mutation, even-skipped, or bicoid showed no significant effect on CAT expression. RNase protection analysis revealed that the repression by zen was at the transcription step. The target sequence of zen was mapped within the 34-bp region (-119 to -86) of the PCNA gene promoter, even though it lacked zen protein-

Iterating through query.results(row="rr") and iterating through query.rows() are equivalent, so use whichever you find more comfortable. If you want to extract only specific columns, it may be easier to use "list" instead of "rr". Say you want the second and third columns, publications.doi and publications.firstAuthor. List indices start at 0, so these are indices 1 and 2, and they can be extracted as follows.

In [7]:
for row in query.results(row="list"):
    print(row[1], row[2])

None Yamaguchi M
None Kwon Eunjeong
10.1006/jtbi.1996.0328 Bodnar J W
None Ip Y T
None Jiang J
10.1038/332858a0 Hoey T
10.1038/35101500 Gurdon J B
10.1016/j.cub.2005.03.022 Schmidt-Ott Urs
10.1242/dev.001065 Duboule Denis
10.1002/1521-1878(200102)23:2<125::AID-BIES1019>3.0.CO;2-C Jagla K
10.1002/bies.950170904 Venkatesh T V
None Patel N H
None Agosti D
None Geisler R
None Hashimoto Carl
None Jones C M
None Kidd S
None Carroll S B
10.1242/dev.067926 Wilson Megan J
10.1242/dev.01124 Markstein Michele
10.1038/35068578 Ross J J
10.1016/j.tig.2003.10.009 Raftery Laurel A
None Jaźwińska A
None González-Reyes A
10.1371/journal.pgen.1002769 Holmqvist Per-Henrik
None Nepveu A
10.1101/gad.215459.113 Saunders Abbie
None Strecker T R
10.1093/gbe/evr061 Gehring Walter J
10.1007/s00438-006-0187-8 Roy Swarnava
10.1038/ncomms8102 Ugrankar Rupali
None Markstein Michele
10.1101/gr.104018.109 Ozdemir Anil
10.1371/journal.pbio.1000456 Kazemian Majid
None Comeron J M
10.1101/gr.178701 Bergman C M
None Akam

If you want to print only those rows where publications.doi is not None, add an if condition as shown below.

In [43]:
for row in query.results(row="list"):
    if row[1] is not None:
        print(row[1])

10.1006/jtbi.1996.0328
10.1038/332858a0
10.1038/35101500
10.1016/j.cub.2005.03.022
10.1242/dev.001065
10.1002/1521-1878(200102)23:2<125::AID-BIES1019>3.0.CO;2-C
10.1002/bies.950170904
10.1242/dev.067926
10.1242/dev.01124
10.1038/35068578
10.1016/j.tig.2003.10.009
10.1371/journal.pgen.1002769
10.1101/gad.215459.113
10.1093/gbe/evr061
10.1007/s00438-006-0187-8
10.1038/ncomms8102
10.1101/gr.104018.109
10.1371/journal.pbio.1000456
10.1101/gr.178701
10.1371/journal.pcbi.1001020
10.1038/ng1908
10.1016/j.semcdb.2014.04.036
10.1016/j.devcel.2011.05.009
10.7554/eLife.04837
10.1242/dev.019802
10.1038/nature07214
10.1016/j.ydbio.2014.02.007
10.1126/science.1090289
10.1006/dbio.1994.1306
10.1371/journal.pbio.0050117
10.1038/sj.emboj.7601532
10.1006/jtbi.1995.0175
10.1101/gr.6490707
10.1038/380037a0
10.1146/annurev.genet.35.102401.090832
10.1101/gad.1509607
10.1038/ncomms10115
10.1126/science.1139816
10.1242/dev.01722
10.1371/journal.pone.0030610
10.1101/gad.188052.112
10.1006/dbio.2002.0652
10.101
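The same filter can be written against the dictionary row format, where columns are looked up by name rather than by position. The following is a minimal sketch that runs offline: `sample_rows` is a hand-made stand-in for `query.results(row="dict")` built from the values printed earlier, and it assumes the dictionary keys are the full view paths (for example `Gene.publications.doi`). The helper name `dois_present` is hypothetical.

```python
# Stand-in for query.results(row="dict").
# Assumption: each row is a dict keyed by the full view path.
sample_rows = [
    {"Gene.publications.doi": None,
     "Gene.publications.firstAuthor": "Yamaguchi M"},
    {"Gene.publications.doi": None,
     "Gene.publications.firstAuthor": "Kwon Eunjeong"},
    {"Gene.publications.doi": "10.1006/jtbi.1996.0328",
     "Gene.publications.firstAuthor": "Bodnar J W"},
]


def dois_present(rows, key="Gene.publications.doi"):
    """Return the value stored under `key` for every row where it is not None."""
    return [row[key] for row in rows if row.get(key) is not None]


for doi in dois_present(sample_rows):
    print(doi)
```

Looking columns up by name keeps the code working if the order of the views changes, which positional indices such as `row[1]` do not.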

You can pass two more parameters to query.results(): start and size. start is the index of the first row to return, and it defaults to 0 (the first row). size is the number of rows to return. Let's say we want only the rows at indices 10 and 11.

In [45]:
for row in query.results(row="rr",size=2,start=10):
    print(row)

Gene: publications.abstractText='Although the genetics of dorsal-ventral polarity which leads to mesoderm formation in Drosophila are understood in considerable detail, subsequent molecular mechanisms involved in patterning the mesoderm primordium into individual mesodermal subtypes are poorly understood. Two papers published recently suggest strongly that an inductive signal from dorsal ectoderm is involved in subdividing the underlying mesoderm, and present evidence that one of the signalling factors is Decapentaplegic (Dpp), a member of the bone morphogenetic protein subgroup of the Transforming Growth Factor-beta (TGF-beta) super family of proteins.' publications.doi='10.1002/bies.950170904' publications.firstAuthor='Venkatesh T V' publications.id=1007725 publications.issue='9' publications.journal='Bioessays' publications.month='Sep' publications.pages='754-7' publications.pubMedId='8763827' publications.title='How many signals does it take?' publications.volume='17' publications.
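The start and size parameters can also be combined to walk through a large result set one page at a time. The sketch below shows one possible way to do that. It is not part of intermine: `iter_in_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a call such as `query.results(row="rr", start=start, size=size)` so that the example runs without a network connection.

```python
def iter_in_pages(fetch, page_size, max_rows=None):
    """Yield rows one page at a time.

    `fetch(start, size)` is a hypothetical callable that returns an iterable
    of at most `size` rows beginning at index `start`.
    """
    start = 0
    while max_rows is None or start < max_rows:
        size = page_size if max_rows is None else min(page_size, max_rows - start)
        page = list(fetch(start, size))
        for row in page:
            yield row
        if len(page) < size:
            break  # a short page means there are no more rows
        start += len(page)


# Stand-in data: 25 fake rows served by a slice-based fetch function.
fake_rows = ["row %d" % i for i in range(25)]


def fake_fetch(start, size):
    return fake_rows[start:start + size]


for row in iter_in_pages(fake_fetch, page_size=10):
    print(row)
```

With a real query, `fetch` could be written as `lambda start, size: query.results(row="rr", start=start, size=size)`.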

If you prefer dealing with rows as strings in CSV or TSV format, you can request those too. First, import the csv library. To read your results in CSV format, create a csv.reader object, as shown below.

In [46]:
import csv

In [47]:
csv_reader = csv.reader(query.results(row="csv"), delimiter=",", quotechar='"')

In [48]:
for row in csv_reader:
    print(row[0])

A 631-bp fragment containing the 5'-flanking region of the Drosophila melanogaster proliferating-cell nuclear antigen (PCNA) gene was placed upstream of the chloramphenicol acetyltransferase (CAT) gene of a CAT vector. A transient expression assay of CAT activity in Drosophila Kc cells transfected with this plasmid and a set of 5'-deletion derivatives revealed that the promoter function resided within a 192-bp region (-168 to +24 with respect to the transcription initiation site). Cotransfection with a zerknüllt (zen)-expressing plasmid specifically repressed CAT expression. However, cotransfection with expression plasmids for a nonfunctional zen mutation, even-skipped, or bicoid showed no significant effect on CAT expression. RNase protection analysis revealed that the repression by zen was at the transcription step. The target sequence of zen was mapped within the 34-bp region (-119 to -86) of the PCNA gene promoter, even though it lacked zen protein-binding sites. Transgenic flies c
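TSV rows can be parsed in the same way by changing the delimiter. This is a small offline sketch: `sample_lines` is a hand-made stand-in for `query.results(row="tsv")`, assumed here to yield one tab-separated string per row, and the three-column layout is chosen only for illustration.

```python
import csv

# Stand-in for query.results(row="tsv").
# Assumption: the iterator yields one tab-separated string per row.
sample_lines = [
    "10.1002/bies.950170904\tVenkatesh T V\tBioessays",
    "10.1038/332858a0\tHoey T\t",
]

# csv.reader accepts any iterable of strings, not only open files.
tsv_reader = csv.reader(sample_lines, delimiter="\t")
parsed = list(tsv_reader)

for row in parsed:
    print(row[0], row[1])
```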

If you have used the csv library before, try writing your results to a CSV file using the writer class. If you have not, try going through the documentation first and then writing the code on your own; it is fairly self-explanatory. The documentation can be found at: https://docs.python.org/2/library/csv.html
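For readers who want a starting point, here is one possible sketch of the writer side. It is written for Python 3 and uses an in-memory buffer and hand-made `sample_rows` so that it runs offline; the helper name `write_rows` and the header labels are hypothetical.

```python
import csv
import io


def write_rows(rows, header, out):
    """Write `header` and then each row to the file-like object `out` as CSV.

    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)  # csv.writer writes None as an empty field
        count += 1
    return count


# Stand-in for rows produced by query.results(row="list").
sample_rows = [
    [None, "Yamaguchi M"],
    ["10.1006/jtbi.1996.0328", "Bodnar J W"],
]

buffer = io.StringIO()
written = write_rows(sample_rows, ["doi", "firstAuthor"], buffer)
print(buffer.getvalue())
```

To write to a real file, replace the buffer with a file opened as `open("results.csv", "w", newline="")`, where `results.csv` is a placeholder name.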

The last thing we will look at in this tutorial is the summarise method. It is particularly useful when we want some basic statistics for a particular column. We will look at the statistics for the length of the genes in the list "PL FlyAtlas_brain_top", a list of the most enriched genes in the adult fly brain. We begin by creating a query, then add the views and the list constraint.

In [12]:
query2=service.new_query()

In [13]:
query2.select("Gene.*","organism.*")

<intermine.query.Query at 0x7f9e100b57b8>

In [14]:
query2.add_constraint("Gene","IN","PL FlyAtlas_brain_top")

<ListConstraint: Gene IN PL FlyAtlas_brain_top>

We then print out the first 10 rows of results. 

In [15]:
for row in query2.rows(size=10):
    print(row)

Gene: briefDescription=None cytoLocation='10A3-10A3' description=None id=1107310 length=2075 name=None primaryIdentifier='FBgn0030259' score=None scoreType=None secondaryIdentifier='CG1545' symbol='CG1545' organism.commonName='fruit fly' organism.genus='Drosophila' organism.id=1000001 organism.name='Drosophila melanogaster' organism.shortName='D. melanogaster' organism.species='melanogaster' organism.taxonId=7227
Gene: briefDescription=None cytoLocation='11D8-11D8' description=None id=1039485 length=90456 name='radish' primaryIdentifier='FBgn0265597' score=None scoreType=None secondaryIdentifier='CG44424' symbol='rad' organism.commonName='fruit fly' organism.genus='Drosophila' organism.id=1000001 organism.name='Drosophila melanogaster' organism.shortName='D. melanogaster' organism.species='melanogaster' organism.taxonId=7227
Gene: briefDescription=None cytoLocation='14A1-14A3' description=None id=1040836 length=26224 name='mind-meld' primaryIdentifier='FBgn0259110' score=None scoreType

We can now look at a summary of the gene lengths. It contains useful information such as the average, maximum, and minimum length.

In [16]:
print(query2.summarise("length"))

{'count': 17.0, 'bucket': 1.0, 'max': 169106.0, 'stdev': 41489.79692298, 'average': 31601.891891891893, 'min': 808.0, 'buckets': 20.0}
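To get a feel for what these fields mean, the same kind of statistics can be computed locally with Python's statistics module. The sketch below uses only the three gene lengths visible in the rows printed above, so its numbers will not match the server-side summary of the whole list. It also uses the sample standard deviation, which is an assumption: the output above does not say which definition the server applies.

```python
import statistics

# Lengths of the first three genes printed above (not the whole list).
lengths = [2075, 90456, 26224]

summary = {
    "count": len(lengths),
    "min": min(lengths),
    "max": max(lengths),
    "average": statistics.mean(lengths),
    # Sample standard deviation; the server may use a different definition.
    "stdev": statistics.stdev(lengths),
}

print(summary)
```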


Another handy feature when printing results is that you can print only the columns you want. Suppose we only want the ID of each gene:

In [17]:
for row in query2.rows(size=10):
    print(row["Gene.id"])

1109635
1040181
1041556
1061141
1078562
1071407
1008394
1700377
1434993
1182992


This brings us to the end of the fifth tutorial. The next tutorial will be about further management of results. 