# Intermine-Python: Tutorial 3: More about Constraints

In the previous tutorial, we learnt how to add constraints to a query to filter its results. In this tutorial we will look at some more constraints and at the different types of constraints.

In [1]:
from intermine.webservice import Service

In [2]:
service = Service("www.flymine.org/flymine/service")
query=service.new_query("Gene") 

##### Unary Constraint

The first type of constraint that we will look at is a Unary Constraint. A Unary Constraint does not take any value; it checks whether a particular attribute is absent or present. The unary operators are IS NULL and IS NOT NULL. We can look at a small example.

In [3]:
query.add_constraint("primaryIdentifier","IS NOT NULL")



In [4]:
for row in query.rows(size=10):
 print(row)

Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1004698 length=1951 name=None primaryIdentifier='FBgn0039942' score=None scoreType=None secondaryIdentifier='CG17163' symbol='CG17163'
Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secon
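
To make the semantics concrete, here is a minimal plain-Python sketch of what the two unary operators test. It does not call InterMine: the `rows` list and the `unary_filter` helper are hypothetical, and the field values are copied from the output above.

```python
# Plain-Python model of unary constraints (illustrative only,
# not how InterMine evaluates them on the server).
rows = [
    {"symbol": "zyd", "name": "zydeco", "primaryIdentifier": "FBgn0265767"},
    {"symbol": "CG17163", "name": None, "primaryIdentifier": "FBgn0039942"},
]

def unary_filter(records, field, op):
    """Keep records whose field is absent (IS NULL) or present (IS NOT NULL)."""
    if op == "IS NULL":
        return [r for r in records if r.get(field) is None]
    if op == "IS NOT NULL":
        return [r for r in records if r.get(field) is not None]
    raise ValueError("not a unary operator: " + op)

named = unary_filter(rows, "name", "IS NOT NULL")
unnamed = unary_filter(rows, "name", "IS NULL")
print([r["symbol"] for r in named])    # ['zyd']
print([r["symbol"] for r in unnamed])  # ['CG17163']
```

Both rows have a primaryIdentifier, which is why the constraint in the cell above keeps them both.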

##### Binary Constraint

The next type of constraint is a Binary Constraint. A binary constraint takes a single value to compare against. Most of the constraints that we looked at in the second tutorial were binary constraints, and they form the largest group of constraints. The operators are =, !=, <, <=, > and >=.

In [5]:
query.add_constraint("length",">=","12000")


In [6]:
for row in query.rows(size=10):
 print(row)

Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None seconda

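The above constraint is an example of a binary constraint. Because it was added to the same query as the unary constraint, the results now satisfy both: CG17163 (length 1951) no longer appears.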
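
As a rough illustration of how the binary operators behave, the sketch below maps each operator string to a Python comparison and applies it to a few records. The `BINARY_OPS` table and the `binary_filter` helper are hypothetical names, and treating missing values as non-matching is an assumption; the lengths are taken from the earlier output.

```python
import operator

# Hypothetical mapping from the operator strings to Python comparisons.
BINARY_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def binary_filter(records, field, op, value):
    """Keep records where record[field] <op> value holds; None never matches."""
    compare = BINARY_OPS[op]
    return [r for r in records
            if r[field] is not None and compare(r[field], value)]

genes = [
    {"symbol": "zyd", "length": 12653},
    {"symbol": "CG17163", "length": 1951},
    {"symbol": "RhoGAP1A", "length": 12892},
]

long_genes = binary_filter(genes, "length", ">=", 12000)
print([g["symbol"] for g in long_genes])  # ['zyd', 'RhoGAP1A']
```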

##### Ternary Constraint

We will now look at Ternary Constraints. A ternary constraint takes one required value and one optional value. Currently, InterMine supports only one such operator: LOOKUP. The LOOKUP operator searches through all the fields of a particular class for the value specified by the user. In the example below, it searches the Gene class for any field containing an occurrence of "zen". The advantage is that you do not need to remember whether zen is a symbol, a name or a primaryIdentifier. However, this may lead to ambiguous results, so you can use the optional extra_value parameter to narrow the search (for example, to a particular organism when looking up genes).

In [7]:
query2=service.new_query()

In [8]:
query2.add_constraint("Gene","LOOKUP","zen",extra_value="D. melanogaster")



In [9]:
for row in query2.rows():
 print(row)

Gene: briefDescription=None cytoLocation='84A5-84A5' description=None id=1007678 length=1331 name='zerknullt' primaryIdentifier='FBgn0004053' score=None scoreType=None secondaryIdentifier='CG1046' symbol='zen'
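
The idea behind LOOKUP can be sketched in plain Python as a search across several fields, with the optional extra value narrowing the candidates. This is only a model: the real search runs on the InterMine server, and the set of fields in `SEARCHED_FIELDS`, the case-insensitive exact matching, and the second record are all assumptions made for illustration.

```python
# Fields searched by this model; the fields the server actually searches may differ.
SEARCHED_FIELDS = ("symbol", "name", "primaryIdentifier", "secondaryIdentifier")

def lookup(records, term, extra_value=None):
    """Return records where any searched field equals term, ignoring case.

    If extra_value is given, only records from that organism are considered.
    """
    term = term.lower()
    hits = []
    for r in records:
        if extra_value is not None and r["organism"] != extra_value:
            continue
        if any((r.get(f) or "").lower() == term for f in SEARCHED_FIELDS):
            hits.append(r)
    return hits

genes = [
    {"symbol": "zen", "name": "zerknullt", "primaryIdentifier": "FBgn0004053",
     "secondaryIdentifier": "CG1046", "organism": "D. melanogaster"},
    # Hypothetical record from another organism that shares the symbol.
    {"symbol": "zen", "name": None, "primaryIdentifier": "HYPOTHETICAL-1",
     "secondaryIdentifier": None, "organism": "D. simulans"},
]

ambiguous = lookup(genes, "zen")
resolved = lookup(genes, "zen", extra_value="D. melanogaster")
print(len(ambiguous), len(resolved))  # 2 1
```

Without the extra value the term matches both records; with it, only the D. melanogaster gene remains.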


##### Multi-Value Constraint

The next constraint type that we will look at is Multi-Value constraints. This allows the constraint to take multiple values. The two operators that are allowed are ONE OF and NONE OF. 

In [10]:
query3=service.new_query("Gene")

In [11]:
query3.add_constraint("symbol","NONE OF",['zen','eve'])



In [12]:
for row in query3.rows(size=10):
 print(row)

Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1004698 length=1951 name=None primaryIdentifier='FBgn0039942' score=None scoreType=None secondaryIdentifier='CG17163' symbol='CG17163'
Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secon
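
In plain Python, the two multi-value operators amount to membership tests against a set of values. The sketch below is a model of that behaviour, not InterMine code; `multi_value_filter` is a hypothetical helper.

```python
def multi_value_filter(records, field, op, values):
    """Keep records whose field is in (ONE OF) or not in (NONE OF) the values."""
    allowed = set(values)
    if op == "ONE OF":
        return [r for r in records if r[field] in allowed]
    if op == "NONE OF":
        return [r for r in records if r[field] not in allowed]
    raise ValueError("not a multi-value operator: " + op)

genes = [{"symbol": "zen"}, {"symbol": "eve"}, {"symbol": "zyd"}]

kept = multi_value_filter(genes, "symbol", "NONE OF", ["zen", "eve"])
picked = multi_value_filter(genes, "symbol", "ONE OF", ["zen", "eve"])
print([g["symbol"] for g in kept])    # ['zyd']
print([g["symbol"] for g in picked])  # ['zen', 'eve']
```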

##### List Constraint

List Constraints allow users to create a named list of objects and then use the operators IN and NOT IN to refer to that list in queries. An example is given below. The path in such a query must always be a class (for example, Gene is a valid path). The lists available in FlyMine can be found at: http://www.flymine.org/flymine/bag.do?subtab=view .

In [16]:
query4=service.new_query()

In [17]:
query4.add_constraint("Gene","IN","PL FlyAtlas_brain_top")



In [18]:
for row in query4.rows(size=10):
 print(row)

Gene: briefDescription=None cytoLocation='10A3-10A3' description=None id=1107310 length=2075 name=None primaryIdentifier='FBgn0030259' score=None scoreType=None secondaryIdentifier='CG1545' symbol='CG1545'
Gene: briefDescription=None cytoLocation='11D8-11D8' description=None id=1039485 length=90456 name='radish' primaryIdentifier='FBgn0265597' score=None scoreType=None secondaryIdentifier='CG44424' symbol='rad'
Gene: briefDescription=None cytoLocation='14A1-14A3' description=None id=1040836 length=26224 name='mind-meld' primaryIdentifier='FBgn0259110' score=None scoreType=None secondaryIdentifier='CG42252' symbol='mmd'
Gene: briefDescription=None cytoLocation='16F3-16F5' description=None id=1059829 length=138941 name='Shaker' primaryIdentifier='FBgn0003380' score=None scoreType=None secondaryIdentifier='CG12348' symbol='Sh'
Gene: briefDescription=None cytoLocation='18C2-18C3' description=None id=1076481 length=21373 name='nicotinic Acetylcholine Receptor alpha7' primaryIdentifier='FBgn
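
The difference from a multi-value constraint is that the query only carries the name of the list, while the members are stored on the server. A minimal model of that indirection is sketched below. `SAVED_LISTS` and `list_filter` are hypothetical; the membership shown is limited to identifiers visible in the output above (the real list is longer), and the last record is invented to have a non-member.

```python
# Hypothetical stand-in for lists stored on the server: list name -> member identifiers.
SAVED_LISTS = {
    "PL FlyAtlas_brain_top": {"FBgn0030259", "FBgn0265597",
                              "FBgn0259110", "FBgn0003380"},
}

def list_filter(records, op, list_name):
    """Keep records that are members (IN) or non-members (NOT IN) of a named list."""
    members = SAVED_LISTS[list_name]
    if op == "IN":
        return [r for r in records if r["primaryIdentifier"] in members]
    if op == "NOT IN":
        return [r for r in records if r["primaryIdentifier"] not in members]
    raise ValueError("not a list operator: " + op)

genes = [
    {"symbol": "rad", "primaryIdentifier": "FBgn0265597"},
    {"symbol": "Sh", "primaryIdentifier": "FBgn0003380"},
    {"symbol": "made-up", "primaryIdentifier": "HYPOTHETICAL-1"},  # not in the list
]

in_list = list_filter(genes, "IN", "PL FlyAtlas_brain_top")
print([g["symbol"] for g in in_list])  # ['rad', 'Sh']
```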

##### Sub-Class Constraints

The InterMine data model is hierarchical: a class can have sub-classes. A sub-class constraint names a sub-class to which a path should be constrained, so that the results include only those items that belong to the sub-class. The example below shows a sub-class constraint.

In [19]:
query5=service.new_query("Gene")

In [20]:
query5.add_constraint("ontologyAnnotations","GOAnnotation")



In [21]:
for row in query5.rows(size=10):
 print(row)

Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None seconda

Unlike most constraints, Sub-class constraints do not have an operator that is specified as a parameter to a constraint. 
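
A sub-class constraint behaves much like an `isinstance` check. The sketch below models that with two small Python classes named after the ones in the example. The classes, the `constrain_to_subclass` helper, and the term values are stand-ins for illustration, not InterMine objects.

```python
class OntologyAnnotation:
    """Stand-in for the general annotation class."""
    def __init__(self, term):
        self.term = term

class GOAnnotation(OntologyAnnotation):
    """Stand-in for the more specific sub-class."""

def constrain_to_subclass(items, subclass):
    """Keep only the items that are instances of the given sub-class."""
    return [item for item in items if isinstance(item, subclass)]

annotations = [
    OntologyAnnotation("example non-GO term"),  # hypothetical value
    GOAnnotation("GO:0003677"),                 # example GO identifier
]

go_only = constrain_to_subclass(annotations, GOAnnotation)
print([a.term for a in go_only])  # ['GO:0003677']
```

Note that the constraint takes only a path and a class name; there is no operator string to pass.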

##### Loop Constraints

Loop Constraints assert that two paths refer to the same object. The valid operators are IS and IS NOT. The path and the loop path in such a query must both be classes (for example, Gene is a valid path). When the query is serialised to XML, the operators IS and IS NOT map to = and != respectively. The example below shows a Loop Constraint.

In [22]:
query=service.new_query("Gene")

In [23]:
query.add_view("homologues.gene.primaryIdentifier","homologues.homologue.primaryIdentifier")



In [24]:
query.add_constraint("Gene", "IN", "H. sapiens orthologues of FlyTF_site_specific_1", code = "A")



In [25]:
query.add_constraint("homologues.homologue", "IS NOT", "Gene", code = "B")



In [26]:
for row in query.rows(size=10):
 print(row["homologues.gene.primaryIdentifier"],row["homologues.homologue.primaryIdentifier"])

1390 10488 
1390 1385 
1390 1388 
1390 148327 
1390 22926 
1390 286319 
1390 466 
1390 64764 
1390 84699 
1390 90993 
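
A loop constraint compares object identity rather than field values. The sketch below models this with Python's `is` operator on (gene, homologue) pairs. The `Gene` class and `loop_filter` helper are hypothetical; the identifiers come from the output above, and the self-pair is invented to show what IS NOT removes.

```python
class Gene:
    """Minimal stand-in for a gene object."""
    def __init__(self, primary_identifier):
        self.primary_identifier = primary_identifier

def loop_filter(pairs, op):
    """Keep pairs whose two paths refer to the same (IS) or different (IS NOT) object."""
    if op == "IS":
        return [(a, b) for a, b in pairs if a is b]
    if op == "IS NOT":
        return [(a, b) for a, b in pairs if a is not b]
    raise ValueError("not a loop operator: " + op)

gene_1390 = Gene("1390")
gene_1385 = Gene("1385")

# (root gene, homologue) pairs; the first pair points back at the root gene itself.
pairs = [(gene_1390, gene_1390), (gene_1390, gene_1385)]

distinct = loop_filter(pairs, "IS NOT")
print([(a.primary_identifier, b.primary_identifier) for a, b in distinct])
# [('1390', '1385')]
```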


##### Range Constraints

Range Constraints test where a value lies relative to a set of ranges. The value of the constrained path must stand in the relationship named by the operator to the ranges that are passed. Valid operators are OVERLAPS, DOES NOT OVERLAP, WITHIN, OUTSIDE, CONTAINS and DOES NOT CONTAIN. Here is an example of a Range Constraint.

In [27]:
query=service.new_query()

In [28]:
query.add_view("SequenceFeature.organism.shortName", "SequenceFeature.chromosomeLocation.locatedOn.primaryIdentifier", "SequenceFeature.chromosomeLocation.start", "SequenceFeature.chromosomeLocation.end")



In [29]:
query.add_constraint("chromosomeLocation", "OVERLAPS", ["X:94248091..143371935"])



In [29]:
for row in query.rows(size=4): 
 print(row)

SequenceFeature: organism.shortName='H. sapiens' chromosomeLocation.locatedOn.primaryIdentifier='X' chromosomeLocation.start=62462543 chromosomeLocation.end=114281198 
SequenceFeature: organism.shortName='H. sapiens' chromosomeLocation.locatedOn.primaryIdentifier='X' chromosomeLocation.start=94309037 chromosomeLocation.end=94319036 
SequenceFeature: organism.shortName='H. sapiens' chromosomeLocation.locatedOn.primaryIdentifier='X' chromosomeLocation.start=94309037 chromosomeLocation.end=94320117 
SequenceFeature: organism.shortName='H. sapiens' chromosomeLocation.locatedOn.primaryIdentifier='X' chromosomeLocation.start=94314037 chromosomeLocation.end=94319036 
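
To see why the first row is returned even though it starts well before the queried range, the sketch below models OVERLAPS and WITHIN on plain tuples. The helper names are hypothetical, and treating both ends of a range as inclusive is an assumption; the coordinates are taken from the constraint and the output above.

```python
def parse_range(text):
    """Split 'X:94248091..143371935' into ('X', 94248091, 143371935)."""
    chromosome, span = text.split(":")
    start, end = span.split("..")
    return chromosome, int(start), int(end)

def overlaps(feature, query_range):
    """True if the two ranges share at least one position on the same chromosome."""
    f_chr, f_start, f_end = feature
    q_chr, q_start, q_end = query_range
    return f_chr == q_chr and f_start <= q_end and q_start <= f_end

def within(feature, query_range):
    """True if the feature lies entirely inside the query range."""
    f_chr, f_start, f_end = feature
    q_chr, q_start, q_end = query_range
    return f_chr == q_chr and q_start <= f_start and f_end <= q_end

target = parse_range("X:94248091..143371935")
first = ("X", 62462543, 114281198)   # first row of the output
second = ("X", 94309037, 94319036)   # second row of the output

print(overlaps(first, target), within(first, target))    # True False
print(overlaps(second, target), within(second, target))  # True True
```

Under this model, switching the operator to WITHIN would drop the first row, because it extends past the start of the queried range.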


This tutorial summed up some of the important constraint types. In the next tutorial we will look at some of the other features of a query. 