# Creating an InterMine workflow using the API

We are going to re-create, with the InterMine Python API, the workflow we previously built in the web interface.

We start by importing the Service class from InterMine's webservice module. You will need to access your account on HumanMine, and you do this through an API token. You can get your token by logging into [HumanMine](http://www.humanmine.org/) and going to the account details tab within MyMine. Copy and paste your token into the code below.

In [7]:
from intermine.webservice import Service
service = Service("http://www.humanmine.org/humanmine/service", token = "YOUR TOKEN HERE")
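
Pasting the token directly into a notebook makes it easy to share it by accident. A minimal sketch of an alternative, assuming you have stored the token in an environment variable. The name HUMANMINE_TOKEN and the helper get_token are our own choices for this example, not an InterMine convention:

```python
import os


def get_token(var_name="HUMANMINE_TOKEN"):
    """Return the API token stored in the environment variable `var_name`.

    HUMANMINE_TOKEN is a hypothetical name chosen for this example.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            "Set the {} environment variable to your HumanMine API token".format(var_name))
    return token


# With the intermine package installed and network access:
# from intermine.webservice import Service
# service = Service("http://www.humanmine.org/humanmine/service", token=get_token())
```

This keeps the notebook shareable: anyone running it supplies their own token through the environment.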


Our first query looked at whether the genes in the set of Pax6 targets (the list PL_Pax6_Targets) are expressed in the pancreas. In the web interface we used a template to run this query; here we will create a query object instead.

In [8]:
query = service.new_query("Gene")


First we will define the output columns that we want in our results, i.e. the view. We want to add fields (attributes) from both the Gene class and the ProteinAtlasExpression class, which is reached from Gene through the proteinAtlasExpression path.

In [9]:
query.add_view(
 "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
 "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
 "proteinAtlasExpression.tissue.name"
)
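
The same view paths are needed again when the rows are printed later in this notebook. A minimal sketch of one way to avoid typing them twice: keep them in a Python list and pass it to add_view with argument unpacking. VIEW and format_row are names chosen for this example:

```python
# The six view paths used above, kept in one list so the view and the
# printing code cannot drift apart.
VIEW = [
    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name",
]


def format_row(row, columns=VIEW):
    """Return the values of one result row as a tab-separated string.

    `row` can be anything supporting row[path] lookups, such as the rows
    yielded by query.rows() or a plain dict.
    """
    return "\t".join(str(row[path]) for path in columns)


# With the live query from this notebook:
# query.add_view(*VIEW)
# for row in query.rows():
#     print(format_row(row))
```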





Next, add the constraints to your query. We want to constrain the Gene class to the genes in the PL_Pax6_Targets list.

In [10]:
query.add_constraint("Gene", "IN", "PL_Pax6_Targets", code = "A")




We also need to constrain the tissue to be "Pancreas" and the expression level to be "Medium" or "High".

In [11]:
query.add_constraint("proteinAtlasExpression.tissue.name", "=", "Pancreas", code = "B")
query.add_constraint("proteinAtlasExpression.level", "ONE OF", ["Medium", "High"], code = "C")
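
When no constraint logic is set, InterMine combines the coded constraints with AND, so a row is returned only if A, B and C all hold. Purely as a local illustration of that intent, here is the same filter written over plain dictionaries. The real filtering happens on the server, whose string-matching rules (case handling, wildcards) may differ from Python's, and passes_constraints is a hypothetical helper, not part of the intermine package:

```python
# Local illustration only: InterMine applies these constraints on the server.
# Constraint A really tests whether the Gene object is in the saved list; here it
# is simplified to a membership test on primary identifiers.
ALLOWED_LEVELS = ("Medium", "High")


def passes_constraints(row, pax6_target_ids):
    """Return True if a result row (a dict keyed by view path) satisfies A AND B AND C."""
    a = row["primaryIdentifier"] in pax6_target_ids             # A: Gene IN PL_Pax6_Targets
    b = row["proteinAtlasExpression.tissue.name"] == "Pancreas"  # B: tissue.name = Pancreas
    c = row["proteinAtlasExpression.level"] in ALLOWED_LEVELS    # C: level ONE OF [Medium, High]
    return a and b and c
```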




Now, let's check what the query returns by looping through the rows and printing the results:

In [12]:
for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["proteinAtlasExpression.cellType"],
          row["proteinAtlasExpression.level"], row["proteinAtlasExpression.reliability"],
          row["proteinAtlasExpression.tissue.name"])

(u'84618', u'NT5C1A', u'exocrine glandular cells', u'Medium', u'Supported', u'Pancreas')
(u'29880', u'ALG5', u'exocrine glandular cells', u'Medium', u'Approved', u'Pancreas')
(u'10097', u'ACTR2', u'exocrine glandular cells', u'Medium', u'Approved', u'Pancreas')
(u'10097', u'ACTR2', u'islets of Langerhans', u'Medium', u'Approved', u'Pancreas')
(u'476', u'ATP1A1', u'exocrine glandular cells', u'High', u'Enhanced', u'Pancreas')
(u'23200', u'ATP11B', u'exocrine glandular cells', u'Medium', u'Uncertain', u'Pancreas')
(u'374868', u'ATP9B', u'exocrine glandular cells', u'Medium', u'Approved', u'Pancreas')
(u'490', u'ATP2B1', u'exocrine glandular cells', u'Medium', u'Enhanced', u'Pancreas')
(u'490', u'ATP2B1', u'islets of Langerhans', u'Medium', u'Enhanced', u'Pancreas')
(u'54828', u'BCAS3', u'exocrine glandular cells', u'Medium', u'Approved', u'Pancreas')
(u'54828', u'BCAS3', u'islets of Langerhans', u'Medium', u'Approved', u'Pancreas')
(u'1121', u'CHM', u'exocrine glandular cells', u'Medium'

We want to save this set of genes (i.e. genes from the Pax6 target set that are expressed in the pancreas) for further analysis. To do this we create a Python list and loop through the results again; this time, instead of printing the results, we append only the primary identifier from each row to our list.

In [13]:
UpinPancreas = list()
for row in query.rows():
 UpinPancreas.append(row["primaryIdentifier"])

and check that the list we have created looks correct:

In [14]:
print(UpinPancreas)

[u'84618', u'29880', u'10097', u'10097', u'476', u'23200', u'374868', u'490', u'490', u'54828', u'54828', u'1121', u'1121', u'55152', u'55152', u'5422', u'23085', u'2045', u'2045', u'2048', u'55120', u'55120', u'28964', u'2736', u'2736', u'6928', u'9922', u'8543', u'8543', u'26468', u'26468', u'987', u'4211', u'4212', u'4212', u'140609', u'140609', u'5087', u'5087', u'5090', u'5090', u'9678', u'23133', u'23133', u'5862', u'5862', u'27316', u'27316', u'55703', u'23328', u'23328', u'9792', u'84193', u'7110', u'7110', u'80700', u'27072', u'65125', u'51741', u'51741', u'3983', u'4301', u'51319', u'51319', u'657', u'84529', u'171425', u'171425', u'1478', u'1478', u'905', u'905', u'3491', u'1848', u'26610', u'26610', u'79767', u'8891', u'8891', u'8667', u'8667', u'11340', u'83989', u'83989', u'63877', u'55137', u'55137', u'93986', u'93986', u'2971', u'2619', u'3172', u'3187', u'3187', u'8339', u'8339', u'3205', u'3217', u'3233', u'3233', u'3397', u'3397', u'3615', u'3615', u'57117', u'57117'
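
Several identifiers appear twice in this list. For example, 10097 (ACTR2) is repeated because the query returns one row per matching expression record, and ACTR2 has rows for both exocrine glandular cells and islets of Langerhans. Whether or not the server collapses these repeats when the list is saved, removing them locally gives a meaningful gene count. A minimal, order-preserving sketch (unique_in_order is a name chosen for this example):

```python
def unique_in_order(ids):
    """Return the identifiers with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for identifier in ids:
        if identifier not in seen:
            seen.add(identifier)
            unique.append(identifier)
    return unique


# Usage with the list built above:
# UpinPancreas = unique_in_order(UpinPancreas)
# print(len(UpinPancreas))
```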

We now need to save the list to our InterMine account so that we can use it again in a later query. The ListManager class, returned by service.list_manager(), provides methods for managing list contents and for list operations.

In [15]:
lm = service.list_manager()
lm.create_list(content=UpinPancreas, list_type="Gene", name="UpinPancreas")
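
Running the cell above a second time asks the server for another list called UpinPancreas, which may fail if that name is already taken in your account. A minimal sketch of a defensive helper that deletes any existing list of that name first, using the ListManager methods get_list and delete_lists. recreate_list is our own name, and the sketch assumes get_list returns None when no list has that name:

```python
def recreate_list(lm, ids, name, list_type="Gene"):
    """Delete any existing list called `name`, then create it from `ids`.

    `lm` is a ListManager, e.g. service.list_manager(). Deleting removes the
    old list from your account, so only use this for lists you mean to overwrite.
    """
    if lm.get_list(name) is not None:  # assumed to be None when no such list exists
        lm.delete_lists([name])
    return lm.create_list(content=ids, list_type=list_type, name=name)


# Usage with the live service:
# lm = service.list_manager()
# recreate_list(lm, UpinPancreas, "UpinPancreas")
```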



Log in to HumanMine and check your list has been created.

## Second query: Diabetes genes 

Our second query (which we created using the query builder) found human genes associated with diseases whose names contain "diabetes". Re-create this query in code as follows:

In [16]:
query2 = service.new_query("Gene")
query2.add_view("primaryIdentifier", "symbol")
query2.add_constraint("organism.name", "=", "Homo sapiens", code = "A")
query2.add_constraint("diseases.name", "CONTAINS", "diabetes", code = "B")

for row in query2.rows():
 print (row["primaryIdentifier"], row["symbol"])

(u'4938', u'OAS1')
(u'208', u'AKT2')
(u'6833', u'ABCC8')
(u'640', u'BLK')
(u'1234', u'CCR5')
(u'54901', u'CDKAL1')
(u'5611', u'DNAJC3')
(u'169792', u'GLIS3')
(u'6927', u'HNF1A')
(u'6928', u'HNF1B')
(u'8462', u'KLF11')
(u'5325', u'PLAGL1')
(u'9882', u'TBC1D4')
(u'346171', u'ZFP57')
(u'26060', u'APPL1')
(u'1636', u'ACE')
(u'359', u'AQP2')
(u'551', u'AVP')
(u'554', u'AVPR2')
(u'11132', u'CAPN10')
(u'1056', u'CEL')
(u'1493', u'CTLA4')
(u'5167', u'ENPP1')
(u'2056', u'EPO')
(u'9451', u'EIF2AK3')
(u'50943', u'FOXP3')
(u'2642', u'GCGR')
(u'2645', u'GCK')
(u'2820', u'GPD2')
(u'3077', u'HFE')
(u'3172', u'HNF4A')
(u'3159', u'HMGA1')
(u'57061', u'HYMAI')
(u'51124', u'IER3IP1')
(u'3710', u'ITPR3')
(u'3630', u'INS')
(u'10644', u'IGF2BP2')
(u'3643', u'INSR')
(u'3667', u'IRS1')
(u'8660', u'IRS2')
(u'3557', u'IL1RN')
(u'3559', u'IL2RA')
(u'3569', u'IL6')
(u'3990', u'LIPC')
(u'4544', u'MTNR1B')
(u'9479', u'MAPK8IP1')
(u'4760', u'NEUROD1')
(u'5078', u'PAX4')
(u'3651', u'PDX1')
(u'5444', u'PON1')
(u'5468'

and save the set of genes returned as a list:

In [17]:
diabetesGenes = list()
for row in query2.rows():
 diabetesGenes.append(row["primaryIdentifier"])

In [18]:
lm = service.list_manager()
lm.create_list(content=diabetesGenes, list_type="Gene", name="diabetesGenes")



In the web interface, we next used a list intersection to find the genes that are expressed in the pancreas (at medium or high levels) and are also associated with diabetes. We need to intersect the first (UpinPancreas) and second (diabetesGenes) lists that we created, which we can do with the intersect method of the ListManager class.

In [19]:
lm.intersect(["UpinPancreas", "diabetesGenes"], "intersectedList")
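
The intersection is computed on the server from the two saved lists. As a rough local cross-check, the same operation on the Python identifier lists built earlier is a set intersection. The result should normally contain the same genes as intersectedList, unless some identifiers failed to resolve when the lists were uploaded. intersect_ids is a name chosen for this sketch:

```python
def intersect_ids(*id_lists):
    """Return the identifiers present in every input list, sorted for stable output."""
    if not id_lists:
        return []
    common = set(id_lists[0])
    for ids in id_lists[1:]:
        common &= set(ids)  # keep only identifiers also present in this list
    return sorted(common)


# Usage with the two Python lists built earlier:
# print(intersect_ids(UpinPancreas, diabetesGenes))
```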



In [27]:
intersectedList = lm.get_list("intersectedList")

In [28]:
print(intersectedList)

intersectedList (3 Gene) 2019-02-12T11:34:10+0000 Intersection of UpinPancreas and diabetesGenes


## Final query: GWAS

Finally, we fed the intersected list back into another query to see whether these genes are associated with diabetes phenotypes in genome-wide association studies (GWAS), keeping only results with a p-value of at most 1e-04. Note that this query starts from the GWAS class:

In [22]:
query = service.new_query("GWAS")

In [23]:
query.add_view(
 "results.associatedGenes.primaryIdentifier",
 "results.associatedGenes.symbol", "results.associatedGenes.name",
 "results.SNP.primaryIdentifier", "results.pValue", "results.phenotype",
 "firstAuthor", "name", "publication.pubMedId",
 "results.associatedGenes.organism.shortName"
)




In [24]:
query.add_constraint("results.pValue", "<=", "1e-04", code = "B")
query.add_constraint("results.phenotype", "CONTAINS", "diabetes", code = "C")
query.add_constraint("results.associatedGenes", "IN", "intersectedList", code = "D")



In [25]:
for row in query.rows():
    print(row["results.associatedGenes.primaryIdentifier"], row["results.associatedGenes.symbol"],
          row["results.associatedGenes.name"], row["results.SNP.primaryIdentifier"],
          row["results.pValue"], row["results.phenotype"], row["firstAuthor"], row["name"],
          row["publication.pubMedId"], row["results.associatedGenes.organism.shortName"])

(u'3172', u'HNF4A', u'hepatocyte nuclear factor 4 alpha', u'rs6017317', 1e-11, u'Type 2 diabetes', u'Cho YS', u'Type 2 diabetes', u'22158537', u'H. sapiens')
(u'6934', u'TCF7L2', u'transcription factor 7 like 2', u'rs7903146', 2e-15, u'Type 2 diabetes', u'Kho AN', u'Type 2 diabetes', u'22101970', u'H. sapiens')
(u'3172', u'HNF4A', u'hepatocyte nuclear factor 4 alpha', u'rs4812829', 3e-10, u'Type 2 diabetes', u'Kooner JS', u'Type 2 diabetes', u'21874001', u'H. sapiens')
(u'6928', u'HNF1B', u'HNF1 homeobox B', u'rs4430796', 2e-11, u'Type 2 diabetes', u'Li H', u'Type 2 diabetes', u'22961080', u'H. sapiens')
(u'6934', u'TCF7L2', u'transcription factor 7 like 2', u'rs7903146', 2e-40, u'Type 2 diabetes', u'Perry JR', u'Type 2 diabetes', u'22693455', u'H. sapiens')
(u'6934', u'TCF7L2', u'transcription factor 7 like 2', u'rs7903146', 4e-21, u'Type 2 diabetes', u'Perry JR', u'Type 2 diabetes', u'22693455', u'H. sapiens')
(u'6934', u'TCF7L2', u'transcription factor 7 like 2', u'rs7903146', 1e-30

In [26]:
for row in query.rows():
 print(row["results.associatedGenes.symbol"])

HNF4A
TCF7L2
HNF4A
HNF1B
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
HNF1B
TCF7L2
TCF7L2
TCF7L2
TCF7L2
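
Most of these rows repeat TCF7L2, since each GWAS result is reported as a separate row. A short sketch for summarizing them, counting the rows per gene symbol and keeping the smallest p-value seen. It assumes each row supports lookups by the view paths used above, as the rows from query.rows() do; summarize_gwas is a name chosen for this example:

```python
def summarize_gwas(rows):
    """Map each gene symbol to (number of rows, smallest p-value) over GWAS result rows."""
    summary = {}
    for row in rows:
        symbol = row["results.associatedGenes.symbol"]
        p_value = float(row["results.pValue"])
        count, best = summary.get(symbol, (0, float("inf")))
        summary[symbol] = (count + 1, min(best, p_value))
    return summary


# Usage with the GWAS query above:
# for symbol, (n, best_p) in sorted(summarize_gwas(query.rows()).items()):
#     print(symbol, n, best_p)
```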
