# Creating an InterMine workflow using the API

We are going to re-create the workflow that we built in the web interface, this time using the Python API.

We start by importing the Service class from InterMine's webservice module. You will need to access your HumanMine account, which you do through an API token. You can get your token by logging into [HumanMine](http://www.humanmine.org/) and going to the account details tab within MyMine. Copy and paste your token into the code below.

In [1]:
from intermine.webservice import Service
service = Service("https://www.humanmine.org/humanmine/service", token = "C1w6Sciavafam1W5d7Q8")
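A token pasted directly into a notebook is easy to share by accident. The following is a minimal sketch of one alternative, assuming you have stored the token in an environment variable. The variable name `HUMANMINE_TOKEN` and the helper `load_token` are hypothetical names chosen for this example and are not part of InterMine.

```python
import os


def load_token(env=None, var_name="HUMANMINE_TOKEN"):
    """Return the API token stored in an environment variable.

    HUMANMINE_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to your HumanMine API token first")
    return token


# Usage with the Service class imported above (needs network access):
# service = Service("https://www.humanmine.org/humanmine/service", token=load_token())
```

The optional `env` argument only exists so that the helper can be tried out with a plain dictionary instead of the real environment.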

Our first query looked at whether the Pax6 targets (from the list PL_Pax6_Targets) are expressed in the pancreas. In the web interface we used a template to run this query, but here we will create a query object.

In [2]:
query = service.new_query("Gene")


First we will define the output columns that we want in our result, i.e. the view. We want to add fields (attributes) from both the Gene class and the proteinAtlasExpression class.

In [3]:
query.add_view(
    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name"
)
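The view paths share a long prefix, and the same paths are needed again when printing the rows. As a small sketch, the paths can be built once from that prefix and reused. The helper name `expression_view` is an assumption made for this example and is not part of the InterMine client.

```python
def expression_view(prefix="proteinAtlasExpression"):
    """Build the six view paths used above from one shared prefix."""
    gene_fields = ["primaryIdentifier", "symbol"]
    expression_fields = ["cellType", "level", "reliability", "tissue.name"]
    return gene_fields + [f"{prefix}.{field}" for field in expression_fields]


paths = expression_view()
print(paths)

# With a live query object the paths can be passed on as separate arguments:
# query.add_view(*paths)
```

Keeping the paths in one list means a typo only has to be fixed in one place.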





Next, add the constraints to your query. We want to constrain the Gene class to the genes in the PL_Pax6_Targets list.

In [4]:
query.add_constraint("Gene", "IN", "PL_Pax6_Targets", code = "A")




We also need to constrain the expression level to "Medium" or "High" and the tissue to "Pancreas".

In [5]:
query.add_constraint("proteinAtlasExpression.tissue.name", "=", "Pancreas", code = "B")
query.add_constraint("proteinAtlasExpression.level", "ONE OF", ["Medium", "High"], code = "C")




Now, let's check what the query returns by looping through the rows and printing the results:

In [6]:
for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["proteinAtlasExpression.cellType"],
          row["proteinAtlasExpression.level"], row["proteinAtlasExpression.reliability"],
          row["proteinAtlasExpression.tissue.name"])

84618 NT5C1A exocrine glandular cells Medium Supported Pancreas
29880 ALG5 exocrine glandular cells Medium Approved Pancreas
476 ATP1A1 exocrine glandular cells High Enhanced Pancreas
23200 ATP11B exocrine glandular cells Medium Uncertain Pancreas
374868 ATP9B exocrine glandular cells Medium Approved Pancreas
490 ATP2B1 exocrine glandular cells Medium Enhanced Pancreas
490 ATP2B1 islets of Langerhans Medium Enhanced Pancreas
54828 BCAS3 exocrine glandular cells Medium Approved Pancreas
54828 BCAS3 islets of Langerhans Medium Approved Pancreas
1121 CHM exocrine glandular cells Medium Approved Pancreas
1121 CHM islets of Langerhans Medium Approved Pancreas
55152 DALRD3 exocrine glandular cells Medium Approved Pancreas
55152 DALRD3 islets of Langerhans Medium Approved Pancreas
5422 POLA1 exocrine glandular cells Medium Supported Pancreas
23085 ERC1 exocrine glandular cells Medium Approved Pancreas
2045 EPHA7 exocrine glandular cells High Approved Pancreas
2045 EPHA7 islets of Langerhans M

389072 PLEKHM3 exocrine glandular cells Medium Uncertain Pancreas
389072 PLEKHM3 islets of Langerhans Medium Uncertain Pancreas
5339 PLEC exocrine glandular cells Medium Enhanced Pancreas
22827 PUF60 exocrine glandular cells Medium Enhanced Pancreas
22827 PUF60 islets of Langerhans Medium Enhanced Pancreas
5033 P4HA1 exocrine glandular cells Medium Enhanced Pancreas
54681 P4HTM exocrine glandular cells High Supported Pancreas
54681 P4HTM islets of Langerhans Medium Supported Pancreas
10196 PRMT3 exocrine glandular cells Medium Uncertain Pancreas
10196 PRMT3 islets of Langerhans Medium Uncertain Pancreas
5566 PRKACA exocrine glandular cells Medium Approved Pancreas
5566 PRKACA islets of Langerhans Medium Approved Pancreas
6210 RPS15A exocrine glandular cells High Supported Pancreas
6428 SRSF3 exocrine glandular cells High Supported Pancreas
6428 SRSF3 islets of Langerhans High Supported Pancreas
55084 SOBP exocrine glandular cells Medium Uncertain Pancreas
55084 SOBP islets of Langerhan

We want to save this set of genes (i.e. genes from the Pax6 target set that are expressed in the pancreas) for further analysis. To do this we define a Python list and loop through our results again. This time, instead of printing the results, we append just the primary identifier of each row to our list.

In [7]:
UpinPancreas = list()
for row in query.rows():
    UpinPancreas.append(row["primaryIdentifier"])

Then check that the list we have created looks correct:

In [8]:
print(UpinPancreas)

['84618', '29880', '476', '23200', '374868', '490', '490', '54828', '54828', '1121', '1121', '55152', '55152', '5422', '23085', '2045', '2045', '2048', '55120', '55120', '28964', '2736', '2736', '8339', '8339', '6928', '9922', '8543', '8543', '26468', '26468', '987', '4211', '4212', '4212', '140609', '140609', '5087', '5087', '5090', '5090', '9678', '23133', '23133', '5862', '5862', '27316', '27316', '55703', '23328', '23328', '9792', '84193', '7110', '7110', '80700', '57654', '27072', '7444', '65125', '51741', '51741', '79971', '3983', '10097', '10097', '4301', '51319', '51319', '657', '3491', '84529', '171425', '171425', '1478', '1478', '905', '905', '1848', '26610', '26610', '79767', '8891', '8891', '8667', '8667', '11340', '83989', '83989', '63877', '55137', '55137', '93986', '93986', '2971', '2619', '3172', '3187', '3187', '3205', '3217', '3233', '3233', '3397', '3397', '3615', '3615', '57117', '57117', '359948', '359948', '3728', '3728', '57565', '57565', '114818', '22920', '2292
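The printed list contains repeated identifiers (for example '490' appears twice), because a gene produces one row per cell type. A minimal sketch of removing the repeats while keeping the original order is shown below. The helper name `unique_in_order` is hypothetical, and the source does not say whether the server collapses duplicates itself, so treat this as an optional tidy-up.

```python
def unique_in_order(identifiers):
    """Drop repeated identifiers, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(identifiers))


# The first few identifiers from the output above
sample = ['84618', '29880', '476', '23200', '374868', '490', '490', '54828', '54828']
print(unique_in_order(sample))
```

With the real data this would be called as `unique_in_order(UpinPancreas)` before saving the list.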

We now need to save the list to our InterMine account so we can use it again in a later query. The ListManager class provides methods to manage list contents and operations.

In [9]:
lm = service.list_manager()
# Remove any list of the same name left over from an earlier run
lm.delete_lists(["UpinPancreas"])
lm.create_list(content=UpinPancreas, list_type="Gene", name="UpinPancreas")



Log in to HumanMine and check your list has been created.
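The delete-then-create pattern is used again later in this workflow, so it can be wrapped in a small helper. This is only a sketch: `replace_list` is a hypothetical name, and it relies solely on the two ListManager calls shown above. The `FakeListManager` class is an offline stand-in written for this example so the helper can be tried without a network connection. It is not part of InterMine.

```python
def replace_list(list_manager, name, identifiers, list_type="Gene"):
    """Delete any existing list called `name`, then create it afresh."""
    list_manager.delete_lists([name])
    return list_manager.create_list(
        content=identifiers, list_type=list_type, name=name
    )


class FakeListManager:
    """Offline stand-in that records calls; not part of InterMine."""

    def __init__(self):
        self.lists = {}

    def delete_lists(self, names):
        for name in names:
            self.lists.pop(name, None)

    def create_list(self, content, list_type, name):
        self.lists[name] = (list_type, list(content))
        return name


fake = FakeListManager()
replace_list(fake, "UpinPancreas", ["84618", "29880"])
replace_list(fake, "UpinPancreas", ["476"])  # the second call overwrites the first
print(fake.lists)
```

Against the real service the call would be `replace_list(lm, "UpinPancreas", UpinPancreas)`.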

## Second query: Diabetes genes 

Our second query (which we created using the query builder) found genes that are associated with the disease diabetes. Re-create this query using code as follows:

In [10]:
query2 = service.new_query("Gene")
query2.add_view("primaryIdentifier", "symbol")
query2.add_constraint("organism.name", "=", "Homo sapiens", code = "A")
query2.add_constraint("diseases.name", "CONTAINS", "diabetes", code = "B")

for row in query2.rows():
    print(row["primaryIdentifier"], row["symbol"])

208 AKT2
6833 ABCC8
640 BLK
1234 CCR5
5611 DNAJC3
169792 GLIS3
6927 HNF1A
6928 HNF1B
8462 KLF11
389692 MAFA
5325 PLAGL1
9882 TBC1D4
346171 ZFP57
26060 APPL1
1636 ACE
359 AQP2
551 AVP
554 AVPR2
11132 CAPN10
1056 CEL
1493 CTLA4
5167 ENPP1
2056 EPO
9451 EIF2AK3
50943 FOXP3
2642 GCGR
2645 GCK
2820 GPD2
3172 HNF4A
3159 HMGA1
3077 HFE
57061 HYMAI
51124 IER3IP1
3710 ITPR3
3630 INS
10644 IGF2BP2
3643 INSR
3667 IRS1
8660 IRS2
3557 IL1RN
3559 IL2RA
3569 IL6
3990 LIPC
4544 MTNR1B
9479 MAPK8IP1
4760 NEUROD1
5078 PAX4
3651 PDX1
5444 PON1
5468 PPARG
3767 KCNJ11
5506 PPP1R3A
5770 PTPN1
26191 PTPN22
56729 RETN
387082 SUMO4
6514 SLC2A2
169026 SLC30A8
6648 SOD2
6934 TCF7L2
7422 VEGFA
7466 WFS1


Then save the set of genes returned as a list:

In [11]:
diabetesGenes = list()
for row in query2.rows():
    diabetesGenes.append(row["primaryIdentifier"])

In [12]:
lm = service.list_manager()
lm.delete_lists(["diabetesGenes"])
lm.create_list(content=diabetesGenes, list_type="Gene", name="diabetesGenes")



Next, we used a list intersect to find those genes that are expressed in the pancreas and are also associated with the disease diabetes. We need to intersect the first (UpinPancreas) and second (diabetesGenes) lists that we created. We can do this using the intersect method from the ListManager class.

In [13]:
lm.delete_lists(["intersectedList"])
lm.intersect(["UpinPancreas", "diabetesGenes"], "intersectedList")



In [14]:
intersectedList = lm.get_list("intersectedList")

In [15]:
print(intersectedList)

intersectedList (3 Gene) 2020-03-26T20:31:20+0000 Intersection of UpinPancreas and diabetesGenes
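Because both identifier lists are also available as ordinary Python lists, the server-side intersection can be sanity-checked locally. The sketch below uses only a handful of identifiers copied from the outputs above, not the full lists, so it finds two shared genes rather than the three reported by the server. The helper name `intersect_in_order` is hypothetical.

```python
def intersect_in_order(first, second):
    """Return items of `first` that also occur in `second`, without repeats."""
    wanted = set(second)
    seen = set()
    shared = []
    for item in first:
        if item in wanted and item not in seen:
            seen.add(item)
            shared.append(item)
    return shared


# A few identifiers copied from the outputs above, not the full lists
pancreas_sample = ['84618', '6928', '3172', '3187', '3187']
diabetes_sample = ['208', '6833', '6928', '3172', '6934']
print(intersect_in_order(pancreas_sample, diabetes_sample))
```

With the full data the call would be `intersect_in_order(UpinPancreas, diabetesGenes)`.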


## Final query: GWAS

Finally, we fed the intersected list from above back into another query to see if there was any association of these genes with diabetes phenotypes according to GWAS studies. Note that we now start our query from the GWAS class:

In [16]:
query = service.new_query("GWAS")

In [17]:
query.add_view(
    "results.associatedGenes.primaryIdentifier",
    "results.associatedGenes.symbol", "results.associatedGenes.name",
    "results.SNP.primaryIdentifier", "results.pValue", "results.phenotype",
    "firstAuthor", "name", "publication.pubMedId",
    "results.associatedGenes.organism.shortName"
)




In [18]:
query.add_constraint("results.pValue", "<=", "1e-04", code = "B")
query.add_constraint("results.phenotype", "CONTAINS", "diabetes", code = "C")
query.add_constraint("results.associatedGenes", "IN", "intersectedList", code = "D")



In [19]:
for row in query.rows():
    print(row["results.associatedGenes.primaryIdentifier"], row["results.associatedGenes.symbol"],
          row["results.associatedGenes.name"], row["results.SNP.primaryIdentifier"],
          row["results.pValue"], row["results.phenotype"], row["firstAuthor"], row["name"],
          row["publication.pubMedId"], row["results.associatedGenes.organism.shortName"])

6934 TCF7L2 transcription factor 7 like 2 rs386418874 6e-11 Type 2 diabetes Adeyemo AA ZRANB3 is an African-specific type 2 diabetes locus associated with beta-cell mass and insulin response. 31324766 H. sapiens
6934 TCF7L2 transcription factor 7 like 2 rs34872471 1e-94 Type 2 diabetes Bonas-Guarch S Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. 29358691 H. sapiens
6934 TCF7L2 transcription factor 7 like 2 rs7903146 5e-13 Type 2 diabetes Chen J Genome-wide association study of type 2 diabetes in Africa. 31049640 H. sapiens
6934 TCF7L2 transcription factor 7 like 2 rs34872471 6e-53 Type 2 diabetes Cook JP Multi-ethnic genome-wide association study identifies novel locus for type 2 diabetes susceptibility. 27189021 H. sapiens
6934 TCF7L2 transcription factor 7 like 2 rs34872471 8e-08 Type 2 diabetes Ghassibe-Sabbagh M T2DM GWAS in the Lebanese population confirms the role of TCF7L2 and CDKAL1 in disease susceptibility. 25483131 H

In [20]:
for row in query.rows():
    print(row["results.associatedGenes.symbol"])

TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
HNF4A
TCF7L2
TCF7L2
TCF7L2
HNF4A
TCF7L2
HNF4A
TCF7L2
HNF4A
HNF4A
HNF4A
HNF4A
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
HNF4A
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
HNF4A
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
TCF7L2
HNF4A
TCF7L2
TCF7L2
TCF7L2
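A long column of repeated symbols is hard to read, so it can help to count the rows per gene instead. The following is a minimal sketch using `collections.Counter` from the Python standard library. The helper name `count_symbols` is hypothetical, and the sample holds only the first eleven symbols from the output above, so the counts are not the totals for the full result.

```python
from collections import Counter


def count_symbols(symbols):
    """Count how many result rows mention each gene symbol."""
    return Counter(symbols)


# The first eleven symbols from the output above
sample = ["TCF7L2"] * 10 + ["HNF4A"]
for symbol, n_rows in count_symbols(sample).most_common():
    print(symbol, n_rows)
```

With a live query the symbols can be fed in directly: `count_symbols(row["results.associatedGenes.symbol"] for row in query.rows())`.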
