{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating an InterMine workflow using the API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to re-create the workflow we built in the web interface, this time using the Python API.\n", "\n", "We start by importing the Service class from InterMine's webservice module. To access your HumanMine account from code you need an API token. You can get your token by logging into [HumanMine](http://www.humanmine.org/) and going to the account details tab within MyMine. Copy and paste your token into the code below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from intermine.webservice import Service\n", "service = Service(\"https://www.humanmine.org/humanmine/service\", token = \"C1w6Sciavafam1W5d7Q8\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our first query looked at whether the Pax6 targets (from the list PL_Pax6_Targets) are expressed in the pancreas. In the web interface we used a template to run this query, but here we will create a query object." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query = service.new_query(\"Gene\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we will define the output columns that we want in our result, i.e. the view. We want to add fields (attributes) from both the Gene class and the proteinAtlasExpression class."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_view(\n", " \"primaryIdentifier\", \"symbol\", \"proteinAtlasExpression.cellType\",\n", " \"proteinAtlasExpression.level\", \"proteinAtlasExpression.reliability\",\n", " \"proteinAtlasExpression.tissue.name\"\n", ")\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, add the constraints to your query. We want to constrain the Gene class to the genes in the PL_Pax6_Targets list." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_constraint(\"Gene\", \"IN\", \"PL_Pax6_Targets\", code = \"A\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need to constrain the expression level to be \"high\" or \"medium\" and the tissue to be \"pancreas\"." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_constraint(\"proteinAtlasExpression.tissue.name\", \"=\", \"Pancreas\", code = \"B\")\n", "query.add_constraint(\"proteinAtlasExpression.level\", \"ONE OF\", [\"Medium\", \"High\"], code = \"C\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's check what the query returns by looping through the rows and printing the results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84618 NT5C1A exocrine glandular cells Medium Supported Pancreas\n", "29880 ALG5 exocrine glandular cells Medium Approved Pancreas\n", "476 ATP1A1 exocrine glandular cells High Enhanced Pancreas\n", "23200 ATP11B exocrine glandular cells Medium Uncertain Pancreas\n", "374868 ATP9B exocrine glandular cells Medium Approved Pancreas\n", "490 ATP2B1 exocrine glandular cells Medium Enhanced Pancreas\n", "490 ATP2B1 islets of Langerhans Medium Enhanced Pancreas\n", "54828 BCAS3 exocrine glandular cells Medium Approved Pancreas\n", "54828 BCAS3 islets of Langerhans Medium Approved Pancreas\n", "1121 CHM exocrine glandular cells Medium Approved Pancreas\n", "1121 CHM islets of Langerhans Medium Approved Pancreas\n", "55152 DALRD3 exocrine glandular cells Medium Approved Pancreas\n", "55152 DALRD3 islets of Langerhans Medium Approved Pancreas\n", "5422 POLA1 exocrine glandular cells Medium Supported Pancreas\n", "23085 ERC1 exocrine glandular cells Medium Approved Pancreas\n", "2045 EPHA7 exocrine glandular cells High Approved Pancreas\n", "2045 EPHA7 islets of Langerhans Medium Approved Pancreas\n", "2048 EPHB2 exocrine glandular cells Medium Approved Pancreas\n", "55120 FANCL exocrine glandular cells Medium Supported Pancreas\n", "55120 FANCL islets of Langerhans Medium Supported 
Pancreas\n", "28964 GIT1 islets of Langerhans Medium Enhanced Pancreas\n", "2736 GLI2 exocrine glandular cells High Approved Pancreas\n", "2736 GLI2 islets of Langerhans High Approved Pancreas\n", "8339 H2BC8 exocrine glandular cells High Supported Pancreas\n", "8339 H2BC8 islets of Langerhans High Supported Pancreas\n", "6928 HNF1B exocrine glandular cells High Enhanced Pancreas\n", "9922 IQSEC1 exocrine glandular cells Medium Uncertain Pancreas\n", "8543 LMO4 exocrine glandular cells Medium Approved Pancreas\n", "8543 LMO4 islets of Langerhans Medium Approved Pancreas\n", "26468 LHX6 exocrine glandular cells Medium Approved Pancreas\n", "26468 LHX6 islets of Langerhans Medium Approved Pancreas\n", "987 LRBA exocrine glandular cells High Approved Pancreas\n", "4211 MEIS1 exocrine glandular cells Medium Approved Pancreas\n", "4212 MEIS2 exocrine glandular cells High Enhanced Pancreas\n", "4212 MEIS2 islets of Langerhans High Enhanced Pancreas\n", "140609 NEK7 exocrine glandular cells Medium Approved Pancreas\n", "140609 NEK7 islets of Langerhans Medium Approved Pancreas\n", "5087 PBX1 exocrine glandular cells High Supported Pancreas\n", "5087 PBX1 islets of Langerhans Medium Supported Pancreas\n", "5090 PBX3 exocrine glandular cells Medium Supported Pancreas\n", "5090 PBX3 islets of Langerhans Medium Supported Pancreas\n", "9678 PHF14 exocrine glandular cells High Supported Pancreas\n", "23133 PHF8 exocrine glandular cells High Approved Pancreas\n", "23133 PHF8 islets of Langerhans Medium Approved Pancreas\n", "5862 RAB2A exocrine glandular cells Medium Approved Pancreas\n", "5862 RAB2A islets of Langerhans Medium Approved Pancreas\n", "27316 RBMX exocrine glandular cells Medium Supported Pancreas\n", "27316 RBMX islets of Langerhans Medium Supported Pancreas\n", "55703 POLR3B exocrine glandular cells High Supported Pancreas\n", "23328 SASH1 exocrine glandular cells Medium Uncertain Pancreas\n", "23328 SASH1 islets of Langerhans Medium Uncertain Pancreas\n", "9792 
SERTAD2 exocrine glandular cells High Enhanced Pancreas\n", "84193 SETD3 islets of Langerhans Medium Approved Pancreas\n", "7110 TMF1 exocrine glandular cells Medium Approved Pancreas\n", "7110 TMF1 islets of Langerhans Medium Approved Pancreas\n", "80700 UBXN6 exocrine glandular cells Medium Approved Pancreas\n", "57654 UVSSA exocrine glandular cells Medium Approved Pancreas\n", "27072 VPS41 islets of Langerhans Medium Approved Pancreas\n", "7444 VRK2 exocrine glandular cells Medium Approved Pancreas\n", "65125 WNK1 islets of Langerhans Medium Enhanced Pancreas\n", "51741 WWOX exocrine glandular cells Medium Approved Pancreas\n", "51741 WWOX islets of Langerhans Medium Approved Pancreas\n", "79971 WLS exocrine glandular cells Medium Approved Pancreas\n", "3983 ABLIM1 exocrine glandular cells Medium Approved Pancreas\n", "10097 ACTR2 exocrine glandular cells Medium Approved Pancreas\n", "10097 ACTR2 islets of Langerhans Medium Approved Pancreas\n", "4301 AFDN exocrine glandular cells Medium Approved Pancreas\n", "51319 RSRC1 exocrine glandular cells Medium Supported Pancreas\n", "51319 RSRC1 islets of Langerhans Medium Supported Pancreas\n", "657 BMPR1A exocrine glandular cells High Approved Pancreas\n", "3491 CCN1 exocrine glandular cells Medium Uncertain Pancreas\n", "84529 C15orf41 exocrine glandular cells Medium Uncertain Pancreas\n", "171425 CLYBL exocrine glandular cells High Enhanced Pancreas\n", "171425 CLYBL islets of Langerhans High Enhanced Pancreas\n", "1478 CSTF2 exocrine glandular cells Medium Supported Pancreas\n", "1478 CSTF2 islets of Langerhans Medium Supported Pancreas\n", "905 CCNT2 exocrine glandular cells High Supported Pancreas\n", "905 CCNT2 islets of Langerhans High Supported Pancreas\n", "1848 DUSP6 exocrine glandular cells Medium Approved Pancreas\n", "26610 ELP4 exocrine glandular cells Medium Approved Pancreas\n", "26610 ELP4 islets of Langerhans Medium Approved Pancreas\n", "79767 ELMO3 islets of Langerhans Medium Approved Pancreas\n", 
"8891 EIF2B3 exocrine glandular cells Medium Supported Pancreas\n", "8891 EIF2B3 islets of Langerhans Medium Supported Pancreas\n", "8667 EIF3H exocrine glandular cells Medium Approved Pancreas\n", "8667 EIF3H islets of Langerhans Medium Approved Pancreas\n", "11340 EXOSC8 exocrine glandular cells High Uncertain Pancreas\n", "83989 FAM172A exocrine glandular cells High Approved Pancreas\n", "83989 FAM172A islets of Langerhans Medium Approved Pancreas\n", "63877 FAM204A exocrine glandular cells High Approved Pancreas\n", "55137 FIGN exocrine glandular cells High Approved Pancreas\n", "55137 FIGN islets of Langerhans Medium Approved Pancreas\n", "93986 FOXP2 exocrine glandular cells Medium Enhanced Pancreas\n", "93986 FOXP2 islets of Langerhans Medium Enhanced Pancreas\n", "2971 GTF3A exocrine glandular cells High Supported Pancreas\n", "2619 GAS1 exocrine glandular cells High Approved Pancreas\n", "3172 HNF4A exocrine glandular cells Medium Enhanced Pancreas\n", "3187 HNRNPH1 exocrine glandular cells High Supported Pancreas\n", "3187 HNRNPH1 islets of Langerhans High Supported Pancreas\n", "3205 HOXA9 islets of Langerhans Medium Uncertain Pancreas\n", "3217 HOXB7 exocrine glandular cells High Uncertain Pancreas\n", "3233 HOXD4 exocrine glandular cells Medium Approved Pancreas\n", "3233 HOXD4 islets of Langerhans Medium Approved Pancreas\n", "3397 ID1 exocrine glandular cells High Approved Pancreas\n", "3397 ID1 islets of Langerhans Medium Approved Pancreas\n", "3615 IMPDH2 exocrine glandular cells High Approved Pancreas\n", "3615 IMPDH2 islets of Langerhans Medium Approved Pancreas\n", "57117 INTS12 exocrine glandular cells High Approved Pancreas\n", "57117 INTS12 islets of Langerhans Medium Approved Pancreas\n", "359948 IRF2BP2 exocrine glandular cells High Approved Pancreas\n", "359948 IRF2BP2 islets of Langerhans High Approved Pancreas\n", "3728 JUP exocrine glandular cells High Supported Pancreas\n", "3728 JUP islets of Langerhans Medium Supported Pancreas\n", 
"57565 KLHL14 exocrine glandular cells High Uncertain Pancreas\n", "57565 KLHL14 islets of Langerhans High Uncertain Pancreas\n", "114818 KLHL29 exocrine glandular cells Medium Approved Pancreas\n", "22920 KIFAP3 exocrine glandular cells Medium Supported Pancreas\n", "22920 KIFAP3 islets of Langerhans Medium Supported Pancreas\n", "83938 LRMDA exocrine glandular cells Medium Approved Pancreas\n", "201255 LRRC45 exocrine glandular cells Medium Enhanced Pancreas\n", "54674 LRRN3 exocrine glandular cells Medium Approved Pancreas\n", "143098 MPP7 exocrine glandular cells Medium Enhanced Pancreas\n", "143098 MPP7 islets of Langerhans High Enhanced Pancreas\n", "145282 MIPOL1 exocrine glandular cells Medium Enhanced Pancreas\n", "145282 MIPOL1 islets of Langerhans Medium Enhanced Pancreas\n", "219402 MTIF3 exocrine glandular cells High Approved Pancreas\n", "219402 MTIF3 islets of Langerhans High Approved Pancreas\n", "5607 MAP2K5 exocrine glandular cells Medium Supported Pancreas\n", "51776 MAP3K20 exocrine glandular cells Medium Approved Pancreas\n", "124540 MSI2 exocrine glandular cells High Approved Pancreas\n", "124540 MSI2 islets of Langerhans High Approved Pancreas\n", "4082 MARCKS exocrine glandular cells Medium Enhanced Pancreas\n", "4082 MARCKS islets of Langerhans Medium Enhanced Pancreas\n", "4756 NEO1 exocrine glandular cells Medium Uncertain Pancreas\n", "4774 NFIA exocrine glandular cells High Enhanced Pancreas\n", "340371 NRBP2 exocrine glandular cells Medium Approved Pancreas\n", "340371 NRBP2 islets of Langerhans Medium Approved Pancreas\n", "2494 NR5A2 exocrine glandular cells High Approved Pancreas\n", "2494 NR5A2 islets of Langerhans High Approved Pancreas\n", "116039 OSR2 islets of Langerhans Medium Uncertain Pancreas\n", "5080 PAX6 islets of Langerhans High Enhanced Pancreas\n", "5727 PTCH1 exocrine glandular cells Medium Uncertain Pancreas\n", "5727 PTCH1 islets of Langerhans Medium Uncertain Pancreas\n", "442213 PTCHD4 exocrine glandular cells 
High Uncertain Pancreas\n", "5195 PEX14 exocrine glandular cells Medium Supported Pancreas\n", "5195 PEX14 islets of Langerhans Medium Supported Pancreas\n", "5144 PDE4D exocrine glandular cells High Approved Pancreas\n", "5150 PDE7A islets of Langerhans Medium Approved Pancreas\n", "144100 PLEKHA7 exocrine glandular cells High Approved Pancreas\n", "144100 PLEKHA7 islets of Langerhans Medium Approved Pancreas\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "389072 PLEKHM3 exocrine glandular cells Medium Uncertain Pancreas\n", "389072 PLEKHM3 islets of Langerhans Medium Uncertain Pancreas\n", "5339 PLEC exocrine glandular cells Medium Enhanced Pancreas\n", "22827 PUF60 exocrine glandular cells Medium Enhanced Pancreas\n", "22827 PUF60 islets of Langerhans Medium Enhanced Pancreas\n", "5033 P4HA1 exocrine glandular cells Medium Enhanced Pancreas\n", "54681 P4HTM exocrine glandular cells High Supported Pancreas\n", "54681 P4HTM islets of Langerhans Medium Supported Pancreas\n", "10196 PRMT3 exocrine glandular cells Medium Uncertain Pancreas\n", "10196 PRMT3 islets of Langerhans Medium Uncertain Pancreas\n", "5566 PRKACA exocrine glandular cells Medium Approved Pancreas\n", "5566 PRKACA islets of Langerhans Medium Approved Pancreas\n", "6210 RPS15A exocrine glandular cells High Supported Pancreas\n", "6428 SRSF3 exocrine glandular cells High Supported Pancreas\n", "6428 SRSF3 islets of Langerhans High Supported Pancreas\n", "55084 SOBP exocrine glandular cells Medium Uncertain Pancreas\n", "55084 SOBP islets of Langerhans Medium Uncertain Pancreas\n", "115286 SLC25A26 exocrine glandular cells Medium Supported Pancreas\n", "6840 SVIL exocrine glandular cells Medium Approved Pancreas\n", "6809 STX3 exocrine glandular cells Medium Approved Pancreas\n", "6809 STX3 islets of Langerhans High Approved Pancreas\n", "57616 TSHZ3 exocrine glandular cells High Approved Pancreas\n", "6938 TCF12 exocrine glandular cells High Approved Pancreas\n", "6934 TCF7L2 
exocrine glandular cells High Supported Pancreas\n", "6934 TCF7L2 islets of Langerhans High Supported Pancreas\n", "6907 TBL1X exocrine glandular cells High Approved Pancreas\n", "6907 TBL1X islets of Langerhans Medium Approved Pancreas\n", "3842 TNPO1 exocrine glandular cells High Approved Pancreas\n", "3842 TNPO1 islets of Langerhans Medium Approved Pancreas\n", "6902 TBCA exocrine glandular cells High Approved Pancreas\n", "6902 TBCA islets of Langerhans Medium Approved Pancreas\n", "7533 YWHAH islets of Langerhans Medium Uncertain Pancreas\n", "9039 UBA3 exocrine glandular cells Medium Supported Pancreas\n", "9039 UBA3 islets of Langerhans Medium Supported Pancreas\n", "9690 UBE3C exocrine glandular cells Medium Approved Pancreas\n", "9690 UBE3C islets of Langerhans Medium Approved Pancreas\n", "84669 USP32 exocrine glandular cells Medium Approved Pancreas\n", "84669 USP32 islets of Langerhans Medium Approved Pancreas\n", "143187 VTI1A exocrine glandular cells High Supported Pancreas\n", "9686 VGLL4 exocrine glandular cells Medium Approved Pancreas\n", "23613 ZMYND8 islets of Langerhans Medium Approved Pancreas\n", "84858 ZNF503 exocrine glandular cells High Approved Pancreas\n", "84858 ZNF503 islets of Langerhans Medium Approved Pancreas\n", "25925 ZNF521 exocrine glandular cells Medium Enhanced Pancreas\n", "23414 ZFPM2 exocrine glandular cells Medium Approved Pancreas\n", "23414 ZFPM2 islets of Langerhans Medium Approved Pancreas\n" ] } ], "source": [ "for row in query.rows():\n", " print(row[\"primaryIdentifier\"], row[\"symbol\"], row[\"proteinAtlasExpression.cellType\"],\n", " row[\"proteinAtlasExpression.level\"], row[\"proteinAtlasExpression.reliability\"],\n", " row[\"proteinAtlasExpression.tissue.name\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to save this set of genes (i.e. genes from the Pax6 target set that are expressed in the pancreas) for further analysis.
To do this we define a Python list and loop through our results again; this time, instead of printing the results, we append just the primary identifiers to our list." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "UpinPancreas = list()\n", "for row in query.rows():\n", " UpinPancreas.append(row[\"primaryIdentifier\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then check that the list we have created looks correct:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['84618', '29880', '476', '23200', '374868', '490', '490', '54828', '54828', '1121', '1121', '55152', '55152', '5422', '23085', '2045', '2045', '2048', '55120', '55120', '28964', '2736', '2736', '8339', '8339', '6928', '9922', '8543', '8543', '26468', '26468', '987', '4211', '4212', '4212', '140609', '140609', '5087', '5087', '5090', '5090', '9678', '23133', '23133', '5862', '5862', '27316', '27316', '55703', '23328', '23328', '9792', '84193', '7110', '7110', '80700', '57654', '27072', '7444', '65125', '51741', '51741', '79971', '3983', '10097', '10097', '4301', '51319', '51319', '657', '3491', '84529', '171425', '171425', '1478', '1478', '905', '905', '1848', '26610', '26610', '79767', '8891', '8891', '8667', '8667', '11340', '83989', '83989', '63877', '55137', '55137', '93986', '93986', '2971', '2619', '3172', '3187', '3187', '3205', '3217', '3233', '3233', '3397', '3397', '3615', '3615', '57117', '57117', '359948', '359948', '3728', '3728', '57565', '57565', '114818', '22920', '22920', '83938', '201255', '54674', '143098', '143098', '145282', '145282', '219402', '219402', '5607', '51776', '124540', '124540', '4082', '4082', '4756', '4774', '340371', '340371', '2494', '2494', '116039', '5080', '5727', '5727', '442213', '5195', '5195', '5144', '5150', '144100', '144100', '389072', '389072', '5339', '22827', '22827', '5033', '54681', '54681', 
'10196', '10196', '5566', '5566', '6210', '6428', '6428', '55084', '55084', '115286', '6840', '6809', '6809', '57616', '6938', '6934', '6934', '6907', '6907', '3842', '3842', '6902', '6902', '7533', '9039', '9039', '9690', '9690', '84669', '84669', '143187', '9686', '23613', '84858', '84858', '25925', '23414', '23414']\n" ] } ], "source": [ "print(UpinPancreas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now need to save the list to our InterMine account so we can use it again in a later query. The ListManager class provides methods to manage list contents and operations." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm=service.list_manager()\n", "lm.delete_lists([\"UpinPancreas\"])\n", "lm.create_list(content=UpinPancreas, list_type=\"Gene\", name=\"UpinPancreas\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Log in to HumanMine and check your list has been created." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second query: Diabetes genes \n", "\n", "Our second query (which we created using the query builder) found genes that are associated with the disease diabetes. 
Re-create this query using code as follows:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "208 AKT2\n", "6833 ABCC8\n", "640 BLK\n", "1234 CCR5\n", "5611 DNAJC3\n", "169792 GLIS3\n", "6927 HNF1A\n", "6928 HNF1B\n", "8462 KLF11\n", "389692 MAFA\n", "5325 PLAGL1\n", "9882 TBC1D4\n", "346171 ZFP57\n", "26060 APPL1\n", "1636 ACE\n", "359 AQP2\n", "551 AVP\n", "554 AVPR2\n", "11132 CAPN10\n", "1056 CEL\n", "1493 CTLA4\n", "5167 ENPP1\n", "2056 EPO\n", "9451 EIF2AK3\n", "50943 FOXP3\n", "2642 GCGR\n", "2645 GCK\n", "2820 GPD2\n", "3172 HNF4A\n", "3159 HMGA1\n", "3077 HFE\n", "57061 HYMAI\n", "51124 IER3IP1\n", "3710 ITPR3\n", "3630 INS\n", "10644 IGF2BP2\n", "3643 INSR\n", "3667 IRS1\n", "8660 IRS2\n", "3557 IL1RN\n", "3559 IL2RA\n", "3569 IL6\n", "3990 LIPC\n", "4544 MTNR1B\n", "9479 MAPK8IP1\n", "4760 NEUROD1\n", "5078 PAX4\n", "3651 PDX1\n", "5444 PON1\n", "5468 PPARG\n", "3767 KCNJ11\n", "5506 PPP1R3A\n", "5770 PTPN1\n", "26191 PTPN22\n", "56729 RETN\n", "387082 SUMO4\n", "6514 SLC2A2\n", "169026 SLC30A8\n", "6648 SOD2\n", "6934 TCF7L2\n", "7422 VEGFA\n", "7466 WFS1\n" ] } ], "source": [ "query2 = service.new_query(\"Gene\")\n", "query2.add_view(\"primaryIdentifier\", \"symbol\")\n", "query2.add_constraint(\"organism.name\", \"=\", \"Homo sapiens\", code = \"A\")\n", "query2.add_constraint(\"diseases.name\", \"CONTAINS\", \"diabetes\", code = \"B\")\n", "\n", "for row in query2.rows():\n", " print (row[\"primaryIdentifier\"], row[\"symbol\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and save the set of genes returned as a list:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "diabetesGenes = list()\n", "for row in query2.rows():\n", " diabetesGenes.append(row[\"primaryIdentifier\"])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, 
"execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm=service.list_manager()\n", "lm.delete_lists([\"diabetesGenes\"])\n", "lm.create_list(content=diabetesGenes, list_type=\"Gene\", name=\"diabetesGenes\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we used a list intersect to find those genes that are upregulated in the pancreas that are also associated with the disease diabetes. We need to intersect the first (UpinPancreas) and second (diabetesGenes) lists that we created. We can do this using the intersect method from the ListManager class." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm.delete_lists([\"intersectedList\"])\n", "lm.intersect([\"UpinPancreas\", \"diabetesGenes\"], \"intersectedList\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "intersectedList = lm.get_list(\"intersectedList\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "intersectedList (3 Gene) 2020-03-26T20:31:20+0000 Intersection of UpinPancreas and diabetesGenes\n" ] } ], "source": [ "print(intersectedList)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final Query: GWAS \n", "\n", "Finally, we fed the intersected list from above back into another query to see if there was any association of these genes with diabetes phenotypes according to GWAS studies. 
Note that we now start our query from the GWAS class:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "query = service.new_query(\"GWAS\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_view(\n", " \"results.associatedGenes.primaryIdentifier\",\n", " \"results.associatedGenes.symbol\", \"results.associatedGenes.name\",\n", " \"results.SNP.primaryIdentifier\", \"results.pValue\", \"results.phenotype\",\n", " \"firstAuthor\", \"name\", \"publication.pubMedId\",\n", " \"results.associatedGenes.organism.shortName\"\n", ")\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "query.add_constraint(\"results.pValue\", \"<=\", \"1e-04\", code = \"B\")\n", "query.add_constraint(\"results.phenotype\", \"CONTAINS\", \"diabetes\", code = \"C\")\n", "query.add_constraint(\"results.associatedGenes\", \"IN\", \"intersectedList\", code = \"D\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6934 TCF7L2 transcription factor 7 like 2 rs386418874 6e-11 Type 2 diabetes Adeyemo AA ZRANB3 is an African-specific type 2 diabetes locus associated with beta-cell mass and insulin response. 31324766 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs34872471 1e-94 Type 2 diabetes Bonas-Guarch S Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. 29358691 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 5e-13 Type 2 diabetes Chen J Genome-wide association study of type 2 diabetes in Africa. 31049640 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs34872471 6e-53 Type 2 diabetes Cook JP Multi-ethnic genome-wide association study identifies novel locus for type 2 diabetes susceptibility. 27189021 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs34872471 8e-08 Type 2 diabetes Ghassibe-Sabbagh M T2DM GWAS in the Lebanese population confirms the role of TCF7L2 and CDKAL1 in disease susceptibility. 25483131 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-11 Type 2 diabetes Hackinger S Evidence for genetic contribution to the increased risk of type 2 diabetes in schizophrenia. 30470734 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-09 Schizophrenia vs type 2 diabetes Hackinger S Evidence for genetic contribution to the increased risk of type 2 diabetes in schizophrenia. 30470734 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-15 Type 2 diabetes Hara K Genome-wide association study identifies three novel loci for type 2 diabetes. 23945395 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs34872471 3e-23 Type 2 diabetes Imamura M Genome-wide association studies in the Japanese population identify seven novel loci for type 2 diabetes. 26818947 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-15 Type 2 diabetes Kho AN Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. 22101970 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 2e-11 Type 2 diabetes Kichaev G Leveraging Polygenic Functional Enrichment to Improve GWAS Power. 30595370 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs117229942 2e-11 Type 2 diabetes Kichaev G Leveraging Polygenic Functional Enrichment to Improve GWAS Power. 30595370 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs4918796 4e-09 Type 2 diabetes Kichaev G Leveraging Polygenic Functional Enrichment to Improve GWAS Power. 30595370 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-203 Type 2 diabetes Kichaev G Leveraging Polygenic Functional Enrichment to Improve GWAS Power. 30595370 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs4812829 3e-10 Type 2 diabetes Kooner JS Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. 21874001 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7901695 1e-06 Type 2 diabetes Lettre G Genome-wide association study of coronary heart disease and its risk factors in 8,090 African Americans: the NHLBI CARe Project. 21347282 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs4812829 5e-08 Type 2 diabetes Mahajan A Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. 24509480 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 8e-75 Type 2 diabetes Mahajan A Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. 24509480 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 3e-08 Type 2 diabetes Mahajan A Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. 29632382 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 5e-08 Type 2 diabetes (adjusted for BMI) Mahajan A Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. 29632382 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 7e-07 Type 2 diabetes (adjusted for BMI) Mahajan A Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. 
29632382 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 3e-06 Type 2 diabetes Mahajan A Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. 29632382 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-139 Type 2 diabetes Morris AP Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. 22885922 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 6e-92 Type 2 diabetes Morris AP Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. 22885922 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 4e-73 Type 2 diabetes Morris AP Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. 22885922 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 4e-94 Type 2 diabetes Ng MC Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. 25102180 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 5e-44 Type 2 diabetes Ng MC Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. 25102180 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-40 Type 2 diabetes Perry JR Stratifying type 2 diabetes cases by BMI identifies genetic risk variants in LAMA1 and enrichment for risk variants in lean compared to obese cases. 22693455 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 4e-21 Type 2 diabetes Perry JR Stratifying type 2 diabetes cases by BMI identifies genetic risk variants in LAMA1 and enrichment for risk variants in lean compared to obese cases. 22693455 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-11 Type 2 diabetes Qi Q Genetics of Type 2 Diabetes in U.S. Hispanic/Latino Individuals: Results From the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). 28254843 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-30 Type 2 diabetes Rung J Genetic variant near IRS1 is associated with type 2 diabetes, insulin resistance and hyperinsulinemia. 19734900 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 5e-08 Type 2 diabetes Salonen JT Type 2 diabetes whole-genome association study in four populations: the DiaGen consortium. 17668382 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-48 Type 2 diabetes Saxena R Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. 17463246 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 9e-75 Type 2 diabetes Saxena R Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. 23300278 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-38 Type 2 diabetes Saxena R Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. 23300278 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-35 Type 2 diabetes Saxena R Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. 23300278 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 6e-22 Type 2 diabetes Saxena R Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. 23300278 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-19 Type 2 diabetes Saxena R Genome-wide association study identifies a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin from India. 23300278 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-08 Type 2 diabetes Scott LJ A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. 17463248 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-34 Type 2 diabetes Sladek R A genome-wide association study identifies novel risk loci for type 2 diabetes. 17293876 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-10 Type 2 diabetes Steinthorsdottir V A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. 17460697 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs16988991 4e-09 Type 2 diabetes Suzuki K Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. 30718926 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7901695 3e-49 Type 2 diabetes Suzuki K Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. 30718926 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-35 Type 2 diabetes Tabassum R Genome-wide association study for type 2 diabetes in Indians identifies a new susceptibility locus at 2q21. 23209189 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 8e-12 Type 2 diabetes Takeuchi F Confirmation of multiple risk Loci and genetic impacts by a genome-wide association study of type 2 diabetes in the Japanese population. 19401414 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 9e-30 Type 2 diabetes Timpson NJ Adiposity-related heterogeneity in patterns of type 2 diabetes susceptibility observed in genome-wide association data. 19056611 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 6e-16 Type 2 diabetes Timpson NJ Adiposity-related heterogeneity in patterns of type 2 diabetes susceptibility observed in genome-wide association data. 19056611 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-51 Type 2 diabetes Voight BF Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. 20581827 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs4506565 5e-12 Type 2 diabetes Wellcome Trust Case Control Consortium Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. 17554300 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-14 Type 2 diabetes Williams AL Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. 24390345 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 7e-45 Type 2 diabetes Wood AR Variants in the FTO and CDKAL1 loci have recessive effects on risk of obesity and type 2 diabetes, respectively. 26961502 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs1800961 1e-12 Medication use (drugs used in diabetes) Wu Y Genome-wide association study of medication-use and associated disease in the UK Biobank. 31015401 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs61123794 2e-08 Medication use (drugs used in diabetes) Wu Y Genome-wide association study of medication-use and associated disease in the UK Biobank. 31015401 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 4e-145 Medication use (drugs used in diabetes) Wu Y Genome-wide association study of medication-use and associated disease in the UK Biobank. 31015401 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs117229942 4e-11 Type 2 diabetes Xue A Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. 30054458 H. 
sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs4918796 4e-13 Type 2 diabetes Xue A Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. 30054458 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 0 Type 2 diabetes Xue A Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. 30054458 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-23 Type 2 diabetes Zeggini E Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. 18372903 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7901695 1e-48 Type 2 diabetes Zeggini E Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. 17463249 H. sapiens\n", "3172 HNF4A hepatocyte nuclear factor 4 alpha rs4812829 4e-10 Type 2 diabetes Zhao W Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. 28869590 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 1e-219 Type 2 diabetes Zhao W Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. 28869590 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 2e-184 Type 2 diabetes Zhao W Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. 28869590 H. sapiens\n", "6934 TCF7L2 transcription factor 7 like 2 rs7903146 3e-23 Type 2 diabetes Zhao W Identification of new susceptibility loci for type 2 diabetes and shared etiological pathways with coronary heart disease. 28869590 H. 
sapiens\n" ] } ], "source": [ "for row in query.rows():\n", " print(row[\"results.associatedGenes.primaryIdentifier\"], row[\"results.associatedGenes.symbol\"], \\\n", " row[\"results.associatedGenes.name\"], row[\"results.SNP.primaryIdentifier\"], \\\n", " row[\"results.pValue\"], row[\"results.phenotype\"], row[\"firstAuthor\"], row[\"name\"], \\\n", " row[\"publication.pubMedId\"], row[\"results.associatedGenes.organism.shortName\"])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "HNF4A\n", "HNF4A\n", "HNF4A\n", "HNF4A\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n", "HNF4A\n", "TCF7L2\n", "TCF7L2\n", "TCF7L2\n" ] } ], "source": [ "for row in query.rows():\n", " print(row[\"results.associatedGenes.symbol\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }