{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intermine-Python: Tutorial 1: The Basics of a Query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the first intermine-python tutorial. Over a series of approximately 12 tutorials, we will go through the basics of writing Python code that queries an InterMine database." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial covers the basics of intermine-python queries and shows you how to write your first query. To get started, run `pip install intermine` in a terminal. Once the package is installed, you are good to go!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by importing the `Service` class from InterMine's `webservice` module." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from intermine.webservice import Service" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Service` class has a method called `new_query` that creates a query object:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Connect to the FlyMine web service and create an empty query\n", "service = Service(\"https://www.flymine.org/flymine/service\")\n", "query = service.new_query()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A query object defines what we want to extract from the InterMine database. The first part of a query is referred to as the \"views\". The views define the output columns that we want in our result. Let's query the FlyMine database to extract the symbol, primaryIdentifier and length of all genes."
 ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {}, "execution_count": 3 } ], "source": [ "query.select(\"Gene.symbol\", \"Gene.primaryIdentifier\", \"Gene.length\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have added the output columns to our query, we can print the results." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "Gene: symbol='0610005C13Rik' primaryIdentifier='MGI:1918911' length=None\nGene: symbol='0610006L08Rik' primaryIdentifier='MGI:1923503' length=None\nGene: symbol='0610008J02Rik' primaryIdentifier='MGI:1925547' length=None\nGene: symbol='0610009B22Rik' primaryIdentifier='MGI:1913300' length=None\nGene: symbol='0610009E02Rik' primaryIdentifier='MGI:3698435' length=None\nGene: symbol='0610009F21Rik' primaryIdentifier='MGI:1918921' length=None\nGene: symbol='0610009K14Rik' primaryIdentifier='MGI:1918931' length=None\nGene: symbol='0610009L18Rik' primaryIdentifier='MGI:1914088' length=None\nGene: symbol='0610010F05Rik' primaryIdentifier='MGI:1918925' length=None\nGene: symbol='0610010K14Rik' primaryIdentifier='MGI:1915609' length=None\n" } ], "source": [ "# Print only the first 10 rows of the result\n", "for row in query.rows(start=0, size=10):\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The query can also be rewritten in the following way. Passing the root class `Gene` to `new_query` lets us name its attributes without the `Gene.` prefix."
 ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {}, "execution_count": 5 } ], "source": [ "query = service.new_query(\"Gene\")\n", "query.select(\"symbol\", \"primaryIdentifier\", \"length\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "Gene: symbol='0610005C13Rik' primaryIdentifier='MGI:1918911' length=None\nGene: symbol='0610006L08Rik' primaryIdentifier='MGI:1923503' length=None\nGene: symbol='0610008J02Rik' primaryIdentifier='MGI:1925547' length=None\nGene: symbol='0610009B22Rik' primaryIdentifier='MGI:1913300' length=None\nGene: symbol='0610009E02Rik' primaryIdentifier='MGI:3698435' length=None\nGene: symbol='0610009F21Rik' primaryIdentifier='MGI:1918921' length=None\nGene: symbol='0610009K14Rik' primaryIdentifier='MGI:1918931' length=None\nGene: symbol='0610009L18Rik' primaryIdentifier='MGI:1914088' length=None\nGene: symbol='0610010F05Rik' primaryIdentifier='MGI:1918925' length=None\nGene: symbol='0610010K14Rik' primaryIdentifier='MGI:1915609' length=None\n" } ], "source": [ "for row in query.rows(start=0, size=10):\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feel free to use whichever method you find more comfortable. Now let us write a new query that returns all organisms in the database." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "query2 = service.new_query()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {}, "execution_count": 8 } ], "source": [ "query2.select(\"Organism.name\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To add another column to the output without rewriting the query, use the `add_view` method."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {}, "execution_count": 9 } ], "source": [ "query2.add_view(\"Organism.taxonId\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "Organism: name='Anopheles gambiae' taxonId='7165'\nOrganism: name='Caenorhabditis elegans' taxonId='6239'\nOrganism: name='Danio rerio' taxonId='7955'\nOrganism: name='Drosophila ananassae' taxonId='7217'\nOrganism: name='Drosophila erecta' taxonId='7220'\nOrganism: name='Drosophila grimshawi' taxonId='7222'\nOrganism: name='Drosophila melanogaster' taxonId='7227'\nOrganism: name='Drosophila mojavensis' taxonId='7230'\nOrganism: name='Drosophila persimilis' taxonId='7234'\nOrganism: name='Drosophila pseudoobscura' taxonId='7237'\n" } ], "source": [ "for row in query2.rows(start=0, size=10):\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the results are sorted by the first column that you defined. To sort by a different column, use the `add_sort_order` method of the query class. Here we name `Organism.name` explicitly; since it is already the first column, the output below is in the same order as before."
 ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "" }, "metadata": {}, "execution_count": 11 } ], "source": [ "query2.add_sort_order(\"Organism.name\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "Organism: name='Anopheles gambiae' taxonId='7165'\nOrganism: name='Caenorhabditis elegans' taxonId='6239'\nOrganism: name='Danio rerio' taxonId='7955'\nOrganism: name='Drosophila ananassae' taxonId='7217'\nOrganism: name='Drosophila erecta' taxonId='7220'\nOrganism: name='Drosophila grimshawi' taxonId='7222'\nOrganism: name='Drosophila melanogaster' taxonId='7227'\nOrganism: name='Drosophila mojavensis' taxonId='7230'\nOrganism: name='Drosophila persimilis' taxonId='7234'\nOrganism: name='Drosophila pseudoobscura' taxonId='7237'\n" } ], "source": [ "for row in query2.rows(start=0, size=10):\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we have limited the results to 10 rows, because the queries above would otherwise list every organism or every gene in the database. You can change the `size` argument if you want to view more or fewer rows. Views, or output columns, are one part of a query. The second part is the constraints placed on it. We will take a look at adding constraints in our next tutorial."
 ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1-final" } }, "nbformat": 4, "nbformat_minor": 1 }