{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Exercises 3 – answers\n\n1\\. Write a script to lookup the gene called *ESPN* in human and print the stable ID of this gene."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import requests, sys, json\nfrom pprint import pprint\n\ndef fetch_endpoint(server, request, content_type):\n\n    r = requests.get(server+request, headers={ \"Accept\" : content_type})\n\n    if not r.ok:\n        r.raise_for_status()\n        sys.exit()\n\n    if content_type == 'application/json':\n        return r.json()\n    else:\n        return r.text\n\n# Get the gene name from the command line\ngene_name = \"ESPN\"\n\n# define the general URL parameters\nserver = \"http://rest.ensembl.org/\"\ncon = \"application/json\"\next_get_lookup = \"lookup/symbol/homo_sapiens/\" + gene_name + \"?\"\n\n# submit the query\nget_lookup = fetch_endpoint(server, ext_get_lookup, con)\n\nprint (get_lookup['id'])",
      "execution_count": null,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "2\\. Get all variants that are associated with the phenotype 'Coffee consumption'. For each variant print:\n\n   a. the p-value for the association\n   \n   b. the PMID for the publication which describes the association between that variant and ‘Coffee consumption’\n   \n   c. the risk allele and the associated gene."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import requests, sys, json\nfrom pprint import pprint\n\ndef fetch_endpoint(server, request, content_type):\n\n    r = requests.get(server+request, headers={ \"Accept\" : content_type})\n\n    if not r.ok:\n        r.raise_for_status()\n        sys.exit()\n\n    if content_type == 'application/json':\n        return r.json()\n    else:\n        return r.text\n\nprint (\"Variant\\tp-value\\tPub-med ID\\tRisk allele\\tGene\")\n\n# define the general URL parameters\nserver = \"http://rest.ensembl.org/\"\next_phen = \"/phenotype/term/homo_sapiens/coffee consumption?\"\ncon = \"application/json\"\n\n# submit the query\nget_phen = fetch_endpoint(server, ext_phen, con)\n\nfor variant in get_phen:\n    id = variant['Variation']\n    pv = str(variant['attributes'].get('p_value'))\n    pmid = variant['attributes']['external_reference']\n    risk = str(variant['attributes'].get('risk_allele'))\n    gene = variant['attributes']['associated_gene']\n \n    print (id + \"\\t\" + pv + \"\\t\" + pmid + \"\\t\" + risk + \"\\t\" + gene)",
      "execution_count": null,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "3\\. Get the mouse homologue of the human BRCA2 and print the ID and the aligned sequence of both.\n\nNote that the JSON for the endpoint you need is several layers deep, containing nested lists (appear as square brackets [ ] in the JSON) and key value sets (appear as curly brackets { } in the JSON). Pretty print (pprint) comes in very useful here for the intermediate stage when you're trying to work out the json."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import requests, sys, json\nfrom pprint import pprint\n\ndef fetch_endpoint(server, request, content_type):\n\n    r = requests.get(server+request, headers={ \"Accept\" : content_type})\n\n    if not r.ok:\n        r.raise_for_status()\n        sys.exit()\n\n    if content_type == 'application/json':\n        return r.json()\n    else:\n        return r.text\n\ngene = \"BRCA2\"\n\n# define the general URL parameters\nserver = \"http://rest.ensembl.org/\"\next_hom = \"homology/symbol/human/\" + gene + \"?target_species=mouse\"\ncon = \"application/json\"\n\nget_hom = fetch_endpoint(server, ext_hom, con)\n\nfor datum in get_hom['data']:\n    for homology in datum['homologies']:\n        source_id = homology['source']['id']\n        source_species = homology['source']['species']\n        source_seq = homology['source']['align_seq']\n        target_id = homology['target']['id']\n        target_seq = homology['target']['align_seq']\n        target_species = homology['target']['species']\n        \n        print (\">\", source_id + \" \" + source_species + \"\\n\" + source_seq + \"\\n>\", target_id + \" \" + target_species + \"\\n\" + target_seq)",
      "execution_count": null,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "![3.3_Python.png](http://ftp.ebi.ac.uk/pub/databases/ensembl/training/images_for_REST/3.3_Python.png)"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "[Next page: Other content types](4_Other_content_types.ipynb)"
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "mimetype": "text/x-python",
      "nbconvert_exporter": "python",
      "name": "python",
      "file_extension": ".py",
      "version": "3.5.4",
      "pygments_lexer": "ipython3",
      "codemirror_mode": {
        "version": 3,
        "name": "ipython"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}