{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieve protein data from Metabolomics Workbench using REST API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Python modules.."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "\n",
    "import os\n",
    "import sys\n",
    "import time\n",
    "import re\n",
    "\n",
    "import requests\n",
    "\n",
    "from IPython import __version__ as ipyVersion\n",
    "\n",
    "print(\"Python: %s.%s.%s\" % sys.version_info[:3])\n",
    "print(\"IPython: %s\" % ipyVersion)\n",
    "\n",
    "print()\n",
    "print(time.asctime())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The URL PATH**\n",
    "\n",
    "The MW REST URL consists of three main parts, separated by forward slashes, after the common prefix specifying the invariant base URL (https://www.metabolomicsworkbench.org/rest/):\n",
    "\n",
    "https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification\n",
    "    \n",
    "Part 1: The context determines the type of data to be accessed from the Metabolomics Workbench, such as metadata or results related to the submitted studies, data from metabolites, genes/proteins and analytical chemistry databases as well as other services related to mass spectrometry and metabolite identification:\n",
    "\n",
    "context = study | compound | refmet | gene | protein | moverz | exactmass\n",
    "\n",
    "Part 2: The input specification consists of two required parameters describing the REST request:\n",
    "\n",
    "input_specification = input_item/input_value\n",
    "\n",
    "Part 3: The output specification consists of two parameters describing the output generated by the REST request:\n",
    "\n",
    "output_specification = output_item/(output_format)\n",
    "\n",
    "The first parameter is required in most cases. The second parameter is optional. The input and output specifications are context sensitive. The context determines the values allowed for the remaining parameters in the input and output specifications as detailed in the sections below.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setup MW REST base URL..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWBaseURL = \"https://www.metabolomicsworkbench.org/rest\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The “protein” context**\n",
    "\n",
    "The “protein” context refers to a Human Metabolome Gene/Protein Database (MGP) of metabolome-related genes and proteins contains data for over 7300 genes and over 15,500 proteins. In addition to gene information, it provides access to protein related information such as MGP ID, various protein IDs, protein name, protein sequence, etc.\n",
    "\n",
    "context = protein\n",
    "\n",
    "input item = mgp_id | gene_id | gene_name | gene_symbol | taxid | mrna_id | refseq_id | protein_gi | uniprot_id | protein_entry | protein_name\n",
    "\n",
    "input_value = input_item_value\n",
    "\n",
    "output_item = all | mgp_id | gene_id | gene_name | gene_symbol | taxid | species | species_long | mrna_id | refseq_id | protein_gi | uniprot_id | protein_entry | protein_name | seqlength | seq | is_identical_to | mgp_id,gene_id,gene_name,...\n",
    "\n",
    "output_format = txt | json\n",
    "\n",
    "The “all” output item is automatically expanded to include the following items: mgp_id, gene_id, gene_name, gene_symbol, taxid, species, species_long, mrna_id, refseq_id, protein_gi, uniprot_id, protein_entry, protein_name, seqlength, seq, is_identical_to.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process protein data in JSON format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Setup REST URL to retrieve all available data for a protein Uniprot ID..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/protein/uniprot_id/Q13085/all\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute REST request using \"request\" module..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check \"request\" status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process JSON results..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAll available protein data using Uniprot ID:\\n\")\n",
    "\n",
    "Results = Response.json()\n",
    "\n",
    "Count = 0\n",
    "for ResultNum in Results:\n",
    "    Count += 1\n",
    "    print(\"\\nResultNum: %s\\n\" % ResultNum)\n",
    "    \n",
    "    for ResultType in Results[ResultNum]:\n",
    "        ResultValue = Results[ResultNum][ResultType]    \n",
    "        print(\"%s: %s\" % (ResultType, ResultValue))\n",
    "\n",
    "print(\"\\nTotal number of proteins matched: %d\" % Count)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Retrieve and process protein data in text format**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Retrieve and process data for protein using Entrez gene id..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/protein/gene_id/19/all/txt\"\n",
    "\n",
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)\n",
    "\n",
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)\n",
    "\n",
    "print(\"\\nAll available protein data using gene_id:\\n\")\n",
    "\n",
    "Results = Response.text\n",
    "for Result in Results.split(\"\\n\"):\n",
    "    Words = Result.split(\"\\t\")\n",
    "    if len(Words) != 2:\n",
    "        continue\n",
    "    \n",
    "    ResultType, ResultValue = Result.split(\"\\t\")\n",
    "    print(\"%s: %s\" % (ResultType, ResultValue))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}