{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieve study data from Metabolomics Workbench using REST API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Python modules..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "\n",
    "import os\n",
    "import sys\n",
    "import time\n",
    "import re\n",
    "\n",
    "import requests\n",
    "\n",
    "from IPython import __version__ as ipyVersion\n",
    "\n",
    "print(\"Python: %s.%s.%s\" % sys.version_info[:3])\n",
    "print(\"IPython: %s\" % ipyVersion)\n",
    "\n",
    "print()\n",
    "print(time.asctime())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The URL PATH**\n",
    "\n",
    "The Metabolomics Workbench (MW) REST URL consists of three main parts, separated by forward slashes, that follow the common prefix specifying the invariant base URL (https://www.metabolomicsworkbench.org/rest/):\n",
    "\n",
    "https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification\n",
    "    \n",
    "Part 1: The context determines the type of data to be accessed from the Metabolomics Workbench: metadata or results for the submitted studies; data from the metabolite, gene/protein and analytical chemistry databases; and other services related to mass spectrometry and metabolite identification:\n",
    "\n",
    "context = study | compound | refmet | gene | protein | moverz | exactmass\n",
    "\n",
    "Part 2: The input specification consists of two required parameters describing the REST request:\n",
    "\n",
    "input_specification = input_item/input_value\n",
    "\n",
    "Part 3: The output specification consists of two parameters describing the output generated by the REST request:\n",
    "\n",
    "output_specification = output_item/(output_format)\n",
    "\n",
    "The output item is required in most cases; the output format is optional. The input and output specifications are context sensitive: the context determines the values allowed for the remaining parameters, as detailed in the sections below.\n"
   ]
  },
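  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how the three parts fit together: the hypothetical helper `build_mw_rest_url` (not part of any Metabolomics Workbench library) joins the base URL, the context, the input specification and the optional output specification with forward slashes. It only builds the URL string and does not check whether the server accepts a given combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_mw_rest_url(context, input_item, input_value, output_item=None,\n",
    "                      output_format=None,\n",
    "                      base_url=\"https://www.metabolomicsworkbench.org/rest\"):\n",
    "    \"\"\"Join base_url/context/input_item/input_value[/output_item[/output_format]].\n",
    "\n",
    "    Hypothetical helper for illustration only.\n",
    "    \"\"\"\n",
    "    # Required parts: the context and the two-part input specification\n",
    "    Parts = [base_url.rstrip(\"/\"), context, input_item, str(input_value)]\n",
    "    # Output specification: the output item, then the optional format (txt or json)\n",
    "    if output_item:\n",
    "        Parts.append(output_item)\n",
    "        if output_format:\n",
    "            Parts.append(output_format)\n",
    "    return \"/\".join(Parts)\n",
    "\n",
    "print(build_mw_rest_url(\"study\", \"study_id\", \"ST000001\", \"summary\"))\n",
    "print(build_mw_rest_url(\"study\", \"study_id\", \"ST000001\", \"summary\", \"txt\"))"
   ]
  },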
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up the MW REST base URL..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWBaseURL = \"https://www.metabolomicsworkbench.org/rest\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The “study” context**\n",
    "\n",
    "The \"study\" context refers to the studies available in the Metabolomics Workbench (www.metabolomicsworkbench.org), a public repository for metabolomics metadata and experimental data, metabolite standards, metabolite structures, protocols, tutorials, training material and other educational resources. It provides a computational platform to integrate, analyze, track, deposit and disseminate large volumes of heterogeneous data from a wide variety of metabolomics studies, including Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) spectroscopy data. These studies span various experimental platforms and species from all the major taxonomic categories, including humans and other mammals, plants, insects, invertebrates and microorganisms. This context provides access to a variety of study-related data, such as the study summary, experimental factors for study design, analysis information, metabolites and results data, sample source and species.\n",
    "\n",
    "context = study\n",
    "\n",
    "input_item = study_id | study_title | institute | last_name | analysis_id | metabolite_id\n",
    "\n",
    "input_value = input_item_value\n",
    "\n",
    "output_item = summary | factors | analysis | metabolites | mwtab | source | species | disease | number_of_metabolites | data | datatable | untarg_studies | untarg_factors | untarg_data\n",
    "\n",
    "output_format = txt | json (Default: json)\n"
   ]
  },
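  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lists above can serve as a quick client-side sanity check before a request is sent. In this sketch, `check_study_request` and the three sets are hypothetical names; the sets are copied from the lists above, and passing the check does not guarantee that the server supports every input/output combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Allowed values for the \"study\" context, copied from the lists above\n",
    "STUDY_INPUT_ITEMS = {\"study_id\", \"study_title\", \"institute\", \"last_name\",\n",
    "                     \"analysis_id\", \"metabolite_id\"}\n",
    "STUDY_OUTPUT_ITEMS = {\"summary\", \"factors\", \"analysis\", \"metabolites\", \"mwtab\",\n",
    "                      \"source\", \"species\", \"disease\", \"number_of_metabolites\",\n",
    "                      \"data\", \"datatable\", \"untarg_studies\", \"untarg_factors\",\n",
    "                      \"untarg_data\"}\n",
    "STUDY_OUTPUT_FORMATS = {\"txt\", \"json\"}\n",
    "\n",
    "def check_study_request(input_item, output_item, output_format=\"json\"):\n",
    "    \"\"\"Raise ValueError if a part is not listed for the study context.\"\"\"\n",
    "    if input_item not in STUDY_INPUT_ITEMS:\n",
    "        raise ValueError(\"Unknown study input_item: %s\" % input_item)\n",
    "    if output_item not in STUDY_OUTPUT_ITEMS:\n",
    "        raise ValueError(\"Unknown study output_item: %s\" % output_item)\n",
    "    if output_format not in STUDY_OUTPUT_FORMATS:\n",
    "        raise ValueError(\"Unknown output_format: %s\" % output_format)\n",
    "    return True\n",
    "\n",
    "print(check_study_request(\"study_id\", \"summary\", \"txt\"))"
   ]
  },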
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process data for a single study in JSON format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up the REST URL to retrieve data for a study ID..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/study/study_id/ST000001/summary/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute the REST request using the \"requests\" module..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the request status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process JSON results..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAvailable data for a study summary:\\n\")\n",
    "\n",
    "Results = Response.json()\n",
    "\n",
    "for ResultType in Results:\n",
    "    ResultValue = Results[ResultType]\n",
    "    print(\"%s: %s\" % (ResultType, ResultValue))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process study data for multiple studies in JSON format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A substring may be specified as the study ID to retrieve data for multiple studies. The substring is matched against all study IDs, and data are retrieved for every matching study. For example, the substring \"ST\" matches all available studies.\n",
    "\n",
    "Set up the REST URL to retrieve data for studies ST000010 through ST000019 by using ST00001 as the study ID..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/study/study_id/ST00001/summary/\""
   ]
  },
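  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The matching itself happens on the server, but its effect can be illustrated locally. In this sketch the study IDs are a hypothetical sample (ST000001 to ST000030), `match_study_ids` is a hypothetical helper, and plain substring matching is assumed as described above. With six-digit IDs, only ST000010 to ST000019 contain the substring ST00001, so the sample yields ten matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sample of study IDs; the real list is held by the server\n",
    "SampleStudyIDs = [\"ST%06d\" % Num for Num in range(1, 31)]\n",
    "\n",
    "def match_study_ids(substring, study_ids):\n",
    "    \"\"\"Return the study IDs that contain the given substring.\"\"\"\n",
    "    return [StudyID for StudyID in study_ids if substring in StudyID]\n",
    "\n",
    "Matched = match_study_ids(\"ST00001\", SampleStudyIDs)\n",
    "print(Matched)\n",
    "print(\"Number of matched studies: %d\" % len(Matched))"
   ]
  },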
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute the REST request and check its status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)\n",
    "\n",
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process JSON results..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAvailable data for studies:\\n\")\n",
    "\n",
    "Results = Response.json()\n",
    "\n",
    "StudiesCount = 0\n",
    "for ResultNum in Results:\n",
    "    StudiesCount += 1\n",
    "    print(\"\\nResultNum: %s\\n\" % ResultNum)\n",
    "    \n",
    "    for ResultType in Results[ResultNum]:\n",
    "        ResultValue = Results[ResultNum][ResultType]\n",
    "        print(\"%s: %s\" % (ResultType, ResultValue))\n",
    "\n",
    "print(\"\\nTotal number of studies matched: %d\" % StudiesCount)"
   ]
  },
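  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The single-study and multi-study processing cells above deal with two differently shaped JSON responses: a flat dictionary for a single study, and a dictionary of dictionaries keyed by result number for several studies. A small helper can hide that difference. This sketch assumes exactly those two shapes, as inferred from the loops above; `study_records` and the sample dictionaries are hypothetical. Applied to a live response, `study_records(Response.json())` would return one dictionary per study."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def study_records(results):\n",
    "    \"\"\"Return a list with one dict per study from a study summary JSON response.\n",
    "\n",
    "    Assumes a single study is a flat dict, while multiple studies are a dict of\n",
    "    dicts keyed by result number (as handled in the cells above).\n",
    "    \"\"\"\n",
    "    if not results:\n",
    "        return []\n",
    "    if all(isinstance(Value, dict) for Value in results.values()):\n",
    "        return list(results.values())\n",
    "    return [results]\n",
    "\n",
    "# Hypothetical responses illustrating the two shapes\n",
    "SingleResult = {\"study_id\": \"ST000001\", \"study_title\": \"Example\"}\n",
    "MultipleResults = {\"1\": {\"study_id\": \"ST000010\"}, \"2\": {\"study_id\": \"ST000011\"}}\n",
    "\n",
    "for Sample in (SingleResult, MultipleResults):\n",
    "    print([Record[\"study_id\"] for Record in study_records(Sample)])"
   ]
  },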
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process study data in text format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up REST URL..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/study/study_id/ST000001/summary/txt\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute the REST request using the \"requests\" module..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the request status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process text results..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAvailable data for study summary:\\n\")\n",
    "\n",
    "Results = Response.text\n",
    "for Result in Results.split(\"\\n\"):\n",
    "    Words = Result.split(\"\\t\")\n",
    "    if len(Words) != 2:\n",
    "        continue\n",
    "    \n",
    "    ResultType, ResultValue = Words\n",
    "    print(\"%s: %s\" % (ResultType, ResultValue))"
   ]
  },
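  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same parsing rule can be wrapped in a reusable function that returns a dictionary instead of printing. In this sketch, `parse_tab_text` and the two-line sample are hypothetical; like the cell above, it keeps only lines with exactly two tab-separated fields. Because it uses `str.splitlines`, Windows-style line endings are handled as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_tab_text(text):\n",
    "    \"\"\"Return a dict built from key<TAB>value lines; other lines are skipped.\"\"\"\n",
    "    Pairs = {}\n",
    "    for Line in text.splitlines():\n",
    "        Words = Line.split(\"\\t\")\n",
    "        # Same rule as above: keep only lines with exactly two fields\n",
    "        if len(Words) != 2:\n",
    "            continue\n",
    "        Key, Value = Words\n",
    "        Pairs[Key] = Value\n",
    "    return Pairs\n",
    "\n",
    "# Hypothetical sample in the tab-delimited layout returned for txt output\n",
    "SampleText = \"study_id\\tST000001\\nstudy_title\\tExample title\\n\"\n",
    "print(parse_tab_text(SampleText))"
   ]
  },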
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process data for multiple studies in text format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up the REST URL to retrieve data for studies ST000010 through ST000019 by using ST00001 as the study ID..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/study/study_id/ST00001/summary/txt\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute the REST request and check its status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)\n",
    "\n",
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process text results..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAvailable summary data for studies:\\n\")\n",
    "\n",
    "Results = Response.text\n",
    "\n",
    "StudiesCount = 0\n",
    "for Result in Results.split(\"\\n\"):\n",
    "    Words = Result.split(\"\\t\")\n",
    "    if len(Words) != 2:\n",
    "        print(\"\")\n",
    "        continue\n",
    "    \n",
    "    ResultType, ResultValue = Words\n",
    "    if re.match(\"^study_id$\", ResultType, re.I):\n",
    "        StudiesCount += 1\n",
    "    print(\"%s: %s\" % (ResultType, ResultValue))\n",
    "\n",
    "print(\"\\nTotal number of studies matched: %d\" % StudiesCount)"
   ]
  }
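  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of only counting studies, the tab-delimited text can be grouped into one dictionary per study. This sketch assumes, as the counting loop above does, that each study's block starts with a study_id line; `group_tab_text_by_study` and the sample text are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def group_tab_text_by_study(text, start_key=\"study_id\"):\n",
    "    \"\"\"Split key<TAB>value text into a list with one dict per study.\"\"\"\n",
    "    Studies = []\n",
    "    for Line in text.splitlines():\n",
    "        Words = Line.split(\"\\t\")\n",
    "        if len(Words) != 2:\n",
    "            continue\n",
    "        Key, Value = Words\n",
    "        # A study_id line (any case) starts a new record; so does the first data line\n",
    "        if Key.lower() == start_key.lower() or not Studies:\n",
    "            Studies.append({})\n",
    "        Studies[-1][Key] = Value\n",
    "    return Studies\n",
    "\n",
    "# Hypothetical sample with two studies\n",
    "SampleText = (\"study_id\\tST000010\\nstudy_title\\tFirst\\n\\n\"\n",
    "              \"study_id\\tST000011\\nstudy_title\\tSecond\\n\")\n",
    "for Study in group_tab_text_by_study(SampleText):\n",
    "    print(\"%s: %s\" % (Study[\"study_id\"], Study[\"study_title\"]))"
   ]
  }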
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}