{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Perform m/z search using MW REST API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import Python modules..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "\n",
    "import os\n",
    "import sys\n",
    "import time\n",
    "import re\n",
    "from io import StringIO\n",
    "\n",
    "import requests\n",
    "import pandas as pd\n",
    "\n",
    "from IPython import __version__ as ipyVersion\n",
    "\n",
    "print(\"Python: %s.%s.%s\" % sys.version_info[:3])\n",
    "print(\"IPython: %s\" % ipyVersion)\n",
    "\n",
    "print()\n",
    "print(time.asctime())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The URL path**\n",
    "\n",
    "The MW REST URL consists of three main parts, separated by forward slashes, after the common prefix specifying the invariant base URL (https://www.metabolomicsworkbench.org/rest/):\n",
    "\n",
    "https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification\n",
    "\n",
    "Part 1: The context determines the type of data to be accessed from the Metabolomics Workbench, such as metadata or results related to the submitted studies, data from metabolites, genes/proteins and analytical chemistry databases as well as other services related to mass spectrometry and metabolite identification:\n",
    "\n",
    "context = study | compound | refmet | gene | protein | moverz | exactmass\n",
    "\n",
    "Part 2: The input specification consists of two required parameters describing the REST request:\n",
    "\n",
    "input_specification = input_item/input_value\n",
    "\n",
    "Part 3: The output specification consists of two parameters describing the output generated by the REST request:\n",
    "\n",
    "output_specification = output_item/(output_format)\n",
    "\n",
    "The first parameter (output_item) is required in most cases; the second (output_format) is optional. The input and output specifications are context sensitive: the context determines the values allowed for the remaining parameters in the input and output specifications, as detailed in the sections below.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up the MW REST base URL..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWBaseURL = \"https://www.metabolomicsworkbench.org/rest\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The “moverz” context**\n",
    "\n",
    "The \"moverz\" context performs an m/z search against the LIPIDS (a database of ~30,000 computationally generated “bulk” lipid species), MB (the Metabolomics Workbench database of ~64,000 exact structures), or REFMET (a database of ~30,000 standardized names, which includes both exact structures and bulk species detected by MS or NMR) database. The search is specified by an m/z value, an ion type (adduct), and a mass tolerance.\n",
    "\n",
    "context = moverz\n",
    "\n",
    "input_item = LIPIDS | MB | REFMET\n",
    "\n",
    "input_value1 = m/z_value\n",
    "\n",
    "input_value2 = ion_type_value\n",
    "\n",
    "input_value3 = m/z_tolerance_value\n",
    "\n",
    "output_format = txt\n",
    "\n",
    "The following ion types (adducts) are currently supported: M+H, M+H-H2O, M+2H, M+3H, M+4H, M+K, M+2K, M+Na, M+2Na, M+Li, M+2Li, M+NH4, M+H+CH3CN, M+Na+CH3CN, M.NaFormate+H, M.NH4Formate+H, M.CH3, M.TMSi, M.tBuDMSi, M-H, M-H-H2O, M+Na-2H, M+K-2H, M-2H, M-3H, M-4H, M.Cl, M.F, M.HF2, M.OAc, M.Formate, M.NaFormate-H, M.NH4Formate-H, Neutral.\n"
   ]
  },
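  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parts listed above can be joined into a request URL programmatically. The next cell is a minimal sketch, not part of the MW REST API: `build_moverz_url` and its parameter names are hypothetical and introduced here only for illustration. It checks the database name against the three values listed above, but it does not validate the ion type and does not contact the server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (an assumption for illustration, not an MW-provided function).\n",
    "def build_moverz_url(base_url, database, mz, ion_type, tolerance, output_format=\"txt\"):\n",
    "    # Only the three databases named for the moverz context are accepted.\n",
    "    allowed_databases = (\"LIPIDS\", \"MB\", \"REFMET\")\n",
    "    if database not in allowed_databases:\n",
    "        raise ValueError(\"database must be one of: %s\" % \", \".join(allowed_databases))\n",
    "\n",
    "    # URL parts are separated by forward slashes, in the order described above.\n",
    "    parts = [base_url.rstrip(\"/\"), \"moverz\", database, str(mz), ion_type, str(tolerance), output_format]\n",
    "    return \"/\".join(parts)\n",
    "\n",
    "# Builds the URL for an MB search with m/z 635.52, ion type M+H and tolerance 0.5.\n",
    "print(build_moverz_url(\"https://www.metabolomicsworkbench.org/rest\", \"MB\", 635.52, \"M+H\", 0.5))"
   ]
  },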
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Retrieve and process data from an m/z search in text format**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up the REST URL to perform an MS precursor ion search against the Metabolomics Workbench (MB) database with m/z 635.52, ion type M+H, and a mass tolerance of 0.5..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MWDataURL = MWBaseURL + \"/moverz/MB/635.52/M+H/0.5/txt\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Execute the REST request using the \"requests\" module..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Initiating request: %s\" % MWDataURL)\n",
    "    \n",
    "Response = requests.get(MWDataURL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the response status..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nStatus Code: %d\" % (Response.status_code))\n",
    "\n",
    "if Response.status_code != 200:\n",
    "    print(\"Request failed: status_code: %d\" % Response.status_code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Process search results..."
   ]
  },
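  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\nAvailable data from an m/z search:\\n\")\n",
    "\n",
    "Results = Response.text\n",
    "\n",
    "# Keep only the data lines: skip blank lines and any line\n",
    "# containing an HTML pre tag (matched case-insensitively as \"pre>\").\n",
    "LineCount = 0\n",
    "FilteredResultsLines = []\n",
    "for Result in Results.split(\"\\n\"):\n",
    "    if len(Result) == 0:\n",
    "        continue\n",
    "    if re.search(\"pre>\", Result, re.I):\n",
    "        continue\n",
    "\n",
    "    LineCount += 1\n",
    "    FilteredResultsLines.append(Result)\n",
    "\n",
    "    print(\"%s\" % Result)\n",
    "\n",
    "# One retained line is treated as the column header, so it is not counted as a match.\n",
    "# max() avoids reporting -1 when no lines were retained.\n",
    "print(\"\\nTotal number of matches: %d\" % max(LineCount - 1, 0))"
   ]
  },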
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Load text data into a pandas DataFrame...**\n"
   ]
  },
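  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Wrap the filtered lines in an in-memory text buffer so pandas can read them like a file.\n",
    "FILTEREDRESULTSDATA = StringIO(\"\\n\".join(FilteredResultsLines))\n",
    "\n",
    "# The results are tab-delimited; index_col=False keeps the first column as data, not as the index.\n",
    "DataFrame = pd.read_csv(FILTEREDRESULTSDATA, sep=\"\\t\", index_col=False)\n",
    "\n",
    "DataFrame"
   ]
  }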
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}