{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve RefMet data from the Metabolomics Workbench using the REST API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import Python modules..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "\n", "import os\n", "import sys\n", "import time\n", "import re\n", "\n", "import requests\n", "\n", "from IPython import __version__ as ipyVersion\n", "\n", "print(\"Python: %s.%s.%s\" % sys.version_info[:3])\n", "print(\"IPython: %s\" % ipyVersion)\n", "\n", "print()\n", "print(time.asctime())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The URL PATH**\n", "\n", "The MW REST URL consists of three main parts, separated by forward slashes, that follow the invariant base URL (https://www.metabolomicsworkbench.org/rest/):\n", "\n", "https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification\n", "\n", "Part 1: The context determines the type of data to be accessed from the Metabolomics Workbench. This can be metadata or results for submitted studies; data from the metabolite, gene/protein, and analytical chemistry databases; or other services related to mass spectrometry and metabolite identification:\n", "\n", "context = study | compound | refmet | gene | protein | moverz | exactmass\n", "\n", "Part 2: The input specification consists of two required parameters describing the REST request:\n", "\n", "input_specification = input_item/input_value\n", "\n", "Part 3: The output specification consists of two parameters describing the output generated by the REST request:\n", "\n", "output_specification = output_item/(output_format)\n", "\n", "The first parameter is required in most cases, and the second is optional. The input and output specifications are context-sensitive. 
The context determines the values allowed for the remaining parameters in the input and output specifications, as detailed in the sections below.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up the MW REST base URL..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MWBaseURL = \"https://www.metabolomicsworkbench.org/rest\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The “refmet” context**\n", "\n", "The “refmet” context refers to a standardized reference nomenclature for both discrete metabolite structures and metabolite species identified by spectroscopic techniques in metabolomics experiments. Such a nomenclature is an essential prerequisite for comparing metabolite data across different experiments and studies. Identifiers such as PubChem compound IDs and InChIKeys offer only a partial solution because they vary with parameters such as the salt form and the degree of stereochemical detail. In addition, MS methods report many metabolite species, especially lipids, not as discrete structures but as isobaric mixtures (such as PC(34:1) and TG(54:2)). To address this, a list of over 160,000 names from over 800 MS and NMR studies on the Metabolomics Workbench was used as a starting point to generate a highly curated, analytical-chemistry-centric list of common names for metabolite structures and isobaric species. Most of these names have also been linked to a metabolite classification system that combines the LIPID MAPS and ClassyFire classification methods. A name-conversion user interface lets users submit a list of metabolite names and map them to the corresponding RefMet names. This is a work in progress, with the caveat that many metabolite names generated by metabolomics experiments will not currently map to RefMet identifiers. 
Nevertheless, RefMet can greatly increase the data-sharing potential of metabolomics experiments and facilitate \"meta-analysis\" and systems biology objectives for the majority of commonly encountered metabolite species.\n", "\n", "This context provides access to many structural features, including InChIKey, exact mass, formula, common and systematic names, chemical classification, and cross-references to other databases.\n", "\n", "context = refmet\n", "\n", "input_item = all | match | name | inchi_key | regno | pubchem_cid | formula | main_class | sub_class\n", "\n", "input_value = input_item_value\n", "\n", "output_item = all | name | inchi_key | regno | pubchem_cid | exactmass | formula | synonyms | sys_name | main_class | sub_class | name,inchi_key,regno,...\n", "\n", "output_format = txt | json\n", "\n", "The “all” output item is automatically expanded to include the following items: name, regno, pubchem_cid, inchi_key, exactmass, formula, sys_name, main_class, sub_class, synonyms.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Retrieve and process RefMet data for compounds in JSON format**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up the REST URL to retrieve all available RefMet data for cholesterol..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MWDataURL = MWBaseURL + \"/refmet/name/Cholesterol/all\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Execute the REST request using the \"requests\" module..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Initiating request: %s\" % MWDataURL)\n", "\n", "Response = requests.get(MWDataURL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the request status..." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"\\nStatus Code: %d\" % (Response.status_code))\n", "\n", "if Response.status_code != 200:\n", "    print(\"Request failed: status_code: %d\" % Response.status_code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Process the JSON results..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"\\nAvailable RefMet data for a compound using name:\\n\")\n", "\n", "Results = Response.json()\n", "\n", "for ResultType in Results:\n", "    ResultValue = Results[ResultType]\n", "    print(\"%s: %s\" % (ResultType, ResultValue))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Retrieve and process all available RefMet data for all compounds..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MWDataURL = MWBaseURL + \"/refmet/all\"\n", "\n", "print(\"Initiating request: %s\" % MWDataURL)\n", "\n", "Response = requests.get(MWDataURL)\n", "\n", "print(\"\\nStatus Code: %d\" % (Response.status_code))\n", "\n", "if Response.status_code != 200:\n", "    print(\"Request failed: status_code: %d\" % Response.status_code)\n", "\n", "print(\"\\nAll available RefMet data for all compounds:\\n\")\n", "\n", "Results = Response.json()\n", "\n", "# Each key of the response is a result number whose value holds the data for one compound\n", "CmpdsCount = 0\n", "ListCmpdsCount = 50\n", "\n", "for ResultNum in Results:\n", "    CmpdsCount += 1\n", "    # Count every compound, but only list the first ListCmpdsCount of them\n", "    if CmpdsCount > ListCmpdsCount:\n", "        continue\n", "\n", "    print(\"\\nResultNum: %s\\n\" % ResultNum)\n", "    for ResultType in Results[ResultNum]:\n", "        ResultValue = Results[ResultNum][ResultType]\n", "        print(\"%s: %s\" % (ResultType, ResultValue))\n", "\n", "print(\"\\nTotal number of compounds: %d\" % CmpdsCount)\n", "print(\"Number of compounds listed: %d\" % min(CmpdsCount, ListCmpdsCount))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Retrieve and process RefMet data for compounds in text format**\n" ] }, { 
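"cell_type": "markdown", "metadata": {}, "source": [ "The text output can also be collected into one dictionary per compound instead of being printed line by line. The next cell is a minimal sketch built around a hypothetical helper, `ParseRefMetText`, which is not part of the Metabolomics Workbench API. It assumes that each line of the txt response is a tab-separated field/value pair and that each compound record begins with a `name` line. The text-format loop later in this notebook relies on the same assumption. The usage example parses a short illustrative string rather than a live response. Once the text request below has run, the same helper could be applied to `Response.text`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical helper (an assumption, not part of the MW REST API): group the\n", "# tab-delimited txt output into one dictionary per compound, assuming every line is\n", "# a field<TAB>value pair and every compound record starts with a \"name\" line\n", "def ParseRefMetText(Text):\n", "    Records = []\n", "    for Line in Text.split(\"\\n\"):\n", "        # Drop a trailing carriage return in case the lines end with CRLF\n", "        Words = Line.rstrip(\"\\r\").split(\"\\t\")\n", "        if len(Words) != 2:\n", "            # Skip blank lines and anything that is not a field/value pair\n", "            continue\n", "\n", "        ResultType, ResultValue = Words\n", "        if ResultType.lower() == \"name\" or not Records:\n", "            # A \"name\" field (or the very first pair) opens a new compound record\n", "            Records.append({})\n", "        Records[-1][ResultType] = ResultValue\n", "\n", "    return Records\n", "\n", "\n", "# Usage on a short illustrative string (not actual API output)\n", "Example = \"name\\tCholesterol\\nformula\\tC27H46O\\n\\nname\\tGlucose\\nformula\\tC6H12O6\"\n", "for Record in ParseRefMetText(Example):\n", "    print(Record)" ] }, { 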
"cell_type": "markdown", "metadata": {}, "source": [ "Retrieve and process all available RefMet data for all compounds in text format..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# \"xx\" is a placeholder that fills the input_value slot so that the output item and format can be given\n", "MWDataURL = MWBaseURL + \"/refmet/all/xx/all/txt\"\n", "\n", "print(\"Initiating request: %s\" % MWDataURL)\n", "\n", "Response = requests.get(MWDataURL)\n", "\n", "print(\"\\nStatus Code: %d\" % (Response.status_code))\n", "\n", "if Response.status_code != 200:\n", "    print(\"Request failed: status_code: %d\" % Response.status_code)\n", "\n", "print(\"\\nAll available RefMet data for all compounds:\\n\")\n", "\n", "Results = Response.text\n", "\n", "# Each line holds a tab-separated field/value pair; a \"name\" field starts a new compound\n", "CmpdsCount = 0\n", "ListCmpdsCount = 40\n", "for Result in Results.split(\"\\n\"):\n", "    Words = Result.split(\"\\t\")\n", "    if len(Words) != 2:\n", "        continue\n", "\n", "    ResultType, ResultValue = Words\n", "    if re.match(\"^name$\", ResultType, re.I):\n", "        CmpdsCount += 1\n", "        # Separate consecutive listed compounds with a blank line\n", "        if 1 < CmpdsCount <= ListCmpdsCount:\n", "            print()\n", "\n", "    # Keep counting compounds, but only list the first ListCmpdsCount of them\n", "    if CmpdsCount > ListCmpdsCount:\n", "        continue\n", "\n", "    print(\"%s: %s\" % (ResultType, ResultValue))\n", "\n", "print(\"\\nTotal number of compounds: %d\" % CmpdsCount)\n", "print(\"Number of compounds listed: %d\" % min(CmpdsCount, ListCmpdsCount))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }