{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve named metabolites data from Metabolomics Workbench using REST API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import Python modules..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "\n", "import os\n", "import sys\n", "import time\n", "import re\n", "from io import StringIO\n", "\n", "import requests\n", "import pandas as pd\n", "import ipywidgets as widgets\n", "\n", "from IPython.display import display, HTML\n", "from IPython import __version__ as ipyVersion\n", "\n", "# Import MW modules from the current directory or default Python directory...\n", "import MWUtil\n", "\n", "%matplotlib inline\n", "\n", "print(\"Python: %s.%s.%s\" % sys.version_info[:3])\n", "print(\"IPython: %s\" % ipyVersion)\n", "\n", "print()\n", "print(time.asctime())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The URL PATH\n", "\n", "The MW REST URL consists of three main parts, separated by forward slashes, after the common prefix specifying the invariant base URL (https://www.metabolomicsworkbench.org/rest/):\n", "\n", "https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification\n", "\n", "Part 1: The context determines the type of data to be accessed from the Metabolomics Workbench, such as metadata or results related to the submitted studies, data from metabolites, genes/proteins and analytical chemistry databases as well as other services related to mass spectrometry and metabolite identification:\n", "\n", "context = study | compound | refmet | gene | protein | moverz | exactmass\n", "\n", "Part 2: The input specification consists of two required parameters describing the REST request:\n", "\n", "input_specification = input_item/input_value\n", "\n", "Part 3: The output specification consists of two parameters describing the output generated by the REST request:\n", "\n", "output_specification = output_item/(output_format)\n", "\n", "The first parameter is required in most cases. The second parameter is optional. The input and output specifications are context sensitive. The context determines the values allowed for the remaining parameters in the input and output specifications as detailed in the sections below.\n", "\n", "Setup MW REST base URL..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MWBaseURL = \"https://www.metabolomicsworkbench.org/rest\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Retrieve and process results data for named metabolites...**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setup utitlity functions to retrieve and process analysis and results data..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def RetrieveStudiesAnalysisAndResultsData(StudyID):\n", " \"\"\"Retrieve analysis and results data for a study or studies.\"\"\"\n", " \n", " MWDataURL = MWBaseURL + \"/study/study_id/\" + StudyID + \"/analysis/\"\n", " \n", " print(\"Initiating request: %s\" % MWDataURL)\n", " Response = requests.get(MWDataURL)\n", " if Response.status_code != 200:\n", " print(\"Request failed: status_code: %d\" % Response.status_code)\n", " return {}\n", "\n", " AnalysisData = Response.json()\n", "\n", " print(\"Processing analysis data...\")\n", " StudiesResultsData = ProcessAnalysisData(AnalysisData)\n", "\n", " for StudyID in StudiesResultsData:\n", " for AnalysisID in StudiesResultsData[StudyID]:\n", " print(\"\\nRetrieving datatable for analysis ID, %s, in study ID, %s...\" % (AnalysisID, StudyID))\n", " \n", " MWDataURL = MWBaseURL + \"/study/analysis_id/\" + AnalysisID + \"/datatable\"\n", " \n", " print(\"Initiating request: %s\" % MWDataURL)\n", " Response = requests.get(MWDataURL)\n", " if Response.status_code != 200:\n", " print(\"Request failed: status_code: %d\" % Response.status_code)\n", " continue\n", " \n", " print(\"Processing datatable text...\")\n", " ResultsDataTable = ProcessDataTableText(Response.text, AddClassNum = True)\n", " \n", " print(\"Setting up Pandas dataframe...\")\n", " RESULTSDATATABLE = StringIO(ResultsDataTable)\n", " StudiesResultsData[StudyID][AnalysisID][\"data_frame\"] = pd.read_csv(RESULTSDATATABLE, sep=\"\\t\", index_col = False)\n", " \n", " return StudiesResultsData\n", "\n", "def ProcessAnalysisData(AnalysisData):\n", " \"\"\"Process analysis data retrieved in JSON format for a study or set of studies\"\"\"\n", "\n", " StudiesResultsData = {}\n", " \n", " if \"study_id\" in AnalysisData:\n", " # Turn single study with single analysis data set into dictionary\n", " # with multiple studies/analysis data set...\n", " AnalysisData = {\"1\" : AnalysisData}\n", " \n", " for DataSetNum in AnalysisData:\n", " StudyID = AnalysisData[DataSetNum][\"study_id\"]\n", " AnalysisID = AnalysisData[DataSetNum][\"analysis_id\"]\n", " \n", " # Intialize data...\n", " if StudyID not in StudiesResultsData:\n", " StudiesResultsData[StudyID] = {}\n", " \n", " StudiesResultsData[StudyID][AnalysisID] = {}\n", " \n", " # Track data...\n", " for DataType in AnalysisData[DataSetNum]: \n", " DataValue = AnalysisData[DataSetNum][DataType] \n", " if re.match(\"^(study_id|analysis_id)$\", DataType, re.I):\n", " continue\n", " \n", " StudiesResultsData[StudyID][AnalysisID][DataType] = DataValue\n", " \n", " return StudiesResultsData\n", "\n", "\n", "def ProcessDataTableText(DataTableText, AddClassNum = True):\n", " \"\"\"Process datatable retrieved retrieves in text format for a specific analysis ID\"\"\"\n", " \n", " DataLines = []\n", " \n", " TextLines = DataTableText.split(\"\\n\")\n", " \n", " # Process data labels...\n", " LineWords = TextLines[0].split(\"\\t\")\n", " \n", " DataLabels = []\n", " DataLabels.append(LineWords[0])\n", " DataLabels.append(LineWords[1])\n", " if AddClassNum:\n", " DataLabels.append(\"ClassNum\")\n", " \n", " for Index in range(2, len(LineWords)):\n", " DataLabels.append(LineWords[Index])\n", " \n", " DataLines.append(\"\\t\".join(DataLabels))\n", " \n", " # Process data...\n", " ClassNamesMap = {}\n", " ClassNum = 0\n", " for Index in range(1, len(TextLines)):\n", " LineWords = TextLines[Index].split(\"\\t\")\n", " \n", " if len(LineWords) <= 2:\n", " continue\n", " \n", " # Handle sample ID and class name...\n", " DataLine = []\n", " DataLine.append(LineWords[0])\n", " DataLine.append(LineWords[1])\n", " \n", " if AddClassNum:\n", " ClassName = LineWords[1]\n", " if ClassName not in ClassNamesMap:\n", " ClassNum += 1\n", " ClassNamesMap[ClassName] = ClassNum\n", " DataLine.append(\"%s\" % ClassNamesMap[ClassName])\n", " \n", " for Index in range(2, len(LineWords)):\n", " DataLine.append(LineWords[Index])\n", " \n", " DataLines.append(\"\\t\".join(DataLine))\n", " \n", " return \"\\n\".join(DataLines)\n", "\n", "def ListStudiesAnalysisAndResultsData(StudiesResultsData, DisplayDataFrame = True):\n", " \"\"\"List analysis and results data for studies.\"\"\"\n", " \n", " print(\"\\nListing analysis metadata for studies along with datatable for named metabolites...\")\n", " \n", " for StudyID in StudiesResultsData:\n", " print(\"\")\n", " for AnalysisID in StudiesResultsData[StudyID]:\n", " print(\"\\nstudy_id:%s\\nanalysis_id:%s\" % (StudyID, AnalysisID))\n", " for DataType in StudiesResultsData[StudyID][AnalysisID]:\n", " DataValue = StudiesResultsData[StudyID][AnalysisID][DataType]\n", " if re.match(\"^(data_frame)$\", DataType, re.I):\n", " if DisplayDataFrame:\n", " print(\"data_frame:\\n\")\n", " display(HTML(DataValue.to_html(max_rows = 10, max_cols = 10)))\n", " else:\n", " print(\"data_frame: \")\n", " else:\n", " print(\"%s: %s\" % (DataType, DataValue))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Retrieve and process results data for study containing a single analysis...**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "StudyID = \"ST000001\"\n", "print(\"\\nProcessing study ID: %s\" % StudyID)\n", "\n", "StudiesResultsData = RetrieveStudiesAnalysisAndResultsData(StudyID)\n", "ListStudiesAnalysisAndResultsData(StudiesResultsData)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Retrieve and process results data for study containing a multiple analysis...**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "StudyID = \"ST000009\"\n", "print(\"\\nProcessing study ID: %s\" % StudyID)\n", "\n", "StudiesResultsData = RetrieveStudiesAnalysisAndResultsData(StudyID)\n", "ListStudiesAnalysisAndResultsData(StudiesResultsData)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Retrieve and process results data for multiple studies containing multiple analysis...**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup a study ID subsring to match study IDs ST000010 to ST000019...\n", "StudyID = \"ST00001\"\n", "print(\"\\nProcessing study ID substring to match studies from ST000010 to ST000019: %s\" % StudyID)\n", "\n", "StudiesResultsData = RetrieveStudiesAnalysisAndResultsData(StudyID)\n", "\n", "# Turn off dataframe display...\n", "ListStudiesAnalysisAndResultsData(StudiesResultsData, DisplayDataFrame = False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Perform interactive retrieval of data for a specified study ID...**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "StudyIDText = widgets.Text(value = \"ST000001\", description = \"Study ID\", placeholder = \"Type study ID\", disabled = False)\n", "\n", "RetrieveDataBtn = widgets.Button(description = 'Retrieve Data', disabled = False, button_stype = '', tooltip = \"Retrieve data for study ID\")\n", "OutputRetrieveDataBtn = widgets.Output()\n", "\n", "def RetrieveAndListData(Object):\n", " StudyID = StudyIDText.value\n", " \n", " OutputRetrieveDataBtn.clear_output()\n", " with OutputRetrieveDataBtn:\n", " if len(StudyID):\n", " print(\"\\nProcessing study ID: %s\" % StudyID)\n", " StudiesResultsData = RetrieveStudiesAnalysisAndResultsData(StudyID)\n", " ListStudiesAnalysisAndResultsData(StudiesResultsData)\n", " else:\n", " print(\"\\nNo study ID specified...\")\n", "\n", "RetrieveDataBtn.on_click(RetrieveAndListData)\n", "\n", "display(StudyIDText, RetrieveDataBtn)\n", "display(OutputRetrieveDataBtn)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }