---
name: tooluniverse-metabolomics-pathway
description: Metabolomics pathway analysis — metabolite identification (HMDB, KEGG, ChEBI), pathway mapping (Reactome, KEGG, MetaCyc), disease associations, enzyme/gene linkage. Use for metabolite-to-pathway-to-disease connections, BridgeDb-based ID conversion, and integrating metabolomics with gene-level pathway analyses.
disable-model-invocation: true
---

# Metabolomics Pathway Analysis

Identify metabolites, map to metabolic pathways, find disease associations, and connect to enzymes/genes.

## Domain Reasoning

Metabolite-to-pathway mapping requires correct, database-specific identifiers. HMDB IDs can be linked to KEGG and Reactome, but only after conversion via BridgeDb. PubChem CIDs likewise need explicit cross-referencing. Always verify metabolite identity before mapping: the same common name can refer to structurally distinct isomers, and PubChem names frequently differ from CTD/KEGG names.
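Before choosing a tool, it helps to know which namespace an identifier belongs to. The sketch below assumes a few common ID shapes:

- HMDB: `HMDB` followed by 5 (legacy) or 7 digits.
- KEGG compound: `C` followed by 5 digits.
- ChEBI: `CHEBI:` followed by digits.
- PubChem CID: a bare integer.

These patterns are heuristics, not a formal specification. The function name `detect_id_type` is hypothetical. Anything that matches no pattern is treated as a name and should go through `Metabolite_search` first.

```python
import re

# Heuristic ID shapes (assumed conventions, not an exhaustive spec).
ID_PATTERNS = [
    ("hmdb", re.compile(r"^HMDB\d{5}(\d{2})?$")),          # HMDB00122 or HMDB0000122
    ("kegg_compound", re.compile(r"^C\d{5}$")),             # C00031
    ("chebi", re.compile(r"^CHEBI:\d+$", re.IGNORECASE)),   # CHEBI:17234
    ("pubchem_cid", re.compile(r"^\d+$")),                  # 5793
]


def detect_id_type(identifier: str) -> str:
    """Guess the source database of an identifier; return 'name' if nothing matches."""
    token = identifier.strip()
    for id_type, pattern in ID_PATTERNS:
        if pattern.match(token):
            return id_type
    return "name"
```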

## LOOK UP, DON'T GUESS

- Pathway membership: call `MetaCyc_get_compound`, `KEGG_get_compound`, or `ReactomeContent_search`
- Cross-database IDs: use `BridgeDb_xrefs`
- Enzyme-metabolite relationships: use `CTD_get_chemical_gene_interactions` or `KEGG_get_compound`
- Disease associations: query `Metabolite_get_diseases` or `CTD_get_chemical_diseases`

---

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
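When many pathways or metabolites are tested at once, p-values need a multiple-testing correction before reporting. Below is a minimal pure-Python sketch of the Benjamini-Hochberg procedure; the function name is hypothetical. It should agree with `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`, which is the simpler choice when statsmodels is installed.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (q-values) in the original input order."""
    m = len(pvalues)
    # Walk from the largest p-value down so a running minimum keeps q monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```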

## Workflow

```
Phase 0: Identify & Resolve → Phase 1: Characterize → Phase 2: Pathway Map →
Phase 3: Enzyme/Gene Linkage → Phase 4: Disease Associations → Phase 5: Cross-DB Enrichment → Report
```

---

## Phase 0: Metabolite Identification & Resolution

### By Name
**Metabolite_search**: `query` (REQUIRED), `search_type` ("name"/"formula"). Returns PubChem matches with CID, name, formula, MW, SMILES.
**MetabolomicsWorkbench_search_compound_by_name**: `name` (REQUIRED). Cross-reference with RefMet.

### By Mass/Formula
**MetabolomicsWorkbench_search_by_mz**: `mz` (REQUIRED), `adduct` (e.g., "M+H"), `tolerance`. Queries the REST endpoint `moverz/REFMET/{mz}/{adduct}/{tolerance}`.
**MetabolomicsWorkbench_search_by_exact_mass**: `exact_mass` (REQUIRED), `tolerance`. Queries the REST endpoint `moverz/REFMET/{mass}/M/{tolerance}`.
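Mass searches are easy to get wrong when the adduct and the tolerance window are not made explicit. This sketch converts an observed singly charged m/z to the implied neutral monoisotopic mass and turns a ppm accuracy into an absolute window.

Assumptions to check before relying on it:

- The `tolerance` argument is in Da.
- The three adduct shifts listed are a deliberately small subset.
- All helper names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da

# Mass added to the neutral molecule for a few singly charged adducts (assumed subset).
ADDUCT_SHIFTS = {
    "M+H": PROTON_MASS,
    "M+Na": 22.989218,  # Na+ (sodium atom minus one electron)
    "M-H": -PROTON_MASS,
}


def neutral_mass(mz: float, adduct: str) -> float:
    """Neutral monoisotopic mass implied by an observed m/z for the given adduct."""
    return mz - ADDUCT_SHIFTS[adduct]


def ppm_to_da(mass: float, ppm: float) -> float:
    """Convert a ppm mass-accuracy window into an absolute tolerance in Da."""
    return mass * ppm / 1e6


def mz_search_args(mz: float, adduct: str = "M+H", ppm: float = 10.0) -> dict:
    """Hypothetical helper: arguments for MetabolomicsWorkbench_search_by_mz (tolerance assumed in Da)."""
    return {"mz": round(mz, 4), "adduct": adduct, "tolerance": round(ppm_to_da(mz, ppm), 4)}
```

For example, an [M+H]+ ion at m/z 181.0707 implies a neutral mass of about 180.0634, which is the value to pass to the exact-mass search.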

### By ID
**Metabolite_get_info**: `compound_name`, `hmdb_id` (e.g., "HMDB0000122"), or `pubchem_cid`. Returns HMDB ID, CID, InChIKey, classification.
**KEGG_get_compound**: `compound_id` (e.g., "C00031"). Returns linked pathways, enzymes, reactions.

### ID Cross-Referencing
**BridgeDb_xrefs**: `identifier` (REQUIRED), `source` (REQUIRED: "Ch"=HMDB, "Cs"=ChemSpider, "Ck"=KEGG, "Ce"=ChEBI), `target` (optional).
**BridgeDb_search**: `query` (REQUIRED), `organism`. Free-text metabolite search.
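HMDB IDs exist in a current 7-digit form (HMDB0000122) and a legacy 5-digit form (HMDB00122). Which form a given BridgeDb mapping database expects may depend on its version, so this sketch builds one `BridgeDb_xrefs` call per variant and lets you try them in turn. It uses the system codes listed above.

Two things here are assumptions:

- The call dicts follow ToolUniverse's `{"name": ..., "arguments": ...}` convention for `run(...)`.
- `target` takes the same system codes as `source`.

The function names are hypothetical.

```python
import re
from typing import Optional

# BridgeDb system codes for the `source` argument, as listed for BridgeDb_xrefs.
BRIDGEDB_SOURCE = {"hmdb": "Ch", "chemspider": "Cs", "kegg": "Ck", "chebi": "Ce"}


def hmdb_variants(hmdb_id: str) -> list[str]:
    """Return the 7-digit HMDB ID plus its legacy 5-digit form when one exists."""
    match = re.fullmatch(r"HMDB(\d+)", hmdb_id.strip().upper())
    if not match:
        raise ValueError(f"not an HMDB ID: {hmdb_id!r}")
    digits = match.group(1).zfill(7)
    variants = ["HMDB" + digits]
    if digits.startswith("00"):
        variants.append("HMDB" + digits[2:])  # legacy form, e.g. HMDB00122
    return variants


def bridgedb_xrefs_calls(identifier: str, database: str, target: Optional[str] = None) -> list[dict]:
    """Build ToolUniverse-style call dicts for BridgeDb_xrefs, one per ID variant."""
    ids = hmdb_variants(identifier) if database == "hmdb" else [identifier.strip()]
    calls = []
    for ident in ids:
        arguments = {"identifier": ident, "source": BRIDGEDB_SOURCE[database]}
        if target is not None:
            arguments["target"] = BRIDGEDB_SOURCE.get(target, target)
        calls.append({"name": "BridgeDb_xrefs", "arguments": arguments})
    return calls
```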

---

## Phase 1: Metabolite Characterization

**Metabolite_get_info**: classification (super_class/class/sub_class), biological_roles, cellular_locations.
**MetabolomicsWorkbench_get_refmet_info**: `refmet_name` (REQUIRED). Standardized RefMet classification.
**KEGG_get_compound**: linked enzyme/reaction/pathway IDs.

---

## Phase 2: Pathway Mapping

### MetaCyc
- `MetaCyc_search_pathways`: `query` (keyword search, e.g., "glycolysis")
- `MetaCyc_get_pathway`: `pathway_id` (e.g., "GLYCOLYSIS") -- reactions, enzymes, compounds
- `MetaCyc_get_compound`: `compound_id` (e.g., "PYRUVATE") -- pathways it participates in
- `MetaCyc_get_reaction`: `reaction_id` -- substrates, products, enzymes

### KEGG
- `KEGG_get_gene_pathways`: `gene_id` (e.g., "hsa:5230") -- pathways for enzyme gene
- `KEGG_get_pathway_genes`: `pathway_id` (e.g., "hsa00010") -- all genes in pathway
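KEGG tools need organism-prefixed IDs: "hsa:5230" for genes, where human KEGG gene numbers are Entrez Gene IDs, and "hsa00010" for pathways. This sketch normalizes bare Entrez IDs and reference map IDs ("map00010", "00010", "path:hsa00010"). Organism-prefixed pathway IDs are returned unchanged. The helper names are hypothetical.

```python
import re


def kegg_gene_id(gene: str, org: str = "hsa") -> str:
    """Return an organism-prefixed KEGG gene ID, e.g. '5230' -> 'hsa:5230'."""
    gene = gene.strip()
    if ":" in gene:  # already prefixed, e.g. 'hsa:5230'
        return gene
    if not gene.isdigit():
        raise ValueError(f"expected a numeric Entrez-style ID, got {gene!r}")
    return f"{org}:{gene}"


def kegg_pathway_id(pathway: str, org: str = "hsa") -> str:
    """Map 'map00010' / '00010' / 'path:hsa00010' to an organism pathway ID like 'hsa00010'."""
    token = pathway.strip().removeprefix("path:")
    match = re.fullmatch(r"(map|[a-z]{3,4})?(\d{5})", token)
    if not match:
        raise ValueError(f"unrecognized KEGG pathway ID: {pathway!r}")
    prefix, number = match.groups()
    if prefix and prefix != "map":
        return token  # already organism-specific, e.g. 'hsa00010' or 'eco00010'
    return f"{org}{number}"
```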

### Reactome
- `ReactomeContent_search`: `query`, `types` (e.g., "Pathway"), `species`
- `Reactome_get_pathway`: `id` (e.g., "R-HSA-70171")
- `ReactomeAnalysis_pathway_enrichment`: `identifiers` (space-separated string, NOT array)
- `Reactome_map_uniprot_to_pathways`: `uniprot_id`
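Because `ReactomeAnalysis_pathway_enrichment` takes a space-separated string rather than an array, a small helper that deduplicates and joins IDs avoids the most common call error. This is a minimal sketch; the name `reactome_identifiers` is hypothetical.

```python
def reactome_identifiers(ids) -> str:
    """Join IDs into one space-separated string, dropping blanks and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for raw in ids:
        token = str(raw).strip()
        if token and token not in seen:
            seen.add(token)
            cleaned.append(token)
    if not cleaned:
        raise ValueError("no identifiers to submit")
    return " ".join(cleaned)
```

The resulting string goes into the `identifiers` argument, e.g. `{"identifiers": reactome_identifiers(gene_symbols)}`.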

---

## Phase 3: Enzyme & Gene Linkage

**CTD_get_chemical_gene_interactions**: `input_terms` (chemical name). Returns interacting genes.
**KEGG_get_gene_pathways**: which pathways an enzyme gene participates in.
**BridgeDb_attributes**: `identifier`, `source`, `organism`. Returns the attributes recorded for an identifier.

Workflow: KEGG compound -> enzyme (EC) IDs -> MetaCyc reactions -> enzyme names -> UniProt IDs -> Reactome pathways (`Reactome_map_uniprot_to_pathways`) -> MyGene for gene annotations.
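CTD often returns many interaction records per metabolite, so ranking genes by how many records mention them is a quick way to pick candidates for the enzyme/gene chain. This sketch assumes each row is a dict with CTD batch-query style keys `GeneSymbol` and `Organism`. That is an assumption about the tool's output; adjust the keys to what `CTD_get_chemical_gene_interactions` actually returns. The function name is hypothetical.

```python
from collections import Counter


def top_interacting_genes(rows, organism: str = "Homo sapiens", n: int = 20):
    """Rank genes by the number of interaction records for one organism.

    Assumes CTD-style row dicts with 'GeneSymbol' and 'Organism' keys.
    Ties keep the order in which genes were first seen.
    """
    counts = Counter(
        row["GeneSymbol"]
        for row in rows
        if row.get("GeneSymbol") and row.get("Organism") == organism
    )
    return counts.most_common(n)
```

Record counts reflect how heavily a gene is studied as much as how strongly it is biologically linked. Treat the ranking as a shortlist for pathway lookup, not as evidence.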

---

## Phase 4: Disease Associations

**CTD_get_chemical_diseases**: `input_terms` (chemical name, MeSH, CAS RN). Curated associations with direct/inferred evidence.
**CTD_get_gene_diseases**: `input_terms` (gene name). For metabolite-processing genes from Phase 3.
**Metabolite_get_diseases**: `compound_name`/`hmdb_id`/`pubchem_cid`, `limit` (default 50). CTD-backed.
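CTD mixes curated (direct) and inferred associations, which map to different evidence tiers. This sketch separates them and ranks inferred rows by inference score. It assumes CTD-style keys:

- `DiseaseName`
- `DirectEvidence`: non-empty for curated rows, e.g. "marker/mechanism" or "therapeutic".
- `InferenceScore`: present on inferred rows.

Verify these keys against the actual tool output. The function name is hypothetical.

```python
def split_disease_evidence(rows):
    """Return (direct, inferred) CTD chemical-disease rows; inferred sorted by score, high first."""
    direct, inferred = [], []
    for row in rows:
        (direct if row.get("DirectEvidence") else inferred).append(row)
    # Missing or blank scores are treated as 0 and sort last.
    inferred = sorted(inferred, key=lambda r: float(r.get("InferenceScore") or 0.0), reverse=True)
    return direct, inferred
```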

---

## Phase 5: Cross-Database Enrichment

**MetabolomicsWorkbench_get_study**: `study_id` (e.g., "ST000001").
**MetabolomicsWorkbench_get_compound_by_pubchem_cid**: `pubchem_cid`.
**PubMed_search_articles** / **EuropePMC_search_articles**: literature context.

For metabolite list enrichment: (1) convert metabolite names to interacting gene/enzyme IDs via `CTD_get_chemical_gene_interactions`, (2) run `ReactomeAnalysis_pathway_enrichment` with space-separated identifiers, (3) use `KEGG_get_gene_pathways` per enzyme.
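When pathway membership is collected per enzyme (for example from `KEGG_get_pathway_genes`), enrichment can be checked locally with a one-sided hypergeometric over-representation test. This sketch uses exact integer arithmetic from `math.comb`. `hypergeom_sf(k, N, K, n)` should equal `scipy.stats.hypergeom.sf(k - 1, N, K, n)`. The helper names and the choice of background set are assumptions to adapt.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def pathway_ora(query, pathway_members, background):
    """Overlap and one-sided p-value for a single pathway, restricted to the background."""
    universe = set(background)
    hits = set(query) & universe
    members = set(pathway_members) & universe
    overlap = len(hits & members)
    return overlap, hypergeom_sf(overlap, len(universe), len(members), len(hits))
```

The background should be the set of genes that could have been detected or linked in the study, not the whole genome by default. Apply a multiple-testing correction across pathways before reporting.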

---

## Common Mistakes to Avoid

| Mistake | Correction |
|---------|-----------|
| Passing an array to ReactomeAnalysis_pathway_enrichment | Pass a space-separated string |
| HMDB IDs in CTD_get_chemical_diseases | CTD expects common names or MeSH IDs |
| Skipping name resolution | Always start with Metabolite_search |
| KEGG gene_id without an organism prefix | Use "hsa:5230", not "5230" |
| Expecting an HMDB API | HMDB has no open API; use Metabolite_get_info (PubChem-backed) |
| Sending only the PubChem title to CTD when names differ | Try both the PubChem name and common synonyms |
| Calling the MetabolomicsWorkbench exactmass endpoint | It is broken; use moverz/REFMET/{mass}/M/{tolerance} |
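Several of the mistakes in the table above can be caught before a call is made. This sketch is a pre-flight check covering three of them:

- List-typed Reactome identifiers.
- Unprefixed KEGG gene IDs.
- HMDB IDs sent to CTD chemical tools.

The function name and the prefix match on `CTD_get_chemical` are assumptions about how you dispatch calls.

```python
import re

HMDB_RE = re.compile(r"^HMDB\d+$", re.IGNORECASE)


def preflight(tool_name: str, arguments: dict) -> list[str]:
    """Return human-readable warnings for known argument mistakes (empty list if none)."""
    problems = []
    if tool_name == "ReactomeAnalysis_pathway_enrichment":
        if not isinstance(arguments.get("identifiers"), str):
            problems.append("identifiers must be a space-separated string, not a list")
    elif tool_name == "KEGG_get_gene_pathways":
        if ":" not in str(arguments.get("gene_id", "")):
            problems.append('gene_id needs an organism prefix, e.g. "hsa:5230"')
    elif tool_name.startswith("CTD_get_chemical"):
        if HMDB_RE.match(str(arguments.get("input_terms", "")).strip()):
            problems.append("CTD expects a chemical name or MeSH ID, not an HMDB ID")
    return problems
```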

---

## Fallback Strategies

- **Metabolite_search empty** -> MetabolomicsWorkbench_search_compound_by_name or KEGG_get_compound
- **MetaCyc not found** -> KEGG or Reactome pathways
- **CTD empty for disease** -> Metabolite_get_diseases with HMDB/CID
- **No KEGG compound ID** -> BridgeDb_xrefs from HMDB/ChEBI
- **exactmass fails** -> MetabolomicsWorkbench_search_by_mz with adduct "M+H", passing m/z = neutral mass + 1.00728 (proton mass)
- **Need enzyme genes** -> CTD_get_chemical_gene_interactions
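The fallbacks above amount to trying calls in order until one returns something. This sketch assumes `run` is a callable such as ToolUniverse's `tu.run` that takes a `{"name": ..., "arguments": ...}` dict. The helper name is hypothetical. Emptiness is judged by Python truthiness; if a tool wraps results, for example in a dict with an empty list inside, adapt the check.

```python
from typing import Callable, Iterable


def first_nonempty(run: Callable[[dict], object], calls: Iterable[dict]):
    """Try tool calls in order; return (call, result) for the first non-empty result."""
    for call in calls:
        try:
            result = run(call)
        except Exception as exc:  # one failing tool should not stop the fallback chain
            print(f"{call['name']} failed: {exc}")
            continue
        if result:  # None, {}, [] and "" count as empty
            return call, result
    return None, None
```

For example, name resolution could try `{"name": "Metabolite_search", "arguments": {"query": name, "search_type": "name"}}` first. It would then fall back to `{"name": "MetabolomicsWorkbench_search_compound_by_name", "arguments": {"name": name}}`.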

---

## Evidence Grading

| Tier | Criteria | Sources |
|------|----------|---------|
| **T1** | Curated disease association, direct evidence | CTD curated, OMIM |
| **T2** | Multiple database pathway concordance | MetaCyc + KEGG + Reactome agreement |
| **T3** | Inferred or single-database | CTD inferred, single pathway DB |
| **T4** | Computational prediction or text-mining | Literature, RefMet classification |
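To apply the tiers consistently across many metabolite-pathway-disease links, the table can be encoded as a small rule function. This sketch makes three assumptions:

- "Multiple database" concordance means agreement in at least two of MetaCyc, KEGG and Reactome.
- Anything with no curated, inferred or pathway support falls to T4.
- The function name and flags are hypothetical.

```python
PATHWAY_DBS = {"MetaCyc", "KEGG", "Reactome"}


def evidence_tier(direct_curated: bool = False, pathway_dbs=(), inferred: bool = False) -> str:
    """Assign T1-T4 following the evidence-grading table (first matching rule wins)."""
    n_dbs = len(set(pathway_dbs) & PATHWAY_DBS)
    if direct_curated:            # CTD curated / OMIM, direct evidence
        return "T1"
    if n_dbs >= 2:                # pathway concordance across databases
        return "T2"
    if inferred or n_dbs == 1:    # CTD inferred or a single pathway database
        return "T3"
    return "T4"                   # computational prediction or text-mining only
```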

---

## Limitations

- HMDB has no open API; use Metabolite_get_info (PubChem-backed).
- MetaCyc pathways are reference pathways, unlike KEGG's organism-specific maps (e.g., hsa00010).
- CTD can return very large result sets for common metabolites (22K+ records for acetaminophen).
- ReactomeAnalysis expects gene/protein IDs, not metabolite IDs directly.
- BridgeDb coverage depends on the metabolite being in mapping databases.