---
name: tooluniverse-spatial-omics-analysis
description: Computational analysis framework for spatial multi-omics data integration. Given spatially variable genes (SVGs), spatial domain annotations, tissue type, and disease context from spatial transcriptomics/proteomics experiments (10x Visium, MERFISH, DBiTplus, SLIDE-seq, etc.), performs comprehensive biological interpretation including pathway enrichment, cell-cell interaction inference, druggable target identification, immune microenvironment characterization, and multi-modal integration. Produces a detailed markdown report with Spatial Omics Integration Score (0-100), domain-by-domain characterization, and validation recommendations. Uses 70+ ToolUniverse tools across 9 analysis phases. Use when users ask about spatial transcriptomics analysis, spatial omics interpretation, tissue heterogeneity, spatial gene expression patterns, tumor microenvironment mapping, tissue zonation, or cell-cell communication from spatial data.
---

# Spatial Multi-Omics Analysis Pipeline

Comprehensive biological interpretation of spatial omics data. Transforms spatially variable genes (SVGs), domain annotations, and tissue context into actionable biological insights covering pathway enrichment, cell-cell interactions, druggable targets, immune microenvironment, and multi-modal integration.

**KEY PRINCIPLES**:
1. **Report-first approach** - Create report file FIRST, then populate progressively
2. **Domain-by-domain analysis** - Characterize each spatial region independently before comparison
3. **Gene-list-centric** - Analyze user-provided SVGs and marker genes with ToolUniverse databases
4. **Biological interpretation** - Go beyond statistics to explain biological meaning of spatial patterns
5. **Disease focus** - Emphasize disease mechanisms and therapeutic opportunities when disease context is provided
6. **Evidence grading** - Grade all evidence from T1 (direct human/clinical evidence) to T4 (annotation/prediction only)
7. **Multi-modal thinking** - Integrate RNA, protein, and metabolite information when available
8. **Validation guidance** - Suggest experimental validation approaches for key findings
9. **Source references** - Every statement must cite tool/database source
10. **Completeness checklist** - Mandatory section showing analysis coverage
11. **English-first queries** - Always use English terms in tool calls. Respond in the user's language

---

## When to Use This Skill

Apply when users:
- Provide spatially variable genes from spatial transcriptomics experiments
- Ask about biological interpretation of spatial domains/clusters
- Need pathway enrichment analysis of spatial gene expression data
- Want to understand cell-cell interactions from spatial data
- Ask about tumor microenvironment heterogeneity from spatial omics
- Need druggable targets in specific spatial regions
- Ask about tissue zonation patterns (liver, brain, kidney)
- Want to integrate spatial transcriptomics + proteomics data
- Ask about immune infiltration patterns from spatial data
- Need to compare healthy vs disease regions spatially
- Ask "What pathways are enriched in this tumor core vs tumor margin?"
- Ask "What cell-cell interactions occur in this spatial domain?"

**NOT for** (use other skills instead):
- Single gene interpretation without spatial context -> Use `tooluniverse-target-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
- Disease-only analysis without spatial data -> Use `tooluniverse-multiomic-disease-characterization`
- GWAS analysis -> Use `tooluniverse-gwas-*` skills
- Bulk RNA-seq (non-spatial) -> Use `tooluniverse-systems-biology`

---

## Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| **svgs** | Yes | Spatially variable genes (gene symbols) | `['EGFR', 'CDH1', 'VIM', 'MYC', 'CD3E']` |
| **tissue_type** | Yes | Tissue/organ type | `brain`, `liver`, `lung`, `breast`, `skin` |
| **technology** | No | Spatial omics platform used | `10x Visium`, `MERFISH`, `DBiTplus`, `SLIDE-seq` |
| **disease_context** | No | Disease if applicable | `breast cancer`, `Alzheimer disease`, `liver cirrhosis` |
| **spatial_domains** | No | Dict mapping domain name to marker genes | `{'Tumor core': ['MYC','EGFR'], 'Stroma': ['VIM','COL1A1']}` |
| **cell_types** | No | Cell types identified in deconvolution | `['Epithelial', 'T cell', 'Macrophage', 'Fibroblast']` |
| **proteins** | No | Proteins detected (if multi-modal) | `['CD3', 'CD8', 'PD-L1', 'Ki67']` |
| **metabolites** | No | Metabolites detected (if SpatialMETA) | `['glutamine', 'lactate', 'ATP']` |

---

## Spatial Omics Integration Score (0-100)

### Score Components

**Data Completeness (0-30 points)**:
- SVGs provided (>10 genes): 5 points
- Disease context provided: 5 points
- Spatial domains defined: 5 points
- Cell type composition available: 5 points
- Multi-modal data (protein/metabolite): 5 points
- Literature context found: 5 points

**Biological Insight (0-40 points)**:
- Significant pathway enrichment (FDR < 0.05): 10 points
- Cell-cell interaction predictions: 10 points
- Disease mechanism identified: 10 points
- Druggable targets found in disease regions: 10 points

**Evidence Quality (0-30 points)**:
- Cross-database validation (gene found in 3+ databases): 10 points
- Clinical validation (approved drugs for spatial targets): 10 points
- Literature support (PubMed evidence for spatial patterns): 10 points

### Score Interpretation

| Score | Tier | Interpretation |
|-------|------|----------------|
| **80-100** | Excellent | Comprehensive spatial characterization, strong biological insights, druggable targets identified |
| **60-79** | Good | Good pathway and interaction analysis, some disease/therapeutic context |
| **40-59** | Moderate | Basic enrichment complete, limited spatial domain comparison or interaction analysis |
| **0-39** | Limited | Minimal data, gene-level annotation only |
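
The point values and tiers above can be tallied mechanically. The following is a minimal sketch, assuming each criterion has already been judged as met or not met; the function name and the criterion keys are illustrative and not part of ToolUniverse.

```python
# Point values copied from the Score Components lists above.
COMPONENT_POINTS = {
    "svgs_provided": 5,
    "disease_context": 5,
    "spatial_domains": 5,
    "cell_types": 5,
    "multi_modal_data": 5,
    "literature_context": 5,
    "pathway_enrichment": 10,
    "cell_cell_interactions": 10,
    "disease_mechanism": 10,
    "druggable_targets": 10,
    "cross_database_validation": 10,
    "clinical_validation": 10,
    "literature_support": 10,
}

# (minimum score, tier) pairs from the Score Interpretation table.
TIERS = [(80, "Excellent"), (60, "Good"), (40, "Moderate"), (0, "Limited")]


def integration_score(met):
    """Return (score, tier) for an iterable of criterion keys that were met."""
    met = set(met)
    unknown = met - set(COMPONENT_POINTS)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    score = sum(COMPONENT_POINTS[name] for name in met)
    # TIERS is ordered from highest floor to lowest, so the first match wins.
    tier = next(label for floor, label in TIERS if score >= floor)
    return score, tier
```

For example, an analysis with SVGs, disease context, and all four Biological Insight criteria met would score 50 and fall in the Moderate tier.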

### Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| **T1** | [T1] | Direct human evidence, clinical proof | FDA-approved drug for spatial target, validated biomarker |
| **T2** | [T2] | Experimental evidence | Validated spatial pattern in literature, known ligand-receptor pair |
| **T3** | [T3] | Computational/database evidence | PPI network prediction, pathway enrichment, expression correlation |
| **T4** | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |

---

## Report Template

Create this file structure at the start: `{tissue}_{disease}_spatial_omics_report.md`

```markdown
# Spatial Multi-Omics Analysis Report: {Tissue Type}

**Report Generated**: {date}
**Technology**: {platform}
**Tissue**: {tissue_type}
**Disease Context**: {disease or "Normal tissue"}
**Total SVGs Analyzed**: {count}
**Spatial Domains**: {count}
**Spatial Omics Integration Score**: (to be calculated)

---

## Executive Summary

(2-3 sentence synthesis of key spatial findings - fill after all phases complete)

---

## 1. Tissue & Disease Context

### Tissue Information
| Property | Value | Source |
|----------|-------|--------|
| Tissue type | | |
| Disease | | |
| Expected cell types | | HPA |

### Disease Identifiers (if applicable)
| System | ID | Source |
|--------|-----|--------|

**Sources**: (tools used)

---

## 2. Spatially Variable Gene Characterization

### 2.1 Gene ID Resolution
| Gene Symbol | Ensembl ID | Entrez ID | UniProt | Function | Source |
|-------------|------------|-----------|---------|----------|--------|

### 2.2 Tissue Expression Patterns
| Gene | Tissue Expression | Specificity | Source |
|------|-------------------|-------------|--------|

### 2.3 Subcellular Localization
| Gene | Location | Confidence | Source |
|------|----------|------------|--------|

### 2.4 Disease Associations
| Gene | Disease | Score | Evidence | Source |
|------|---------|-------|----------|--------|

**Sources**: (tools used)

---

## 3. Pathway Enrichment Analysis

### 3.1 STRING Functional Enrichment
| Category | Term | Description | P-value | FDR | Genes | Source |
|----------|------|-------------|---------|-----|-------|--------|

### 3.2 Reactome Pathway Analysis
| Pathway ID | Name | P-value | FDR | Genes Found | Total Genes | Source |
|------------|------|---------|-----|-------------|-------------|--------|

### 3.3 GO Biological Processes
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### 3.4 GO Molecular Functions
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### 3.5 GO Cellular Components
| GO Term | Description | P-value | FDR | Genes | Source |
|---------|-------------|---------|-----|-------|--------|

### Pathway Summary
- Top enriched pathways:
- Key biological processes:
- Spatial pathway implications:

**Sources**: (tools used)

---

## 4. Spatial Domain Characterization

### Domain: {domain_name}

#### Marker Genes
| Gene | Function | Pathways | Source |
|------|----------|----------|--------|

#### Enriched Pathways (domain-specific)
| Pathway | P-value | FDR | Genes | Source |
|---------|---------|-----|-------|--------|

#### Cell Type Signature
| Cell Type | Marker Genes Present | Confidence |
|-----------|---------------------|------------|

#### Biological Interpretation
(Narrative interpretation of this domain)

(Repeat for each domain)

### 4.N Domain Comparison
| Feature | Domain 1 | Domain 2 | Domain 3 |
|---------|----------|----------|----------|
| Top pathway | | | |
| Cell types | | | |
| Disease relevance | | | |

**Sources**: (tools used)

---

## 5. Cell-Cell Interaction Inference

### 5.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Type | Source |
|-----------|-----------|-------|------|--------|

### 5.2 Ligand-Receptor Pairs
| Ligand | Receptor | Domain (Ligand) | Domain (Receptor) | Evidence | Source |
|--------|----------|-----------------|-------------------|----------|--------|

### 5.3 Signaling Pathways
| Pathway | Components in Data | Spatial Distribution | Source |
|---------|--------------------|---------------------|--------|

### 5.4 Interaction Network Summary
- Key interaction hubs:
- Cross-domain interactions:
- Predicted cell-cell communication axes:

**Sources**: (tools used)

---

## 6. Disease & Therapeutic Context

### 6.1 Disease Gene Overlap
| Gene | Disease Association Score | Evidence Type | Source |
|------|--------------------------|---------------|--------|

### 6.2 Druggable Targets in Spatial Domains
| Gene | Domain | Tractability | Modality | Approved Drugs | Source |
|------|--------|-------------|----------|----------------|--------|

### 6.3 Drug Mechanisms Relevant to Spatial Targets
| Drug | Target | Mechanism | Phase | Source |
|------|--------|-----------|-------|--------|

### 6.4 Clinical Trials
| NCT ID | Title | Target Gene | Phase | Status | Source |
|--------|-------|-------------|-------|--------|--------|

### Therapeutic Summary
- Druggable genes in disease regions:
- Approved therapies:
- Pipeline drugs:
- Novel opportunities:

**Sources**: (tools used)

---

## 7. Multi-Modal Integration

### 7.1 Protein-RNA Concordance (if protein data available)
| Gene/Protein | RNA Pattern | Protein Pattern | Concordance | Source |
|-------------|-------------|-----------------|-------------|--------|

### 7.2 Subcellular Context
| Gene | mRNA Location (spatial) | Protein Location (HPA) | Concordance | Source |
|------|------------------------|----------------------|-------------|--------|

### 7.3 Metabolic Context (if metabolomics available)
| Gene | Metabolic Pathway | Metabolites Detected | Spatial Pattern | Source |
|------|-------------------|---------------------|-----------------|--------|

**Sources**: (tools used)

---

## 8. Immune Microenvironment (if relevant)

### 8.1 Immune Cell Markers
| Cell Type | Marker Genes | Spatial Domain | Source |
|-----------|-------------|----------------|--------|

### 8.2 Immune Checkpoint Expression
| Checkpoint | Gene | Expression Pattern | Source |
|------------|------|--------------------|--------|

### 8.3 Tumor-Immune Interface (if cancer)
| Feature | Finding | Evidence | Source |
|---------|---------|----------|--------|

### Immune Summary
- Immune infiltration pattern:
- Key immune checkpoints:
- Immunotherapy implications:

**Sources**: (tools used)

---

## 9. Literature & Validation Context

### 9.1 Literature Evidence
| PMID | Title | Relevance | Year | Source |
|------|-------|-----------|------|--------|

### 9.2 Known Spatial Patterns
(Known tissue architecture/zonation from literature)

### 9.3 Validation Recommendations
| Priority | Gene/Target | Method | Rationale |
|----------|-------------|--------|-----------|
| High | | IHC / smFISH | |
| Medium | | IF / ISH | |

**Sources**: (tools used)

---

## Spatial Omics Integration Score

| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| SVGs provided | | 5 | |
| Disease context | | 5 | |
| Spatial domains | | 5 | |
| Cell types | | 5 | |
| Multi-modal data | | 5 | |
| Literature context | | 5 | |
| Pathway enrichment | | 10 | |
| Cell-cell interactions | | 10 | |
| Disease mechanism | | 10 | |
| Druggable targets | | 10 | |
| Cross-database validation | | 10 | |
| Clinical validation | | 10 | |
| Literature support | | 10 | |
| **TOTAL** | | **100** | |

**Score**: XX/100 - [Tier]

---

## Completeness Checklist

- [ ] Gene ID resolution complete
- [ ] Tissue expression patterns analyzed (HPA)
- [ ] Subcellular localization checked (HPA)
- [ ] Pathway enrichment complete (STRING + Reactome)
- [ ] GO enrichment complete (BP + MF + CC)
- [ ] Spatial domains characterized individually
- [ ] Domain comparison performed
- [ ] Protein-protein interactions analyzed (STRING)
- [ ] Ligand-receptor pairs identified
- [ ] Disease associations checked (OpenTargets)
- [ ] Druggable targets identified (OpenTargets tractability)
- [ ] Drug mechanisms reviewed
- [ ] Multi-modal integration performed (if data available)
- [ ] Immune microenvironment characterized (if relevant)
- [ ] Literature search completed
- [ ] Validation recommendations provided
- [ ] Spatial Omics Integration Score calculated
- [ ] Executive summary written
- [ ] All sections have source citations

---

## References

### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|

### Database Versions
- OpenTargets: (current)
- STRING: v12.0
- Reactome: (current)
- HPA: (current)
- GTEx: v10
```

---

## Phase 0: Input Processing & Disambiguation (ALWAYS FIRST)

**Objective**: Parse user input, resolve tissue/disease identifiers, establish analysis context.

### Tools Used

**OpenTargets_get_disease_id_description_by_name** (if disease context provided):
- **Input**: `diseaseName` (string) - Disease name
- **Output**: `{data: {search: {hits: [{id, name, description}]}}}`
- **Use**: Get MONDO/EFO IDs for disease queries

**OpenTargets_get_disease_description_by_efoId**:
- **Input**: `efoId` (string) - Disease ID (e.g., `MONDO_0007254`)
- **Output**: `{data: {disease: {id, name, description, dbXRefs}}}`
- **Use**: Get full disease description

**HPA_search_genes_by_query** (tissue cell type context):
- **Input**: `query` (string) - Search term
- **Output**: List of gene entries matching query
- **Use**: Verify tissue-relevant genes

### Workflow

1. Parse SVG list from user input (ensure valid gene symbols)
2. Identify tissue type and map to standard ontology term
3. If disease provided, resolve to MONDO/EFO ID using OpenTargets
4. Get disease description and cross-references
5. Determine analysis scope:
   - Cancer? -> Include immune microenvironment, somatic mutations, druggable targets
   - Neurological? -> Include brain region specificity, neuronal markers
   - Metabolic? -> Include metabolic zonation, enzyme distribution
   - Normal tissue? -> Focus on tissue architecture and cell type composition
6. Set up report file with header information

### Decision Logic

- **Cancer tissue**: Enable immune microenvironment phase, CIViC/cBioPortal queries, immuno-oncology analysis
- **Normal tissue**: Skip disease phases, focus on tissue zonation and cell type composition
- **Liver/kidney/brain**: Enable zonation-specific analysis
- **No disease context**: Proceed with tissue biology only
- **Small gene list (<20)**: Warn about limited enrichment power, emphasize gene-level analysis
- **Large gene list (>500)**: Suggest filtering to top SVGs by significance before enrichment
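
The decision logic above can be sketched as a small planning function. This is an illustration only: `plan_analysis`, the flag names, and the `disease_category` label (which the caller is assumed to assign after resolving the disease) are hypothetical and not part of ToolUniverse.

```python
# Tissues for which the decision logic enables zonation-specific analysis.
ZONATED_TISSUES = {"liver", "kidney", "brain"}


def plan_analysis(svgs, tissue_type, disease_category=None):
    """Translate the decision logic into feature flags and warnings.

    disease_category is a hypothetical caller-assigned label such as
    "cancer", "neurological", or "metabolic"; None means normal tissue.
    """
    n_genes = len(set(svgs))
    plan = {
        "immune_microenvironment": disease_category == "cancer",
        "civic_cbioportal_queries": disease_category == "cancer",
        "disease_phases": disease_category is not None,
        "zonation_analysis": tissue_type.lower() in ZONATED_TISSUES,
        "warnings": [],
    }
    if n_genes < 20:
        plan["warnings"].append(
            "Small gene list (<20): limited enrichment power; "
            "emphasize gene-level analysis."
        )
    elif n_genes > 500:
        plan["warnings"].append(
            "Large gene list (>500): consider filtering to top SVGs "
            "by significance before enrichment."
        )
    return plan
```

Returning flags rather than running phases directly keeps the plan easy to record in the report header before any tool is called.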

---

## Phase 1: Gene Characterization

**Objective**: Resolve gene identifiers, annotate functions, tissue specificity, and subcellular localization.

### Tools Used

**MyGene_query_genes** (gene ID resolution):
- **Input**: `query` (string) - Gene symbol
- **Output**: `{hits: [{_id, symbol, name, ensembl: {gene}, entrezgene}]}`
- **Use**: Resolve gene symbol to Ensembl ID, Entrez ID
- **NOTE**: First hit may not be exact match - filter by `symbol` field
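
A minimal sketch of the exact-match filter described in the note, assuming the response has the `{hits: [...]}` shape listed above. The helper names are hypothetical, and the handling of a list-valued `ensembl` field is an assumption about how multi-mapping symbols may be returned.

```python
def pick_exact_hit(response, symbol):
    """Return the first hit whose `symbol` equals the query, else None."""
    wanted = symbol.upper()
    for hit in response.get("hits", []):
        if str(hit.get("symbol", "")).upper() == wanted:
            return hit
    return None


def extract_ids(hit):
    """Pull the Ensembl gene ID and Entrez ID out of one hit."""
    ensembl = hit.get("ensembl") or {}
    # Assumption: `ensembl` may be a list when a symbol maps to several genes;
    # the first entry is used here.
    if isinstance(ensembl, list):
        ensembl = ensembl[0] if ensembl else {}
    return {
        "symbol": hit.get("symbol"),
        "ensembl_id": ensembl.get("gene"),
        "entrez_id": hit.get("entrezgene"),
    }
```

When `pick_exact_hit` returns `None`, the symbol is best reported as unresolved rather than silently falling back to the first hit.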

**UniProt_get_function_by_accession** (gene function):
- **Input**: `accession` (string) - UniProt accession
- **Output**: List of function description strings
- **Use**: Get protein function annotation

**UniProt_get_subcellular_location_by_accession** (protein localization):
- **Input**: `accession` (string)
- **Output**: Subcellular location information
- **Use**: Where the protein is located in the cell

**HPA_get_subcellular_location** (validated localization):
- **Input**: `gene_name` (string) - Gene symbol
- **Output**: `{gene_name, main_locations: [], additional_locations: [], location_summary}`
- **Use**: Experimentally validated protein subcellular location

**HPA_get_rna_expression_by_source** (tissue expression):
- **Input**: `gene_name` (string), `source_type` (string: 'tissue'), `source_name` (string)
- **Output**: `{data: {gene_name, source_type, source_name, expression_value, expression_level}}`
- **Use**: Check expression in the specific tissue of interest
- **NOTE**: All 3 parameters are REQUIRED

**HPA_get_comprehensive_gene_details_by_ensembl_id** (full HPA data):
- **Input**: `ensembl_id` (string), `include_isoforms` (bool), `include_images` (bool), `include_antibodies` (bool), `include_expression` (bool) - ALL 5 parameters REQUIRED
- **Output**: `{ensembl_id, gene_name, uniprot_ids, summary, protein_classes, tissue_expression, cell_line_expression, ...}`
- **Use**: One-stop gene characterization from HPA
- **NOTE**: Use `include_expression=True` for tissue data; set others to `False` for faster response
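
Because all five parameters are required, it can help to build the argument dictionary in one place. The sketch below uses the parameter names listed above; the helper name is hypothetical, and how the dictionary is passed to the tool depends on your ToolUniverse client.

```python
def hpa_gene_details_args(ensembl_id, include_expression=True):
    """Build the full argument set; every parameter is passed explicitly."""
    return {
        "ensembl_id": ensembl_id,
        # Left off for a faster response, as the note above suggests.
        "include_isoforms": False,
        "include_images": False,
        "include_antibodies": False,
        "include_expression": include_expression,
    }
```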

**HPA_get_cancer_prognostics_by_gene** (cancer prognosis):
- **Input**: `ensembl_id` (string) - Ensembl gene ID (NOT gene_name)
- **Output**: `{gene_name, prognostic_cancers_count, prognostic_summary: [{cancer_type, prognostic_type, p_value}]}`
- **Use**: Prognostic significance in cancer (if cancer context)

**UniProtIDMap_gene_to_uniprot** (ID mapping):
- **Input**: `gene_name` (string), `organism` (string, default 'human')
- **Output**: UniProt accession for the gene
- **Use**: Map gene symbol to UniProt accession

### Workflow

1. For each SVG (for large lists, restrict to the prioritized subset per the batch strategy below):
   a. Query MyGene to get Ensembl ID, Entrez ID
   b. Map to UniProt accession
   c. Get subcellular location from HPA
   d. Get tissue expression from HPA
   e. If cancer: check cancer prognostics
2. Compile gene characterization table
3. Identify genes with tissue-specific expression
4. Note genes with nuclear vs membrane vs secreted localization (relevant for spatial patterns)

### Batch Strategy for Large Gene Lists

- **Up to 50 genes**: Characterize all individually
- **51-200 genes**: Characterize top 50 by priority (known disease genes first), summarize rest
- **More than 200 genes**: Characterize top 30, use enrichment for the full list
- Always run pathway enrichment on the FULL list regardless
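
The batch strategy can be sketched as follows. The function name and the `priority_genes` argument (for example, known disease genes) are hypothetical, and assigning the boundary values 50 and 200 to the lower bracket is an assumption.

```python
def select_for_characterization(svgs, priority_genes=()):
    """Split SVGs into genes to characterize individually and genes to summarize."""
    unique = list(dict.fromkeys(svgs))  # de-duplicate while keeping input order
    n = len(unique)
    if n <= 50:
        limit = n
    elif n <= 200:
        limit = 50
    else:
        limit = 30
    priority = set(priority_genes)
    # Stable sort: priority genes first, input order preserved otherwise.
    ranked = sorted(unique, key=lambda gene: gene not in priority)
    return {
        "characterize": ranked[:limit],
        "summarize": ranked[limit:],
        # Enrichment always receives the full list.
        "enrichment_input": unique,
    }
```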

---

## Phase 2: Pathway & Functional Enrichment

**Objective**: Identify biological pathways and functions enriched in SVGs and per-domain gene sets.

### Tools Used

**STRING_functional_enrichment** (primary enrichment):
- **Input**: `protein_ids` (array of gene symbols), `species` (int, 9606 for human)
- **Output**: `{status: 'success', data: [{category, term, number_of_genes, number_of_genes_in_background, p_value, fdr, description, inputGenes, preferredNames}]}`
- **Use**: Comprehensive enrichment across GO, KEGG, Reactome, COMPARTMENTS, DISEASES
- **Categories**: `Process` (GO:BP), `Function` (GO:MF), `Component` (GO:CC), `KEGG`, `Reactome`, `COMPARTMENTS`, `DISEASES`, `Keyword`, `PMID`
- **NOTE**: This is the PRIMARY enrichment tool. Returns all categories in one call

**ReactomeAnalysis_pathway_enrichment** (Reactome-specific):
- **Input**: `identifiers` (string, space-separated gene symbols, NOT array)
- **Output**: `{data: {token, pathways_found, pathways: [{pathway_id, name, p_value, fdr, entities_found, entities_total}]}}`
- **Use**: Detailed Reactome pathway analysis with hierarchy
- **NOTE**: identifiers is a SPACE-SEPARATED STRING, not array
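
The two enrichment tools take the same gene list in different shapes, which is an easy mistake to make. A minimal sketch of the argument construction, using the parameter names listed above (the helper names are hypothetical):

```python
def string_enrichment_args(genes, species=9606):
    """STRING_functional_enrichment expects an array of gene symbols."""
    return {"protein_ids": list(dict.fromkeys(genes)), "species": species}


def reactome_enrichment_args(genes):
    """ReactomeAnalysis_pathway_enrichment expects one space-separated string."""
    return {"identifiers": " ".join(dict.fromkeys(genes))}
```

Both helpers drop duplicate symbols while preserving order, so the two tools see the same gene set.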

**Reactome_map_uniprot_to_pathways** (individual gene):
- **Input**: `id` (string) - UniProt accession
- **Output**: Plain list of pathway objects (no data wrapper)
- **Use**: Map individual proteins to Reactome pathways

**GO_get_annotations_for_gene** (individual gene GO):
- **Input**: `gene_id` (string) - Gene symbol or ID
- **Output**: Plain list of GO annotation objects
- **Use**: Get GO annotations for individual genes

**kegg_search_pathway** (KEGG pathway search):
- **Input**: `query` (string) - Pathway name or keyword
- **Output**: Pathway search results
- **Use**: Find KEGG pathways relevant to spatial findings

**WikiPathways_search** (WikiPathways):
- **Input**: `query` (string) - Search term
- **Output**: WikiPathways search results
- **Use**: Additional pathway context

### Workflow

1. **Global SVG enrichment**: Run STRING_functional_enrichment on ALL SVGs
   - Keep only results with FDR < 0.05
   - Separate by category (Process, Function, Component, KEGG, Reactome)
   - Report top 10-15 per category
2. **Reactome detailed analysis**: Run ReactomeAnalysis_pathway_enrichment
   - Report top pathways with FDR < 0.05
3. **Per-domain enrichment** (if spatial domains provided):
   - Run STRING_functional_enrichment on each domain's gene set
   - Compare enriched pathways across domains
   - Identify domain-specific vs shared pathways
4. **Compile pathway tables**: Merge results from all enrichment tools
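
The filtering and comparison steps above can be sketched as post-processing on the STRING response. This assumes the `{status, data: [...]}` shape and the `category`, `term`, and `fdr` fields listed for `STRING_functional_enrichment`; the function names are hypothetical.

```python
from collections import defaultdict

# Categories that the workflow reports separately.
REPORTED_CATEGORIES = ("Process", "Function", "Component", "KEGG", "Reactome")


def summarize_enrichment(response, fdr_cutoff=0.05, top_n=15):
    """Group significant terms by category, most significant first."""
    grouped = defaultdict(list)
    for row in response.get("data", []):
        if row.get("category") in REPORTED_CATEGORIES and row.get("fdr", 1.0) < fdr_cutoff:
            grouped[row["category"]].append(row)
    return {
        category: sorted(rows, key=lambda r: r["fdr"])[:top_n]
        for category, rows in grouped.items()
    }


def domain_specific_terms(per_domain):
    """Map each domain to the terms that are significant in that domain only.

    per_domain: {domain name: output of summarize_enrichment}.
    """
    terms = {
        domain: {row["term"] for rows in summary.values() for row in rows}
        for domain, summary in per_domain.items()
    }
    specific = {}
    for domain, own in terms.items():
        others = set().union(*(t for d, t in terms.items() if d != domain))
        specific[domain] = own - others
    return specific
```

Terms that appear in no domain's specific set are the shared pathways; they are worth reporting separately because they often mark transition zones.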

### Enrichment Interpretation

- **Signaling pathways** (RTK, Wnt, Notch, Hedgehog): Cell-cell communication
- **Metabolic pathways**: Tissue metabolic zonation
- **Immune pathways**: Immune infiltration/exclusion
- **ECM/adhesion pathways**: Tissue structure and remodeling
- **Cell cycle/proliferation**: Growth zones
- **Apoptosis/stress**: Damage zones

---

## Phase 3: Spatial Domain Characterization

**Objective**: Characterize each spatial domain biologically and compare between domains.

### Tools Used

Uses the same tools as Phase 2 (STRING_functional_enrichment, ReactomeAnalysis_pathway_enrichment) applied per-domain, plus:

**HPA_get_biological_processes_by_gene** (per-gene processes):
- **Input**: `gene_name` (string)
- **Output**: Biological processes associated with the gene
- **Use**: Annotate domain marker genes

**HPA_get_protein_interactions_by_gene** (gene interactions):
- **Input**: `gene_name` (string)
- **Output**: Known protein interaction partners
- **Use**: Build domain-specific interaction context

### Workflow

1. For each spatial domain:
   a. Get marker gene list
   b. Run STRING_functional_enrichment on domain genes
   c. Identify top pathways, GO terms
   d. Assign likely cell type(s) based on marker genes:
      - Epithelial: CDH1, EPCAM, KRT18, KRT19
      - Mesenchymal/Fibroblast: VIM, COL1A1, COL3A1, FAP, ACTA2
      - Immune T cell: CD3E, CD3D, CD4, CD8A, CD8B
      - Immune B cell: CD19, MS4A1 (CD20), CD79A
      - Macrophage: CD68, CD163, CSF1R
      - Endothelial: PECAM1, VWF, CDH5
      - Neuronal: SNAP25, SYP, MAP2, NEFL
      - Hepatocyte: ALB, HNF4A, CYP3A4
   e. Generate biological interpretation narrative
2. Compare domains:
   - Differential pathways
   - Unique vs shared genes
   - Disease-relevant vs homeostatic regions
   - Transition zones (shared genes between adjacent domains)

### Cell Type Assignment Rules

When user does not provide cell type annotations, infer from marker genes:
- Check each gene against known cell type markers
- Use HPA tissue/cell type expression data for validation
- Report confidence level (high: 3+ markers match, medium: 2 markers, low: 1 marker)
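
A minimal sketch of the marker-based assignment, using the marker lists from the Phase 3 workflow and the confidence thresholds above. The function name is hypothetical, and the HPA validation step is not included.

```python
# Marker lists from the Phase 3 workflow (HGNC gene symbols).
CELL_TYPE_MARKERS = {
    "Epithelial": {"CDH1", "EPCAM", "KRT18", "KRT19"},
    "Mesenchymal/Fibroblast": {"VIM", "COL1A1", "COL3A1", "FAP", "ACTA2"},
    "T cell": {"CD3E", "CD3D", "CD4", "CD8A", "CD8B"},
    "B cell": {"CD19", "MS4A1", "CD79A"},
    "Macrophage": {"CD68", "CD163", "CSF1R"},
    "Endothelial": {"PECAM1", "VWF", "CDH5"},
    "Neuronal": {"SNAP25", "SYP", "MAP2", "NEFL"},
    "Hepatocyte": {"ALB", "HNF4A", "CYP3A4"},
}


def assign_cell_types(domain_genes):
    """Return candidate cell types for one domain, best supported first."""
    genes = {gene.upper() for gene in domain_genes}
    calls = []
    for cell_type, markers in CELL_TYPE_MARKERS.items():
        matched = sorted(genes & markers)
        if not matched:
            continue
        if len(matched) >= 3:
            confidence = "high"
        elif len(matched) == 2:
            confidence = "medium"
        else:
            confidence = "low"
        calls.append({"cell_type": cell_type, "markers": matched, "confidence": confidence})
    return sorted(calls, key=lambda call: len(call["markers"]), reverse=True)
```

A domain can legitimately return several cell types, since a Visium spot typically captures more than one cell; report all calls with their confidence rather than only the top one.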

---

## Phase 4: Cell-Cell Interaction Inference

**Objective**: Predict cell-cell communication from spatial gene expression patterns.

### Tools Used

**STRING_get_interaction_partners** (PPI network):
- **Input**: `protein_ids` (array), `species` (int, 9606), `limit` (int), `confidence_score` (float, 0.7)
- **Output**: `{status: 'success', data: [{preferredName_A, preferredName_B, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}`
- **Use**: Find protein-protein interactions among SVGs
- **Score types**: nscore=neighborhood, fscore=fusion, pscore=phylogenetic, ascore=coexpression, escore=experimental, dscore=database, tscore=textmining

**STRING_get_protein_interactions** (pairwise interactions):
- **Input**: `protein_ids` (array), `species` (int, 9606)
- **Output**: Interaction data between specified proteins
- **Use**: Get interactions within a specific gene set

**intact_search_interactions** (IntAct database):
- **Input**: `query` (string), `max` (int)
- **Output**: Interaction data from IntAct
- **Use**: Complement STRING with IntAct interactions

**Reactome_get_interactor** (Reactome interactions):
- **Input**: Protein/gene identifier
- **Output**: Reactome interaction data
- **Use**: Pathway-level interaction context

**DGIdb_get_drug_gene_interactions** (drug-gene interactions):
- **Input**: `genes` (array of strings)
- **Output**: Drug-gene interaction data
- **Use**: Identify druggable interaction nodes

### Ligand-Receptor Analysis

Known ligand-receptor pairs to check in SVG list:
- **Growth factors**: EGF-EGFR, HGF-MET, VEGFA-KDR, FGF-FGFR, PDGF-PDGFRA/B
- **Cytokines**: TNF-TNFRSF1A/B, IL6-IL6R, IFNG-IFNGR1/2, TGFB1-TGFBR1/2
- **Chemokines**: CXCL12-CXCR4, CCL2-CCR2, CXCL10-CXCR3
- **Immune checkpoints**: CD274(PD-L1)-PDCD1(PD-1), CD80/CD86-CTLA4, LGALS9-HAVCR2(TIM-3)
- **Notch signaling**: DLL1/3/4-NOTCH1/2/3/4, JAG1/2-NOTCH1/2
- **Wnt signaling**: WNT ligands-FZD receptors
- **Adhesion**: CDH1-CDH1 (homotypic), ITGA/B integrins-ECM
- **Hedgehog**: SHH-PTCH1
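
Checking these pairs against domain marker lists can be sketched as below. Only pairs from the list above that map to unambiguous single gene symbols are encoded; family-level entries such as WNT-FZD would need to be expanded first. The function name and the output field names are hypothetical.

```python
# Subset of the pairs above that are already single gene symbols.
LIGAND_RECEPTOR_PAIRS = [
    ("EGF", "EGFR"), ("HGF", "MET"), ("IL6", "IL6R"),
    ("TGFB1", "TGFBR1"), ("TGFB1", "TGFBR2"),
    ("CXCL12", "CXCR4"), ("CCL2", "CCR2"), ("CXCL10", "CXCR3"),
    ("CD274", "PDCD1"), ("CD80", "CTLA4"), ("CD86", "CTLA4"),
    ("LGALS9", "HAVCR2"), ("SHH", "PTCH1"),
]


def find_lr_pairs(spatial_domains):
    """List pairs whose ligand and receptor both appear among domain markers.

    spatial_domains: {domain name: [marker gene symbols]}.
    """
    located = {}
    for domain, genes in spatial_domains.items():
        for gene in genes:
            located.setdefault(gene.upper(), set()).add(domain)
    found = []
    for ligand, receptor in LIGAND_RECEPTOR_PAIRS:
        for lig_domain in sorted(located.get(ligand, ())):
            for rec_domain in sorted(located.get(receptor, ())):
                found.append({
                    "ligand": ligand,
                    "receptor": receptor,
                    "ligand_domain": lig_domain,
                    "receptor_domain": rec_domain,
                    "cross_domain": lig_domain != rec_domain,
                })
    return found
```

Pairs flagged `cross_domain` are candidates for inter-domain signaling; co-occurrence in marker lists alone is computational evidence and would be graded accordingly unless literature support is found.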

### Workflow

1. Run STRING_get_interaction_partners on all SVGs
   - Keep only interactions with score > 0.7
   - Identify hub genes (most connections)
2. Check for known ligand-receptor pairs in gene list
   - Cross-reference with spatial domain assignments
   - Identify potential cross-domain signaling
3. Build interaction network:
   - Intra-domain interactions (within same spatial region)
   - Inter-domain interactions (between different regions)
   - Identify signaling axes (e.g., tumor-stroma, immune-tumor)
4. Map interactions to Reactome signaling pathways
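
Steps 1 and 3 can be sketched as post-processing on the STRING response, assuming the `preferredName_A`, `preferredName_B`, and `score` fields listed above. The function name is hypothetical, and two simplifications are assumptions: each gene belongs to exactly one domain, and edges to partners outside the SVG list are dropped.

```python
from collections import Counter


def summarize_network(response, gene_to_domain, min_score=0.7, top_hubs=5):
    """Keep high-confidence edges among SVGs, rank hubs, split by domain."""
    svgs = set(gene_to_domain)
    seen = set()
    degree = Counter()
    intra, inter = [], []
    for row in response.get("data", []):
        a, b, score = row["preferredName_A"], row["preferredName_B"], row["score"]
        key = frozenset((a, b))
        # Skip weak edges, self-loops, duplicates, and non-SVG partners.
        if score <= min_score or a == b or key in seen or not {a, b} <= svgs:
            continue
        seen.add(key)
        degree.update((a, b))
        same_domain = gene_to_domain[a] == gene_to_domain[b]
        (intra if same_domain else inter).append((a, b, score))
    return {
        "hubs": degree.most_common(top_hubs),
        "intra_domain": intra,
        "inter_domain": inter,
    }
```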

---

## Phase 5: Disease & Therapeutic Context

**Objective**: Connect spatial findings to disease mechanisms and identify druggable targets.

### Tools Used

**OpenTargets_get_associated_targets_by_disease_efoId** (disease genes):
- **Input**: `efoId` (string), `size` (int)
- **Output**: `{data: {disease: {associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}`
- **Use**: Get disease-associated genes, overlap with SVGs

**OpenTargets_get_target_tractability_by_ensemblID** (druggability):
- **Input**: `ensemblId` (string)
- **Output**: Tractability data (small molecule, antibody, other modalities)
- **Use**: Assess if spatial targets are druggable

**OpenTargets_get_associated_drugs_by_target_ensemblID** (drugs for target):
- **Input**: `ensemblId` (string), `size` (int)
- **Output**: Drug data for the target
- **Use**: Find approved/clinical drugs targeting spatial genes

**OpenTargets_get_drug_mechanisms_of_action_by_chemblId** (drug mechanism):
- **Input**: `chemblId` (string)
- **Output**: Mechanism of action data
- **Use**: Understand how drugs act on spatial targets

**OpenTargets_target_disease_evidence** (evidence linking target to disease):
- **Input**: `ensemblId` (string), `efoId` (string)
- **Output**: Evidence items linking target to disease
- **Use**: Specific evidence for each spatial gene in disease

**clinical_trials_search** (clinical trials):
- **Input**: `action` = `"search_studies"`, `condition` (string), `intervention` (string), `limit` (int)
- **Output**: `{total_count, studies: [{nctId, title, status, conditions}]}`
- **Use**: Find clinical trials for spatial targets
- **NOTE**: `action` MUST be `"search_studies"`

**DGIdb_get_gene_druggability** (druggability categories):
- **Input**: `genes` (array of strings)
- **Output**: `{data: {genes: {nodes: [{name, geneCategories: [{name}]}]}}}`
- **Use**: Classify genes as druggable, kinase, GPCR, etc.

**civic_search_genes** (CIViC cancer evidence, if cancer):
- **Input**: None that filters by gene name; match SVGs against the returned gene list
- **Output**: Gene list from CIViC
- **Use**: Check if SVGs have CIViC clinical evidence

### Workflow

1. **Disease gene overlap** (if disease context provided):
   a. Get disease-associated targets from OpenTargets
   b. Intersect with SVGs
   c. For overlapping genes, get specific evidence
2. **Druggable target identification**:
   a. Run DGIdb_get_gene_druggability on all SVGs
   b. For druggable genes, check OpenTargets tractability
   c. Get approved drugs for druggable spatial targets
3. **Clinical trials**:
   a. Search for trials targeting spatial genes in the disease context
   b. Prioritize trials for genes in disease-enriched spatial domains
4. **Cancer-specific** (if cancer):
   a. Check CIViC for clinical evidence
   b. Get mutation prevalence from cBioPortal (if specific mutations known)
   c. Check immune checkpoint genes in spatial data
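
The disease gene overlap in step 1 can be sketched as an intersection on the OpenTargets response, assuming the nested `data.disease.associatedTargets.rows` shape listed above. The function name and output field names are hypothetical.

```python
def disease_gene_overlap(response, svgs):
    """Return disease-associated targets that are also SVGs, highest score first."""
    rows = response["data"]["disease"]["associatedTargets"]["rows"]
    wanted = {gene.upper() for gene in svgs}
    overlap = [
        {
            "symbol": row["target"]["approvedSymbol"],
            "ensembl_id": row["target"]["id"],
            "score": row["score"],
        }
        for row in rows
        if row["target"]["approvedSymbol"].upper() in wanted
    ]
    return sorted(overlap, key=lambda item: item["score"], reverse=True)
```

The returned Ensembl IDs can then be passed to the tractability and drug lookups in step 2, so the overlap step also serves as ID resolution for those calls. Because `size` limits the number of rows returned, an SVG missing from the overlap is not evidence of no association.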

---

## Phase 6: Multi-Modal Integration

**Objective**: Integrate protein, RNA, and metabolite spatial data when available.

### Tools Used

**HPA_get_subcellular_location** (protein localization):
- **Input**: `gene_name` (string)
- **Output**: `{gene_name, main_locations, additional_locations, location_summary}`
- **Use**: Compare mRNA spatial pattern with protein subcellular location

**HPA_get_rna_expression_in_specific_tissues** (tissue RNA):
- **Input**: `ensembl_id` (string), `tissue_name` (string)
- **Output**: Expression data for specific tissue
- **Use**: Validate spatial expression against bulk tissue data

**Reactome_map_uniprot_to_pathways** (metabolic pathways):
- **Input**: `id` (string) - UniProt accession
- **Output**: List of pathways
- **Use**: Map genes to metabolic pathways for metabolomics integration

**kegg_get_pathway_info** (KEGG pathway details):
- **Input**: `pathway_id` (string) - KEGG pathway ID
- **Output**: Pathway information including metabolites
- **Use**: Link spatial genes to metabolic pathways and metabolites

### Workflow

1. **RNA-Protein concordance** (if protein data provided):
   a. For each gene with both RNA and protein data:
      - Compare spatial RNA pattern with protein detection
      - Check HPA for known post-transcriptional regulation
      - Note concordant (expected) vs discordant (interesting) patterns
2. **Subcellular context**:
   a. Map spatial RNA localization to protein subcellular location (HPA)
   b. Secreted proteins -> likely paracrine signaling
   c. Membrane proteins -> cell surface markers
   d. Nuclear proteins -> transcription factors
3. **Metabolic integration** (if metabolomics available):
   a. Map genes to metabolic pathways (Reactome, KEGG)
   b. Link detected metabolites to enzyme-encoding genes
   c. Identify spatial metabolic heterogeneity
   d. Check for known metabolic zonation patterns
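
The subcellular-context step can be sketched as a small lookup over the documented `HPA_get_subcellular_location` response shape. This is a minimal illustration, not part of the skill itself: `ROLE_KEYWORDS`, `interpret_location`, and the keyword strings are hypothetical, and the example response is hand-written rather than real HPA output.

```python
# Keyword -> role mapping. The keywords are illustrative assumptions,
# not an official HPA vocabulary.
ROLE_KEYWORDS = {
    "secreted": "likely paracrine signaling",
    "plasma membrane": "cell surface marker",
    "nucle": "transcription factor candidate",  # nucleoplasm, nucleoli, nuclear ...
}


def interpret_location(hpa_response):
    """Map an HPA_get_subcellular_location response to coarse roles."""
    locations = (hpa_response.get("main_locations") or []) + (
        hpa_response.get("additional_locations") or []
    )
    roles = []
    for location in locations:
        for keyword, role in ROLE_KEYWORDS.items():
            if keyword in location.lower() and role not in roles:
                roles.append(role)
    return {"gene": hpa_response.get("gene_name"), "roles": roles or ["unclassified"]}


# Hand-written response in the documented shape (not real HPA output).
example = {
    "gene_name": "GENE_A",
    "main_locations": ["Plasma membrane"],
    "additional_locations": ["Nucleoplasm"],
    "location_summary": "",
}
print(interpret_location(example))
```

A gene that maps to several roles is reported with all of them; the roles are interpretation hints for the report, not classifications.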

---

## Phase 7: Immune Microenvironment (Cancer/Inflammation)

**Objective**: Characterize immune cell composition and checkpoint expression in spatial context.

### Conditions for Activation

Execute this phase only if at least one of the following holds:
- Disease context is cancer, autoimmune, or inflammatory
- SVGs include immune markers (CD3E, CD8A, CD68, CD163, etc.)
- User specifically asks about immune patterns
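
A minimal sketch of this gate, assuming that any one of the three conditions is enough to activate the phase. `should_run_immune_phase`, `ACTIVATION_MARKERS`, and the disease keywords are hypothetical names chosen for illustration; the marker set only contains the four genes named above.

```python
# Markers named in the activation conditions; extend as needed.
ACTIVATION_MARKERS = {"CD3E", "CD8A", "CD68", "CD163"}

# Hypothetical substrings used to recognise relevant disease contexts.
DISEASE_TERMS = ("cancer", "carcinoma", "tumor", "autoimmune", "inflammat")


def should_run_immune_phase(disease_context, svgs, user_asked_immune=False):
    """Return True if any activation condition for Phase 7 is met."""
    disease = (disease_context or "").lower()
    disease_match = any(term in disease for term in DISEASE_TERMS)
    marker_match = bool(ACTIVATION_MARKERS & {gene.upper() for gene in svgs})
    return disease_match or marker_match or user_asked_immune


print(should_run_immune_phase("breast cancer", ["ESR1", "KRT8"]))
print(should_run_immune_phase("liver zonation", ["ALB", "CYP2E1"]))
```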

### Tools Used

**STRING_functional_enrichment** (immune pathway enrichment):
- Applied to immune-relevant SVGs
- Filter for immune-related GO terms and pathways

**OpenTargets_get_target_tractability_by_ensemblID** (checkpoint druggability):
- Applied to immune checkpoint genes
- Check for approved immunotherapies

**iedb_search_epitopes** (epitope data):
- **Input**: `organism_name` (string), `source_antigen_name` (string)
- **Output**: `{status, data, count}`
- **Use**: Check if spatial antigens have known epitopes

### Immune Cell Markers Reference

| Cell Type | Key Markers | Extended Markers |
|-----------|-------------|-----------------|
| CD8+ T cell | CD8A, CD8B | GZMA, GZMB, PRF1, IFNG |
| CD4+ T cell | CD4 | IL2, IL4, IL17A, FOXP3 (Treg) |
| Regulatory T cell | FOXP3, IL2RA | CTLA4, TIGIT |
| B cell | CD19, MS4A1, CD79A | IGHG1, IGHM |
| Plasma cell | SDC1 (CD138), XBP1 | IGHG1, MZB1 |
| M1 Macrophage | CD68, NOS2, TNF | IL1B, CXCL10 |
| M2 Macrophage | CD68, CD163, MRC1 | ARG1, IL10 |
| Dendritic cell | ITGAX (CD11c), HLA-DRA | CD80, CD86 |
| NK cell | NCAM1 (CD56), NKG7 | GNLY, KLRD1 |
| Neutrophil | FCGR3B, CXCR2 | S100A8, S100A9 |
| Mast cell | KIT, TPSAB1 | CPA3, HDC |
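
The key-marker column can be applied to a domain's gene list as in the sketch below. The `min_hits` threshold and the function name `infer_cell_types` are assumptions made for illustration; presence in a gene list is only a hint of cell type, not a deconvolution result.

```python
# Key markers copied from the reference table (HGNC symbols only).
IMMUNE_KEY_MARKERS = {
    "CD8+ T cell": {"CD8A", "CD8B"},
    "CD4+ T cell": {"CD4"},
    "Regulatory T cell": {"FOXP3", "IL2RA"},
    "B cell": {"CD19", "MS4A1", "CD79A"},
    "Plasma cell": {"SDC1", "XBP1"},
    "M1 Macrophage": {"CD68", "NOS2", "TNF"},
    "M2 Macrophage": {"CD68", "CD163", "MRC1"},
    "Dendritic cell": {"ITGAX", "HLA-DRA"},
    "NK cell": {"NCAM1", "NKG7"},
    "Neutrophil": {"FCGR3B", "CXCR2"},
    "Mast cell": {"KIT", "TPSAB1"},
}


def infer_cell_types(domain_genes, min_hits=2):
    """Return {cell_type: matched_markers} for one spatial domain.

    min_hits is a hypothetical threshold; it is capped at the number of
    key markers so single-marker cell types can still be called.
    """
    genes = {gene.upper() for gene in domain_genes}
    calls = {}
    for cell_type, markers in IMMUNE_KEY_MARKERS.items():
        hits = sorted(markers & genes)
        if len(hits) >= min(min_hits, len(markers)):
            calls[cell_type] = hits
    return calls


print(infer_cell_types(["CD8A", "CD8B", "CD68", "CD163", "GZMB"]))
```

CD68 alone does not call M1 macrophages here because it is shared with the M2 row; requiring two hits keeps shared markers from producing both calls.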

### Immune Checkpoint Reference

| Checkpoint | Gene | Ligand | Therapeutic Antibody |
|------------|------|--------|---------------------|
| PD-1/PD-L1 | PDCD1/CD274 | CD274, PDCD1LG2 | Pembrolizumab, Nivolumab, Atezolizumab |
| CTLA-4 | CTLA4 | CD80, CD86 | Ipilimumab |
| TIM-3 | HAVCR2 | LGALS9 | Sabatolimab |
| LAG-3 | LAG3 | HLA class II | Relatlimab |
| TIGIT | TIGIT | PVR, NECTIN2 (PVRL2) | Tiragolumab |
| VISTA | VSIR | SELPLG (PSGL-1) | - |
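
One way to use this table is to flag checkpoint axes whose receptor and ligand both appear in a domain's gene list. The sketch below covers only the rows with single-gene ligands (LAG-3 and VISTA are left out); `CHECKPOINT_AXES` and `find_checkpoint_axes` are hypothetical names, and both NECTIN2 and its older symbol PVRL2 are accepted for TIGIT.

```python
# (receptor genes, ligand genes) per checkpoint axis; subset of the table.
CHECKPOINT_AXES = {
    "PD-1/PD-L1": ({"PDCD1"}, {"CD274", "PDCD1LG2"}),
    "CTLA-4": ({"CTLA4"}, {"CD80", "CD86"}),
    "TIM-3": ({"HAVCR2"}, {"LGALS9"}),
    "TIGIT": ({"TIGIT"}, {"PVR", "NECTIN2", "PVRL2"}),
}


def find_checkpoint_axes(domain_genes):
    """Report which checkpoint receptors and ligands occur in a gene list."""
    genes = {gene.upper() for gene in domain_genes}
    axes = {}
    for name, (receptors, ligands) in CHECKPOINT_AXES.items():
        found_receptors = receptors & genes
        found_ligands = ligands & genes
        if found_receptors or found_ligands:
            axes[name] = {
                "receptor": sorted(found_receptors),
                "ligand": sorted(found_ligands),
                # complete = both sides of the axis are present
                "complete": bool(found_receptors and found_ligands),
            }
    return axes


print(find_checkpoint_axes(["PDCD1", "CD274", "CTLA4"]))
```

A `complete` axis in one domain, or a receptor in one domain with its ligand in a neighbouring one, is a candidate for the immunotherapy-target step; it does not by itself show that the interaction occurs.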

### Workflow

1. Identify immune-related SVGs from marker reference
2. Classify immune cell types present per spatial domain
3. Check immune checkpoint expression
4. Assess immune infiltration patterns:
   - Hot (T cell infiltrated) vs Cold (immune desert) vs Excluded (immune cells confined to the tumor margin or stroma)
5. Identify potential immunotherapy targets
6. Check for tertiary lymphoid structures (B cell + T cell clusters)
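
Steps 4 and 6 can be approximated from gene lists alone, as sketched below. The rules are deliberately crude assumptions: T cell markers inside the tumor domain mean "hot", T cell markers only in neighbouring domains mean "excluded", and a domain containing both B and T cell markers is a TLS candidate. The function names and marker sets are hypothetical, and confirming a TLS still requires histology or spatial proximity analysis.

```python
T_CELL_MARKERS = {"CD3E", "CD8A", "CD8B", "CD4"}
B_CELL_MARKERS = {"CD19", "MS4A1", "CD79A"}


def _upper(genes):
    return {gene.upper() for gene in genes}


def infiltration_pattern(tumor_genes, neighbor_genes):
    """Label a tumor domain as hot, excluded, or cold (heuristic)."""
    if T_CELL_MARKERS & _upper(tumor_genes):
        return "hot"
    if T_CELL_MARKERS & _upper(neighbor_genes):
        return "excluded"
    return "cold"


def tls_candidates(domains):
    """Return names of domains whose gene lists contain B and T cell markers."""
    return [
        name
        for name, genes in domains.items()
        if T_CELL_MARKERS & _upper(genes) and B_CELL_MARKERS & _upper(genes)
    ]


domains = {
    "tumor_core": ["MKI67", "TOP2A"],
    "stroma": ["COL1A1", "CD8A"],
    "immune_infiltrate": ["MS4A1", "CD3E", "CD79A"],
}
print(infiltration_pattern(domains["tumor_core"], domains["stroma"]))
print(tls_candidates(domains))
```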

---

## Phase 8: Literature & Validation Context

**Objective**: Provide literature evidence for spatial findings and suggest validation experiments.

### Tools Used

**PubMed_search_articles** (literature search):
- **Input**: `query` (string), `max_results` (int)
- **Output**: List of `[{pmid, title, authors, journal, pub_date, doi}]`
- **Use**: Find published evidence for spatial patterns

**openalex_literature_search** (broader literature):
- **Input**: `query` (string), `per_page` (int)
- **Output**: List of works with titles, DOIs, abstracts
- **Use**: Complement PubMed with preprints and broader coverage

### Literature Search Strategy

1. **Tissue + spatial**: `"{tissue} spatial transcriptomics"` - e.g., "liver spatial transcriptomics"
2. **Disease + spatial**: `"{disease} spatial omics"` - e.g., "breast cancer spatial omics"
3. **Gene + tissue**: `"{top_gene} {tissue} expression"` for key SVGs
4. **Zonation** (if relevant): `"{tissue} zonation gene expression"`
5. **Technology**: `"{technology} {tissue}"` - e.g., "Visium breast cancer"
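
The five templates can be expanded into concrete query strings as in this sketch. `build_queries` is a hypothetical helper; each returned string would be passed as the `query` argument of `PubMed_search_articles` (with `max_results`) or `openalex_literature_search` (with `per_page`).

```python
def build_queries(tissue, disease=None, top_genes=(), technology=None, zonation=False):
    """Expand the literature search templates for one analysis."""
    queries = [f"{tissue} spatial transcriptomics"]
    if disease:
        queries.append(f"{disease} spatial omics")
    queries += [f"{gene} {tissue} expression" for gene in top_genes]
    if zonation:
        queries.append(f"{tissue} zonation gene expression")
    if technology:
        queries.append(f"{technology} {tissue}")
    return queries


queries = build_queries("liver", top_genes=["CYP2E1"], technology="Visium", zonation=True)
for query in queries:
    # Argument dict in the shape documented for PubMed_search_articles.
    print({"query": query, "max_results": 10})
```

Limiting `top_genes` to a handful of key SVGs keeps the number of searches manageable.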

### Validation Recommendations Template

| Priority | Target | Method | Rationale | Feasibility |
|----------|--------|--------|-----------|-------------|
| **High** | Key SVG | smFISH / RNAscope | Validate spatial pattern at single-molecule level | Medium |
| **High** | Druggable target | IHC on serial sections | Confirm protein expression in spatial domain | High |
| **High** | Ligand-receptor pair | Proximity ligation assay (PLA) | Confirm physical interaction at tissue level | Medium |
| **Medium** | Domain markers | Multiplexed IF (CODEX/IBEX) | Validate multiple markers simultaneously | Low-Medium |
| **Medium** | Pathway | Spatial metabolomics (MALDI/DESI) | Confirm metabolic pathway activity | Low |
| **Low** | Novel interaction | Co-culture + conditioned media | Functional validation of predicted interaction | Medium |

### Workflow

1. Search PubMed for tissue + disease + spatial transcriptomics
2. Search for known spatial patterns in the tissue type
3. Cross-reference findings with published spatial atlas data
4. Generate validation recommendations based on:
   - Novelty of finding (novel patterns need more validation)
   - Clinical relevance (druggable targets prioritized)
   - Technical feasibility
5. Cite relevant methodology papers for each validation approach

---

## Tool Parameter Reference (CRITICAL)

### Verified Parameter Names

| Tool | Parameter | CORRECT | Common MISTAKE | Notes |
|------|-----------|---------|----------------|-------|
| `MyGene_query_genes` | query | `query` | `q` | Filter results by `symbol` field |
| `STRING_functional_enrichment` | identifiers | `protein_ids` (array) | `identifiers` | Also needs `species=9606` |
| `STRING_get_interaction_partners` | identifiers | `protein_ids` (array) | `identifiers` | `limit`, `confidence_score` optional |
| `ReactomeAnalysis_pathway_enrichment` | genes | `identifiers` (string) | Array | SPACE-SEPARATED string, NOT array |
| `HPA_get_subcellular_location` | gene | `gene_name` | `ensembl_id` | Uses gene symbol |
| `HPA_get_cancer_prognostics_by_gene` | gene | `ensembl_id` | `gene_name` | Uses Ensembl ID, NOT symbol |
| `HPA_get_rna_expression_by_source` | params | `gene_name`, `source_type`, `source_name` | - | ALL 3 required |
| `HPA_get_rna_expression_in_specific_tissues` | gene | `ensembl_id` | `gene_name` | Uses Ensembl ID |
| `OpenTargets_get_target_tractability_by_ensemblID` | target | `ensemblId` | `ensemblID` | camelCase |
| `OpenTargets_get_associated_drugs_by_target_ensemblID` | target | `ensemblId`, `size` | - | Both REQUIRED |
| `OpenTargets_get_associated_targets_by_disease_efoId` | disease | `efoId` | `diseaseId` | Returns {data: {disease: {associatedTargets}}} |
| `DGIdb_get_gene_druggability` | genes | `genes` (array) | `gene_name` | Array of strings |
| `DGIdb_get_drug_gene_interactions` | genes | `genes` (array) | `gene_name` | Array of strings |
| `clinical_trials_search` | action | `action='search_studies'` | Missing action | `action` is REQUIRED |
| `ensembl_lookup_gene` | species | `species='homo_sapiens'` | No species | REQUIRED parameter |
| GTEx tools | operation | `operation` (SOAP) | Missing | All GTEx tools need `operation` parameter |
| GTEx tools | gencode | `gencode_id` (array) | `gene_id` | Requires versioned GENCODE ID |
| `HPA_get_comprehensive_gene_details_by_ensembl_id` | all params | ALL 5 required: `ensembl_id`, `include_isoforms`, `include_images`, `include_antibodies`, `include_expression` | Missing booleans | Set booleans to False except expression |

### Response Format Reference

| Tool | Response Format | Key Fields |
|------|----------------|------------|
| `STRING_functional_enrichment` | `{status, data: [{category, term, description, p_value, fdr, inputGenes}]}` | Filter by FDR < 0.05 |
| `ReactomeAnalysis_pathway_enrichment` | `{data: {pathways: [{pathway_id, name, p_value, fdr, entities_found, entities_total}]}}` | Top 20 returned |
| `STRING_get_interaction_partners` | `{status, data: [{preferredName_A, preferredName_B, score}]}` | Score > 0.7 for high confidence |
| `MyGene_query_genes` | `{hits: [{_id, symbol, name, ensembl: {gene}, entrezgene}]}` | Filter by exact symbol match |
| `HPA_get_subcellular_location` | `{gene_name, main_locations: [], additional_locations: [], location_summary}` | Direct dict response |
| `OpenTargets_get_target_tractability_by_ensemblID` | `{data: {target: {id, tractability: [{label, modality, value}]}}}` | Check value=true |
| `DGIdb_get_gene_druggability` | `{data: {genes: {nodes: [{name, geneCategories: [{name}]}]}}}` | GraphQL response |
| `PubMed_search_articles` | Plain list of `[{pmid, title, authors, journal, pub_date}]` | No data wrapper |
| `clinical_trials_search` | `{total_count, studies: [{nctId, title, status, conditions}]}` | total_count can be None |
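
A minimal sketch of parsing three of these response shapes defensively. The function names are hypothetical and the sample responses are hand-written placeholders in the documented shapes, not real tool output.

```python
def significant_terms(string_response, fdr_cutoff=0.05):
    """Keep STRING enrichment rows with FDR below the cutoff."""
    rows = string_response.get("data") or []
    return [row for row in rows if row.get("fdr") is not None and row["fdr"] < fdr_cutoff]


def tractable_modalities(ot_response):
    """List modalities with at least one tractability entry where value is true."""
    target = (ot_response.get("data") or {}).get("target") or {}
    entries = target.get("tractability") or []
    return sorted({entry["modality"] for entry in entries if entry.get("value")})


def trial_count(ct_response):
    """total_count can be None, so fall back to counting returned studies."""
    total = ct_response.get("total_count")
    return total if total is not None else len(ct_response.get("studies") or [])


# Placeholder responses (illustrative values only).
string_resp = {"status": "success", "data": [
    {"term": "TERM_A", "fdr": 0.01},
    {"term": "TERM_B", "fdr": 0.20},
]}
ot_resp = {"data": {"target": {"id": "ENSG_PLACEHOLDER", "tractability": [
    {"label": "LABEL_A", "modality": "SM", "value": True},
    {"label": "LABEL_B", "modality": "AB", "value": False},
]}}}
ct_resp = {"total_count": None, "studies": [{"nctId": "NCT_PLACEHOLDER"}]}

print(significant_terms(string_resp))
print(tractable_modalities(ot_resp))
print(trial_count(ct_resp))
```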

---

## Fallback Strategies

### Pathway Enrichment
- **Primary**: STRING_functional_enrichment (most comprehensive, one call)
- **Fallback**: ReactomeAnalysis_pathway_enrichment (Reactome-specific)
- **Default**: Individual gene GO annotations (GO_get_annotations_for_gene)

### Tissue Expression
- **Primary**: HPA_get_rna_expression_by_source
- **Fallback**: HPA_get_comprehensive_gene_details_by_ensembl_id
- **Default**: Note "tissue expression data unavailable"

### Disease Association
- **Primary**: OpenTargets_get_associated_targets_by_disease_efoId
- **Fallback**: OpenTargets_target_disease_evidence (per gene)
- **Default**: Skip disease section if no disease context

### Drug Information
- **Primary**: OpenTargets_get_associated_drugs_by_target_ensemblID
- **Fallback**: DGIdb_get_drug_gene_interactions
- **Default**: Note "no approved drugs identified"

### Literature
- **Primary**: PubMed_search_articles
- **Fallback**: openalex_literature_search
- **Default**: Note "no spatial-specific literature found"
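
The Primary / Fallback / Default pattern above can be expressed as one generic helper. This is a sketch under the assumption that a tier has "failed" when it raises or returns an empty result; `first_success` and the stub functions are hypothetical stand-ins for real tool calls.

```python
def first_success(steps, default):
    """Try (label, zero-argument callable) pairs in order.

    Returns (label, result) for the first call that neither raises nor
    returns an empty result; otherwise ("default", default).
    """
    for label, call in steps:
        try:
            result = call()
        except Exception:
            continue  # tier failed, move on to the next one
        if result:
            return label, result
    return "default", default


# Stubs standing in for PubMed_search_articles / openalex_literature_search.
def failing_primary():
    raise RuntimeError("simulated timeout")


def working_fallback():
    return [{"title": "placeholder record"}]


label, records = first_success(
    [("primary", failing_primary), ("fallback", working_fallback)],
    default="no spatial-specific literature found",
)
print(label, records)
```

Recording the returned label in the report makes it clear which tier produced each result.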

---

## Common Use Cases

### Use Case 1: Cancer Spatial Heterogeneity

**Input**: Visium data from breast cancer with 5 spatial domains (tumor core, tumor margin, stroma, immune infiltrate, normal tissue) and 200 SVGs.

**Analysis focus**:
- Tumor-specific pathways (proliferation, DNA repair)
- Immune infiltration patterns (hot vs cold)
- Tumor-stroma interactions (CAF signaling)
- Druggable targets in tumor core
- Immune checkpoint expression patterns
- Prognostic genes per domain

### Use Case 2: Brain Tissue Zonation

**Input**: MERFISH data from hippocampus with cell-type specific genes and neuronal subtype markers.

**Analysis focus**:
- Neuronal subtype characterization
- Synaptic signaling pathways
- Neurotransmitter receptor distribution
- Known hippocampal zonation patterns (CA1, CA3, DG)
- Neurodegenerative disease gene overlap

### Use Case 3: Liver Metabolic Zonation

**Input**: Spatial transcriptomics of liver with periportal vs pericentral gene gradients.

**Analysis focus**:
- Metabolic enzyme distribution (CYP450, gluconeogenesis, lipogenesis)
- Wnt signaling gradient (known zonation regulator)
- Oxygen gradient-responsive genes
- Drug metabolism enzyme spatial patterns
- Liver disease gene overlap

### Use Case 4: Tumor-Immune Interface

**Input**: DBiTplus data from melanoma with spatial protein + RNA data showing tumor-immune boundary.

**Analysis focus**:
- Immune cell composition at boundary
- Checkpoint ligand-receptor pairs
- Immune exclusion mechanisms
- Immunotherapy target identification
- Multi-modal (RNA + protein) concordance

### Use Case 5: Developmental Spatial Patterns

**Input**: Spatial transcriptomics of embryonic tissue with developmental patterning genes.

**Analysis focus**:
- Morphogen gradients (Wnt, BMP, FGF, SHH)
- Transcription factor spatial patterns
- Cell fate determination genes
- Developmental signaling pathways
- Comparison to adult tissue patterns

### Use Case 6: Disease Progression Mapping

**Input**: Spatial data from neurodegenerative tissue showing disease gradient from affected to unaffected regions.

**Analysis focus**:
- Disease gene expression gradient
- Inflammatory response spatial pattern
- Neuronal loss markers
- Glial activation patterns
- Therapeutic window identification

---

## Limitations & Known Issues

### Database-Specific
- **Enrichment**: `enrichr_gene_enrichment_analysis` returns a connectivity graph (107MB), NOT standard enrichment results. Use `STRING_functional_enrichment` instead.
- **GTEx**: SOAP-style tools requiring `operation` parameter; needs versioned GENCODE IDs (e.g., `ENSG00000141510.16`)
- **HPA**: Some tools use `gene_name`, others use `ensembl_id` - check parameter reference
- **OpenTargets**: Disease IDs use underscore format (`MONDO_0007254`), not colon
- **cBioPortal_get_cancer_studies**: BROKEN - has literal `{limit}` in URL causing 400 error
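
Two of these identifier issues can be caught before a call is made, as in the sketch below. `normalize_disease_id` and `is_versioned_gencode` are hypothetical helpers; the version suffix of a GENCODE ID cannot be guessed, so the second function only detects a missing version and does not repair it.

```python
import re


def normalize_disease_id(disease_id):
    """Convert colon-style IDs (MONDO:0007254) to the underscore form."""
    return disease_id.strip().replace(":", "_")


# ENSG + 11 digits + "." + version number, e.g. ENSG00000141510.16
VERSIONED_GENCODE = re.compile(r"^ENSG\d{11}\.\d+$")


def is_versioned_gencode(gene_id):
    """True if the ID carries a version suffix as GTEx tools require."""
    return bool(VERSIONED_GENCODE.match(gene_id))


print(normalize_disease_id("MONDO:0007254"))
print(is_versioned_gencode("ENSG00000141510.16"))
print(is_versioned_gencode("ENSG00000141510"))
```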

### Conceptual
- **No raw spatial data processing**: This skill analyzes gene LISTS, not raw spatial matrices (Seurat/Scanpy/squidpy handle raw data)
- **No spatial statistics**: Cannot compute spatial autocorrelation (e.g., Moran's I) or perform variogram analysis
- **No image analysis**: Cannot process H&E or fluorescence images
- **No deconvolution**: Cannot perform cell type deconvolution (use cell2location or RCTD externally; BayesSpace performs spatial clustering and resolution enhancement rather than deconvolution)
- **Ligand-receptor inference**: Based on gene co-expression + known pairs, not spatial proximity statistics (use CellChat, NicheNet, COMMOT externally)

### Technical
- **Large gene lists**: >200 genes may slow STRING queries; batch or sample
- **Response format variability**: Always check both dict and list response types
- **Rate limits**: STRING and OpenTargets may throttle frequent requests
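
The first two technical points can be handled with small utilities like the ones sketched here. The batch size of 200 mirrors the threshold above; `batches` and `as_records` are hypothetical names, and the unwrapping of a `data` key is an assumption based on the response formats listed earlier.

```python
def batches(genes, size=200):
    """Split a gene list into consecutive chunks of at most `size` genes."""
    genes = list(genes)
    return [genes[start:start + size] for start in range(0, len(genes), size)]


def as_records(response):
    """Normalise a tool response to a list, whatever its outer type."""
    if isinstance(response, list):
        return response  # e.g. PubMed_search_articles returns a plain list
    if isinstance(response, dict):
        data = response.get("data", response)  # unwrap {status, data: ...} if present
        return data if isinstance(data, list) else [data]
    return []  # None or unexpected type


gene_list = [f"GENE{i}" for i in range(450)]
print([len(chunk) for chunk in batches(gene_list)])
print(as_records({"status": "success", "data": [{"term": "TERM_A"}]}))
```

Batching suits per-gene lookups and interaction queries. Enrichment run on separate batches is not equivalent to enrichment on the full list, so for enrichment it is safer to sample or prioritise genes into a single call.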

---

## Summary

The Spatial Multi-Omics Analysis skill provides:

1. Gene characterization (ID resolution, function, localization, tissue expression)
2. Pathway & functional enrichment (STRING, Reactome, GO, KEGG)
3. Spatial domain characterization (per-domain and cross-domain comparison)
4. Cell-cell interaction inference (PPI, ligand-receptor, signaling pathways)
5. Disease & therapeutic context (disease genes, druggable targets, clinical trials)
6. Multi-modal integration (RNA-protein concordance, metabolic pathways)
7. Immune microenvironment characterization (cell types, checkpoints, immunotherapy)
8. Literature context & validation recommendations

- **Outputs**: Comprehensive markdown report with Spatial Omics Integration Score (0-100)
- **Best for**: Biological interpretation of spatial omics experiments (post-processing after spatial data analysis tools)
- **Uses**: 70+ ToolUniverse tools across 9 analysis phases
- **Time**: ~10-20 minutes depending on gene list size and analysis scope