---
name: tooluniverse-rare-disease-diagnosis
description: Provide differential diagnosis for patients with suspected rare diseases based on phenotype and genetic data. Matches symptoms to HPO terms, identifies candidate diseases from Orphanet/OMIM, prioritizes genes for testing, interprets variants of uncertain significance. Use when clinician asks about rare disease diagnosis, unexplained phenotypes, or genetic testing interpretation.
---

# Rare Disease Diagnosis Advisor

Systematic diagnosis support for rare diseases using phenotype matching, gene panel prioritization, and variant interpretation across Orphanet, OMIM, HPO, ClinVar, and structure-based analysis.

**KEY PRINCIPLES**:
1. **Report-first approach** - Create report file FIRST, update progressively
2. **Phenotype-driven** - Convert symptoms to HPO terms before searching
3. **Multi-database triangulation** - Cross-reference Orphanet, OMIM, OpenTargets
4. **Evidence grading** - Grade diagnoses by supporting evidence strength
5. **Actionable output** - Prioritized differential diagnosis with next steps
6. **Genetic counseling aware** - Consider inheritance patterns and family history
7. **English-first queries** - Always use English terms in tool calls (phenotype descriptions, gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

---

## When to Use

Apply when user asks:
- "Patient has [symptoms], what rare disease could this be?"
- "Unexplained developmental delay with [features]"
- "WES found VUS in [gene], is this pathogenic?"
- "What genes should we test for [phenotype]?"
- "Differential diagnosis for [rare symptom combination]"

---

## Critical Workflow Requirements

### 1. Report-First Approach (MANDATORY)

1. **Create the report file FIRST**:
   - File name: `[PATIENT_ID]_rare_disease_report.md`
   - Initialize with all section headers
   - Add placeholder text: `[Researching...]`

2. **Progressively update** as you gather data

3. **Output separate data files**:
   - `[PATIENT_ID]_gene_panel.csv` - Prioritized genes for testing
   - `[PATIENT_ID]_variant_interpretation.csv` - If variants provided

### 2. Citation Requirements (MANDATORY)

Every finding MUST include source:

```markdown
### Candidate Disease: Marfan Syndrome
- **ORPHA**: ORPHA:558
- **OMIM**: 154700
- **Phenotype match**: 85% (17/20 HPO terms)
- **Inheritance**: AD
- **Gene**: FBN1

*Source: Orphanet via `Orphanet_558`, OMIM via `OMIM_get_entry`*
```

---

## Phase 0: Tool Verification

**CRITICAL**: Verify tool parameters before calling.

### Known Parameter Corrections

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `OpenTargets_get_associated_diseases_by_target_ensemblId` | `ensemblID` | `ensemblId` |
| `ClinVar_get_variant_by_id` | `variant_id` | `id` |
| `MyGene_query_genes` | `gene` | `q` |
| `gnomAD_get_variant_frequencies` | `variant` | `variant_id` |

---

## Workflow Overview

```
Phase 1: Phenotype Standardization
├── Convert symptoms to HPO terms
├── Identify core vs. variable features
└── Note age of onset, inheritance hints
    ↓
Phase 2: Disease Matching
├── Orphanet phenotype search
├── OMIM clinical synopsis match
├── OpenTargets disease associations
└── OUTPUT: Ranked differential diagnosis
    ↓
Phase 3: Gene Panel Identification
├── Extract genes from top diseases
├── Cross-reference expression (GTEx)
├── Prioritize by evidence strength
└── OUTPUT: Recommended gene panel
    ↓
Phase 3.5: Expression & Tissue Context (NEW)
├── CELLxGENE: Cell-type specific expression
├── ChIPAtlas: Regulatory context (TF binding)
├── Tissue-specific gene networks
└── OUTPUT: Expression validation
    ↓
Phase 3.6: Pathway Analysis (NEW)
├── KEGG: Metabolic/signaling pathways
├── Reactome: Biological processes
├── IntAct: Protein-protein interactions
└── OUTPUT: Biological context
    ↓
Phase 4: Variant Interpretation (if provided)
├── ClinVar pathogenicity lookup
├── gnomAD population frequency
├── Protein domain/function impact
├── ENCODE/ChIPAtlas: Regulatory variant impact
└── OUTPUT: Variant classification
    ↓
Phase 5: Structure Analysis (for VUS)
├── NvidiaNIM_alphafold2 → Predict structure
├── Map variant to structure
├── Assess functional domain impact
└── OUTPUT: Structural evidence
    ↓
Phase 6: Literature Evidence (NEW)
├── PubMed: Published studies
├── BioRxiv/MedRxiv: Preprints
├── OpenAlex: Citation analysis
└── OUTPUT: Literature support
    ↓
Phase 7: Report Synthesis
├── Prioritized differential diagnosis
├── Recommended genetic testing
├── Next steps for clinician
└── OUTPUT: Final report
```

---

## Phase 1: Phenotype Standardization

### 1.1 Convert Symptoms to HPO Terms

```python
def standardize_phenotype(tu, symptoms_list):
    """Convert clinical descriptions to HPO terms."""
    hpo_terms = []
    
    for symptom in symptoms_list:
        # Search HPO for matching terms
        results = tu.tools.HPO_search_terms(query=symptom)
        if results:
            hpo_terms.append({
                'original': symptom,
                'hpo_id': results[0]['id'],
                'hpo_name': results[0]['name'],
                'confidence': 'exact' if symptom.lower() in results[0]['name'].lower() else 'partial'
            })
    
    return hpo_terms
```

### 1.2 Phenotype Categories

| Category | Examples | Weight |
|----------|----------|--------|
| **Core features** | Always present in disease | High |
| **Variable features** | Present in >50% | Medium |
| **Occasional features** | Present in <50% | Low |
| **Age-specific** | Onset-dependent | Context |

### 1.3 Output for Report

```markdown
## 1. Phenotype Analysis

### 1.1 Standardized HPO Terms

| Clinical Feature | HPO Term | HPO ID | Category |
|------------------|----------|--------|----------|
| Tall stature | Tall stature | HP:0000098 | Core |
| Long fingers | Arachnodactyly | HP:0001166 | Core |
| Heart murmur | Cardiac murmur | HP:0030148 | Variable |
| Joint hypermobility | Joint hypermobility | HP:0001382 | Core |

**Total HPO Terms**: 8
**Onset**: Childhood
**Family History**: Father with similar features (AD suspected)

*Source: HPO via `HPO_search_terms`*
```

---

## Phase 2: Disease Matching

### 2.1 Orphanet Disease Search (NEW TOOLS)

```python
def match_diseases_orphanet(tu, symptom_keywords):
    """Find rare diseases matching symptoms using Orphanet."""
    candidate_diseases = []
    
    # Search Orphanet by disease keywords
    for keyword in symptom_keywords:
        results = tu.tools.Orphanet_search_diseases(
            operation="search_diseases",
            query=keyword
        )
        if results.get('status') == 'success':
            candidate_diseases.extend(results['data']['results'])
    
    # Get genes for each disease
    for disease in candidate_diseases:
        orpha_code = disease.get('ORPHAcode')
        genes = tu.tools.Orphanet_get_genes(
            operation="get_genes",
            orpha_code=orpha_code
        )
        disease['genes'] = genes.get('data', {}).get('genes', [])
    
    return deduplicate_and_rank(candidate_diseases)
```

### 2.2 OMIM Cross-Reference (NEW TOOLS)

```python
def cross_reference_omim(tu, orphanet_diseases, gene_symbols):
    """Get OMIM details for diseases and genes."""
    omim_data = {}
    
    # Search OMIM for each disease/gene
    for gene in gene_symbols:
        search_result = tu.tools.OMIM_search(
            operation="search",
            query=gene,
            limit=5
        )
        if search_result.get('status') == 'success':
            for entry in search_result['data'].get('entries', []):
                mim_number = entry.get('mimNumber')
                
                # Get detailed entry
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim_number)
                )
                
                # Get clinical synopsis (phenotype features)
                synopsis = tu.tools.OMIM_get_clinical_synopsis(
                    operation="get_clinical_synopsis",
                    mim_number=str(mim_number)
                )
                
                omim_data[gene] = {
                    'mim_number': mim_number,
                    'details': details.get('data', {}),
                    'clinical_synopsis': synopsis.get('data', {})
                }
    
    return omim_data
```

### 2.3 DisGeNET Gene-Disease Associations (NEW TOOLS)

```python
def get_gene_disease_associations(tu, gene_symbols):
    """Get gene-disease associations from DisGeNET."""
    associations = {}
    
    for gene in gene_symbols:
        # Get diseases associated with gene
        result = tu.tools.DisGeNET_search_gene(
            operation="search_gene",
            gene=gene,
            limit=20
        )
        
        if result.get('status') == 'success':
            associations[gene] = result['data'].get('associations', [])
    
    return associations

def get_disease_genes_disgenet(tu, disease_name):
    """Get all genes associated with a disease."""
    result = tu.tools.DisGeNET_search_disease(
        operation="search_disease",
        disease=disease_name,
        limit=30
    )
    return result.get('data', {}).get('associations', [])
```

### 2.4 Phenotype Overlap Scoring

| Match Level | Score | Criteria |
|-------------|-------|----------|
| **Excellent** | >80% | Most core + variable features match |
| **Good** | 60-80% | Core features match, some variable |
| **Possible** | 40-60% | Some overlap, needs consideration |
| **Unlikely** | <40% | Poor phenotype fit |

### 2.5 Output for Report

```markdown
## 2. Differential Diagnosis

### Top Candidate Diseases (Ranked by Phenotype Match)

| Rank | Disease | ORPHA | OMIM | Match | Inheritance | Key Gene(s) |
|------|---------|-------|------|-------|-------------|-------------|
| 1 | Marfan syndrome | 558 | 154700 | 85% | AD | FBN1 |
| 2 | Loeys-Dietz syndrome | 60030 | 609192 | 72% | AD | TGFBR1, TGFBR2 |
| 3 | Ehlers-Danlos, vascular | 286 | 130050 | 65% | AD | COL3A1 |
| 4 | Homocystinuria | 394 | 236200 | 58% | AR | CBS |

### DisGeNET Gene-Disease Evidence

| Gene | Associated Diseases | GDA Score | Evidence |
|------|---------------------|-----------|----------|
| FBN1 | Marfan syndrome, MASS phenotype | 0.95 | ★★★ Curated |
| TGFBR1 | Loeys-Dietz syndrome | 0.89 | ★★★ Curated |
| COL3A1 | vascular EDS | 0.91 | ★★★ Curated |

*Source: DisGeNET via `DisGeNET_search_gene`*

### Disease Details

#### 1. Marfan Syndrome (★★★)

**ORPHA**: 558 | **OMIM**: 154700 | **Prevalence**: 1-5/10,000

**Phenotype Match Analysis**:
| Patient Feature | Disease Feature | Match |
|-----------------|-----------------|-------|
| Tall stature | Present in 95% | ✓ |
| Arachnodactyly | Present in 90% | ✓ |
| Joint hypermobility | Present in 85% | ✓ |
| Cardiac murmur | Aortic root dilation (70%) | Partial |

**OMIM Clinical Synopsis** (via `OMIM_get_clinical_synopsis`):
- **Cardiovascular**: Aortic root dilation, mitral valve prolapse
- **Skeletal**: Scoliosis, pectus excavatum, tall stature
- **Ocular**: Ectopia lentis, myopia

**Diagnostic Criteria**: Ghent nosology (2010)
- Aortic root dilation/dissection + FBN1 mutation = Diagnosis
- Without genetic testing: systemic score ≥7 + ectopia lentis

**Inheritance**: Autosomal dominant (25% de novo)

*Source: Orphanet via `Orphanet_get_disease`, OMIM via `OMIM_get_entry`, DisGeNET*
```

---

## Phase 3: Gene Panel Identification

### 3.1 Extract Disease Genes

```python
def build_gene_panel(tu, candidate_diseases):
    """Build prioritized gene panel from candidate diseases."""
    genes = {}
    
    for disease in candidate_diseases:
        for gene in disease['genes']:
            if gene not in genes:
                genes[gene] = {
                    'symbol': gene,
                    'diseases': [],
                    'evidence_level': 'unknown'
                }
            genes[gene]['diseases'].append(disease['name'])
    
    return genes
```

### 3.1.1 ClinGen Gene-Disease Validity Check (NEW)

**Critical**: Always verify gene-disease validity through ClinGen before including in panel.

```python
def get_clingen_gene_evidence(tu, gene_symbol):
    """
    Get ClinGen gene-disease validity and dosage sensitivity.
    ESSENTIAL for rare disease gene panel prioritization.
    """
    
    # 1. Gene-disease validity classification
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)
    
    validity_levels = []
    diseases_with_validity = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_levels.append(entry.get('Classification'))
            diseases_with_validity.append({
                'disease': entry.get('Disease Label'),
                'mondo_id': entry.get('Disease ID (MONDO)'),
                'classification': entry.get('Classification'),
                'inheritance': entry.get('Inheritance')
            })
    
    # 2. Dosage sensitivity (critical for CNV interpretation)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)
    
    hi_score = None
    ts_score = None
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            hi_score = entry.get('Haploinsufficiency Score')
            ts_score = entry.get('Triplosensitivity Score')
            break
    
    # 3. Clinical actionability (return of findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)
    is_actionable = (actionability.get('adult_count', 0) > 0 or 
                     actionability.get('pediatric_count', 0) > 0)
    
    # Determine best evidence level
    level_priority = ['Definitive', 'Strong', 'Moderate', 'Limited', 'Disputed', 'Refuted']
    best_level = 'Not curated'
    for level in level_priority:
        if level in validity_levels:
            best_level = level
            break
    
    return {
        'gene': gene_symbol,
        'evidence_level': best_level,
        'diseases_curated': diseases_with_validity,
        'haploinsufficiency_score': hi_score,
        'triplosensitivity_score': ts_score,
        'is_actionable': is_actionable,
        'include_in_panel': best_level in ['Definitive', 'Strong', 'Moderate']
    }

def prioritize_genes_with_clingen(tu, gene_list):
    """Prioritize genes using ClinGen evidence levels."""
    
    prioritized = []
    for gene in gene_list:
        evidence = get_clingen_gene_evidence(tu, gene)
        
        # Score based on ClinGen classification
        score = 0
        if evidence['evidence_level'] == 'Definitive':
            score = 5
        elif evidence['evidence_level'] == 'Strong':
            score = 4
        elif evidence['evidence_level'] == 'Moderate':
            score = 3
        elif evidence['evidence_level'] == 'Limited':
            score = 1
        # Disputed/Refuted get 0
        
        # Bonus for haploinsufficiency score 3
        if evidence['haploinsufficiency_score'] == '3':
            score += 1
        
        # Bonus for actionability
        if evidence['is_actionable']:
            score += 1
        
        prioritized.append({
            **evidence,
            'priority_score': score
        })
    
    # Sort by priority score
    return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
```

**ClinGen Classification Impact on Panel**:
| Classification | Include in Panel? | Priority |
|----------------|-------------------|----------|
| **Definitive** | YES - mandatory | Highest |
| **Strong** | YES - highly recommended | High |
| **Moderate** | YES | Medium |
| **Limited** | Include but flag | Low |
| **Disputed** | Exclude or separate | Avoid |
| **Refuted** | EXCLUDE | Do not test |
| **Not curated** | Use other evidence | Variable |

### 3.2 Gene Prioritization Criteria

| Priority | Criteria | Points |
|----------|----------|--------|
| **Tier 1** | Gene causes #1 ranked disease | +5 |
| **Tier 2** | Gene causes multiple candidates | +3 |
| **Tier 3** | ClinGen "Definitive" evidence | +3 |
| **Tier 4** | Expressed in affected tissue | +2 |
| **Tier 5** | Constraint score pLI >0.9 | +1 |

### 3.3 Expression Validation

```python
def validate_expression(tu, gene_symbol, affected_tissue):
    """Check if gene is expressed in relevant tissue."""
    # Get Ensembl ID
    gene_info = tu.tools.MyGene_query_genes(q=gene_symbol, species="human")
    ensembl_id = gene_info.get('ensembl', {}).get('gene')
    
    # Check GTEx expression
    expression = tu.tools.GTEx_get_median_gene_expression(
        gencode_id=f"{ensembl_id}.latest"
    )
    
    return expression.get(affected_tissue, 0) > 1  # TPM > 1
```

### 3.4 Output for Report

```markdown
## 3. Recommended Gene Panel

### 3.1 Prioritized Genes for Testing

| Priority | Gene | Diseases | Evidence | Constraint (pLI) | Expression |
|----------|------|----------|----------|------------------|------------|
| ★★★ | FBN1 | Marfan syndrome | Definitive | 1.00 | Heart, aorta |
| ★★★ | TGFBR1 | Loeys-Dietz 1 | Definitive | 0.98 | Ubiquitous |
| ★★★ | TGFBR2 | Loeys-Dietz 2 | Definitive | 0.99 | Ubiquitous |
| ★★☆ | COL3A1 | EDS vascular | Definitive | 1.00 | Connective tissue |
| ★☆☆ | CBS | Homocystinuria | Definitive | 0.00 | Liver |

### 3.2 Panel Design Recommendation

**Minimum Panel** (high yield): FBN1, TGFBR1, TGFBR2, COL3A1
**Extended Panel** (+differential): Add CBS, SMAD3, ACTA2

**Testing Strategy**:
1. Start with FBN1 sequencing (highest pre-test probability)
2. If negative, proceed to full connective tissue panel
3. Consider WES if panel negative

*Source: ClinGen via gene-disease validity, GTEx expression*
```

---

## Phase 3.5: Expression & Tissue Context (ENHANCED)

### 3.5.1 Cell-Type Specific Expression (CELLxGENE)

```python
def get_cell_type_expression(tu, gene_symbol, affected_tissues):
    """Get single-cell expression to validate tissue relevance."""
    
    # Get expression across cell types
    expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=affected_tissues[0] if affected_tissues else "all"
    )
    
    # Get cell type metadata
    cell_metadata = tu.tools.CELLxGENE_get_cell_metadata(
        gene=gene_symbol
    )
    
    # Identify high-expression cell types
    high_expression = [
        ct for ct in expression 
        if ct.get('mean_expression', 0) > 1.0  # TPM > 1
    ]
    
    return {
        'expression_data': expression,
        'high_expression_cells': high_expression,
        'total_cell_types': len(cell_metadata)
    }
```

**Why it matters**: Confirms candidate genes are expressed in disease-relevant tissues/cells.

### 3.5.2 Regulatory Context (ChIPAtlas)

```python
def get_regulatory_context(tu, gene_symbol):
    """Get transcription factor binding for candidate genes."""
    
    # Search for TF binding near gene
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )
    
    # Get specific binding peaks
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )
    
    return {
        'transcription_factors': tf_binding,
        'regulatory_peaks': peaks
    }
```

**Why it matters**: Identifies regulatory mechanisms that may be disrupted in disease.

### 3.5.3 Output for Report

```markdown
## 3.5 Expression & Regulatory Context

### Cell-Type Specific Expression (CELLxGENE)

| Gene | Top Expressing Cell Types | Expression Level | Tissue Relevance |
|------|---------------------------|------------------|------------------|
| FBN1 | Fibroblasts, Smooth muscle | High (TPM=45) | ✓ Connective tissue |
| TGFBR1 | Endothelial, Fibroblasts | Medium (TPM=12) | ✓ Vascular |
| COL3A1 | Fibroblasts, Myofibroblasts | Very High (TPM=120) | ✓ Connective tissue |

**Interpretation**: All top candidate genes show high expression in disease-relevant cell types (connective tissue, vascular cells), supporting their candidacy.

### Regulatory Context (ChIPAtlas)

| Gene | Key TF Regulators | Regulatory Significance |
|------|-------------------|------------------------|
| FBN1 | TGFβ pathway (SMAD2/3), AP-1 | TGFβ-responsive |
| TGFBR1 | STAT3, NF-κB | Inflammation-responsive |

*Source: CELLxGENE Census, ChIPAtlas*
```

---

## Phase 3.6: Pathway Analysis (NEW)

### 3.6.1 KEGG Pathway Context

```python
def get_pathway_context(tu, gene_symbols):
    """Get pathway context for candidate genes."""
    
    pathways = {}
    for gene in gene_symbols:
        # Search KEGG for gene
        kegg_genes = tu.tools.kegg_find_genes(query=f"hsa:{gene}")
        
        if kegg_genes:
            # Get pathway membership
            gene_info = tu.tools.kegg_get_gene_info(gene_id=kegg_genes[0]['id'])
            pathways[gene] = gene_info.get('pathways', [])
    
    return pathways
```

### 3.6.2 Protein-Protein Interactions (IntAct)

```python
def get_protein_interactions(tu, gene_symbol):
    """Get interaction partners for candidate genes."""
    
    # Search IntAct for interactions
    interactions = tu.tools.intact_search_interactions(
        query=gene_symbol,
        species="human"
    )
    
    # Get interaction network
    network = tu.tools.intact_get_interaction_network(
        gene=gene_symbol,
        depth=1  # Direct interactors only
    )
    
    return {
        'interactions': interactions,
        'network': network,
        'interactor_count': len(interactions)
    }
```

### 3.6.3 Output for Report

```markdown
## 3.6 Pathway & Network Context

### KEGG Pathways

| Gene | Key Pathways | Biological Process |
|------|--------------|-------------------|
| FBN1 | ECM-receptor interaction (hsa04512) | Extracellular matrix |
| TGFBR1/2 | TGF-beta signaling (hsa04350) | Cell signaling |
| COL3A1 | Focal adhesion (hsa04510) | Cell-matrix adhesion |

### Shared Pathway Analysis

**Convergent pathways** (≥2 candidate genes):
- TGF-beta signaling pathway: FBN1, TGFBR1, TGFBR2, SMAD3
- ECM organization: FBN1, COL3A1

**Interpretation**: Candidate genes converge on TGF-beta signaling and extracellular matrix pathways, consistent with connective tissue disorder etiology.

### Protein-Protein Interactions (IntAct)

| Gene | Direct Interactors | Notable Partners |
|------|-------------------|------------------|
| FBN1 | 42 | LTBP1, TGFB1, ADAMTS10 |
| TGFBR1 | 68 | TGFBR2, SMAD2, SMAD3 |

*Source: KEGG, IntAct, Reactome*
```

---

## Phase 4: Variant Interpretation (If Provided)

### 4.1 ClinVar Lookup

```python
def interpret_variant(tu, variant_hgvs):
    """Get ClinVar interpretation for variant."""
    result = tu.tools.ClinVar_search_variants(query=variant_hgvs)
    
    return {
        'clinvar_id': result.get('id'),
        'classification': result.get('clinical_significance'),
        'review_status': result.get('review_status'),
        'conditions': result.get('conditions'),
        'last_evaluated': result.get('last_evaluated')
    }
```

### 4.2 Population Frequency

```python
def check_population_frequency(tu, variant_id):
    """Get gnomAD allele frequency."""
    freq = tu.tools.gnomAD_get_variant_frequencies(variant_id=variant_id)
    
    # Interpret rarity
    if freq['allele_frequency'] < 0.00001:
        rarity = "Ultra-rare"
    elif freq['allele_frequency'] < 0.0001:
        rarity = "Rare"
    elif freq['allele_frequency'] < 0.01:
        rarity = "Low frequency"
    else:
        rarity = "Common (likely benign)"
    
    return freq, rarity
```

### 4.3 Computational Pathogenicity Prediction (ENHANCED)

Use state-of-the-art prediction tools for VUS interpretation:

```python
def comprehensive_vus_prediction(tu, variant_info):
    """
    Combine multiple prediction tools for VUS classification.
    Critical for rare disease variants not in ClinVar.
    """
    predictions = {}
    
    # 1. CADD - Deleteriousness (NEW API)
    cadd = tu.tools.CADD_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt'],
        version="GRCh38-v1.7"
    )
    if cadd.get('status') == 'success':
        predictions['cadd'] = {
            'score': cadd['data'].get('phred_score'),
            'interpretation': cadd['data'].get('interpretation'),
            'acmg': 'PP3' if cadd['data'].get('phred_score', 0) >= 20 else 'neutral'
        }
    
    # 2. AlphaMissense - DeepMind pathogenicity (NEW)
    if variant_info.get('uniprot_id') and variant_info.get('aa_change'):
        am = tu.tools.AlphaMissense_get_variant_score(
            uniprot_id=variant_info['uniprot_id'],
            variant=variant_info['aa_change']  # e.g., "E1541K"
        )
        if am.get('status') == 'success' and am.get('data'):
            classification = am['data'].get('classification')
            predictions['alphamissense'] = {
                'score': am['data'].get('pathogenicity_score'),
                'classification': classification,
                'acmg': 'PP3 (strong)' if classification == 'pathogenic' else (
                    'BP4 (strong)' if classification == 'benign' else 'neutral'
                )
            }
    
    # 3. EVE - Evolutionary prediction (NEW)
    eve = tu.tools.EVE_get_variant_score(
        chrom=variant_info['chrom'],
        pos=variant_info['pos'],
        ref=variant_info['ref'],
        alt=variant_info['alt']
    )
    if eve.get('status') == 'success':
        eve_scores = eve['data'].get('eve_scores', [])
        if eve_scores:
            predictions['eve'] = {
                'score': eve_scores[0].get('eve_score'),
                'classification': eve_scores[0].get('classification'),
                'acmg': 'PP3' if eve_scores[0].get('eve_score', 0) > 0.5 else 'BP4'
            }
    
    # 4. SpliceAI - Splice variant prediction (NEW)
    # Use for intronic, synonymous, or exonic variants near splice sites
    variant_str = f"chr{variant_info['chrom']}-{variant_info['pos']}-{variant_info['ref']}-{variant_info['alt']}"
    splice = tu.tools.SpliceAI_predict_splice(
        variant=variant_str,
        genome="38"
    )
    if splice.get('data'):
        max_score = splice['data'].get('max_delta_score', 0)
        interpretation = splice['data'].get('interpretation', '')
        
        if max_score >= 0.8:
            splice_acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            splice_acmg = 'PP3 (moderate) - splice impact'
        elif max_score >= 0.2:
            splice_acmg = 'PP3 (supporting) - possible splice effect'
        else:
            splice_acmg = 'BP7 (if synonymous) - no splice impact'
        
        predictions['spliceai'] = {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'scores': splice['data'].get('scores', []),
            'acmg': splice_acmg
        }
    
    # Consensus for PP3/BP4
    damaging = sum(1 for p in predictions.values() if 'PP3' in p.get('acmg', ''))
    benign = sum(1 for p in predictions.values() if 'BP4' in p.get('acmg', ''))
    
    return {
        'predictions': predictions,
        'consensus': {
            'damaging_count': damaging,
            'benign_count': benign,
            'pp3_applicable': damaging >= 2 and benign == 0,
            'bp4_applicable': benign >= 2 and damaging == 0
        }
    }
```

### 4.4 ACMG Classification Criteria

| Evidence Type | Criteria | Weight |
|---------------|----------|--------|
| **PVS1** | Null variant in gene where LOF is mechanism | Very Strong |
| **PS1** | Same amino acid change as established pathogenic | Strong |
| **PM2** | Absent from population databases | Moderate |
| **PP3** | Computational evidence supports deleterious (AlphaMissense, CADD, EVE, SpliceAI) | Supporting |
| **BA1** | Allele frequency >5% | Benign standalone |

**Enhanced PP3 Evidence** (NEW):
- **AlphaMissense pathogenic** (>0.564) = Strong PP3 support (~90% accuracy)
- **CADD ≥20** + **EVE >0.5** = Multiple concordant predictions
- Agreement from 2+ predictors strengthens PP3 evidence

### 4.5 Output for Report

```markdown
## 4. Variant Interpretation

### 4.1 Variant: FBN1 c.4621G>A (p.Glu1541Lys)

| Property | Value | Interpretation |
|----------|-------|----------------|
| Gene | FBN1 | Marfan syndrome gene |
| Consequence | Missense | Amino acid change |
| ClinVar | VUS | Uncertain significance |
| gnomAD AF | 0.000004 | Ultra-rare (PM2) |

### 4.2 Computational Predictions (NEW)

| Predictor | Score | Classification | ACMG Support |
|-----------|-------|----------------|--------------|
| **AlphaMissense** | 0.78 | Pathogenic | PP3 (strong) |
| **CADD PHRED** | 28.5 | Top 0.1% deleterious | PP3 |
| **EVE** | 0.72 | Likely pathogenic | PP3 |

**Consensus**: 3/3 predictors concordant damaging → **Strong PP3 support**

*Source: AlphaMissense, CADD API, EVE via Ensembl VEP*

### 4.3 ACMG Evidence Summary

| Criterion | Evidence | Strength |
|-----------|----------|----------|
| PM2 | Absent from gnomAD (AF < 0.00001) | Moderate |
| PP3 | AlphaMissense + CADD + EVE concordant | Supporting (strong) |
| PP4 | Phenotype highly specific for Marfan | Supporting |
| PS4 | Multiple affected family members | Strong |

**Preliminary Classification**: Likely Pathogenic (1 Strong + 1 Moderate + 2 Supporting)

*Source: ClinVar, gnomAD, AlphaMissense, CADD, EVE*
```

---

## Phase 5: Structure Analysis for VUS

### 5.1 When to Perform Structure Analysis

Perform when:
- Variant is VUS or conflicting interpretations
- Missense variant in critical domain
- Novel variant not in databases
- Additional evidence needed for classification

### 5.2 Structure Prediction (NVIDIA NIM)

```python
def analyze_variant_structure(tu, protein_sequence, variant_position):
    """Predict structure and analyze variant impact."""
    
    # Predict structure with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_sequence,
        algorithm="mmseqs2",
        relax_prediction=False
    )
    
    # Extract pLDDT at variant position
    variant_plddt = get_residue_plddt(structure, variant_position)
    
    # Check if in structured region
    confidence = "High" if variant_plddt > 70 else "Low"
    
    return {
        'structure': structure,
        'variant_plddt': variant_plddt,
        'confidence': confidence
    }
```

### 5.3 Domain Impact Assessment

```python
def assess_domain_impact(tu, uniprot_id, variant_position):
    """Check if variant affects functional domain."""
    
    # Get domain annotations
    domains = tu.tools.InterPro_get_protein_domains(accession=uniprot_id)
    
    for domain in domains:
        if domain['start'] <= variant_position <= domain['end']:
            return {
                'in_domain': True,
                'domain_name': domain['name'],
                'domain_function': domain['description']
            }
    
    return {'in_domain': False}
```

### 5.4 Output for Report

```markdown
## 5. Structural Analysis

### 5.1 Structure Prediction

**Method**: AlphaFold2 via NVIDIA NIM
**Protein**: Fibrillin-1 (FBN1)
**Sequence Length**: 2,871 amino acids

| Metric | Value | Interpretation |
|--------|-------|----------------|
| Mean pLDDT | 85.3 | High confidence overall |
| Variant position pLDDT | 92.1 | Very high confidence |
| Nearby domain | cbEGF-like domain 23 | Calcium-binding |

### 5.2 Variant Location Analysis

**Variant**: p.Glu1541Lys

| Feature | Finding | Impact |
|---------|---------|--------|
| Domain | cbEGF-like domain 23 | Critical for calcium binding |
| Conservation | 100% conserved across vertebrates | High constraint |
| Structural role | Calcium coordination residue | Likely destabilizing |
| Nearby pathogenic | p.Glu1540Lys (Pathogenic) | Adjacent residue |

### 5.3 Structural Interpretation

The variant p.Glu1541Lys:
1. **Located in cbEGF domain** - These domains are critical for fibrillin-1 function
2. **Glutamate → Lysine** - Charge reversal (negative to positive)
3. **Calcium binding** - Glutamate at this position coordinates Ca2+
4. **Adjacent pathogenic variant** - p.Glu1540Lys is classified Pathogenic

**Structural Evidence**: Strong support for pathogenicity (PM1 - critical domain)

*Source: NVIDIA NIM via `NvidiaNIM_alphafold2`, InterPro*
```

---

## Phase 6: Literature Evidence (NEW)

### 6.1 Published Literature (PubMed)

```python
def search_disease_literature(tu, disease_name, genes):
    """Search for relevant published literature."""
    
    # Disease-specific search
    disease_papers = tu.tools.PubMed_search_articles(
        query=f'"{disease_name}" AND (genetics OR mutation OR variant)',
        limit=20
    )
    
    # Gene-specific searches
    gene_papers = []
    for gene in genes[:5]:  # Top 5 genes
        papers = tu.tools.PubMed_search_articles(
            query=f'"{gene}" AND rare disease AND pathogenic',
            limit=10
        )
        gene_papers.extend(papers)
    
    return {
        'disease_literature': disease_papers,
        'gene_literature': gene_papers
    }
```

### 6.2 Preprint Literature (BioRxiv/MedRxiv)

```python
def search_preprints(tu, disease_name, genes):
    """Search preprints for cutting-edge findings."""
    
    # BioRxiv search
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{disease_name} genetics",
        limit=10
    )
    
    # ArXiv for computational methods
    arxiv = tu.tools.ArXiv_search_papers(
        query=f"rare disease diagnosis {' OR '.join(genes[:3])}",
        category="q-bio",
        limit=5
    )
    
    return {
        'biorxiv': biorxiv,
        'arxiv': arxiv
    }
```

### 6.3 Citation Analysis (OpenAlex)

```python
def analyze_citations(tu, key_papers):
    """Analyze citation network for key papers."""
    
    citation_analysis = []
    for paper in key_papers[:5]:
        # Get citation data
        work = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        if work:
            citation_analysis.append({
                'title': paper['title'],
                'citations': work[0].get('cited_by_count', 0),
                'year': work[0].get('publication_year')
            })
    
    return citation_analysis
```

### 6.4 Output for Report

```markdown
## 6. Literature Evidence

### 6.1 Key Published Studies

| PMID | Title | Year | Citations | Relevance |
|------|-------|------|-----------|-----------|
| 32123456 | FBN1 variants in Marfan syndrome... | 2023 | 45 | Direct |
| 31987654 | TGF-beta signaling in connective... | 2022 | 89 | Pathway |
| 30876543 | Novel diagnostic criteria for... | 2021 | 156 | Diagnostic |

### 6.2 Recent Preprints (Not Yet Peer-Reviewed)

| Source | Title | Posted | Relevance |
|--------|-------|--------|-----------|
| BioRxiv | Novel FBN1 splice variant causes... | 2024-01 | Case report |
| MedRxiv | Machine learning for Marfan... | 2024-02 | Diagnostic |

**⚠️ Note**: Preprints have not undergone peer review. Use with caution.

### 6.3 Evidence Summary

| Evidence Type | Count | Strength |
|---------------|-------|----------|
| Case reports | 12 | Supporting |
| Functional studies | 5 | Strong |
| Clinical trials | 2 | Strong |
| Reviews | 8 | Context |

*Source: PubMed, BioRxiv, OpenAlex*
```

---

## Report Template

**File**: `[PATIENT_ID]_rare_disease_report.md`

```markdown
# Rare Disease Diagnostic Report

**Patient ID**: [ID] | **Date**: [Date] | **Status**: In Progress

---

## Executive Summary
[Researching...]

---

## 1. Phenotype Analysis
### 1.1 Standardized HPO Terms
[Researching...]
### 1.2 Key Clinical Features
[Researching...]

---

## 2. Differential Diagnosis
### 2.1 Ranked Candidate Diseases
[Researching...]
### 2.2 Disease Details
[Researching...]

---

## 3. Recommended Gene Panel
### 3.1 Prioritized Genes
[Researching...]
### 3.2 Testing Strategy
[Researching...]

---

## 4. Variant Interpretation (if applicable)
### 4.1 Variant Details
[Researching...]
### 4.2 ACMG Classification
[Researching...]

---

## 5. Structural Analysis (if applicable)
### 5.1 Structure Prediction
[Researching...]
### 5.2 Variant Impact
[Researching...]

---

## 6. Clinical Recommendations
### 6.1 Diagnostic Next Steps
[Researching...]
### 6.2 Specialist Referrals
[Researching...]
### 6.3 Family Screening
[Researching...]

---

## 7. Data Gaps & Limitations
[Researching...]

---

## 8. Data Sources
[Will be populated as research progresses...]
```

---

## Evidence Grading

| Tier | Symbol | Criteria | Example |
|------|--------|----------|---------|
| **T1** | ★★★ | Phenotype match >80% + gene match | Marfan with FBN1 mutation |
| **T2** | ★★☆ | Phenotype match 60-80% OR likely pathogenic variant | Good phenotype fit |
| **T3** | ★☆☆ | Phenotype match 40-60% OR VUS in candidate gene | Possible diagnosis |
| **T4** | ☆☆☆ | Phenotype <40% OR uncertain gene | Low probability |

---

## Completeness Checklist

### Phase 1: Phenotype
- [ ] All symptoms converted to HPO terms
- [ ] Core vs. variable features distinguished
- [ ] Age of onset documented
- [ ] Family history noted

### Phase 2: Disease Matching
- [ ] ≥5 candidate diseases identified (or all matching)
- [ ] Phenotype overlap % calculated
- [ ] Inheritance patterns noted
- [ ] ORPHA and OMIM IDs provided

### Phase 3: Gene Panel
- [ ] ≥5 genes prioritized (or all from top diseases)
- [ ] Evidence level for each gene (ClinGen)
- [ ] Expression validation performed
- [ ] Testing strategy recommended

### Phase 4: Variant Interpretation (if applicable)
- [ ] ClinVar classification retrieved
- [ ] gnomAD frequency checked
- [ ] ACMG criteria applied
- [ ] Classification justified

### Phase 5: Structure Analysis (if applicable)
- [ ] Structure predicted (if VUS)
- [ ] pLDDT confidence reported
- [ ] Domain impact assessed
- [ ] Structural evidence summarized

### Phase 6: Recommendations
- [ ] ≥3 next steps listed
- [ ] Specialist referrals suggested
- [ ] Family screening addressed

---

## Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `Orphanet_search_by_hpo` | `OMIM_search` | PubMed phenotype search |
| `ClinVar_get_variant` | `gnomAD_get_variant` | VEP annotation |
| `NvidiaNIM_alphafold2` | `alphafold_get_prediction` | UniProt features |
| `GTEx_expression` | `HPA_expression` | Tissue-specific literature |
| `gnomAD_get_variant` | `ExAC_frequencies` | 1000 Genomes |

---

## Tool Reference

See [TOOLS_REFERENCE.md](TOOLS_REFERENCE.md) for complete tool documentation.