---
name: tooluniverse-variant-interpretation
description: Systematic clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Aggregates evidence from ClinVar, gnomAD, CIViC, UniProt, and PDB across ACMG criteria. Produces pathogenicity scores (0-100), clinical recommendations, and treatment implications. Use when interpreting genetic variants, classifying variants of uncertain significance (VUS), performing ACMG variant classification, or translating variant calls to clinical actionability.
---

# Clinical Variant Interpreter

Systematic variant interpretation skill using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.

---

## Problem This Skill Solves

Clinical labs and researchers face critical challenges in variant interpretation:

1. **Variant classification uncertainty** - VUS (Variants of Uncertain Significance) comprise 40-60% of clinical variants
2. **Evidence aggregation burden** - Must integrate data from 10+ databases per variant
3. **Structural context missing** - Traditional annotation ignores 3D protein impact
4. **Clinical actionability unclear** - How does classification translate to patient care?

**This skill provides**: A systematic workflow that combines population databases, functional predictions, structural analysis (via AlphaFold2), and literature evidence into ACMG-compliant interpretations with clear clinical recommendations.

---

## Key Principles

1. **ACMG-Guided Classification** - Follow ACMG/AMP 2015 guidelines with explicit evidence codes
2. **Structural Evidence Integration** - Use AlphaFold2 for novel structural impact analysis
3. **Population Context** - gnomAD frequencies with ancestry-specific data
4. **Gene-Disease Validity** - ClinGen curation status for clinical relevance
5. **Actionable Output** - Clear recommendations, not just classifications
6. **English-first queries** - Always use English terms in tool calls (gene names, variant descriptions, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

---

## Triggers

Use this skill when users:
- Ask about variant interpretation or classification
- Have VCF data needing clinical annotation
- Ask "what does this variant mean clinically?"
- Need ACMG classification for variants
- Want structural impact analysis for missense variants
- Ask about pathogenicity of specific variants

---

## Workflow Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                     VARIANT INTERPRETATION                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Phase 1: VARIANT IDENTITY                                      │
│  ├── Normalize variant notation (HGVS)                          │
│  ├── Map to gene, transcript, protein                           │
│  └── Get consequence type (missense, nonsense, etc.)            │
│                                                                 │
│  Phase 2: CLINICAL DATABASES                                    │
│  ├── ClinVar: Existing classifications                          │
│  ├── gnomAD: Population frequencies (all + ancestry)            │
│  ├── OMIM: Gene-disease associations                            │
│  ├── ClinGen: Gene validity + dosage sensitivity (ENHANCED)     │
│  │   └─ ClinGen_search_gene_validity, ClinGen_search_dosage     │
│  └── SpliceAI: Splice variant prediction (NEW)                  │
│                                                                 │
│  Phase 2.5: REGULATORY CONTEXT (NEW - for non-coding variants)  │
│  ├── ChIPAtlas: TF binding at position                          │
│  ├── ENCODE: Regulatory elements (enhancers, promoters)         │
│  ├── Conservation in regulatory regions                         │
│  └── Functional annotation of regulatory impact                 │
│                                                                 │
│  Phase 3: COMPUTATIONAL PREDICTIONS                             │
│  ├── SIFT/PolyPhen: Damaging predictions                        │
│  ├── CADD: Deleteriousness score                                │
│  ├── SpliceAI: Splice impact (if applicable)                    │
│  └── Conservation: Cross-species alignment                      │
│                                                                 │
│  Phase 4: STRUCTURAL ANALYSIS (for VUS/novel missense)          │
│  ├── Get protein structure (PDB or AlphaFold2)                  │
│  ├── Map variant to structure                                   │
│  ├── Assess domain/functional site impact                       │
│  └── Predict structural destabilization                         │
│                                                                 │
│  Phase 4.5: EXPRESSION CONTEXT (NEW)                            │
│  ├── CELLxGENE: Cell-type specific expression                   │
│  ├── Tissue relevance to phenotype                              │
│  └── Expression validation                                      │
│                                                                 │
│  Phase 5: LITERATURE EVIDENCE                                   │
│  ├── PubMed: Functional studies                                 │
│  ├── BioRxiv/MedRxiv: Recent preprints (NEW)                    │
│  ├── Case reports: Phenotype correlations                       │
│  └── Segregation data (if in literature)                        │
│                                                                 │
│  Phase 6: ACMG CLASSIFICATION                                   │
│  ├── Apply evidence codes (PVS1, PM2, PP3, etc.)                │
│  ├── Calculate classification                                   │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

---

## Phase Details

### Phase 1: Variant Identity & Normalization

**Goal**: Standardize variant notation and determine molecular consequence

**Tools**:
| Tool | Purpose |
|------|---------|
| `myvariant_query` | Get variant annotations from MyVariant.info |
| `Ensembl_get_variant_info` | Variant effect predictor data |
| `NCBI_gene_search` | Gene information |

**Key Information to Capture**:
- HGVS notation (c. and p.)
- Gene symbol and Ensembl ID
- Transcript (canonical/MANE Select)
- Consequence type
- Amino acid change (for missense)
- Exon/intron location

### Phase 2: Clinical Database Queries

**Goal**: Aggregate existing clinical knowledge

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `clinvar_search` | Existing classifications | Classification, review status, submissions |
| `gnomad_search` | Population frequency | AF, ancestry-specific AFs, homozygotes |
| `OMIM_search`, `OMIM_get_entry` | Gene-disease | Inheritance, phenotypes |
| `ClinGen_search_gene_validity` | Curation status | Gene-disease validity level |
| `COSMIC_search_mutations` | **Somatic mutations (NEW)** | Cancer frequency, histology |
| `DisGeNET_search_gene` | **Gene-disease associations (NEW)** | Evidence scores, sources |

### 2.1 COSMIC for Somatic Context (NEW)

For cancer variants, check COSMIC for somatic mutation frequency:

```python
from collections import Counter  # needed for the hotspot tally below


def get_somatic_context(tu, gene_symbol, variant_aa):
    """Get somatic mutation context from COSMIC."""
    # Search for specific mutation
    cosmic = tu.tools.COSMIC_search_mutations(
        operation="search",
        terms=f"{gene_symbol} {variant_aa}",
        max_results=20,
        genome_build=38
    )

    # Get all gene mutations for context
    gene_mutations = tu.tools.COSMIC_get_mutations_by_gene(
        operation="get_by_gene",
        gene=gene_symbol,
        max_results=100
    )

    # Determine if it's a hotspot
    mutation_counts = Counter(m['MutationAA'] for m in gene_mutations.get('results', []))
    is_hotspot = variant_aa in [m[0] for m in mutation_counts.most_common(10)]

    return {
        'cosmic_hits': cosmic.get('results', []),
        'is_somatic_hotspot': is_hotspot,
        'cancer_types': [m['PrimarySite'] for m in cosmic.get('results', [])],
        'total_cosmic_count': cosmic.get('total_count', 0)
    }
```

### 2.2 OMIM Gene-Disease Context (NEW)

```python
def get_omim_context(tu, gene_symbol):
    """Get OMIM gene-disease associations."""
    # Search OMIM for gene
    search = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    omim_data = []
    for entry in search.get('data', {}).get('entries', []):
        mim = entry.get('mimNumber')

        # Get detailed entry
        details = tu.tools.OMIM_get_entry(
            operation="get_entry",
            mim_number=str(mim)
        )

        # Get clinical synopsis
        synopsis = tu.tools.OMIM_get_clinical_synopsis(
            operation="get_clinical_synopsis",
            mim_number=str(mim)
        )

        omim_data.append({
            'mim_number': mim,
            'title': details.get('data', {}).get('titles', {}),
            'inheritance': synopsis.get('data', {}).get('inheritance'),
            'clinical_features': synopsis.get('data', {})
        })

    return omim_data
```

### 2.3 DisGeNET Gene-Disease Evidence (NEW)

```python
def get_disgenet_context(tu, gene_symbol, variant_rsid=None):
    """Get gene-disease associations from DisGeNET."""
    # Gene-disease associations
    gda = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=20
    )

    # Variant-disease associations (if rsID available)
    vda = None
    if variant_rsid:
        vda = tu.tools.DisGeNET_get_vda(
            operation="get_vda",
            variant=variant_rsid,
            limit=20
        )

    return {
        'gene_associations': gda.get('data', {}).get('associations', []),
        'variant_associations': vda.get('data', {}).get('associations', []) if vda else []
    }
```

### 2.4 ClinGen Gene Validity & Dosage Sensitivity (NEW)

ClinGen provides authoritative curation of gene-disease relationships:

```python
def get_clingen_evidence(tu, gene_symbol):
    """
    Get ClinGen gene validity and dosage
    sensitivity data.

    CRITICAL for ACMG classification - establishes gene-disease validity.
    """
    # 1. Gene-disease validity (Definitive/Strong/Moderate/Limited)
    validity = tu.tools.ClinGen_search_gene_validity(gene=gene_symbol)

    validity_data = []
    if validity.get('data'):
        for entry in validity.get('data', []):
            validity_data.append({
                'disease': entry.get('Disease Label'),
                'classification': entry.get('Classification'),  # Definitive, Strong, etc.
                'inheritance': entry.get('Inheritance'),
                'mondo_id': entry.get('Disease ID (MONDO)')
            })

    # 2. Dosage sensitivity (haploinsufficiency, triplosensitivity)
    dosage = tu.tools.ClinGen_search_dosage_sensitivity(gene=gene_symbol)

    dosage_data = {}
    if dosage.get('data'):
        for entry in dosage.get('data', []):
            dosage_data = {
                'haploinsufficiency_score': entry.get('Haploinsufficiency Score'),
                'triplosensitivity_score': entry.get('Triplosensitivity Score'),
                'disease': entry.get('Disease')
            }
            break  # Usually one entry per gene

    # 3. Clinical actionability (for incidental findings context)
    actionability = tu.tools.ClinGen_search_actionability(gene=gene_symbol)

    return {
        'gene_validity': validity_data,
        'dosage_sensitivity': dosage_data,
        'actionability': actionability.get('data', {}),
        'has_definitive_validity': any(v['classification'] == 'Definitive' for v in validity_data),
        # str() so the check works whether the score comes back as 3 or '3'
        'is_haploinsufficient': str(dosage_data.get('haploinsufficiency_score')) == '3'
    }
```

**ClinGen Validity Levels** (for ACMG PM1/PP4):

| Classification | Meaning | ACMG Impact |
|----------------|---------|-------------|
| **Definitive** | Multiple concordant studies | Strong gene-disease support |
| **Strong** | Extensive evidence | Moderate-strong support |
| **Moderate** | Some evidence | Moderate support |
| **Limited** | Minimal evidence | Weak support, use caution |
| **Disputed** | Conflicting evidence | Do not use for classification |
| **Refuted** | Evidence against | Gene NOT associated |

**Dosage Sensitivity Scores** (for CNV interpretation):

| Score | Meaning | Interpretation |
|-------|---------|----------------|
| **3** | Sufficient evidence | Haploinsufficiency/triplosensitivity established |
| **2** | Emerging evidence | Some support, not definitive |
| **1** | Little evidence | Minimal support |
| **0** | No evidence | Unknown |

### 2.5 SpliceAI Splice Variant Prediction (NEW)

An estimated ~15% of pathogenic variants affect splicing. SpliceAI is a widely used deep-learning predictor of splice effects:

```python
def get_spliceai_prediction(tu, chrom, pos, ref, alt, genome="38"):
    """
    Get SpliceAI splice effect predictions.

    Delta scores:
    - DS_AG: Acceptor gain
    - DS_AL: Acceptor loss
    - DS_DG: Donor gain
    - DS_DL: Donor loss

    Thresholds:
    - ≥0.8: High pathogenicity (strong PP3)
    - 0.5-0.8: Moderate (supporting PP3)
    - 0.2-0.5: Low (weak evidence)
    - <0.2: Likely benign
    """
    # Format variant for SpliceAI
    variant = f"chr{chrom}-{pos}-{ref}-{alt}"

    # Get full splice predictions
    result = tu.tools.SpliceAI_predict_splice(
        variant=variant,
        genome=genome
    )

    if result.get('data'):
        max_score = result['data'].get('max_delta_score', 0)
        interpretation = result['data'].get('interpretation', '')

        # Determine ACMG support
        if max_score >= 0.8:
            acmg = 'PP3 (strong) - high splice impact'
        elif max_score >= 0.5:
            acmg = 'PP3 (supporting) - moderate splice impact'
        elif max_score >= 0.2:
            acmg = 'PP3 (weak) - possible splice impact'
        else:
            acmg = 'BP7 (if synonymous) - splice benign'

        return {
            'max_delta_score': max_score,
            'interpretation': interpretation,
            'acmg_support': acmg,
            'scores': result['data'].get('scores', [])
        }
    return None


def quick_splice_check(tu, variant, genome="38"):
    """Quick triage using max delta score only."""
    result = tu.tools.SpliceAI_get_max_delta(
        variant=variant,
        genome=genome
    )
    return result.get('data', {})
```

**When to Use SpliceAI**:
- **Intronic variants** near splice sites (±50bp)
- **Synonymous variants** (may still affect splicing)
- **Exonic variants** near splice junctions
- **Variants creating cryptic splice sites**

**Report Section for Splice Variants**:

```markdown
### Splice Impact Analysis (SpliceAI)

| Score Type | Value | Position | Interpretation |
|------------|-------|----------|----------------|
| DS_AG | 0.02 | +15 | Acceptor gain unlikely |
| DS_AL | 0.85 | -2 | **High acceptor loss** |
| DS_DG | 0.01 | +8 | Donor gain unlikely |
| DS_DL | 0.03 | +1 | Donor loss unlikely |

**Max Delta Score**: 0.85 (DS_AL)
**Interpretation**: High impact - likely disrupts acceptor site
**ACMG Support**: PP3 (strong) for splice-altering effect

*Source: SpliceAI via `SpliceAI_predict_splice`*
```

**ClinVar Classification Map**:

| ClinVar | Interpretation |
|---------|----------------|
| Pathogenic | Disease-causing |
| Likely pathogenic | 90%+ confidence pathogenic |
| VUS | Uncertain significance |
| Likely benign | 90%+ confidence benign |
| Benign | Not disease-causing |
| Conflicting | Multiple interpretations |

**gnomAD Thresholds (for rare disease)**:

| Frequency | ACMG Code | Interpretation |
|-----------|-----------|----------------|
| Absent | PM2_Supporting | Absent from controls |
| <0.00001 | PM2_Supporting | Extremely rare |
| <0.0001 | - | Rare (use with caution) |
| >0.01 | BS1 (BA1 if >0.05) | Too common for rare disease |

**COSMIC Somatic Evidence (NEW)**:

| COSMIC Finding | Interpretation | ACMG Support |
|----------------|----------------|--------------|
| Recurrent hotspot (>100 samples) | Known oncogenic driver | PS3 (functional) |
| Moderate frequency (10-100) | Likely oncogenic | PM1 (hotspot) |
| Rare somatic (<10) | Unknown significance | No support |

**DisGeNET Score Interpretation (NEW)**:

| GDA Score | Evidence Level | ACMG Support |
|-----------|----------------|--------------|
| >0.7 | Strong | PP4 (phenotype) |
| 0.4-0.7 | Moderate | Supporting |
| <0.4 | Weak | Insufficient |

### Phase 2.5: Regulatory Context (NEW - for Non-Coding Variants)

**Goal**: Assess regulatory impact for non-coding, intronic, and promoter variants

**When to Apply**:
- Intronic variants (not splice site)
- Promoter variants
- 5'UTR / 3'UTR variants
- Intergenic variants near disease genes

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `ChIPAtlas_enrichment_analysis` | TF binding at position | Bound TFs, cell types |
| `ChIPAtlas_get_peak_data` | ChIP-seq peaks | Peak coordinates, scores |
| `ENCODE_search_experiments` | Regulatory elements | Enhancers, promoters, DHS |
| `ENCODE_get_experiment` | Experiment details | Assay type, targets |

**Regulatory Impact Assessment**:

```python
def assess_regulatory_impact(tu, variant_position, gene_symbol):
    """Assess regulatory impact of non-coding variant."""
    # Check TF binding at position
    tf_binding = tu.tools.ChIPAtlas_enrichment_analysis(
        gene=gene_symbol,
        cell_type="all"
    )

    # Get ChIP-seq peaks overlapping variant
    peaks = tu.tools.ChIPAtlas_get_peak_data(
        gene=gene_symbol,
        experiment_type="TF"
    )

    # Search ENCODE for regulatory annotations
    encode_data = tu.tools.ENCODE_search_experiments(
        assay_title="ATAC-seq",
        biosample="all"
    )

    # Assess if variant disrupts TF binding.
    # NOTE: check_motif_disruption is not defined in this skill and is not a
    # ToolUniverse tool; supply your own helper that tests whether
    # variant_position falls inside a returned peak or motif.
    binding_disrupted = check_motif_disruption(variant_position, peaks)

    return {
        'tf_binding': tf_binding,
        'regulatory_peaks': peaks,
        'encode_annotations': encode_data,
        'likely_regulatory': binding_disrupted
    }
```

**Regulatory Impact Categories**:

| Category | Criteria | ACMG Support |
|----------|----------|--------------|
| **High impact** | Disrupts known TF binding motif | PP3 (supporting) |
| **Moderate impact** | In active regulatory region | Consider context |
| **Low impact** | No regulatory annotation | No support |

**Output for Report**:

```markdown
### 2.5 Regulatory Context (for Non-Coding Variants)

| Feature | Finding | Significance |
|---------|---------|--------------|
| Variant location | Intron 5, 120bp from exon 6 | Not canonical splice |
| TF binding site | CTCF binding peak (ChIPAtlas) | May affect insulation |
| ENCODE annotation | Active enhancer (H3K27ac) | Regulatory function |
| Conservation | PhyloP = 2.8 | Moderate conservation |

**Regulatory Interpretation**: Variant overlaps CTCF binding site in active enhancer region. Potential impact on gene regulation.

*Source: ChIPAtlas, ENCODE*
```

### Phase 3: Computational Predictions (ENHANCED)

**Goal**: Assess in silico pathogenicity predictions using state-of-the-art models

**Tools**:
| Tool | Purpose | Score Range |
|------|---------|-------------|
| `CADD_get_variant_score` | **Deleteriousness score (NEW API)** | PHRED 0-99 |
| `AlphaMissense_get_variant_score` | **DeepMind pathogenicity (NEW)** | 0-1 |
| `EVE_get_variant_score` | **Evolutionary pathogenicity (NEW)** | 0-1 |
| `myvariant_query` | Aggregated predictions | SIFT, PolyPhen |
| `Ensembl_get_variant_info` | VEP predictions | SIFT, PolyPhen |

### 3.1 CADD Deleteriousness Scoring (NEW)

```python
def get_cadd_score(tu, chrom, pos, ref, alt):
    """Get CADD deleteriousness score for a variant."""
    result = tu.tools.CADD_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt,
        version="GRCh38-v1.7"
    )

    if result.get('status') == 'success':
        phred = result['data'].get('phred_score')
        if phred is None:
            # No score returned; avoid comparing None against a number
            return None
        return {
            'score': phred,
            'interpretation': result['data'].get('interpretation'),
            'acmg_support': 'PP3' if phred >= 20 else ('BP4' if phred < 15 else 'neutral')
        }
    return None
```

### 3.2 AlphaMissense Pathogenicity (NEW)

DeepMind's AlphaMissense provides state-of-the-art missense pathogenicity prediction:

```python
def get_alphamissense_score(tu, uniprot_id, variant):
    """
    Get AlphaMissense pathogenicity score.

    variant format: 'R123H' or 'p.R123H'

    Thresholds:
    - Pathogenic: score > 0.564
    - Ambiguous: 0.34-0.564
    - Benign: score < 0.34
    """
    result = tu.tools.AlphaMissense_get_variant_score(
        uniprot_id=uniprot_id,
        variant=variant
    )

    if result.get('status') == 'success' and result.get('data'):
        score = result['data'].get('pathogenicity_score')
        classification = result['data'].get('classification')

        # Map to ACMG
        if classification == 'pathogenic':
            acmg = 'PP3 (strong)'  # AlphaMissense has high accuracy
        elif classification == 'benign':
            acmg = 'BP4 (strong)'
        else:
            acmg = 'neutral'

        return {
            'score': score,
            'classification': classification,
            'acmg_support': acmg
        }
    return None
```

### 3.3 EVE Evolutionary Prediction (NEW)

EVE uses unsupervised learning on evolutionary data:

```python
def get_eve_score(tu, chrom, pos, ref, alt):
    """
    Get EVE evolutionary pathogenicity score.

    Threshold: >0.5 indicates likely pathogenic
    """
    result = tu.tools.EVE_get_variant_score(
        chrom=str(chrom),
        pos=pos,
        ref=ref,
        alt=alt
    )

    if result.get('status') == 'success':
        eve_scores = result['data'].get('eve_scores', [])
        if eve_scores:
            best_score = eve_scores[0]
            # "or 0" guards against a missing or null eve_score
            eve_value = best_score.get('eve_score') or 0
            return {
                'score': best_score.get('eve_score'),
                'classification': best_score.get('classification'),
                'gene': best_score.get('gene_symbol'),
                'acmg_support': 'PP3' if eve_value > 0.5 else 'BP4'
            }
    return None
```

### 3.4 Integrated Prediction Strategy

**For VUS (Variants of Uncertain Significance)**, combine multiple predictors:

```python
def comprehensive_pathogenicity_assessment(tu, variant_info):
    """
    Combine all prediction tools for robust classification.
    """
    chrom = variant_info['chrom']
    pos = variant_info['pos']
    ref = variant_info['ref']
    alt = variant_info['alt']
    uniprot_id = variant_info.get('uniprot_id')
    aa_change = variant_info.get('aa_change')  # e.g., 'R123H'

    predictions = {}

    # 1. CADD (works for all variant types)
    cadd = get_cadd_score(tu, chrom, pos, ref, alt)
    if cadd:
        predictions['cadd'] = cadd

    # 2. AlphaMissense (missense only, requires UniProt ID)
    if uniprot_id and aa_change:
        am = get_alphamissense_score(tu, uniprot_id, aa_change)
        if am:
            predictions['alphamissense'] = am

    # 3. EVE (missense only)
    eve = get_eve_score(tu, chrom, pos, ref, alt)
    if eve:
        predictions['eve'] = eve

    # Consensus assessment
    damaging_count = sum(1 for p in predictions.values()
                         if 'PP3' in p.get('acmg_support', ''))
    benign_count = sum(1 for p in predictions.values()
                       if 'BP4' in p.get('acmg_support', ''))

    if damaging_count >= 2 and benign_count == 0:
        consensus = 'likely_damaging'
        acmg = 'PP3 (multiple predictors concordant)'
    elif benign_count >= 2 and damaging_count == 0:
        consensus = 'likely_benign'
        acmg = 'BP4 (multiple predictors concordant)'
    else:
        consensus = 'uncertain'
        acmg = 'neutral (discordant predictions)'

    return {
        'predictions': predictions,
        'consensus': consensus,
        'acmg_recommendation': acmg
    }
```

**Prediction Interpretation** (Updated):

| Predictor | Damaging | Benign |
|-----------|----------|--------|
| **AlphaMissense** | >0.564 | <0.34 |
| **CADD PHRED** | ≥20 (top 1%) | <15 |
| **EVE** | >0.5 | ≤0.5 |
| SIFT | <0.05 | ≥0.05 |
| PolyPhen2 | >0.85 (probably) | <0.15 (benign) |

**ACMG Application** (Enhanced):
- **PP3**: Multiple concordant damaging predictions (AlphaMissense + CADD + EVE agreement = strong PP3)
- **BP4**: Multiple concordant benign predictions
- **Note**: AlphaMissense alone reportedly achieves ~90% accuracy on ClinVar pathogenic variants; still treat it as one line of computational evidence

### Phase 4: Structural Analysis

**Goal**: Assess protein structural impact (especially for VUS)

**Tools**:
| Tool | Purpose |
|------|---------|
| `PDB_search_by_uniprot` | Find experimental structures |
| `NvidiaNIM_alphafold2` | Predict structure if no PDB |
| `alphafold_get_prediction` | Get AlphaFold DB structure |
| `InterPro_get_protein_domains` | Domain annotations |
| `UniProt_get_protein_function` | Functional sites |

**Structural Impact Categories**:

| Impact Level | Description | ACMG Support |
|--------------|-------------|--------------|
| **Critical** | Active site, catalytic residue | PM1 (strong) |
| **High** | Buried residue, disulfide, structural core | PM1 (moderate) |
| **Moderate** | Domain interface, binding site | PM1 (supporting) |
| **Low** | Surface, flexible region | No support |

**Using AlphaFold2 for VUS**:

```
1. Get wildtype structure (PDB or AlphaFold)
2. Identify residue location:
   - pLDDT at position (confidence)
   - Solvent accessibility
   - Secondary structure
3. Assess structural context:
   - Distance to functional sites
   - Interaction partners
   - Conservation in structure
4. Predict impact:
   - Side chain burial
   - Hydrogen bond disruption
   - Charge changes in buried positions
```

### Phase 4.5: Expression Context (NEW)

**Goal**: Validate gene expression in disease-relevant tissues/cells

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `CELLxGENE_get_expression_data` | Cell-type specific expression | TPM per cell type |
| `CELLxGENE_get_cell_metadata` | Cell type annotations | Tissue, disease state |
| `GTEx_get_median_gene_expression` | Tissue expression | TPM per tissue |

**Expression Validation**:

```python
def validate_expression_context(tu, gene_symbol, phenotype_tissues):
    """Validate gene is expressed in phenotype-relevant tissues."""
    # Single-cell expression
    sc_expression = tu.tools.CELLxGENE_get_expression_data(
        gene=gene_symbol,
        tissue=phenotype_tissues[0] if phenotype_tissues else "all"
    )

    # Bulk tissue expression (GTEx)
    gtex = tu.tools.GTEx_get_median_gene_expression(
        gene=gene_symbol
    )

    # Check expression in relevant tissues
    relevant_expression = {
        tissue: gtex.get(tissue, 0)
        for tissue in phenotype_tissues
    }

    return {
        'single_cell': sc_expression,
        'gtex': relevant_expression,
        'expressed_in_phenotype_tissue': any(v > 1 for v in relevant_expression.values())
    }
```

**Why it matters**:
- Confirms gene is expressed where disease manifests
- Supports PP4 (phenotype-specific) if highly restricted expression
- Can challenge classification if not expressed in affected tissue

**Output for Report**:

```markdown
### 4.5 Expression Context

| Tissue | Expression (TPM) | Relevance |
|--------|------------------|-----------|
| Heart | 45.2 | ✓ Primary disease tissue |
| Skeletal muscle | 38.7 | ✓ Secondary involvement |
| Liver | 2.1 | Low expression |
| Brain | 0.5 | Not expressed |

**Single-Cell Analysis (CELLxGENE)**:
- **Cardiomyocytes**: High expression (TPM=85)
- **Cardiac fibroblasts**: Low expression (TPM=5)

**Interpretation**: Gene highly expressed in cardiomyocytes, supporting cardiac phenotype association.

*Source: GTEx, CELLxGENE Census*
```

### Phase 5: Literature Evidence (ENHANCED)

**Goal**: Find functional studies, case reports, and cutting-edge preprints

**Tools**:
| Tool | Purpose | Coverage |
|------|---------|----------|
| `PubMed_search` | Peer-reviewed studies | Comprehensive |
| `EuropePMC_search` | Additional literature | Europe PMC |
| `BioRxiv_search_preprints` | Biology preprints | Recent findings |
| `MedRxiv_search_preprints` | Clinical preprints | Clinical studies |
| `openalex_search_works` | Citation analysis | Impact metrics |
| `SemanticScholar_search_papers` | AI-ranked search | Relevance |

**Search Strategies**:

```python
def comprehensive_literature_search(tu, gene, variant, phenotype):
    """Search across all literature sources."""
    # 1. PubMed: Peer-reviewed
    pubmed = tu.tools.PubMed_search(
        query=f'"{gene}" AND ("{variant}" OR functional)',
        max_results=30
    )

    # 2. BioRxiv: Recent preprints
    biorxiv = tu.tools.BioRxiv_search_preprints(
        query=f"{gene} {phenotype}",
        limit=10
    )

    # 3. MedRxiv: Clinical preprints
    medrxiv = tu.tools.MedRxiv_search_preprints(
        query=f"{gene} variant {phenotype}",
        limit=10
    )

    # 4. Citation analysis
    key_papers = pubmed[:5]  # Top papers
    for paper in key_papers:
        citations = tu.tools.openalex_search_works(
            query=paper['title'],
            limit=1
        )
        paper['citation_count'] = citations[0].get('cited_by_count', 0) if citations else 0

    return {
        'pubmed': pubmed,
        'preprints': biorxiv + medrxiv,
        'key_papers_with_citations': key_papers
    }
```

**Search Queries**:

```
# Gene + variant specific
"{GENE} AND ({HGVS_p} OR {AA_change})"

# Functional studies
"{GENE} AND (functional OR functional study OR mutagenesis)"

# Clinical reports
"{GENE} AND (case report OR patient) AND {phenotype}"

# Preprint-specific
"{GENE} genetics 2024"  (for recent preprints)
```

**⚠️ Preprint Warning**: Always flag preprints as NOT peer-reviewed in reports.

**Evidence Types**:

| Evidence | ACMG Code | Weight |
|----------|-----------|--------|
| Functional study (null) | PS3 | Strong |
| Functional study (reduced) | PS3_Moderate | Moderate |
| Case reports with segregation | PP1 | Supporting to Moderate |
| Co-occurrence with pathogenic | BP2 | Supporting against |

### Phase 6: ACMG Classification

**Goal**: Systematic classification with explicit evidence

**ACMG Evidence Codes**:

**Pathogenic**:

| Code | Strength | Description |
|------|----------|-------------|
| PVS1 | Very Strong | Null variant in gene where LOF is mechanism |
| PS1 | Strong | Same amino acid change as known pathogenic |
| PS3 | Strong | Well-established functional studies |
| PM1 | Moderate | Mutational hot spot / functional domain |
| PM2 | Moderate | Absent from controls (applied as PM2_Supporting in the gnomAD thresholds above) |
| PM5 | Moderate | Different missense at same residue as pathogenic |
| PP3 | Supporting | Multiple computational predictions |
| PP5 | Supporting | Reputable source reports pathogenic |

**Benign**:

| Code | Strength | Description |
|------|----------|-------------|
| BA1 | Stand-alone | MAF >5% |
| BS1 | Strong | MAF greater than expected |
| BS3 | Strong | Functional studies show no effect |
| BP4 | Supporting | Multiple computational predictions benign |
| BP7 | Supporting | Synonymous with no splice impact |

**Classification Algorithm** (simplified from the ACMG/AMP 2015 combining rules):

| Classification | Evidence Required |
|----------------|-------------------|
| Pathogenic | 1 Very Strong + 1 Strong; OR 2 Strong; OR 1 Strong + 3 Moderate |
| Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 2 Moderate; OR 1 Strong + 2 Supporting |
| Likely Benign | 1 Strong + 1 Supporting; OR 2 Supporting |
| Benign | 1 Stand-alone; OR 2 Strong |
| VUS | Criteria not met |

---

## Output Structure

### Report Sections

```markdown
# Variant Interpretation Report: {GENE} {VARIANT}

## Executive Summary
- **Variant**: {HGVS notation}
- **Gene**: {gene symbol}
- **Classification**: {Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign}
- **Evidence Strength**: {strong/moderate/limited}
- **Key Finding**: {one-sentence summary}

## 1. Variant Identity
{gene, transcript, protein change, consequence}

## 2. Population Data
{gnomAD frequencies, ancestry breakdown}

## 3. Clinical Database Evidence
{ClinVar, ClinGen, OMIM}

## 4. Computational Predictions
{SIFT, PolyPhen, CADD scores}

## 5. Structural Analysis
{Domain location, functional site proximity, AlphaFold confidence}

## 6. Literature Evidence
{Functional studies, case reports}

## 7. ACMG Classification
{Evidence codes applied, classification rationale}

## 8. Clinical Recommendations
{Testing, management, family screening}

## 9. Limitations & Uncertainties
{Missing data, conflicting evidence}

## Data Sources
{All tools and databases queried}
```

---

## Evidence Grading

### Classification Confidence

| Symbol | Classification | Evidence Level |
|--------|----------------|----------------|
| ★★★ | High confidence | Multiple independent lines |
| ★★☆ | Moderate confidence | Some supporting evidence |
| ★☆☆ | Limited confidence | Minimal evidence |
| VUS | Uncertain | Insufficient data |

### Structural Impact Confidence

| pLDDT Range | Interpretation |
|-------------|----------------|
| >90 | Very high confidence in position |
| 70-90 | High confidence |
| 50-70 | Moderate (often loops) |
| <50 | Low confidence (disorder) |

---

## Special Scenarios

### Scenario 1: Novel Missense VUS

**Additional workflow**:
1. Check if other pathogenic variants at same residue
2. Get AlphaFold2 structure
3. Analyze:
   - Is residue buried or surface?
   - What secondary structure?
   - Proximity to active/binding sites?
   - Conservation across species?
4. Apply PM1 if in functional domain
5. Apply PP3 if predictions concordant

### Scenario 2: Truncating Variant

**Additional workflow**:
1. Check if LOF is mechanism for gene
2. Determine if escapes NMD (last exon)
3. Check for alternative isoforms
4. Review ClinGen LOF curation

**PVS1 Application**:

| Scenario | PVS1 Strength |
|----------|---------------|
| Canonical LOF gene, NMD predicted | Very Strong |
| LOF gene, last exon | Moderate |
| Non-LOF gene | Not applicable |

### Scenario 3: Splice Variant

**Additional workflow**:
1. Check SpliceAI scores (if available)
2. Determine canonical splice site distance
3. Review for in-frame skipping potential
4. Check for cryptic splice activation

---

## Quantified Minimums

| Section | Requirement |
|---------|-------------|
| Population frequency | gnomAD overall + ≥3 ancestry groups |
| Predictions | ≥3 computational predictors |
| Literature search | ≥2 search strategies |
| ACMG codes | All applicable codes listed |

---

## NVIDIA NIM Integration

### When to Use AlphaFold2 for Variants

**Use Case**: VUS missense variants where structural context aids interpretation

**Workflow**:

```python
# 1. Get protein sequence
protein_seq = tu.tools.UniProt_get_protein_sequence(accession=uniprot_id)

# 2. Get/predict structure
try:
    pdb_hits = tu.tools.PDB_search_by_uniprot(uniprot_id=uniprot_id)
    structure = tu.tools.PDB_get_structure(pdb_id=pdb_hits[0]['pdb_id'])
except Exception:
    # No usable experimental structure (lookup failed or returned no hits):
    # predict with AlphaFold2
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=protein_seq['sequence'],
        algorithm="mmseqs2"
    )

# 3. Analyze variant position
# - Extract pLDDT at residue position
# - Calculate solvent accessibility
# - Check for nearby functional sites
```

**Structural Features to Report**:
- pLDDT at variant position
- Secondary structure (helix/sheet/coil)
- Solvent accessibility (buried/exposed)
- Distance to active site (if applicable)
- Interactions disrupted (H-bonds, salt bridges)

---

## Report File Naming

```
{GENE}_{VARIANT}_interpretation_report.md

Examples:
BRCA1_c.5266dupC_interpretation_report.md
TP53_p.R273H_interpretation_report.md
```

---

## Clinical Recommendations Framework

### For Pathogenic/Likely Pathogenic

| Disease Context | Recommendations |
|-----------------|-----------------|
| Cancer predisposition | Enhanced screening, risk-reducing options |
| Pharmacogenomics | Drug dosing adjustment |
| Carrier status | Reproductive counseling |
| Predictive testing | Family cascade screening |

### For VUS

| Action | Details |
|--------|---------|
| Clinical management | Do not use for medical decisions |
| Follow-up | Reinterpret in 1-2 years |
| Research | Functional studies if available |
| Family | Segregation data valuable |

### For Benign/Likely Benign

| Action | Details |
|--------|---------|
| Clinical | Not expected to cause disease |
| Family | No cascade testing needed |
| Documentation | Include in report for completeness |

---

## See Also

- `CHECKLIST.md` - Pre-delivery verification
- `EXAMPLES.md` - Sample interpretations
- `TOOLS_REFERENCE.md` - Tool parameters and fallbacks
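
---

## Worked Example: Combining Evidence Codes

The Classification Algorithm table in Phase 6 can be sketched as a small, self-contained function. This is an illustrative sketch, not part of ToolUniverse: the names `PREFIX_STRENGTH`, `code_strength`, and `classify` are hypothetical, and the rules are only the simplified ones listed in that table, not the full ACMG/AMP 2015 rule set. Two assumptions are made here: a suffix such as `_Supporting` or `_Moderate` overrides a code's default strength, and a variant that satisfies both a pathogenic and a benign rule is reported as VUS.

```python
from collections import Counter

# Default strength implied by each code prefix (see the Phase 6 tables).
PREFIX_STRENGTH = {
    "PVS": "very_strong",
    "PS": "strong",
    "PM": "moderate",
    "PP": "supporting",
    "BA": "stand_alone",
    "BS": "strong",
    "BP": "supporting",
}


def code_strength(code):
    """Return the strength of a code such as 'PM2' or 'PS3_Moderate'."""
    base, _, modifier = code.partition("_")
    if modifier:
        # Assumption: an explicit suffix overrides the default strength.
        return modifier.lower()
    return PREFIX_STRENGTH[base.rstrip("0123456789")]


def classify(codes):
    """Apply the simplified combining rules from the Phase 6 table."""
    path = Counter(code_strength(c) for c in codes if c.startswith("P"))
    ben = Counter(code_strength(c) for c in codes if c.startswith("B"))
    vs, s = path["very_strong"], path["strong"]
    m, p = path["moderate"], path["supporting"]

    is_pathogenic = (vs >= 1 and s >= 1) or s >= 2 or (s >= 1 and m >= 3)
    is_likely_path = (vs >= 1 and m >= 1) or (s >= 1 and m >= 2) or (s >= 1 and p >= 2)
    is_benign = ben["stand_alone"] >= 1 or ben["strong"] >= 2
    is_likely_benign = (
        (ben["strong"] >= 1 and ben["supporting"] >= 1) or ben["supporting"] >= 2
    )

    path_call = (
        "Pathogenic" if is_pathogenic
        else "Likely Pathogenic" if is_likely_path
        else None
    )
    benign_call = (
        "Benign" if is_benign
        else "Likely Benign" if is_likely_benign
        else None
    )

    if path_call and benign_call:
        return "VUS"  # Assumption: conflicting evidence stays uncertain
    return path_call or benign_call or "VUS"
```

For example, `classify(["PVS1", "PM2"])` returns `"Likely Pathogenic"`, while `classify(["PVS1", "PM2_Supporting"])` returns `"VUS"` under these simplified rules, which shows how much the PM2 strength assignment matters for the final call.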