---
name: tooluniverse-structural-variant-analysis
description: Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
---

# Structural Variant Analysis Workflow

Systematic analysis of structural variants (deletions, duplications, inversions, translocations, complex rearrangements) for clinical genomics interpretation using ACMG-adapted criteria.

**KEY PRINCIPLES**:
1. **Report-first approach** - Create SV_analysis_report.md FIRST, then populate progressively
2. **ACMG-style classification** - Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign with explicit evidence
3. **Evidence grading** - Grade all findings by confidence level (★★★/★★☆/★☆☆)
4. **Dosage sensitivity critical** - Gene dosage effects drive SV pathogenicity
5. **Breakpoint precision matters** - Exact gene disruption vs dosage-only effects
6. **Population context essential** - gnomAD SVs for frequency assessment
7. **English-first queries** - Always use English terms in tool calls (gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language.

---

## Problem This Skill Solves

Structural variants (SVs) present unique interpretation challenges:

1. **Complex molecular consequences** - SVs can cause gene dosage changes, gene disruption, gene fusions, position effects
2. **Size matters** - Pathogenicity depends on size, gene content, and breakpoint precision
3. **Limited databases** - Fewer curated SVs in ClinVar compared to SNVs
4. **Dosage sensitivity** - Haploinsufficiency and triplosensitivity are critical but gene-specific
5. **Population frequency** - Large benign CNVs are common; distinguishing pathogenic from benign is challenging

**This skill provides**: A systematic workflow integrating SV classification, gene content analysis, dosage sensitivity assessment, population frequencies, and ACMG-adapted criteria into clinically actionable interpretations.

---

## Triggers

Use this skill when users:
- Ask about structural variant interpretation
- Have CNV data from array or sequencing
- Ask "is this deletion/duplication pathogenic?"
- Need ACMG classification for SVs
- Want to assess gene dosage effects
- Ask about chromosomal rearrangements
- Have large-scale genomic alterations requiring interpretation

---

## Workflow Overview

```
┌─────────────────────────────────────────────────────────────────┐
│              STRUCTURAL VARIANT INTERPRETATION                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: SV IDENTITY & CLASSIFICATION                          │
│  ├── Normalize SV coordinates (hg19/hg38)                       │
│  ├── Determine SV type (DEL/DUP/INV/TRA/CPX)                   │
│  ├── Calculate SV size                                          │
│  └── Assess breakpoint precision                                │
│                                                                  │
│  Phase 2: GENE CONTENT ANALYSIS                                  │
│  ├── Identify genes fully contained in SV                       │
│  ├── Identify genes with breakpoints (disrupted)                │
│  ├── Annotate gene function and disease associations            │
│  ├── Identify regulatory elements affected                      │
│  └── Assess gene orientation (for inversions/translocations)    │
│                                                                  │
│  Phase 3: DOSAGE SENSITIVITY ASSESSMENT                          │
│  ├── ClinGen dosage sensitivity scores                          │
│  │   └─ Haploinsufficiency / Triplosensitivity ratings          │
│  ├── DECIPHER haploinsufficiency predictions                    │
│  ├── pLI scores (gnomAD) for loss-of-function intolerance       │
│  ├── OMIM gene-disease associations (dominant/recessive)        │
│  └── Known dosage-sensitive genes from literature               │
│                                                                  │
│  Phase 4: POPULATION FREQUENCY CONTEXT                           │
│  ├── gnomAD SV database (overlapping SVs)                       │
│  ├── DGV (Database of Genomic Variants)                         │
│  ├── ClinVar (known pathogenic/benign SVs)                      │
│  └── Calculate reciprocal overlap with population SVs           │
│                                                                  │
│  Phase 5: PATHOGENICITY SCORING                                  │
│  ├── Pathogenicity score (0-10 scale)                           │
│  │   ├─ Gene content weight (40%)                               │
│  │   ├─ Dosage sensitivity weight (30%)                         │
│  │   ├─ Population frequency weight (20%)                       │
│  │   └─ Inheritance/phenotype match weight (10%)                │
│  ├── Apply ACMG SV criteria                                     │
│  └── Generate classification recommendation                      │
│                                                                  │
│  Phase 6: LITERATURE & CLINICAL EVIDENCE                         │
│  ├── PubMed: Similar SVs, gene disruption studies               │
│  ├── DECIPHER: Developmental disorder cases                     │
│  ├── Clinical case reports                                      │
│  └── Functional evidence for gene dosage effects                │
│                                                                  │
│  Phase 7: ACMG-ADAPTED CLASSIFICATION                            │
│  ├── Apply SV-specific evidence codes                           │
│  ├── Calculate final classification                             │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```
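The report-first principle and the seven phases above could be bootstrapped roughly as follows. This is a minimal sketch: the file name comes from the key principles, while the `[Researching...]` placeholder and the helper names `create_report_skeleton` and `fill_section` are assumptions made for illustration.

```python
from pathlib import Path

# Phase titles copied from the workflow overview.
PHASES = [
    "SV Identity & Classification",
    "Gene Content Analysis",
    "Dosage Sensitivity Assessment",
    "Population Frequency Context",
    "Pathogenicity Scoring",
    "Literature & Clinical Evidence",
    "ACMG-Adapted Classification",
]
PLACEHOLDER = "[Researching...]"  # hypothetical marker for unfinished sections


def create_report_skeleton(sv_label, path="SV_analysis_report.md"):
    """Write a report containing one placeholder section per phase."""
    lines = [f"# Structural Variant Analysis Report: {sv_label}", ""]
    for number, title in enumerate(PHASES, start=1):
        lines += [f"## {number}. {title}", "", PLACEHOLDER, ""]
    report = Path(path)
    report.write_text("\n".join(lines), encoding="utf-8")
    return report


def fill_section(path, number, body):
    """Replace the placeholder of section `number` (1-based) with `body`."""
    report = Path(path)
    text = report.read_text(encoding="utf-8")
    heading = f"## {number}. {PHASES[number - 1]}"
    head_at = text.index(heading)
    start = text.find(PLACEHOLDER, head_at)
    next_head = text.find("\n## ", head_at + len(heading))
    # The placeholder must sit inside this section, not in a later one.
    if start == -1 or (next_head != -1 and start > next_head):
        raise ValueError(f"Section {number} has no placeholder left")
    text = text[:start] + body + text[start + len(PLACEHOLDER):]
    report.write_text(text, encoding="utf-8")
```

Calling `create_report_skeleton(...)` once at the start and `fill_section(...)` as each phase completes keeps the report readable even if the analysis stops partway through.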

---

## Phase Details

### Phase 1: SV Identity & Classification

**Goal**: Standardize SV notation and classify type

**SV Types**:
| Type | Abbreviation | Description | Molecular Effect |
|------|--------------|-------------|------------------|
| **Deletion** | DEL | Loss of genomic segment | Haploinsufficiency, gene disruption |
| **Duplication** | DUP | Gain of genomic segment | Triplosensitivity, gene dosage imbalance |
| **Inversion** | INV | Segment flipped in orientation | Gene disruption at breakpoints, position effects |
| **Translocation** | TRA | Segment moved to different chromosome | Gene fusions, disruption, position effects |
| **Complex** | CPX | Multiple rearrangement types | Variable effects |

**Key Information to Capture**:
- Chromosome(s) involved
- Coordinates (start, end) in hg19/hg38
- SV size (bp, kb, or Mb)
- SV type (DEL/DUP/INV/TRA/CPX)
- Breakpoint precision (±50bp, ±1kb, etc.)
- Inheritance pattern (de novo, inherited, unknown)

**Example**:
```
SV: arr[GRCh38] 17q21.31(44039927-44352659)x1
- Type: Deletion (heterozygous)
- Size: 313 kb
- Genes: MAPT, KANSL1 (fully contained)
- Breakpoints: Well-defined (array resolution ±5kb)
```
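Array-style notation like the example above can be turned into the fields listed under "Key Information to Capture". The sketch below is illustrative only: the regular expression covers the simple single-segment form shown here, not full ISCN syntax, and `ArraySV` and `parse_array_notation` are hypothetical names. It also assumes a diploid baseline of two copies when mapping copy number to DEL/DUP.

```python
import re
from dataclasses import dataclass

# Matches e.g. "arr[GRCh38] 17q21.31(44039927-44352659)x1"
ARRAY_PATTERN = re.compile(
    r"arr\[(?P<build>[^\]]+)\]\s*"
    r"(?P<chrom>[0-9]{1,2}|[XY])(?P<band>(?:[pq][0-9.]+)+)?"
    r"\((?P<start>[0-9,]+)[-_](?P<end>[0-9,]+)\)"
    r"x(?P<copies>[0-9]+)"
)


@dataclass
class ArraySV:
    build: str
    chrom: str
    start: int
    end: int
    copies: int

    @property
    def size_bp(self):
        # Inclusive coordinates: both start and end bases are counted.
        return self.end - self.start + 1

    @property
    def sv_type(self):
        # Assumption: 2 copies is the normal state (autosomes).
        if self.copies < 2:
            return "DEL"
        if self.copies > 2:
            return "DUP"
        return "NORMAL"


def parse_array_notation(text):
    """Parse a single-segment array notation string into an ArraySV."""
    match = ARRAY_PATTERN.search(text)
    if match is None:
        raise ValueError(f"Unrecognized array notation: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("Start coordinate is greater than end coordinate")
    return ArraySV(match["build"], match["chrom"], start, end, int(match["copies"]))


sv = parse_array_notation("arr[GRCh38] 17q21.31(44039927-44352659)x1")
print(sv.chrom, sv.sv_type, f"{sv.size_bp / 1000:.0f} kb")
```

Sex chromosomes and mosaic calls need a different copy-number baseline, so the DEL/DUP mapping should be treated as a starting point only.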

---

### Phase 2: Gene Content Analysis

**Goal**: Comprehensive annotation of genes affected by SV

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `Ensembl_lookup_gene` | Gene structure, coordinates | Gene boundaries, exons, transcripts |
| `NCBI_gene_search` | Gene information | Official symbol, aliases, description |
| `Gene_Ontology_get_term_info` | Gene function | Biological process, molecular function |
| `OMIM_search`, `OMIM_get_entry` | Disease associations | Inheritance, clinical features |
| `DisGeNET_search_gene` | Gene-disease associations | Evidence scores |

**Gene Categories**:

1. **Fully contained genes** - Entire gene within SV boundaries
   - Deletion: Complete loss of one copy (haploinsufficiency)
   - Duplication: Extra copy (triplosensitivity)

2. **Partially disrupted genes** - Breakpoint within gene
   - Likely loss-of-function for affected allele
   - Check if critical domains disrupted

3. **Flanking genes** - Within 1 Mb of breakpoints
   - May be affected by position effects
   - Regulatory disruption possible

**Example Gene Content Analysis**:

```python
def analyze_gene_content(tu, chrom, sv_start, sv_end, sv_type, genes_in_region):
    """
    Identify and annotate all genes within SV region.

    genes_in_region: list of dicts with 'symbol', 'start' and 'end' for genes
    on `chrom` within 1 Mb of the SV (for example from an Ensembl region
    lookup; how the list is fetched depends on the available tools).
    """
    genes = {
        'fully_contained': [],
        'partially_disrupted': [],
        'flanking': []
    }

    for gene in genes_in_region:
        gene_start = gene['start']
        gene_end = gene['end']

        # Classify gene relationship to SV
        if gene_start >= sv_start and gene_end <= sv_end:
            # Fully contained
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['fully_contained'].append(gene_info)

        elif (gene_start < sv_start < gene_end) or (gene_start < sv_end < gene_end):
            # Partially disrupted
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['partially_disrupted'].append(gene_info)

        elif abs(gene_start - sv_end) < 1000000 or abs(gene_end - sv_start) < 1000000:
            # Flanking (within 1 Mb)
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['flanking'].append(gene_info)

    return genes

def annotate_gene(tu, gene_symbol):
    """
    Comprehensive gene annotation.
    """
    # OMIM associations
    omim = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    # DisGeNET associations
    disgenet = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=10
    )

    # NCBI gene record (supplies the gene ID that a Gene Ontology
    # lookup would need)
    ncbi = tu.tools.NCBI_gene_search(
        term=gene_symbol,
        organism="human"
    )

    return {
        'symbol': gene_symbol,
        'omim': omim,
        'disgenet': disgenet,
        'ncbi': ncbi
    }
```

**Report Section**:
```markdown
### 2.1 Fully Contained Genes (Complete Dosage Effect)

| Gene | Function | Disease Association | Inheritance | Evidence |
|------|----------|---------------------|-------------|----------|
| **MAPT** | Microtubule-associated protein tau | Frontotemporal dementia (AD) | Autosomal Dominant | ★★★ |
| **KANSL1** | Histone acetyltransferase complex | Koolen-De Vries syndrome (AD) | Autosomal Dominant | ★★★ |

**Interpretation**: Deletion results in haploinsufficiency of two dosage-sensitive genes. KANSL1 haploinsufficiency is the primary cause of pathogenicity.

*Sources: OMIM, DisGeNET, Ensembl*

### 2.2 Partially Disrupted Genes (Breakpoint Within Gene)

| Gene | Breakpoint Location | Effect | Critical Domains Lost |
|------|-------------------|--------|----------------------|
| **NF1** | Intron 28 of 58 | 5' portion deleted | Yes - GTPase-activating domain |

**Interpretation**: Breakpoint disrupts NF1 coding sequence, likely resulting in loss-of-function. NF1 is haploinsufficient (causes neurofibromatosis type 1).

### 2.3 Flanking Genes (Potential Position Effects)

| Gene | Distance from SV | Regulatory Risk | Evidence |
|------|------------------|-----------------|----------|
| **KCNJ2** | 450 kb upstream | Low | ★☆☆ |

**Note**: Position effects are possible but less common. Consider if phenotype unexplained by contained genes.
```

---

### Phase 3: Dosage Sensitivity Assessment

**Goal**: Determine if affected genes are dosage-sensitive

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `ClinGen_search_dosage_sensitivity` | Gold standard curation | HI/TS scores (0-3) |
| `ClinGen_search_gene_validity` | Gene-disease validity | Definitive/Strong/Moderate |
| `gnomad_search` (pLI) | Loss-of-function intolerance | pLI score (0-1) |
| `DECIPHER_search` | Developmental disorders | Patient phenotypes with similar SVs |
| `OMIM_get_entry` | Inheritance pattern | AD suggests possible dosage sensitivity; AR suggests heterozygous loss is tolerated |

**ClinGen Dosage Sensitivity Scores**:

| Score | Haploinsufficiency (HI) | Triplosensitivity (TS) | Interpretation |
|-------|------------------------|------------------------|----------------|
| **3** | Sufficient evidence | Sufficient evidence | Gene IS dosage-sensitive |
| **2** | Emerging evidence | Emerging evidence | Likely dosage-sensitive |
| **1** | Little evidence | Little evidence | Insufficient evidence |
| **0** | No evidence | No evidence | No established dosage sensitivity |

**pLI Score Interpretation** (gnomAD):
| pLI Range | Interpretation | LoF Intolerance |
|-----------|----------------|-----------------|
| **≥0.9** | Extremely intolerant | High - likely haploinsufficient |
| **0.5-0.9** | Moderately intolerant | Moderate |
| **<0.5** | Tolerant | Low - likely NOT haploinsufficient |
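
The two lookup tables above can be expressed as small helpers. This is a sketch under stated assumptions: the function names are hypothetical, scores may arrive as integers or digit strings, and only the 0-3 values listed in the table are labeled.

```python
# Labels copied from the ClinGen dosage sensitivity table above.
CLINGEN_LABELS = {
    3: "Sufficient evidence",
    2: "Emerging evidence",
    1: "Little evidence",
    0: "No evidence",
}


def interpret_clingen_score(score):
    """Map a ClinGen HI/TS score (int or digit string) to its label."""
    if score is None:
        return "Not curated"
    try:
        value = int(str(score).strip())
    except ValueError:
        return "Unrecognized score"
    return CLINGEN_LABELS.get(value, "Outside the 0-3 scale")


def interpret_pli(pli):
    """Map a gnomAD pLI value (0-1) to the categories in the pLI table."""
    if pli is None:
        return "Unknown"
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must be between 0 and 1")
    if pli >= 0.9:
        return "Extremely intolerant"
    if pli >= 0.5:
        return "Moderately intolerant"
    return "Tolerant"


print(interpret_clingen_score("3"), "/", interpret_pli(0.99))
```

Normalizing the score through `int(str(score))` avoids silent mismatches when one data source returns `3` and another returns `'3'`.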

**Implementation**:

```python
def assess_dosage_sensitivity(tu, gene_list):
    """
    Assess dosage sensitivity for all genes in SV.
    Returns dosage scores and interpretation.
    """
    dosage_data = []

    for gene_symbol in gene_list:
        # 1. ClinGen dosage sensitivity (gold standard)
        clingen = tu.tools.ClinGen_search_dosage_sensitivity(
            gene=gene_symbol
        )

        hi_score = None
        ts_score = None
        if clingen.get('data'):
            for entry in clingen['data']:
                hi_score = entry.get('Haploinsufficiency Score')
                ts_score = entry.get('Triplosensitivity Score')
                break

        # 2. ClinGen gene validity (supports dosage sensitivity)
        validity = tu.tools.ClinGen_search_gene_validity(
            gene=gene_symbol
        )

        validity_level = None
        if validity.get('data'):
            for entry in validity['data']:
                validity_level = entry.get('Classification')
                break

        # 3. pLI score from gnomAD (if available via gene search)
        # Note: May need to use myvariant or other tools
        # pli_score = get_pli_score(tu, gene_symbol)

        # 4. OMIM inheritance pattern
        omim = tu.tools.OMIM_search(
            operation="search",
            query=gene_symbol,
            limit=3
        )

        inheritance_pattern = None
        if omim.get('data', {}).get('entries'):
            for entry in omim['data']['entries']:
                mim = entry.get('mimNumber')
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim)
                )
                # Extract inheritance from details
                # inheritance_pattern = parse_inheritance(details)

        # Integrate evidence
        dosage_assessment = {
            'gene': gene_symbol,
            'hi_score': hi_score,
            'ts_score': ts_score,
            'validity_level': validity_level,
            'inheritance': inheritance_pattern,
            'is_dosage_sensitive': (hi_score == '3' or ts_score == '3'),
            'evidence_grade': calculate_evidence_grade(hi_score, ts_score, validity_level)
        }

        dosage_data.append(dosage_assessment)

    return dosage_data

def calculate_evidence_grade(hi_score, ts_score, validity):
    """
    Calculate evidence grade for dosage sensitivity.
    """
    if (hi_score == '3' or ts_score == '3') and validity == 'Definitive':
        return '★★★'  # High confidence
    elif (hi_score in ['2', '3'] or ts_score in ['2', '3']):
        return '★★☆'  # Moderate confidence
    else:
        return '★☆☆'  # Low confidence
```

**Report Section**:
```markdown
### 3. Dosage Sensitivity Assessment

#### Haploinsufficient Genes (Deletions/Disruptions)

| Gene | ClinGen HI Score | pLI | Validity | Disease | Evidence |
|------|-----------------|-----|----------|---------|----------|
| **KANSL1** | 3 (Sufficient) | 0.99 | Definitive | Koolen-De Vries syndrome | ★★★ |
| **MAPT** | 2 (Emerging) | 0.85 | Strong | FTD (rare) | ★★☆ |

**Interpretation**: KANSL1 has definitive evidence for haploinsufficiency. Deletion of one copy is expected to cause Koolen-De Vries syndrome (intellectual disability, hypotonia, distinctive facial features).

*Sources: ClinGen Dosage Sensitivity Map, gnomAD pLI*

#### Triplosensitive Genes (Duplications)

| Gene | ClinGen TS Score | Disease Mechanism | Evidence |
|------|-----------------|-------------------|----------|
| **MECP2** | 3 (Sufficient) | MECP2 duplication syndrome | ★★★ |
| **PMP22** | 3 (Sufficient) | Charcot-Marie-Tooth 1A | ★★★ |

**Note**: For this deletion, triplosensitivity is not applicable. Listed for reference.

#### Non-Dosage-Sensitive Genes

| Gene | HI Score | TS Score | Interpretation |
|------|----------|----------|----------------|
| **GENE_X** | 0 | 0 | No established dosage sensitivity |
| **GENE_Y** | 1 | 1 | Insufficient evidence |

**Interpretation**: These genes lack evidence for dosage sensitivity. Deletion/duplication less likely to be pathogenic solely due to these genes.
```

---

### Phase 4: Population Frequency Context

**Goal**: Determine if SV is common in general population (likely benign) or rare (supports pathogenicity)

**Tools**:
| Tool | Purpose | Key Data |
|------|---------|----------|
| `gnomad_search` | Population SV frequencies | Overlapping SVs, frequencies |
| `ClinVar_search_variants` | Known pathogenic/benign SVs | Classification, review status |
| `DECIPHER_search` | Patient SVs with phenotypes | Case reports, phenotype similarity |

**Frequency Interpretation** (adapted from ACMG):

| SV Frequency | ACMG Code | Interpretation |
|--------------|-----------|----------------|
| **≥1% in gnomAD SVs** | BA1 (Stand-alone Benign) | Too common for rare disease |
| **0.1-1%** | BS1 (Strong Benign) | Likely benign common variant |
| **<0.01%** | PM2 (Moderate Pathogenic) | Rare, supports pathogenicity |
| **Absent** | PM2 (Moderate) | Very rare, supports pathogenicity |
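
As a rough illustration, the frequency bands in this table could be applied like this. The function name is hypothetical, frequencies are assumed to be fractions (0.01 means 1%), and the 0.01-0.1% band, which the table does not assign, deliberately returns `None`.

```python
def frequency_to_acmg_code(frequency):
    """Return the ACMG code suggested by the frequency table, or None.

    frequency: allele frequency as a fraction, or None if the SV is
    absent from the population database.
    """
    if frequency is None or frequency == 0:
        return "PM2"   # absent
    if frequency >= 0.01:
        return "BA1"   # >= 1%
    if frequency >= 0.001:
        return "BS1"   # 0.1% to < 1%
    if frequency < 0.0001:
        return "PM2"   # < 0.01%
    return None        # 0.01% to < 0.1%: not covered by the table


for freq in (None, 0.02, 0.005, 0.0005, 0.00005):
    print(freq, frequency_to_acmg_code(freq))
```

Returning `None` for the uncovered band keeps the gap visible to the caller instead of silently picking a code.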

**Reciprocal Overlap Calculation**:

For proper comparison, calculate reciprocal overlap between query SV and population SV:

```
Reciprocal Overlap = min(overlap_with_A, overlap_with_B)
where:
  overlap_with_A = (overlap length) / (SV_A length)
  overlap_with_B = (overlap length) / (SV_B length)

Threshold: ≥70% reciprocal overlap = "same" SV
```
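
The formula above translates directly into code. A minimal sketch, assuming both SVs are on the same chromosome, are of the same type, and that length is computed as `end - start`; the function names are illustrative.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the reciprocal overlap (0.0-1.0) of two intervals."""
    len_a = end_a - start_a
    len_b = end_b - start_b
    if len_a <= 0 or len_b <= 0:
        raise ValueError("Each interval must have end > start")
    overlap = min(end_a, end_b) - max(start_a, start_b)
    if overlap <= 0:
        return 0.0  # intervals do not overlap
    # The smaller of the two fractions is the reciprocal overlap.
    return min(overlap / len_a, overlap / len_b)


def is_same_sv(start_a, end_a, start_b, end_b, threshold=0.7):
    """Apply the >=70% reciprocal-overlap rule from the text."""
    return reciprocal_overlap(start_a, end_a, start_b, end_b) >= threshold


# A small SV nested inside a much larger one fails the rule even though
# the small SV is 100% covered, which is the point of the reciprocal check.
print(reciprocal_overlap(1_000, 2_000, 0, 10_000))
print(is_same_sv(1_000, 2_000, 0, 10_000))
```

Chromosome and SV type must be compared separately before calling these helpers, since the interval arithmetic alone cannot tell a deletion from a duplication.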

**Implementation**:

```python
def assess_population_frequency(tu, chrom, sv_start, sv_end, sv_type):
    """
    Check population databases for overlapping SVs.
    """
    # 1. Check ClinVar for known pathogenic/benign SVs
    clinvar = tu.tools.ClinVar_search_variants(
        chromosome=str(chrom),
        start=sv_start,
        stop=sv_end,
        variant_type=sv_type.upper()
    )

    known_svs = []
    if clinvar.get('data'):
        for variant in clinvar['data']:
            classification = variant.get('clinical_significance')
            known_svs.append({
                'database': 'ClinVar',
                'classification': classification,
                'review_status': variant.get('review_status'),
                'coordinates': f"{variant.get('chromosome')}:{variant.get('start')}-{variant.get('stop')}"
            })

    # 2. gnomAD SVs (if available)
    # Note: gnomAD SV database may not have direct API access via ToolUniverse
    # May need to use genomic coordinate search

    # 3. DECIPHER for similar patient cases
    decipher_search = tu.tools.DECIPHER_search(
        query=f"chr{chrom}:{sv_start}-{sv_end}",
        search_type="region"
    )

    patient_cases = []
    if decipher_search.get('data'):
        patient_cases = decipher_search['data']

    return {
        'clinvar_matches': known_svs,
        'decipher_cases': patient_cases,
        'frequency_interpretation': interpret_frequency(known_svs)
    }

def interpret_frequency(known_svs):
    """
    Interpret frequency based on ClinVar matches.
    """
    if any(sv['classification'] == 'Benign' for sv in known_svs):
        return {
            'acmg_code': 'BA1 or BS1',
            'interpretation': 'Likely benign based on ClinVar benign classification',
            'evidence_grade': '★★★'
        }
    elif any(sv['classification'] == 'Pathogenic' for sv in known_svs):
        return {
            'acmg_code': 'PS1',
            'interpretation': 'Pathogenic based on ClinVar pathogenic classification',
            'evidence_grade': '★★★'
        }
    else:
        return {
            'acmg_code': 'PM2',
            'interpretation': 'Rare variant, not found in ClinVar or population databases',
            'evidence_grade': '★★☆'
        }
```

**Report Section**:
```markdown
### 4. Population Frequency Context

#### ClinVar Matches (Overlapping SVs)

| VCV ID | Classification | Size | Overlap | Review Status | Genes |
|--------|----------------|------|---------|---------------|-------|
| VCV000012345 | Pathogenic | 320 kb | 95% reciprocal | ★★★ Reviewed by expert panel | KANSL1, MAPT |

**Match Found**: Query deletion has 95% reciprocal overlap with known pathogenic deletion in ClinVar (VCV000012345). This is the Koolen-De Vries syndrome deletion.

**ACMG Code**: **PS1** (Strong) - Same genomic region as established pathogenic SV

*Source: ClinVar via `ClinVar_search_variants`*

#### gnomAD SV Database

**Search Result**: No overlapping deletions found in gnomAD SV v4.0 (>10,000 genomes)

**Interpretation**: Absence from gnomAD supports rarity and pathogenic potential.

**ACMG Code**: **PM2** (Moderate) - Absent from population databases

*Note: gnomAD SVs queried via browser (no direct API access)*

#### DECIPHER Patient Cases

| Case ID | Phenotype | SV Type | Size | Overlap | Similarity |
|---------|-----------|---------|------|---------|------------|
| 12345 | Intellectual disability, hypotonia | DEL | 315 kb | 98% | High |
| 67890 | Developmental delay, facial dysmorphism | DEL | 305 kb | 92% | High |

**Phenotype Match**: 8/10 DECIPHER patients have intellectual disability and hypotonia, consistent with Koolen-De Vries syndrome.

**ACMG Support**: **PP4** (Supporting) - Patient phenotype consistent with gene's disease association

*Source: DECIPHER via `DECIPHER_search`*
```

---

### Phase 5: Pathogenicity Scoring

**Goal**: Quantitative pathogenicity assessment (0-10 scale)

**Scoring Components** (raw points out of 100; the total is divided by 10 and clamped to the 0-10 scale):

1. **Gene Content (40 points max)**:
   - 10 points per dosage-sensitive gene (HI/TS score 3)
   - 5 points per likely dosage-sensitive gene (score 2)
   - 2 points per gene with disease association
   - Cap at 40 points

2. **Dosage Sensitivity Evidence (30 points max)**:
   - 30 points: Multiple genes with definitive HI/TS (score 3)
   - 20 points: One gene with definitive HI/TS
   - 10 points: Genes with emerging evidence (score 2)
   - 5 points: Predicted haploinsufficiency (pLI >0.9)

3. **Population Frequency (20 points max)**:
   - 20 points: Absent from gnomAD, DGV
   - 10 points: Rare (<0.01%)
   - 5 points: Uncommon (0.01-0.1%)
   - 0 points: Common (0.1-1%)
   - -20 points: Very common (>1%) - likely benign

4. **Clinical Evidence (10 points max)**:
   - 10 points: Matching ClinVar pathogenic SV
   - 8 points: DECIPHER cases with matching phenotype
   - 5 points: Literature support for gene dosage effects
   - 3 points: Phenotype consistent with genes

**Pathogenicity Score Interpretation**:

| Score | Classification | Confidence | Interpretation |
|-------|----------------|------------|----------------|
| **≥9** | Pathogenic | ★★★ | High confidence pathogenic |
| **7 to <9** | Likely Pathogenic | ★★☆ | Strong evidence for pathogenicity |
| **4 to <7** | VUS | ★☆☆ | Uncertain significance |
| **2 to <4** | Likely Benign | ★★☆ | Strong evidence for benign |
| **<2** | Benign | ★★★ | High confidence benign |

**Implementation**:

```python
def calculate_pathogenicity_score(gene_content, dosage_data, frequency_data, clinical_data):
    """
    Calculate comprehensive pathogenicity score (0-10 scale).
    """
    score = 0
    breakdown = {}

    # 1. Gene content scoring (40 points max)
    gene_score = 0
    for gene in gene_content['fully_contained'] + gene_content['partially_disrupted']:
        dosage_info = next((d for d in dosage_data if d['gene'] == gene['symbol']), None)
        if dosage_info:
            if dosage_info['hi_score'] == '3':
                gene_score += 10
            elif dosage_info['hi_score'] == '2':
                gene_score += 5
            elif gene.get('omim_disease'):
                gene_score += 2

    gene_score = min(gene_score, 40)  # Cap at 40
    breakdown['gene_content'] = gene_score / 40 * 4  # Scale to 0-4

    # 2. Dosage sensitivity scoring (30 points max)
    dosage_score = 0
    definitive_genes = sum(1 for d in dosage_data if d['hi_score'] == '3')

    if definitive_genes >= 2:
        dosage_score = 30
    elif definitive_genes == 1:
        dosage_score = 20
    else:
        # Emerging evidence (score 2) is worth a flat 10 points, per the rubric
        emerging_genes = sum(1 for d in dosage_data if d['hi_score'] == '2')
        dosage_score = 10 if emerging_genes else 0

    dosage_score = min(dosage_score, 30)
    breakdown['dosage_sensitivity'] = dosage_score / 30 * 3  # Scale to 0-3

    # 3. Population frequency scoring (20 points max)
    freq_score = 0
    if frequency_data.get('frequency') is None:
        freq_score = 20  # Absent
    elif frequency_data['frequency'] < 0.0001:
        freq_score = 10  # Rare
    elif frequency_data['frequency'] < 0.001:
        freq_score = 5  # Uncommon
    elif frequency_data['frequency'] > 0.01:
        freq_score = -20  # Common - likely benign

    breakdown['population_frequency'] = freq_score / 20 * 2  # Scale to -2 to 2

    # 4. Clinical evidence scoring (10 points max)
    clinical_score = 0
    if clinical_data.get('clinvar_pathogenic'):
        clinical_score = 10
    elif clinical_data.get('decipher_matching_phenotype'):
        clinical_score = 8
    elif clinical_data.get('literature_support'):
        clinical_score = 5

    clinical_score = min(clinical_score, 10)
    breakdown['clinical_evidence'] = clinical_score / 10 * 1  # Scale to 0-1

    # Total score (0-10 scale)
    total_score = breakdown['gene_content'] + breakdown['dosage_sensitivity'] + \
                  breakdown['population_frequency'] + breakdown['clinical_evidence']

    # Clamp to the 0-10 range, then round once so the reported score and
    # its classification always agree
    total_score = round(max(0, min(10, total_score)), 1)

    return {
        'total_score': total_score,
        'breakdown': breakdown,
        'classification': classify_score(total_score)
    }

def classify_score(score):
    """Map score to ACMG-style classification."""
    if score >= 9:
        return 'Pathogenic'
    elif score >= 7:
        return 'Likely Pathogenic'
    elif score >= 4:
        return 'VUS'
    elif score >= 2:
        return 'Likely Benign'
    else:
        return 'Benign'
```

**Report Section**:
```markdown
### 5. Pathogenicity Scoring

#### Quantitative Assessment (0-10 Scale)

| Component | Points | Max | Contribution | Rationale |
|-----------|--------|-----|-------------|-----------|
| **Gene Content** | 4.0 | 4 | 40% | KANSL1 (HI score 3), MAPT (HI score 2) |
| **Dosage Sensitivity** | 2.5 | 3 | 25% | One definitive HI gene (KANSL1) |
| **Population Frequency** | 2.0 | 2 | 20% | Absent from gnomAD SVs |
| **Clinical Evidence** | 1.0 | 1 | 10% | ClinVar pathogenic match |
| **Total Score** | **9.5** | 10 | 95% | |

**Classification**: **Pathogenic** (★★★ High Confidence)

**Interpretation**: Score of 9.5/10 indicates high confidence pathogenic SV. Deletion encompasses established haploinsufficient gene (KANSL1), absent from population databases, and matches known pathogenic ClinVar variant.

#### Score Breakdown Visualization

~~~
Gene Content:        ████████████████████████████████████████ 4.0/4
Dosage Sensitivity:  █████████████████████████████████░░░░░░░ 2.5/3
Population Freq:     ████████████████████████████████████████ 2.0/2
Clinical Evidence:   ████████████████████████████████████████ 1.0/1
                     ─────────────────────────────────────────
Total:               ██████████████████████████████████████░░ 9.5/10
~~~

**Key Drivers of Pathogenicity**:
1. KANSL1 haploinsufficiency (definitive evidence)
2. Exact match to known pathogenic deletion
3. Absence from population databases
4. Phenotype consistency with Koolen-De Vries syndrome
```

---

### Phase 6: Literature & Clinical Evidence

**Goal**: Find case reports, functional studies, and clinical validation

**Tools**:
| Tool | Purpose | Coverage |
|------|---------|----------|
| `PubMed_search` | Peer-reviewed literature | Comprehensive |
| `DECIPHER_search` | Patient case database | Developmental disorders |
| `EuropePMC_search` | European literature | Additional coverage |

**Search Strategies**:

```python
def comprehensive_literature_search(tu, genes, sv_type, phenotype):
    """
    Search literature for SV evidence.
    """
    # 1. Gene-specific searches
    literature = []
    for gene in genes:
        # Dosage sensitivity literature
        dosage_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND (haploinsufficiency OR dosage sensitivity OR deletion syndrome)',
            max_results=20
        )

        # Case reports
        case_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND deletion AND {phenotype}',
            max_results=15
        )

        literature.append({
            'gene': gene,
            'dosage_papers': dosage_papers,
            'case_reports': case_papers
        })

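    # 2. SV-specific searches (only deletions are handled here; other SV
    # types leave sv_papers as None so the return statement stays valid)
    sv_papers = None
    if sv_type == 'DEL':
        sv_papers = tu.tools.PubMed_search(
            query=f'deletion AND {" AND ".join(genes[:3])} AND syndrome',
            max_results=25
        )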

    # 3. DECIPHER cases
    decipher_cases = []
    for gene in genes:
        cases = tu.tools.DECIPHER_search(
            query=gene,
            search_type="gene"
        )
        decipher_cases.append(cases)

    return {
        'gene_literature': literature,
        'sv_literature': sv_papers,
        'decipher_cases': decipher_cases
    }
```

**Report Section**:
```markdown
### 6. Literature & Clinical Evidence

#### Key Publications

| Study | Finding | Evidence Type | PMID |
|-------|---------|---------------|------|
| Koolen et al., 2006 | Described 17q21.31 microdeletion syndrome | Original description | 16222315 |
| Koolen et al., 2008 | KANSL1 haploinsufficiency confirmed | Functional validation | 18394581 |
| Zollino et al., 2012 | Phenotype characterization (n=52) | Clinical series | 22736773 |

**Key Findings**:
- 17q21.31 deletion is recurrent (mediated by LCRs)
- KANSL1 haploinsufficiency is primary mechanism
- Phenotype: ID (100%), hypotonia (95%), friendly demeanor (85%)
- Penetrance: >95% for developmental features

*Source: PubMed via `PubMed_search`*

#### DECIPHER Patient Cases (n=45)

**Phenotype Frequency in DECIPHER Cohort**:
| Feature | Frequency | Match to Patient |
|---------|-----------|------------------|
| Intellectual disability | 45/45 (100%) | ✓ Yes |
| Hypotonia | 42/45 (93%) | ✓ Yes |
| Feeding difficulties | 38/45 (84%) | ✓ Yes |
| Distinctive facies | 40/45 (89%) | ✓ Yes |
| Friendly personality | 35/45 (78%) | Unknown |

**Phenotype Match**: Patient phenotype highly consistent with DECIPHER cohort (4/4 assessable features present).

**ACMG Code**: **PP4** (Supporting) - Patient's clinical features consistent with gene's known phenotype

*Source: DECIPHER via `DECIPHER_search`*

#### Functional Evidence for KANSL1 Dosage Sensitivity

| Study | Model | Finding | PMID |
|-------|-------|---------|------|
| Koolen et al., 2012 | Patient cells | Reduced KANSL1 protein | 22736773 |
| Zollino et al., 2015 | Mouse model | Kansl1+/- recapitulates phenotype | 25607366 |
| Arbogast et al., 2017 | Zebrafish | kansl1 knockdown → developmental defects | 28666126 |

**Strength of Evidence**: ★★★ (High) - Multiple independent studies confirm haploinsufficiency mechanism

**ACMG Code**: **PS3_Moderate** - Well-established functional studies showing dosage sensitivity
```

---

### Phase 7: ACMG-Adapted Classification

**Goal**: Apply ACMG/ClinGen criteria adapted for SVs

**SV-Specific ACMG Criteria**:

#### Pathogenic Evidence Codes

| Code | Strength | Criteria | SV Application |
|------|----------|----------|----------------|
| **PVS1** | Very Strong | Null variant in HI gene | Complete deletion of HI gene |
| **PS1** | Strong | Same SV as known pathogenic | ≥70% reciprocal overlap with ClinVar pathogenic |
| **PS2** | Strong | De novo (maternity/paternity confirmed) | De novo SV in patient with matching phenotype |
| **PS3** | Strong | Functional studies | Gene dosage effects demonstrated |
| **PS4** | Strong | Case-control enrichment | SV enriched in cases vs controls |
| **PM1** | Moderate | Critical region | Deletion of exons in HI gene |
| **PM2** | Moderate | Absent from controls | Not in gnomAD SVs, DGV |
| **PM3** | Moderate | Recessive: homozygous or compound het | Both alleles affected (rare for SVs) |
| **PM4** | Moderate | Protein length change | In-frame deletion/duplication |
| **PM5** | Moderate | Similar SVs pathogenic | Nearby SVs in ClinVar pathogenic |
| **PM6** | Moderate | De novo (no confirmation) | De novo SV, phenotype consistent |
| **PP1** | Supporting | Segregation in family | SV segregates with phenotype |
| **PP2** | Supporting | Gene/pathway relevant | Genes in SV match phenotype |
| **PP3** | Supporting | Computational evidence | Multiple predictors support haploinsufficiency |
| **PP4** | Supporting | Phenotype consistent | Patient phenotype matches gene-disease |

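The PS1 row depends on a reciprocal overlap test. A minimal sketch of that calculation follows, assuming 1-based inclusive coordinates on a single chromosome. The names `reciprocal_overlap` and `meets_ps1_overlap` are illustrative helpers, not ToolUniverse tools, and the 0.70 default simply mirrors the threshold in the table.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the reciprocal overlap fraction of two intervals.

    The shared length is divided by the length of each interval and the
    smaller fraction is returned (0.0 when the intervals do not overlap).
    Coordinates are assumed to be 1-based and inclusive.
    """
    if end_a < start_a or end_b < start_b:
        raise ValueError("interval end must be >= start")
    overlap = min(end_a, end_b) - max(start_a, start_b) + 1
    if overlap <= 0:
        return 0.0
    len_a = end_a - start_a + 1
    len_b = end_b - start_b + 1
    return min(overlap / len_a, overlap / len_b)


def meets_ps1_overlap(query, known, threshold=0.70):
    """True when two (chrom, start, end) tuples reach the reciprocal threshold."""
    if query[0] != known[0]:
        return False  # different chromosomes never overlap
    return reciprocal_overlap(query[1], query[2], known[1], known[2]) >= threshold
```
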
#### Benign Evidence Codes

| Code | Strength | Criteria | SV Application |
|------|----------|----------|----------------|
| **BA1** | Stand-Alone | MAF >5% | SV frequency >5% in gnomAD |
| **BS1** | Strong | MAF too high for disease | SV frequency >1% |
| **BS2** | Strong | Observed in a healthy adult for a disorder with full penetrance expected at an early age | SV in a healthy individual (use with caution where penetrance is reduced) |
| **BS3** | Strong | Functional studies show no effect | No dosage sensitivity demonstrated |
| **BS4** | Strong | Non-segregation | SV doesn't segregate with phenotype |
| **BP1** | Supporting | Missense in gene without known LOF | N/A for SVs |
| **BP2** | Supporting | Observed in trans with a pathogenic variant (fully penetrant dominant disorder) | SV in trans with a pathogenic SNV (compound het) in an unaffected individual |
| **BP4** | Supporting | Computational evidence benign | Predictors suggest no haploinsufficiency |
| **BP5** | Supporting | Found in case with alt cause | Phenotype explained by different variant |
| **BP7** | Supporting | Synonymous with no splice effect | N/A for SVs |

**Classification Algorithm** (ACMG SV Criteria):

| Classification | Evidence Required |
|----------------|-------------------|
| **Pathogenic** | 1 Very Strong + ≥1 Strong; OR ≥2 Strong; OR 1 Strong + ≥3 Moderate |
| **Likely Pathogenic** | 1 Very Strong + ≥1 Moderate; OR 1 Strong + ≥2 Moderate; OR 1 Strong + 1 Moderate + ≥1 Supporting; OR ≥3 Moderate |
| **VUS** | Criteria not met; OR conflicting evidence |
| **Likely Benign** | 1 Strong + ≥1 Supporting; OR ≥2 Supporting |
| **Benign** | BA1; OR ≥2 Strong (e.g., BS1 + BS2) |

**Implementation**:

```python
def apply_acmg_criteria(gene_content, dosage_data, frequency_data, clinical_data, inheritance):
    """
    Apply ACMG SV criteria and calculate classification.
    """
    evidence = {
        'pathogenic': [],
        'benign': []
    }

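    # PVS1: Complete deletion of HI gene (ClinGen HI score 3).
    # Only HI genes that are themselves fully contained in the SV qualify.
    # Assumes 'fully_contained' holds gene symbols or dicts with a 'gene' key.
    contained = {g['gene'] if isinstance(g, dict) else g for g in gene_content['fully_contained']}
    hi_genes = [d for d in dosage_data if str(d.get('hi_score')) == '3' and d['gene'] in contained]
    if hi_genes: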
        evidence['pathogenic'].append({
            'code': 'PVS1',
            'strength': 'Very Strong',
            'rationale': f"Complete deletion of haploinsufficient gene(s): {', '.join(g['gene'] for g in hi_genes)}"
        })

    # PS1: Same as known pathogenic SV
    if clinical_data.get('clinvar_pathogenic_match'):
        evidence['pathogenic'].append({
            'code': 'PS1',
            'strength': 'Strong',
            'rationale': f"≥70% overlap with ClinVar pathogenic SV: {clinical_data['clinvar_id']}"
        })

    # PS2: De novo with phenotype match
    if inheritance == 'de_novo' and clinical_data.get('phenotype_match'):
        evidence['pathogenic'].append({
            'code': 'PS2',
            'strength': 'Strong',
            'rationale': "De novo occurrence in patient with consistent phenotype"
        })

    # PS3: Functional studies
    if clinical_data.get('functional_evidence'):
        evidence['pathogenic'].append({
            'code': 'PS3',
            'strength': 'Strong',
            'rationale': "Well-established functional studies demonstrate dosage sensitivity"
        })

    # PM2: Absent from controls
    if frequency_data.get('frequency') == 0 or frequency_data.get('frequency') is None:
        evidence['pathogenic'].append({
            'code': 'PM2',
            'strength': 'Moderate',
            'rationale': "Absent from gnomAD SV database and DGV"
        })

    # PP4: Phenotype consistent
    if clinical_data.get('phenotype_consistent'):
        evidence['pathogenic'].append({
            'code': 'PP4',
            'strength': 'Supporting',
            'rationale': "Patient phenotype highly consistent with gene-disease association"
        })

    # BA1: Common variant (a missing or None frequency is treated as 0)
    if (frequency_data.get('frequency') or 0) > 0.05:
        evidence['benign'].append({
            'code': 'BA1',
            'strength': 'Stand-Alone',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} too high for rare disease"
        })

    # BS1: High frequency (a missing or None frequency is treated as 0)
    if 0.01 < (frequency_data.get('frequency') or 0) <= 0.05:
        evidence['benign'].append({
            'code': 'BS1',
            'strength': 'Strong',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} exceeds expected for disease"
        })

    # Calculate classification
    classification = determine_classification(evidence)

    return {
        'evidence': evidence,
        'classification': classification['class'],
        'confidence': classification['confidence']
    }

def determine_classification(evidence):
    """
    Apply ACMG classification rules.
    """
    path = evidence['pathogenic']
    ben = evidence['benign']

    # Count evidence by strength
    very_strong = len([e for e in path if e['strength'] == 'Very Strong'])
    strong_path = len([e for e in path if e['strength'] == 'Strong'])
    moderate_path = len([e for e in path if e['strength'] == 'Moderate'])
    supporting_path = len([e for e in path if e['strength'] == 'Supporting'])

    standalone_ben = len([e for e in ben if e['strength'] == 'Stand-Alone'])
    strong_ben = len([e for e in ben if e['strength'] == 'Strong'])
    supporting_ben = len([e for e in ben if e['strength'] == 'Supporting'])

    # BA1 is stand-alone: it decides the classification on its own
    if standalone_ben >= 1:
        return {'class': 'Benign', 'confidence': '★★★'}

    # Benign criteria
    benign = None
    if strong_ben >= 2:
        benign = {'class': 'Benign', 'confidence': '★★★'}
    elif (strong_ben >= 1 and supporting_ben >= 1) or supporting_ben >= 2:
        benign = {'class': 'Likely Benign', 'confidence': '★★☆'}

    # Pathogenic criteria (Pathogenic rules are tested before Likely Pathogenic,
    # so 1 Strong + 3 Moderate is not downgraded by the 1 Strong + 2 Moderate rule)
    pathogenic = None
    if ((very_strong >= 1 and strong_path >= 1)
            or strong_path >= 2
            or (strong_path >= 1 and moderate_path >= 3)):
        pathogenic = {'class': 'Pathogenic', 'confidence': '★★★'}
    elif ((very_strong >= 1 and moderate_path >= 1)
            or (strong_path >= 1 and moderate_path >= 2)
            or (strong_path >= 1 and moderate_path >= 1 and supporting_path >= 1)):
        pathogenic = {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    elif moderate_path >= 3:
        pathogenic = {'class': 'Likely Pathogenic', 'confidence': '★☆☆'}

    # Conflicting benign and pathogenic evidence, or criteria not met: VUS
    if benign and pathogenic:
        return {'class': 'VUS', 'confidence': '★☆☆'}
    return pathogenic or benign or {'class': 'VUS', 'confidence': '★☆☆'}
```
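
The evidence dictionary built by `apply_acmg_criteria` can be tallied into the per-strength summary shown in the report. This is a sketch only, assuming each entry keeps the `code` / `strength` / `rationale` shape used above. `summarize_evidence` and `STRENGTH_ORDER` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Display order for the summary, strongest first
STRENGTH_ORDER = ["Stand-Alone", "Very Strong", "Strong", "Moderate", "Supporting"]


def summarize_evidence(evidence):
    """Group ACMG codes by strength for the pathogenic and benign sides."""
    summary = {}
    for side in ("pathogenic", "benign"):
        by_strength = defaultdict(list)
        for item in evidence.get(side, []):
            by_strength[item["strength"]].append(item["code"])
        # One line per strength level that has at least one code
        summary[side] = [
            f"{len(by_strength[s])} {s} ({', '.join(by_strength[s])})"
            for s in STRENGTH_ORDER
            if by_strength[s]
        ]
    return summary


example = {
    "pathogenic": [
        {"code": "PVS1", "strength": "Very Strong", "rationale": "Complete deletion of HI gene"},
        {"code": "PS1", "strength": "Strong", "rationale": "Overlap with ClinVar pathogenic SV"},
        {"code": "PM2", "strength": "Moderate", "rationale": "Absent from controls"},
        {"code": "PP4", "strength": "Supporting", "rationale": "Phenotype consistent"},
    ],
    "benign": [],
}
for line in summarize_evidence(example)["pathogenic"]:
    print(line)
```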

**Report Section**:
```markdown
### 7. ACMG-Adapted Classification

#### Evidence Codes Applied

**Pathogenic Evidence**:

| Code | Strength | Rationale |
|------|----------|-----------|
| **PVS1** | Very Strong | Complete deletion of haploinsufficient gene (KANSL1, HI score 3) |
| **PS1** | Strong | ≥95% overlap with ClinVar pathogenic deletion (VCV000012345) |
| **PM2** | Moderate | Absent from gnomAD SV database (>10,000 genomes) |
| **PP4** | Supporting | Patient phenotype consistent with Koolen-De Vries syndrome |

**Benign Evidence**: None

#### Evidence Summary

| Pathogenic | Benign |
|------------|--------|
| 1 Very Strong (PVS1) | None |
| 1 Strong (PS1) | |
| 1 Moderate (PM2) | |
| 1 Supporting (PP4) | |

#### Classification: **PATHOGENIC** ★★★

**Rationale**: Meets ACMG criteria for Pathogenic (1 Very Strong + 1 Strong). Complete deletion of an established haploinsufficient gene (KANSL1) with ≥95% overlap with a known pathogenic deletion.

**Confidence**: ★★★ (High) - Multiple independent lines of strong evidence

#### Classification Certainty Factors

✅ **Strengths**:
- ≥95% overlap with a well-characterized pathogenic deletion
- Complete deletion of definitive HI gene (KANSL1)
- Absent from population databases
- Phenotype highly consistent with gene-disease

⚠ **Limitations**:
- None significant - this is a well-established pathogenic SV
```

---

## Output Structure

### Report File: `SV_analysis_report.md`

```markdown
# Structural Variant Analysis Report: [SV_IDENTIFIER]

**Generated**: [Date] | **Analyst**: ToolUniverse SV Interpreter

---

## Executive Summary

| Field | Value |
|-------|-------|
| **SV Type** | Deletion / Duplication / Inversion / Translocation |
| **Coordinates** | chr17:44039927-44352659 (GRCh37) |
| **Size** | 313 kb |
| **Gene Content** | 2 genes fully contained, 0 partially disrupted |
| **Classification** | Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign |
| **Pathogenicity Score** | X.X / 10 |
| **Confidence** | ★★★ / ★★☆ / ★☆☆ |
| **Key Finding** | [One-sentence summary] |

**Clinical Action**: [Required / Recommended / None]

---

## 1. SV Identity & Classification

{SV type, coordinates, size, breakpoint precision, inheritance}

---

## 2. Gene Content Analysis

### 2.1 Fully Contained Genes
{Table of genes with functions, disease associations}

### 2.2 Partially Disrupted Genes
{Genes with breakpoints, domains affected}

### 2.3 Flanking Genes
{Genes near breakpoints, position effect risk}

---

## 3. Dosage Sensitivity Assessment

### 3.1 Haploinsufficient Genes
{ClinGen HI scores, pLI, evidence}

### 3.2 Triplosensitive Genes
{ClinGen TS scores, duplication syndromes}

### 3.3 Non-Dosage-Sensitive Genes
{Genes without established dosage effects}

---

## 4. Population Frequency Context

### 4.1 ClinVar Matches
{Known pathogenic/benign SVs}

### 4.2 gnomAD SV Database
{Population frequencies}

### 4.3 DECIPHER Patient Cases
{Similar SVs, phenotype matching}

---

## 5. Pathogenicity Scoring

### 5.1 Quantitative Assessment
{0-10 score with breakdown}

### 5.2 Score Components
{Gene content, dosage, frequency, clinical}

---

## 6. Literature & Clinical Evidence

### 6.1 Key Publications
{Functional studies, case series}

### 6.2 DECIPHER Cohort Analysis
{Phenotype frequencies, matching}

### 6.3 Functional Evidence
{Gene dosage studies}

---

## 7. ACMG-Adapted Classification

### 7.1 Evidence Codes Applied
{Pathogenic and benign codes with rationale}

### 7.2 Classification
{Final classification with confidence}

### 7.3 Certainty Factors
{Strengths and limitations}

---

## 8. Clinical Recommendations

### 8.1 For Affected Individual
{Testing, management, surveillance}

### 8.2 For Family Members
{Cascade testing, genetic counseling}

### 8.3 Reproductive Considerations
{Recurrence risk, prenatal testing}

---

## 9. Limitations & Uncertainties

{Missing data, conflicting evidence, knowledge gaps}

---

## Data Sources

{All tools and databases queried with results}
```
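
Because the workflow is report-first, the skeleton can be written to disk before any tool is called. The sketch below assumes the section titles of the template above. `create_report_skeleton`, `SECTIONS`, and the `[Pending]` placeholder are illustrative choices, not part of ToolUniverse.

```python
from datetime import date
from pathlib import Path

# Numbered sections, in the order used by the report template
SECTIONS = [
    "SV Identity & Classification",
    "Gene Content Analysis",
    "Dosage Sensitivity Assessment",
    "Population Frequency Context",
    "Pathogenicity Scoring",
    "Literature & Clinical Evidence",
    "ACMG-Adapted Classification",
    "Clinical Recommendations",
    "Limitations & Uncertainties",
]


def create_report_skeleton(sv_identifier, path="SV_analysis_report.md"):
    """Write an empty report with every section marked as pending."""
    lines = [
        f"# Structural Variant Analysis Report: {sv_identifier}",
        "",
        f"**Generated**: {date.today().isoformat()} | **Analyst**: ToolUniverse SV Interpreter",
        "",
        "---",
        "",
        "## Executive Summary",
        "",
        "[Pending]",
        "",
    ]
    for number, title in enumerate(SECTIONS, start=1):
        lines += ["---", "", f"## {number}. {title}", "", "[Pending]", ""]
    lines += ["---", "", "## Data Sources", "", "[Pending]", ""]
    report_path = Path(path)
    report_path.write_text("\n".join(lines), encoding="utf-8")
    return report_path
```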

---

## Evidence Grading System

| Symbol | Confidence | Criteria |
|--------|------------|----------|
| ★★★ | High | ClinGen definitive, ClinVar expert reviewed, multiple independent studies |
| ★★☆ | Moderate | ClinGen strong/moderate, single good study, DECIPHER cohort support |
| ★☆☆ | Limited | Computational predictions only, case reports, emerging evidence |

---

## Special Scenarios

### Scenario 1: Recurrent Microdeletion Syndrome

**Additional considerations**:
- Check for recurrence mechanism (LCRs, NAHR)
- Look for founder effects
- Population-specific frequencies
- Incomplete penetrance
- Variable expressivity

**Examples**: 22q11.2 deletion, 17q21.31 deletion (Koolen-De Vries)

### Scenario 2: Balanced Translocation (No Gene Disruption)

**Assessment approach**:
- If no genes are disrupted: likely benign in most cases
- Check for cryptic imbalances
- Consider position effects (rare)
- Reproductive risk (unbalanced offspring)

**Classification**: Usually Likely Benign or VUS; reassess if offspring are affected by an unbalanced rearrangement

### Scenario 3: Complex Rearrangement

**Analysis strategy**:
- Break down into component SVs
- Assess each breakpoint independently
- Look for chromothripsis pattern
- Consider cumulative gene dosage effects
- Check for DNA repair defects

### Scenario 4: Small In-Frame Deletion/Duplication

**Special considerations**:
- May not cause haploinsufficiency
- Check if critical domain affected
- Look for similar variants in ClinVar
- Consider protein structural impact
- May need functional studies

---

## Quantified Minimums

| Section | Requirement |
|---------|-------------|
| Gene content | All genes in SV region annotated |
| Dosage sensitivity | ClinGen scores for all genes (if available) |
| Population frequency | Check gnomAD SV + ClinVar + DGV |
| Literature search | ≥2 search strategies (PubMed + DECIPHER) |
| ACMG codes | All applicable codes listed |

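These minimums can be checked mechanically before a report is finalized. The sketch below assumes a hypothetical `analysis` dict whose keys (`genes`, `annotated_genes`, `dosage_checked`, `frequency_sources`, `search_strategies`) are invented here for illustration. The ACMG-codes row is left to manual review because "all applicable" cannot be verified automatically.

```python
REQUIRED_FREQUENCY_SOURCES = {"gnomAD SV", "ClinVar", "DGV"}
MIN_SEARCH_STRATEGIES = 2


def check_minimums(analysis):
    """Return a list of unmet minimums; an empty list means all checks pass."""
    problems = []
    genes = set(analysis.get("genes", []))

    # Gene content: every gene in the SV region must be annotated
    not_annotated = genes - set(analysis.get("annotated_genes", []))
    if not_annotated:
        problems.append("Gene content: not annotated: " + ", ".join(sorted(not_annotated)))

    # Dosage sensitivity: every gene needs a ClinGen lookup (even if no score exists)
    no_lookup = genes - set(analysis.get("dosage_checked", []))
    if no_lookup:
        problems.append("Dosage sensitivity: no ClinGen lookup for: " + ", ".join(sorted(no_lookup)))

    # Population frequency: all three sources must have been queried
    not_queried = REQUIRED_FREQUENCY_SOURCES - set(analysis.get("frequency_sources", []))
    if not_queried:
        problems.append("Population frequency: not queried: " + ", ".join(sorted(not_queried)))

    # Literature: at least two distinct search strategies
    if len(set(analysis.get("search_strategies", []))) < MIN_SEARCH_STRATEGIES:
        problems.append("Literature search: fewer than 2 search strategies")

    return problems
```
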
---

## Tools Reference

### Core Tools for SV Analysis

| Tool | Purpose | Required? |
|------|---------|-----------|
| `ClinGen_search_dosage_sensitivity` | HI/TS scores | **Required** |
| `ClinGen_search_gene_validity` | Gene-disease validity | **Required** |
| `ClinVar_search_variants` | Known pathogenic/benign SVs | **Required** |
| `DECIPHER_search` | Patient cases, phenotypes | Highly recommended |
| `Ensembl_lookup_gene` | Gene coordinates, structure | **Required** |
| `OMIM_search`, `OMIM_get_entry` | Gene-disease associations | **Required** |
| `DisGeNET_search_gene` | Additional disease associations | Recommended |
| `PubMed_search` | Literature evidence | Recommended |
| `Gene_Ontology_get_term_info` | Gene function | Supporting |

---

## Report File Naming

```
SV_analysis_[TYPE]_chr[CHR]_[START]_[END]_[GENES].md

Examples:
SV_analysis_DEL_chr17_44039927_44352659_KANSL1_MAPT.md
SV_analysis_DUP_chr22_17400000_17800000_TBX1.md
SV_analysis_INV_chr11_2100000_2400000_complex.md
```
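
The naming pattern above can be generated rather than typed by hand. This is a small sketch. `report_filename` is a hypothetical helper, and falling back to `complex` when no gene symbols are supplied is an assumption based on the third example.

```python
import re


def report_filename(sv_type, chrom, start, end, genes):
    """Build SV_analysis_[TYPE]_chr[CHR]_[START]_[END]_[GENES].md."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # avoid a doubled "chrchr" prefix
    # Keep only filename-safe characters in each gene symbol
    safe_genes = [re.sub(r"[^A-Za-z0-9-]", "", gene) for gene in genes]
    gene_part = "_".join(g for g in safe_genes if g) or "complex"
    return f"SV_analysis_{sv_type.upper()}_chr{chrom}_{int(start)}_{int(end)}_{gene_part}.md"


print(report_filename("del", "chr17", 44039927, 44352659, ["KANSL1", "MAPT"]))
```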

---

## Clinical Recommendations Framework

### For Pathogenic/Likely Pathogenic SVs

| SV Type | Recommendations |
|---------|-----------------|
| **Deletion (HI gene)** | Genetic counseling, cascade testing, phenotype-specific surveillance |
| **Duplication (TS gene)** | Same as for deletions; check for a known duplication syndrome |
| **Translocation (disruption)** | Assess both breakpoints, consider reproductive counseling |
| **Complex** | Multidisciplinary evaluation, research enrollment |

### For VUS

| Action | Details |
|--------|---------|
| Clinical management | Base on phenotype, not genotype |
| Follow-up | Reinterpret in 1-2 years or when phenotype evolves |
| Research | Functional studies if research-grade samples available |
| Family studies | Segregation analysis can reclassify |

### For Benign/Likely Benign

| Action | Details |
|--------|---------|
| Clinical | Not expected to cause rare disease |
| Family | No cascade testing needed (unless recurrent/reproductive risk) |
| Reproductive | Balanced translocation carriers may have offspring risk |

---

## When NOT to Use This Skill

- **Single nucleotide variants (SNVs)** → Use `tooluniverse-variant-interpretation` skill
- **Small indels (<50 bp)** → Use variant interpretation skill
- **Somatic variants in cancer** → Different framework needed
- **Mitochondrial variants** → Specialized interpretation required
- **Repeat expansions** → Different mechanism

Use this skill for **structural variants ≥50 bp** requiring dosage sensitivity assessment and ACMG-adapted classification.

---

## See Also

- `EXAMPLES.md` - Sample SV interpretations
- `README.md` - Quick start guide
- `tooluniverse-variant-interpretation` - For SNVs and small indels
- ClinGen Dosage Sensitivity Map: https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/
- ACMG/ClinGen CNV interpretation standards: Riggs et al., Genet Med 2020 (PMID: 31690835)