---
name: clonalstats
description: Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.
---

# ClonalStats Process Configuration

## Purpose
Generate comprehensive clonality statistics and diversity visualizations for TCR/BCR repertoire analysis. Quantifies clonal expansion, measures diversity metrics (Shannon, Simpson, Gini), and creates publication-ready plots.

## When to Use
- To quantify clonal expansion patterns in TCR/BCR data
- For diversity analysis comparing multiple samples or conditions
- To identify hyperexpanded clones and their distribution
- For rarefaction analysis to assess sampling depth
- After `ScRepCombiningExpression` to analyze integrated TCR+RNA data

## Configuration Structure

### Process Enablement
```toml
[ClonalStats]
cache = true
```

### Input Specification
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
```

### Core Environment Variables
```toml
[ClonalStats.envs]
# Clone definition: "gene" (VDJC), "aa" (CDR3 amino acid), "nt" (CDR3 nucleotide)
clone_call = "aa"
# Chain analysis: "both", "TRA", "TRB", "TRG", "IGH", "IGL"
chain = "both"
# Data transformations (dplyr::mutate syntax)
mutaters = {}
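# Data filtering (dplyr::filter syntax). TOML has no null value,
# so omit the key (commented out here) to apply no filter.
# subset = "Diagnosis == 'Colitis'"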
# Output device parameters
devpars = {width = 800, height = 600, res = 100}
# Save code and data (large files - use with caution)
save_code = false
save_data = false
```
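As an illustration of the two dplyr-style options, the sketch below derives a new column and then filters on an existing one. It assumes `mutaters` maps a new column name to a `dplyr::mutate` expression; the `Response` column, its `'CR'` value, and the `Responder` name are hypothetical and should be replaced with names from your own metadata.

```toml
[ClonalStats.envs]
# Assumed shape: new column name -> dplyr::mutate expression
# ("Response", "CR" and "Responder" are hypothetical)
mutaters = {Responder = "if_else(Response == 'CR', 'Yes', 'No')"}
# dplyr::filter expression; column and value are hypothetical
subset = "Diagnosis == 'Colitis'"
```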

### Case-Based Plot Generation
```toml
[ClonalStats.envs.cases."Case Name"]
viz_type = "volume"  # volume, abundance, length, residency, stat,
                    # composition, overlap, diversity, geneusage,
                    # positional, kmer, rarefaction
```

## Diversity Metrics

| Metric | Range | Interpretation | Best For |
|--------|-------|----------------|----------|
| **shannon** | 0 - ∞ | Higher = more diversity | General comparison |
| **inv.simpson** | 1 - richness | Higher = more diversity | Emphasizing common clones |
| **gini.coeff** | 0 - 1 | 0 = equality, 1 = inequality | Clonal dominance |
| **norm.entropy** | 0 - 1 | Higher = more even distribution | Evenness-focused comparison |
| **chao1** | ≥ observed richness | Estimates total richness, including unseen clones | Undersampled repertoires |
| **d50** | Count (≥ 1) | Fewest top clones that together make up 50% of the repertoire; lower = more dominance | Practical dominance |

**Interpretation:**
- High diversity = many unique clones with an even distribution (often seen in healthy repertoires)
- Low diversity = a few dominant clones (often seen with antigen-driven expansion, e.g. infection or cancer)
- Gini ≈ 1 = very skewed; a few clones dominate
- Gini ≈ 0 = even distribution
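For reference, the textbook definitions behind these metrics can be sketched as follows, for a sample with S distinct clones, clone counts n_i, and frequencies p_i = n_i / N. These are the standard formulas, not a statement of how the underlying package computes them; details such as log base or bias corrections may differ.

```latex
\begin{aligned}
H &= -\sum_{i=1}^{S} p_i \ln p_i
  && \text{(shannon)} \\
D_{\mathrm{inv}} &= \frac{1}{\sum_{i=1}^{S} p_i^{2}}
  && \text{(inv.simpson)} \\
E &= \frac{H}{\ln S}
  && \text{(norm.entropy)} \\
G &= \frac{\sum_{i=1}^{S} (2i - S - 1)\, n_{(i)}}{S \sum_{i=1}^{S} n_i}
  && \text{(gini.coeff; counts sorted so that } n_{(1)} \le \dots \le n_{(S)}\text{)} \\
\hat{S}_{\mathrm{chao1}} &= S + \frac{f_1^{2}}{2 f_2}
  && \text{(chao1; } f_1, f_2 = \text{number of clones seen once and twice, } f_2 > 0\text{)} \\
D_{50} &= \min\Bigl\{ k : \sum_{i=1}^{k} p_{[i]} \ge 0.5 \Bigr\}
  && \text{(d50; frequencies sorted so that } p_{[1]} \ge p_{[2]} \ge \dots\text{)}
\end{aligned}
```

As a worked example, three clones with counts (5, 3, 2) have frequencies (0.5, 0.3, 0.2), which gives H ≈ 1.030, inverse Simpson = 1 / 0.38 ≈ 2.63, normalized entropy ≈ 0.937, Gini = 6 / 30 = 0.2, Chao1 = 3 (no singletons), and D50 = 1 because the largest clone alone reaches 50%.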

## Visualization Types

**viz_type options:**
- `volume` - Number of clones per sample/group
- `abundance` - Clone abundance distribution (trend/histogram/density)
- `length` - CDR3 sequence length distribution
- `residency` - Clones present across groups (venn/upset)
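- `stat` - Expanded clone analysis (pies/sankey)
- `composition` - Share of the repertoire held by clone-size categories
- `overlap` - Clonal overlap between samples/groups
- `diversity` - Diversity metrics (bar/box/violin)
- `geneusage` - V/D/J gene usage frequency
- `positional` - Per-position properties of CDR3 sequences
- `kmer` - k-mer frequencies in CDR3 sequences
- `rarefaction` - Sampling depth assessment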
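Types without a dedicated example elsewhere in this document follow the same case pattern. The sketch below assumes the plot styles named in parentheses above (here `upset`) are accepted as literal `plot_type` values, as `density` is for `abundance`; `Diagnosis` is the grouping column used in the other examples and may not exist in your data.

```toml
[ClonalStats.envs.cases."Shared Clones"]
viz_type = "residency"
# Assumed literal value, taken from the "(venn/upset)" note above
plot_type = "upset"
group_by = "Diagnosis"

[ClonalStats.envs.cases."CDR3 Length"]
viz_type = "length"
group_by = "Diagnosis"
```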

## Configuration Examples

### Minimal Configuration
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]
```

### Standard Diversity Analysis
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
comparisons = true

[ClonalStats.envs.cases."Gini Coeff"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "violin"
group_by = "Diagnosis"
add_box = true
```

### Expanded Clone Analysis
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Expanded Clones"]
viz_type = "stat"
plot_type = "pies"
group_by = "Diagnosis"
subgroup_by = "seurat_clusters"
clones = {"Expanded (>2)" = "sel(Colitis > 2)"}
```

### Rarefaction Analysis
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
q = 1  # Hill number order: 0 = richness, 1 = Shannon, 2 = Simpson
n_boots = 20
```
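Assuming `q` selects the order of the Hill number, as the comment on that line suggests, the quantity being rarefied can be written in the standard form below, where p_i is the frequency of clone i among S distinct clones. This is the general definition rather than a description of the package internals.

```latex
{}^{q}D = \Bigl( \sum_{i=1}^{S} p_i^{\,q} \Bigr)^{1/(1-q)} \quad (q \ne 1),
\qquad
{}^{1}D = \lim_{q \to 1} {}^{q}D = \exp\Bigl( -\sum_{i=1}^{S} p_i \ln p_i \Bigr)
```

With q = 0 this reduces to the richness S, with q = 1 to the exponential of Shannon entropy, and with q = 2 to the inverse Simpson index. For clone counts (5, 3, 2) the three values are 3, about 2.80, and about 2.63, so higher orders give progressively more weight to the dominant clones.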

### Complete Analysis Suite
```toml
[ClonalStats.in]
screpfile = ["ScRepCombiningExpression"]

[ClonalStats.envs.cases."Volume"]
viz_type = "volume"

[ClonalStats.envs.cases."Abundance"]
viz_type = "abundance"
plot_type = "density"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
```

## Common Patterns

### Disease vs Healthy
```toml
[ClonalStats.envs.cases."Comparison"]
viz_type = "diversity"
method = "gini.coeff"
plot_type = "box"
group_by = "Condition"
comparisons = true
```

### Time Course
```toml
[ClonalStats.envs.cases."Timepoint"]
viz_type = "volume"
x = "Timepoint"

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
group_by = "Timepoint"
```

### Treatment Response
```toml
[ClonalStats.envs.cases."Response"]
viz_type = "diversity"
method = "gini.coeff"
group_by = "Response"
plot_type = "box"
comparisons = true
```

## Dependencies
- **Upstream**: `ScRepCombiningExpression` (required)
- **Related**: `ScRepLoading`, `CDR3Clustering`, `TESSA` (optional)

## Validation Rules
- Input must be a valid scRepertoire object
- For `viz_type = "diversity"`, `method` must be one of the metrics listed under Diversity Metrics
- For rarefaction, `n_boots` should be ≥ 10
- Use `sel()` syntax in the `clones` parameter for filtering

## Troubleshooting

**Sample column not found**: The input must have a `Sample` column; if it does not, specify the `x` parameter.

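**Strange diversity values**: Small repertoires bias diversity estimates. Compare samples of similar depth and check the rarefaction curves; `plot_type = "box"` shows the spread across samples within each group.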

**Rarefaction curves noisy**: Increase `n_boots` (try 50-100).

**Too many clones in stat plots**: Use `subset` or stricter `clones` thresholds.

**Plot generation slow**: Set `clone_call = "gene"` for speed, and apply `subset` to reduce the input.

**Missing comparisons**: Set `comparisons = true` to add significance tests.
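Several of these fixes can be combined in one configuration. The following is a sketch only: the cluster labels in `subset` are hypothetical, and `Patient` and `Diagnosis` are the grouping columns used in the earlier examples.

```toml
[ClonalStats.envs]
# Coarser clone definition, suggested above as the faster option
clone_call = "gene"
# Restrict the input before plotting (cluster labels are hypothetical)
subset = "seurat_clusters %in% c('1', '2')"

[ClonalStats.envs.cases."Rarefaction"]
viz_type = "rarefaction"
group_by = "Patient"
# More bootstrap replicates to reduce noise in the curves
n_boots = 50

[ClonalStats.envs.cases."Diversity"]
viz_type = "diversity"
method = "shannon"
plot_type = "box"
group_by = "Diagnosis"
# Add significance tests between groups
comparisons = true
```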

## Best Practices
1. Start with default cases to see standard visualizations
2. Use multiple diversity metrics: Shannon + Gini
3. Check rarefaction curves to ensure sufficient sampling
4. Document clone thresholds when defining expanded clones
5. Use `clone_call = "gene"` for speed and `clone_call = "aa"` for finer granularity
6. Set `save_data = true` for debugging (watch disk space)
7. Validate findings with complementary diversity indices
8. Consider sample size: small samples underestimate richness