---
name: mechinterp-investigator
description: Orchestrate a systematic research program to investigate and meaningfully label SAE features
---

# MechInterp Investigator

This skill guides a systematic investigation of SAE features to arrive at meaningful, non-trivial labels. It orchestrates the other mechinterp skills into a coherent research workflow.

## Phase 0: Triage (ALWAYS START HERE)

**Goal:** Quickly filter out weak/auxiliary features that don't warrant deep investigation.

**Time:** 1-2 minutes

Many SAE features have minimal influence on model outputs. Triage identifies these early so you can skip expensive analysis.

### Step 0.1: Check Decoder Weight Percentile

```python
import torch

sae_path = '/mnt/e/dev_spillover/SplatNLP/sae_runs/run_20250704_191557/sae_model_final.pth'
sae_checkpoint = torch.load(sae_path, map_location='cpu', weights_only=True)
decoder_weight = sae_checkpoint['decoder.weight']  # [512, 24576]

# Get this feature's max absolute decoder weight
feature_decoder = decoder_weight[:, FEATURE_ID]
max_abs = torch.abs(feature_decoder).max().item()

# Compare to all features
all_max_abs = torch.abs(decoder_weight).max(dim=0).values
percentile = (all_max_abs < max_abs).float().mean() * 100

print(f"Feature {FEATURE_ID} decoder weight percentile: {percentile:.1f}%")
```

| Percentile | Action |
|------------|--------|
| < 10% | **Likely weak** - check overview structure |
| 10-25% | Borderline - overview decides |
| > 25% | Proceed to Phase 1 (Overview) |

### Step 0.2: Quick Overview Check (if <10%)

If decoder percentile < 10%, run a quick overview:

```bash
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {FEATURE_ID} --model ultra --top-k 10
```

**Signs of clear structure (proceed to Phase 1):**
- One family dominates (>40% of breakdown)
- Strong weapon concentration (>50% one weapon)
- Clear binary ability pattern
- Top PageRank token has score > 0.20

**Signs of no structure (label as weak):**
- Family breakdown is flat (all <15%)
- Weapons are diverse
- Top PageRank score < 0.10
- High sparsity (>99%) with no clear pattern

### Triage Decision

```
Decoder percentile < 10% AND no clear structure in overview?
  │
  Yes → Label as "Weak/Aux Feature {ID}" and STOP
  │
  No → Proceed to Phase 1 (Overview)
```
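The triage rules above can be sketched as two small functions. This is a minimal sketch, not part of the `splatnlp` package: `has_clear_structure` and `triage` are hypothetical helper names, and the inputs are assumed to be read by hand from the overview output (family and weapon shares as fractions, plus the top PageRank score). The "clear binary ability pattern" sign needs human judgment, so it is passed in as a flag.

```python
def has_clear_structure(family_shares, top_weapon_share, top_pagerank,
                        binary_pattern=False):
    """Return True if any 'clear structure' sign from the overview is present.

    Shares are fractions in [0, 1]; thresholds mirror the lists above.
    """
    return bool(
        max(family_shares, default=0.0) > 0.40   # one family dominates
        or top_weapon_share > 0.50               # strong weapon concentration
        or top_pagerank > 0.20                   # strong top PageRank token
        or binary_pattern                        # judged by hand
    )


def triage(decoder_percentile, clear_structure):
    """Stop only when the feature is both weak AND unstructured."""
    if decoder_percentile < 10 and not clear_structure:
        return "label_weak_and_stop"
    return "proceed_to_phase_1"


# Hypothetical overview readings for two features
flat = has_clear_structure([0.12, 0.11, 0.09], top_weapon_share=0.08, top_pagerank=0.06)
peaked = has_clear_structure([0.55, 0.10], top_weapon_share=0.20, top_pagerank=0.31)

print(triage(4.2, flat))     # weak decoder weights, no structure
print(triage(4.2, peaked))   # weak decoder weights, but clear structure
print(triage(62.0, flat))    # strong decoder weights
```

Keeping the structure check separate from the decision makes it easy to apply the override conditions by hand before calling `triage`.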

### Weak Feature Label Format

```json
{
  "dashboard_name": "Weak/Aux Feature {ID}",
  "dashboard_category": "auxiliary",
  "dashboard_notes": "TRIAGE: Decoder weight {X}th percentile, no clear structure in overview. Skipped deep dive.",
  "hypothesis_confidence": 0.0,
  "source": "claude code (triage)"
}
```
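As an illustration, the template above could be filled in programmatically. `make_weak_label` is a hypothetical helper (not an existing `splatnlp` function), and the feature ID and percentile below are made-up values.

```python
import json


def make_weak_label(feature_id, decoder_percentile):
    """Fill the weak-feature label template for one feature."""
    return {
        "dashboard_name": f"Weak/Aux Feature {feature_id}",
        "dashboard_category": "auxiliary",
        "dashboard_notes": (
            f"TRIAGE: Decoder weight {decoder_percentile:.1f}th percentile, "
            "no clear structure in overview. Skipped deep dive."
        ),
        "hypothesis_confidence": 0.0,
        "source": "claude code (triage)",
    }


label = make_weak_label(1234, 4.2)  # hypothetical feature ID and percentile
print(json.dumps(label, indent=2))
```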

### When to Override Triage

Even with low decoder weights, proceed if:
- The feature is part of a cluster you're investigating
- You have external reason to believe it's important
- You're doing exhaustive analysis of a subset

---

## ⚠️ Deep Dive Basics

A proper deep dive requires **experiments**, not just reading overview data. The overview shows correlations; experiments reveal causation.

### Minimum Requirements for a Deep Dive

| Step | What to Do | Why |
|------|------------|-----|
| 1. Overview | Run overview to see correlations | Generate hypotheses |
| 2. 1D Sweeps | Test top 3-5 families with 1D sweeps | Find causal drivers (scaling abilities) |
| 3. Binary Check | For binary abilities (Comeback, Stealth Jump, LDE, Haunt, etc.), check presence rate | Binary abilities show delta=0 in sweeps but may still be characteristic |
| 4. Bottom Tokens | Check suppressors from overview | What the feature AVOIDS is often more informative |
| 5. 2D Heatmaps | Test interactions between primary driver and correlated tokens | Verify if correlations are causal or spurious |
| 6. Kit Analysis | Check if core weapons share sub/special/class pattern | Can explain "why" behind build philosophy - determine if causal or spurious |

### Binary Abilities Need Special Handling

**Binary abilities** (you have them or you don't) show **delta=0 in 1D sweeps** because there's no scaling. This does NOT mean they're unimportant.

| Binary Abilities |
|------------------|
| Comeback, Stealth Jump, Last-Ditch Effort, Haunt, Ninja Squid, Respawn Punisher, Object Shredder, Drop Roller, Opening Gambit, Tenacity |

**To evaluate binary abilities:**
1. Check PageRank score (correlation strength)
2. Check presence rate: What % of high-activation examples contain it?
3. Compare mean activation WITH vs WITHOUT the binary token
4. Run 2D heatmap: `scaling_ability × binary_ability` to see conditional effect

### Binary Ability Analysis Protocol (CRITICAL)

Binary abilities can have **strong conditional effects** that ONLY show up in 2D analysis. Here's the exact methodology:

**Step 1: Check presence rate enrichment**
```python
from splatnlp.mechinterp.skill_helpers import load_context
import polars as pl

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Find binary token ID
binary_id = None
for tok_id, tok_name in ctx.inv_vocab.items():
    if tok_name == 'comeback':  # or stealth_jump, etc.
        binary_id = tok_id
        break

# Calculate enrichment
threshold = df['activation'].quantile(0.90)  # Top 10%
high_df = df.filter(pl.col('activation') >= threshold)

with_binary_all = df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
with_binary_high = high_df.filter(pl.col('ability_input_tokens').list.contains(binary_id))

baseline_rate = len(with_binary_all) / len(df)
high_rate = len(with_binary_high) / len(high_df)
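# Guard against tokens that never appear in the dataset (baseline_rate == 0)
enrichment = high_rate / baseline_rate if baseline_rate > 0 else float('nan')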

print(f"Baseline presence: {baseline_rate:.1%}")
print(f"High-activation presence: {high_rate:.1%}")
print(f"Enrichment ratio: {enrichment:.2f}x")
# Enrichment > 1.5x suggests binary ability is characteristic
```

**Step 2: Check mean activation WITH vs WITHOUT**
```python
with_binary = df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
without_binary = df.filter(~pl.col('ability_input_tokens').list.contains(binary_id))

mean_with = with_binary['activation'].mean()
mean_without = without_binary['activation'].mean()
delta = mean_with - mean_without

print(f"Mean WITH: {mean_with:.4f}")
print(f"Mean WITHOUT: {mean_without:.4f}")
print(f"Delta: {delta:+.4f}")
# Delta > 0.03 suggests meaningful effect
```

**Step 3: Run 2D heatmap (MOST IMPORTANT)**

Binary abilities can have **conditional effects** that vary by the scaling ability level:

```python
# Manual 2D analysis for binary abilities
# (The built-in 2D heatmap may not handle binary tokens correctly)

scaling_ids = {3: 48, 6: 49, 12: 50, 21: 53, 29: 80}  # ISM example
binary_id = 27  # Comeback

print("Scaling | No Binary (n) | With Binary (n) | Delta")
print("-" * 60)

for level, tok_id in scaling_ids.items():
    level_df = df.filter(pl.col('ability_input_tokens').list.contains(tok_id))

    with_binary = level_df.filter(pl.col('ability_input_tokens').list.contains(binary_id))
    without_binary = level_df.filter(~pl.col('ability_input_tokens').list.contains(binary_id))

    # Use NaN (not 0) for empty cells so missing data is not mistaken for suppression
    mean_with = with_binary['activation'].mean() if len(with_binary) > 0 else float('nan')
    mean_without = without_binary['activation'].mean() if len(without_binary) > 0 else float('nan')
    delta = mean_with - mean_without

    # Report sample sizes next to every mean
    print(f"{level:>7} | {mean_without:>9.4f} ({len(without_binary):>5}) | "
          f"{mean_with:>9.4f} ({len(with_binary):>5}) | {delta:>+.4f}")
```

**Example (Feature 13352):**
```
ISM × Comeback 2D Analysis:
ISM | No CB  | With CB | Delta
  0 | 0.066  | 0.117   | +0.051
  3 | 0.122  | 0.261   | +0.139
  6 | 0.147  | 0.352   | +0.205  ← PEAK INTERACTION
 12 | 0.094  | 0.163   | +0.069
 21 | 0.094  | 0.129   | +0.035

Interpretation: Comeback has STRONG conditional effect at ISM 3-6.
The +0.205 delta at ISM_6 means Comeback more than DOUBLES the activation (0.147 → 0.352, about 2.4x)!
1D sweep showed delta=0 because most examples have ISM=0 (low baseline).
```

**Step 4: Test combinations of binary abilities together**
```python
# Test multiple binary abilities together
binary_id_1 = 27  # e.g., comeback
binary_id_2 = 1   # e.g., stealth_jump

both = df.filter(
    pl.col('ability_input_tokens').list.contains(binary_id_1) &
    pl.col('ability_input_tokens').list.contains(binary_id_2)
)
neither = df.filter(
    ~pl.col('ability_input_tokens').list.contains(binary_id_1) &
    ~pl.col('ability_input_tokens').list.contains(binary_id_2)
)

# Then do 2D analysis at each scaling level
# Combinations can have stronger effects than individual abilities!
```

**Key Insight:** Binary abilities may have stronger effects when combined. Always test combinations, not just individual tokens.

### Additional Learnings

1. **Conditional effects can be much stronger than marginal effects**: A feature might show ISM with only 0.069 max_delta in 1D sweeps, but a binary ability combination at moderate ISM could produce +0.335 delta - the interaction effect can be 5x stronger than the marginal effect. 1D sweeps can dramatically underestimate a feature's true behavior.

2. **Depletion is informative**: If a binary ability shows enrichment < 1.0 (e.g., 0.72x), the feature actively *avoids* that ability. This is meaningful for interpretation - it tells you what the feature excludes, not just what it includes.

3. **Manual 2D analysis required for binary tokens**: The `Family2DHeatmapRunner` uses `parse_token()` which expects `family_name_AP` format, but binary abilities appear as just the token name (e.g., `comeback` not `comeback_10`). Use manual 2D analysis code for binary abilities (see protocol above).

4. **"Weak feature" needs decoder weight check**: A feature with weak activation effects (max_delta < 0.03) might still have high influence on outputs. Remember: **net influence = activation strength × decoder weight**. Before labeling as "weak", check the feature's decoder weights to the output tokens it contributes to. A "weak activation" feature with high decoder weights may actually be important.

5. **Watch for error-correction features**: If 1D sweeps show small deltas or effects only in unusual rung combinations, the feature may fire when prerequisites are MISSING (OOD detection). Test "explains-away" behavior by comparing activation when low-level evidence is present vs missing. Example: Does feature fire MORE when SCU_3 is absent from a high-SCU build?

6. **Beware of flanderization in top activations**: The top 100 activations over-emphasize extreme cases. The TRUE concept often lives in the **core activation range (25-75% of effective max, see Phase 1.5)**. Always compare the core and top activation regions. If they show different weapon/ability patterns, label the core concept and note the extremes as "super-stimuli".
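To make learning 4 concrete, here is a small worked example of `net influence = activation strength × decoder weight`. The numbers are hypothetical and chosen only to illustrate the point; `net_influence` is an illustrative helper, not an existing function.

```python
def net_influence(activation_delta, decoder_weight):
    """Approximate effect on an output: activation change times decoder weight."""
    return activation_delta * decoder_weight


# Hypothetical values: a "weak activation" feature with a large decoder weight
# versus a "strong activation" feature with a small decoder weight.
weak_act_strong_dec = net_influence(activation_delta=0.02, decoder_weight=3.0)
strong_act_weak_dec = net_influence(activation_delta=0.20, decoder_weight=0.1)

print(f"weak activation, large decoder weight:   {weak_act_strong_dec:.3f}")
print(f"strong activation, small decoder weight: {strong_act_weak_dec:.3f}")
```

Under these assumed values the feature with the smaller activation effect has roughly three times the net influence, which is why a max_delta below 0.03 alone should not trigger a "weak" label.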

### What Counts as Evidence

| Evidence Type | Strength | Example |
|---------------|----------|---------|
| 1D sweep max_delta > 0.05 | Strong causal | "ISM drives this feature" |
| 1D sweep max_delta 0.02-0.05 | Weak causal | "ISM has minor effect" |
| 1D sweep max_delta < 0.02 | Negligible | "ISM doesn't drive this" |
| Binary delta = 0 | Inconclusive | Need presence rate check |
| High PageRank + low delta | Spurious correlation | Token co-occurs but doesn't cause |
| 2D heatmap shows conditional effect | Interaction confirmed | "X matters only when Y is high" |
| Bottom tokens (suppressors) | Avoidance pattern | "Feature avoids death-perks" |
| Higher activation when prerequisite MISSING | Error-correction | "Fires on OOD rung combos" |
| Mid-range (25-75%) differs from top | Flanderization | "Top is super-stimuli; label mid-range" |
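The sweep thresholds in the table can be applied mechanically. The sketch below uses hypothetical helper names (`classify_sweep_delta`, `classify_token`) and treats a binary token with delta = 0 as inconclusive rather than negligible, as the table requires.

```python
def classify_sweep_delta(max_delta):
    """Map a 1D sweep max_delta to the evidence strength in the table above."""
    if max_delta > 0.05:
        return "strong causal"
    if max_delta >= 0.02:
        return "weak causal"
    return "negligible"


def classify_token(max_delta, is_binary=False):
    """Binary tokens with delta=0 need presence-rate and 2D checks instead."""
    if is_binary and max_delta == 0:
        return "inconclusive (check presence rate and 2D analysis)"
    return classify_sweep_delta(max_delta)


for name, delta, is_binary in [
    ("scaling family, delta 0.069", 0.069, False),
    ("scaling family, delta 0.030", 0.030, False),
    ("scaling family, delta 0.010", 0.010, False),
    ("binary token, delta 0", 0.0, True),
]:
    print(f"{name}: {classify_token(delta, is_binary)}")
```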

### Common Mistakes to Avoid

1. **Presenting overview as findings** - Overview is hypotheses, not conclusions
2. **Ignoring binary abilities** - Delta=0 doesn't mean unimportant
3. **Skipping bottom tokens** - Suppressors reveal what feature avoids
4. **Only running 1D sweeps** - 2D heatmaps needed for interaction effects
5. **Not checking weapon patterns** - Feature may be weapon-specific, not ability-specific
6. **Using only top activations** - Top activations (90%+ of max) may be "flanderized" extremes; check core region (25-75% of max)
7. **Missing error-correction features** - Small deltas in weird rung combos may indicate OOD detection
8. **Confusing data sparsity with suppression** - Zero examples at a condition ≠ "suppression to 0" (see below)
9. **Shallow validation** - Just checking if numbers "look right" without running enrichment analysis
10. **Semantic contradictions in labels** - e.g., "Zombie" (embraces death) + "high SSU" (avoids death) is contradictory
11. **Reporting weapon percentages from top-100** - Use top 20-30% instead; top-100 can be 5-10x off (e.g., 78% vs 10%)
12. **Not checking meta archetypes** - Weapons may cluster by playstyle, not kit; use splatoon3-meta skill
13. **Assuming kit-based patterns** - Check if weapons share sub/special BEFORE assuming it's kit-related
14. **Ignoring flanderization crossover** - Note where a "super-stimulus" weapon overtakes the general pattern (usually 90%+ of max activation)

### ⚠️ CRITICAL: Data Sparsity vs Suppression

**This is a common and dangerous mistake.** When you see "activation = 0" or "no effect" at some condition, ask: **Is this suppression or data sparsity?**

**Example of the mistake (Feature 1819):**
```
Original claim: "QR is HARD SUPPRESSOR - SSU_57+QR_any=0.000"
Reality: There were ZERO examples with SSU_57 + any QR in the dataset!
         The "0.000" was missing data, not suppression.
```

**How to detect data sparsity:**
```python
# ALWAYS check sample sizes when claiming suppression!
at_high_ssu = df.filter(pl.col('ability_input_tokens').list.contains(ssu_57_id))
with_qr = at_high_ssu.filter(pl.col('ability_input_tokens').list.set_intersection(qr_ids).list.len() > 0)

print(f"Examples at SSU_57 with QR: {len(with_qr)}")  # If 0, this is SPARSITY not suppression!
```

**Rule:** Never claim "suppression" unless you have ≥20 examples in the suppressed condition. Report sample sizes with all claims.
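One way to enforce this rule is to never compute a mean without its sample size. The sketch below is standalone Python with hypothetical helper names (`guarded_mean`, `describe`) and made-up activations; it refuses to report a value when fewer than 20 examples exist.

```python
from statistics import mean

MIN_EXAMPLES = 20  # threshold from the rule above


def guarded_mean(activations, min_n=MIN_EXAMPLES):
    """Return (mean, n); mean is None when there are too few examples."""
    n = len(activations)
    if n < min_n:
        return None, n
    return mean(activations), n


def describe(condition, activations):
    """Format a claim that always carries its sample size."""
    value, n = guarded_mean(activations)
    if value is None:
        return f"{condition}: insufficient data (n={n}), no suppression claim"
    return f"{condition}: mean activation {value:.3f} (n={n})"


# Hypothetical activations for two conditions
print(describe("SSU_57 + QR", []))
print(describe("SSU_57, no QR", [0.1] * 25))
```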

## Philosophy

A **meaningful label** should capture:
- What concept the feature encodes (not just "detects token X")
- Why the model might have learned this representation
- How it relates to strategic/tactical gameplay

**Avoid trivial labels** like:
- "SCU Detector" (just describes token presence)
- "High activation feature" (describes statistics, not meaning)

**Aim for interpretable labels** like:
- "Aggressive Slayer Build" (strategic concept)
- "Special Spam Enabler" (functional role)
- "Backline Support Kit" (playstyle archetype)

## Investigation Workflow

### Phase 0: Triage

See [Phase 0: Triage](#phase-0-triage-always-start-here) above. **Always start here.**

If the feature passes triage (decoder weight percentile ≥ 10% OR clear structure in the overview), proceed to Phase 1.

### Phase 1: Initial Assessment

Run the overview and classify the feature type:

```bash
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 20
```

**Classify based on family breakdown:**

| Pattern | Type | Next Steps |
|---------|------|------------|
| One family >40% | Single-family | Check for interference, weapon specificity |
| Top 2-3 families ~20% each | Multi-family | Check synergy/redundancy, build archetype |
| Many families <15% each | Distributed | Look for meta-pattern, weapon class |
| Weapons concentrated | Weapon-specific | Weapon sweep, class analysis |

**CRITICAL**: Always check for non-monotonic effects! Higher AP doesn't always mean higher activation.

### Phase 1.5: Activation Region Analysis (CRITICAL - Anti-Flanderization)

**Don't only examine extreme activations!** High activations may be "flanderized" - exaggerated, extreme versions of the true concept that over-emphasize niche cases.

**Key insight:** The TRUE concept often lives in the **core region (25-75% of effective max)**, not the top examples. Top activations (90%+ of effective max) can mislead you into labeling a niche pattern instead of the general concept.

**Why "effective max"?** Activation distributions are heavy-tailed. Using `effective_max = 99.5th percentile of nonzero activations` prevents single outliers from making the core region nearly empty.

Run activation region analysis:

```python
from splatnlp.mechinterp.skill_helpers import load_context
import numpy as np
from collections import Counter

ctx = load_context('ultra')  # substitute the model you are investigating
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

acts = df['activation'].to_numpy()
weapons = df['weapon_id'].to_list()

# Use EFFECTIVE MAX (99.5th percentile) to handle heavy-tailed distributions
# This prevents single outliers from making the core region nearly empty
nonzero_acts = acts[acts > 0]
effective_max = np.percentile(nonzero_acts, 99.5)
true_max = acts.max()
print(f"True max: {true_max:.4f}, Effective max (99.5%ile): {effective_max:.4f}")

# Define activation regions as % of EFFECTIVE max
regions = [
    ('Floor (≤1%)', lambda a: a <= 0.01 * effective_max),
    ('Low (1-10%)', lambda a: 0.01 * effective_max < a <= 0.10 * effective_max),
    ('Below Core (10-25%)', lambda a: 0.10 * effective_max < a <= 0.25 * effective_max),
    ('Core (25-75%) - TRUE CONCEPT', lambda a: 0.25 * effective_max < a <= 0.75 * effective_max),
    ('High (75-90%)', lambda a: 0.75 * effective_max < a <= 0.90 * effective_max),
    ('Flanderization Zone (90%+)', lambda a: a > 0.90 * effective_max),
]

for region_name, filter_fn in regions:
    indices = [i for i, a in enumerate(acts) if filter_fn(a)]
    weps = [weapons[i] for i in indices]
    print(f"\n{region_name} (n={len(indices)}):")
    for wep, count in Counter(weps).most_common(5):
        name = ctx.id_to_weapon_display_name(wep)
        print(f"  {name}: {count}")
```

**Key signals to look for:**

| Pattern | Interpretation |
|---------|----------------|
| Same weapons in ALL regions | General concept (continuous feature) |
| Different weapons in core vs 90%+ | Super-stimuli detected |
| Diverse weapons in core, concentrated in 90%+ | True concept is in core region |
| Niche weapons only in 90%+ | High activations are "flanderized" extremes |

**Example (Feature 9971):**
```
Core (25-75%): Splattershot (115), Wellstring (65), Sploosh (57)...
Flanderization (90%+): Bloblobber (44), Glooga Deco (39), Range Blaster (28)

Interpretation: Core region shows GENERAL offensive investment.
Flanderization zone shows EXTREME SCU on special-dependent weapons (super-stimuli).
Label the general concept, note the super-stimuli pattern.
```
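The core-versus-tail comparison can be made explicit by checking whether the top weapons of the two regions overlap at all. This is a minimal sketch using the Feature 9971 counts from the example above; `top_k` and `shared_top_weapons` are hypothetical helpers, and comparing only the top 3 is an assumption.

```python
from collections import Counter


def top_k(weapon_counts, k=3):
    """Names of the k most frequent weapons in a {weapon: count} mapping."""
    return {name for name, _ in Counter(weapon_counts).most_common(k)}


def shared_top_weapons(core_counts, tail_counts, k=3):
    """Weapons that appear in the top-k of both the core and the 90%+ region."""
    return top_k(core_counts, k) & top_k(tail_counts, k)


# Counts from the Feature 9971 example above
core = {"Splattershot": 115, "Wellstring": 65, "Sploosh": 57}
tail = {"Bloblobber": 44, "Glooga Deco": 39, "Range Blaster": 28}

shared = shared_top_weapons(core, tail)
if shared:
    print(f"Shared top weapons: {sorted(shared)} (pattern persists into the tail)")
else:
    print("No shared top weapons: tail looks like super-stimuli; label the core")
```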

**CRITICAL**: Always check the **Bottom Tokens (Suppressors)** section! Tokens that rarely appear in high-activation examples can reveal what the feature *avoids*:

| Suppressor Pattern | Interpretation |
|-------------------|----------------|
| Death-mitigation (QR, SS, CB) suppressed | Feature avoids "death-accepting" builds |
| Defensive (IR, SR) suppressed | Feature prefers aggressive/ranged builds |
| Mobility suppressed | Feature prefers stationary/positional play |
| Special abilities suppressed | Feature encodes non-special playstyle |

**Example**: If SCU is enhanced but `quick_respawn`, `special_saver`, and `comeback` are ALL suppressed, the feature doesn't just detect "SCU" - it detects "death-averse SCU builds" (players who stack SCU but don't plan to die).

### Phase 1.6: Weapon Distribution Analysis (CRITICAL - Anti-Flanderization)

**NEVER report weapon percentages from top-100 samples.** Top-100 is severely flanderized and can give wildly misleading weapon distributions.

**Example (Feature 14096 - Real Case):**
```
Top 100:     Dark Tetra 78%, Stamper 20%  ← WRONG, flanderized
Top 10%:     Stamper 35%, Dark Tetra 21%  ← Better but still skewed
Top 30%:     Stamper 23%, Dark Tetra 10%  ← TRUE CONCEPT
Full dataset: Stamper 9%, Dark Tetra 3.5% ← Includes noise/floor
```

**Use top 20-30% for weapon characterization:**

```python
import polars as pl
import numpy as np
from collections import Counter
from splatnlp.mechinterp.skill_helpers import load_context

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Get percentile thresholds
acts = df['activation'].to_numpy()
thresholds = {p: np.percentile(acts, p) for p in [0, 50, 70, 80, 90, 95, 99]}

# Analyze by region
regions = [
    ("Bottom 50% (noise)", 0, 50),
    ("50-70% (weak)", 50, 70),
    ("Top 30% (TRUE CONCEPT)", 70, 100),
    ("Top 10%", 90, 100),
    ("Top 1% (flanderized)", 99, 100),
]

print("Region | Top Weapons")
print("-" * 60)

for name, p_low, p_high in regions:
    t_low, t_high = thresholds[p_low], thresholds.get(p_high, float('inf'))
    if p_high == 100:
        region_df = df.filter(pl.col('activation') >= t_low)
    else:
        region_df = df.filter((pl.col('activation') >= t_low) & (pl.col('activation') < t_high))

    if len(region_df) == 0:
        continue

    weapon_counts = region_df.group_by('weapon_id').agg(
        pl.col('activation').count().alias('n')
    ).sort('n', descending=True)

    top3 = []
    for row in weapon_counts.head(3).iter_rows(named=True):
        wname = ctx.id_to_weapon_display_name(row['weapon_id'])
        pct = row['n'] / len(region_df) * 100
        top3.append(f"{wname[:12]}({pct:.0f}%)")

    print(f"{name:<25} | {', '.join(top3)}")
```

**Interpretation Guide:**

| Pattern | Meaning |
|---------|---------|
| Same weapons in top-30% and top-1% | Continuous feature, no flanderization |
| Different weapons in top-30% vs top-1% | **Flanderization detected** - label top-30% concept |
| One weapon jumps from 10% to 70%+ | That weapon is "super-stimulus" for the feature |
| Weapons consistent 50%→30%→10%→1% | Stable feature, safe to use any region |

**Rule: Report weapon percentages from top 20-30%, note if top-1% differs significantly.**

### Phase 1.6.5: Ability Flanderization Check (CRITICAL)

**The same flanderization that applies to weapons applies to abilities.** A binary ability with high tail enrichment but low core coverage is a **super-stimulus**, not the core concept.

**The Rule:** If a "dominant" driver has **<30% core coverage**, it's a **tail marker**, not the headline concept.

**Use the core coverage experiment:**

```bash
cd /root/dev/SplatNLP

# Direct subcommand (recommended)
poetry run python -m splatnlp.mechinterp.cli.runner_cli coverage \
    --feature-id {FEATURE_ID} --model ultra \
    --tokens respawn_punisher,comeback,stealth_jump \
    --threshold 0.30
```

**Output tables:**
- `token_coverage`: Shows core_coverage_pct, tail_enrichment, is_tail_marker for each token
- `weapon_coverage`: Shows core vs tail weapon distributions (catches weapon flanderization)

**Coverage Interpretation:**

| Core Coverage | Interpretation | Label Implication |
|---------------|----------------|-------------------|
| >50% | **Primary driver** | Safe to headline |
| 30-50% | **Significant but not universal** | Mention in notes, not headline |
| <30% | **Tail marker / super-stimulus** | NOT the headline concept |
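The interpretation table maps directly onto a threshold function. The sketch below assumes `core_coverage_pct` is a percentage as reported in the `token_coverage` output; `interpret_core_coverage` is a hypothetical helper, not part of the CLI.

```python
def interpret_core_coverage(core_coverage_pct):
    """Map core coverage (in percent) to its label implication."""
    if core_coverage_pct > 50:
        return "primary driver: safe to headline"
    if core_coverage_pct >= 30:
        return "significant but not universal: mention in notes"
    return "tail marker / super-stimulus: NOT the headline concept"


# 12% is the respawn_punisher core coverage quoted for Feature 13934
for pct in (72, 41, 12):
    print(f"{pct}% core coverage -> {interpret_core_coverage(pct)}")
```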

**Example (Feature 13934):**
```
respawn_punisher: 8.57x tail enrichment, BUT only 12% core coverage
→ RP is a super-stimulus, NOT the core concept
→ Wrong label: "RP Backline Anchor"
→ Right approach: Split core by RP presence to reveal hidden modes
```

**When you find a super-stimulus (<30% coverage):**
1. Split the core by presence/absence of the super-stimulus
2. Analyze both modes separately
3. Look for what they have in COMMON (the true concept)
4. Label the commonality, note the super-stimulus as a tail marker
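The four steps above can be sketched on toy data. Everything here is an assumption for illustration: builds are represented as sets of token names, the example builds are invented, and "in common" is taken to mean present in at least half of the examples in both modes.

```python
from collections import Counter


def token_rates(examples):
    """Fraction of examples containing each token."""
    counts = Counter(tok for ex in examples for tok in set(ex))
    return {tok: c / len(examples) for tok, c in counts.items()}


def common_tokens(core_examples, marker, min_rate=0.5):
    """Split the core by marker presence and return tokens frequent in BOTH modes."""
    with_marker = [ex for ex in core_examples if marker in ex]
    without_marker = [ex for ex in core_examples if marker not in ex]
    if not with_marker or not without_marker:
        return set()  # only one mode exists, nothing to compare
    rates_with = token_rates(with_marker)
    rates_without = token_rates(without_marker)
    return {
        tok for tok, rate in rates_with.items()
        if tok != marker
        and rate >= min_rate
        and rates_without.get(tok, 0.0) >= min_rate
    }


# Hypothetical core-region builds (token names are illustrative)
core_examples = [
    {"respawn_punisher", "ink_saver_main_12", "special_saver_3"},
    {"ink_saver_main_12", "special_saver_3", "swim_speed_up_6"},
    {"ink_saver_main_12", "special_saver_3"},
    {"ink_saver_main_12", "quick_super_jump_3"},
]

common = common_tokens(core_examples, "respawn_punisher")
print("Tokens shared by both modes:", sorted(common))
```

With real data, check the sample size of each mode before trusting the rates; a mode with only a handful of examples says little.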

### Phase 1.7: Meta-Informed Weapon Analysis (USE AFTER WEAPON SWEEP)

After identifying top weapons, **always check if they match a known meta archetype** using the `splatoon3-meta` skill.

**Step 1: Look up weapon kits**

Check `references/weapons.md` for each top weapon's sub and special:

```python
# Top weapons from Feature 14096 (top 30%):
kits = {
    "Splatana Stamper": ("Burst Bomb", "Zipcaster"),
    "Dark Tetra Dualies": ("Autobomb", "Reefslider"),
    "Glooga Dualies": ("Splash Wall", "Booyah Bomb"),
    "Dapple Dualies Nouveau": ("Torpedo", "Reefslider"),
    "Splatana Wiper": ("Torpedo", "Ultra Stamp"),
}

# Check for shared subs/specials
from collections import Counter
subs = Counter(k[0] for k in kits.values())
specials = Counter(k[1] for k in kits.values())

# If one sub/special dominates → kit-based feature
# If diverse → playstyle-based feature
```

**Step 2: Check archetype reference**

Read `references/archetypes.md` to see if weapons match a known archetype:

| Archetype | Key Weapons | Signature Abilities |
|-----------|-------------|---------------------|
| Zombie Slayer | Tetra Dualies, Splatana Wiper | QR + Comeback + Stealth Jump |
| Stealth Slayer | Carbon Roller, Inkbrush | Ninja Squid + SSU + Stealth Jump |
| Anchor/Backline | E-liter, Hydra Splatling | Respawn Punisher + Object Shredder |
| Support/Beacon | Squid Beakon weapons | Sub Power Up + ISS + Comeback |

**Step 3: Classification decision**

```
Kit Analysis Result:
├─ Shared sub weapon? → Feature may encode SUB PLAYSTYLE
├─ Shared special? → Feature may encode SPECIAL FARMING
├─ No kit pattern + archetype match? → PLAYSTYLE FEATURE (label as archetype)
└─ No kit pattern + no archetype? → WEAPON CLASS feature (check if all dualies, all shooters, etc.)
```
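The decision tree above can be sketched as a function over the `kits` mapping from Step 1. Treating a sub or special as "shared" when it covers more than half of the top weapons is an assumption (the document gives no cutoff), and `dominant` and `classify_kit_pattern` are hypothetical helper names.

```python
from collections import Counter


def dominant(items, min_share=0.5):
    """Return the item whose share exceeds min_share, else None."""
    name, count = Counter(items).most_common(1)[0]
    return name if count / len(items) > min_share else None


def classify_kit_pattern(kits, archetype=None, min_share=0.5):
    """Walk the kit decision tree: sub, then special, then archetype, then class."""
    sub = dominant([kit[0] for kit in kits.values()], min_share)
    special = dominant([kit[1] for kit in kits.values()], min_share)
    if sub:
        return f"SUB PLAYSTYLE ({sub})"
    if special:
        return f"SPECIAL FARMING ({special})"
    if archetype:
        return f"PLAYSTYLE FEATURE ({archetype})"
    return "WEAPON CLASS feature (check main weapon class)"


# Kits for the Feature 14096 top weapons listed in Step 1
kits = {
    "Splatana Stamper": ("Burst Bomb", "Zipcaster"),
    "Dark Tetra Dualies": ("Autobomb", "Reefslider"),
    "Glooga Dualies": ("Splash Wall", "Booyah Bomb"),
    "Dapple Dualies Nouveau": ("Torpedo", "Reefslider"),
    "Splatana Wiper": ("Torpedo", "Ultra Stamp"),
}

result = classify_kit_pattern(kits, archetype="Zombie Slayer")
print(result)
```

Here Torpedo and Reefslider each cover only 2 of 5 weapons, so neither counts as shared under the assumed cutoff and the archetype match decides the classification.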

**Example (Feature 14096):**
```
Top 30% weapons: Stamper, Dark Tetra, Glooga, Dapple, Wiper
Kit analysis: Diverse subs (Burst, Auto, Splash Wall, Torpedo), diverse specials
Archetype check: Dark Tetra + Splatana Wiper = "Zombie Slayer" archetype!
Conclusion: PLAYSTYLE feature encoding Zombie Slayer (death-accepting aggressive)
Label: "Zombie Slayer QR (Splatana/Dualies)" - tactical category
```

**When to invoke splatoon3-meta skill:**
- After weapon_sweep shows concentrated weapon pattern
- When top weapons seem unrelated by kit but share a playstyle
- To validate that ability patterns match expected meta builds
- To identify if weapons share archetype despite different kits

### Phase 1.7.5: Kit Component Analysis (OPTIONAL but Recommended)

**When to use:** After weapon sweep, check if the core weapons share patterns in ANY kit component: **sub weapon**, **special weapon**, or **main weapon class**. This can reveal WHY certain build philosophies emerge.

**Key insight:** Weapons may cluster by:
- **Sub weapon** (Burst Bomb users, Beakon users → explains SPU/ISS builds)
- **Special weapon** (Aggressive push specials → explains survival builds)
- **Main weapon class** (All dualies, all chargers → explains mobility/positioning builds)

The feature may be driven by ONE of these - identify which, then determine if it's causal or spurious.

---

#### Component 1: Sub Weapon Pattern Analysis

**When relevant:** If kit_sweep (Phase 1.7/3d) shows sub concentration, investigate further.

```python
from collections import Counter

# Map top weapons to their subs (from weapons.md)
weapon_subs = {
    "Splattershot Jr.": "Splat Bomb",
    "Neo Splash-o-matic": "Suction Bomb",
    "Sploosh-o-matic 7": "Splat Bomb",
    # ... add more as needed
}

# Categorize subs
sub_categories = {
    # Lethal bombs
    "Splat Bomb": "lethal", "Suction Bomb": "lethal", "Burst Bomb": "lethal",
    "Curling Bomb": "lethal", "Autobomb": "lethal", "Torpedo": "lethal",
    "Fizzy Bomb": "lethal", "Ink Mine": "lethal",
    # Utility/Support (Toxic Mist deals no damage, so it belongs here)
    "Squid Beakon": "utility", "Splash Wall": "utility", "Sprinkler": "utility",
    "Point Sensor": "utility", "Angle Shooter": "utility", "Toxic Mist": "utility",
}

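# Count categories
# top_weapons: list of weapon display names from the weapon sweep
sub_counts = Counter()
for weapon in top_weapons:
    sub = weapon_subs.get(weapon)
    if sub:
        category = sub_categories.get(sub, "other")
        sub_counts[category] += 1

print("Sub Weapon Breakdown:")
for sub, count in Counter(weapon_subs.get(w) for w in top_weapons if weapon_subs.get(w)).most_common():
    print(f"  {sub}: {count}")

# Also report the category-level counts computed above
print("Sub Category Breakdown:")
for category, count in sub_counts.most_common():
    print(f"  {category}: {count}")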
```

**Sub pattern implications:**

| Sub Pattern | Build Implication | Example |
|-------------|-------------------|---------|
| Shared Beakons | SPU/ISS focus for sub spam | Beacon Support builds |
| Shared Burst Bomb | Mobility + burst damage | Aggressive flanker builds |
| Shared Splash Wall | Positional/defensive play | Lane control builds |
| Diverse subs | Sub is NOT the clustering factor | Check special or main class |

---

#### Component 2: Special Weapon Pattern Analysis

**When relevant:** After weapon sweep, check if core weapons share a special weapon pattern.

```python
from collections import Counter

# Map top weapons to their specials (from weapons.md)
weapon_specials = {
    "Splatana Stamper": "Zipcaster",
    "Sloshing Machine": "Booyah Bomb",
    "Squeezer": "Trizooka",
    # ... add more as needed
}

# Categorize specials
special_categories = {
    # Zoning/Area Denial
    "Ink Storm": "zoning", "Wave Breaker": "zoning", "Tenta Missiles": "zoning",
    "Killer Wail 5.1": "zoning", "Triple Inkstrike": "zoning",
    # Team Support
    "Tacticooler": "team_support", "Big Bubbler": "team_support",
    "Splattercolor Screen": "team_support",
    # Aggression/Push
    "Trizooka": "aggression", "Crab Tank": "aggression", "Ink Jet": "aggression",
    "Ultra Stamp": "aggression", "Booyah Bomb": "aggression", "Reefslider": "aggression",
    "Kraken Royale": "aggression", "Zipcaster": "aggression",
    # Utility/Defense
    "Ink Vac": "utility", "Super Chump": "utility", "Triple Splashdown": "utility",
}

# Count categories
category_counts = Counter()
for weapon in top_weapons:
    special = weapon_specials.get(weapon)
    if special:
        category = special_categories.get(special, "other")
        category_counts[category] += 1

print("Special Category Breakdown:")
for cat, count in category_counts.most_common():
    print(f"  {cat}: {count/sum(category_counts.values())*100:.0f}%")
```

**Special pattern implications:**

| Special Pattern | Build Implication | Example |
|-----------------|-------------------|---------|
| >60% aggression | Players build for survival to deploy push specials | Feature 14964 |
| >60% zoning | Players may invest in SCU/SPU for area denial uptime | Ink Storm spam |
| >50% team_support | Team-oriented builds, may see Tenacity/CB | Support kit |
| Diverse specials | Special is NOT the clustering factor | Check sub or main class |

---

#### Component 3: Main Weapon Class Pattern Analysis

**When relevant:** If weapons seem diverse but may share a class (all shooters, all dualies, all chargers).

```python
from collections import Counter

# Weapon class mapping (from weapon-vibes.md)
weapon_classes = {
    "Splattershot": "shooter", "Splattershot Jr.": "shooter", "Splattershot Pro": "shooter",
    "Dark Tetra Dualies": "dualie", "Dapple Dualies": "dualie", "Splat Dualies": "dualie",
    "E-liter 4K": "charger", "Splat Charger": "charger", "Goo Tuber": "charger",
    "Luna Blaster": "blaster", "Range Blaster": "blaster", "Rapid Blaster": "blaster",
    "Hydra Splatling": "splatling", "Mini Splatling": "splatling",
    "Splatana Stamper": "splatana", "Splatana Wiper": "splatana",
    # ... add more as needed
}

# Count classes
class_counts = Counter(weapon_classes.get(w, "other") for w in top_weapons)

print("Weapon Class Breakdown:")
for cls, count in class_counts.most_common():
    pct = count / len(top_weapons) * 100
    print(f"  {cls}: {pct:.0f}%")
```

**Class pattern implications:**

| Class Pattern | Build Implication | Example |
|---------------|-------------------|---------|
| >60% dualies | Mobility-focused, dodge-roll builds | SSU + QSJ synergy |
| >60% chargers | Positioning, low death tolerance | Anchor builds |
| >60% blasters | Burst damage, trade-happy | QR + Comeback synergy |
| >60% splatlings | Charge management, lane holding | ISM + positioning |
| Diverse classes | Class is NOT the clustering factor | Check sub or special |

---

#### Step 4: Determine if Pattern is CAUSAL or SPURIOUS

**This is the critical step.** A strong pattern in ANY component could be causal or spurious.

| Pattern Type | Evidence | Implication |
|--------------|----------|-------------|
| **CAUSAL** | Kit component explains build philosophy | Include in label rationale |
| **SPURIOUS** | Weapons share other traits that better explain clustering | Don't emphasize that component |

**Questions to determine causality:**

1. **Does the kit component align with decoder output?**
   - Decoder promotes SCU/SS/SPU + aggressive specials → Special farming is likely causal
   - Decoder promotes ISS/SPU + shared sub weapon → Sub spam is likely causal
   - Decoder promotes SSU/QSJ + all dualies → Weapon class mobility is likely causal

2. **Do weapons share OTHER traits that better explain the clustering?**
   - All dualies with aggressive specials → Is it the CLASS or the SPECIAL?
   - Test: Do other dualies (without aggressive specials) also cluster here?

3. **Does the build philosophy make sense for this kit component?**
   - Survival builds + aggressive specials → "Stay alive to use push special" (causal)
   - Mobility builds + all dualies → "Dualies need SSU for dodge-roll play" (causal)
   - Survival builds + diverse subs/specials + all chargers → "Chargers can't trade" (class is causal)

**Example Analysis (Special-driven):**

```
Feature 14964 special breakdown: 77% aggression (Zipcaster, Booyah Bomb, Trizooka)
Build philosophy: "Balanced utility spread for survival"

Analysis:
- Decoder suppresses death-trading (Comeback, RP) ✓
- Decoder promotes survival abilities (SS, ISM) ✓
- Weapons have LOW-MED death tolerance ✓
- Weapons have aggressive push specials ✓
- Sub weapons are DIVERSE (no pattern)
- Weapon classes are DIVERSE (shooters, slosher, splatana)

Conclusion: CAUSAL - Players build for survival BECAUSE they have aggressive specials
           that require staying alive to deploy effectively.

Note: "Core weapons have aggressive push specials (77%) requiring survival to deploy"
```

**Example Analysis (Class-driven):**

```
Feature shows: 80% dualies (Dark Tetra, Dapple, Dualie Squelchers)
Decoder promotes: SSU, QSJ, RSU (mobility family)

Analysis:
- Specials are DIVERSE (not the driver)
- Subs are DIVERSE (not the driver)
- All weapons are DUALIES with dodge-roll mechanics ✓
- Dualies benefit uniquely from SSU for roll distance/recovery

Conclusion: CAUSAL - Dualies cluster because dodge-roll playstyle needs mobility
           The feature encodes "dualie mobility optimization"
```

**Counter-example (Spurious):**

```
Feature has 70% aggression specials
But: All weapons are CLOSE-range SLAYER with HIGH death tolerance
And: Decoder promotes QR, Comeback (death-trading)

Conclusion: SPURIOUS - Weapons are aggressive slayers who happen to have aggressive specials
           The special type is incidental to the slayer playstyle.
           Primary driver is ROLE (slayer), not KIT.
```

---

#### Step 5: Record findings in notes

**If pattern is CAUSAL, add to dashboard_notes:**
```
KIT PATTERN: {component} - {X}% {category/type} ({list top examples}).
INTERPRETATION: [Why this explains the build philosophy]
```

**If pattern is SPURIOUS, note briefly:**
```
KIT PATTERN: Diverse/incidental. Weapons cluster by [range/role/playstyle], not kit.
```

---

#### When to skip this phase:
- Feature is clearly mechanical (single ability stacker like "SCU_57 threshold")
- Weapons are highly diverse with no concentration in any component
- Earlier analysis already identified clear driver (e.g., single weapon dominance)

### Phase 1.8: Weapon Range/Role Classification (REQUIRED for Labels)

Before proposing any label, you MUST classify the feature's weapons by range and role. This prevents incorrect role assumptions (e.g., calling Jr./Rapid Blasters "anchors" when they're midrange).

**Step 1: Extract properties for top 5-10 core weapons from weapon-vibes.md**

| Property | Values | Label Implication |
|----------|--------|-------------------|
| RANGE | CLOSE, MID, LONG, SNIPER | Determines qualifier |
| LANE | FRONT, MID, BACK, FLEX | Confirms positioning |
| JOB | SLAYER, SUPPORT, ANCHOR, SKIRMISH, ASSASSIN | Determines role word |
| NS_FIT | CORE, GOOD, MEH, BAD, NO | Stealth vs visible |
| DEATH_TOL | HIGH, MED, LOW | Trading vs survival |

**Step 2: Find the common pattern**

If most weapons share:
- LONG/SNIPER + BACK + ANCHOR → use "Anchor" or "Backline" qualifier
- MID/LONG + MID + SKIRMISH/SUPPORT → use "Midrange" qualifier
- CLOSE/MID + FRONT + SLAYER → use "Slayer" or "Frontline" qualifier
- NO/BAD NS_FIT + LOW DEATH_TOL → "Visible" or "Positional" concept (not stealth, not trading)

**Step 3: Record in notes**

Always include weapon classification in dashboard_notes:
```
WEAPON ROLE: Midrange (MID-LONG range, SKIRMISH/SUPPORT jobs, NO/BAD NS fit, LOW death tolerance)
```
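
The Step 2 rules can be sketched as a majority vote over the core weapons. This is an illustration under stated assumptions: `role_qualifier` is a hypothetical helper, the property values below are placeholders rather than entries copied from weapon-vibes.md, and taking the majority of each property independently is a simplification of "most weapons share".

```python
from collections import Counter

def majority(values):
    """Most common value in a non-empty list."""
    return Counter(values).most_common(1)[0][0]

def role_qualifier(weapons):
    """Map majority RANGE/LANE/JOB to a label qualifier (hypothetical helper)."""
    rng = majority([w["RANGE"] for w in weapons])
    lane = majority([w["LANE"] for w in weapons])
    job = majority([w["JOB"] for w in weapons])
    if rng in {"LONG", "SNIPER"} and lane == "BACK" and job == "ANCHOR":
        return "Anchor"
    if rng in {"MID", "LONG"} and lane == "MID" and job in {"SKIRMISH", "SUPPORT"}:
        return "Midrange"
    if rng in {"CLOSE", "MID"} and lane == "FRONT" and job == "SLAYER":
        return "Slayer"
    return None  # no clean pattern; classify by hand

# Placeholder weapons with made-up properties
core = [
    {"RANGE": "MID", "LANE": "MID", "JOB": "SKIRMISH"},
    {"RANGE": "MID", "LANE": "MID", "JOB": "SUPPORT"},
    {"RANGE": "LONG", "LANE": "MID", "JOB": "SKIRMISH"},
]
qualifier = role_qualifier(core)
print(qualifier)
```

When the function returns `None`, the weapons do not fit any of the three patterns cleanly and the classification should be done by reading the individual entries.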

### Phase 2: Hypothesis Generation

Based on Phase 1, generate hypotheses about what the feature might encode:

**For single-family dominated features:**
- H1: Pure token detector (trivial - try to disprove)
- H2: Threshold detector (activates only at high AP)
- H3: Interaction detector (family + something else)
- H4: Weapon-conditional (family matters only for certain weapons)

**For multi-family features:**
- H1: Synergy detector (families work together)
- H2: Build archetype (strategic loadout pattern)
- H3: Playstyle indicator (aggressive, defensive, support)
- H4: Shared NEED (different builds solving the same tactical problem)

### Build NEED Framework (For Multi-Modal/Diffuse Features)

**When a feature activates on seemingly different build types, ask: "What NEED do these builds share?"**

Features can encode solutions to problems, not just correlations. Different builds may trigger the same feature because they're different answers to the same question.

**Step 1: Identify the tactical constraint these builds solve**

| Question | Example |
|----------|---------|
| What gameplay problem do these builds address? | "How to handle death for low-death-tolerance weapons" |
| What enemy behavior are they countering? | "Dealing with aggressive flankers" |
| What win condition are they enabling? | "Special pressure" or "Map control" |

**Step 2: Check weapon properties (use splatoon3-meta)**

Compare enriched weapons on these axes from `weapon-vibes.md`:
- **Ink feel**: STARVING / HUNGRY / AVERAGE / EFFICIENT / PAINTER
- **Range**: MELEE / CLOSE / MID / LONG / SNIPER
- **Ninja Squid affinity**: CORE / GOOD / MEH / BAD / NO
- **Death tolerance**: HIGH / MED / LOW
- **Role**: SLAYER / SUPPORT / ANCHOR / SKIRMISH / ASSASSIN

If all enriched weapons share properties (e.g., all HUNGRY ink + NO ninja squid + LOW death tolerance), the feature may encode a need specific to that weapon class.
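
Finding shared properties is a simple set comparison. The sketch below assumes the weapon properties have already been transcribed into dictionaries; `shared_properties` and the property keys are hypothetical names, and the values are placeholders rather than real entries from `weapon-vibes.md`.

```python
from collections import Counter

def shared_properties(weapons, min_share=1.0):
    """Return {property: value} where one value covers >= min_share of weapons.

    Hypothetical helper; min_share=1.0 means "all weapons agree".
    """
    if not weapons:
        return {}
    shared = {}
    for prop in weapons[0]:
        values = [w[prop] for w in weapons if prop in w]
        value, count = Counter(values).most_common(1)[0]
        if count / len(weapons) >= min_share:
            shared[prop] = value
    return shared

# Placeholder property values for three enriched weapons
enriched = [
    {"ink": "HUNGRY", "range": "MID", "ns_fit": "NO", "death_tol": "LOW"},
    {"ink": "HUNGRY", "range": "LONG", "ns_fit": "NO", "death_tol": "LOW"},
    {"ink": "HUNGRY", "range": "MID", "ns_fit": "NO", "death_tol": "LOW"},
]
common = shared_properties(enriched)
print(common)
```

Lowering `min_share` (for example to 0.8) tolerates one or two outlier weapons when the enriched set is larger.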

**Step 3: Reframe the modes as "answers to the same question"**

**Example (Feature 13934):**
```
Mode A (12%): RP anchor builds (E-liter) - "I won't die, make their deaths hurt"
Mode B (88%): Zombie utility builds (DS) - "I will die sometimes, optimize respawns"

Shared NEED: "Death management for non-stealth, low-death-tolerance, midrange+ weapons"
Both modes are VALID ANSWERS to the same tactical question.
```

**Step 4: Label the NEED, not the modes**

Instead of: "Mixed: Zombie + RP Anchor" (describes the modes)
Label as: "Balanced Utility Axis (Non-Stealth Midline+)" (describes the need)

**Key Insight:** The model learned that these seemingly different builds share a common requirement. The feature encodes that requirement, and the modes are just different implementations.

**For weapon-specific features:**
- H1: Weapon class pattern (all shooters, all chargers, etc.)
- H2: Meta build (optimal loadout for that weapon)
- H3: Weapon-ability interaction

### Phase 3: Targeted Experiments

Run experiments to test hypotheses. **Available experiment types:**

| Type | Purpose |
|------|---------|
| `family_1d_sweep` | Activation across AP rungs for one family |
| `family_2d_heatmap` | Interaction between two families |
| `within_family_interference` | Detect error correction within a family |
| `weapon_sweep` | Activation by weapon (optionally conditioned on family) |
| `weapon_group_analysis` | Compare high vs low activation by weapon |
| `pairwise_interactions` | Synergy/redundancy between tokens |
| `token_influence_sweep` | Identify enhancers and suppressors across all tokens |

## ⚠️ CRITICAL: Iterative Conditional Testing Protocol

**1D sweeps can be MISLEADING for secondary abilities.** When a feature has a strong primary driver:

### The Problem

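A 1D sweep for a secondary ability (e.g., QR) across ALL contexts might show **delta ≈ 0**, even when that ability strongly affects activation in the contexts where the primary driver is high.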

**Why this happens:**
- Most contexts have LOW primary driver (e.g., low SCU) → activation already near zero
- Secondary ability can't suppress what's already zero
- The few high-primary contexts get drowned out in the average

**Example (Feature 18712):**
```
QR 1D sweep (all contexts): mean_delta = -0.0006 → "QR has no effect" ❌ WRONG!
SCU × QR 2D heatmap:
  - At SCU_15: QR_0=0.13, QR_12=0.04 → QR suppresses 70%! ✅
  - At SCU_29: QR_0=0.15, QR_12=0.04 → QR suppresses 74%! ✅
```
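
The dilution effect can be reproduced with a toy calculation. The numbers below are synthetic and chosen only to illustrate the arithmetic: most contexts have a low primary driver and zero activation, so a large conditional effect nearly vanishes in the pooled average.

```python
# Synthetic contexts: (share of dataset, activation without QR, activation with QR)
contexts = {
    "low_primary": (0.95, 0.00, 0.00),
    "high_primary": (0.05, 0.14, 0.04),
}

# What a 1D sweep over all contexts effectively reports
pooled_delta = sum(
    share * (with_qr - without_qr)
    for share, without_qr, with_qr in contexts.values()
)

# What a 2D heatmap reveals at the high-primary level
_, hi_without, hi_with = contexts["high_primary"]
conditional_delta = hi_with - hi_without
relative_suppression = -conditional_delta / hi_without

print(f"Pooled 1D delta:       {pooled_delta:+.4f}")
print(f"Delta at high primary: {conditional_delta:+.4f}")
print(f"Relative suppression:  {relative_suppression:.0%}")
```

The pooled delta is twenty times smaller than the conditional delta here, which is why the secondary ability looks inert in a 1D sweep.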

### The Solution: Iterative 2D Testing

**Protocol for features with a strong primary driver:**

```
1. Confirm primary driver with 1D sweep
   └─ If monotonic response confirmed → proceed to step 2

2. For EACH correlated ability in overview (top 5-10):
   └─ Run 2D heatmap: PRIMARY × SECONDARY
   └─ Check activation at EACH primary level
   └─ Look for:
      - Suppression: secondary reduces activation at high primary
      - Synergy: secondary boosts activation at high primary
      - Spurious: no conditional effect (correlation was coincidence)

3. Group findings by semantic category:
   └─ Death-mitigation (QR, SS, CB): all suppress? → "death-averse"
   └─ Mobility (SSU, RSU): all enhance? → "mobility-synergistic"
   └─ Efficiency (ISM, ISS): mixed? → test individually
```

### 2D Heatmap Interpretation Guide

| Pattern | Interpretation |
|---------|----------------|
| Peak at (high_X, 0_Y) | Y is a **suppressor** |
| Peak at (high_X, high_Y) | Y is a **synergy** |
| Flat across Y at each X | Y has **no conditional effect** (spurious) |
| Non-monotonic in X at some Y | **Interference** pattern |
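
The first three rows of the guide can be sketched as a check on the highest-X row of the heatmap. This is an illustration, not the runner's own logic: `classify_secondary` and the `flat_tol` tolerance are hypothetical, the grid format is an assumption, and interference (the fourth row) is not detected. The example grid reuses the SCU × QR values quoted for Feature 18712.

```python
def classify_secondary(grid, flat_tol=0.01):
    """Classify Y's role from a {(x_rung, y_rung): mean activation} grid.

    Hypothetical helper. Only inspects the highest X rung; null cells are skipped.
    """
    valid = {cell: act for cell, act in grid.items() if act is not None}
    x_max = max(x for x, _ in valid)
    row = {y: act for (x, y), act in valid.items() if x == x_max}
    spread = max(row.values()) - min(row.values())
    if spread <= flat_tol:
        return "no conditional effect"
    y_lo, y_hi = min(row), max(row)
    return "synergy" if row[y_hi] > row[y_lo] else "suppressor"

# (SCU rung, QR rung) -> activation, from the Feature 18712 example
grid = {(15, 0): 0.13, (15, 12): 0.04, (29, 0): 0.15, (29, 12): 0.04}
verdict = classify_secondary(grid)
print(verdict)
```

Before trusting the verdict, check the sample count of each cell in the inspected row.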

### Heatmap Cell Validity Check

**Before drawing conclusions from heatmap cells, check the cell metadata:**

Each cell in heatmap output includes:
- `n`: Number of valid samples in this cell
- `std`: Standard deviation of activations
- `stderr`: Standard error (std / sqrt(n)) - **new field**

| n (samples) | Interpretation |
|-------------|----------------|
| null/0 | Impossible combination (constraint violation) - **don't interpret** |
| 1-4 | Very weak evidence - note uncertainty in conclusions |
| 5-19 | Moderate evidence - interpret with caution |
| 20+ | Strong evidence - interpret confidently |

**High stderr (>0.1)** indicates high variance - the mean may not be reliable.

**Anti-patterns to avoid:**
- Drawing conclusions from cells with n < 5
- Claiming "peak at X=57, Y=29" when that cell has n=2
- Ignoring null cells (they represent impossible ability combinations)

**Example interpretation:**
```
Cell (ISM=51, IRU=29): mean=0.35, n=3, stderr=0.08
→ "ISM=51 with IRU=29 shows high activation, but n=3 means this could be noise"

Cell (ISM=51, IRU=0): mean=0.35, n=45, stderr=0.02
→ "ISM=51 without IRU shows reliable high activation (n=45)"
```
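
The validity table can be turned into a small grading function. This is a sketch under stated assumptions: `cell_evidence` is a hypothetical helper, a cell with exactly 20 samples is treated as strong evidence, and the standard error is computed as std / sqrt(n) as described above.

```python
import math

def cell_evidence(n, std=None, stderr_warn=0.1):
    """Grade a heatmap cell by sample count (hypothetical helper)."""
    if not n:  # None or 0: impossible ability combination
        return "impossible: do not interpret"
    if n < 5:
        grade = "very weak"
    elif n < 20:
        grade = "moderate"
    else:
        grade = "strong"
    if std is not None:
        stderr = std / math.sqrt(n)
        if stderr > stderr_warn:
            grade += " (high stderr)"
    return grade

# Illustrative cells: (n, std)
for n, std in [(None, None), (3, 0.5), (12, 0.1), (45, 0.13)]:
    print(n, cell_evidence(n, std))
```

Running every cell through a check like this before reading off peaks avoids the anti-patterns listed above.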

### When to Use 2D vs 1D

| Scenario | Use 1D | Use 2D |
|----------|--------|--------|
| Testing primary driver | ✅ | - |
| Testing secondary abilities | ❌ MISLEADING | ✅ REQUIRED |
| Looking for interactions | - | ✅ |
| Confirming suppressor hypothesis | - | ✅ |
| Quick initial scan | ✅ (with caution) | - |

### Template: Death-Aversion Test Battery

For single-family dominated features, always test death-mitigation:

```bash
# Test 1: Primary × Quick Respawn
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {PRIMARY} --family-y quick_respawn \
    --rungs-x 0,6,15,29,41,57 --rungs-y 0,6,12,21,29

# Test 2: Primary × Special Saver
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {PRIMARY} --family-y special_saver \
    --rungs-x 0,6,15,29,41,57 --rungs-y 0,3,6,12,21

# Test 3: Primary × Comeback (binary ability - use binary subcommand for this)
poetry run python -m splatnlp.mechinterp.cli.runner_cli binary \
    --feature-id {ID} --model ultra
```

If ALL three tests show suppression at Y > 0 (or, for Comeback, when the ability is present), the label should include "death-averse".
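
The "all three" rule can be written down explicitly once the activations have been read off the three outputs. This is a minimal sketch: `is_death_averse`, the `min_drop` threshold, and the result format are assumptions, and the numbers are illustrative rather than measured.

```python
def is_death_averse(results, min_drop=0.0):
    """results: {test: (activation without ability, activation with ability)}.

    Hypothetical helper. Values should be read at a high primary rung.
    """
    required = ("quick_respawn", "special_saver", "comeback")
    if any(name not in results for name in required):
        return False  # battery incomplete; no verdict
    return all(results[name][0] - results[name][1] > min_drop for name in required)

# Illustrative readings at a high primary rung
battery = {
    "quick_respawn": (0.15, 0.04),
    "special_saver": (0.15, 0.09),
    "comeback": (0.15, 0.11),
}
death_averse = is_death_averse(battery)
print(death_averse)
```

Setting `min_drop` above zero guards against calling tiny, noisy differences "suppression".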

### Template: Error-Correction Detection

If 1D sweeps show **small deltas** or effects **only in unusual rung combinations**, test for error-correction behavior:

```python
import polars as pl
from splatnlp.mechinterp.skill_helpers import load_context

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Get token IDs for high and low rungs
# Example: SCU_57 (high) and SCU_3 (low)
high_rung_id = ctx.vocab['special_charge_up_57']
low_rung_id = ctx.vocab['special_charge_up_3']

# Compare activation when low rung is present vs missing (among high-rung builds)
high_with_low = df.filter(
    pl.col('ability_input_tokens').list.contains(high_rung_id) &
    pl.col('ability_input_tokens').list.contains(low_rung_id)
)
high_without_low = df.filter(
    pl.col('ability_input_tokens').list.contains(high_rung_id) &
    ~pl.col('ability_input_tokens').list.contains(low_rung_id)
)

mean_with = high_with_low['activation'].mean()
mean_without = high_without_low['activation'].mean()

print(f"High rung WITH low rung present: {mean_with:.4f} (n={len(high_with_low)})")
print(f"High rung WITHOUT low rung: {mean_without:.4f} (n={len(high_without_low)})")
print(f"Delta: {mean_without - mean_with:+.4f}")

# If WITHOUT > WITH, feature fires when prerequisite is MISSING = error correction!
```

**Signs of error-correction:**

| Pattern | Interpretation | Label Style |
|---------|----------------|-------------|
| Higher activation when low rung MISSING | "Explains away" missing evidence | "Error-Correction: {FAMILY}" |
| Only fires on weird rung combos | OOD detector | "OOD Detector: {PATTERN}" |
| Negative interactions in 2D heatmaps | Within-family interference | "Interference Feature: {FAMILY}" |

**Test for within-family interference (CRITICAL for single-family):**
```bash
poetry run python -m splatnlp.mechinterp.cli.runner_cli family-sweep \
    --feature-id {FEATURE_ID} --family {FAMILY} --model {MODEL}
# Check for non-monotonic response patterns in the output
```

**Test for interactions (2D heatmap):**
```bash
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {FEATURE_ID} --family-x {FAMILY_A} --family-y {FAMILY_B} --model {MODEL}
```

**Test for weapon specificity:**
```bash
poetry run python -m splatnlp.mechinterp.cli.runner_cli weapon-sweep \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 20 --min-examples 10
```

**CHECKPOINT: After weapon_sweep, check for dominant weapon pattern:**

If the weapon_sweep diagnostics show a "DOMINANT WEAPON" warning (the top weapon's delta is more than 2x that of the second-ranked weapon):

1. **Run kit_sweep** to analyze by sub weapon and special weapon:
```bash
poetry run python -m splatnlp.mechinterp.cli.runner_cli kit-sweep \
    --feature-id {FEATURE_ID} --model {MODEL} --top-k 10 --analyze-combinations
```

2. **Use splatoon3-meta skill** to look up the dominant weapon's kit:
   - Read `.claude/skills/splatoon3-meta/references/weapons.md`
   - Find the weapon's sub weapon and special weapon

3. **Cross-reference** other high-activation weapons:
   - Do they share the same sub weapon?
   - Do they share the same special weapon?
   - If yes, the feature may encode **kit behavior** not weapon behavior

4. **Update hypothesis** based on findings:
   - If shared sub: Feature may encode sub weapon playstyle
   - If shared special: Feature may encode special spam/farming
   - If no kit pattern: Feature is truly weapon-specific

**Example**: Feature 18712 shows Octobrush Nouveau dominant. Kit lookup reveals Squid Beakon + Ink Storm. Other high weapons (Rapid Blaster, Range Blaster) also have "special-dependent" characteristics per meta → Feature encodes "SCU for Ink Storm spam" not just "Octobrush".
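
The dominance criterion (top delta more than 2x the runner-up) can be checked by hand when working from raw sweep results. The sketch below is an assumption about how such a check could look, not the actual diagnostic in the runner; `dominant_weapon` is a hypothetical helper and the deltas are made up.

```python
def dominant_weapon(deltas, ratio=2.0):
    """Return the top weapon if its delta exceeds ratio x the runner-up, else None.

    Hypothetical helper; deltas maps weapon name -> activation delta.
    """
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top, top_delta), (_, second_delta) = ranked[0], ranked[1]
    if second_delta <= 0:
        # Runner-up has no positive effect: top dominates if it has any
        return top if top_delta > 0 else None
    return top if top_delta > ratio * second_delta else None

# Illustrative deltas only
deltas = {"Octobrush Nouveau": 0.30, "Rapid Blaster": 0.12, "Range Blaster": 0.10}
winner = dominant_weapon(deltas)
print(winner)
```

A non-`None` result is the cue to run the kit sweep and the kit lookup described in the checkpoint above.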

**Test for threshold effects:**
- Compare low-rung vs high-rung responses
- Look for non-linear jumps in activation
- Check if certain rungs REDUCE activation (interference)
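
These three checks can be run on the per-rung means of a 1D sweep. The following is a sketch with assumed inputs: `sweep_diagnostics`, the `drop_tol` tolerance, and the "more than half of the total rise" threshold heuristic are all hypothetical, and the activations are illustrative.

```python
def sweep_diagnostics(rungs, activations, drop_tol=0.005):
    """Find drops and the largest jump in a 1D sweep (hypothetical helper)."""
    steps = [
        (rungs[i], rungs[i + 1], activations[i + 1] - activations[i])
        for i in range(len(rungs) - 1)
    ]
    # Rungs that REDUCE activation hint at interference
    drops = [(a, b, d) for a, b, d in steps if d < -drop_tol]
    largest_jump = max(steps, key=lambda s: s[2])
    total = activations[-1] - activations[0]
    # Heuristic: one step carrying most of the total rise looks like a threshold
    threshold_like = total > 0 and largest_jump[2] / total > 0.5
    return {"drops": drops, "largest_jump": largest_jump, "threshold_like": threshold_like}

# Illustrative sweep: AP rungs and mean activation at each rung
rungs = [0, 6, 12, 15, 21, 29]
activations = [0.02, 0.05, 0.03, 0.06, 0.05, 0.20]
diag = sweep_diagnostics(rungs, activations)
print(diag)
```

Drops flagged here are candidates for the within-family interference test; a threshold-like jump supports a "High {FAMILY} Investment" style label.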

### Phase 4: Synthesis

Combine findings into a coherent interpretation:

1. **What triggers activation?** (tokens, combinations, weapons)
2. **Is there structure beyond simple detection?** (interactions, thresholds)
3. **What gameplay concept does this represent?**
4. **Why would the model learn this?** (predictive value for recommendations)

### Phase 5: Label Proposal

Propose a label at the appropriate level:

| Complexity | Label Type | Example |
|------------|------------|---------|
| Trivial | Token detector | "SCU Presence" (avoid if possible) |
| Simple | Threshold detector | "High SCU Investment (29+ AP)" |
| Moderate | Interaction | "SCU + Mobility Combo" |
| Strategic | Build archetype | "Special Spam Slayer Kit" |
| Tactical | Playstyle | "Aggressive Frontline Build" |

### Label Specificity by Category

**The label's specificity should match its concept level:**

| Category | Specificity | Style | Examples |
|----------|-------------|-------|----------|
| **mechanical** | Terse | Token-focused, technical | "SCU Threshold 29+", "ISM Stacker" |
| **tactical** | Mid-level | Ability combos, weapon synergies | "Zombie Slayer Dualies", "Beakon Support Kit" |
| **strategic** | High-concept | Playstyle, gameplay philosophy | "Positional Survival - Midrange", "Aggressive Reentry" |

**Why this matters:**
- Mechanical features encode low-level patterns → label should be precise and technical
- Tactical features encode build strategies → label should name the strategy
- Strategic features encode gameplay philosophies → label should capture the "why"

**Examples by level:**

```
Feature encodes "SCU above 29 AP threshold"
→ Category: mechanical
→ Label: "SCU Threshold 29+" (terse, specific)

Feature encodes "QR + Comeback + Stealth Jump on dualies"
→ Category: tactical
→ Label: "Zombie Slayer Dualies" (names the combo + weapon)

Feature encodes "survive through positioning, not stealth or trading"
→ Category: strategic
→ Label: "Positional Survival - Midrange" (high-concept + role)
```

### Strategic Label Quality Checklist

Before finalizing a label, verify:

1. **Concept over tokens**: Does the label describe a GAMEPLAY CONCEPT, not just list abilities?
   - BAD: "SSU + ISM + SRU Kit", "Swim Efficiency Kit"
   - GOOD: "Positional Survival", "Aggressive Reentry"

2. **Positive framing**: Does the label describe what the feature IS, not just what it avoids?
   - BAD: "Death-Averse Efficiency", "Anti-Stealth Build"
   - GOOD: "Positional Survival", "Visible Zone Control"

3. **The "why" test**: Can you answer "why would a player build this?"
   - If answer is "to have SSU and ISM" → label is too mechanical
   - If answer is "to survive through positioning at midrange" → label captures concept

4. **Range/role qualifier**: Have you verified weapon range (Phase 1.8) and added appropriate qualifier?
   - Backline (SNIPER/LONG + ANCHOR) → "- Anchor" or "- Backline"
   - Midrange (MID/LONG + SUPPORT/SKIRMISH) → "- Midrange"
   - Frontline (CLOSE/MID + SLAYER) → "- Slayer" or "- Frontline"

### Strategic Label Format

**Prefer: "[Concept] - [Qualifier]"**

| Concept Examples | What it captures |
|------------------|------------------|
| Positional Survival | Stay alive through positioning, not stealth/trading |
| Aggressive Reentry | Pressure through fast respawn (zombie) |
| Stealth Approach | Win through concealment (NS builds) |
| Special Pressure | Win through special uptime |
| Lane Persistence | Hold lanes through sustain |

| Qualifier Examples | When to use |
|--------------------|-------------|
| Midrange | MID-range weapons, SKIRMISH/SUPPORT jobs |
| Anchor | LONG/SNIPER range, ANCHOR job, chargers/splatlings |
| Slayer | CLOSE/MID range, SLAYER job, aggressive weapons |
| Support | SUPPORT job, team utility focus |
| (Weapon Class) | When specific to dualies, blasters, etc. |

### Label Anti-Patterns to Avoid

| Anti-Pattern | Example | Why It's Bad | Better Label |
|--------------|---------|--------------|--------------|
| Token listing | "SSU + ISM Kit" | Describes tokens, not purpose | "Positional Survival" |
| Negation-only | "Death-Averse" | Describes avoidance, not identity | "Positional Survival" |
| Wrong role | "Anchor" for Jr./Rapid | Anchor implies backline chargers | "- Midrange" |
| Too generic | "Utility Build" | Could mean anything | "Positional Survival - Midrange" |
| Flanderized | Based on top 100 only | Captures tail, not core concept | Check core region first |

### Phase 6: Deeper Dive (For Thorny Features)

**When to use:** If the standard deep dive (Phases 1-5) didn't produce a clear interpretation:
- All scaling effects weak (max_delta < 0.03)
- No clear primary driver
- Conflicting signals from different experiments
- Feature seems important (high contribution to outputs) but unclear why

**The Deeper Dive uses the hypothesis/state management system** for systematic exploration:

#### Step 1: Initialize Research State

```python
from splatnlp.mechinterp.state import ResearchState, Hypothesis

state = ResearchState(feature_id=FEATURE_ID, model_type="ultra")

# Add competing hypotheses based on what you've observed
state.add_hypothesis(Hypothesis(
    id="h1",
    description="Feature encodes weapon-specific pattern for Dapple Nouveau",
    status="pending"
))
state.add_hypothesis(Hypothesis(
    id="h2",
    description="Feature encodes binary ability package (Stealth + Comeback)",
    status="pending"
))
state.add_hypothesis(Hypothesis(
    id="h3",
    description="Feature has high decoder weights despite weak activation effects",
    status="pending"
))
```

#### Step 2: Check Decoder Weights

For "weak activation" features, check if they have high influence via decoder weights:

```python
# Load SAE decoder weights
import torch
sae_path = '/mnt/e/dev_spillover/SplatNLP/sae_runs/run_20250704_191557/sae_model_final.pth'
sae_checkpoint = torch.load(sae_path, map_location='cpu', weights_only=True)
decoder_weight = sae_checkpoint['decoder.weight']  # [512, 24576]

# Get this feature's decoder weights to output space
feature_decoder = decoder_weight[:, FEATURE_ID]  # [512]

# Check magnitude
print(f"Decoder weight L2 norm: {torch.norm(feature_decoder):.4f}")
print(f"Max absolute weight: {torch.abs(feature_decoder).max():.4f}")

# Compare to other features
all_norms = torch.norm(decoder_weight, dim=0)
percentile = (all_norms < torch.norm(feature_decoder)).float().mean() * 100
print(f"Percentile among all features: {percentile:.1f}%")
```

If decoder weights are high (>75th percentile), the feature may be important despite weak activation effects.

#### Step 3: Decoder Output Analysis (CRITICAL for Diffuse Features)

**When activation analysis doesn't yield a clean interpretation, analyze what the feature RECOMMENDS.**

This technique asks: "What does this feature push the model to predict?" rather than "What activates this feature?"

**Use the decoder CLI:**

```bash
cd /root/dev/SplatNLP

# Quick output influence check
poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
    --feature-id {FEATURE_ID} \
    --model ultra \
    --top-k 15

# Check decoder weight importance
poetry run python -m splatnlp.mechinterp.cli.decoder_cli weight-percentile \
    --feature-id {FEATURE_ID} \
    --model ultra
```

See **mechinterp-decoder** skill for full documentation.

**Interpretation Guide:**

| Output Pattern | Interpretation |
|----------------|----------------|
| Promotes low-AP tokens (_3, _6) | "Recommend light investment" |
| Promotes high-AP tokens (_51, _57) | "Recommend heavy stacking" |
| Suppresses high-AP tokens | "Anti-stacking / balanced build" |
| Promotes death-mitigation (QR, CB, SS) | "Recommend zombie/respawn optimization" |
| Suppresses death-mitigation | "Death-averse / stay alive" |

**Example (Feature 13934):**
```
PROMOTES: respawn_punisher (+0.23), comeback (+0.16), QSJ_6 (+0.15), IA_3 (+0.14), ISM_6 (+0.13)
SUPPRESSES: RSU_57 (-0.30), QR_57 (-0.25), RSU_51 (-0.24)

Interpretation: Feature recommends "balanced utility spread with low-AP investments"
               and DISCOURAGES heavy stacking of any single ability.
```

**When to use decoder output analysis:**
- Activation analysis shows multi-modal or diffuse patterns
- No single signature covers >50% of core
- Feature seems "confused" between different build types
- You want to understand the feature's PURPOSE, not just what triggers it

**Key Insight:** A feature can activate on seemingly different builds because they share the same NEED. The output analysis reveals what the feature is recommending, which may unify apparently contradictory activation patterns.

### Decoder Output Semantic Grouping (CRITICAL for Labels)

After running decoder output analysis, group promoted/suppressed tokens by MEANING, not just family:

| Semantic Group | Token Families | Gameplay Meaning |
|----------------|----------------|------------------|
| **Mobility** | SSU, RSU | How you reposition |
| **Survival** | BRU, IRU, RES, QR, SS, RP | How you stay alive |
| **Efficiency** | ISM, ISS, IRU | How you sustain pressure |
| **Lethality** | IA, MPU, BPU (bomb damage) | How you get kills |
| **Special-Focus** | SCU, SS, SPU, Tenacity | How you use specials |
| **Stealth** | NS, (high SSU) | How you approach unseen |
| **Death-Trading** | QR, CB, SJ, SS | How you weaponize respawn |

**Abbreviation Key:**
- SSU = Swim Speed Up, RSU = Run Speed Up
- BRU = Bomb (Sub) Resistance Up, RES = Ink Resistance Up
- IRU = Ink Recovery Up, ISM = Ink Saver Main, ISS = Ink Saver Sub
- BPU = Bomb (Sub) Power Up, SPU = Special Power Up
- SCU = Special Charge Up, SS = Special Saver
- QR = Quick Respawn, CB = Comeback, SJ = Stealth Jump, QSJ = Quick Super Jump
- IA = Intensify Action, MPU = Main Power Up, NS = Ninja Squid, RP = Respawn Punisher

**Then ask:** "What COMBINATION of groups defines this feature?"

| Promoted Groups | Suppressed Groups | Strategic Concept |
|-----------------|-------------------|-------------------|
| Mobility + Survival + Efficiency | Death-Trading, Stealth | **Positional Survival** |
| Death-Trading + Mobility | Survival | **Zombie/Aggressive Reentry** |
| Stealth + Mobility | - | **Stealth Approach** |
| Special-Focus + Efficiency | Mobility | **Special Farming** |
| Lethality + Mobility | Efficiency | **Aggressive Slayer** |

**This semantic grouping directly informs the strategic label.**
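
The grouping step can be sketched as a sum of decoder contributions per semantic group. The group membership below is copied from the first table; `group_scores`, the input format (one summed contribution per family abbreviation), and the example values are assumptions for illustration.

```python
from collections import defaultdict

# Membership copied from the Semantic Group table; some families appear in several groups
SEMANTIC_GROUPS = {
    "Mobility": {"SSU", "RSU"},
    "Survival": {"BRU", "IRU", "RES", "QR", "SS", "RP"},
    "Efficiency": {"ISM", "ISS", "IRU"},
    "Lethality": {"IA", "MPU", "BPU"},
    "Special-Focus": {"SCU", "SS", "SPU", "Tenacity"},
    "Stealth": {"NS"},
    "Death-Trading": {"QR", "CB", "SJ", "SS"},
}

def group_scores(contributions):
    """Sum per-family decoder contributions into semantic groups (hypothetical helper)."""
    scores = defaultdict(float)
    for family, value in contributions.items():
        for group, members in SEMANTIC_GROUPS.items():
            if family in members:
                scores[group] += value
    return dict(scores)

# Illustrative contributions, not real decoder output
contributions = {"SSU": 0.20, "ISM": 0.15, "BRU": 0.10, "CB": -0.15, "SJ": -0.12, "NS": -0.10}
scores = group_scores(contributions)
promoted = sorted(g for g, s in scores.items() if s > 0)
suppressed = sorted(g for g, s in scores.items() if s < 0)
print("Promoted:", promoted)
print("Suppressed:", suppressed)
```

With these inputs the promoted and suppressed groups match the "Positional Survival" row. Because families such as QR and SS belong to more than one group, a single token can move several group scores at once, so read the per-family values alongside the totals.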

### Post-Decoder Sweep Rule

**After decoder output analysis, verify the top promoted/suppressed families with causal 1D sweeps.**

The decoder tells you what the feature RECOMMENDS, but not whether it's causally driven by those tokens. To validate:

1. **Identify top 2 promoted families** from decoder output (highest positive contributions)
2. **Identify top 2 suppressed families** from decoder output (most negative contributions)
3. **Run 1D sweeps** for any not yet tested in Phase 3

| Decoder Shows | Test With | Expected If Valid |
|---------------|-----------|-------------------|
| BRU highly promoted | `family_1d_sweep` BRU | Positive delta with BRU levels |
| RSU suppressed | `family_1d_sweep` RSU | Negative delta or flat |

**Example:** Feature 10938 decoder showed BRU heavily promoted (+0.126, +0.120, +0.108 for different rungs), but initial sweeps only tested SSU/ISM. The investigation should have run:
```bash
# Missing sweep that would validate decoder findings
poetry run python -m splatnlp.mechinterp.cli.runner_cli run-spec \
    --spec '{"type": "family_1d_sweep", "variables": {"family": "bomb_resistance_up"}}' \
    --feature-id 10938 --model ultra
```

**Anti-pattern:** Trusting decoder output without causal validation. Decoder weights show which output tokens the feature pushes toward or away from, not which input tokens causally drive its activation.
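
Selecting which sweeps are still owed can be sketched as follows. `sweeps_to_run` is a hypothetical helper, the per-family contributions are illustrative, and family names other than `bomb_resistance_up` and `quick_respawn` are assumed spellings. The spec dictionary follows the `run-spec` JSON shape shown in the example above.

```python
import json

def sweeps_to_run(family_contrib, already_tested, top_n=2):
    """Top promoted and suppressed families that still need a 1D sweep (hypothetical helper)."""
    promoted = sorted(
        (f for f, v in family_contrib.items() if v > 0),
        key=family_contrib.get, reverse=True,
    )[:top_n]
    suppressed = sorted(
        (f for f, v in family_contrib.items() if v < 0),
        key=family_contrib.get,
    )[:top_n]
    return [f for f in promoted + suppressed if f not in already_tested]

# Illustrative per-family decoder contributions (summed over rungs)
family_contrib = {
    "bomb_resistance_up": 0.354,
    "swim_speed_up": 0.10,
    "ink_saver_main": 0.05,
    "run_speed_up": -0.20,
    "quick_respawn": -0.08,
}
pending = sweeps_to_run(family_contrib, already_tested={"swim_speed_up", "ink_saver_main"})
for family in pending:
    spec = {"type": "family_1d_sweep", "variables": {"family": family}}
    print(json.dumps(spec))
```

Each printed spec can be passed to the `run-spec` subcommand via `--spec`, as in the Feature 10938 example.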

#### Step 4: Run Targeted Experiments

Based on hypotheses, run specific tests:

```python
# Log experiments and findings to state
state.add_evidence(
    hypothesis_id="h1",
    experiment_type="weapon_sweep",
    finding="37% Dapple Nouveau, but also 10% .96 Gal Deco - not single-weapon",
    supports=False
)

state.add_evidence(
    hypothesis_id="h3",
    experiment_type="decoder_weight_check",
    finding="Decoder L2 norm: 0.89 (92nd percentile) - HIGH despite weak activation",
    supports=True
)
```

#### Step 5: Synthesize

```python
# Review all evidence
state.summarize()

# Update hypothesis statuses
state.update_hypothesis("h1", status="rejected")
state.update_hypothesis("h3", status="supported")

# Propose final interpretation
state.set_conclusion(
    "Feature has weak activation effects but high decoder weights. "
    "It acts as a 'fine-tuning' feature that makes small but important "
    "adjustments to output probabilities."
)
```

#### When Deeper Dive is Complete

The state object provides an audit trail of:
- What hypotheses were considered
- What experiments were run
- What evidence was found
- Why the final interpretation was chosen

This is useful for:
- Revisiting the feature later
- Explaining the interpretation to others
- Identifying if new evidence should change the interpretation

## Decision Trees

### Single-Family Dominated Feature

```
1. Run within_family_interference to check for error correction
   └─ If interference found → "Error-Correcting {FAMILY} Detector"
   └─ If enhancement patterns → "{FAMILY} Stacker (synergistic)"
   └─ If neutral → continue

2. Check for non-monotonic 1D response
   └─ If drops at certain rungs → investigate interference
   └─ If monotonic with threshold → "High {FAMILY} Investment"
   └─ If monotonic with no threshold → probably trivial

3. Run weapon_sweep to check weapon specificity
   └─ If weapon-concentrated → run weapon_group_analysis
   └─ If weapon-specific patterns → "{WEAPON_CLASS} + {FAMILY}"

4. Run 2D sweep with second-ranked family
   └─ If interaction effect → "{FAMILY_A} + {FAMILY_B} Combo"
   └─ If no interaction → try third family

5. If all trivial → label as "{FAMILY} Stacker" with note "simple detector"
```

### Multi-Family Feature

```
1. Check if families are related
   └─ All mobility (SSU, RSU, QSJ) → "Mobility Kit"
   └─ All ink efficiency (ISM, ISS, IRU) → "Efficiency Kit"
   └─ Mixed → continue

2. Run pairwise interaction analysis
   └─ Positive synergy → "Synergistic Build"
   └─ Redundancy → "Alternative Paths"

3. Check weapon breakdown
   └─ Weapon class pattern → "{CLASS} Optimal Build"

4. Consider strategic meaning
   └─ What playstyle does this combination enable?
```

## Example Investigation

**Feature 18712 (Deep Analysis):**

1. **Overview**: SCU 31%, SSU 11%, ISS 10% → Single-family dominated
2. **Hypothesis**: Could be SCU + something, or just trivial SCU detector
3. **2D Heatmap (SCU × SSU)**: Peak at SCU=57, SSU=0. Non-monotonic drops visible!
   - SCU 6→12: DROP of 0.02 (unexpected)
   - SCU 15→21: DROP of 0.01
4. **Interference Analysis**:
   - SCU_12 REDUCES SCU_51 signal by 0.10 (interference!)
   - SCU_15 ENHANCES SCU_51 signal by 0.12 (synergy!)
5. **Weapon Analysis**: Effect varies by weapon
   - weapon_id_50: SCU_3 reduces SCU_15 (-0.08)
   - weapon_id_7020: SCU_3 enhances SCU_15 (+0.03)
6. **Interpretation**: Feature detects "clean" high-SCU builds.
   - Low rungs (SCU_3, SCU_12) can contaminate the signal
   - Effect is weapon-dependent
7. **Label**: "SCU Purity Detector (weapon-conditional)" - NOT trivial!

**Key Insight**: What looked like a simple "SCU detector" actually encodes
complex error-correction behavior. Always check for interference!

## Commands Summary

```bash
# Phase 1: Overview (basic)
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {ID} --model ultra --top-k 20

# Phase 1 with extended analyses (enrichment, regions, binary, kit)
poetry run python -m splatnlp.mechinterp.cli.overview_cli \
    --feature-id {ID} --model ultra --all

# Phase 3a: 1D sweep for dominant family (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli family-sweep \
    --feature-id {ID} --family {FAMILY} --model ultra

# Phase 3b: 2D heatmap for interactions (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli heatmap \
    --feature-id {ID} --family-x {FAMILY_A} --family-y {FAMILY_B} --model ultra

# Phase 3c: Weapon sweep (direct subcommand)
poetry run python -m splatnlp.mechinterp.cli.runner_cli weapon-sweep \
    --feature-id {ID} --model ultra --top-k 20

# Phase 3d: Kit sweep (if dominant weapon detected)
poetry run python -m splatnlp.mechinterp.cli.runner_cli kit-sweep \
    --feature-id {ID} --model ultra --analyze-combinations

# Phase 3e: Binary ability analysis
poetry run python -m splatnlp.mechinterp.cli.runner_cli binary \
    --feature-id {ID} --model ultra

# Phase 3f: Core coverage analysis
poetry run python -m splatnlp.mechinterp.cli.runner_cli coverage \
    --feature-id {ID} --tokens {TOKEN1},{TOKEN2}

# Phase 1.7.5: Kit Component Analysis (see skill for full code)
# After weapon sweep, check for patterns in: sub weapons, specials, or weapon class
# For any concentrated pattern, determine if CAUSAL (explains build) or SPURIOUS (incidental)

# Phase 5: Set label
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
    --feature-id {ID} --name "{LABEL}" --category {tactical|strategic|mechanical}
```
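
When investigating many features, it can help to assemble these command lines programmatically. The sketch below only builds argument lists from the flags shown above and does not run anything; the helper names `overview_cmd` and `label_cmd` are hypothetical and not part of the CLI.

```python
import shlex

# Categories accepted by --category, per the label command above.
CATEGORIES = {"tactical", "strategic", "mechanical"}
BASE = ["poetry", "run", "python", "-m"]


def overview_cmd(feature_id, model="ultra", top_k=20):
    """Build the Phase 1 overview command as an argument list."""
    return BASE + [
        "splatnlp.mechinterp.cli.overview_cli",
        "--feature-id", str(feature_id),
        "--model", model,
        "--top-k", str(top_k),
    ]


def label_cmd(feature_id, name, category):
    """Build the Phase 5 label command, rejecting unknown categories."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    return BASE + [
        "splatnlp.mechinterp.cli.labeler_cli", "label",
        "--feature-id", str(feature_id),
        "--name", name,
        "--category", category,
    ]


# shlex.join quotes the label so it can be pasted into a shell safely.
print(shlex.join(overview_cmd(18712)))
print(shlex.join(label_cmd(18712, "SCU Purity Detector (weapon-conditional)", "tactical")))
```

The argument lists can be passed directly to `subprocess.run` if execution is wanted; building lists rather than strings avoids quoting problems with labels that contain spaces or parentheses.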

## Labeling Categories

- **mechanical**: Low-level patterns (token presence, simple combinations)
- **tactical**: Mid-level patterns (build synergies, weapon kits)
- **strategic**: High-level patterns (playstyles, meta concepts)

## See Also

- **mechinterp-overview**: Initial feature assessment (now includes bottom tokens)
- **mechinterp-runner**: Execute experiments (includes `core_coverage_analysis` and `decoder_output_analysis`)
- **mechinterp-decoder**: Decoder weight analysis - what features recommend (USE for diffuse/heterogeneous features)
- **mechinterp-next-step-planner**: Generate experiment specs
- **mechinterp-labeler**: Save labels
- **mechinterp-glossary-and-constraints**: Domain reference
- **mechinterp-ability-semantics**: Ability semantic groupings (check AFTER hypotheses)
- **splatoon3-meta**: Weapon archetypes, kit lookups, meta knowledge (USE for weapon pattern interpretation)