---
name: mechinterp-decoder
description: Analyze SAE decoder weights - output influence, feature importance, and decoder similarity
---

# MechInterp Decoder

Analyze SAE features through their decoder weights. This skill answers: **"What does this feature RECOMMEND?"** rather than "What activates this feature?"

## Purpose

Decoder analysis provides a complementary perspective to activation analysis:

| Analysis Type | Question Answered |
|---------------|-------------------|
| **Activation** (overview, sweeps) | "What inputs activate this feature?" |
| **Decoder** (this skill) | "What outputs does this feature promote?" |

For **diffuse or heterogeneous features** where activation analysis shows multiple modes, decoder analysis often reveals the unifying concept.

## When to Use

Use this skill when:

1. **Activation analysis is inconclusive** - Multiple modes or no clear pattern
2. **Feature appears heterogeneous** - Different builds activate it for different reasons
3. **Looking for "what does it recommend"** - Shift from inputs to outputs
4. **Checking AP level preferences** - Does the feature prefer low-AP tokens (_3, _6) or high-AP tokens (_57)?
5. **Finding similar features** - Cluster features by decoder similarity

## Commands

### Output Influence

Show what tokens a feature promotes (positive contribution) or suppresses (negative contribution):

```bash
cd /root/dev/SplatNLP

# Basic output influence
poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
    --feature-id 13934 \
    --model ultra

# JSON output
poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
    --feature-id 13934 \
    --model ultra \
    --format json

# More tokens
poetry run python -m splatnlp.mechinterp.cli.decoder_cli output-influence \
    --feature-id 13934 \
    --model ultra \
    --top-k 25
```

**Sample Output:**
```markdown
## Feature 13934 Output Influence (ultra)

### Tokens This Feature PROMOTES

| Token | Contribution | Family | AP Level |
|-------|--------------|--------|----------|
| respawn_punisher | +0.232 | respawn_punisher | binary |
| comeback | +0.159 | comeback | binary |
| quick_super_jump_6 | +0.155 | quick_super_jump | 6 |
| intensify_action_3 | +0.140 | intensify_action | 3 |
| ink_saver_main_6 | +0.128 | ink_saver_main | 6 |

### Tokens This Feature SUPPRESSES

| Token | Contribution | Family | AP Level |
|-------|--------------|--------|----------|
| run_speed_up_57 | -0.301 | run_speed_up | 57 |
| quick_respawn_57 | -0.247 | quick_respawn | 57 |
| swim_speed_up_57 | -0.209 | swim_speed_up | 57 |

### Interpretation
- **Top promoted**: respawn_punisher (+0.232)
- **Top suppressed**: run_speed_up_57 (-0.301)
- **Pattern**: Promotes low-AP tokens, suppresses high-AP stacking
```
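
The Family and AP Level columns above can be reproduced from the token name alone when post-processing `--format json` output. The sketch below is illustrative rather than the CLI's actual implementation: it assumes stackable tokens end in `_<AP>` and that anything without a numeric suffix is a binary ability, as in the sample table. `split_token` and `summarize` are hypothetical helper names.

```python
from __future__ import annotations

import re

# Assumption: stackable-ability tokens end in "_<AP>" (e.g. "ink_saver_main_6");
# tokens with no numeric suffix are treated as binary abilities.
_TOKEN_RE = re.compile(r"^(?P<family>.+)_(?P<ap>\d+)$")


def split_token(token: str) -> tuple[str, str]:
    """Return (family, ap_level); ap_level is 'binary' when there is no AP suffix."""
    match = _TOKEN_RE.match(token)
    if match is None:
        return token, "binary"
    return match.group("family"), match.group("ap")


def summarize(contributions: dict[str, float]) -> list[tuple[str, float, str, str]]:
    """Sort tokens from most promoted to most suppressed and attach family/AP."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(tok, val, *split_token(tok)) for tok, val in ranked]


if __name__ == "__main__":
    sample = {"run_speed_up_57": -0.301, "respawn_punisher": 0.232, "ink_saver_main_6": 0.128}
    for row in summarize(sample):
        print(row)
```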

### Weight Percentile

Check how a feature's decoder weight magnitude ranks against all other features, as a rough measure of its overall output influence:

```bash
poetry run python -m splatnlp.mechinterp.cli.decoder_cli weight-percentile \
    --feature-id 13934 \
    --model ultra
```

**Sample Output:**
```markdown
## Feature 13934 Decoder Weight (ultra)

- **Magnitude**: 2.3456
- **Percentile**: 78.5%
- **Total features**: 24576
```

**Interpretation:**
- High percentile (>90%): Feature has strong output influence
- Low percentile (<10%): Feature has weak output influence
- Note: Low-magnitude features may still be important for specific tokens
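
To make the two numbers concrete, here is a minimal sketch of one way a magnitude and percentile could be computed. It assumes the magnitude is the L2 norm of the feature's decoder vector and the percentile is the share of features whose norm is at or below it; the CLI's exact definitions are not documented here and may differ. `weight_percentile` is a hypothetical name.

```python
import math


def l2_norm(vec):
    """Euclidean length of a decoder vector."""
    return math.sqrt(sum(x * x for x in vec))


def weight_percentile(decoder_rows, feature_id):
    """Return (magnitude, percentile) for one feature.

    Assumption: decoder_rows[i] is feature i's decoder vector, and the
    percentile is the percentage of features with norm <= this feature's norm.
    """
    norms = [l2_norm(row) for row in decoder_rows]
    target = norms[feature_id]
    at_or_below = sum(1 for n in norms if n <= target)
    return target, 100.0 * at_or_below / len(norms)


if __name__ == "__main__":
    toy_decoder = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [6.0, 8.0]]
    print(weight_percentile(toy_decoder, 0))
```

Because a norm aggregates over every output token, a feature can sit at a low percentile and still carry a large weight on one or two tokens, which is why the note above matters.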

### Similar Features (by Decoder)

Find features with similar decoder patterns (what they recommend):

```bash
poetry run python -m splatnlp.mechinterp.cli.decoder_cli similar \
    --feature-id 13934 \
    --model ultra \
    --top-k 10
```

**Sample Output:**
```markdown
## Features Similar to 13934 (ultra)

| Feature ID | Cosine Similarity |
|------------|-------------------|
| 13892 | 0.9234 |
| 14501 | 0.8876 |
| 12044 | 0.8521 |
```
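
The ranking in this table is a cosine-similarity comparison. As a minimal sketch, assuming each feature's decoder vector is compared directly against every other feature's (the CLI may normalize or project the weights differently), the computation looks like this. `most_similar` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(decoder_rows, feature_id, top_k=10):
    """Return [(other_feature_id, similarity)] sorted from most to least similar."""
    target = decoder_rows[feature_id]
    scores = [
        (idx, cosine(target, row))
        for idx, row in enumerate(decoder_rows)
        if idx != feature_id  # skip the feature itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    toy_decoder = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(most_similar(toy_decoder, 0, top_k=2))
```

Cosine similarity ignores vector length, so two features can be near-identical in what they recommend while differing widely in weight percentile.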

## Experiment Runner

For programmatic use or integration with runner_cli:

```bash
# Create spec file
cat > decoder_spec.json << 'EOF'
{
  "type": "decoder_output_analysis",
  "feature_id": 13934,
  "model_type": "ultra",
  "variables": {
    "top_k_promoted": 15,
    "top_k_suppressed": 15,
    "group_by_family": true,
    "include_ap_level": true
  }
}
EOF

# Run via runner CLI
poetry run python -m splatnlp.mechinterp.cli.runner_cli \
    --spec-path decoder_spec.json
```
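
When several features need the same analysis, the spec can be generated instead of hand-written. The sketch below mirrors the fields of the spec shown above; `build_decoder_spec`, `write_specs`, and the `decoder_spec_<id>.json` file naming are hypothetical conveniences, not part of the runner.

```python
import json
from pathlib import Path


def build_decoder_spec(feature_id, model_type="ultra", top_k=15):
    """Build a spec dict with the same fields as the JSON example above."""
    return {
        "type": "decoder_output_analysis",
        "feature_id": feature_id,
        "model_type": model_type,
        "variables": {
            "top_k_promoted": top_k,
            "top_k_suppressed": top_k,
            "group_by_family": True,
            "include_ap_level": True,
        },
    }


def write_specs(feature_ids, out_dir):
    """Write one spec file per feature; the file naming is an assumption."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for fid in feature_ids:
        path = out / f"decoder_spec_{fid}.json"
        path.write_text(json.dumps(build_decoder_spec(fid), indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    print(json.dumps(build_decoder_spec(13934), indent=2))
```

Each generated file can then be passed to the runner CLI with `--spec-path`, exactly as in the command above.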

## Interpretation Guide

### AP Level Patterns

| Pattern | Meaning |
|---------|---------|
| Promotes _3, _6; Suppresses _51, _57 | "Use balanced spread, not stacking" |
| Promotes _57; Suppresses low AP | "Heavy stacking is the goal" |
| Promotes binary (RP, CB, OG) | "These specific abilities are key" |
| Mixed AP levels promoted | "Ability presence matters, not amount" |
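
As a rough aid for reading these patterns, the heuristic below labels a feature from its promoted and suppressed token lists. It is only a sketch: the cutoffs (low-AP means at most 6, high-AP means at least 51) are assumptions taken from the examples in this guide, and `describe_ap_pattern` is a hypothetical helper, not something the CLI provides.

```python
def ap_levels(tokens):
    """Map each token to its integer AP suffix, or None for binary abilities."""
    levels = []
    for token in tokens:
        _, _, suffix = token.rpartition("_")
        levels.append(int(suffix) if suffix.isdigit() else None)
    return levels


def describe_ap_pattern(promoted, suppressed, low_max=6, high_min=51):
    """Return a coarse label for the AP pattern (thresholds are assumptions)."""
    promoted_levels = ap_levels(promoted)
    p_num = [ap for ap in promoted_levels if ap is not None]
    s_num = [ap for ap in ap_levels(suppressed) if ap is not None]

    if promoted_levels and not p_num:
        return "binary abilities promoted"
    if p_num and s_num:
        if all(ap <= low_max for ap in p_num) and all(ap >= high_min for ap in s_num):
            return "low-AP promoted, high-AP suppressed"
        if all(ap >= high_min for ap in p_num) and all(ap <= low_max for ap in s_num):
            return "high-AP promoted, low-AP suppressed"
    return "mixed"


if __name__ == "__main__":
    promoted = ["respawn_punisher", "comeback", "quick_super_jump_6",
                "intensify_action_3", "ink_saver_main_6"]
    suppressed = ["run_speed_up_57", "quick_respawn_57", "swim_speed_up_57"]
    print(describe_ap_pattern(promoted, suppressed))
```

Treat the label as a starting point for the table above, not a replacement for reading the actual contributions.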

### Common Feature Types

| Output Pattern | Feature Type |
|----------------|--------------|
| Single family promoted | Family detector (e.g., SCU detector) |
| Low-AP promoted, high-AP suppressed | "Balanced utility recommendation" |
| Binary abilities promoted | "Build style marker" (aggressive, defensive) |
| Death perks promoted (QR, SS, CB) | "Death-tolerant" archetype |
| Death perks suppressed | "Death-averse" archetype |

## Integration with Investigation Workflow

Decoder analysis fits into the investigation workflow as follows:

```
1. Overview (mechinterp-overview)
   ↓
2. Hypothesis formation
   ↓
3. 1D Sweeps (mechinterp-runner)
   ↓
4. Core Coverage Check ← NEW: Catch tail markers
   ↓
5. If diffuse/heterogeneous:
   → Decoder Output Analysis ← THIS SKILL
   ↓
6. Label formulation
```

## Example: Feature 13934 (from investigation log)

**Problem**: Activation analysis showed two opposite modes (RP anchor vs Zombie builds).

**Solution**: Decoder analysis revealed a unifying pattern:

```
PROMOTES: low-AP utility (_3, _6 tokens)
SUPPRESSES: heavy stacking (_51, _57 tokens)

→ Feature recommends "balanced utility spread" regardless of death strategy
```

**Key Insight**: Different builds (RP vs Zombie) activate the feature because they share a NEED (balanced utility), not a BUILD pattern.

## See Also

- **mechinterp-overview**: Initial feature assessment
- **mechinterp-runner**: Run experiments (including core_coverage_analysis, decoder_output_analysis)
- **mechinterp-investigator**: Full investigation workflow
- **mechinterp-labeler**: Save labels after investigation