---
name: mechinterp-labeler
description: Manage feature labeling workflow - queue management, label storage, similar features, progress tracking
---

# MechInterp Labeler

Manage the feature labeling workflow. This skill provides tools for:
- Priority queue management
- Setting and syncing labels
- Finding similar features
- Tracking labeling progress

## Purpose

The labeler skill enables interactive feature labeling sessions:
1. Get the next feature to label from a priority queue
2. Use overview and experiments to understand the feature
3. Save labels with categories and notes
4. Find similar features to label next
5. Track overall progress

## Commands

### Get Next Feature

```bash
cd /root/dev/SplatNLP

# Get next feature from queue
poetry run python -m splatnlp.mechinterp.cli.labeler_cli next --model ultra

# Don't auto-build queue if empty
poetry run python -m splatnlp.mechinterp.cli.labeler_cli next --model ultra --no-build
```

### Set a Label

**IMPORTANT**: Always use `--source` to track label provenance.

**Source Options:**
- `claude code` — Label created through Claude Code CLI investigation
- `codex` — Label created through Codex (OpenAI) agent
- `codex/claude` — Label created through Codex orchestrating Claude
- `manual` — Label created by human manually
- `dashboard` — Label created through dashboard UI (default)

```bash
# Label from Claude Code investigation
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
    --feature-id 18712 \
    --name "Special Charge Stacker" \
    --model ultra \
    --source "claude code"

# With category and notes
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
    --feature-id 18712 \
    --name "SCU Detector" \
    --category tactical \
    --notes "Responds to Special Charge Up presence, stronger at high AP" \
    --source "claude code"

# Manual labeling by human
poetry run python -m splatnlp.mechinterp.cli.labeler_cli label \
    --feature-id 18712 \
    --name "My Label" \
    --source "manual"
```

**Categories:**
- `mechanical`: Low-level patterns (token presence, combinations)
- `tactical`: Mid-level patterns (build strategies, weapon synergies)
- `strategic`: High-level patterns (playstyle, meta concepts)
- `none`: Uncategorized

## Required Label Fields

Every label in `consolidated_ultra.json` MUST include the fields marked ✓ below; fields marked Optional may be omitted or set to `null`:

| Field | Required | Description |
|-------|----------|-------------|
| `feature_id` | ✓ | Integer feature ID |
| `model_type` | ✓ | "ultra" or "full" |
| `dashboard_name` | ✓ | The label displayed in dashboard |
| `dashboard_category` | ✓ | mechanical, tactical, strategic, or none |
| `dashboard_notes` | ✓ | Investigation notes with evidence |
| `display_name` | ✓ | Same as dashboard_name (for compatibility) |
| `last_updated` | ✓ | ISO timestamp of last update |
| `source` | ✓ | Who created it (e.g., "claude code (full investigation)") |
| `hypothesis_confidence` | ✓ | 0.0-1.0 confidence score (DEPRECATED - use interpretability_confidence) |
| `importance_percentile` | ✓ | Decoder weight percentile (0-100, objective measure of model importance) |
| `interpretability_confidence` | ✓ | How confident we are in the interpretation (0.0-1.0, subjective) |
| `stability_score` | Optional | Split-half stability if validation was run (0.0-1.0) |
| `research_label` | Optional | Alternative label for research context |
| `research_state_path` | Optional | Path to research state JSON |

### Separating Importance from Interpretability

These three fields capture distinct dimensions:

| Field | Question Answered | Source |
|-------|-------------------|--------|
| `importance_percentile` | "Is this feature important to the model?" | Decoder weight magnitude (objective) |
| `interpretability_confidence` | "Do we understand what this feature does?" | Investigation quality (subjective) |
| `stability_score` | "Does this feature behave consistently?" | Split-half validation (objective) |

**Common combinations:**

| Importance | Interpretability | Meaning |
|------------|------------------|---------|
| High (>80) | High (>0.8) | Strong, well-understood feature |
| High (>80) | Low (<0.5) | Important but mysterious - needs more investigation |
| Low (<20) | High (>0.8) | Understood but weak - may be noise or redundant |
| Low (<20) | Low (<0.5) | Skip - not worth investigating |

**Rule of thumb**: Don't conflate these. A feature with 9th percentile importance but 0.85 interpretability confidence is "weak but understood" - useful for pattern recognition but not a major model component.
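
The combinations table can be read as a small decision rule. The sketch below is illustrative only: `triage` is a hypothetical helper (not part of `splatnlp`), and because the table gives no guidance for values between its thresholds, those cases fall into an explicit catch-all.

```python
def triage(importance_percentile, interpretability_confidence):
    """Map the two scores onto the four combinations in the table above."""
    high_importance = importance_percentile > 80
    low_importance = importance_percentile < 20
    high_interp = interpretability_confidence > 0.8
    low_interp = interpretability_confidence < 0.5

    if high_importance and high_interp:
        return "strong, well-understood"
    if high_importance and low_interp:
        return "important but mysterious"
    if low_importance and high_interp:
        return "understood but weak"
    if low_importance and low_interp:
        return "skip"
    # Assumption: the table is silent on in-between values.
    return "intermediate"


# The rule-of-thumb case: 9th percentile importance, 0.85 confidence.
print(triage(9.3, 0.85))  # prints: understood but weak
```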

**Example complete label:**
```json
{
  "feature_id": 10938,
  "model_type": "ultra",
  "dashboard_name": "Positional Survival - Midrange",
  "dashboard_category": "strategic",
  "dashboard_notes": "Survival through positioning, not stealth/trading. Decoder promotes: SSU, BRU (all levels), ISS, IA, IRU. Suppresses: BPU, RSU, QR, SS. Weapons: Midrange with NO/BAD NS fit, LOW death tolerance. NS 0.84x depleted, QR 0.66x suppressed.",
  "display_name": "Positional Survival - Midrange",
  "last_updated": "2025-12-14T01:30:00.000000",
  "source": "claude code (full investigation)",
  "hypothesis_confidence": 0.85,
  "importance_percentile": 9.3,
  "interpretability_confidence": 0.85,
  "stability_score": null,
  "research_label": "Positional Survival - Midrange",
  "research_state_path": "/mnt/e/mechinterp_runs/state/feature_10938_ultra.json"
}
```
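
A quick way to catch incomplete labels before they reach the consolidated file is to check them against the field table. The following is a minimal sketch under stated assumptions: `validate_label` and `in_range` are hypothetical helpers, not part of `splatnlp`, and the checks only encode what the table states (field names, allowed categories, value ranges, and `display_name` matching `dashboard_name`).

```python
from datetime import datetime

# Field names and allowed values come from the Required Label Fields table;
# validate_label itself is a hypothetical helper, not part of splatnlp.
REQUIRED_FIELDS = (
    "feature_id", "model_type", "dashboard_name", "dashboard_category",
    "dashboard_notes", "display_name", "last_updated", "source",
    "hypothesis_confidence", "importance_percentile",
    "interpretability_confidence",
)
CATEGORIES = {"mechanical", "tactical", "strategic", "none"}


def in_range(value, low, high):
    """True if value is a number within [low, high]."""
    return isinstance(value, (int, float)) and low <= value <= high


def validate_label(label):
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in label]
    if problems:
        return problems  # the checks below need all required fields
    if label["model_type"] not in {"ultra", "full"}:
        problems.append("model_type must be 'ultra' or 'full'")
    if label["dashboard_category"] not in CATEGORIES:
        problems.append("unknown dashboard_category")
    if label["display_name"] != label["dashboard_name"]:
        problems.append("display_name must equal dashboard_name")
    if not in_range(label["importance_percentile"], 0, 100):
        problems.append("importance_percentile must be in 0-100")
    if not in_range(label["interpretability_confidence"], 0.0, 1.0):
        problems.append("interpretability_confidence must be in 0.0-1.0")
    stability = label.get("stability_score")  # optional; may be absent or null
    if stability is not None and not in_range(stability, 0.0, 1.0):
        problems.append("stability_score must be in 0.0-1.0")
    try:
        datetime.fromisoformat(label["last_updated"])
    except (TypeError, ValueError):
        problems.append("last_updated is not an ISO timestamp")
    return problems


# Values copied from the example label (notes shortened).
example_label = {
    "feature_id": 10938,
    "model_type": "ultra",
    "dashboard_name": "Positional Survival - Midrange",
    "dashboard_category": "strategic",
    "dashboard_notes": "Survival through positioning, not stealth/trading.",
    "display_name": "Positional Survival - Midrange",
    "last_updated": "2025-12-14T01:30:00.000000",
    "source": "claude code (full investigation)",
    "hypothesis_confidence": 0.85,
    "importance_percentile": 9.3,
    "interpretability_confidence": 0.85,
    "stability_score": None,
}
print(validate_label(example_label))  # prints []
```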

## ⚠️ Super-Stimuli Warning

**High activations may be "flanderized" versions of the true concept!**

When labeling features, don't examine only the extreme activations. High-activation builds can be:
- **Super-stimuli**: Extreme, exaggerated versions of the core concept
- **Weapon-gated**: Only achievable on specific niche weapons
- **Unrepresentative**: Missing the general pattern that applies across weapons

### How to Detect Super-Stimuli

1. **Examine activation regions** (as % of **effective max**, the 99.5th percentile of nonzero activations):
   - Floor (≤1%), Low (1-10%), Below Core (10-25%)
   - Core (25-75%), High (75-90%), Flanderization Zone (90%+)
   - Use effective max to prevent outliers from distorting region boundaries

2. **Look for weapons that span ALL levels continuously**:
   - If Splattershot appears in every region → feature encodes a general concept
   - If only niche weapons reach 90%+ → those are "super-stimuli"

3. **Compare core (25-75%) vs flanderization zone (90%+)**:
   - Core region: diverse weapons, general builds = TRUE CONCEPT
   - Flanderization zone: concentrated on 3-4 special-dependent weapons = SUPER-STIMULI
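
The region boundaries in step 1 can be sketched as a simple lookup. This is a hypothetical helper, not the project's implementation. In particular, how a value exactly on a boundary is assigned is an assumption: upper bounds are treated as inclusive here, so an activation at exactly 90% of effective max lands in "high".

```python
# (upper bound as a fraction of effective max, region name), from step 1 above.
REGION_BOUNDS = [
    (0.01, "floor"),
    (0.10, "low"),
    (0.25, "below_core"),
    (0.75, "core"),
    (0.90, "high"),
]


def activation_region(activation, effective_max):
    """Name the region an activation falls in, relative to the effective max."""
    if effective_max <= 0:
        raise ValueError("effective_max must be positive")
    fraction = activation / effective_max
    for upper_bound, name in REGION_BOUNDS:
        if fraction <= upper_bound:
            return name
    # Anything above 90% of effective max, including outliers beyond 100%.
    return "flanderization_zone"


print(activation_region(0.5, 1.0))   # prints: core
print(activation_region(1.9, 2.0))   # prints: flanderization_zone
```

Bucketing every example this way makes step 2 straightforward: group examples by region and check which weapons appear in all of them.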

### Example: Feature 9971

```
Initial label (wrong): "Death-Averse SCU Stacker"
- Only looked at 90%+ activations (SCU_57 + special-dependent weapons)

Better label: "Offensive Intensity (Death-Averse)"
- Core region (25-75%) showed diverse weapons (Splattershot family, Sploosh, Hydra)
- Feature tracks general offensive investment, not specifically SCU
- Flanderization zone (90%+) with Bloblobber, Glooga are "super-stimuli" not the core concept
```

**Key insight**: The core region (25-75% of effective max) reveals the TRUE feature concept. High activations (90%+ of effective max) show what happens when that concept is pushed to flanderized extremes.

### Core Coverage Validation (BEFORE LABELING)

**Before finalizing any label, verify core coverage of the proposed signature.**

A label based on a token/ability that only appears in <30% of core examples is labeling the TAIL, not the concept.

```python
import numpy as np
import polars as pl

from splatnlp.mechinterp.skill_helpers import load_context

FEATURE_ID = 13934  # placeholder: replace with the feature under investigation

ctx = load_context('ultra')
df = ctx.db.get_all_feature_activations_for_pagerank(FEATURE_ID)

# Effective max = 99.5th percentile of nonzero activations (robust to outliers)
acts = df['activation'].to_numpy()
nonzero_acts = acts[acts > 0]
effective_max = np.percentile(nonzero_acts, 99.5)

# Core region: 25-75% of effective max
core_df = df.filter(
    (pl.col('activation') > 0.25 * effective_max) &
    (pl.col('activation') <= 0.75 * effective_max)
)

# Check coverage of proposed label driver
driver_id = ctx.vocab['YOUR_TOKEN_HERE']  # e.g., 'respawn_punisher'
core_with_driver = core_df.filter(
    pl.col('ability_input_tokens').list.contains(driver_id)
)

# Guard against an empty core region before dividing
if core_df.height == 0:
    print("Core region is empty; cannot compute coverage")
else:
    coverage = core_with_driver.height / core_df.height * 100
    print(f"Core coverage: {coverage:.1f}%")
```

| Core Coverage | Label Guidance |
|---------------|----------------|
| >50% | Safe to headline this token/ability |
| 30-50% | Mention in notes, but not as headline |
| <30% | **WRONG LABEL** - this is a tail marker, not the concept |
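
To make the thresholds concrete, here is a dependency-free sketch of the same check on plain Python lists. Both functions are hypothetical helpers, the token IDs are made-up toy data, and the treatment of exactly 30% and exactly 50% (both mapped to "notes only") is an assumption, since the table does not say which side those values fall on.

```python
def core_coverage(core_examples, driver_id):
    """Percent of core examples whose token list contains driver_id."""
    if not core_examples:
        raise ValueError("core region is empty")
    hits = sum(1 for tokens in core_examples if driver_id in tokens)
    return 100.0 * hits / len(core_examples)


def label_guidance(coverage_pct):
    """Apply the guidance table to a coverage percentage."""
    if coverage_pct > 50:
        return "headline"
    if coverage_pct >= 30:
        return "notes only"
    return "tail marker"


# Toy core region: token 7 appears in 1 of 4 examples.
toy_core = [[1, 2, 7], [1, 3], [2, 4], [1, 5]]
coverage = core_coverage(toy_core, driver_id=7)
print(coverage, label_guidance(coverage))  # prints: 25.0 tail marker
```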

**Red flags that indicate wrong labeling:**
- Binary ability with >5x tail enrichment but <20% core presence → tail marker
- Weapon with >40% in top-100 but <15% in core → flanderized
- Proposed signature covers <30% of core examples → incomplete interpretation

**Example (Feature 13934):**
```
Wrong approach: See RP with 8.57x enrichment → label as "RP Backline Anchor"
Reality: RP only in 12% of core → RP is super-stimulus, not concept

Right approach: Check core coverage FIRST
→ RP at 12% means it's a tail marker
→ Split by RP presence to find true concept
→ Label the commonality across modes
```
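
The first two red flags listed in this section can be expressed as predicates and applied to the Feature 13934 numbers quoted above. This is a sketch, not project code: the function names are hypothetical, and the inputs (enrichment ratio, percentages) are assumed to have been computed beforehand.

```python
def ability_is_tail_marker(tail_enrichment, core_presence_pct):
    """Binary ability with >5x tail enrichment but <20% core presence."""
    return tail_enrichment > 5.0 and core_presence_pct < 20.0


def weapon_is_flanderized(top100_pct, core_pct):
    """Weapon with >40% share of the top-100 examples but <15% of the core."""
    return top100_pct > 40.0 and core_pct < 15.0


# Feature 13934: RP has 8.57x enrichment but appears in only 12% of core.
print(ability_is_tail_marker(8.57, 12.0))  # prints: True
```

A `True` result here means the ability or weapon should not headline the label, however prominent it looks among the top activations.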

## Label Quality Examples

### Evolution from Mechanical to Strategic

| Investigation Stage | Label | Problem |
|--------------------|-------|---------|
| After 1D sweeps | "SSU + ISM + IRU Kit" | Just lists tokens |
| After binary analysis | "Swim Efficiency Kit (Death-Averse)" | Mechanical + negation |
| After decoder grouping | "Swim Utility Sustain" | Better but still mechanical |
| After weapon role check | "Positional Survival - Midrange" | Strategic concept + role |

### Good vs Bad Labels

| Bad Label | Why | Good Label | Why |
|-----------|-----|------------|-----|
| "SCU Detector" | Token presence only | "Special Pressure Build" | Gameplay purpose |
| "Death-Averse Efficiency" | Negation + mechanical | "Positional Survival" | Positive concept |
| "High SSU Anchor" | Wrong role (Jr. isn't anchor) | "- Midrange" | Correct role |
| "Zombie + RP Mixed" | Describes modes, not concept | "Utility Axis (Multi-Modal)" | Names the pattern |
| "ISM Build" | Single token | "Ink Sustain - Backline" | Concept + role |

### The Strategic Label Test

Before saving a label, ask:

1. "Would a competitive Splatoon player recognize this playstyle?"
   - If no → too mechanical or wrong terminology

2. "Does this explain WHY the model learned this pattern?"
   - If no → you're describing correlation, not causation

3. "Could I explain this to someone who doesn't know the tokens?"
   - If no → label is too technical

### Mandatory Label Components

Every strategic/tactical label should have:

1. **Core concept** - The gameplay behavior (e.g., "Positional Survival")
2. **Role qualifier** - Where/how it's played (e.g., "- Midrange")
3. **Notes with evidence** - Decoder groups, weapon classification, key enrichments

### Label Specificity by Category

**Match label specificity to concept level:**

| Category | Specificity | Example |
|----------|-------------|---------|
| **mechanical** | Terse, technical | "SCU Threshold 29+", "ISM Stacker" |
| **tactical** | Mid-level, names the combo | "Zombie Slayer Dualies", "Beacon Support Kit" |
| **strategic** | High-concept, captures the "why" | "Positional Survival - Midrange" |

- Mechanical = low-level pattern → precise, token-focused
- Tactical = build strategy → names the combo + weapon/class
- Strategic = gameplay philosophy → high-concept + role qualifier

## Commands (continued)

### Skip a Feature

```bash
# Skip the next feature
poetry run python -m splatnlp.mechinterp.cli.labeler_cli skip --model ultra

# Skip specific feature with reason
poetry run python -m splatnlp.mechinterp.cli.labeler_cli skip \
    --feature-id 18712 \
    --reason "ReLU floor too high, hard to interpret"
```

### Add Features to Queue

```bash
# Add single feature
poetry run python -m splatnlp.mechinterp.cli.labeler_cli add 18712 --model ultra

# Add multiple with priority
poetry run python -m splatnlp.mechinterp.cli.labeler_cli add 18712,18890,19042 \
    --priority 0.8 \
    --reason "SCU-related cluster"
```

### Find Similar Features

```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli similar \
    --feature-id 18712 \
    --top-k 5 \
    --model ultra
```

### Check Status

```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli status --model ultra
```

Output example:
```
## Labeling Status (ultra)

### Labels
- Total labeled: 45
- From dashboard: 30
- From research: 10
- Merged: 5

### Categories
- tactical: 20
- mechanical: 15
- strategic: 5
- uncategorized: 5

### Queue
- Pending: 25
- Completed: 40
- Skipped: 5
```

### Sync Labels

Pull labels from all sources (dashboard, research states):

```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli sync --model ultra
```

### Export Labels

```bash
poetry run python -m splatnlp.mechinterp.cli.labeler_cli export \
    --model ultra \
    --output /mnt/e/mechinterp_runs/labels/export.csv
```

### Build Priority Queue

```bash
# By activation count (features with most data)
poetry run python -m splatnlp.mechinterp.cli.labeler_cli build-queue \
    --model ultra \
    --method activation_count \
    --top-k 50

# From cluster (similar to a seed feature)
poetry run python -m splatnlp.mechinterp.cli.labeler_cli build-queue \
    --model ultra \
    --method cluster \
    --seed 18712 \
    --top-k 10
```

## Typical Labeling Session

```
User: Let's label some features

Claude: [runs: labeler_cli next --model ultra]
        Next feature: 18712 (priority: 0.85)

        [runs: overview_cli --feature-id 18712]
        ## Feature 18712 Overview
        - Top token: special_charge_up (27%)
        - Family: SCU 31%
        ...

        Based on the overview, this feature appears to detect
        Special Charge Up stacking. Want me to run a sweep?

User: Yes, confirm with an SCU sweep

Claude: [runs: runner_cli with family_1d_sweep]
        Results confirm monotonic increase with SCU.

        Proposed label: "SCU Detector - High AP"

User: Call it "Special Charge Stacker"

Claude: [runs: labeler_cli label --feature-id 18712
               --name "Special Charge Stacker" --category tactical
               --source "claude code"]
        Label saved (source: claude code). Finding similar features...

        [runs: labeler_cli similar --feature-id 18712]
        Similar features:
        - 19042 (sim=0.82)
        - 18890 (sim=0.75)

        Want to add these to the queue?
```

## Label Storage

Labels are stored in three places (kept in sync):

1. **Dashboard**: `src/splatnlp/dashboard/feature_labels_{model}.json`
2. **Research State**: `/mnt/e/mechinterp_runs/state/{model}/f{id}.json`
3. **Consolidated**: `/mnt/e/mechinterp_runs/labels/consolidated_{model}.json`

The consolidator merges all sources and resolves conflicts.
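
For scripts that need to locate these files, the three path templates can be filled in as below. `label_paths` is a hypothetical helper that only substitutes `{model}` and `{id}` into the templates listed above. Note that the example label earlier in this document shows a `research_state_path` with a different layout, so treat the research-state pattern as indicative and confirm it against your own run directory.

```python
def label_paths(model, feature_id):
    """Fill in the three storage templates for one model and feature."""
    return {
        "dashboard": f"src/splatnlp/dashboard/feature_labels_{model}.json",
        "research_state": f"/mnt/e/mechinterp_runs/state/{model}/f{feature_id}.json",
        "consolidated": f"/mnt/e/mechinterp_runs/labels/consolidated_{model}.json",
    }


for name, path in label_paths("ultra", 18712).items():
    print(f"{name}: {path}")
```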

## Queue Storage

Queue state is persisted at:
- `/mnt/e/mechinterp_runs/labels/queue_{model}.json`

Contains:
- Pending entries with priorities
- Completed feature IDs
- Skipped feature IDs
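
As a mental model of that state, the sketch below keeps pending entries in a max-priority heap alongside completed and skipped sets. Everything here is an assumption made for illustration: `QueueSketch` is not the real `LabelingQueue` class, the on-disk JSON schema is not shown in this document, and the choice to ignore re-adds of completed or skipped features is hypothetical behavior.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class QueueSketch:
    """Hypothetical in-memory model of the queue state (not LabelingQueue)."""
    pending: list = field(default_factory=list)    # heap of (-priority, feature_id)
    completed: set = field(default_factory=set)
    skipped: set = field(default_factory=set)

    def add(self, feature_id, priority):
        # Assumption: features already handled are not re-queued.
        if feature_id in self.completed or feature_id in self.skipped:
            return
        # Negate priority because heapq is a min-heap.
        heapq.heappush(self.pending, (-priority, feature_id))

    def pop_next(self):
        """Remove and return the highest-priority feature ID, or None if empty."""
        if not self.pending:
            return None
        _, feature_id = heapq.heappop(self.pending)
        return feature_id

    def mark_complete(self, feature_id):
        self.completed.add(feature_id)

    def skip(self, feature_id):
        self.skipped.add(feature_id)


queue = QueueSketch()
queue.add(18890, priority=0.75)
queue.add(18712, priority=0.85)
print(queue.pop_next())  # prints: 18712
```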

## Programmatic Usage

```python
from splatnlp.mechinterp.labeling import (
    LabelConsolidator,
    LabelingQueue,
    QueueBuilder,
    SimilarFinder,
)

# Queue management
queue = LabelingQueue.load("ultra")
entry = queue.get_next()
queue.mark_complete(entry.feature_id, "My Label")

# Set labels
consolidator = LabelConsolidator("ultra")
consolidator.set_label(
    feature_id=18712,
    name="SCU Detector",
    category="tactical",
    notes="Responds to SCU presence",
)

# Find similar
finder = SimilarFinder("ultra")
similar = finder.find_by_top_tokens(18712, top_k=5)

# Build queue
builder = QueueBuilder("ultra")
queue = builder.build_by_activation_count(top_k=50)
```

## See Also

- **mechinterp-overview**: Quick feature overview before labeling
- **mechinterp-runner**: Run experiments to validate hypotheses
- **mechinterp-state**: Track detailed research progress
- **mechinterp-summarizer**: Generate notes from experiments