---
name: style-analyzer
description: Quantified prose analysis for voice.md generation. Extracts sentence metrics, dialogue profiles, vocabulary patterns, and forbidden word detection from exemplar texts.
model: sonnet
---

# Style Analyzer

Extracts measurable style patterns from exemplar texts to build objective voice documentation. Turns a subjective instruction such as "write like X" into quantified constraints.

## When to Use

- Building voice.md from exemplar novels
- Analyzing author style for replication
- Creating benchmarks for manuscript validation
- Comparing draft style against targets

## Usage

### Command Line

```bash
# Analyze single file (markdown output)
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" source.txt

# Analyze directory of texts
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" exemplars/

# JSON output for programmatic use
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" source.txt --json

# Suppress specific cliche categories (e.g., fantasy project)
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" source.txt --suppress shimmer_family shadow_worship

# Auto-read suppressions from voice.md
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" source.txt --voice bible/voice.md

# List all available category IDs
python "/home/pknull/life/asha/plugins/write/skills/style-analyzer/scripts/analyze_style.py" --list-categories
```

### Agent Integration

The book-analyzer agent uses this script for metric extraction:

```
@task: Analyze Ishiguro's Never Let Me Go
→ style-analyzer produces metrics
→ book-analyzer interprets for voice.md
```

## Metrics Extracted

### Sentence Metrics

| Metric | Description |
|--------|-------------|
| Mean length | Average words per sentence |
| Median length | Middle value (less skewed by outliers) |
| Std deviation | Spread of sentence lengths (rhythm variation indicator) |
| Short ratio | % of sentences shorter than 8 words |
| Long ratio | % of sentences longer than 25 words |
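
As an illustration of how these numbers can be derived, here is a minimal sketch. It is not the code in `analyze_style.py`: the function name `sentence_metrics`, the naive sentence splitter, and the use of the sample standard deviation are all assumptions made for the example.

```python
import re
import statistics


def sentence_metrics(text: str, short_max: int = 8, long_min: int = 25) -> dict:
    """Sentence-length statistics (hypothetical helper, not the script's own code)."""
    # Naive split after ., ! or ? followed by whitespace. Abbreviations,
    # ellipses and quoted dialogue would need more careful handling.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {}
    n = len(lengths)
    return {
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        # Assumption: sample standard deviation; the script may differ.
        "std_dev": statistics.stdev(lengths) if n > 1 else 0.0,
        "short_ratio": sum(1 for x in lengths if x < short_max) / n,
        "long_ratio": sum(1 for x in lengths if x > long_min) / n,
    }


if __name__ == "__main__":
    print(sentence_metrics("I ran. She walked slowly home. It rained."))
```

The thresholds default to the 8-word and 25-word cut-offs from the table and are exposed as parameters only for experimentation.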

### Dialogue Profile

| Metric | Description |
|--------|-------------|
| Dialogue ratio | % of text in quotation marks |
| Quote style | Double vs single quotes |
| Tag distribution | said/asked/replied frequencies |
| Attribution style | "said Name" vs "Name said" |
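
The sketch below shows one rough way to approximate these dialogue metrics. It is an assumption-laden illustration, not the script's implementation: `dialogue_profile` and `TAGS` are hypothetical names, only straight double quotes are handled, and any capitalized word next to a tag is treated as a name.

```python
import re
from collections import Counter

# Hypothetical tag list taken from the table above; the script may track more.
TAGS = ("said", "asked", "replied")


def dialogue_profile(text: str) -> dict:
    """Rough dialogue metrics; handles straight double quotes only."""
    total_words = len(text.split())
    if total_words == 0:
        return {}
    quoted = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(q.split()) for q in quoted)

    # Count tags in narration only, so a quoted "said" is not miscounted.
    narration = re.sub(r'"[^"]*"', " ", text)
    tag_alt = "|".join(TAGS)
    tag_counts = Counter(
        m.lower()
        for m in re.findall(rf"\b(?:{tag_alt})\b", narration, flags=re.IGNORECASE)
    )
    # A capitalized word beside a tag; pronouns such as "He" also match.
    name_first = len(re.findall(rf"\b[A-Z][a-z]+ (?:{tag_alt})\b", narration))
    tag_first = len(re.findall(rf"\b(?:{tag_alt}) [A-Z][a-z]+\b", narration))
    return {
        "dialogue_ratio": dialogue_words / total_words,
        "tag_counts": dict(tag_counts),
        "name_first": name_first,   # "Name said"
        "tag_first": tag_first,     # "said Name"
    }


if __name__ == "__main__":
    sample = '"Hello there," Kathy said. "Are you well?" asked Tommy. He nodded.'
    print(dialogue_profile(sample))
```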

### Vocabulary Profile

| Metric | Description |
|--------|-------------|
| Unique word ratio | Distinct words as a share of total words (vocabulary diversity) |
| Rare word ratio | Share of words appearing only once |
| Adverb density | -ly words per 1000 words |
| AI signal density | Known flat-prose indicators per 1000 words |
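
A minimal sketch of the first three vocabulary metrics follows. The function name `vocabulary_profile`, the tokenizer, and the choice of total word count as the denominator for the rare word ratio are assumptions for illustration; the script may define them differently.

```python
import re
from collections import Counter


def vocabulary_profile(text: str) -> dict:
    """Vocabulary statistics (hypothetical helper, not the script's own code)."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    once_only = sum(1 for c in counts.values() if c == 1)
    # Crude adverb heuristic: any word ending in "ly" (also catches "family").
    ly_words = sum(1 for w in words if w.endswith("ly") and len(w) > 3)
    return {
        "unique_word_ratio": len(counts) / total,
        # Assumption: divided by total words rather than by distinct words.
        "rare_word_ratio": once_only / total,
        "adverb_density_per_1000": ly_words / total * 1000,
    }


if __name__ == "__main__":
    print(vocabulary_profile("She walked slowly. She walked quickly home."))
```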

### Forbidden Patterns

Detects and counts matches across 36 named cliche categories (28 fiction, 8 structural), alongside filter words, hedging, and AI signals:

- **Filter words**: "he saw", "she heard", "they felt"
- **Hedging**: "seemed to", "appeared to", "somewhat"
- **Fiction cliche categories** (28): body-as-metaphor, cardiac sequence, breath-as-device, hands-as-surrogate, impossible faces, smile catalogue, shimmer family, shadow worship, silence-as-drama, something vagueness, agency removers, cool observer mode, overworked dialogue tags, vague depth adjectives, melodramatic emotion, weight/gravity, transition crutches, introspection filler, pseudo-profound nouns, redundant intensifiers, overworked metaphor families, weather projection, metallic taste trinity, "kind of" construction, threshold metaphor, unsaid apologies, time freezing, things living in body
- **Structural tells** (8): em-dash density, triplet framing, inspirational pivot, countdown pattern, self-answered rhetorical, anaphora, "here's the thing" phrases, "think of it as" analogies
- **AI signals**: "delve", "utilize", "palpable", etc.

Projects can suppress categories via `--suppress` or `suppress_categories` in voice.md.
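
To make the suppression behavior concrete, here is a small sketch of category counting with a suppress set. The `CATEGORIES` table and `count_patterns` function are hypothetical; the phrases come from the list above, but the real script's category IDs, pattern definitions, and `voice.md` parsing are not shown here and may differ.

```python
import re

# Hypothetical category table for illustration only.
CATEGORIES = {
    "filter_words": [r"\bhe saw\b", r"\bshe heard\b", r"\bthey felt\b"],
    "hedging": [r"\bseemed to\b", r"\bappeared to\b", r"\bsomewhat\b"],
}


def count_patterns(text: str, suppress: frozenset = frozenset()) -> dict:
    """Count matches per category, skipping any category ID in `suppress`."""
    lowered = text.lower()
    return {
        category: sum(len(re.findall(p, lowered)) for p in patterns)
        for category, patterns in CATEGORIES.items()
        if category not in suppress
    }


if __name__ == "__main__":
    sample = "He saw the door. It seemed to open, somewhat slowly."
    print(count_patterns(sample))
    print(count_patterns(sample, suppress=frozenset({"hedging"})))
```

Suppressed categories are dropped from the result entirely rather than reported as zero, so a downstream report cannot mistake "not checked" for "none found". That is a design choice of this sketch, not a documented behavior of the script.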

### Repetition Analysis

- Overused content words (each accounting for more than 1% of the text)
- Repeated bigrams and trigrams
- Phrase echoes
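
Repeated bigrams and trigrams can be found with a simple sliding window over the word list. The sketch below is illustrative only: `repeated_ngrams` is a hypothetical name, and the tokenizer and `min_count` threshold are assumptions rather than the script's settings.

```python
import re
from collections import Counter


def repeated_ngrams(text: str, n: int = 2, min_count: int = 2) -> dict:
    """Return n-grams that occur at least `min_count` times."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered views of the word list yields each n-gram as a tuple.
    # Note: windows run across sentence boundaries in this simple version.
    grams = Counter(zip(*(words[i:] for i in range(n))))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}


if __name__ == "__main__":
    sample = "The old house and the old barn."
    print(repeated_ngrams(sample, n=2))
    print(repeated_ngrams(sample, n=3))
```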

## Output Formats

### Markdown (default)

Human-readable report with tables and grep patterns for validation.

### JSON (--json flag)

Machine-readable output for integration with other tools:

```json
{
  "source": "chapter01.txt",
  "word_count": 5432,
  "sentence_metrics": {
    "mean_length": 14.2,
    "std_dev": 8.3,
    "short_ratio": 0.15,
    "long_ratio": 0.08
  },
  "dialogue_metrics": {
    "dialogue_ratio": 0.35,
    "quote_style": {"dominant": "double"}
  },
  "vocabulary_metrics": {
    "adverb_density_per_1000": 12.5,
    "ai_signals": {"density_per_1000": 0.8}
  },
  "forbidden_patterns": {
    "totals": {"filter_words": 3, "hedging": 12, "cliches": 1}
  }
}
```
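
A downstream tool might load this JSON and compare selected values against project limits. The following is a sketch under stated assumptions: `check_report` and the limit values are hypothetical, and only keys that appear in the sample above are read.

```python
import json


def check_report(report: dict, limits: dict) -> list:
    """Return a message for each observed value that exceeds its limit."""
    vocab = report.get("vocabulary_metrics", {})
    observed = {
        "adverb_density_per_1000": vocab.get("adverb_density_per_1000"),
        "ai_signal_density_per_1000": vocab.get("ai_signals", {}).get("density_per_1000"),
        "long_ratio": report.get("sentence_metrics", {}).get("long_ratio"),
    }
    problems = []
    for name, limit in limits.items():
        value = observed.get(name)
        # Missing keys are skipped rather than treated as failures.
        if value is not None and value > limit:
            problems.append(f"{name}: {value} exceeds {limit}")
    return problems


if __name__ == "__main__":
    raw = """
    {
      "sentence_metrics": {"long_ratio": 0.08},
      "vocabulary_metrics": {
        "adverb_density_per_1000": 12.5,
        "ai_signals": {"density_per_1000": 0.8}
      }
    }
    """
    # Hypothetical limits; real thresholds would come from the project's voice.md.
    limits = {"adverb_density_per_1000": 10.0, "ai_signal_density_per_1000": 1.0}
    print(check_report(json.loads(raw), limits))
```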

## Integration Points

| Component | Relationship |
|-----------|--------------|
| book-analyzer agent | Uses this script for metric extraction |
| bible-merger agent | Consumes analysis outputs |
| novel-style-linter | Validates against derived thresholds |
| perplexity-gate | Complements with AI flatness detection |

## AI Signal Words

The script detects more than 60 known AI-prose indicators, including:

- Hedging: seemingly, apparently, somewhat
- Overused transitions: furthermore, moreover, consequently
- Generic intensifiers: incredibly, fundamentally, essentially
- Flat descriptors: various, numerous, significant
- AI-favored verbs: delve, utilize, leverage, facilitate
- Emotional tells: palpable, visceral, resonated
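
Density per 1000 words can be computed by checking each token against a word set. The sketch below uses only the example words listed above, not the script's full list, and `ai_signal_density` is a hypothetical name; matching is exact, so inflected forms such as "delving" are not caught.

```python
import re

# Subset for illustration; the script's actual list is longer.
AI_SIGNALS = {
    "seemingly", "apparently", "somewhat",
    "furthermore", "moreover", "consequently",
    "incredibly", "fundamentally", "essentially",
    "various", "numerous", "significant",
    "delve", "utilize", "leverage", "facilitate",
    "palpable", "visceral", "resonated",
}


def ai_signal_density(text: str) -> float:
    """AI-signal words per 1000 words (exact, case-insensitive matches)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in AI_SIGNALS)
    return hits / len(words) * 1000


if __name__ == "__main__":
    print(ai_signal_density("We delve into various matters and utilize numerous tools."))
```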

## Example Analysis

```markdown
# Style Analysis: ishiguro_sample.txt

## Sentence Metrics
| Metric | Value |
|--------|-------|
| Mean length | 16.2 words |
| Std deviation | 9.4 |
| Short sentences (<8 words) | 12.3% |
| Long sentences (>25 words) | 8.7% |

## Dialogue Profile
| Metric | Value |
|--------|-------|
| Dialogue ratio | 28.5% |
| Most common tag | "said" (78%) |
| Attribution style | Name said |

## Forbidden Patterns Found
- Filter words: 2 occurrences
- Hedging: 8 occurrences
- Clichés: 0 occurrences
```

## Requirements

- Python 3.10+
- No external dependencies (uses only stdlib)