# Readability Metrics

Quantitative formulas the comprehension axis computes alongside the pattern catalog. These are well-validated, decades-old measurements with known limitations. The scanner reports them as a panel; calibration thresholds are in `calibration.md`.

The metrics measure mechanical proxies for difficulty (sentence length, syllable count, word familiarity), not actual comprehension. They're useful as a panel, since no single metric is decisive, and they ground the verdict in something more than regex hits.

## The 8 metrics the scanner computes

The scanner computes these eight (chosen for breadth of coverage and lack of redundancy):

1. **Flesch Reading Ease**: overall ease of reading (0–100 scale, higher = easier)
2. **Flesch-Kincaid Grade Level**: U.S. school grade required to understand
3. **SMOG Index**: grade level for full comprehension (gold standard for healthcare)
4. **Coleman-Liau Index**: grade level via character counts (no syllable estimation)
5. **Dale-Chall Score**: grade level via word familiarity against a 3,000-word list
6. **Lexical Density**: content words ÷ total words
7. **Average Sentence Length** + variance
8. **Passive Voice Percentage**

Plus the comprehension-specific density signals (acronym density, named-entity density, numeric density) defined in `comprehension.md`.

---

## 1. Flesch Reading Ease (FRE)

```
206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words)
```

**Scale:** 0–100. Higher = easier.

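As a minimal sketch of the FRE formula above, the Python below computes the score from raw counts and, for convenience, from text. The helper names (`estimate_syllables`, `fre_from_counts`, `flesch_reading_ease`) and the vowel-group syllable heuristic are assumptions made for illustration; the scanner's actual tokenizer and syllable counter are not described here and may differ.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Rough estimate (assumption): count vowel groups, minus a trailing silent 'e'."""
    w = word.lower()
    count = len(VOWEL_GROUPS.findall(w))
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fre_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from pre-computed counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease from raw text, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(estimate_syllables(w) for w in words)
    return fre_from_counts(len(words), len(sentences), syllables)


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 1))
```

Because the syllable estimate is approximate, scores from this sketch will not exactly match tools that use a pronunciation dictionary. Very simple text can score above 100, so a caller may want to clamp the result to the 0–100 scale.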
| Score | Grade equivalent | Description |
|---|---|---|
| 90–100 | 5th grade | Very easy |
| 80–89 | 6th grade | Easy |
| 70–79 | 7th grade | Fairly easy |
| 60–69 | 8th–9th grade | Plain English |
| 50–59 | 10th–12th | Fairly difficult |
| 30–49 | College | Difficult |
| 0–29 | College graduate | Very difficult |

**Targets** (per audience, see `calibration.md`):

- General blog: 60–70
- Marketing copy: 65–80
- Technical docs: 40–50
- Healthcare patient material: 70–80
- Academic: 30–50

**Limitations:** Ignores reader prior knowledge, syntax complexity, jargon density, organization, and layout. Random character strings can produce absurd scores. Use as one signal among several.

**Source:** [Wikipedia: Flesch–Kincaid](https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests), [Readable on FRE](https://readable.com/readability/flesch-reading-ease-flesch-kincaid-grade-level/)

---

## 2. Flesch-Kincaid Grade Level (FKGL)

```
0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59
```

**What it measures:** U.S. school grade level required to understand the text.

**Targets:**

- General audience: 7–9
- Marketing: 7–8
- Technical docs: 10–12
- Hemingway editor's default: grade 9
- GOV.UK: age 9 (~grade 4)
- Average U.S. adult reads at: ~grade 8

**Limitations:** Uses two inputs only (sentence length and word length). Doesn't measure comprehension, only mechanical proxies. Built on 1970s Navy training material; may not generalize to modern topics.

**Source:** [Hemingway on FK](https://hemingwayapp.com/articles/readability/flesch-kincaid-readability-test), [AHRQ on readability formula limitations](https://www.ahrq.gov/talkingquality/resources/writing/tip6.html)

---

## 3. SMOG Index (Simple Measure of Gobbledygook)

```
1.0430 × √(polysyllables × 30 / sentences) + 3.1291
```

Where *polysyllables* = words of 3+ syllables.

**What it measures:** Years of education needed for *full* comprehension (vs. 50–75% for Flesch-Kincaid). The healthcare gold standard.

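To make the SMOG formula concrete, here is a small sketch that evaluates it next to the FKGL formula from section 2 on the same counts. The function names and the sample counts are hypothetical; the counting of sentences, syllables, and polysyllables is assumed to happen elsewhere.

```python
import math


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade from the number of 3+ syllable words and the sentence count."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def fk_grade_level(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample: 600 words, 30 sentences, 900 syllables, 60 polysyllables.
    print("FKGL:", round(fk_grade_level(600, 30, 900), 2))
    print("SMOG:", round(smog_index(60, 30), 2))
```

The `× 30 / sentences` term normalizes the polysyllable count to a 30-sentence sample, so the formula can be applied to texts of other lengths, though short samples give unstable results.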
**Targets:**

- Patient-facing healthcare materials: grade 6–8
- General public: 7–9
- Works best on samples of at least 30 sentences; may be unreliable on very short samples

**Limitations:** Designed for full comprehension, so scores skew higher than FK. Overestimates difficulty for short pieces.

**Source:** [Readability Formulas: how to choose](https://readabilityformulas.com/how-to-decide-which-readability-formula-to-use/), [Gorby readability guide](https://gorby.app/readability/readability-formulas-guide/)

---

## 4. Coleman-Liau Index (CLI)

```
0.0588 × L − 0.296 × S − 15.8
```

Where:

- L = average letters per 100 words
- S = average sentences per 100 words

**What it measures:** Grade level using character counts instead of syllables.

**Targets:** Grade 8–10 for general; 12+ for academic.

**Strength:** Avoids syllable-counting ambiguity, which makes it reliable for technical text with acronyms and abbreviations (where syllable counting often fails).

**Limitations:** Doesn't capture vocabulary difficulty within similar character counts. *"Bake"* and *"work"* score the same as *"axes"* and *"bond"*.

**Source:** [Gorby readability guide](https://gorby.app/readability/readability-formulas-guide/)

---

## 5. Dale-Chall Readability Score

```
raw_score = 0.1579 × (% difficult words) + 0.0496 × (words / sentences)
if % difficult words > 5: raw_score += 3.6365
```

Where *difficult* = not in the Dale-Chall list of 3,000 words that 80% of 4th-graders know.

| Score | Grade |
|---|---|
| < 5.0 | 4th grade or below |
| 5.0–5.9 | 5th–6th |
| 6.0–6.9 | 7th–8th |
| 7.0–7.9 | 9th–10th |
| 8.0–8.9 | 11th–12th |
| 9.0–9.9 | College |
| 10.0+ | College graduate |

**Strength:** The only major formula that catches simple-syntax-but-obscure-vocabulary problems. It checks word familiarity directly rather than using length as a proxy. *"The man eschewed the indolent quotidian routine"* scores high here even though the sentence is short and syntactically simple.

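The Dale-Chall calculation, including the adjustment for texts with more than 5% difficult words, can be sketched as follows. The function names are hypothetical, and `FAMILIAR_STAND_IN` is a tiny illustrative set, not the real Dale-Chall list; any word outside it is treated as difficult purely for the sake of the example.

```python
import re

# Tiny stand-in for the 3,000-word familiar list (assumption, for illustration only).
FAMILIAR_STAND_IN = {"the", "man"}


def pct_difficult_words(text: str, familiar_words: set) -> float:
    """Percentage of words in the text that are not in the familiar-word set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    difficult = sum(1 for w in words if w not in familiar_words)
    return difficult / len(words) * 100


def dale_chall_score(pct_difficult: float, avg_sentence_length: float) -> float:
    """Dale-Chall raw score with the +3.6365 adjustment above 5% difficult words."""
    raw_score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_length
    if pct_difficult > 5:
        raw_score += 3.6365
    return raw_score


if __name__ == "__main__":
    sentence = "The man eschewed the indolent quotidian routine."
    pct = pct_difficult_words(sentence, FAMILIAR_STAND_IN)
    # One sentence of seven words, so the average sentence length is 7.
    print(round(dale_chall_score(pct, 7), 2))
```

With a real familiar-word list the percentage would differ, but the shape of the result is the same: a short sentence still lands in the hardest bands when most of its words are unfamiliar.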
**Limitations:** Word list last updated 1995; missing modern vocabulary (it doesn't know *email*, *online*, *startup*, etc.). The scanner uses a curated subset since the full 3,000-word list is large.

**Source:** [Wikipedia: Dale–Chall](https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula), [Dale-Chall word list](https://readabilityformulas.com/word-lists/the-dale-chall-word-list-for-readability-formulas/)

---

## 6. Lexical Density

```
(content_words / total_words) × 100
```

Where *content words* = nouns, main verbs, adjectives, adverbs (everything except function words: articles, prepositions, conjunctions, pronouns, auxiliaries).

**Thresholds:**

| Density | Type |
|---|---|
| 30–40% | Spoken English |
| 40–50% | Written prose, news |
| 50–55% | Magazine features |
| 55–65% | Academic, legal, technical |
| 65%+ | Information-dense, hard to scan |

**What it measures:** How "packed" with content words a text is. High density = hard to skim; reader has to absorb every word.

**Limitations:** Only measures vocabulary, not syntactic complexity or organization. Density alone doesn't equal difficulty if structure is clear. The scanner uses POS-tag heuristics (no full parser) so the number is approximate.

**Source:** [Wikipedia: lexical density](https://en.wikipedia.org/wiki/Lexical_density), [TextInspector on lexical density](https://textinspector.com/lexical-density-vs-lexical-diversity/)

---

## 7. Average Sentence Length + Variance

```
mean = total_words / total_sentences
stddev = sqrt(Σ(sentence_length − mean)² / n)
variance_ratio = stddev / mean   # also called burstiness
```

**Comprehension thresholds:**

| Avg length | Comprehension |
|---|---|
| ≤ 8 words | ~100% |
| 14 words | > 90% |
| 25 words | noticeable drop |
| 43+ words | < 10% |

**Targets:**

- General prose: 15–18 words average
- Marketing copy: 12–16
- GOV.UK / civic: 12–15
- Technical docs: 18–22
- Academic: 20–28

**Variance:** humans cluster 0.6–1.2 on `variance_ratio` (stddev / mean).
LLMs cluster 0.2–0.4 (uniform sentences).

**Mechanism:** Working memory holds the sentence in a buffer. Long sentences overflow it; the reader loses the thread. This metric also feeds the AI-slop axis (low burstiness = AI tell).

**Source:** [Letter Counter on sentence length](https://lettercounter.org/blog/sentence-length-readability/), [Siteimprove on long sentences](https://help.siteimprove.com/support/solutions/articles/80000447968-readability-why-are-long-sentences-over-20-words-)

---

## 8. Passive Voice Percentage

```
passive_count / total_sentences × 100
```

Detected via regex: *be-verb (is/was/were/been/being/are/am) + past participle (-ed or irregular)*.

**Thresholds:**

| Tool | Threshold |
|---|---|
| Yoast | < 10% (cap) |
| Readable | < 3% (recommendation) |
| Monash | < 5% |
| Common consensus | 4–10% for general prose |

**What it measures:** Sentences where the subject receives the action rather than performs it.

**Why it matters:** Passive constructions hide the agent, lengthen sentences, and reverse subject-verb-object expectation. *"Mistakes were made"* obscures who made them.

**Limitations:** Some passive is fine and necessary (when the agent is unknown, irrelevant, or being de-emphasized for tact). The regex catches many but not all passive forms, and it also produces false positives: in *"The book was finished by Tuesday"*, *finished* can be read as an adjective describing a state rather than as a passive verb.

**Source:** [Yoast on passive voice](https://yoast.com/the-passive-voice-what-is-it-and-how-to-avoid-it/), [Readable on active voice](https://readable.com/blog/are-you-using-the-active-voice-in-your-content/)

---

## Cold-reader-specific density signals

These three are not classic readability formulas. They're new measurements that target the failure modes in `comprehension.md` patterns F1, F2, F3.

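As a rough sketch of how two of these signals might be computed, the Python below follows the acronym-density and numeric-density definitions given in the subsections that follow. The function names, the regexes, and the choice to count every occurrence of an undefined acronym (rather than each distinct acronym once) are assumptions; named-entity density is left out because it needs proper-noun detection beyond a short example.

```python
import re

# Allowlist entries copied from the definition in this section; the real list is longer.
KNOWN_ACRONYMS = {"USB", "FAQ", "URL", "API", "JSON", "HTML", "CSS", "SQL",
                  "AWS", "CEO", "CFO", "CTO"}

ACRONYM = re.compile(r"\b[A-Z]{2,5}\b")
# Assumed shape of a "numeric claim": optional $, digits, optional decimal, optional unit.
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?[%KM]?")


def undefined_acronyms(text: str) -> list:
    """Uppercase 2-5 letter tokens that are neither allowlisted nor introduced as '(ABC)'."""
    found = []
    for match in ACRONYM.finditer(text):
        token = match.group()
        if token in KNOWN_ACRONYMS:
            continue
        # Skip the parenthetical introduction itself and every later use of the token.
        if f"({token})" in text[: match.end() + 1]:
            continue
        found.append(token)
    return found


def acronym_density(text: str) -> float:
    """Undefined acronyms per 100 words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    return len(undefined_acronyms(text)) / len(words) * 100


def max_numeric_claims(text: str) -> int:
    """Largest number of numeric tokens found in any single sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return max((len(NUMBER.findall(s)) for s in sentences), default=0)
```

A caller would compare `acronym_density` against the 3-per-100-words threshold and `max_numeric_claims` against the 3-per-sentence threshold stated in this section.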
### Acronym density per 100 words

```
undefined_acronyms / words × 100
```

**Definition:** *Undefined* = uppercase token of 2–5 letters not in the known-acronyms allowlist (USB, FAQ, URL, API, JSON, HTML, CSS, SQL, AWS, CEO, CFO, CTO, etc.) and not introduced earlier in the document with a parenthetical expansion.

**Threshold:** 3+ undefined acronyms per 100 words = high cognitive load.

### Named-entity density per 100 words

```
proper_noun_runs / words × 100
```

**Definition:** Proper noun runs = capitalized non-sentence-start tokens (excluding common nouns that are capitalized only by convention, as in titles).

**Threshold:** 5+ named-entity introductions without context per 100 words = "named-entity bombing."

### Numeric density per sentence

```
max(numeric_claims_per_sentence)
```

**Threshold:** Any sentence with 3+ numeric claims (with units like %, $, K, M, year) is a stat-bombing flag.

---

## How the metrics feed the verdict

The verdict is computed from the pattern catalog (`comprehension.md`), not from the readability formulas directly. The metrics serve three purposes:

1. **Diagnosis.** When the verdict is HIGH/CRITICAL on comprehension, the metrics tell the reader *why*. "FK Grade 16, average sentence 32 words, lexical density 68%" is a different failure mode than "FK Grade 4 but acronym density 8 per 100 words."
2. **Audience calibration.** The same Flesch score has different meanings for different audiences. `calibration.md` maps the metrics to audience targets.
3. **Compound triggers.** A draft can pass the pattern catalog (no individual H violations) and still fail the metrics panel (FK Grade 16, lexical density 70%, no concrete examples). The metrics catch the cumulative texture of dense academic prose that any single pattern would miss.

The audit report includes the metrics panel under the verdict line. See `audit-report-template.md` for the format.

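To illustrate the diagnosis and audience-calibration roles described above, here is a hedged sketch that compares a metrics panel against target ranges. The dictionary keys and the `out_of_range` helper are hypothetical, and the ranges are the technical-docs targets listed in sections 1, 2, and 7 of this file; the authoritative thresholds live in `calibration.md`, which this sketch does not read. It only explains which metrics are off target and does not compute the verdict.

```python
# Technical-docs targets copied from this file (assumption: calibration.md may differ).
TECHNICAL_DOCS_TARGETS = {
    "flesch_reading_ease": (40, 50),
    "fk_grade": (10, 12),
    "avg_sentence_length": (18, 22),
}


def out_of_range(panel: dict, targets: dict) -> dict:
    """Map each metric that misses its target range to 'below' or 'above'."""
    findings = {}
    for name, (low, high) in targets.items():
        value = panel.get(name)
        if value is None:
            continue  # metric not present in this panel
        if value < low:
            findings[name] = "below"
        elif value > high:
            findings[name] = "above"
    return findings


if __name__ == "__main__":
    # Hypothetical panel resembling the dense-prose example in the Diagnosis item.
    panel = {"flesch_reading_ease": 28, "fk_grade": 16, "avg_sentence_length": 32}
    for metric, direction in out_of_range(panel, TECHNICAL_DOCS_TARGETS).items():
        print(f"{metric}: {direction} target")
```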

---

## Why these eight, not all twenty researched

There are 20+ readability formulas in the literature (Gunning Fog, ARI, FORCAST, Linsear Write, etc.). The scanner ships with the 8 above because:

- **Two Flesch-family formulas** (FRE, FKGL) cover the most-cited and most-validated baseline
- **SMOG** is the healthcare gold standard
- **Coleman-Liau** is character-based, robust against acronyms
- **Dale-Chall** uniquely captures vocabulary familiarity
- **Lexical density, sentence length variance, passive voice** are independent signals that the others don't catch

Other formulas are largely correlated with FK and don't add diagnostic information. They're documented in `sources.md` for reference but not computed by the scanner.

## Limitations of metrics overall

Janice Redish's classic critique applies: readability formulas measure mechanical proxies (length) for properties they don't actually measure (comprehension). A short-sentence document can be incoherent. A long-sentence document can be lucid. The scanner reports the metrics because they correlate with comprehension on average, but the patterns in `comprehension.md` are the load-bearing checks. The metrics calibrate; the patterns rule.

**Source:** [Redish on readability formulas](https://redish.net/wp-content/uploads/Redish_on_Readability_Formulas.pdf)