---
name: ln-614-docs-fact-checker
description: "Extracts verifiable claims from ALL .md files (paths, versions, counts, configs, names, endpoints), verifies each against codebase, cross-checks between documents for contradictions."
allowed-tools: Read, Grep, Glob, Bash
license: MIT
---

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.

# Documentation Fact-Checker (L3 Worker)

Specialized worker that extracts verifiable claims from documentation and validates each against the actual codebase.

## Purpose & Scope

- **Worker in ln-610 coordinator pipeline** - invoked by ln-610-docs-auditor
- Extract **all verifiable claims** from ALL `.md` files in project
- Verify each claim against codebase (Grep/Glob/Read/Bash)
- Detect **cross-document contradictions** (same fact stated differently)
- Includes `docs/reference/`, `docs/tasks/`, `tests/` in scope
- Single invocation (not per-document): cross-doc checks require global view
- Does NOT check scope alignment or structural quality

## Inputs (from Coordinator)

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

Receives `contextStore` with: `tech_stack`, `project_root`, `output_dir`.

## Workflow

### Phase 1: Parse Context

Extract tech stack, project root, output_dir from contextStore.

### Phase 2: Discover Documents

Glob ALL `.md` files in project. Exclude:
- `node_modules/`, `.git/`, `dist/`, `build/`
- `docs/project/.audit/` (audit output, not project docs)
- `CHANGELOG.md` (historical by design)

### Phase 3: Extract Claims (Layer 1)

**MANDATORY READ:** Load `shared/references/two_layer_detection.md` for detection methodology.

For each document, extract verifiable claims using Grep/regex patterns.

**MANDATORY READ:** Load [references/claim_extraction_rules.md](references/claim_extraction_rules.md) for detailed extraction patterns per claim type.

9 claim types:

| # | Claim Type | What to Extract | Extraction Pattern |
|---|-----------|-----------------|-------------------|
| 1 | **File paths** | Paths to source files, dirs, configs | Backtick paths, link targets matching `src/`, `lib/`, `app/`, `docs/`, `config/`, `tests/` |
| 2 | **Versions** | Package/tool/image versions | Semver patterns near dependency/package/image names |
| 3 | **Counts/Statistics** | Numeric claims about codebase | `\d+ (modules\|formats\|endpoints\|services\|tables\|parsers\|files\|workers)` |
| 4 | **API endpoints** | HTTP method + path | `(GET\|POST\|PUT\|DELETE\|PATCH) /[\w/{}:]+` |
| 5 | **Config keys/env vars** | Environment variables, config keys | `[A-Z][A-Z_]{2,}` in config context, `process.env.`, `os.environ` |
| 6 | **CLI commands** | Shell commands | `npm run`, `python`, `docker`, `make` in backtick blocks |
| 7 | **Function/class names** | Code entity references | CamelCase/snake_case in backticks or code context |
| 8 | **Line number refs** | file:line patterns | `[\w/.]+:\d+` patterns |
| 9 | **Docker/infra claims** | Image tags, ports, service names | Image names with tags, port mappings in docker context |

Output per claim: `{doc_path, line, claim_type, claim_value, raw_context}`.
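
As a rough illustration of Layer 1, the sketch below applies three of the table's regex patterns (counts, API endpoints, line number refs) line by line and emits records in the output shape given above. It assumes Python is available to the worker. The names `PATTERNS` and `extract_claims` and the claim-type keys are hypothetical, and the alternation groups are written as non-capturing. The detailed rules in `references/claim_extraction_rules.md` are not reproduced here and may differ.

```python
import re
from typing import Iterator

# Patterns taken from the claim-type table; only three of the nine types are shown.
# The dictionary keys are illustrative labels, not names defined by the skill.
PATTERNS = {
    "count": re.compile(
        r"\d+ (?:modules|formats|endpoints|services|tables|parsers|files|workers)"
    ),
    "api_endpoint": re.compile(r"(?:GET|POST|PUT|DELETE|PATCH) /[\w/{}:]+"),
    "line_ref": re.compile(r"[\w/.]+:\d+"),
}


def extract_claims(doc_path: str, text: str) -> Iterator[dict]:
    """Yield one claim record per regex match, scanning the document line by line."""
    for line_no, raw in enumerate(text.splitlines(), start=1):
        for claim_type, pattern in PATTERNS.items():
            for match in pattern.finditer(raw):
                yield {
                    "doc_path": doc_path,
                    "line": line_no,  # 1-based, so findings can cite file:line
                    "claim_type": claim_type,
                    "claim_value": match.group(0),
                    "raw_context": raw.strip(),
                }
```

Calling `list(extract_claims("docs/api.md", text))` on a document's contents returns the records that Phase 4 would then verify. Raw regex matches over-collect by design, which is why filtering is left to Layer 2.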

### Phase 4: Verify Claims (Layer 2)

For each extracted claim, verify against codebase:

| Claim Type | Verification Method | Finding Type |
|------------|-------------------|--------------|
| File paths | Glob or `ls` for existence | PATH_NOT_FOUND |
| Versions | Grep package files (package.json, requirements.txt, docker-compose.yml), compare | VERSION_MISMATCH |
| Counts | Glob/Grep to count actual entities, compare with claimed number | COUNT_MISMATCH |
| API endpoints | Grep route/controller definitions | ENDPOINT_NOT_FOUND |
| Config keys | Grep in source for actual usage | CONFIG_NOT_FOUND |
| CLI commands | Check package.json scripts, Makefile targets, binary existence | COMMAND_NOT_FOUND |
| Function/class | Grep in source for definition | ENTITY_NOT_FOUND |
| Line numbers | Read file at line, check content matches claimed context | LINE_MISMATCH |
| Docker/infra | Grep docker-compose.yml for image tags, ports | INFRA_MISMATCH |

**False positive filtering (Layer 2 reasoning):**
- Template placeholders (`{placeholder}`, `YOUR_*`, ``, `xxx`): skip
- Example/hypothetical paths (preceded by "e.g.", "for example", "such as"): skip
- Future-tense claims ("will add", "planned", "TODO"): skip or LOW
- Conditional claims ("if using X, configure Y"): verify only if X detected in tech_stack
- External service paths (URLs, external repos): skip
- Paths in SCOPE/comment HTML blocks describing other projects: skip
- `.env.example` values: skip (expected to differ from actual)

### Phase 5: Cross-Document Consistency

Compare extracted claims across documents to find contradictions:

| Check | Method | Finding Type |
|-------|--------|--------------|
| Same path, different locations | Group file path claims, check if all point to same real path | CROSS_DOC_PATH_CONFLICT |
| Same entity, different version | Group version claims by entity name, compare values | CROSS_DOC_VERSION_CONFLICT |
| Same metric, different count | Group count claims by subject, compare values | CROSS_DOC_COUNT_CONFLICT |
| Endpoint in spec but not in guide | Compare endpoint claims across api_spec.md vs guides/runbook | CROSS_DOC_ENDPOINT_GAP |

Algorithm:
```
claim_index = {}  # key: normalized(claim_type + entity), value: [{doc, line, value}]

FOR claim IN all_verified_claims WHERE claim.verified == true:
    key = normalize(claim.claim_type, claim.entity_name)
    IF key NOT IN claim_index: claim_index[key] = []
    claim_index[key].append({doc: claim.doc_path, line: claim.line, value: claim.claim_value})

FOR key, entries IN claim_index:
    unique_values = set(entry.value for entry in entries)
    IF len(unique_values) > 1:
        base = entries[0]
        other = first entry IN entries WHERE entry.value != base.value
        CREATE finding(type=CROSS_DOC_*_CONFLICT, severity=HIGH,
            location=base.doc + ":" + base.line,
            issue="'" + key + "' stated as '" + base.value + "' in " + base.doc +
                  " but '" + other.value + "' in " + other.doc)
```

### Phase 6: Score & Report

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/references/audit_scoring.md`.

Calculate score using penalty formula. Write report.

## Audit Categories (for Checks table)

| ID | Check | What It Covers |
|----|-------|---------------|
| `path_claims` | File/Directory Paths | All path references verified against filesystem |
| `version_claims` | Version Numbers | Package, tool, image versions against manifests |
| `count_claims` | Counts & Statistics | Numeric assertions against actual counts |
| `endpoint_claims` | API Endpoints | Route definitions against controllers/routers |
| `config_claims` | Config & Env Vars | Environment variables, config keys against source |
| `command_claims` | CLI Commands | Scripts, commands against package.json/Makefile |
| `entity_claims` | Code Entity Names | Functions, classes against source definitions |
| `line_ref_claims` | Line Number References | file:line against actual file content |
| `cross_doc` | Cross-Document Consistency | Same facts across documents agree |

## Severity Mapping

| Issue Type | Severity | Rationale |
|------------|----------|-----------|
| PATH_NOT_FOUND (critical file: CLAUDE.md, runbook, api_spec) | CRITICAL | Setup/onboarding fails |
| PATH_NOT_FOUND (other docs) | HIGH | Misleading reference |
| VERSION_MISMATCH (major version) | HIGH | Fundamentally wrong |
| VERSION_MISMATCH (minor/patch) | MEDIUM | Cosmetic drift |
| COUNT_MISMATCH | MEDIUM | Misleading metric |
| ENDPOINT_NOT_FOUND | HIGH | API consumers affected |
| CONFIG_NOT_FOUND | HIGH | Deployment breaks |
| COMMAND_NOT_FOUND | HIGH | Setup/CI breaks |
| ENTITY_NOT_FOUND | MEDIUM | Confusion |
| LINE_MISMATCH | LOW | Minor inaccuracy |
| INFRA_MISMATCH | HIGH | Docker/deployment affected |
| CROSS_DOC_*_CONFLICT | HIGH | Trust erosion, contradictory docs |

## Output Format

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md` and `shared/templates/audit_worker_report_template.md`.

Write report to `{output_dir}/614-fact-checker.md` with `category: "Fact Accuracy"` and checks: path_claims, version_claims, count_claims, endpoint_claims, config_claims, command_claims, entity_claims, line_ref_claims, cross_doc.

Return summary to coordinator:
```
Report written: docs/project/.audit/ln-610/{YYYY-MM-DD}/614-fact-checker.md
Score: X.X/10 | Issues: N (C:N H:N M:N L:N)
```

## Critical Rules

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- **Do not auto-fix:** Report violations only; coordinator aggregates for user
- **Code is truth:** When docs contradict code, document is wrong (unless code is a bug)
- **Evidence required:** Every finding includes verification command used and result
- **No false positives:** Better to miss an issue than report incorrectly. When uncertain, classify as LOW with note
- **Location precision:** Always include `file:line` for programmatic navigation
- **Broad scope:** Scan ALL .md files; do not skip docs/reference/, tests/, or task docs
- **Cross-doc matters:** Contradictions between documents erode trust more than single-doc errors
- **Batch efficiently:** Extract all claims first, then verify in batches by type (all paths together, all versions together)

## Definition of Done

**MANDATORY READ:** Load `shared/references/audit_worker_core_contract.md`.

- contextStore parsed successfully (including output_dir)
- All `.md` files discovered (broad scope)
- Claims extracted across 9 types
- Each claim verified against codebase with evidence
- Cross-document consistency checked
- False positives filtered via Layer 2 reasoning
- Score calculated using penalty algorithm
- Report written to `{output_dir}/614-fact-checker.md` (atomic single Write call)
- Summary returned to coordinator

## Reference Files

- **Audit output schema:** `shared/references/audit_output_schema.md`
- **Detection methodology:** `shared/references/two_layer_detection.md`
- **Claim extraction rules:** [references/claim_extraction_rules.md](references/claim_extraction_rules.md)

---

**Version:** 1.0.0
**Last Updated:** 2026-03-06