---
name: audit-documentation
description:
  Run a multi-stage parallel documentation audit with 18 specialized agents
---

# Multi-Stage Parallel Documentation Audit

**Version:** 2.0

**Total Agents:** 18 agents across 5 stages (17 run in parallel; Agent 5D runs
after 5B), plus 1 sequential synthesis stage

---

## Overview

This audit runs specialized agents in parallel across 6 stages to analyze
documentation quality, accuracy, and lifecycle status. Stage 1 builds the
inventory, baselines, and link graph; Stages 2-5 each write JSONL findings; Stage
6 merges those findings into the final synthesis.

**Output Directory:**
`docs/audits/single-session/documentation/audit-[YYYY-MM-DD]/`

---

## Pre-Audit Setup

**Step 0: Episodic Memory Search (Session #128)**

Before running the documentation audit, search past sessions for relevant
context:

```text
// Search for past documentation audit findings
mcp__plugin_episodic-memory_episodic-memory__search({
  query: ["documentation audit", "stale docs", "broken links"],
  limit: 5,
});

// Search for doc structure decisions
mcp__plugin_episodic-memory_episodic-memory__search({
  query: ["DOCUMENTATION_STANDARDS", "tier", "lifecycle"],
  limit: 5,
});
```

**Why this matters:**

- Compare against previous doc health metrics
- Identify recurring documentation gaps
- Track which docs were flagged for updates before
- Prevent re-flagging known orphans or intentional gaps

---

**Step 1: Create Output Directory**

```bash
AUDIT_DIR="docs/audits/single-session/documentation/audit-$(date +%Y-%m-%d)"
mkdir -p "$AUDIT_DIR"
echo "Audit output: $AUDIT_DIR"
```

**Step 2: Load False Positives Database**

Read `docs/audits/FALSE_POSITIVES.jsonl` and note patterns to exclude from
findings (filter by category: `documentation`).

**Step 3: Check Thresholds**

Run `npm run review:check` and proceed regardless of the result, since the user
invoked this audit intentionally.

---

## Stage 1: Inventory & Baseline (3 Parallel Agents)

Launch these 3 agents in parallel:

### Agent 1A: Document Inventory

**Task:** Build complete document catalog

```
Count all .md files by directory and tier:
- Root level: ROADMAP.md, README.md, etc.
- docs/: by subdirectory
- .claude/: skills, plans

Extract metadata from each:
- Version number (if present)
- Last Updated date (if present)
- Status field (if present)
- Word count

Output: ${AUDIT_DIR}/stage-1-inventory.md
Format: Markdown summary with counts and file list
```
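A minimal Node.js sketch of the catalog step follows. It assumes metadata is
written as bold labels such as `**Version:** 2.0`, `**Last Updated:** ...`, and
`**Status:** ...` (the convention this skill uses for its own version line);
`walkMarkdown` and `extractMetadata` are hypothetical helpers, not existing
project scripts.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect .md files, skipping dependency and VCS directories
function walkMarkdown(dir, out = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === "node_modules" || entry.name === ".git") continue;
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) walkMarkdown(full, out);
    else if (entry.isFile() && entry.name.endsWith(".md")) out.push(full);
  }
  return out;
}

// Read "**Label:** value" fields (assumed convention) plus an approximate word count
function extractMetadata(text) {
  const field = (label) => {
    const m = text.match(new RegExp(`\\*\\*${label}:\\*\\*\\s*([^*\\n]+)`, "i"));
    return m ? m[1].trim() : null;
  };
  // Drop backtick-fenced code blocks before counting words
  const prose = text.replace(/^```[\s\S]*?^```/gm, "");
  return {
    version: field("Version"),
    lastUpdated: field("Last Updated"),
    status: field("Status"),
    words: prose.split(/\s+/).filter(Boolean).length,
  };
}
```

For example,
`walkMarkdown(".").map((file) => ({ file, ...extractMetadata(fs.readFileSync(file, "utf8")) }))`
yields one catalog row per file, which can then be grouped by directory.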

### Agent 1B: Baseline Metrics

**Task:** Capture current state via existing tools

```bash
# Run these commands and capture output:
npm run docs:check > ${AUDIT_DIR}/baseline-docs-check.txt 2>&1
npm run docs:sync-check > ${AUDIT_DIR}/baseline-sync-check.txt 2>&1
npm run format:check -- docs/ > ${AUDIT_DIR}/baseline-format-check.txt 2>&1

# Count "orphan" mentions in DOCUMENTATION_INDEX.md (grep -c prints 0 when
# nothing matches; "|| true" only suppresses its non-zero exit status)
grep -c "orphan" docs/DOCUMENTATION_INDEX.md || true
```

Output: `${AUDIT_DIR}/stage-1-baselines.md`

### Agent 1C: Link Extraction

**Task:** Build link graph for later stages

```
Extract from all .md files:
1. Internal links: [text](path.md) -> list with source file:line
2. External URLs: https://... -> list with source file:line
3. Anchor links: #section -> list with source file:line

Output: ${AUDIT_DIR}/stage-1-links.json
Schema:
{
  "internal": [{"source": "file.md", "line": 1, "target": "other.md", "text": "..."}],
  "external": [{"source": "file.md", "line": 1, "url": "https://...", "text": "..."}],
  "anchors": [{"source": "file.md", "line": 1, "anchor": "#section", "text": "..."}]
}
```
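Link extraction can be sketched with a regular expression over each line. This
is an approximation rather than a Markdown parser: it handles only inline
`[text](target)` links, ignores reference-style links, and toggles fence state on
any line that starts with three backticks or tildes, so nested four-backtick
fences are not tracked correctly. `extractLinks` is a hypothetical name.

```javascript
// Inline markdown links: [text](target) with an optional "title"
const LINK_RE = /\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;

function extractLinks(source, text) {
  const result = { internal: [], external: [], anchors: [] };
  let inFence = false;
  text.split("\n").forEach((lineText, i) => {
    // Skip fenced code blocks so example links are not reported
    if (/^\s*(```|~~~)/.test(lineText)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;
    for (const m of lineText.matchAll(LINK_RE)) {
      const [, linkText, target] = m;
      const line = i + 1;
      if (/^https?:\/\//i.test(target)) {
        result.external.push({ source, line, url: target, text: linkText });
      } else if (target.startsWith("#")) {
        result.anchors.push({ source, line, anchor: target, text: linkText });
      } else if (/\.md(#.*)?$/i.test(target)) {
        result.internal.push({ source, line, target, text: linkText });
      }
    }
  });
  return result;
}
```

The returned object already follows the `stage-1-links.json` schema, so results
for all files can be concatenated per key and written with `JSON.stringify`.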

### Stage 1 Completion Audit

Before proceeding to Stage 2, verify:

- [ ] `stage-1-inventory.md` exists and is non-empty
- [ ] `stage-1-baselines.md` exists with metrics
- [ ] `stage-1-links.json` exists and is valid JSON
- [ ] Display summary: "Stage 1 Complete: X docs, Y internal links, Z external
      URLs"

---

## Stage 2: Link Validation (4 Parallel Agents)

Launch these 4 agents in parallel using Stage 1 outputs:

### Agent 2A: Internal Link Checker

**Task:** Verify internal .md links resolve

```
For each internal link from stage-1-links.json:
1. Check target file exists
2. If link has anchor (#section), verify heading exists in target
3. Detect circular references (A→B→C→A)

Output: ${AUDIT_DIR}/stage-2-internal-links.jsonl
JSONL schema per finding (JSONL_SCHEMA_STANDARD.md format):
{
  "category": "documentation",
  "title": "Broken internal link to target.md",
  "fingerprint": "documentation::source.md::broken-link-target",
  "severity": "S1|S2",
  "effort": "E0",
  "confidence": 90,
  "files": ["source.md:123"],
  "why_it_matters": "Broken links frustrate readers and indicate stale documentation",
  "suggested_fix": "Update link to correct path or remove if target no longer exists",
  "acceptance_tests": ["Link resolves correctly", "No 404 when clicking"],
  "evidence": ["target: path.md", "resolved: /full/path.md"]
}
```
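The anchor check needs heading slugs. The sketch below assumes GitHub-style slugs
(lowercase, punctuation removed, each space replaced by a hyphen); other
renderers slugify differently, duplicate-heading suffixes (`-1`, `-2`) are
ignored, and headings inside code blocks are not excluded.
`checkInternalLink` is a hypothetical helper that returns `null` when a link
resolves.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Approximation of GitHub-style heading slugs (assumption; renderers differ)
function slugify(heading) {
  return heading
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s_-]/gu, "")
    .replace(/\s/g, "-");
}

// Collect slugs for ATX headings, ignoring optional closing #s
function headingSlugs(markdown) {
  const re = /^#{1,6}[ \t]+(.+?)(?:[ \t]+#+)?[ \t]*$/gm;
  return new Set([...markdown.matchAll(re)].map((m) => slugify(m[1])));
}

// Check one entry from stage-1-links.json "internal"
function checkInternalLink({ source, line, target }) {
  const [filePart, anchor] = target.split("#");
  const resolved = path.resolve(path.dirname(source), decodeURIComponent(filePart));
  if (!fs.existsSync(resolved)) {
    return { file: `${source}:${line}`, problem: "missing-file", resolved };
  }
  const slugs = headingSlugs(fs.readFileSync(resolved, "utf8"));
  if (anchor && !slugs.has(anchor.toLowerCase())) {
    return { file: `${source}:${line}`, problem: "missing-anchor", resolved };
  }
  return null;
}
```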

### Agent 2B: External URL Checker

**Task:** HTTP HEAD requests to external URLs

```bash
# Preferred: the dedicated external link checker script
npm run docs:external-links -- --output ${AUDIT_DIR}/stage-2-external-links.jsonl
```

If the script is unavailable, check each URL from stage-1-links.json manually
with:

- 10-second timeout
- Rate limiting (at least 100ms between requests to the same domain)
- Result caching (check each unique URL once)
- Flag: 404, 403, 5xx, timeouts, redirects

Output: `${AUDIT_DIR}/stage-2-external-links.jsonl`
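For the manual fallback, a sketch of the request logic is below. It assumes
Node.js 18+ (global `fetch` and `AbortSignal.timeout`); `createUrlChecker` and
its options are hypothetical. Requests to the same hostname are spaced by
reserving time slots, and results are cached per URL as promises so concurrent
duplicates share one request.

```javascript
// fetchImpl defaults to the global fetch; it is injectable for offline testing
function createUrlChecker({ fetchImpl = globalThis.fetch, timeoutMs = 10_000, gapMs = 100 } = {}) {
  const cache = new Map(); // url -> Promise<result>
  const nextSlot = new Map(); // hostname -> last reserved request time
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  async function check(url) {
    const host = new URL(url).hostname;
    // Reserve a slot synchronously so parallel calls to one host stay spaced
    const now = Date.now();
    const slot = Math.max(now, (nextSlot.get(host) ?? -Infinity) + gapMs);
    nextSlot.set(host, slot);
    if (slot > now) await sleep(slot - now);
    try {
      const res = await fetchImpl(url, {
        method: "HEAD",
        redirect: "manual", // surface 3xx responses instead of following them
        signal: AbortSignal.timeout(timeoutMs),
      });
      const s = res.status;
      const flagged = s === 403 || s === 404 || s >= 500 || (s >= 300 && s < 400);
      return { url, status: s, flagged };
    } catch (err) {
      // Timeouts (TimeoutError) and network failures are both flagged
      return { url, status: null, flagged: true, error: err.name };
    }
  }

  return (url) => {
    if (!cache.has(url)) cache.set(url, check(url));
    return cache.get(url);
  };
}
```

Servers that reject `HEAD` (for example with 405) are not in the flag list above
and pass silently here; retrying those with `GET` is a common refinement.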

### Agent 2C: Cross-Reference Validator

**Task:** Verify references to project artifacts

```
Check documentation references:
1. ROADMAP item references (P1.2, Phase 3, etc.) - do they exist?
2. PR/Issue references (#123) - format valid?
3. SESSION_CONTEXT references - files mentioned exist?
4. Skill/hook path references - paths valid?

Output: ${AUDIT_DIR}/stage-2-cross-refs.jsonl
```

### Agent 2D: Orphan & Connectivity

**Task:** Find disconnected documents

```
From stage-1-links.json, identify:
1. Docs with zero inbound links (orphans)
2. Docs with only broken outbound links
3. Isolated clusters (group of docs only linking to each other)

Exclude from orphan detection:
- README.md (entry point)
- Root-level canonical docs
- Archive docs

Output: ${AUDIT_DIR}/stage-2-orphans.jsonl
```
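The zero-inbound-link check (item 1) reduces to counting resolved link targets.
A sketch, assuming repo-relative forward-slash paths and treating every
root-level file as a canonical doc (an assumption; substitute an explicit list if
one exists). Items 2 and 3 need the Stage 2A results and a graph traversal and
are not shown.

```javascript
const path = require("node:path");

// Exclusions from the list above. Treating all root-level files as canonical
// is an assumption.
function isExcluded(file) {
  return (
    path.posix.basename(file) === "README.md" ||
    !file.includes("/") ||
    /(^|\/)archive\//.test(file)
  );
}

// allDocs: repo-relative .md paths; internalLinks: stage-1-links.json "internal"
function findOrphans(allDocs, internalLinks) {
  const inbound = new Map(allDocs.map((d) => [d, 0]));
  for (const { source, target } of internalLinks) {
    // Resolve relative to the linking file and drop any #anchor
    const file = path.posix.join(path.posix.dirname(source), target.split("#")[0]);
    if (file !== source && inbound.has(file)) inbound.set(file, inbound.get(file) + 1);
  }
  return allDocs.filter((d) => inbound.get(d) === 0 && !isExcluded(d));
}
```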

### Stage 2 Completion Audit

Before proceeding to Stage 3, verify:

- [ ] All 4 JSONL files exist
- [ ] Run schema validation:
      `node scripts/debt/validate-schema.js ${AUDIT_DIR}/stage-2-*.jsonl`
- [ ] Display summary: "Stage 2 Complete: X link issues found"

---

## Stage 3: Content Quality (4 Parallel Agents)

Launch these 4 agents in parallel:

### Agent 3A: Accuracy Checker

**Task:** Verify content matches codebase

```bash
# Preferred: the dedicated content accuracy checker script
node scripts/check-content-accuracy.js --output ${AUDIT_DIR}/stage-3-accuracy.jsonl
```

Checks:

- Version numbers match package.json
- File paths mentioned exist
- npm script references valid
- Code snippet syntax (basic validation)

Output: `${AUDIT_DIR}/stage-3-accuracy.jsonl`

### Agent 3B: Completeness Checker

**Task:** Check for missing/incomplete content

```
For each document, check:
1. Required sections present per tier:
   - Tier 1: Purpose, Version History
   - Tier 2: Purpose, Version History, AI Instructions
   - Tier 3+: Purpose, Status, Version History
2. No TODO/TBD/FIXME placeholders
3. No empty sections (heading with no content)
4. No stub documents (< 100 words, excluding code blocks)

Output: ${AUDIT_DIR}/stage-3-completeness.jsonl
```
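Checks 2 to 4 can run in one pass per document, sketched below. Tier-specific
section checks (item 1) are omitted because they depend on how tiers are
recorded. `completenessIssues` is hypothetical: a heading counts as empty when
the next non-blank line is a heading at the same or a higher level, a section
containing a non-empty subsection counts as non-empty, and the word count is
approximate (heading markers are counted).

```javascript
function completenessIssues(markdown, minWords = 100) {
  const issues = [];
  const open = []; // headings still waiting for content: { level, line }
  let inFence = false;
  let words = 0;
  markdown.split("\n").forEach((text, i) => {
    const line = i + 1;
    if (/^\s*(```|~~~)/.test(text)) {
      inFence = !inFence;
      open.length = 0; // a code block counts as section content
      return;
    }
    if (inFence) return; // code is excluded from word counts and placeholder scans
    words += text.split(/\s+/).filter(Boolean).length;
    if (/\b(TODO|TBD|FIXME)\b/.test(text)) issues.push({ line, issue: "placeholder" });
    const heading = text.match(/^(#{1,6})\s/);
    if (heading) {
      const level = heading[1].length;
      // A heading at the same or higher level closes still-empty sections
      while (open.length && open[open.length - 1].level >= level) {
        issues.push({ line: open.pop().line, issue: "empty-section" });
      }
      open.push({ level, line });
    } else if (text.trim() !== "") {
      open.length = 0; // prose satisfies every open heading, parents included
    }
  });
  for (const h of open) issues.push({ line: h.line, issue: "empty-section" });
  if (words < minWords) issues.push({ line: 1, issue: "stub", words });
  return issues;
}
```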

### Agent 3C: Coherence Checker

**Task:** Check terminology and duplication

```
Analyze across all documents:
1. Terminology inconsistency:
   - "skill" vs "command" vs "slash command"
   - "agent" vs "subagent" vs "worker"
   - Collect all term usages, flag inconsistencies
2. Duplicate content:
   - Exact match: identical content blocks (>50 words)
   - Fuzzy match: 80%+ similarity (same topic, minor rewording)
3. Contradictory information (conflicting guidance for same task)

Output: ${AUDIT_DIR}/stage-3-coherence.jsonl
```
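The audit does not specify how "80%+ similarity" is measured. One reasonable
choice, shown here as an assumption, is Jaccard similarity over overlapping
three-word shingles. `similarPairs` is a hypothetical helper that compares every
pair of blocks, which is quadratic but acceptable for a repository-sized doc set.

```javascript
// Overlapping word n-grams ("shingles"), lowercased
function shingles(text, size = 3) {
  const words = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) ?? [];
  const set = new Set();
  for (let i = 0; i + size <= words.length; i++) {
    set.add(words.slice(i, i + size).join(" "));
  }
  return set;
}

// |A ∩ B| / |A ∪ B|
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const s of a) if (b.has(s)) shared++;
  return shared / (a.size + b.size - shared);
}

// blocks: [{ id, text }], e.g. paragraphs over 50 words
function similarPairs(blocks, threshold = 0.8) {
  const sets = blocks.map((b) => shingles(b.text));
  const pairs = [];
  for (let i = 0; i < blocks.length; i++) {
    for (let j = i + 1; j < blocks.length; j++) {
      const score = jaccard(sets[i], sets[j]);
      if (score >= threshold) pairs.push({ a: blocks[i].id, b: blocks[j].id, score });
    }
  }
  return pairs;
}
```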

### Agent 3D: Freshness Checker

**Task:** Check for stale content

```bash
# Preferred: the dedicated placement/staleness checker script
npm run docs:placement -- --output ${AUDIT_DIR}/stage-3-freshness.jsonl
```

Tier-specific staleness thresholds:

- Tier 1 (Canonical): >60 days
- Tier 2 (Foundation): >90 days
- Tier 3+ (Planning/Reference/Guides): >120 days

Additional checks:

- Outdated version references
- Deprecated terminology still used

Output: `${AUDIT_DIR}/stage-3-freshness.jsonl`
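The tier thresholds translate directly into a lookup. A sketch, assuming the
last-updated date is an ISO `YYYY-MM-DD` string (for example, the **Last
Updated** field captured in Stage 1); `staleness` is a hypothetical helper, and
`now` is a parameter so results are reproducible.

```javascript
// Staleness thresholds in days, from the tier list above
const STALE_AFTER_DAYS = { 1: 60, 2: 90 };
const DEFAULT_STALE_DAYS = 120; // Tier 3+

function staleness(tier, lastUpdated, now = new Date()) {
  const limit = STALE_AFTER_DAYS[tier] ?? DEFAULT_STALE_DAYS;
  const ageDays = Math.floor((now - new Date(lastUpdated)) / 86_400_000);
  return { ageDays, limit, stale: ageDays > limit };
}
```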

### Stage 3 Completion Audit

Before proceeding to Stage 4, verify:

- [ ] All 4 JSONL files exist
- [ ] Schema validation passes
- [ ] Display summary: "Stage 3 Complete: X content quality issues"

---

## Stage 4: Format & Structure (3 Parallel Agents)

Launch these 3 agents in parallel:

### Agent 4A: Markdown Lint

**Task:** Run markdownlint on all docs

```bash
# Note: docs:lint should lint all markdown locations:
# "*.md" "docs/**/*.md" ".claude/**/*.md"
npm run docs:lint > ${AUDIT_DIR}/markdownlint-raw.txt 2>&1

# Parse output into JSONL findings
# Each markdownlint violation becomes a finding
```

Convert violations to JSONL format in `${AUDIT_DIR}/stage-4-markdownlint.jsonl`
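Converting the raw output might look like the sketch below. It assumes the
common markdownlint text format `path:line[:column] MDxxx/alias description`
(the pattern also tolerates a severity word before the rule); verify it against
what `docs:lint` actually prints. The severity, effort, confidence, and wording
values are placeholder assumptions, not project policy.

```javascript
// Assumed format: "path:line[:column] [error|warning] MDxxx/alias description"
const LINT_LINE = /^(.+?):(\d+)(?::\d+)?\s+(?:(?:error|warning)\s+)?(MD\d{3})\/(\S+)\s+(.*)$/;

function markdownlintToFindings(rawOutput) {
  return rawOutput
    .split(/\r?\n/)
    .map((text) => text.match(LINT_LINE))
    .filter(Boolean) // drops npm banners and any non-violation lines
    .map(([, file, line, rule, alias, detail]) => ({
      category: "documentation",
      title: `${rule}/${alias}: ${detail}`,
      fingerprint: `documentation::${file}::${rule}-L${line}`,
      severity: "S3", // placeholder assumption
      effort: "E0", // placeholder assumption: lint fixes are usually mechanical
      confidence: 100, // placeholder assumption: deterministic tool output
      files: [`${file}:${line}`],
      why_it_matters: "Lint violations make rendering and diffs less predictable", // placeholder wording
      suggested_fix: `Resolve ${rule} (${alias}) at the reported line`,
      acceptance_tests: [`markdownlint reports no ${rule} at ${file}:${line}`],
      evidence: [detail],
    }));
}
```

Each finding can then be written as one line with `JSON.stringify(finding)`.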

### Agent 4B: Prettier Compliance

**Task:** Check Prettier formatting

```bash
npm run format:check -- docs/ > ${AUDIT_DIR}/prettier-raw.txt 2>&1

# Parse output for files that need formatting
```

Convert violations to JSONL format in `${AUDIT_DIR}/stage-4-prettier.jsonl`
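A similar sketch for Prettier, assuming `prettier --check` prints one
`[warn] <path>` line per unformatted file plus a `[warn] Code style issues ...`
summary line whose wording varies across versions. Field values are placeholder
assumptions.

```javascript
function prettierToFindings(rawOutput) {
  return rawOutput
    .split(/\r?\n/)
    .map((text) => text.trim())
    // Keep per-file warnings, drop the summary line
    .filter((text) => text.startsWith("[warn] ") && !/code style issues/i.test(text))
    .map((text) => text.slice("[warn] ".length))
    .map((file) => ({
      category: "documentation",
      title: `File not formatted with Prettier: ${file}`,
      fingerprint: `documentation::${file}::prettier-format`,
      severity: "S3", // placeholder assumption
      effort: "E0", // prettier --write fixes formatting mechanically
      confidence: 100, // placeholder assumption
      files: [file],
      why_it_matters: "Unformatted files create noisy diffs", // placeholder wording
      suggested_fix: `npx prettier --write "${file}"`,
      acceptance_tests: ["prettier --check passes for the file"],
      evidence: ["Listed by prettier --check"],
    }));
}
```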

### Agent 4C: Structure Standards

**Task:** Check document structure conventions

````
For each document, verify:
1. Frontmatter present and valid (for skill docs)
2. Required headers per tier
3. Version history format (table with Version|Date|Description)
4. Table formatting consistency (aligned pipes)
5. Code block language tags (all ``` blocks have language)
6. Heading uniqueness (no duplicate headings in same doc)

Output: ${AUDIT_DIR}/stage-4-structure.jsonl
````
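Checks 5 and 6 can be sketched as a single line scan. Closing-fence detection
follows the CommonMark rule (same fence character, at least as long as the
opener, no info string), so three-backtick lines inside a four-backtick block
are not miscounted. Case-insensitive heading comparison is an assumption;
`structureIssues` is hypothetical.

```javascript
function structureIssues(markdown) {
  const issues = [];
  const seen = new Map(); // lowercased heading text -> first line
  let fence = null; // opening fence marker while inside a code block
  markdown.split("\n").forEach((text, i) => {
    const line = i + 1;
    const m = text.match(/^\s*(`{3,}|~{3,})(.*)$/);
    if (m) {
      if (fence === null) {
        fence = m[1];
        if (m[2].trim() === "") issues.push({ line, issue: "code-block-missing-language" });
      } else if (m[1][0] === fence[0] && m[1].length >= fence.length && m[2].trim() === "") {
        fence = null; // valid closing fence
      }
      return;
    }
    if (fence !== null) return; // ignore headings inside code blocks
    const h = text.match(/^#{1,6}\s+(.+?)\s*$/);
    if (h) {
      const key = h[1].toLowerCase();
      if (seen.has(key)) issues.push({ line, issue: "duplicate-heading", firstLine: seen.get(key) });
      else seen.set(key, line);
    }
  });
  return issues;
}
```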

### Stage 4 Completion Audit

Before proceeding to Stage 5, verify:

- [ ] All 3 JSONL files exist
- [ ] Schema validation passes
- [ ] Display summary: "Stage 4 Complete: X format issues"

---

## Stage 5: Placement & Lifecycle (4 Parallel Agents)

Launch Agents 5A, 5B, 5C in parallel, then 5D sequentially after 5B completes:

### Agent 5A: Location Validator

**Task:** Check documents in correct directories

```
Verify placement rules:
- Plans → docs/plans/ or .planning/
- Archives → docs/archive/
- Templates → docs/templates/
- Audits → docs/audits/
- Tier 1 → root level
- Tier 2 → docs/ or root

Output: ${AUDIT_DIR}/stage-5-location.jsonl
```

### Agent 5B: Archive Candidate Finder (Surface-Level)

**Task:** Quick scan for archive candidates

```
Surface-level detection:
1. Completed plans not archived (status: completed)
2. Session handoffs > 30 days old
3. Old audit results (> 60 days, likely in MASTER_DEBT.jsonl already)
4. Plans not referenced in current ROADMAP.md

Output: ${AUDIT_DIR}/stage-5-archive-candidates-raw.jsonl
```

### Agent 5C: Cleanup Candidate Finder

**Task:** Find files that should be deleted/merged

```
Identify:
1. Exact duplicate files (same content hash)
2. Near-empty files (< 50 words)
3. Draft files > 60 days old
4. Temp/test files (names starting with temp, test, scratch)
5. Merge candidates (fragmented docs on same topic)

Output: ${AUDIT_DIR}/stage-5-cleanup-candidates.jsonl
```
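Exact-duplicate detection (item 1) can hash file contents. A sketch using
SHA-256 from Node's `crypto` module; `duplicateGroups` is hypothetical, and the
`read` parameter exists only so the logic can be exercised without touching
disk.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// Group files by content hash; groups with 2+ members are exact duplicates
function duplicateGroups(files, read = (f) => fs.readFileSync(f)) {
  const byHash = new Map();
  for (const file of files) {
    const hash = crypto.createHash("sha256").update(read(file)).digest("hex");
    if (!byHash.has(hash)) byHash.set(hash, []);
    byHash.get(hash).push(file);
  }
  return [...byHash.values()].filter((group) => group.length > 1);
}
```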

### Agent 5D: Deep Lifecycle Analysis (Runs After 5B)

**Sequential dependency: Read 5B output first**

**Task:** Detailed analysis of archive candidates

```
For each candidate from stage-5-archive-candidates-raw.jsonl:
1. Read the actual document content
2. Determine original purpose
3. Assess current status:
   - Purpose met? (completed successfully)
   - Overtaken? (superseded by other work)
   - Deprecated? (no longer relevant)
4. Check if content was consumed:
   - Audit findings → in MASTER_DEBT.jsonl?
   - Plan outcomes → documented elsewhere?
5. Provide recommendation with justification

Output: ${AUDIT_DIR}/stage-5-lifecycle-analysis.jsonl
Extended schema:
{
  ...standard fields...,
  "purpose": "Original intent of the document",
  "status_reason": "Why marked for archive",
  "consumed_by": "Where content lives now (if applicable)",
  "recommendation": "ARCHIVE|DELETE|KEEP|MERGE_INTO:<target>"
}
```

### Stage 5 Completion Audit

Before proceeding to Stage 6, verify:

- [ ] All 4 JSONL files exist (5A, 5B raw, 5C, 5D analysis)
- [ ] Schema validation passes
- [ ] Display summary: "Stage 5 Complete: X lifecycle issues, Y archive
      candidates"

---

## Stage 6: Synthesis & Prioritization (Sequential)

This stage runs sequentially after all parallel stages complete.

### Step 6.1: Merge All Findings

```bash
# Combine all stage outputs
cat ${AUDIT_DIR}/stage-2-*.jsonl \
    ${AUDIT_DIR}/stage-3-*.jsonl \
    ${AUDIT_DIR}/stage-4-*.jsonl \
    ${AUDIT_DIR}/stage-5-location.jsonl \
    ${AUDIT_DIR}/stage-5-archive-candidates-raw.jsonl \
    ${AUDIT_DIR}/stage-5-cleanup-candidates.jsonl \
    ${AUDIT_DIR}/stage-5-lifecycle-analysis.jsonl > ${AUDIT_DIR}/all-findings-raw.jsonl
```

### Step 6.2: Deduplicate

- **Input:** `${AUDIT_DIR}/all-findings-raw.jsonl`
- **Output:** `${AUDIT_DIR}/all-findings-deduped.jsonl`

```
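Remove duplicates where the same file:line appears in findings from multiple
agents. Keep the finding that ranks highest, comparing in this order (each
later criterion only breaks ties in the earlier ones):
1. Higher severity
2. Higher confidence
3. More evidence items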
```
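The tie-break order maps naturally onto a comparator. A sketch, assuming the
location key is the first entry of each finding's `files` array (falling back to
`fingerprint` when `files` is missing); `dedupe` and `compareFindings` are
hypothetical helpers.

```javascript
const SEVERITY_RANK = { S0: 3, S1: 2, S2: 1, S3: 0 };

// Positive when `a` should be kept over `b`
function compareFindings(a, b) {
  return (
    (SEVERITY_RANK[a.severity] ?? -1) - (SEVERITY_RANK[b.severity] ?? -1) ||
    (a.confidence ?? 0) - (b.confidence ?? 0) ||
    (a.evidence?.length ?? 0) - (b.evidence?.length ?? 0)
  );
}

// Keep one finding per location (assumed key: files[0], else fingerprint)
function dedupe(findings) {
  const best = new Map();
  for (const f of findings) {
    const key = f.files?.[0] ?? f.fingerprint;
    const current = best.get(key);
    if (!current || compareFindings(f, current) > 0) best.set(key, f);
  }
  return [...best.values()];
}
```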

### Step 6.3: Cross-Reference FALSE_POSITIVES.jsonl

- **Input:** `${AUDIT_DIR}/all-findings-deduped.jsonl`
- **Output:** `${AUDIT_DIR}/all-findings.jsonl` (final file for TDMS intake)

```
Filter out findings matching patterns in docs/audits/FALSE_POSITIVES.jsonl:
- Match by file pattern
- Match by title pattern
- Check expiration dates (expired entries no longer filter findings)
```
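A sketch of the filter follows. The FALSE_POSITIVES.jsonl field names used here
(`category`, `file_pattern`, `title_pattern`, `expires`) are assumptions, not a
documented schema; adjust them to the real file. An entry matches when every
pattern it defines matches, and expired entries are ignored.

```javascript
const fs = require("node:fs");

// Parse a JSONL file, skipping blank lines
function readJsonl(file) {
  return fs
    .readFileSync(file, "utf8")
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l));
}

// Hypothetical fields: category, expires (ISO date)
function activePatterns(entries, now = new Date()) {
  return entries.filter(
    (e) => e.category === "documentation" && (!e.expires || new Date(e.expires) > now)
  );
}

// Hypothetical fields: file_pattern, title_pattern (regex strings)
function isFalsePositive(finding, patterns) {
  return patterns.some((p) => {
    const fileHit = p.file_pattern
      ? (finding.files ?? []).some((f) => new RegExp(p.file_pattern).test(f))
      : true;
    const titleHit = p.title_pattern ? new RegExp(p.title_pattern, "i").test(finding.title) : true;
    // At least one pattern must be defined, and every defined pattern must match
    return Boolean(p.file_pattern || p.title_pattern) && fileHit && titleHit;
  });
}
```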

### Step 6.4: Priority Scoring

```
For each finding, calculate priority:

priority = (severityWeight × categoryMultiplier × confidenceWeight) / effortWeight

Where:
- severityWeight: S0=100, S1=50, S2=20, S3=5
- categoryMultiplier: links=1.5, accuracy=1.3, freshness=1.0, format=0.8
- confidenceWeight: HIGH=1.0, MEDIUM=0.7, LOW=0.4 (bucket each finding's
  numeric confidence score into HIGH/MEDIUM/LOW first)
- effortWeight: E0=1, E1=2, E2=4, E3=8

Sort findings by priority descending.
```
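The formula as a sketch. Two gaps are filled with labeled assumptions: findings
store `confidence` as a number (for example `90`), so it is bucketed with
assumed cut-offs of 80 and 50, and the multiplier table is keyed by a
hypothetical `kind` field (links, accuracy, ...), with unlisted kinds defaulting
to 1.0. Under these assumptions, an S1 broken link at confidence 90 and E0 scores
50 × 1.5 × 1.0 / 1 = 75, and an S2 format issue at confidence 60 and E1 scores
20 × 0.8 × 0.7 / 2 = 5.6.

```javascript
const SEVERITY_WEIGHT = { S0: 100, S1: 50, S2: 20, S3: 5 };
const CATEGORY_MULTIPLIER = { links: 1.5, accuracy: 1.3, freshness: 1.0, format: 0.8 };
const CONFIDENCE_WEIGHT = { HIGH: 1.0, MEDIUM: 0.7, LOW: 0.4 };
const EFFORT_WEIGHT = { E0: 1, E1: 2, E2: 4, E3: 8 };

// Assumed cut-offs for bucketing a 0-100 confidence score
function confidenceBand(score) {
  if (score >= 80) return "HIGH";
  if (score >= 50) return "MEDIUM";
  return "LOW";
}

// `kind` is a hypothetical sub-category field; unlisted kinds use 1.0 (assumption)
function priority({ severity, kind, confidence, effort }) {
  return (
    (SEVERITY_WEIGHT[severity] *
      (CATEGORY_MULTIPLIER[kind] ?? 1.0) *
      CONFIDENCE_WEIGHT[confidenceBand(confidence)]) /
    EFFORT_WEIGHT[effort]
  );
}

function sortByPriority(findings) {
  return [...findings].sort((a, b) => priority(b) - priority(a));
}
```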

### Step 6.5: Generate Action Plan

```
Create three queues:

1. IMMEDIATE FIXES (S0/S1, E0/E1):
   - List with specific file:line and fix command

2. ARCHIVE QUEUE:
   - node scripts/archive-doc.js commands for each candidate

3. DELETE/MERGE QUEUE:
   - Justification for each deletion
   - Merge target for consolidations
```

### Step 6.6: Generate Final Report

Output: `${AUDIT_DIR}/FINAL_REPORT.md`

````markdown
# Documentation Audit Report - [DATE]

## Executive Summary

- **Total findings:** X
- **By severity:** S0: X, S1: X, S2: X, S3: X
- **By category:** Links: X, Content: X, Format: X, Lifecycle: X
- **False positives filtered:** X

## Baseline Comparison

| Metric               | Before | After Fixes |
| -------------------- | ------ | ----------- |
| docs:check errors    | X      | -           |
| docs:sync issues     | X      | -           |
| Orphaned docs        | X      | -           |
| Stale docs (by tier) | X      | -           |

## Top 20 Priority Items

| #   | Severity | File | Issue | Effort |
| --- | -------- | ---- | ----- | ------ |
| 1   | S1       | ...  | ...   | E0     |

## Stage-by-Stage Breakdown

### Stage 2: Link Validation

- Internal link errors: X
- External link errors: X
- Orphaned documents: X

### Stage 3: Content Quality

- Accuracy issues: X
- Completeness issues: X
- Coherence issues: X
- Freshness issues: X

### Stage 4: Format & Structure

- Markdownlint violations: X
- Prettier violations: X
- Structure issues: X

### Stage 5: Lifecycle

- Location issues: X
- Archive candidates: X
- Cleanup candidates: X

## Action Plan

### Immediate Fixes (Do Now)

1. `file.md:line` - Fix description

### Archive Queue

```bash
node scripts/archive-doc.js "path/to/doc.md"
```

### Cleanup Queue

- DELETE: `path/to/temp-file.md` (reason)
- MERGE: `fragmented.md` → `main-doc.md`

## Recommendations

1. ...
2. ...
````

---

## Post-Audit Actions

### 1. Save Outputs

Verify all files saved to `${AUDIT_DIR}/`:

- [ ] stage-1-\*.md, stage-1-links.json
- [ ] stage-2-\*.jsonl
- [ ] stage-3-\*.jsonl
- [ ] stage-4-\*.jsonl
- [ ] stage-5-\*.jsonl
- [ ] all-findings.jsonl (merged, deduplicated)
- [ ] FINAL_REPORT.md

### 2. TDMS Integration

```bash
node scripts/debt/intake-audit.js ${AUDIT_DIR}/all-findings.jsonl --source "audit-documentation-$(date +%Y-%m-%d)"
```

### 3. Update AUDIT_TRACKER.md

Add entry to "Documentation Audits" table:

| Date    | Session | Commits | Files | Findings  | Confidence | Validation |
| ------- | ------- | ------- | ----- | --------- | ---------- | ---------- |
| [today] | [#]     | [X]     | [Y]   | [summary] | HIGH       | PASSED     |

### 4. Reset Threshold

Single-session audits reset the documentation category threshold.

### 5. Offer Fixes

Ask user: "Would you like me to fix any immediate items now?"

---

## Category Mapping for TDMS

| Stage         | Category ID Prefix | TDMS Category |
| ------------- | ------------------ | ------------- |
| 2 - Links     | DOC-LINK-\*        | documentation |
| 3 - Content   | DOC-CONTENT-\*     | documentation |
| 4 - Format    | DOC-FORMAT-\*      | documentation |
| 5 - Lifecycle | DOC-LIFECYCLE-\*   | documentation |

---

## Recovery Procedures

### If Stage Fails

1. **Missing output file:** Re-run the specific agent with an explicit file
   write
2. **Empty output file:** Check the agent for errors, then re-run it with
   verbose output
3. **Schema validation fails:** Parse the file line by line, fix the malformed
   lines, then re-validate
4. **Context compaction:** Verify the AUDIT_DIR path, then re-run from the last
   checkpoint

### If Context Compacts Mid-Audit

Read the partial outputs already saved to `${AUDIT_DIR}/` and resume from the
last completed stage.

---

## Multi-AI Escalation

After 3 single-session documentation audits, a full multi-AI Documentation Audit
is recommended. Track the count with the "Single audits completed" counter in
AUDIT_TRACKER.md.

---

## Update Dependencies

When modifying this skill, also update:

| Document                                                  | Section                        |
| --------------------------------------------------------- | ------------------------------ |
| `docs/templates/MULTI_AI_DOCUMENTATION_AUDIT_TEMPLATE.md` | Sync category list             |
| `docs/SLASH_COMMANDS_REFERENCE.md`                        | /audit-documentation reference |

---

## Version History

| Version | Date       | Description                                             |
| ------- | ---------- | ------------------------------------------------------- |
| 2.0     | 2026-02-02 | Complete rewrite: 6-stage parallel audit with 18 agents |
| 1.0     | 2025-xx-xx | Original single-session sequential audit                |

---

## Documentation References

Before running this audit, review:

### TDMS Integration (Required)

- [PROCEDURE.md](docs/technical-debt/PROCEDURE.md) - Full TDMS workflow
- [MASTER_DEBT.jsonl](docs/technical-debt/MASTER_DEBT.jsonl) - Canonical debt
  store
- Intake command:
  `node scripts/debt/intake-audit.js <output.jsonl> --source "audit-documentation-<date>"`

### Documentation Standards (Critical for This Audit)

- [JSONL_SCHEMA_STANDARD.md](docs/templates/JSONL_SCHEMA_STANDARD.md) - Output
  format requirements and TDMS field mapping
- [DOCUMENTATION_STANDARDS.md](docs/DOCUMENTATION_STANDARDS.md) - **The
  canonical guide** this audit validates against (5-tier hierarchy, metadata
  requirements, quality protocols)