--- name: devtu-docs-quality description: Comprehensive documentation quality system combining automated validation with ToolUniverse-specific auditing. Detects outdated commands, circular navigation, inconsistent terminology, auto-generated file conflicts, broken links, and structural problems. Use when reviewing documentation, before releases, after refactoring, or when user asks to audit, optimize, or improve documentation quality. --- # Documentation Quality Assurance Systematic documentation quality system combining automated validation scripts with ToolUniverse-specific structural audits. ## When to Use - Pre-release documentation review - After major refactoring (commands, APIs, tool counts changed) - User reports confusing or outdated documentation - Circular navigation or structural problems suspected - Want to establish automated validation pipeline ## Approach: Two-Phase Strategy **Phase A: Automated Validation** (15-20 min) - Create validation scripts for systematic detection - Test commands, links, terminology consistency - Priority-based fixes (blockers → polish) **Phase B: ToolUniverse-Specific Audit** (20-25 min) - Circular navigation checks - MCP configuration duplication - Tool count consistency - Auto-generated file conflicts ## Phase A: Automated Validation ### A1. Build Validation Script Create `scripts/validate_documentation.py`: ```python #!/usr/bin/env python3 """Documentation validator for ToolUniverse""" import re import glob from pathlib import Path DOCS_ROOT = Path("docs") # ToolUniverse-specific patterns DEPRECATED_PATTERNS = [ (r"python -m tooluniverse\.server", "tooluniverse-server"), (r"600\+?\s+tools", "1000+ tools"), (r"750\+?\s+tools", "1000+ tools"), ] def is_false_positive(match, content): """Smart context checking to avoid false positives""" start = max(0, match.start() - 100) end = min(len(content), match.end() + 100) context = content[start:end].lower() # Skip if discussing deprecation itself if any(kw in context for kw in ['deprecated', 'old version', 'migration']): return True # Skip technical values (ports, dimensions, etc.) if any(kw in context for kw in ['width', 'height', 'port', '":"']): return True return False def validate_file(filepath): """Check one file for issues""" with open(filepath, 'r', encoding='utf-8') as f: content = f.read() issues = [] # Check deprecated patterns for old_pattern, new_text in DEPRECATED_PATTERNS: matches = re.finditer(old_pattern, content) for match in matches: if is_false_positive(match, content): continue line_num = content[:match.start()].count('\n') + 1 issues.append({ 'file': filepath, 'line': line_num, 'severity': 'HIGH', 'found': match.group(), 'suggestion': new_text }) return issues # Scan all docs all_issues = [] for doc_file in glob.glob(str(DOCS_ROOT / "**/*.md"), recursive=True): all_issues.extend(validate_file(doc_file)) for doc_file in glob.glob(str(DOCS_ROOT / "**/*.rst"), recursive=True): all_issues.extend(validate_file(doc_file)) # Report if all_issues: print(f"❌ Found {len(all_issues)} issues\n") for issue in all_issues: print(f"{issue['file']}:{issue['line']} [{issue['severity']}]") print(f" Found: {issue['found']}") print(f" Should be: {issue['suggestion']}\n") exit(1) else: print("✅ Documentation validation passed") exit(0) ``` ### A2. Command Accuracy Check Test that commands in docs actually work: ```bash # Extract and test commands grep -r "^\s*\$\s*" docs/ | while read line; do cmd=$(echo "$line" | sed 's/.*\$ //' | cut -d' ' -f1) if ! command -v "$cmd" &> /dev/null; then echo "❌ Command not found: $cmd in $line" fi done ``` ### A3. Link Integrity Check For RST docs: ```python def check_rst_links(docs_root): """Validate :doc: references""" pattern = r':doc:`([^`]+)`' for rst_file in glob.glob(f"{docs_root}/**/*.rst", recursive=True): with open(rst_file) as f: content = f.read() matches = re.finditer(pattern, content) for match in matches: ref = match.group(1) # Check if target exists possible = [f"{ref}.rst", f"{ref}.md", f"{ref}/index.rst"] if not any(Path(docs_root, p).exists() for p in possible): print(f"❌ Broken link in {rst_file}: {ref}") ``` ### A4. Terminology Consistency Track variations and standardize: ```python # Define standard terms TERMINOLOGY = { 'api_endpoint': ['endpoint', 'url', 'route', 'path'], 'tool_count': ['tools', 'resources', 'integrations'], } def check_terminology(content): """Find inconsistent terminology""" for standard, variations in TERMINOLOGY.items(): counts = {v: content.lower().count(v) for v in variations} if len([c for c in counts.values() if c > 0]) > 2: return f"Inconsistent terminology: {counts}" return None ``` ## Phase B: ToolUniverse-Specific Audit ### B1. Circular Navigation Check **Issue**: Documentation pages that reference each other in loops. **Check manually**: ```bash # Find cross-references grep -r ":doc:\`" docs/*.rst | grep -E "(quickstart|getting_started|installation)" ``` **Checklist**: - [ ] Is there a clear "Start Here" on `docs/index.rst`? - [ ] Does navigation follow linear path: index → quickstart → getting_started → guides? - [ ] No "you should have completed X first" statements that create dependency loops? **Common patterns to fix**: - `quickstart.rst` → "See getting_started" - `getting_started.rst` → "Complete quickstart first" ### B2. Duplicate Content Check **Common duplicates in ToolUniverse**: 1. Multiple FAQs: `docs/faq.rst` and `docs/help/faq.rst` 2. Getting started: `docs/installation.rst`, `docs/quickstart.rst`, `docs/getting_started.rst` 3. MCP configuration: All files in `docs/guide/building_ai_scientists/` **Detection**: ```bash # Find MCP config duplication rg "MCP.*configuration" docs/ -l | wc -l rg "pip install tooluniverse" docs/ -l | wc -l ``` **Action**: Consolidate or clearly differentiate ### B3. Tool Count Consistency **Standard**: Use "1000+ tools" consistently. **Detection**: ```bash # Find all tool count mentions rg "[0-9]+\+?\s+(tools|resources|integrations)" docs/ --no-filename | sort -u ``` **Check**: - [ ] Are different numbers used (600, 750, 1195)? - [ ] Is "1000+ tools" used consistently? - [ ] Exact counts avoided in favor of "1000+"? ### B4. Auto-Generated File Headers **Auto-generated directories**: - `docs/tools/*_tools.rst` (from `generate_config_index.py`) - `docs/api/*.rst` (from `sphinx-apidoc`) **Required header**: ```rst .. AUTO-GENERATED - DO NOT EDIT MANUALLY .. Generated by: docs/generate_config_index.py .. Last updated: 2024-02-05 .. .. To modify, edit source files and regenerate. ``` **Check**: ```bash head -5 docs/tools/*_tools.rst | grep "AUTO-GENERATED" ``` ### B5. CLI Tools Documentation **Check pyproject.toml for all CLIs**: ```bash grep -A 20 "\[project.scripts\]" pyproject.toml ``` **Common undocumented**: - `tooluniverse-expert-feedback` - `tooluniverse-expert-feedback-web` - `generate-mcp-tools` **Action**: Ensure all in `docs/reference/cli_tools.rst` ### B6. Environment Variables **Discovery**: ```bash # Find all env vars in code rg "os\.getenv|os\.environ" src/tooluniverse/ -o | sort -u rg "TOOLUNIVERSE_[A-Z_]+" src/tooluniverse/ -o | sort -u ``` **Categories to document**: - Cache: `TOOLUNIVERSE_CACHE_*` - Logging: `TOOLUNIVERSE_LOG_*` - LLM: `TOOLUNIVERSE_LLM_*` - API keys: `*_API_KEY` **Check**: - [ ] Does `docs/reference/environment_variables.rst` exist? - [ ] Are variables categorized? - [ ] Each has: default, description, example? - [ ] Is there `.env.template` at project root? ### B7. ToolUniverse-Specific Jargon **Terms to define on first use**: - Tool Specification - EFO ID - MCP, SMCP - Compact Mode - Tool Finder - AI Scientist **Check**: - [ ] Is there `docs/glossary.rst`? - [ ] Terms defined inline with `:term:` references? - [ ] Glossary linked from main index? ### B8. CI/CD Documentation Regeneration **Required in `.github/workflows/deploy-docs.yml`**: ```yaml - name: Regenerate tool documentation run: | cd docs python generate_config_index.py python generate_remote_tools_docs.py python generate_tool_reference.py ``` **Check**: - [ ] CI/CD regenerates docs before build? - [ ] Regeneration happens BEFORE Sphinx build? - [ ] `docs/api/` excluded from cache? ## Priority Framework ### Issue Severity | Severity | Definition | Examples | Timeline | |----------|------------|----------|----------| | **CRITICAL** | Blocks release | Broken builds, dangerous instructions | Immediate | | **HIGH** | Blocks users | Wrong commands, broken setup | Same day | | **MEDIUM** | Causes confusion | Inconsistent terminology, unclear examples | Same week | | **LOW** | Reduces quality | Long files, minor formatting | Future task | ### Fix Order 1. Run automated validation → Fix HIGH issues 2. Check circular navigation → Fix CRITICAL loops 3. Verify tool counts → Standardize to "1000+" 4. Check auto-generated headers → Add missing 5. Validate CLI docs → Document all from pyproject.toml 6. Check env vars → Create reference page 7. Review jargon → Create/update glossary 8. Verify CI/CD → Add regeneration steps ## Validation Checklist Before considering docs "done": ### Accuracy - [ ] Automated validation passes - [ ] All commands tested - [ ] Version numbers current - [ ] Counts match reality ### Structure (ToolUniverse-specific) - [ ] No circular navigation - [ ] Clear "Start Here" entry point - [ ] Linear learning path - [ ] Max 2-3 level hierarchy ### Consistency - [ ] "1000+ tools" everywhere - [ ] Same terminology throughout - [ ] Auto-generated files have headers - [ ] All CLIs documented ### Completeness - [ ] All features documented - [ ] All CLIs in pyproject.toml covered - [ ] All env vars documented - [ ] Glossary includes all jargon ## Output: Audit Report ```markdown # Documentation Quality Report **Date**: [date] **Scope**: Automated validation + ToolUniverse audit ## Executive Summary - Files scanned: X - Issues found: Y (Critical: A, High: B, Medium: C, Low: D) ## Critical Issues 1. **[Issue]** - Location: file:line - Problem: [description] - Fix: [action] - Effort: [time] ## Automated Validation Results - Deprecated commands: X instances - Inconsistent counts: Y instances - Broken links: Z instances ## ToolUniverse-Specific Findings - Circular navigation: [yes/no] - Tool count variations: [list] - Missing CLI docs: [list] - Auto-generated headers: X missing ## Recommendations 1. Immediate (today): [list] 2. This week: [list] 3. Next sprint: [list] ## Validation Command Run `python scripts/validate_documentation.py` to verify fixes ``` ## CI/CD Integration Add to `.github/workflows/validate-docs.yml`: ```yaml name: Validate Documentation on: [pull_request] jobs: validate: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - name: Install dependencies run: pip install -r requirements.txt - name: Run validation run: python scripts/validate_documentation.py - name: Check auto-generated headers run: | for f in docs/tools/*_tools.rst; do if ! head -1 "$f" | grep -q "AUTO-GENERATED"; then echo "Missing header: $f" exit 1 fi done ``` ## Common Issues Quick Reference | Issue | Detection | Fix | |-------|-----------|-----| | Deprecated command | `rg "old-cmd" docs/` | Replace with `new-cmd` | | Wrong tool count | `rg "[0-9]+ tools" docs/` | Change to "1000+ tools" | | Circular nav | Manual trace | Remove back-references | | Missing header | `head -1 file.rst` | Add AUTO-GENERATED header | | Undocumented CLI | Check pyproject.toml | Add to cli_tools.rst | | Missing env var | `rg "os.getenv" src/` | Add to env vars reference | ## Best Practices 1. **Automate first** - Build validation before manual audit 2. **Context matters** - Smart pattern matching avoids false positives 3. **Fix systematically** - Batch similar issues together 4. **Validate continuously** - Add to CI/CD pipeline 5. **ToolUniverse-specific last** - Automated checks catch most issues ## Success Criteria Documentation quality achieved when: - ✅ Automated validation reports 0 HIGH issues - ✅ No circular navigation - ✅ "1000+ tools" used consistently - ✅ All auto-generated files have headers - ✅ All CLIs from pyproject.toml documented - ✅ All env vars have reference page - ✅ Glossary covers all technical terms - ✅ CI/CD validates on every PR