---
name: source-unifier
description: Merge multiple documentation sources (docs, GitHub, PDF) with conflict detection. Use when combining docs + code for complete skill coverage.
---

# Source Unifier Skill

## Purpose

Single responsibility: Intelligently merge documentation from multiple sources (websites, GitHub repos, PDFs) while detecting and transparently reporting conflicts between documented and implemented behavior. (BP-4)

## Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

- [ ] All source URLs/paths are accessible
- [ ] Each source type is correctly identified (docs, github, pdf)
- [ ] Output directory is writable
- [ ] Merge mode is specified (`rule-based`, `ai-enhanced`, or `manual`)
- [ ] Conflict resolution strategy is defined

**DO NOT merge without inspecting each source first.**

## Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

- Conflict severity unclear (is doc or code authoritative?)
- Multiple valid interpretations of API signature
- Source versions don't match (v2 docs vs v3 code)
- Merge strategy produces ambiguous results

**NEVER silently resolve conflicts. Always report discrepancies.**

## Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | All specified sources, merge config | Unrelated documentation |
| PERIPHERAL | Version history for context | Other projects |
| DISTRACTOR | Previous merge attempts | Unrelated codebases |

## Conflict Types

| Type | Severity | Description | Example |
|------|----------|-------------|---------|
| Missing in code | HIGH | Documented but not implemented | API endpoint in docs, not in code |
| Missing in docs | MEDIUM | Implemented but not documented | Hidden feature in code |
| Signature mismatch | MEDIUM | Different parameters/types | `func(a, b)` vs `func(a, b, c=None)` |
| Description mismatch | LOW | Different explanations | Wording differences |

## Workflow Steps

### Step 1: Verify Sources (Grounding)

```bash
# Test documentation URL
curl -I https://docs.example.com/

# Test GitHub repo
gh repo view owner/repo --json name,description

# Test PDF file
file manual.pdf && pdfinfo manual.pdf
```

### Step 2: Create Unified Configuration

```json
{
  "name": "myframework",
  "description": "Complete framework knowledge from docs + code",
  "merge_mode": "rule-based",
  "conflict_resolution": {
    "missing_in_code": "warn",
    "missing_in_docs": "include",
    "signature_mismatch": "show_both",
    "description_mismatch": "prefer_docs"
  },
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.example.com/",
      "extract_api": true,
      "max_pages": 200
    },
    {
      "type": "github",
      "repo": "owner/myframework",
      "include_code": true,
      "code_analysis_depth": "surface",
      "max_issues": 100
    },
    {
      "type": "pdf",
      "path": "docs/manual.pdf",
      "extract_tables": true
    }
  ]
}
```

### Step 3: Execute Unified Scraping

**Option A: With skill-seekers**

```bash
skill-seekers unified --config unified-config.json
```

**Option B: Manual merge workflow**

1. Scrape each source independently
2. Extract API signatures from each
3. Compare and detect conflicts
4. Generate merged output with conflict annotations

### Step 4: Review Conflict Report

The unifier generates a conflict report:

```markdown
# Conflict Report: myframework

## Summary
- Total APIs analyzed: 245
- Conflicts detected: 18
  - Missing in code: 3 (HIGH)
  - Missing in docs: 8 (MEDIUM)
  - Signature mismatches: 5 (MEDIUM)
  - Description mismatches: 2 (LOW)

## HIGH Severity Conflicts

### `deprecated_function()`
- **Status**: Documented but not found in code
- **Documentation**: "Use this function to..."
- **Code**: NOT FOUND
- **Recommendation**: Remove from docs or implement

## MEDIUM Severity Conflicts

### `process_data(input: str)`
- **Status**: Signature mismatch
- **Documentation**: `process_data(input: str)`
- **Code**: `process_data(input: str, validate: bool = True)`
- **Recommendation**: Update documentation to include `validate` parameter
```

### Step 5: Validate Merged Output

```bash
# Check merged skill structure
ls -la output/myframework/

# Verify conflict annotations
grep -r "⚠️\|Conflict\|WARNING" output/myframework/references/

# Count conflict markers
grep -c "Conflict" output/myframework/references/*.md
```

## Recovery Protocol (Archetype 4 Mitigation)

On error:

1. **PAUSE** - Preserve partial merge state
2. **DIAGNOSE** - Check error type:
   - `Source unavailable` → Skip source, note in report
   - `Parse error` → Check source format, retry with different parser
   - `Memory error` → Process sources sequentially
   - `Conflict overflow` → Increase conflict threshold or filter by severity
3. **ADAPT** - Adjust merge strategy
4. **RETRY** - Resume merge (max 3 attempts)
5. **ESCALATE** - Present partial results, ask user for conflict resolution

## Checkpoint Support

State saved to: `.aiwg/working/checkpoints/source-unifier/`

```
checkpoints/source-unifier/
├── source_1_docs.json      # Processed docs
├── source_2_github.json    # Processed GitHub
├── source_3_pdf.json       # Processed PDF
├── conflicts.json          # Detected conflicts
└── merge_progress.json     # Current merge state
```

Resume: `skill-seekers unified --config unified-config.json --resume`

## Output Structure

```
output/myframework/
├── SKILL.md                    # Main skill with conflict summary
├── references/
│   ├── index.md                # Unified index
│   ├── api_reference.md        # Merged API docs (with conflict markers)
│   ├── guides.md               # Merged guides
│   └── conflicts.md            # Detailed conflict report
├── sources/
│   ├── documentation.md        # Original docs content
│   ├── github.md               # GitHub-extracted content
│   └── pdf.md                  # PDF-extracted content
└── metadata/
    ├── sources.json            # Source metadata
    └── conflict_summary.json   # Machine-readable conflicts
```

## Conflict Markers in Output

Merged content includes inline conflict markers:

````markdown
#### `process_data(input: str, validate: bool = True)`

⚠️ **Conflict**: Documentation signature differs from implementation

**Documentation says:**
```python
def process_data(input: str) -> dict:
    """Process input data and return results."""
```

**Code implementation:**
```python
def process_data(input: str, validate: bool = True) -> dict:
    """Process input data with optional validation."""
```

**Resolution**: Documentation should be updated to include the `validate` parameter added in v2.3.
````

## Merge Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| `rule-based` | Apply predefined rules for conflict resolution | Fast, deterministic |
| `ai-enhanced` | Use AI to intelligently merge conflicting content | Better quality, slower |
| `manual` | Generate conflicts only, user resolves | Full control |

## Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Too many conflicts | Sources very different | Filter by severity, merge incrementally |
| False positives | Parser differences | Normalize API extraction |
| Missing content | Source incomplete | Add supplementary source |
| Merge too slow | Large sources | Use parallel processing |

## References

- Skill Seekers Unified Scraping: https://github.com/jmagly/Skill_Seekers/blob/main/docs/UNIFIED_SCRAPING.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-6 model consortium parallel)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)
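
## Appendix: Conflict Detection Sketch

The "Compare and detect conflicts" step of the manual merge workflow (Step 3, Option B) can be sketched in a few lines of Python. This is a minimal illustration, not the skill-seekers implementation. It assumes each source has already been reduced to a mapping from API name to signature string; the `Conflict` class, `normalize`, `detect_conflicts`, and the `hidden_helper` entry are hypothetical names introduced here. Severities are copied from the Conflict Types table, and description mismatches are left out because they require comparing prose rather than signatures.

```python
from dataclasses import dataclass
from typing import Optional

# Severity per conflict type, copied from the Conflict Types table.
SEVERITY = {
    "missing_in_code": "HIGH",
    "missing_in_docs": "MEDIUM",
    "signature_mismatch": "MEDIUM",
}


@dataclass(frozen=True)
class Conflict:
    """One discrepancy between documented and implemented behavior (hypothetical type)."""
    api: str
    kind: str
    severity: str
    documented: Optional[str]
    implemented: Optional[str]


def normalize(signature: str) -> str:
    """Strip all whitespace so formatting differences are not reported as conflicts."""
    return "".join(signature.split())


def detect_conflicts(docs: dict, code: dict) -> list:
    """Compare name -> signature mappings from docs and code; return Conflict records."""
    conflicts = []
    for name in sorted(docs.keys() | code.keys()):
        doc_sig, code_sig = docs.get(name), code.get(name)
        if code_sig is None:
            kind = "missing_in_code"      # documented but not implemented
        elif doc_sig is None:
            kind = "missing_in_docs"      # implemented but not documented
        elif normalize(doc_sig) != normalize(code_sig):
            kind = "signature_mismatch"   # both exist, signatures differ
        else:
            continue                      # sources agree, nothing to report
        conflicts.append(Conflict(name, kind, SEVERITY[kind], doc_sig, code_sig))
    return conflicts


# Example inputs; signatures mirror the conflict report in Step 4,
# `hidden_helper` is a made-up undocumented function.
docs = {
    "process_data": "process_data(input: str)",
    "deprecated_function": "deprecated_function()",
}
code = {
    "process_data": "process_data(input: str, validate: bool = True)",
    "hidden_helper": "hidden_helper()",
}

conflicts = detect_conflicts(docs, code)
for c in conflicts:
    print(f"{c.severity:<6} {c.kind:<18} {c.api}")
```

Every detected conflict is returned rather than resolved, in line with the rule to never silently resolve conflicts. Normalizing whitespace before comparison is one simple way to reduce the parser-related false positives mentioned under Troubleshooting; a real extractor would likely need to compare parsed parameters rather than strings.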