---
name: mine-best-practices
description: Extract best practices from PR review comments to build a curated library for code review automation
license: MIT
argument-hint: "--since YYYY-MM-DD [--until YYYY-MM-DD] [--scope NAME]"
metadata:
  author: Valon Technologies
  version: "1.0"
---

# Mine Best Practices

Extract insights from PR review threads, validate them against the codebase, and consolidate them into the best practices library.

## Your Role as Orchestrator

**You are the orchestrator** for this multi-stage pipeline. Your responsibilities:

1. **Execute scripts** - Run the Python scripts that prepare batches and aggregate results
2. **Launch subagents** - Create Task() calls to dispatch specialized subagents for extraction, validation, and synthesis. **Max 10 concurrent** — if more batches exist, wait for a wave to complete before launching the next.
3. **Validate outputs** - After each phase, review subagent outputs for quality, format correctness, and issues
4. **Stop on anomalies** - If you detect problems (malformed output, unexpected results, low yield), stop and alert the user. Do not attempt to fix issues on-the-fly.

**Key principle:** Validate each stage's output before proceeding. Only interrupt the user when something needs human judgment.

## When to Use This Skill

**Use when:**
- Building/updating the best practices library from recent PRs
- Mining a date range of PR reviews for patterns
- Seeding the library from historical review threads

**Don't use for:**
- Reviewing code against the library of current practices
- General PR reviews

## Usage

```
/mine-best-practices --since 2025-01-01
/mine-best-practices --since 2025-06-01 --until 2025-07-01 --scope backend
```

All date ranges refer to **PR merge date** (inclusive on both ends).

### Advanced

For debugging and manual intervention:

```
/mine-best-practices resume validate --identifier web_2025-01-29
/mine-best-practices status
/mine-best-practices pending
/mine-best-practices for-topic error_handling
```

`--batch-size` and `--id-prefix` are tuning parameters rarely needed in normal operation.

## Data Refresh

Before mining, ensure threads are up to date:

```bash
python3 scripts/mine.py refresh                                        # Incremental (new PRs only)
python3 scripts/mine.py refresh --since 2025-01-01                     # From specific merge date
python3 scripts/mine.py refresh --since 2026-01-09 --until 2026-01-26  # Specific range
python3 scripts/mine.py refresh --full                                 # Full re-extraction
```

Requires `gh` CLI authenticated with repo access. Safe to re-fetch overlapping ranges (deduplicates by thread_id).

## Execution Workflow

**NOTE**: All commands run from the skill directory (where this SKILL.md lives).

### Step 1: Start Extraction

```bash
python3 scripts/mine.py extract --since 2025-01-01 --scope backend
```

Outputs extraction Task prompts for each batch.

### Step 2: Launch Extraction Subagents

Launch the Task prompts from Step 1 in parallel using the Task tool.

**Output:** `tmp/mining_{identifier}/extraction/batch_{n}.yaml`

**After subagents complete, validate:**
- Check each batch output file exists
- Verify YAML format is correct (insights list, skipped entries; a sketch of the expected shape follows Step 3's command)
- Review yield rate (typically 30-40% extracted, 60-70% skipped)
- Spot-check 2-3 insight content samples for quality
- Stop and alert user if: yield is unusually low/high, format errors, or quality issues

### Step 3: Aggregate Extraction

```bash
python3 scripts/aggregate_extraction.py {identifier}
```

Merges results into `insights.yaml` and outputs validation Task prompts.
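
When spot-checking batch outputs in Step 2 and reviewing the merge here, it helps to know the rough shape of a batch file. The sketch below is an assumption inferred from the checks this workflow describes (an insights list plus skipped entries); the field names are illustrative, not normative:

```yaml
# Assumed shape of tmp/mining_{identifier}/extraction/batch_1.yaml; field names are illustrative
insights:
  - thread_id: "PR123-discussion-r1"   # provenance back to the source review thread
    content: "Wrap outbound API calls in retry logic with exponential backoff"
skipped:
  - thread_id: "PR123-discussion-r2"
    reason: "style nitpick, not a generalizable practice"
```

A healthy batch splits roughly 30-40% into `insights` and the rest into `skipped`, matching the yield-rate check above.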
**After aggregation, validate:**
- Verify insights.yaml was updated with new insights
- Check the insight count matches the expected total (extracted minus duplicates)
- Review a few insight content samples
- Stop and alert user if: counts don't match, format issues, or quality concerns

### Step 4: Launch Validation Subagents

Launch validation Task prompts in parallel using the Task tool.

**Output:** `tmp/mining_{identifier}/validation/batch_{n}.yaml`

**After subagents complete, validate:**
- Check each batch output file exists
- Verify YAML format is correct (rejections list)
- Review rejection rate (expect 0-10% for recent threads, higher for older)
- Spot-check rejection reasons for appropriateness
- Stop and alert user if: rejection rate is surprisingly high/low, unclear rejection reasons, or format issues

### Step 5: Aggregate Validation

```bash
python3 scripts/aggregate_validation.py {identifier}
```

Updates `insights.yaml` with validation results and outputs the topic assignment prompt.

**After aggregation, validate:**
- Verify insights.yaml statuses updated (pending → validated or rejected)
- Check all pending insights were processed
- Review rejection reasons if any
- Stop and alert user if: missing updates, unexpected rejection patterns

### Step 6: Launch Topic Assignment

Launch the topic assignment Task prompt(s) in parallel.

**Output:** `tmp/mining_{identifier}/topics/batch_{n}.yaml`

**After subagents complete:**
1. Read all `topics/batch_{n}.yaml` outputs
2. Merge all `assignments` lists into one `topics.yaml` in the working directory
3. Deduplicate `__new__:` topics: same name across batches → keep as-is (natural merge). Similar but differently-named proposals → flag to user for resolution.
4. Verify all insight_ids were assigned, check topic distribution is reasonable
5. Stop and alert user if: many new topics proposed, odd distribution, or missing assignments

### Step 7: Dispatch Synthesis

```bash
python3 scripts/dispatch_synthesis.py {identifier}
```

Applies topic assignments and outputs synthesis Task prompts (one per topic).

**After dispatch, validate:**
- Verify insights.yaml was updated with topic assignments
- Check all validated insights have topics
- Confirm new topic files were created (for `__new__:` topics)
- Stop and alert user if: assignments missing, too many new topics, or odd groupings

### Step 8: Launch Synthesis Subagents

Launch synthesis Task prompts in parallel using the Task tool (one per topic).

**Output:** Updates `library/{topic}.yaml` directly.

**After subagents complete, validate:**
- Check each topic's library file was updated
- Verify YAML format is correct
- Review subagent summaries (preserved/updated/added counts)
- Spot-check 1-2 updated practices for quality (a sketch of a practice entry follows Step 11)
- Stop and alert user if: files weren't updated, format errors, or suspicious changes

### Step 9: Verify Synthesis Quality

Check that:
- Existing practices were preserved appropriately
- New practices are well-written and actionable
- One-off patterns were filtered (not everything became a practice)
- Code examples are correct and follow codebase conventions

Stop and alert user if: practices were deleted without replacement, excessive additions, or empty library files.

### Step 10: Aggregate Synthesis

```bash
python3 scripts/aggregate_synthesis.py {identifier}
```

Marks all validated insights with topics as `synthesized`.

### Step 11: Build

```bash
python3 scripts/build_sections.py
```

Generates markdown files for the review skill.
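
For orientation when spot-checking synthesis output (Steps 8-9) and reviewing what gets built here, a library practice entry might look roughly like the sketch below. The schema is assumed; the authoritative shape is whatever the existing `library/*.yaml` files already use:

```yaml
# Assumed shape of a practice entry in library/error_handling.yaml; field names are illustrative
practices:
  - title: "Retry external API calls with exponential backoff"
    guidance: "Wrap outbound HTTP calls in a shared retry helper; cap attempts and add jitter."
    source_insights: [ins_0042, ins_0057]   # provenance back into insights.yaml
```

`build_sections.py` then renders each topic file as a markdown section for the review skill.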
**Output:** the configured `sections_output_dir`

### Step 12: Verify

```bash
python3 scripts/mine.py status
python3 scripts/mine.py pending
```

Confirm:
- `status` shows insights as `synthesized`
- `pending` shows no remaining work

### Step 13: Build Review Rules

```bash
python3 scripts/build_bugbot.py
```

Produces Task prompts for generating bugbot rules from the library. Launch the Task prompts (one per scope).

Each subagent reads the existing BUGBOT.md and library practices, then merges incrementally — adding rules for new practices, removing rules for deleted practices, and preserving unchanged rules verbatim. Sections use `## {topic}` headings (matching library filenames) with `**{practice_title}**` rule keys. Related practices are synthesized into fewer condensed rules.

**Targets:** Scope-specific rules files from config.yaml

**After subagents complete, verify:**
- Diff is minimal — only new/removed/updated rules, not full rewrites
- New rules are mechanical and actionable (not vague design guidance)
- No duplication with root `.cursor/BUGBOT.md` (manually maintained cross-cutting rules)

## Status Commands

```bash
python3 scripts/mine.py status       # Overview: threads, insights, library
python3 scripts/mine.py pending     # What needs work at each stage
python3 scripts/mine.py for-topic X # All insights for topic X
```

## Data Locations

- **Threads:** `code_insights/threads.yaml`
- **Insights:** `code_insights/insights.yaml`
- **Library:** `code_insights/library/*.yaml`
- **Working dir:** `tmp/mining_{identifier}/`

## Architecture

```
User: /mine-best-practices --since 2024-01-01
        |
        v
mine.py --> Batch threads, output extraction prompts
        |
        v
Extraction subagents (parallel) --> batch_n.yaml
        |
        v
aggregate_extraction.py --> insights.yaml + validation prompts
        |
        v
Validation subagents (parallel) --> batch_n.yaml
        |
        v
aggregate_validation.py --> insights.yaml + topic prompt
        |
        v
Topic assignment subagent --> topics.yaml
        |
        v
dispatch_synthesis.py --> synthesis prompts (per topic)
        |
        v
Synthesis subagents (parallel) --> library/{topic}.yaml
        |
        v
[VERIFY: Check for anomalies]
        |
        v
aggregate_synthesis.py --> insights.yaml (status: synthesized)
        |
        v
build_sections.py --> sections/*.md
        |
        v
build_bugbot.py --> bugbot rules (via subagent)
```

## Notes

- Extraction filters out already-processed thread_ids
- Validation checks patterns against current codebase
- Synthesis prioritizes recurring patterns over one-offs
- Library practices derive from `insights.yaml` (full provenance)
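
To make the provenance note above concrete: each record in `insights.yaml` moves through `pending` → `validated` (or `rejected`) → `synthesized` as the pipeline runs, and carries its source thread and assigned topic. A hedged sketch of one record (the schema is assumed; field names are illustrative):

```yaml
# Assumed shape of a record in code_insights/insights.yaml; field names are illustrative
- id: ins_0042
  thread_id: "PR123-discussion-r1"
  content: "Retry external API calls with exponential backoff"
  status: synthesized     # lifecycle: pending -> validated (or rejected) -> synthesized
  topic: error_handling   # assigned in Step 6, applied in Step 7
```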