---
name: mine-best-practices
description: Extract best practices from PR review comments to build a curated library for code review automation
license: MIT
argument-hint: "--since YYYY-MM-DD [--until YYYY-MM-DD] [--scope NAME]"
metadata:
  author: Valon Technologies
  version: "1.0"
---

# Mine Best Practices

Extract insights from PR review threads, validate them against the codebase, and consolidate them into the best practices library.

## Your Role as Orchestrator

**You are the orchestrator** for this multi-stage pipeline. Your responsibilities:

1. **Execute scripts** - Run the Python scripts that prepare batches and aggregate results
2. **Launch subagents** - Create Task() calls to dispatch specialized subagents for extraction, validation, and synthesis. **Max 10 concurrent** — if more batches exist, wait for a wave to complete before launching the next.
3. **Validate outputs** - After each phase, review subagent outputs for quality, format correctness, and issues
4. **Stop on anomalies** - If you detect problems (malformed output, unexpected results, low yield), stop and alert the user. Do not attempt to fix issues on-the-fly.

**Key principle:** Validate each stage's output before proceeding. Only interrupt the user when something needs human judgment.

## When to Use This Skill

**Use when:**
- Building/updating the best practices library from recent PRs
- Mining a date range of PR reviews for patterns
- Seeding the library from historical review threads

**Don't use for:**
- Reviewing code against the library of current practices
- General PR reviews

## Usage

```
/mine-best-practices --since 2025-01-01
/mine-best-practices --since 2025-06-01 --until 2025-07-01 --scope backend
```

All date ranges refer to **PR merge date** (inclusive on both ends).

### Advanced

For debugging and manual intervention:

```
/mine-best-practices resume validate --identifier web_2025-01-29
/mine-best-practices status
/mine-best-practices pending
/mine-best-practices for-topic error_handling
```

`--batch-size` and `--id-prefix` are tuning parameters rarely needed in normal operation.

## Data Refresh

Before mining, ensure threads are up to date:

```bash
python3 scripts/mine.py refresh                                        # Incremental (new PRs only)
python3 scripts/mine.py refresh --since 2025-01-01                     # From specific merge date
python3 scripts/mine.py refresh --since 2026-01-09 --until 2026-01-26  # Specific range
python3 scripts/mine.py refresh --full                                 # Full re-extraction
```

Requires `gh` CLI authenticated with repo access. Safe to re-fetch overlapping ranges (deduplicates by thread_id).

## Execution Workflow

**NOTE**: All commands run from the skill directory (where this SKILL.md lives).

### Step 1: Start Extraction

```bash
python3 scripts/mine.py extract --since 2025-01-01 --scope backend
```

Outputs extraction Task prompts for each batch.

### Step 2: Launch Extraction Subagents

Launch the Task prompts from Step 1 in parallel using the Task tool.

**Output:** `tmp/mining_{identifier}/extraction/batch_{n}.yaml`

**After subagents complete, validate:**
- Check each batch output file exists
- Verify YAML format is correct (insights list, skipped entries; a sketch of the expected shape follows Step 3's command)
- Review yield rate (typically 30-40% extracted, 60-70% skipped)
- Spot-check 2-3 insight content samples for quality
- Stop and alert user if: yield is unusually low/high, format errors, or quality issues

### Step 3: Aggregate Extraction

```bash
python3 scripts/aggregate_extraction.py {identifier}
```

Merges results into `insights.yaml` and outputs validation Task prompts.
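
When spot-checking batch outputs in Step 2 and reviewing the merge here, it helps to know the rough shape of a batch file. The sketch below is an assumption inferred from the checks this workflow describes (an insights list plus skipped entries); the field names are illustrative, not normative:

```yaml
# Assumed shape of tmp/mining_{identifier}/extraction/batch_1.yaml; field names are illustrative
insights:
  - thread_id: "PR123-discussion-r1"   # provenance back to the source review thread
    content: "Wrap outbound API calls in retry logic with exponential backoff"
skipped:
  - thread_id: "PR123-discussion-r2"
    reason: "style nitpick, not a generalizable practice"
```

A healthy batch splits roughly 30-40% into `insights` and the rest into `skipped`, matching the yield-rate check above.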
**After aggregation, validate:**
- Verify insights.yaml was updated with new insights
- Check the insight count matches the expected total (extracted minus duplicates)
- Review a few insight content samples
- Stop and alert user if: counts don't match, format issues, or quality concerns

### Step 4: Launch Validation Subagents

Launch validation Task prompts in parallel using the Task tool.

**Output:** `tmp/mining_{identifier}/validation/batch_{n}.yaml`

**After subagents complete, validate:**
- Check each batch output file exists
- Verify YAML format is correct (rejections list)
- Review rejection rate (expect 0-10% for recent threads, higher for older)
- Spot-check rejection reasons for appropriateness
- Stop and alert user if: rejection rate is surprisingly high/low, unclear rejection reasons, or format issues

### Step 5: Aggregate Validation

```bash
python3 scripts/aggregate_validation.py {identifier}
```

Updates `insights.yaml` with validation results and outputs the topic assignment prompt.

**After aggregation, validate:**
- Verify insights.yaml statuses updated (pending → validated or rejected)
- Check all pending insights were processed
- Review rejection reasons if any
- Stop and alert user if: missing updates, unexpected rejection patterns

### Step 6: Launch Topic Assignment

Launch the topic assignment Task prompt(s) in parallel.

**Output:** `tmp/mining_{identifier}/topics/batch_{n}.yaml`

**After subagents complete:**
1. Read all `topics/batch_{n}.yaml` outputs
2. Merge all `assignments` lists into one `topics.yaml` in the working directory
3. Deduplicate `__new__:` topics: same name across batches → keep as-is (natural merge). Similar but differently-named proposals → flag to user for resolution.
4. Verify all insight_ids were assigned, check topic distribution is reasonable
5. Stop and alert user if: many new topics proposed, odd distribution, or missing assignments

### Step 7: Dispatch Synthesis

```bash
python3 scripts/dispatch_synthesis.py {identifier}
```

Applies topic assignments and outputs synthesis Task prompts (one per topic).

**After dispatch, validate:**
- Verify insights.yaml was updated with topic assignments
- Check all validated insights have topics
- Confirm new topic files were created (for `__new__:` topics)
- Stop and alert user if: assignments missing, too many new topics, or odd groupings

### Step 8: Launch Synthesis Subagents

Launch synthesis Task prompts in parallel using the Task tool (one per topic).

**Output:** Updates `library/{topic}.yaml` directly.

**After subagents complete, validate:**
- Check each topic's library file was updated
- Verify YAML format is correct
- Review subagent summaries (preserved/updated/added counts)
- Spot-check 1-2 updated practices for quality (a sketch of a practice entry follows Step 11)
- Stop and alert user if: files weren't updated, format errors, or suspicious changes

### Step 9: Verify Synthesis Quality

Check that:
- Existing practices were preserved appropriately
- New practices are well-written and actionable
- One-off patterns were filtered (not everything became a practice)
- Code examples are correct and follow codebase conventions

Stop and alert user if: practices were deleted without replacement, excessive additions, or empty library files.

### Step 10: Aggregate Synthesis

```bash
python3 scripts/aggregate_synthesis.py {identifier}
```

Marks all validated insights with topics as `synthesized`.

### Step 11: Build

```bash
python3 scripts/build_sections.py
```

Generates markdown files for the review skill.
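
For orientation when spot-checking synthesis output (Steps 8-9) and reviewing what gets built here, a library practice entry might look roughly like the sketch below. The schema is assumed; the authoritative shape is whatever the existing `library/*.yaml` files already use:

```yaml
# Assumed shape of a practice entry in library/error_handling.yaml; field names are illustrative
practices:
  - title: "Retry external API calls with exponential backoff"
    guidance: "Wrap outbound HTTP calls in a shared retry helper; cap attempts and add jitter."
    source_insights: [ins_0042, ins_0057]   # provenance back into insights.yaml
```

`build_sections.py` then renders each topic file as a markdown section for the review skill.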
**Output:** the configured `sections_output_dir`

### Step 12: Verify

```bash
python3 scripts/mine.py status
python3 scripts/mine.py pending
```

Confirm:
- `status` shows insights as `synthesized`
- `pending` shows no remaining work

### Step 13: Build Review Rules

```bash
python3 scripts/build_bugbot.py
```

Produces Task prompts for generating bugbot rules from the library. Launch the Task prompts (one per scope).

Each subagent reads the existing BUGBOT.md and library practices, then merges incrementally — adding rules for new practices, removing rules for deleted practices, and preserving unchanged rules verbatim. Sections use `## {topic}` headings (matching library filenames) with `**{practice_title}**` rule keys. Related practices are synthesized into fewer condensed rules.

**Targets:** Scope-specific rules files from config.yaml

**After subagents complete, verify:**
- Diff is minimal — only new/removed/updated rules, not full rewrites
- New rules are mechanical and actionable (not vague design guidance)
- No duplication with root `.cursor/BUGBOT.md` (manually maintained cross-cutting rules)

## Status Commands

```bash
python3 scripts/mine.py status       # Overview: threads, insights, library
python3 scripts/mine.py pending     # What needs work at each stage
python3 scripts/mine.py for-topic X # All insights for topic X
```

## Data Locations

- **Threads:** `code_insights/threads.yaml`
- **Insights:** `code_insights/insights.yaml`
- **Library:** `code_insights/library/*.yaml`
- **Working dir:** `tmp/mining_{identifier}/`

## Architecture

```
User: /mine-best-practices --since 2024-01-01
        |
        v
mine.py --> Batch threads, output extraction prompts
        |
        v
Extraction subagents (parallel) --> batch_n.yaml
        |
        v
aggregate_extraction.py --> insights.yaml + validation prompts
        |
        v
Validation subagents (parallel) --> batch_n.yaml
        |
        v
aggregate_validation.py --> insights.yaml + topic prompt
        |
        v
Topic assignment subagent --> topics.yaml
        |
        v
dispatch_synthesis.py --> synthesis prompts (per topic)
        |
        v
Synthesis subagents (parallel) --> library/{topic}.yaml
        |
        v
[VERIFY: Check for anomalies]
        |
        v
aggregate_synthesis.py --> insights.yaml (status: synthesized)
        |
        v
build_sections.py --> sections/*.md
        |
        v
build_bugbot.py --> bugbot rules (via subagent)
```

## Notes

- Extraction filters out already-processed thread_ids
- Validation checks patterns against current codebase
- Synthesis prioritizes recurring patterns over one-offs
- Library practices derive from `insights.yaml` (full provenance)
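
To make the provenance note above concrete: each record in `insights.yaml` moves through `pending` → `validated` (or `rejected`) → `synthesized` as the pipeline runs, and carries its source thread and assigned topic. A hedged sketch of one record (the schema is assumed; field names are illustrative):

```yaml
# Assumed shape of a record in code_insights/insights.yaml; field names are illustrative
- id: ins_0042
  thread_id: "PR123-discussion-r1"
  content: "Retry external API calls with exponential backoff"
  status: synthesized     # lifecycle: pending -> validated (or rejected) -> synthesized
  topic: error_handling   # assigned in Step 6, applied in Step 7
```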