# PR Babysitter Loop

**Goal**: Reduce the human time spent herding pull requests through review, CI, rebase, and merge while keeping the human in the judgment seat.

## Scheduling

**Recommended**:
- `/loop 5m /pr-babysit check` (Grok Build TUI)
- Equivalent scheduled task or GitHub Action in other environments (every 5–15 minutes during working hours is common).

A common variant is a faster "watcher" loop (every 2–5 minutes) during active review periods and a slower sweeper overnight.

## Required Skills

- `pr-review-triage` — Understands your project's review norms, required checks, and what "ready to merge" means.
- `minimal-fix` — Produces the smallest possible change that addresses a specific reviewer comment or CI failure.
- `rebase-and-clean` — Safe rebase + conflict resolution patterns for your repo.

## State

Keep a small `pr-babysitter-state.md` (or a Linear board / GitHub project view) with:

- Watched PRs + current status
- Last action taken + outcome
- Human decisions that overrode the loop

Example state entry:
```markdown
- #1234 (feat/auth-refresh)
  Status: Changes requested by @reviewer
  Last action: Loop proposed minimal diff for comment X
  Human decision: Approved the diff, asked for one more test
```
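If the state file is written by the loop rather than by hand, a small record type helps keep entries uniform. The following is a minimal sketch, not a required schema: `StateEntry` and `render_entry` are hypothetical names, and the fields simply mirror the example entry above.

```python
from dataclasses import dataclass


@dataclass
class StateEntry:
    """One watched PR, mirroring the example entry above (hypothetical schema)."""
    number: int               # PR number
    branch: str               # head branch name
    status: str               # current review/CI status
    last_action: str          # last thing the loop did, plus outcome
    human_decision: str = ""  # set only when a human overrode the loop


def render_entry(entry: StateEntry) -> str:
    """Render an entry as a markdown list item for pr-babysitter-state.md."""
    lines = [
        f"- #{entry.number} ({entry.branch})",
        f"  Status: {entry.status}",
        f"  Last action: {entry.last_action}",
    ]
    # The human-decision line is omitted when nobody has overridden the loop.
    if entry.human_decision:
        lines.append(f"  Human decision: {entry.human_decision}")
    return "\n".join(lines)
```

Omitting the "Human decision" line when it is empty keeps overrides easy to spot when skimming the file.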

## How the Loop Runs (Typical Cycle)

1. Discover open PRs authored by the team (or all PRs the user cares about).
2. For each PR:
   - Run triage skill.
   - If CI is red → spawn a sub-agent with the `minimal-fix` skill to address the failure.
   - If review comments exist and are actionable → propose minimal patches.
   - If ready (all checks green, approvals present, no blocking comments) → add "ready to merge" label or ping human.
3. For PRs that have been idle longer than a threshold you set → suggest close or hand-off.
4. Write concise updates back to the PR and to state.
5. Anything ambiguous or high-risk → surface to human with context.
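The per-PR branching in the cycle above can be sketched as a pure decision function. This is an illustration under stated assumptions, not the behaviour of any particular skill: `PrSnapshot`, `Action`, `next_action`, and the defaults `required_approvals=1` and `max_idle_days=14` are all hypothetical, and checking the high-risk flag first is a design choice rather than something the steps prescribe.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ESCALATE = "surface to human with context"
    FIX_CI = "spawn minimal-fix sub-agent"
    PROPOSE_PATCH = "propose minimal patches"
    MARK_READY = "add ready-to-merge label or ping human"
    SUGGEST_CLOSE = "suggest close or hand-off"
    WAIT = "no action this run"


@dataclass
class PrSnapshot:
    """What the triage step is assumed to report for one PR (hypothetical)."""
    ci_green: bool
    approvals: int
    blocking_comments: int
    actionable_comments: int
    idle_days: int
    high_risk: bool = False


def next_action(pr: PrSnapshot, required_approvals: int = 1,
                max_idle_days: int = 14) -> Action:
    # Assumption: high-risk PRs go to a human before any automated fix.
    if pr.high_risk:
        return Action.ESCALATE
    if not pr.ci_green:
        return Action.FIX_CI
    if pr.actionable_comments > 0:
        return Action.PROPOSE_PATCH
    if pr.approvals >= required_approvals and pr.blocking_comments == 0:
        return Action.MARK_READY
    if pr.idle_days > max_idle_days:
        return Action.SUGGEST_CLOSE
    return Action.WAIT
```

Keeping the decision separate from the side effects (commenting, labelling, spawning sub-agents) makes the policy easy to unit-test and to review when the team's norms change.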

## Verification Strategy

- Never let the implementer sub-agent mark its own work "done".
- Use a separate verifier sub-agent in a maker/checker arrangement, or a stronger model at higher effort, that must explicitly confirm:
  - The change addresses the comment/failure.
  - No unrelated files were touched.
  - Tests/lint still pass in the worktree.
- The loop only proposes; a human (or an explicit "auto-merge" allowlist for very safe cases) actually merges.
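The three verifier confirmations above can be reduced to one mechanical gate. The sketch below assumes the verifier sub-agent fills in a report; `VerifierReport`, `verify`, and the notion of an `expected_files` list are hypothetical, and how each field gets populated is left to your setup.

```python
from dataclasses import dataclass


@dataclass
class VerifierReport:
    """Filled in by the verifier, never by the implementer (hypothetical shape)."""
    addresses_issue: bool   # does the change address the comment/failure?
    touched_files: list     # files changed in the proposed diff
    expected_files: list    # files the fix was expected to touch
    checks_passed: bool     # tests/lint result in the worktree


def verify(report: VerifierReport):
    """Return (ok, reasons); ok is True only if every confirmation holds."""
    reasons = []
    if not report.addresses_issue:
        reasons.append("change does not address the comment/failure")
    extra = sorted(set(report.touched_files) - set(report.expected_files))
    if extra:
        reasons.append("unrelated files touched: " + ", ".join(extra))
    if not report.checks_passed:
        reasons.append("tests/lint failing in the worktree")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict gives the loop something concrete to write back to the PR or the state file when a fix is rejected.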

## Human Handoff Points

- High-risk refactors
- Changes touching security, payments, auth, or core infrastructure
- When the loop has proposed more than N fixes on the same PR without progress (N is a threshold you choose)
- When the state file shows the same PR surfacing for several days
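Most of the handoff points above can be checked mechanically; judging whether a refactor is high-risk is left to triage. The following is a rough sketch with assumed values: `SENSITIVE_MARKERS`, `handoff_reasons`, and the default thresholds of 3 are hypothetical, and the substring match on paths is deliberately crude.

```python
# Hypothetical path markers for security, payments, auth, and core infrastructure.
SENSITIVE_MARKERS = ("security", "payments", "auth", "infra")


def handoff_reasons(touched_paths, stalled_fixes, days_surfacing,
                    max_stalled_fixes=3, max_days_surfacing=3):
    """Return the reasons a PR should go to a human; an empty list means none."""
    reasons = []
    # Crude substring match: "auth" also matches "author", so expect false positives.
    hits = sorted(
        p for p in touched_paths
        if any(marker in p.lower() for marker in SENSITIVE_MARKERS)
    )
    if hits:
        reasons.append("sensitive paths: " + ", ".join(hits))
    if stalled_fixes > max_stalled_fixes:
        reasons.append(f"{stalled_fixes} fixes without progress")
    if days_surfacing > max_days_surfacing:
        reasons.append(f"surfacing for {days_surfacing} days")
    return reasons
```

A false positive here only costs a human a glance, which is usually the cheaper error for this kind of gate.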

## Tool-Specific Notes

**Grok Build TUI**:
- The `pr-babysit` skill (if installed) is designed exactly for this.
- Run with `/loop 5m /pr-babysit check`.
- Use worktree isolation for any fix attempts.
- The skill can call `scheduler_delete` on itself when the watchlist is empty.

**Claude Code**:
- Boris Cherny has publicly described running very similar `/loop 5m /babysit` flows.
- Combine with `/goal` for "keep working on this PR until CI is green and no blocking comments remain".

**General**:
- Expose the state file in the repo or a shared doc so the whole team can see what the loop is doing.
- Make the loop's comments on PRs clearly signed (e.g. "🤖 Loop Engineering — PR Babysitter").

## Failure Modes & Mitigations

- **Loop proposes bad fixes** → Strong verifier sub-agent (maker/checker) + human review gate for anything beyond trivial.
- **Infinite rebase loops** → Limit the number of automated rebase attempts per PR.
- **Stale state** → The loop should prune closed/merged PRs on every run.
- **Notification fatigue** → Use selective notifications (only when human action is truly required).
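Two of these mitigations, the rebase cap and stale-state pruning, are simple bookkeeping. The sketch below assumes the loop keeps an in-memory watchlist keyed by PR number; `prune_closed`, `may_rebase`, the `"open"` state string, and the default cap of 3 are hypothetical.

```python
def prune_closed(watchlist):
    """Keep only PRs whose state is "open"; run this at the start of every cycle.

    `watchlist` maps PR number -> state string ("open", "closed", "merged").
    """
    return {number: state for number, state in watchlist.items() if state == "open"}


def may_rebase(attempts, pr_number, max_attempts=3):
    """Return True and record the attempt if the PR is still under its rebase cap.

    `attempts` maps PR number -> automated rebases already tried; it is
    updated in place so the count persists across loop runs.
    """
    used = attempts.get(pr_number, 0)
    if used >= max_attempts:
        return False  # cap reached: hand the PR to a human instead
    attempts[pr_number] = used + 1
    return True
```

Once `may_rebase` returns `False`, the PR is a natural candidate for the human handoff list rather than another automated attempt.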

## Cost Profile

| Scenario | Tokens/run | Notes |
|----------|------------|-------|
| No-op (empty watchlist) | ~3k | **Target most runs** — exit early |
| Triage pass | ~80k | PR + CI status scan |
| Fix attempt (L2) | ~250k | Worktree + minimal-fix + verifier |

**Cadence**: 5m–15m · **Tier**: high · **Suggested daily cap**: 2M tokens · **Early exit required**

```bash
npx @cobusgreyling/loop-cost --pattern pr-babysitter --cadence 10m --level L1 --conservative
```

A high cadence without early exit burns tokens fast. Pair the loop with the `loop-budget` skill and a `loop-run-log.md`.
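To see why early exit matters, the table's approximate per-run figures can be turned into a daily estimate. This is back-of-the-envelope arithmetic, not a replacement for the `loop-cost` tool: `runs_per_day` and `daily_tokens` are hypothetical helpers, and the token figures are the rough values from the table above.

```python
# Approximate per-run token costs taken from the table above.
TOKENS_PER_RUN = {"noop": 3_000, "triage": 80_000, "fix": 250_000}
DAILY_CAP = 2_000_000  # suggested daily cap from this section


def runs_per_day(cadence_minutes):
    """Number of loop runs in 24 hours at a fixed cadence."""
    return (24 * 60) // cadence_minutes


def daily_tokens(cadence_minutes, triage_runs=0, fix_runs=0):
    """Estimate daily tokens; every run that is not triage or fix is a no-op."""
    total_runs = runs_per_day(cadence_minutes)
    noop_runs = total_runs - triage_runs - fix_runs
    if noop_runs < 0:
        raise ValueError("more triage/fix runs than the cadence allows")
    return (noop_runs * TOKENS_PER_RUN["noop"]
            + triage_runs * TOKENS_PER_RUN["triage"]
            + fix_runs * TOKENS_PER_RUN["fix"])


print(daily_tokens(5))                               # 864000: all no-ops at 5m
print(daily_tokens(10, triage_runs=10, fix_runs=2))  # 1696000: under the cap
print(daily_tokens(15, triage_runs=96))              # 7680000: triage every run
```

With these figures, a 5m loop that exits early on every run stays well under the 2M cap, while a 15m loop that runs a full triage pass every time exceeds it nearly fourfold. The cap covers roughly 25 triage passes or 8 fix attempts per day.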

## Success Metrics

- Average time from "ready for review" to merge (for PRs the loop touched).
- Number of human comments that were purely "LGTM, loop handled the rest".
- Reduction in "can you rebase?" or "CI is red" pings in Slack/Linear.

Start with one team or one repo. Measure for a week. Then expand.