# Advanced Patterns — Guards, MCP, CI/CD, and More

Reference for power users: guard commands, custom verification, MCP integration, CI/CD pipelines, and the principles behind the loop.

---

## Guard Commands

Guards prevent regressions while you optimize a different metric. Set a Guard when your metric is NOT your test suite — otherwise you risk improving one dimension while silently breaking another.

**Use Guard when:** optimizing bundle size, Lighthouse score, query performance, or any metric that doesn't itself catch functional regressions.

### Guard recovery flow

1. Metric improves, but guard command fails
2. Claude reverts the change immediately
3. Reads guard output to understand what broke
4. Reworks the optimization to avoid the regression (max 2 attempts)
5. If 2 attempts fail — discard and move on
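
The recovery flow above can be sketched as a small control loop. This is only an illustration of the documented steps, not the skill's actual implementation: `apply_change`, `rework`, `revert`, and `guard` are hypothetical callables standing in for whatever Claude does at each step.

```python
from typing import Callable, Tuple

def run_guarded_change(
    apply_change: Callable[[], None],
    rework: Callable[[str], None],
    revert: Callable[[], None],
    guard: Callable[[], Tuple[bool, str]],
    max_reworks: int = 2,
) -> str:
    """Return "keep" if the guard eventually passes, otherwise "discard"."""
    apply_change()
    passed, output = guard()
    if passed:
        return "keep"
    for _ in range(max_reworks):
        revert()          # step 2: undo the change that failed the guard
        rework(output)    # steps 3-4: use the guard output to try a safer variant
        passed, output = guard()
        if passed:
            return "keep"
    revert()              # step 5: both rework attempts failed
    return "discard"
```

The function returns a status string so the caller can log it, mirroring the `keep` and `discard` values used in the results log.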

### Examples

```
# Optimize bundle size WITHOUT breaking tests
/autoresearch
Goal: Reduce bundle size below 200KB
Verify: npm run build 2>&1 | grep "gzipped"
Guard: npm test
```

```
# Optimize performance WITHOUT breaking types
/autoresearch
Goal: Reduce p95 response time to under 100ms
Verify: npm run bench:api | grep "p95"
Guard: tsc --noEmit && npm test
```

```
# Optimize Lighthouse WITHOUT breaking e2e tests
/autoresearch
Goal: Lighthouse performance score 95+
Verify: npx lighthouse http://localhost:3000 --output=json --quiet | jq '.categories.performance.score * 100'
Guard: npx playwright test
```

```
# Multiple guard commands — chained with &&
/autoresearch
Goal: Reduce LOC by 30% in services module
Verify: wc -l src/services/**/*.ts | tail -1
Guard: npm test && tsc --noEmit && npx eslint src/
```

```
# Python: coverage + mypy
/autoresearch
Goal: Increase pytest coverage to 90%
Verify: pytest --cov=app 2>&1 | grep "TOTAL" | awk '{print $4}'
Guard: mypy app/ --strict
```

---

## Custom Verification Scripts

For complex metrics that can't be expressed as a one-liner, write a dedicated script.

**Rules for verify scripts:**
- Must output a parseable number (Claude extracts it mechanically)
- Must be deterministic — same input produces same output every time
- Must be fast — under 30 seconds; faster = more experiments per session
- Must exit 0 on success, non-zero on failure
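
Before a long run, it can help to check locally that a verify command really produces one unambiguous number. The helper below is a minimal sketch for that pre-flight check; `extract_metric` is a hypothetical name, and the source does not specify how Claude itself extracts the value.

```python
import re

# Matches integers and decimals, with an optional sign
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")

def extract_metric(output: str) -> float:
    """Return the single number found in verify output, or raise ValueError."""
    matches = NUMBER.findall(output)
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one number, found {len(matches)}: {matches}"
        )
    return float(matches[0])
```

Output such as `coverage: 87.3` passes, while output with several numbers (for example a full benchmark report) raises, which is a hint to tighten the `grep` or `jq` filter.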

### Python example

```python
#!/usr/bin/env python3
# scripts/verify-coverage.py
import subprocess, re, sys

result = subprocess.run(
    ["npm", "test", "--", "--coverage"],
    capture_output=True, text=True
)

match = re.search(r'All files\s*\|\s*([\d.]+)', result.stdout)
if match:
    print(f"coverage: {match.group(1)}")
    sys.exit(0)
else:
    print("coverage: 0")
    sys.exit(1)
```

```
/autoresearch
Verify: python scripts/verify-coverage.py | grep "coverage"
```

### Node.js example

```javascript
// scripts/score-example.js — Template for custom scoring
const fs = require('fs');
const file = process.argv[2];
const content = fs.readFileSync(file, 'utf-8');

// Your scoring logic here
const score = content.split('\n').filter(l => l.startsWith('- ')).length;

// Output must contain exactly one number so it can be parsed mechanically
console.log(`SCORE: ${score}`);
process.exit(score > 0 ? 0 : 1);
```

### Shell script example

```bash
#!/bin/bash
# scripts/lint-count.sh
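# grep -c already prints 0 (and exits non-zero) when nothing matches;
# `|| true` keeps that single 0 instead of appending a second one
count=$(npx eslint src/ 2>&1 | grep -c "error" || true)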
echo "lint_errors: $count"
exit 0
```

### Composite metric example

Combine multiple signals into one weighted score:

```python
#!/usr/bin/env python3
# scripts/composite-score.py
import subprocess, re, sys

# Run tests, get coverage
cov_out = subprocess.run(["npm", "test", "--", "--coverage"],
    capture_output=True, text=True).stdout
cov_match = re.search(r'All files\s*\|\s*([\d.]+)', cov_out)
if not cov_match:
    # No coverage line usually means the test run itself failed
    print("composite: 0")
    sys.exit(1)
cov = float(cov_match.group(1))

# Count lint errors (ESLint prints its report to stdout, not stderr)
lint_out = subprocess.run(["npx", "eslint", "src/"],
    capture_output=True, text=True).stdout
lint_errors = lint_out.count("error")

# Count TypeScript files that use `any` as a whole word
# (-l lists matching files, -w avoids matching words like "many")
ts_out = subprocess.run(["grep", "-rlw", "any", "src/", "--include=*.ts"],
    capture_output=True, text=True).stdout
any_count = len(ts_out.strip().split('\n')) if ts_out.strip() else 0

# Weighted formula: coverage matters most, penalize quality issues
score = (cov * 0.6) - (lint_errors * 0.5) - (any_count * 0.3)
print(f"composite: {score:.2f}")
sys.exit(0)
```
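
To see how the weights trade off, the formula can be evaluated on its own. The sketch below reuses the weights from the script above; the input values are made up purely for illustration.

```python
def composite_score(coverage: float, lint_errors: int, any_count: int) -> float:
    """Same weighted formula as scripts/composite-score.py."""
    return (coverage * 0.6) - (lint_errors * 0.5) - (any_count * 0.3)

# Illustrative inputs: 85% coverage, 4 lint errors, an `any` count of 10
# 85 * 0.6 = 51.0, 4 * 0.5 = 2.0, 10 * 0.3 = 3.0, so 51.0 - 2.0 - 3.0 = 46.0
print(f"composite: {composite_score(85.0, 4, 10):.2f}")  # composite: 46.00
```

With these weights, one lint error costs as much as roughly 0.83 coverage points, so choose weights that reflect how you actually value each signal.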

---

## Using with MCP Servers

Any MCP server configured in Claude Code is available during the loop. This enables real-time data-driven iteration — Claude can query live databases, analytics platforms, or external APIs as part of each verify step.

### Database-aware optimization

Use a PostgreSQL MCP server to iterate on real query performance:

```
/autoresearch
Goal: Reduce average query time for dashboard queries
Scope: src/queries/**/*.sql, src/api/dashboard/**/*.ts
Metric: avg query time in ms (lower is better)
Verify: psql -c "SELECT avg(duration_ms) FROM query_log WHERE created_at > now() - interval '5 min'" | grep -oP '\d+\.\d+'
Guard: npm test
```

Or via MCP directly:

```
/autoresearch
Goal: Optimize slow dashboard queries: reduce total execution time
Scope: queries/dashboard/*.sql
Metric: total execution time in ms across all dashboard queries (lower is better)
Verify: Use MCP postgres tool to run EXPLAIN ANALYZE on each query, sum the reported execution times
```

### Analytics-driven content optimization

```
/autoresearch
Goal: Improve blog post structure based on engagement metrics
Scope: content/blog/*.md
Metric: avg time on page for modified posts (higher is better)
Verify: Use MCP analytics tool to fetch page metrics, compare against baseline
```

### API endpoint verification

```
/autoresearch
Goal: All API endpoints return valid JSON with correct status codes in <200ms
Scope: src/api/**/*.ts
Metric: endpoints passing all checks (higher is better)
Verify: Use MCP HTTP tool to hit each endpoint, validate response schema + timing
```

### Recommended MCP servers

| MCP Server | Use Case | Metric Source |
|---|---|---|
| **PostgreSQL** | Query optimization, data validation | Query execution time, row counts |
| **GitHub** | Issue triage, PR quality, CI status | Issue counts, check pass rates |
| **Filesystem** | File organization, cleanup | File counts, directory depth |
| **Puppeteer/Playwright** | Visual regression, performance | Lighthouse scores, screenshot diffs |
| **Slack** | Notification quality, alert tuning | Message delivery, response times |
| **Stripe** | Payment flow optimization | Checkout completion rates |
| **Sentry** | Error reduction | Error count, crash-free rate |
| **Cloudflare** | Edge performance | Cache hit rate, TTFB |

---

## Combining with APIs

Beyond MCP, Claude can call APIs directly through scripts used in the verify step.

### GitHub API integration (using gh CLI)

```bash
#!/bin/bash
# scripts/gh-check-rate.sh: percentage of successful checks on the last 20 merged PRs
gh pr list --state merged --limit 20 --json statusCheckRollup \
  | jq '[.[] | .statusCheckRollup[]]
        | if length == 0 then 0 else ([.[] | select(.conclusion=="SUCCESS")] | length) * 100 / length end'
```

```
/autoresearch
Goal: Improve CI pass rate for new PRs
Verify: bash scripts/gh-check-rate.sh
Guard: npm test
```

### REST API verification

```javascript
// scripts/lighthouse-score.js
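const { exec } = require('child_process');
// Lighthouse JSON reports can exceed exec's default 1 MB stdout buffer
const opts = { maxBuffer: 64 * 1024 * 1024 };
exec('npx lighthouse http://localhost:3000 --output json --quiet', opts, (err, stdout) => {
  if (err) {
    console.error(`lighthouse failed: ${err.message}`);
    process.exit(1);
  }
  const report = JSON.parse(stdout);
  const perf = report.categories.performance.score * 100;
  console.log(`SCORE: ${perf}`);
  process.exit(perf > 0 ? 0 : 1);
});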
```

```
/autoresearch
Goal: Lighthouse performance score above 95
Scope: src/components/**/*.tsx, src/app/**/*.tsx
Metric: Lighthouse performance score (higher is better)
Verify: node scripts/lighthouse-score.js
```

### GraphQL query optimization

```javascript
// scripts/gql-response-time.js
const { execSync } = require('child_process');

const start = Date.now();
const result = execSync(`curl -s -X POST http://localhost:4000/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ products(first:20) { id name price } }"}'`);
const duration = Date.now() - start;

const data = JSON.parse(result.toString());
const hasErrors = data.errors && data.errors.length > 0;

console.log(`response_ms: ${duration}`);
process.exit(hasErrors ? 1 : 0);
```

---

## Claude Code Patterns

### Using with other Claude Code skills

Autoresearch runs inside your Claude Code session — it has access to all skills, MCP tools, and file context. Common combinations:

- Run `/autoresearch:plan` first to validate your verify command before committing to a long loop
- Chain commands: `debug → fix → ship`, `plan → loop → security → ship`
- Use `/autoresearch:security --diff` after implementing new auth flows

### Pattern catalog

| Pattern | When to use | Config hint |
|---------|-------------|-------------|
| Run Overnight | Large goal, no time pressure | No `Iterations:` limit |
| Controlled Sprint | 30-min focused session | `Iterations: 10-15` |
| Compound Improvements | Many small wins compound | Start easiest first |
| Explore and Exploit | Stuck, need bold experiments | Prompt Claude to try radical approaches |
| Refactor Without Breaking | Safe LOC reduction | Pair metric with Guard |
| Progressive Hardening | Multi-phase quality gates | Chain goals in prompt |

### Workspace configuration

Autoresearch reads your Claude Code settings automatically. No additional configuration file is required.

**Global installation** (available in every project):

```bash
# Install once via Claude Code skill system
# See INSTALLATION.md in the autoresearch skill directory
```

**Project-level** — add `/autoresearch` to `.claude/` in your repo so the team shares consistent commands without individual installation.

---

## CI/CD Integration

### GitHub Actions: security gate

Blocks PRs if critical findings exist; runs a deeper weekly scheduled audit:

```yaml
# .github/workflows/autoresearch-security.yml
name: Security Gate
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2am
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Security audit
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            claude -p "/autoresearch:security --diff --fail-on critical --iterations 5"
          else
            claude -p "/autoresearch:security --fail-on high --iterations 15"
          fi
      - name: Upload Report
        if: always()  # upload the report even when the audit step fails the job
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: security/
```

### GitHub Actions: auto-fix workflow

Manual trigger with configurable iteration count:

```yaml
# .github/workflows/autoresearch-fix.yml
name: Auto-Fix
on:
  workflow_dispatch:
    inputs:
      iterations:
        description: 'Number of fix iterations'
        default: '20'
jobs:
  fix:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Auto-fix
        run: |
          # -p runs non-interactively, which CI requires
          claude -p "/autoresearch:fix Iterations: ${{ inputs.iterations }}"
```

### GitHub Actions: scheduled overnight run

```yaml
# .github/workflows/autoresearch-nightly.yml
name: Nightly Optimization
on:
  schedule:
    - cron: '0 1 * * *'  # 1am daily
jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run overnight loop
        run: |
          claude -p "/autoresearch
          Goal: Reduce lint errors and improve coverage
          Iterations: 50
          Guard: npm test"
      - name: Push improvements
        run: |
          git config user.name "autoresearch-bot"
          git config user.email "bot@example.com"
          git push origin main
```

### GitLab CI example

```yaml
# .gitlab-ci.yml
autoresearch-security:
  stage: test
  only:
    - merge_requests
  script:
    - claude -p "/autoresearch:security --diff --fail-on critical --iterations 5"
  artifacts:
    paths:
      - security/
```

### Pre-commit hook example

```bash
#!/bin/bash
# .git/hooks/pre-commit
# Quick security scan on staged files only
staged=$(git diff --cached --name-only | tr '\n' ' ')
if [ -n "$staged" ]; then
  claude -p "/autoresearch:security --diff --fail-on critical --iterations 3 --scope $staged"
fi
```

### GitHub Actions: nightly run with pull request

Run autoresearch on a schedule and open a pull request with any improvements:

```yaml
# .github/workflows/autoresearch.yml
name: Nightly Autoresearch
on:
  schedule:
    - cron: '0 2 * * *'  # 2am UTC daily
  workflow_dispatch:
    inputs:
      iterations:
        description: 'Number of iterations'
        default: '10'

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci

      - name: Run autoresearch
        run: |
          claude --print "/autoresearch
          Goal: Improve test coverage
          Scope: src/**/*.ts
          Verify: npx jest --coverage 2>&1 | grep 'All files' | awk '{print \$4}'
          Iterations: ${{ github.event.inputs.iterations || '10' }}"

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: autoresearch-results
          path: autoresearch-results.tsv

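      - name: Create PR with improvements
        if: success()
        env:
          GH_TOKEN: ${{ github.token }}  # gh needs a token; the job also needs write permissions
        run: |
          git diff --quiet || (
            branch="autoresearch/nightly-$(date +%Y%m%d)"
            git config user.name "autoresearch-bot"
            git config user.email "bot@example.com"
            git checkout -b "$branch"
            git add -A && git commit -m "feat: autoresearch nightly improvements"
            git push -u origin "$branch"
            gh pr create --title "Autoresearch nightly improvements" --body "Automated optimization run"
          )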
```

**GitLab CI equivalent:**
```yaml
autoresearch:
  stage: optimize
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - claude --print "/autoresearch Iterations: 10 Goal: Improve coverage Verify: pytest --cov"
  artifacts:
    paths:
      - autoresearch-results.tsv
```

---

## Results Tracking Deep Dive

### Verification command templates by language

| Language | Verify Command | Metric | Direction |
|----------|---------------|--------|-----------|
| **Node.js** | `npx jest --coverage 2>&1 \| grep 'All files' \| awk '{print $4}'` | Coverage % | higher |
| **Python** | `pytest --cov=src --cov-report=term 2>&1 \| grep TOTAL \| awk '{print $4}'` | Coverage % | higher |
| **Rust** | `cargo test 2>&1 \| grep -oP '\d+(?= passed)' \| awk '{s+=$1} END {print s+0}'` | Tests passed (summed across test binaries) | higher |
| **Go** | `go test -count=1 ./... 2>&1 \| grep -c '^ok'` | Packages OK | higher |
| **Java** | `mvn test 2>&1 \| grep 'Tests run:' \| tail -1 \| grep -oP 'Failures: \d+' \| grep -oP '\d+'` | Failures | lower |
| **Ruby** | `bundle exec rspec 2>&1 \| grep -oP '\d+(?= examples?,)'` | Examples | higher |
| **Bundle size** | `npx esbuild src/index.ts --bundle --minify \| wc -c` | Bytes | lower |
| **Lighthouse** | `npx lighthouse http://localhost:3000 --output=json \| jq '.categories.performance.score * 100'` | Score | higher |
| **API latency** | `wrk -t2 -c10 -d10s http://localhost:3000/api 2>&1 \| grep 'Latency' \| awk '{print $2}'` | Avg latency (wrk appends a unit such as `us` or `ms`) | lower |

Each command outputs a single number. Claude parses this to make keep/discard decisions.
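
One caveat for latency metrics: wrk prints values with a unit suffix that changes with magnitude (for example `635.91us` or `1.20ms`), so raw values are not directly comparable across runs. A minimal sketch of a normalizer follows; `latency_to_ms` is a hypothetical helper, and it handles only the `us`, `ms`, and `s` suffixes.

```python
import re

# Conversion factors from wrk-style unit suffixes to milliseconds
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}

def latency_to_ms(value: str) -> float:
    """Convert a string like '635.91us' or '1.20ms' to milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)(us|ms|s)\s*", value)
    if not match:
        raise ValueError(f"unrecognized latency value: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MS[unit]
```

A verify script could pipe the wrk value through this function and print the result, so the loop always compares milliseconds with milliseconds.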

### TSV format specification

Every iteration is appended to the results log in tab-separated format:

```tsv
iteration  commit   metric  delta   guard  status    description
0          a1b2c3d  85.2    0.0     -      baseline  initial state
1          b2c3d4e  87.1    +1.9    pass   keep      add auth edge case tests
2          -        86.5    -0.6    -      discard   refactor helpers (broke 2 tests)
3          c3d4e5f  88.3    +1.2    pass   keep      add error handling tests
```

| Column | Values | Meaning |
|--------|--------|---------|
| `iteration` | integer | Loop counter starting at 0 (baseline) |
| `commit` | short SHA or `-` | Git commit if kept; `-` if discarded |
| `metric` | float | Raw metric value from verify command |
| `delta` | signed float | Change from previous kept iteration |
| `guard` | `pass`, `fail`, `-` | Guard result; `-` if no guard is set or the guard was not run for this iteration |
| `status` | `baseline`, `keep`, `discard`, `rework` | Outcome of this iteration |
| `description` | string | One-line summary of what changed |
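
Because the log is plain TSV, it is easy to post-process outside the loop. The following is a minimal sketch that assumes the column names above, real tab separators, and a baseline in the first data row; `summarize_log` is a hypothetical helper, not part of autoresearch.

```python
import csv
import io
from typing import Any, Dict

def summarize_log(tsv_text: str) -> Dict[str, Any]:
    """Summarize a results log given as tab-separated text with a header row."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        raise ValueError("log has no data rows")
    baseline = float(rows[0]["metric"])  # assumes row 0 is the baseline
    kept = [r for r in rows if r["status"] == "keep"]
    # The current value is the metric of the most recent kept iteration
    current = float(kept[-1]["metric"]) if kept else baseline
    return {
        "baseline": baseline,
        "current": current,
        "keeps": len(kept),
        "discards": sum(1 for r in rows if r["status"] == "discard"),
        "kept_commits": [r["commit"] for r in kept],
    }
```

The `kept_commits` list pairs naturally with `git log` when you want to inspect or undo a specific iteration.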

### How to read results logs

Claude reads the full log at the start of each iteration. It uses the history to avoid repeating failed approaches and to combine successful patterns.

### Progress summaries

Every 10 iterations, Claude prints an inline summary:

```
=== Progress (iteration 20) ===
Baseline: 72.0% → Current: 84.1% (+12.1%)
Keeps: 8 | Discards: 11 | Crashes: 1
Best so far: +3.2% (iteration 14 — add payment edge case tests)
```

### Bounded loop final summary

```
=== Autoresearch Complete (25/25 iterations) ===
Baseline: 72.0% → Final: 89.3% (+17.3%)
Keeps: 12 | Discards: 11 | Crashes: 2
Best iteration: #18 — add tests for payment processing edge cases
```

---

## Performance Tips

### Fast verify commands = more experiments

Verify time is paid on every iteration, so each second saved is multiplied across the whole run. Optimize the verify command before launching a long run.

| Slow | Fast |
|------|------|
| Full E2E suite (minutes) | Unit tests only (seconds) |
| Build entire project | Build only changed module |
| Run all coverage | `--testPathPattern` target dir |
| Lighthouse real browser | Lighthouse `--preset desktop` |

### Scope narrowing for focused runs

Narrow scope = faster context load = bolder experiments. Claude reads all in-scope files each iteration — keep scope tight.

```
# Too broad
Scope: src/**/*.ts

# Better for targeted work
Scope: src/api/payments/**/*.ts
```

### When to use bounded vs unbounded

| Scenario | Mode | Iterations |
|----------|------|-----------|
| Overnight deep optimization | Unbounded | unlimited |
| CI/CD gating | Bounded | 5-10 |
| Targeted bug fix | Bounded | 5 |
| Moderate improvement sprint | Bounded | 15-25 |
| Exploratory | Bounded | 15 |
| Deep ML training optimization | Bounded | 50+ |

### Optimal iteration counts by task type

- **Syntax/type errors** — 5-10; usually resolved quickly
- **Test coverage** — 15-25; incremental additions compound
- **Performance/Lighthouse** — 20-50; requires exploration
- **Refactoring** — 15-30; conservative changes add up
- **Content quality** — 10-20; diminishing returns after goal

---

## Core Principles

Seven principles extracted from [Karpathy's autoresearch](https://github.com/karpathy/autoresearch), generalized to any domain.

### 1. Constraint = Enabler

Autonomy succeeds through intentional constraint, not despite it. A narrow scope that fits agent context, a fixed iteration budget, and a single mechanical success criterion are what make unsupervised improvement possible.

| Autoresearch | Generalized |
|---|---|
| 630-line training codebase | Bounded scope that fits agent context |
| 5-minute time budget per iteration | Fixed iteration cost |
| One metric (val_bpb) | Single mechanical success criterion |

### 2. Strategy != Tactics

Humans set direction (what to improve). Agents execute iterations (how to improve it). Mixing the two — asking Claude to also choose the goal — produces unfocused loops.

| Strategic (Human) | Tactical (Agent) |
|---|---|
| "Improve page load speed" | "Lazy-load images, code-split routes" |
| "Increase test coverage" | "Add tests for uncovered edge cases" |
| "Reduce API errors" | "Add retry logic, improve validation" |

### 3. Mechanical Metrics Only

The loop breaks if the metric requires judgment. Claude cannot act on "looks cleaner" — it needs a number.

Good: `npm test -- --coverage | grep "All files"` prints a line containing a number such as `87.3`.

Bad: "Looks better", "probably improved" — kills the feedback loop.

### 4. Fast Verification

Verification speed caps experiment throughput. A 30-second verify command allows at most 2 experiments per minute; a 5-second command allows up to 12, before counting the time Claude spends making each change.

| Fast (enables iteration) | Slow (kills iteration) |
|---|---|
| Unit tests (seconds) | Full E2E suite (minutes) |
| Type check (seconds) | Manual QA (hours) |
| Lint check (instant) | Code review (async) |
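
The throughput arithmetic is simple enough to estimate before a run. This sketch computes an upper bound only, since it ignores the time spent editing between verifications; `max_experiments` is a hypothetical helper name.

```python
def max_experiments(verify_seconds: float, session_minutes: float = 1.0) -> int:
    """Upper bound on experiments in a session, counting verify time only."""
    if verify_seconds <= 0:
        raise ValueError("verify_seconds must be positive")
    return int((session_minutes * 60) // verify_seconds)

print(max_experiments(30))       # 2 per minute
print(max_experiments(5))        # 12 per minute
print(max_experiments(5, 480))   # 5760 over an 8-hour overnight run
```

Real runs will land well below these numbers, but the ratio between a slow and a fast verify command carries over.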

### 5. Iteration Cost Shapes Behavior

Cheap iteration encourages bold exploration and many experiments. Expensive iteration forces Claude to be conservative. Optimize verify speed to unlock better results.

### 6. Git as Memory

Every change is committed before verification, and failures are reverted. The git log becomes Claude's research journal, enabling causality tracking, stacking wins, and pattern learning across iterations.

### 7. Honest Limitations

If the agent hits a wall — missing permissions, an external dependency, or a decision requiring human judgment — it says so clearly instead of guessing or silently degrading. Trust the discard mechanism.

---

## Working with Noisy Metrics

Some metrics fluctuate between runs (benchmark times, Lighthouse scores, ML loss). Configure noise handling:

```
/autoresearch
Goal: Reduce API response time below 100ms
Verify: wrk -t2 -c10 -d10s http://localhost:3000/api 2>&1 | grep 'Latency' | awk '{print $2}'
Noise: high          # run verify 3 times, use median
Min-Delta: 5.0       # only keep if improvement > 5ms
Noise-Runs: 5        # use 5 runs instead of default 3
```

| Metric Type | Noise Level | Recommended Config |
|-------------|-------------|-------------------|
| Test coverage | None | No config needed |
| Bundle size | None | No config needed |
| Benchmark time | Medium | `Noise: high` |
| Lighthouse score | Medium | `Noise: high` + `Noise-Runs: 5` |
| ML training loss | High | `Noise: high` + `Min-Delta: 0.01` + environment pinning |
| API latency | High | `Noise: high` + `Min-Delta: 5.0` + warm-up |

For detailed noise handling strategies (multi-run, confirmation runs, environment pinning), see the [autonomous loop protocol](../skills/autoresearch/references/autonomous-loop-protocol.md#phase-51-noise-handling).
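
The idea behind `Noise-Runs` and `Min-Delta` can be sketched in a few lines: take the median of several verify runs, then keep a change only if it beats the baseline by more than the threshold. This is an illustration of the concept under those assumptions, not the protocol's actual code; `measure` and `should_keep` are hypothetical names.

```python
from statistics import median
from typing import Callable

def measure(run_verify: Callable[[], float], runs: int = 3) -> float:
    """Run the verify callable several times and return the median value."""
    return median(run_verify() for _ in range(runs))

def should_keep(
    baseline: float,
    candidate: float,
    min_delta: float,
    lower_is_better: bool = True,
) -> bool:
    """Keep only if the improvement is strictly greater than min_delta."""
    improvement = baseline - candidate if lower_is_better else candidate - baseline
    return improvement > min_delta
```

The median is used rather than the mean so that a single outlier run (a cold cache, a noisy neighbor) does not decide the outcome.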

---

## FAQ

**Q: I don't know what metric to use.**
A: Run `/autoresearch:plan` — it analyzes your codebase, suggests metrics, and dry-runs the verify command before you launch.

**Q: Does this work with any project?**
A: Yes. Any language, framework, or domain. If you can measure it with a command, autoresearch can optimize it.

**Q: How do I stop the loop?**
A: `Ctrl+C` or add `Iterations: N` for bounded runs. Claude commits before verifying, so your last successful state is always in git.

**Q: Can I use this for non-code tasks?**
A: Absolutely. Sales emails, marketing copy, HR policies, research papers — anything with a measurable metric works.

**Q: Does /autoresearch:security modify my code?**
A: No. It's read-only by default. Use `--fix` to opt into auto-remediation of confirmed Critical/High findings.

**Q: Can I chain commands?**
A: Yes. Run `debug → fix → ship`, or `plan → loop → ship`, or `scenario → debug → fix`. Each command's output feeds the next.

**Q: What's the difference between Metric and Guard?**
A: Metric = "did we improve?" (the goal). Guard = "did we break anything?" (safety net). If metric improves but guard fails, Claude reworks the change.

**Q: Can I use MCP servers during the loop?**
A: Yes. Any MCP server configured in Claude Code is available — databases, analytics, APIs, etc.

**Q: How many iterations should I run?**
A: 5-10 for targeted fixes. 15-25 for moderate improvements. 50+ for deep optimization. Unlimited for overnight runs.

**Q: Does it work in CI/CD?**
A: Yes. Use `--fail-on` (security command) or bounded iterations with `Iterations: N`. See [CI/CD Integration](#cicd-integration).

**Q: What if Claude makes things worse?**
A: Every change is committed before verification. If worse, it's instantly `git revert`ed. Your codebase is always in a known-good state.

**Q: Can I run it overnight?**
A: Yes — that's the intended use for unbounded mode. Run `/autoresearch` without `Iterations:`, walk away, review results in the morning.

**Q: What languages are supported?**
A: All of them. The loop is language-agnostic. The verify command adapts to your tooling (npm, pytest, cargo, go test, etc.).

**Q: How do I roll back if I don't like the results?**
A: Every kept iteration is a git commit. Use `git log` to review, `git revert` or `git reset` to undo specific iterations. The TSV log tells you which commit corresponds to which result.

**Q: Can I predict token/cost usage before a long run?**
A: Use `/autoresearch:predict --budget 1.00` to get an estimate of findings and cost before committing to a full loop.

**Q: What is adversarial mode?**
A: `/autoresearch:security` uses red-team personas (security adversary, supply chain attacker, insider threat) to stress-test your codebase from an attacker's perspective rather than a reviewer's.

**Q: What's a good budget for a first run?**
A: Start with `Iterations: 10` to calibrate. Review the TSV log. If results look promising, remove the limit or increase to 25-50.

**Q: Can I use `learn` with monorepos?**
A: Yes. The scouting phase detects workspace configs (`package.json` workspaces, `lerna.json`, `pnpm-workspace.yaml`, `Cargo.toml [workspace]`) and notes the structure in the generation context. Use `--scope packages/api/**` to focus on a specific package.

**Q: What's the validation-fix loop?**
A: After generating docs, Claude runs `validate-docs.cjs` (if it exists) to check code references, internal links, and config keys. If any check fails, it re-spawns the docs-manager with the specific issues to fix, up to 3 retries. This prevents the hallucinated code references that plague one-shot doc generators.

---

## Crash Recovery

| Failure Type | Automatic Response |
|---|---|
| Syntax error | Fix immediately — does not count as an iteration |
| Runtime error | Attempt fix (max 3 tries), then move on |
| Resource exhaustion | Revert, try a smaller variant of the same idea |
| Infinite loop / hang | Kill after timeout, revert, log as discard |
| External dependency failure | Skip, log reason, try a different approach |

---

## Troubleshooting

### Common issues

| Problem | Cause | Fix |
|---------|-------|-----|
| "Verify command failed" | Command doesn't work on current codebase | Run it manually first; use `/autoresearch:plan` to validate |
| Metric doesn't improve | Scope too narrow or already near optimal | Expand scope, lower the goal, or try a different approach |
| Guard keeps failing | Changes break existing behavior | Narrow scope further, or add the broken tests to scope |
| Loop seems slow | Verify command takes too long | Optimize verify: use `--quick-eval`, reduce test scope, split suite |
| Wrong metric extracted | Verify output format changed | Check output manually, adjust grep/jq pattern |
| Crashes on first iteration | Missing dependency or bad path | Run verify command in terminal first to confirm it works |

### When stuck — 5+ consecutive discards

If Claude accumulates 5 or more consecutive discards, it automatically escalates strategy:

1. Re-reads ALL in-scope files from scratch
2. Re-reads the original goal statement
3. Reviews the entire results log for patterns across all iterations
4. Tries combining 2-3 successful changes from earlier iterations
5. Tries the OPPOSITE approach of what hasn't worked
6. Tries a radical architectural change (different algorithm, data structure, or abstraction)
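
The trigger for this escalation is a simple streak count over the `status` column of the results log. A minimal sketch, assuming the statuses are available as a list in iteration order; both function names are hypothetical.

```python
from typing import List

def trailing_discards(statuses: List[str]) -> int:
    """Count how many of the most recent iterations were discards in a row."""
    count = 0
    for status in reversed(statuses):
        if status != "discard":
            break
        count += 1
    return count

def needs_escalation(statuses: List[str], threshold: int = 5) -> bool:
    """True once the current discard streak reaches the threshold."""
    return trailing_discards(statuses) >= threshold
```

Only the current streak counts: a single `keep` resets it, so earlier discards in the log do not trigger escalation.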

If still stuck after escalation, expand scope or adjust the goal — you may have reached a local optimum that requires structural changes outside the current scope boundary.