# Examples by Domain — Real-World Configurations

Copy-paste configurations organized by domain. Every example includes the command, config, and what Claude does. All verification commands are real — paste them into your project and adjust paths as needed.

[Software Engineering](#software-engineering-typescriptjavascript) · [Python & Django](#python--django) · [Go](#go) · [Rust](#rust) · [Sales & Lead Generation](#sales--lead-generation) · [SEO & Content Marketing](#seo--content-marketing) · [Marketing & Growth](#marketing--growth) · [Web Scraping & Data Collection](#web-scraping--data-collection) · [Research & Analysis](#research--analysis) · [DevOps & Infrastructure](#devops--infrastructure) · [Data Science & ML](#data-science--ml) · [Design & Accessibility](#design--accessibility) · [HR & People Operations](#hr--people-operations) · [Operations](#operations) · [Documentation & Knowledge Management](#documentation--knowledge-management) · [MCP Servers](#combining-with-mcp-servers) · [CI/CD Integration](#cicd-integration) · [Verification Scripts](#custom-verification-scripts)

---

## Software Engineering (TypeScript/JavaScript)

### Increase test coverage

```
/autoresearch
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.test.ts, src/**/*.ts
Metric: coverage % (higher is better)
Verify: npm test -- --coverage | grep "All files"
```

Bounded variant — run exactly 20 iterations then stop:

```
/autoresearch
Iterations: 20
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.test.ts, src/**/*.ts
Metric: coverage % (higher is better)
Verify: npm test -- --coverage | grep "All files"
```

Claude adds tests one-by-one. Each iteration: write test → run coverage → keep if % increased → discard if not → repeat.

### Reduce bundle size

```
/autoresearch
Iterations: 15
Goal: Reduce production bundle size
Scope: src/**/*.tsx, src/**/*.ts
Metric: bundle size in KB (lower is better)
Verify: npm run build 2>&1 | grep "First Load JS"
```

Claude tries: tree-shaking unused imports, lazy-loading routes, replacing heavy libraries, code-splitting — one change at a time. 15 iterations is usually enough to find the big wins.

### Fix flaky tests

```
/autoresearch
Iterations: 10
Goal: Zero flaky tests (all tests pass 5 consecutive runs)
Scope: src/**/*.test.ts
Metric: failure count across 5 runs (lower is better)
Verify: for i in {1..5}; do npm test 2>&1; done | grep -c "FAIL"
```

### API performance optimization

```
/autoresearch
Goal: API response time under 100ms (p95)
Scope: src/api/**/*.ts, src/services/**/*.ts
Metric: p95 response time in ms (lower is better)
Verify: npm run bench:api | grep "p95"
Guard: npm test
```

Quick 30-minute sprint variant:

```
/autoresearch
Iterations: 10
Goal: API response time under 100ms (p95)
Scope: src/api/**/*.ts, src/services/**/*.ts
Metric: p95 response time in ms (lower is better)
Verify: npm run bench:api | grep "p95"
```

### Eliminate TypeScript `any` types

```
/autoresearch
Iterations: 25
Goal: Eliminate all TypeScript `any` types
Scope: src/**/*.ts
Metric: count of `any` occurrences (lower is better)
Verify: grep -r ":\s*any" src/ --include="*.ts" | wc -l
```

### Reduce lines of code (refactoring)

```
/autoresearch
Iterations: 20
Goal: Reduce lines of code in src/services/ by 30% while keeping all tests green
Metric: LOC count (lower is better)
Verify: npm test && find src/services -name "*.ts" | xargs wc -l | tail -1
```

### Lighthouse performance score

```
/autoresearch
Goal: Lighthouse performance score 95+
Scope: src/components/**/*.tsx, src/app/**/*.tsx
Metric: Lighthouse performance score (higher is better)
Verify: npx lighthouse http://localhost:3000 --output=json --quiet | jq '.categories.performance.score * 100'
Guard: npx playwright test
```

---

## Python & Django

### Increase pytest coverage

```
/autoresearch
Iterations: 30
Goal: Increase pytest coverage from 68% to 90%
Scope: tests/**/*.py, app/**/*.py
Metric: coverage % (higher is better)
Verify: pytest --cov=app --cov-report=term-missing 2>&1 | grep "TOTAL" | awk '{print $4}'
```

### Reduce Django N+1 queries

```
/autoresearch
Iterations: 15
Goal: Eliminate N+1 queries — reduce total DB queries per request
Scope: app/views/**/*.py, app/models/**/*.py
Metric: total query count per request (lower is better)
Verify: python manage.py test --settings=settings.test 2>&1 | grep "queries" | awk '{print $1}'
Guard: pytest
```

### Fix mypy strict errors

```
/autoresearch:fix --target "mypy app/ --strict"
Guard: pytest
Iterations: 25
```

### FastAPI response time

```
/autoresearch
Iterations: 20
Goal: Reduce p95 response time to under 50ms
Scope: app/routers/**/*.py, app/services/**/*.py
Metric: p95 response time in ms (lower is better)
Verify: python scripts/bench_api.py | grep "p95"
Guard: pytest
```

### Flask security audit

```
/autoresearch:security
Scope: app/**/*.py, config/**/*.py
Focus: SQL injection, CSRF, session management, secret handling
Iterations: 15
```

---

## Go

### Increase test coverage

```
/autoresearch
Iterations: 25
Goal: Increase test coverage to 85%
Scope: **/*.go
Metric: coverage % (higher is better)
Verify: go test ./... -coverprofile=cover.out && go tool cover -func=cover.out | grep "total:" | awk '{print $3}'
```

### Reduce binary size

```
/autoresearch
Iterations: 10
Goal: Reduce compiled binary size
Scope: cmd/**/*.go, internal/**/*.go
Metric: binary size in MB (lower is better)
Verify: go build -o /tmp/bench ./cmd/server && ls -la /tmp/bench | awk '{print $5/1048576}'
Guard: go test ./...
```

### Fix go vet + staticcheck errors

```
/autoresearch:fix --target "go vet ./... && staticcheck ./..."
Guard: go test ./...
Iterations: 15
```

### Benchmark optimization

```
/autoresearch
Iterations: 20
Goal: Improve hot-path benchmark by 2x
Scope: internal/parser/**/*.go
Metric: ns/op from benchmark (lower is better)
Verify: go test -bench=BenchmarkParse -benchmem ./internal/parser/ | grep "BenchmarkParse" | awk '{print $3}'
Guard: go test ./...
```

---

## Rust

### Increase test coverage

```
/autoresearch
Iterations: 20
Goal: Increase test coverage to 80%
Scope: src/**/*.rs
Metric: coverage % (higher is better)
Verify: cargo tarpaulin --out Stdout 2>&1 | grep "coverage" | awk '{print $2}'
```

### Reduce compile time

```
/autoresearch
Iterations: 15
Goal: Reduce incremental compile time
Scope: src/**/*.rs, Cargo.toml
Metric: compile time in seconds (lower is better)
Verify: cargo build --timings 2>&1 | grep "Finished" | awk '{print $2}'
Guard: cargo test
```

### Fix clippy warnings

```
/autoresearch:fix --target "cargo clippy -- -D warnings"
Guard: cargo test
Iterations: 20
```

### Criterion benchmark optimization

```
/autoresearch
Iterations: 25
Goal: Reduce p95 request handling time
Scope: src/handlers/**/*.rs
Metric: ns/iter from criterion (lower is better)
Verify: cargo bench -- --output-format bencher 2>&1 | grep "bench:" | awk '{print $5}'
Guard: cargo test
```

---

## Sales & Lead Generation

### Cold email optimization

```
/autoresearch
Iterations: 15
Goal: Improve cold email reply rate prediction score
Scope: content/email-templates/*.md
Metric: readability score + personalization token count (higher is better)
Verify: node scripts/score-email-template.js
```

Claude iterates on subject lines, opening hooks, CTAs, personalization variables — keeping changes that score higher.

### Sales deck refinement

```
/autoresearch
Iterations: 10
Goal: Reduce slide count while maintaining all key points
Scope: content/sales-deck/*.md
Metric: slide count (lower is better), constraint: key-points-checklist.md must all be present
Verify: node scripts/check-deck-coverage.js && wc -l content/sales-deck/*.md
```

### Objection handling docs

```
/autoresearch
Iterations: 20
Goal: Cover all 20 common objections with responses under 50 words each
Scope: content/objection-responses.md
Metric: objections covered + avg word count per response (more covered + fewer words = better)
Verify: node scripts/score-objections.js
```

### Lead magnet optimization

```
/autoresearch
Iterations: 20
Goal: Improve lead magnet download page conversion score
Scope: content/lead-magnets/**/*.md, content/landing-pages/lead-magnet.md
Metric: conversion checklist score (higher is better)
Verify: node scripts/lead-magnet-score.js
```

Claude iterates on headline, value proposition, form fields, social proof, urgency elements — one change per iteration.

### LinkedIn outreach sequences

```
/autoresearch
Iterations: 25
Goal: Improve LinkedIn outreach sequence — personalization, hook quality, CTA clarity
Scope: content/outreach/linkedin-sequence/*.md
Metric: sequence quality score (higher is better)
Verify: node scripts/outreach-scorer.js --platform linkedin
```

### Lead scoring model refinement

```
/autoresearch
Iterations: 15
Goal: Improve lead scoring accuracy — reduce false positive rate
Scope: scripts/lead-scoring/*.py
Metric: false positive rate (lower is better)
Verify: python scripts/evaluate-lead-scoring.py | grep "false_positive_rate"
Guard: python -m pytest tests/scoring/
```

### Ship a sales proposal

```
/autoresearch:ship --type sales
Target: proposals/enterprise-q1.md
```

Checklist: prospect name correct, pricing current, CTA clear, case studies current, branding consistent.

### Generate sales scenarios

```
/autoresearch:scenario --domain business --depth deep
Scenario: Enterprise customer evaluates our SaaS during procurement with 5 stakeholders
Iterations: 30
```

---

## SEO & Content Marketing

### Blog SEO score optimization

```
/autoresearch
Goal: Maximize SEO score for target keywords
Scope: content/blog/*.md
Metric: SEO score from audit tool (higher is better)
Verify: node scripts/seo-score.js --file content/blog/target-post.md
```

Claude tweaks headings, keyword density, meta descriptions, internal links — one change per iteration. Run unlimited overnight, or bounded:

```
/autoresearch
Iterations: 25
Goal: Maximize SEO score for target keywords
Scope: content/blog/*.md
Metric: SEO score from audit tool (higher is better)
Verify: node scripts/seo-score.js --file content/blog/target-post.md
```

### Content depth score

```
/autoresearch
Iterations: 15
Goal: Maximize Flesch readability + keyword density for "AI automation"
Scope: content/landing-pages/ai-automation.md
Metric: readability_score * 0.7 + keyword_density_score * 0.3 (higher is better)
Verify: node scripts/content-score.js content/landing-pages/ai-automation.md
```

### Meta descriptions batch

```
/autoresearch
Iterations: 20
Goal: Ensure all blog posts have meta descriptions under 160 chars with target keyword
Scope: content/blog/*.md
Metric: posts meeting criteria (higher is better)
Verify: node scripts/meta-description-audit.js
```

### Ship blog content

```
/autoresearch:ship
Target: content/blog/my-new-post.md
Type: content
```

### Content scenarios

```
/autoresearch:scenario --domain product --format use-cases --depth deep
Scenario: Researcher evaluates autonomous iteration techniques across ML, DevOps, and content
Iterations: 30
```

---

## Marketing & Growth

### Email campaign click rate

```
/autoresearch
Iterations: 20
Goal: Optimize 7-day nurture sequence for clarity and CTA strength
Scope: content/email-sequences/onboarding/*.md
Metric: avg readability + CTA score per email (higher is better)
Verify: node scripts/score-email-sequence.js onboarding
```

### Landing page conversion optimization

```
/autoresearch
Iterations: 15
Goal: Maximize landing page quality score — clear CTA, social proof, urgency, mobile-friendly
Scope: content/landing-pages/product-launch.md
Metric: CRO checklist score (higher is better)
Verify: node scripts/cro-score.js content/landing-pages/product-launch.md
```

### Ad copy variants

```
/autoresearch
Iterations: 25
Goal: Generate and refine 20 ad copy variants, each under 90 chars with power words
Scope: content/ads/facebook-q1.md
Metric: variants meeting criteria (higher is better)
Verify: node scripts/validate-ad-copy.js
```

### Google Ads headlines

```
/autoresearch
Iterations: 30
Goal: Generate 50 ad headline variants (max 30 chars) with power words + CTA
Scope: content/ads/google-search/*.md
Metric: headlines meeting char limit + power word + CTA criteria (higher is better)
Verify: node scripts/google-ads-validator.js --type headlines
```

### Ship email campaign

```
/autoresearch:ship
Target: content/emails/product-launch-campaign.md
Type: content
```

### Ship campaign assets

```
/autoresearch:ship --type sales
Target: content/campaigns/q1-growth-push/
```

---

## Web Scraping & Data Collection

### Improve scraper success rate

```
/autoresearch
Iterations: 25
Goal: Increase scraper success rate from 85% to 99%
Scope: scrapers/**/*.py
Metric: success rate % (higher is better)
Verify: python scripts/scraper-test.py --sample 100 | grep "success_rate"
Guard: python -m pytest tests/scrapers/
```

Claude iterates on retry logic, selector resilience, timeout handling, rate limiting — one improvement per iteration.

### Reduce scraping time per page

```
/autoresearch
Iterations: 20
Goal: Reduce average scrape time from 3s to under 1s per page
Scope: scrapers/**/*.py
Metric: avg time per page in seconds (lower is better)
Verify: python scripts/scraper-bench.py | grep "avg_time"
Guard: python -m pytest tests/scrapers/
```

### Debug scraper failures

```
/autoresearch:debug
Scope: scrapers/**/*.py
Symptom: Scraper fails on paginated results after page 5 with 403 errors
Iterations: 10
```

### Scraping edge cases exploration

```
/autoresearch:scenario --domain software --focus edge-cases
Scenario: Web scraper encounters anti-bot measures, dynamic content, and rate limiting
Iterations: 25
```

Explores: CAPTCHAs, IP blocking, JavaScript rendering, infinite scroll, login walls, A/B test variants, geo-blocking, cookie consent popups.

### Improve data extraction accuracy

```
/autoresearch
Iterations: 20
Goal: Increase structured data extraction accuracy to 98%
Scope: scrapers/extractors/**/*.py
Metric: extraction accuracy % (higher is better)
Verify: python scripts/extraction-accuracy.py --ground-truth fixtures/expected.json | grep "accuracy"
Guard: python -m pytest tests/extractors/
```

---

## Research & Analysis

### Research paper readability

```
/autoresearch
Iterations: 20
Goal: Improve research paper Flesch readability score to 60+
Scope: papers/draft/**/*.md
Metric: Flesch readability score (higher is better)
Verify: python scripts/readability.py papers/draft/ | grep "flesch_score"
```

### Ship a research paper

```
/autoresearch:ship --type research
Target: papers/final/autonomous-iteration-patterns.pdf
```

Checklist: abstract present, citations formatted, data sources linked, methodology complete, figures labeled, conclusion addresses hypothesis, acknowledgments included.

### Literature review structure (PRISMA)

```
/autoresearch
Iterations: 15
Goal: Ensure all literature review sections follow PRISMA checklist
Scope: papers/lit-review/**/*.md
Metric: PRISMA checklist compliance % (higher is better)
Verify: python scripts/prisma-check.py | grep "compliance"
```

### Data analysis report quality

```
/autoresearch
Iterations: 20
Goal: Ensure all analysis reports have methodology, data sources, visualizations, and conclusions
Scope: reports/analysis/**/*.md
Metric: report completeness score (higher is better)
Verify: python scripts/report-audit.py | grep "completeness"
```

### Research scenario exploration

```
/autoresearch:scenario --domain product --format use-cases --depth deep
Scenario: Researcher evaluates autonomous iteration techniques across ML, DevOps, and content
Iterations: 30
```

---

## DevOps & Infrastructure

### Reduce Docker image size

```
/autoresearch
Iterations: 10
Goal: Reduce Docker image size and build time
Scope: Dockerfile, .dockerignore
Metric: image size in MB (lower is better)
Verify: docker build -t bench . 2>&1 && docker images bench --format "{{.Size}}"
```

### Optimize Docker build time

```
/autoresearch
Goal: Reduce Docker build time from 180s to under 60s
Scope: Dockerfile, .dockerignore
Verify: docker build --no-cache . 2>&1 | tail -1 | grep -oP '[\d.]+'
Iterations: 10
```

Claude targets one optimization per iteration: layer ordering, multi-stage builds, .dockerignore rules, apt-get cleanup, build argument caching.

### Kubernetes deployment optimization

```
/autoresearch
Goal: Reduce pod startup time from 45s to under 15s
Scope: k8s/deployment.yaml, k8s/service.yaml, k8s/configmap.yaml
Verify: kubectl rollout status deployment/app --timeout=60s 2>&1 | grep -oP '\d+(?=s)'
Guard: kubectl get pods | grep -c 'Running'
Iterations: 10
```

Changes spanning `deployment.yaml` + `service.yaml` + `configmap.yaml` are ONE atomic change when they serve the same purpose (e.g., "add resource limits" touches deployment + configmap).

### Optimize CI/CD pipeline duration

```
/autoresearch
Goal: Reduce CI/CD pipeline from 12 minutes to under 5 minutes
Scope: .github/workflows/*.yml, Dockerfile, docker-compose.yml
Verify: gh run list --limit 1 --json durationMs --jq '.[0].durationMs / 60000'
Guard: docker compose up -d && sleep 5 && curl -sf http://localhost:3000/health
Iterations: 15
```

Multi-file changes are common in DevOps. The rule: **same intent = one change**, even across files.

**Example iterations:**
```bash
# Iteration 1: Enable Docker layer caching (Dockerfile + CI workflow — ONE intent)
# Files: Dockerfile, .github/workflows/ci.yml
git add Dockerfile .github/workflows/ci.yml
git commit -m "experiment(ci): enable Docker layer caching in build step"
# Verify: pipeline time dropped from 12min → 9min ✓ KEEP

# Iteration 2: Parallelize test matrix (CI workflow only)
# Files: .github/workflows/ci.yml
git add .github/workflows/ci.yml
git commit -m "experiment(ci): split tests into 3 parallel matrix jobs"
# Verify: 9min → 6min ✓ KEEP

# Iteration 3: Switch to slim base image (Dockerfile + compose — ONE intent)
# Files: Dockerfile, docker-compose.yml
git add Dockerfile docker-compose.yml
git commit -m "experiment(ci): switch node:20 to node:20-slim base image"
# Verify: 6min → 5.2min ✓ KEEP
```

| One Change (OK) | Two Changes (Split Into Separate Iterations) |
|-----------------|---------------------------------------------|
| Change port in Dockerfile + compose + nginx | Change port AND add new service |
| Update Node version in Dockerfile + CI + package.json | Update Node AND switch package manager |
| Add caching in CI workflow + Dockerfile | Add caching AND parallelize tests |

### CI/CD pipeline speed

```
/autoresearch
Iterations: 15
Goal: Reduce CI pipeline duration from 12min to under 5min
Scope: .github/workflows/*.yml
Metric: pipeline duration in seconds (lower is better)
Verify: node scripts/estimate-ci-time.js
```

### Fix CI/CD failures

```
/autoresearch:fix
Target: gh run view --log-failed
Scope: .github/workflows/*.yml
```

### Terraform/IaC security compliance

```
/autoresearch
Iterations: 20
Goal: Pass all tfsec security checks + reduce resource count
Scope: infra/*.tf
Metric: tfsec violations (lower is better)
Verify: tfsec . --format json | jq '.results | length'
```

### Infrastructure security audit

```
/autoresearch:security
Scope: infra/*.tf, .github/workflows/*.yml, Dockerfile
Focus: exposed secrets, container privileges, network policies
Iterations: 15
```

### Ship a deployment

```
/autoresearch:ship --type deployment --monitor 10
```

Runs readiness checklist, deploys, then monitors for 10 minutes. Triggers auto-rollback on error spike.

### CLI invocation for DevOps pipelines

Invoke autoresearch from the command line for DevOps workflows:

```bash
# Interactive mode — Claude guides the optimization
claude "/autoresearch
Goal: Reduce CI/CD pipeline from 12min to 5min
Scope: .github/workflows/*.yml, Dockerfile, docker-compose.yml
Verify: gh run list --limit 1 --json durationMs --jq '.[0].durationMs / 60000'
Iterations: 15"

# Non-interactive (CI/CD mode) — runs headless
claude --print "/autoresearch
Goal: Reduce Docker build time
Scope: Dockerfile
Verify: docker build . 2>&1 | grep -oP 'total [\d.]+s' | grep -oP '[\d.]+'
Iterations: 10"

# With guard to prevent breaking deployments
claude "/autoresearch
Goal: Optimize Kubernetes resource usage
Scope: k8s/*.yaml
Verify: kubectl top pods -n prod --no-headers | awk '{sum+=\$3} END {print sum}'
Guard: kubectl rollout status deployment/app -n prod --timeout=60s
Iterations: 10"
```

### Error handling for DevOps experiments

DevOps changes can fail in ways code changes don't — broken deploys, unreachable services, resource exhaustion:

```bash
# Verify with timeout (prevent hanging on stuck deployments)
timeout 120 kubectl rollout status deployment/app --timeout=90s 2>&1 \
  | grep -oP '\d+(?=s)' || echo "999"
# → Returns 999 on timeout, triggering discard

# Health check with retry (services may take time to start)
verify_with_retry() {
  for i in 1 2 3; do
    if curl -sf http://localhost:3000/health > /dev/null 2>&1; then
      return 0
    fi
    sleep 5
  done
  return 1  # Failed after 3 retries
}

# Guard: ensure deployment didn't break production
guard_production() {
  # Check pod status
  kubectl get pods -n prod | grep -v Running | grep -v Completed | grep -c . && return 1
  # Check endpoint health
  curl -sf "https://api.example.com/health" > /dev/null || return 1
  return 0
}
```

**Common DevOps failure patterns and recovery:**

| Failure | Detection | Recovery |
|---------|-----------|----------|
| Deploy timeout | `kubectl rollout status` exits non-zero | `safe_revert()` restores previous YAML |
| OOM killed | Pod status = OOMKilled | Revert resource change, try smaller increment |
| Health check fails | `curl -f` returns non-zero | Rollback deploy: `kubectl rollout undo` |
| Build cache miss | Build time spikes | Revert Dockerfile change, try different layer strategy |
| Port conflict | Container fails to start | Revert port change in compose + app config |

### Defining metrics for complex pipeline changes

DevOps metrics often require composite measurement:

```bash
# Pipeline duration (minutes)
gh run list --limit 1 --json durationMs --jq '.[0].durationMs / 60000'

# Docker image size (MB)
docker images myapp:latest --format '{{.Size}}' | grep -oP '[\d.]+'

# Deployment rollout time (seconds)
kubectl rollout status deployment/app 2>&1 | grep -oP '\d+(?= seconds)'

# Resource utilization (average CPU across all pods)
kubectl top pods --no-headers | awk '{sum+=$3} END {print sum/NR}'

# Cost estimation (compute-hours)
kubectl get pods -o json | jq '[.items[].spec.containers[].resources.requests.cpu // "100m"] | map(rtrimstr("m") | tonumber) | add / 1000'
```

### Rollback in production environments

```bash
# Autoresearch automatically handles rollback via safe_revert():
# 1. Code change is committed BEFORE verification (Phase 4)
# 2. If verification fails, git revert restores previous state (Phase 6)
# 3. For Kubernetes, add explicit rollback as part of safe_revert:

# In your Guard command, combine code check + deploy check:
Guard: kubectl rollout status deployment/app --timeout=60s && curl -sf http://api.example.com/health

# If guard fails:
# 1. safe_revert() reverts the git commit (YAML files restored)
# 2. kubectl apply -f k8s/ re-applies the reverted config
# 3. kubectl rollout status confirms rollback succeeded

# For critical production systems, add a pre-deploy snapshot:
# Verify: kubectl apply -f k8s/ && kubectl rollout status deployment/app && curl -sf http://api.example.com/health | jq '.responseTime'
```

---

## Data Science & ML

### Python ML training loss

```
/autoresearch
Goal: Reduce validation loss (val_bpb)
Scope: train.py, model.py
Metric: val_bpb (lower is better)
Verify: uv run train.py --epochs 1 2>&1 | grep "val_bpb" | tail -1 | awk '{print $NF}'
```

### Optimize ML model accuracy

```
/autoresearch
Goal: Improve classification accuracy from 85% to 95%
Scope: model.py, config.yaml, data/augmentation.py
Verify: python train.py --eval-only 2>&1 | grep 'val_accuracy' | awk '{print $NF}'
Guard: python -m pytest tests/test_model.py -q
Noise: high
Min-Delta: 0.5
Iterations: 25
```

The agent targets one hyperparameter or architectural change per iteration: learning rate, batch size, layer count, dropout rate, optimizer, augmentation strategy. Each experiment is committed before verification, enabling git-based rollback if accuracy drops.

**Example iterations:**
```bash
# Iteration 1: Increase learning rate
# model.py: lr = 0.001 → lr = 0.01
git commit -m "experiment(model): increase learning rate from 0.001 to 0.01"
# Verify: accuracy = 87.2% (+2.2%) → KEEP

# Iteration 2: Add data augmentation
# data/augmentation.py: add random flip + rotation
git commit -m "experiment(data): add random flip and rotation augmentation"
# Verify: accuracy = 89.1% (+1.9%) → KEEP

# Iteration 3: Try larger batch size
# config.yaml: batch_size = 32 → 128
git commit -m "experiment(config): increase batch size from 32 to 128"
# Verify: accuracy = 88.5% (-0.6%) → DISCARD (reverted)
```

### SQL query optimization

```
/autoresearch
Iterations: 15
Goal: Reduce total query execution time for dashboard queries
Scope: queries/dashboard/*.sql
Metric: total execution time in ms (lower is better)
Verify: psql -f scripts/bench-queries.sql | grep "total_ms"
```

### Data pipeline quality

```
/autoresearch
Iterations: 20
Goal: Increase data validation pass rate from 85% to 99%
Scope: scripts/validators/*.py
Metric: validation pass rate % (higher is better)
Verify: python scripts/run_validations.py | grep "pass_rate"
```

---

## Design & Accessibility

### WCAG 2.1 AA compliance

```
/autoresearch
Iterations: 25
Goal: Reach WCAG 2.1 AA compliance — zero axe violations
Scope: src/components/**/*.tsx
Metric: axe violation count (lower is better)
Verify: npx playwright test a11y.spec.ts | grep "violations"
```

### Color contrast and design tokens

```
/autoresearch
Iterations: 20
Goal: Replace all hardcoded colors/spacing with design tokens
Scope: src/**/*.tsx, src/**/*.css
Metric: hardcoded values count (lower is better)
Verify: grep -rE "#[0-9a-fA-F]{3,6}|px\b" src/ --include="*.tsx" --include="*.css" | wc -l
```

### Ship design assets

```
/autoresearch:ship
Target: design/tokens/v2/
Type: content
```

---

## HR & People Operations

### Job description clarity

```
/autoresearch
Iterations: 15
Goal: Improve job descriptions — bias-free language, clear requirements, inclusive tone
Scope: content/job-descriptions/*.md
Metric: inclusivity score from textio-style checker (higher is better)
Verify: node scripts/jd-inclusivity-score.js
```

### Onboarding docs readability

```
/autoresearch
Iterations: 10
Goal: Reduce average reading level of HR policies to grade 8
Scope: content/policies/*.md
Metric: Flesch-Kincaid grade level (lower is better)
Verify: node scripts/readability.js content/policies/
```

### Hiring scenarios

```
/autoresearch:scenario --domain business --depth deep
Scenario: Candidate moves through interview process from application to offer
Iterations: 30
```

### Interview question bank

```
/autoresearch
Iterations: 20
Goal: Ensure all questions are behavioral (STAR format) + cover all competencies
Scope: content/interview-questions.md
Metric: STAR-format compliance % + competency coverage % (higher is better)
Verify: node scripts/interview-quality.js
```

---

## Operations

### Runbook accuracy and brevity

```
/autoresearch
Iterations: 15
Goal: Reduce average runbook steps while maintaining completeness
Scope: docs/runbooks/*.md
Metric: avg steps per runbook (lower is better), constraint: all checklist items preserved
Verify: node scripts/runbook-audit.js
```

### SLA compliance documentation

```
/autoresearch
Iterations: 10
Goal: Standardize all SOPs to template format with <100 words per step
Scope: docs/sops/*.md
Metric: template compliance % + avg words per step (higher compliance + lower words = better)
Verify: node scripts/sop-score.js
```

### Incident response playbooks

```
/autoresearch
Iterations: 20
Goal: Ensure all playbooks have decision trees, escalation paths, rollback steps
Scope: docs/incident-playbooks/*.md
Metric: completeness checklist score (higher is better)
Verify: node scripts/playbook-completeness.js
```

---

## Documentation & Knowledge Management

### Generate docs for an unknown codebase

```
/autoresearch:learn --mode init --depth deep
Scope: src/**
```

Claude scouts the codebase, detects project type (web app, library, CLI, API), generates all relevant docs (architecture, code standards, overview, summary), validates references, and iteratively fixes any hallucinated code refs. Creates deployment-guide.md only if Dockerfile/CI config detected.

### Update docs after a major refactor

```
/autoresearch:learn --mode update
Iterations: 3
```

Uses git-diff scoping to prioritize changed areas. Reads existing docs in parallel, updates all `docs/*.md` dynamically (no hardcoded list — catches custom docs too). Validation-fix loop ensures updated refs are valid.

### Check documentation health

```
/autoresearch:learn --mode check
```

Read-only diagnostic: staleness gap (days between last code commit vs last docs commit), validation warnings, file inventory with LOC, coverage assessment. No files modified.

---

## Combining with MCP Servers

Claude Code supports MCP (Model Context Protocol) servers. When combined with autoresearch, this enables real-time data-driven iteration loops.

### Database-aware query optimization

Use a PostgreSQL MCP server to iterate on real query performance — no mock data:

```
/autoresearch
Goal: Optimize slow dashboard queries — reduce p95 query time
Scope: queries/dashboard/*.sql
Metric: avg query time in ms (lower is better)
Verify: Use MCP postgres tool to run EXPLAIN ANALYZE on each query, sum total costs
```

Claude modifies queries, runs EXPLAIN ANALYZE via MCP on the live database, keeps improvements. Each iteration tests on real data, not synthetic benchmarks.

### Analytics-driven content optimization

Use a Google Analytics or Plausible MCP server:

```
/autoresearch
Goal: Improve blog post structure based on engagement metrics
Scope: content/blog/*.md
Metric: avg time on page for modified posts (higher is better)
Verify: Use MCP analytics tool to fetch page metrics, compare against baseline
```

### CRM-driven email template refinement

Use a HubSpot or Salesforce MCP server:

```
/autoresearch
Goal: Optimize email templates based on actual open/reply rates
Scope: content/email-templates/*.md
Metric: avg open rate from CRM data (higher is better)
Verify: Use MCP CRM tool to pull latest campaign metrics for template variants
```

### Recommended MCP servers by use case

| MCP Server | Use Case | Metric Source |
|---|---|---|
| **PostgreSQL** | Query optimization, data validation | Query execution time, row counts |
| **GitHub** | Issue triage, PR quality, CI status | Issue counts, check pass rates |
| **Puppeteer/Playwright** | Visual regression, performance | Lighthouse scores, screenshot diffs |
| **Sentry** | Error reduction | Error count, crash-free rate |
| **Cloudflare** | Edge performance | Cache hit rate, TTFB |
| **Stripe** | Payment flow optimization | Checkout completion rates |
| **Slack** | Notification quality, alert tuning | Message delivery, response times |

---

## CI/CD Integration

Run autoresearch autonomously in GitHub Actions pipelines.

### Security audit on pull requests

```yaml
# .github/workflows/security-audit.yml
name: Security Audit
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2am

jobs:
  security-audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Security Audit
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            claude -p "/autoresearch:security --diff --fail-on critical --iterations 5"
          else
            claude -p "/autoresearch:security --fail-on high --iterations 15"
          fi
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: security/
```

### Coverage enforcement on main

```yaml
# .github/workflows/coverage-gate.yml
name: Coverage Gate
on:
  push:
    branches: [main]

jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Run autoresearch coverage sprint
        run: claude -p "/autoresearch --iterations 10 --goal 'Keep coverage above 85%' --verify 'npm test -- --coverage | grep All files' --fail-below 85"
```

### Nightly improvement loop

```yaml
# .github/workflows/nightly-optimize.yml
name: Nightly Optimization
on:
  schedule:
    - cron: '0 3 * * *'  # 3am daily

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Run overnight loop
        run: |
          claude -p "/autoresearch
          Iterations: 50
          Goal: Improve test coverage and reduce bundle size
          Scope: src/**/*.ts
          Verify: npm test -- --coverage | grep 'All files'
          Guard: npm run build"
      - name: Create PR with improvements
        run: claude -p "/autoresearch:ship --type code-pr --auto"
```

---

## Custom Verification Scripts

The loop works best when verification is fast and mechanical. Scripts must output a parseable number and exit cleanly.

### JavaScript template

```javascript
// scripts/score-example.js — Template for custom scoring
const fs = require('fs');
const file = process.argv[2];
const content = fs.readFileSync(file, 'utf-8');

// Your scoring logic here
const score = content.split('\n').filter(l => l.startsWith('- ')).length;

// Output MUST be a single number on its own line for easy parsing
console.log(`SCORE: ${score}`);
process.exit(score > 0 ? 0 : 1);
```

### Python template

```python
#!/usr/bin/env python3
# scripts/verify-coverage.py
import subprocess, re, sys

result = subprocess.run(
    ["npm", "test", "--", "--coverage"],
    capture_output=True, text=True
)

match = re.search(r'All files\s*\|\s*([\d.]+)', result.stdout)
if match:
    print(f"coverage: {match.group(1)}")
    sys.exit(0)
else:
    print("coverage: 0")
    sys.exit(1)
```

Use it:

```
/autoresearch
Verify: python scripts/verify-coverage.py | grep "coverage"
```

### LLM-based content quality scorer

Use a fast, cheap model (Haiku) to score content:

```javascript
// scripts/content-quality-scorer.js
const Anthropic = require('@anthropic-ai/sdk');
const fs = require('fs');

const content = fs.readFileSync(process.argv[2], 'utf-8');
const client = new Anthropic();

async function score() {
  const msg = await client.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 100,
    messages: [{
      role: 'user',
      content: `Score this content 0-100 for clarity, engagement, and SEO. Return ONLY a number.\n\n${content}`
    }]
  });
  const score = parseInt(msg.content[0].text.trim());
  console.log(`SCORE: ${score}`);
  process.exit(0);
}
score();
```

```
/autoresearch
Goal: All blog posts score 80+ on AI-assessed quality
Scope: content/blog/*.md
Metric: quality score from Haiku (higher is better)
Verify: node scripts/content-quality-scorer.js content/blog/latest.md
```

### Rules for good verification scripts

| Rule | Why |
|------|-----|
| Runs in under 10 seconds | Fast = more iterations = more experiments |
| Outputs a single parseable number | Claude needs to extract the metric mechanically |
| Exit code 0 = success, non-zero = crash | Clean pass/fail signal |
| No human judgment required | Agent must decide autonomously |
| Deterministic (same input = same output) | Non-deterministic metrics break the feedback loop |

---

## Domain Adaptation Reference

| Domain | Metric | Scope | Verify Command | Guard |
|--------|--------|-------|----------------|-------|
| Node.js/TS backend | Coverage % | `src/**/*.ts` | `npm test -- --coverage` | — |
| Python backend | pytest coverage % | `app/**/*.py` | `pytest --cov=app` | `mypy app/` |
| Go backend | Test coverage % | `**/*.go` | `go test ./... -cover` | `go vet ./...` |
| Rust backend | Test coverage % | `src/**/*.rs` | `cargo tarpaulin` | `cargo clippy` |
| Frontend UI | Lighthouse score | `src/components/**` | `npx lighthouse` | `npm test` |
| ML training | val_bpb / loss | `train.py` | `uv run train.py` | — |
| Blog/content | Readability score | `content/*.md` | Custom script | — |
| Performance | Benchmark time (ms) | Target files | `npm run bench` | `npm test` |
| Web scraping | Success rate % | `scrapers/**/*.py` | Custom test script | `pytest tests/scrapers/` |
| Security | OWASP + STRIDE | API/auth/middleware | `/autoresearch:security` | — |

---

*Related: [chains-and-combinations.md](./chains-and-combinations.md) · [getting-started.md](./getting-started.md) · [autoresearch.md](./autoresearch.md)*