---
name: auditing-bdd-tests
description: Analyzes BDD (Gherkin) + Playwright test solutions for spec quality, flake resistance, semantic/a11y locators, and AI-agent operability. Produces aspect scoring, grade, issues by severity, and improvement roadmap.
user-invocable: true
argument-hint: [path-to-repo]
---

# BDD Test Solution Audit

Goal: evaluate **specification executability**, **flake resistance**, **maintainability**, **semantic/a11y quality**, and **AI-agent operability**.

## Adaptive Workflow

Workflow adapts based on repository size (auto-detected).

```
┌─────────────────────────────────────────────────────────────────┐
│ 1. DISCOVER → 2. ANALYZE → 3. SCORE → 4. REPORT → 5. ROADMAP    │
└─────────────────────────────────────────────────────────────────┘
        ↑                                                      │
        └──────────── Skip steps for small repos ──────────────┘
```

| Repo Size | Steps | Sampling | Questions |
|-----------|-------|----------|-----------|
| Small (≤20 scenarios) | 1→3→4 | None | 1 question |
| Medium (21–100) | 1→2→3→4→5 | 30–50% | 2 questions |
| Large (100+) | Full | Stratified | 3 questions |

---

## Step 1: Discovery & Auto-Inference

Target: `{argument OR cwd}`

**Auto-detect (no user input needed):**

| What | How to Detect |
|------|---------------|
| Stack | `playwright.config.*` → Playwright; `playwright-bdd` in package.json → playwright-bdd |
| Size | Count `*.feature` files and `Scenario:` lines |
| History | Check `.bddready/history/index.json` exists |
| CI | Check `.github/workflows/`, `Jenkinsfile`, `.gitlab-ci.yml` |
| Artifacts | Check `playwright.config.*` for trace/video/screenshot settings |

**Output immediately:**

```
Target: {path}
Stack: {stack} (auto-detected)
Size: {small/medium/large} ({N} features, {M} scenarios)
History: {yes/no} | CI: {yes/no} | Artifacts: {configured/missing}
```

See `modules/discovery.md` for detailed detection rules.

---

## Step 2: Sampling (Medium/Large repos only)

Skip for small repos — analyze all scenarios.

For medium/large repos, use stratified sampling. See `modules/sampling.md`.

---

## Progress Indicator (Medium/Large repos)

For repositories with 50+ scenarios, show progress during analysis:

```
Analyzing BDD Test Solution...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0%

[■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
✓ Discovery complete (playwright-bdd detected)
→ Analyzing features/auth/*.feature (8 scenarios)

[■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░░░░░░░░░░░] 50%
→ Analyzing features/checkout/*.feature (12 scenarios)

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■░░░░░░░░░░] 75%
→ Scoring aspects...

[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100%
✓ Analysis complete
```

**Progress stages:**

1. Discovery (10%)
2. Feature file analysis (10–70%, proportional to file count)
3. Step definition analysis (70–85%)
4. Scoring (85–95%)
5. Report generation (95–100%)

Update progress after each feature file or major step.

---

## Step 3: Score Aspects

Score each aspect using rubrics from `criteria/aspects.md`.

**Aspects and weights:**

| # | Aspect | Weight |
|---|--------|--------|
| 1 | Executable Gherkin | 16% |
| 2 | Step Definitions Quality | 14% |
| 3 | Test Architecture | 14% |
| 4 | Selector Strategy | 12% |
| 5 | Waiting & Flake Resistance | 14% |
| 6 | Data & Environment | 10% |
| 7 | CI, Reporting & Artifacts | 10% |
| 8 | AI-Agent Operability | 10% |

**Scoring:** 0 (bad) / 5 (partial) / 10 (good) per criterion. See `modules/scoring.md` for calculation formulas.

---

## Step 4: Report

### 4.1 Terminal Output (Always)

Print ASCII dashboard with scores and issues. See `modules/output-formats.md`.
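The dashboard's overall score rolls up through the Step 3 weights; the authoritative formulas live in `modules/scoring.md`. The sketch below only illustrates one plausible roll-up, assuming (not confirmed by the scoring module) that an aspect score is the mean of its 0/5/10 criterion scores scaled to 0–100 and that the overall score is the weight-blended sum.

```typescript
// Minimal sketch of a weighted score roll-up. Assumptions: aspect score =
// mean of 0/5/10 criterion scores × 10; overall score = weighted sum of
// aspect scores. Names and demo values below are illustrative only.
type AspectResult = {
  name: string;
  weight: number;            // e.g. 0.16 for "Executable Gherkin"
  criterionScores: number[]; // each criterion scored 0, 5, or 10
};

function aspectScore(a: AspectResult): number {
  const mean = a.criterionScores.reduce((sum, c) => sum + c, 0) / a.criterionScores.length;
  return mean * 10; // 0–10 criterion scale → 0–100 aspect scale
}

function overallScore(aspects: AspectResult[]): number {
  return Math.round(aspects.reduce((sum, a) => sum + aspectScore(a) * a.weight, 0));
}

// Example with two aspects only, for illustration.
const demo: AspectResult[] = [
  { name: "Executable Gherkin", weight: 0.16, criterionScores: [10, 5, 10, 5] },
  { name: "Waiting & Flake Resistance", weight: 0.14, criterionScores: [0, 5, 5] },
];
console.log(overallScore(demo)); // weighted blend of 75 and ~33
```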
### 4.2 Issues by Severity

Classify using `reference/severity.md`:

- 🔴 **CRITICAL** — blocks reliable execution
- 🟡 **WARNING** — hinders speed/maintainability
- 🔵 **INFO** — optimizations

**Every issue MUST have:**

- Evidence (file path, pattern, or code snippet)
- Impact (why it matters)
- Effort estimate (Low/Medium/High)

### 4.3 Save Reports

Save to `.bddready/history/reports/`:

- `{REPORT_ID}.json` — machine-readable
- `{REPORT_ID}.md` — human-readable

Update `.bddready/history/index.json` for delta tracking.

### 4.4 HTML Report (Offer to User)

After showing terminal output, ask:

> Would you like me to generate an interactive HTML report?

If yes, run:

```bash
node scripts/render-html.mjs .bddready/history/reports/{REPORT_ID}.json .bddready/history/reports/{REPORT_ID}.html
```

---

## Interactive Fix Mode

After showing issues, offer to fix quick wins immediately.

### Trigger Conditions

Offer interactive fixes when:

- At least 1 CRITICAL issue with `Effort: Low`
- Issue has clear, automatable fix pattern

### Flow

```
╔════════════════════════════════════════════════════════════════════╗
║ QUICK FIX AVAILABLE                                                 ║
╠════════════════════════════════════════════════════════════════════╣
║ [C1] Flake Resistance: Found 7 arbitrary sleeps                     ║
║      Fix: Replace `wait X seconds` with condition waits             ║
║      Effort: Low | Files: 3                                         ║
║                                                                     ║
║ → Fix C1 now? [y/n/skip all]                                        ║
╚════════════════════════════════════════════════════════════════════╝
```

### Response Handling

| Response | Action |
|----------|--------|
| `y` / `yes` | Apply fix, show diff, continue to next fixable issue |
| `n` / `no` | Skip this issue, continue to next |
| `skip all` / `s` | Skip interactive mode, show full report |

### Fixable Patterns

| Issue Pattern | Auto-Fix |
|---------------|----------|
| `wait X seconds` without condition | → `waitFor` with visibility/enabled check |
| Hardcoded `sleep()` | → `waitForSelector()` or `waitForResponse()` |
| CSS class selectors | → `getByRole()` / `getByTestId()` (suggest, confirm) |
| Missing `trace: 'on-first-retry'` | → Add to playwright.config |
| Duplicate step definitions | → Consolidate (show which to keep) |

### After Each Fix

```
✓ Fixed C1: Replaced 7 sleeps with condition waits
  Modified: features/checkout.feature, features/auth.feature

→ Fix C2 now? [y/n/skip all]
```

### Post-Fix Summary

```
╔════════════════════════════════════════════════════════════════════╗
║ FIX SUMMARY                                                         ║
╠════════════════════════════════════════════════════════════════════╣
║ ✓ C1: Fixed (7 sleeps → condition waits)                            ║
║ ✓ C3: Fixed (added trace-on-failure)                                ║
║ ✗ C2: Skipped (requires manual review)                              ║
║                                                                     ║
║ Files modified: 5                                                   ║
║ New score estimate: 68 → 74 (+6)                                    ║
╚════════════════════════════════════════════════════════════════════╝
```

---

## Step 5: Roadmap (Medium/Large repos only)

Skip for small repos — provide inline recommendations instead.

| Phase | Focus |
|-------|-------|
| 1: Quick Wins | Remove sleeps, enable trace-on-failure, fix critical selectors |
| 2: Foundation | Thin step defs, proper fixtures, test isolation |
| 3: Advanced | Visual tests, a11y integration, CI optimization |

---

## User Questions

### Auto-Inference First

Before asking, try to infer from the codebase:

| Question | Auto-Inference |
|----------|----------------|
| Primary goal? | Infer from issues: many sleeps → stability; bad selectors → AI-ready |
| Depth of changes? | Infer from repo size: small → quick wins; large → phased |
| CI constraints? | Read from config: worker count, timeout settings |
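For the first and last rows of the table above, a minimal sketch of what the inference could look like, assuming a conventional `playwright.config.ts` default export and Gherkin files under `features/`; the "wait N seconds" pattern and the threshold are illustrative, not the audit's actual rules:

```typescript
// Minimal auto-inference sketch (Node 20+). Paths, pattern, and threshold
// are assumptions for illustration only.
import { readFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';
import config from './playwright.config';

// Primary goal: many arbitrary-sleep steps suggest stability comes first.
const featureFiles = readdirSync('features', { recursive: true })
  .map(String)
  .filter((f) => f.endsWith('.feature'));
const sleepSteps = featureFiles
  .flatMap((f) => readFileSync(join('features', f), 'utf8').split('\n'))
  .filter((line) => /wait \d+ seconds?/i.test(line)).length;
const primaryGoal = sleepSteps > 5 ? 'stability' : 'AI-agent readability';

// CI constraints: read worker, timeout, and retry limits straight from the config.
const ciConstraints = {
  workers: config.workers,           // number or percentage string, if set
  timeout: config.timeout ?? 30_000, // Playwright's default per-test timeout
  retries: config.retries ?? 0,
};

console.log({ primaryGoal, sleepSteps, ciConstraints });
```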
### Minimal Question Set

Ask ONLY what cannot be inferred:

**Small repos (1 question):**

> What is your priority: stability, speed, or AI-agent readability?

**Medium repos (2 questions):**

1. What is your priority: stability, speed, or AI-agent readability?
2. How deep can changes go: quick fixes only, or can we refactor?

**Large repos (3 questions):**

1. What is your priority: stability, speed, or AI-agent readability?
2. How deep can changes go: quick fixes only, medium refactor, or deep restructuring?
3. Are there CI/environment constraints? (e.g., worker limits, no mocks, staging only)

### Dynamic Questions (Only if triggered)

Ask ONLY if specific issues found:

| Trigger | Question |
|---------|----------|
| CRITICAL issues found | Which CRITICAL items should be fixed first? (list by ID) |
| Selector/a11y issues | Can we modify application markup (HTML), or tests only? |
| >10 WARNING issues | Which WARNING items are in scope this iteration? |

---

## Semantic/A11y Refactoring Proposal

If Aspect 4 (Selector Strategy) or Aspect 8 (AI-Agent Operability) scores below 60, propose:

```
╔════════════════════════════════════════════════════════════════════╗
║ SEMANTIC/A11Y REFACTORING PROPOSAL                                  ║
╠════════════════════════════════════════════════════════════════════╣
║ Your locators would be more stable with semantic HTML.              ║
║                                                                     ║
║ Would you like me to help refactor:                                 ║
║ [ ] Component markup (replace div onclick → button, add ARIA)       ║
║ [ ] Test locators (migrate CSS → getByRole)                         ║
╚════════════════════════════════════════════════════════════════════╝
```

Ask only if user has access to modify application source code.

---

## Reference Files

| File | Purpose |
|------|---------|
| `criteria/aspects.md` | Detailed scoring rubrics (0/5/10) |
| `reference/severity.md` | Issue classification rules |
| `reference/bdd-best-practices.md` | Best practices guide |
| `modules/discovery.md` | Discovery details |
| `modules/sampling.md` | Sampling strategy |
| `modules/scoring.md` | Score calculation |
| `modules/output-formats.md` | Output format specs |
| `templates/report.html` | HTML report template |
| `scripts/render-html.mjs` | HTML generator script |

---

## Quick Reference: Workflow by Size

### Small Repo (≤20 scenarios)

1. Discover (auto-detect stack, size)
2. Ask 1 question (priority)
3. Analyze all scenarios
4. Score aspects (simplified)
5. Print terminal report + issues
6. **Interactive fix mode** (if Low-effort CRITICAL issues)
7. Offer HTML report
8. Provide inline recommendations

### Medium Repo (21–100 scenarios)

1. Discover (auto-detect)
2. Ask 2 questions
3. Sample 30–50%
4. **Show progress** (50+ scenarios)
5. Full aspect scoring
6. Terminal + saved reports
7. **Interactive fix mode**
8. Offer HTML report
9. Phased roadmap (3 phases)

### Large Repo (100+ scenarios)

1. Discover (auto-detect)
2. Ask 3 questions
3. Stratified sampling
4. **Show progress** (with stage updates)
5. Full aspect scoring
6. All report formats
7. **Interactive fix mode**
8. HTML report
9. Detailed phased roadmap
10. Propose a11y refactoring if applicable
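As a closing illustration of the two most common migrations this audit proposes (condition waits instead of arbitrary sleeps, and role-based locators instead of CSS classes — see the Fixable Patterns table and the a11y proposal above), a hedged before/after sketch in plain Playwright; the URLs, selectors, and accessible names are hypothetical:

```typescript
// Before: brittle — arbitrary sleep plus CSS class locators.
// All selectors and timings here are illustrative, not from the audited repo.
import { test, expect } from '@playwright/test';

test('checkout confirmation (before)', async ({ page }) => {
  await page.goto('/checkout');
  await page.locator('.btn-submit').click();
  await page.waitForTimeout(5000); // arbitrary sleep → flaky and slow
  expect(await page.locator('.msg-ok').count()).toBe(1);
});

// After: condition waits via auto-waiting assertions + semantic locators.
test('checkout confirmation (after)', async ({ page }) => {
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Place order' }).click();
  await expect(page.getByRole('status')).toHaveText(/order confirmed/i, { timeout: 10_000 });
});
```

The "after" variant waits exactly as long as the condition takes, and the role-based locators survive markup churn that would break class selectors.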