---
name: test-driven
description: Test-Driven Development (TDD) across any supported language. Use when implementing features or fixes with TDD methodology, writing tests before code, or following XP-style development.
---

# Test-driven development (XP-style)

Tests define the specification. Design them from requirements before any implementation. The RED-GREEN-REFACTOR cycle is the heartbeat: write a failing test, make it pass with minimal code, then clean up while green.

**Modern insight (2025)**: TDD + property-based testing pairing is the standard -- example tests prevent regressions, property tests discover edge cases. TDD also serves AI-assisted development: structural integrity keeps code understandable for both human and AI collaborators (Kent Beck, "Augmented Coding"). Mutation testing validates test quality beyond coverage metrics (TDD+Mutation: 63.3% vs TDD-alone: 39.4% mutation coverage).

See [frameworks](references/frameworks.md) for language-specific test runners, property testing, coverage, and mutation tools. See [examples](references/examples.md) for brief TDD cycle patterns per language.

---

## When to Apply

- New features with clear requirements (both inside-out and outside-in approaches valid)
- Bug fixes -- write a failing test that proves the bug before fixing
- Refactoring -- ensure coverage exists before restructuring
- API contract enforcement -- test the interface, not internals
- Property-based invariants -- complement example tests with PBT
- Legacy code -- add characterization tests before modifying (Michael Feathers pattern)

## When NOT to Apply

- Exploratory prototyping or spike research
- One-off scripts, data migrations, generated code
- Purely visual UI layout work (prefer visual regression testing)
- Highly experimental algorithmic research (but PBT still helps)
- Throwaway code with <1 week lifespan

---

## Anti-patterns

- **Test-last**: Writing tests after implementation defeats the design benefit
- **Testing implementation details**: Tests should verify behavior, not internal structure -- breaks refactoring confidence
- **Over-mocking**: Testing the mocks instead of the code; mock external I/O, not core logic
- **Skipping RED**: Tests that never fail aren't tests -- they verify nothing
- **100% coverage obsession**: Coverage does not equal quality. Mutation testing exposes gaps coverage cannot
- **Refactoring on RED**: Never restructure with failing tests
- **Test-induced architectural damage**: Letting mock boundaries dictate design
- **Snapshot bloat**: Approval-style tests without curation become a maintenance burden

---

## Two Schools (decision guidance, not prescription)

- **Inside-Out (Classic/Detroit)**: Start with unit tests for the smallest pieces, build upward. Minimizes mocks. Best for well-understood domains, algorithms, utility functions.
- **Outside-In (London/Mockist)**: Start with an acceptance test for user-facing behavior, use mocks to discover interfaces. Best for layered systems, APIs, microservices.
- **Pragmatic teams use both depending on context.** Neither is superior.

## Test Doubles Hierarchy

- **Stubs**: Return predefined data; verify outcomes (state-based)
- **Mocks**: Verify interactions/calls were made (behavior-based)
- **Fakes**: Working implementations (e.g., in-memory database)
- **Spies**: Record calls while using real behavior
- **Rule**: Mock external dependencies. Never mock core domain logic.
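To make the stub/mock distinction concrete, here is a minimal sketch in Python using the standard-library `unittest.mock`. The `charge_order` function, the gateway collaborator, and the amounts are hypothetical illustrations, not part of this skill's tooling; note that only the external gateway is doubled while the domain logic runs for real.

```python
from unittest.mock import Mock

def charge_order(gateway, amount):
    """Core domain logic under test: validate, then delegate I/O to the gateway."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(amount)

def test_charge_order_returns_gateway_receipt():
    # Stub: returns predefined data; the assertion checks the outcome (state-based).
    gateway = Mock()
    gateway.charge.return_value = {"status": "ok", "amount": 25}
    assert charge_order(gateway, 25)["status"] == "ok"

def test_charge_order_calls_gateway_once():
    # Mock: verifies the interaction itself (behavior-based).
    gateway = Mock()
    charge_order(gateway, 25)
    gateway.charge.assert_called_once_with(25)
```

The validation rule inside `charge_order` is deliberately left unmocked, per the rule above: double the external dependency, never the core domain logic.

---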
## Workflow (language-neutral)

1. **CREATE** -- Write failing tests: error cases -> edge cases -> happy paths -> property tests
2. **RED** -- Run tests, verify all fail. If any pass, the test is wrong or the behavior already exists.
3. **GREEN** -- Minimal code to pass. No extras, no optimization, no cleanup.
4. **REFACTOR** -- Clean up while green. Separate structural changes from behavioral ones (Tidy First). Re-run tests after every change.

A minimal Python sketch of one full cycle appears in the appendix at the end of this file.

---

## Constitutional Rules (Non-Negotiable)

1. **Design Tests First**: Plan all test cases from requirements before implementation; write each test iteratively in the RED-GREEN-REFACTOR loop
2. **RED Before GREEN**: Each new test MUST fail before you write the implementation for it
3. **Error Cases First**: Implement error handling before success paths
4. **One Test at a Time**: Write one failing test, make it pass, refactor, then add the next test
5. **Refactor Only on GREEN**: Never refactor with failing tests

## Validation Gates

| Gate | Pass Criteria | Blocking |
|------|---------------|----------|
| Tests Created | Test files exist for target module | Yes |
| RED State | All new tests fail before implementation | Yes |
| GREEN State | All tests pass after implementation | Yes |
| Coverage | >= 80% line coverage | No |
| Mutation | Mutation score reviewed (no threshold enforced) | No |

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | TDD cycle complete, all tests pass |
| 11 | No test framework detected |
| 12 | Test compilation failed |
| 13 | Tests not failing (RED state invalid) |
| 14 | Tests fail after implementation (GREEN not achieved) |
| 15 | Tests fail after refactor (regression) |

---

## Reference materials (mattpocock/skills tdd fold-in)

- `references/mocking.md` -- when to mock vs use real implementations; trade-offs.
- `references/interface-design.md` -- interface shape and depth in TDD context.
- `references/refactoring.md` -- refactor step discipline post-green.
- `references/deep-modules.md` -- Ousterhout's deep-module heuristic applied to TDD.
- `references/tests.md` -- what counts as a real-bug test vs ceremony.

These reference docs are MIT-licensed (see `/home/alpha/.claude/claude/skills/LICENSES.md` for full attribution).
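---

## Appendix: minimal cycle sketch (Python)

A hedged sketch of one CREATE -> RED -> GREEN cycle plus the example/property pairing described above, assuming pytest and Hypothesis are installed. `split_bill` and the amounts are hypothetical, and the test and implementation are collapsed into one snippet purely for brevity; in practice they live in separate files, and the error-case test is run and seen to fail before the function exists.

```python
import pytest
from hypothesis import given, strategies as st

# CREATE/RED: error case first. Written before the implementation existed,
# at which point the run fails (NameError) -- that failure is the RED state.
def test_split_bill_rejects_zero_people():
    with pytest.raises(ValueError):
        split_bill(total_cents=1000, people=0)

# GREEN: minimal implementation -- just enough to pass, no extras.
def split_bill(total_cents, people):
    if people <= 0:
        raise ValueError("people must be positive")
    share, remainder = divmod(total_cents, people)
    return [share + 1] * remainder + [share] * (people - remainder)

# Next iteration: happy-path example test, paired with a property test.
def test_split_bill_splits_evenly():
    assert split_bill(total_cents=900, people=3) == [300, 300, 300]

@given(st.integers(min_value=1, max_value=10**6), st.integers(min_value=1, max_value=100))
def test_split_bill_preserves_total(total_cents, people):
    # Invariant: no money is created or lost, regardless of inputs.
    assert sum(split_bill(total_cents, people)) == total_cents
```

The example test pins a concrete expectation (regression protection); the property test searches the input space for edge cases such as uneven divisions, matching the pairing described in the Modern insight note above.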