---
name: tdd
description: |
  Universal test-driven development methodology and test design craft.
  Background knowledge for writing behavior-focused tests, mocking strategy,
  vertical slicing, and tracer bullet execution. Auto-loaded when writing
  tests during implementation, review feedback, or standalone test authoring.
  Triggers: writing tests, test design, mocking strategy, test quality,
  behavior testing, integration testing, test-first, red-green-refactor.
---

# Test-Driven Development Methodology

Universal principles for writing good tests. These apply regardless of framework, language, or repo — discover and follow your repo's specific testing tools, commands, and conventions from its project configuration, contributor guides, existing test files, CI/CD config, and any repo-level AI skills or rules.

---

## Core principle

Tests verify **behavior through public interfaces**, not implementation details. Code can change entirely; tests shouldn't break unless behavior changed.

**Good tests** exercise real code paths through public APIs. They describe _what_ the system does, not _how_. A good test reads like a specification — "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.

**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed.

---

## Good vs bad tests

```typescript
// GOOD: Tests observable behavior through the interface
test("created user is retrievable", async () => {
  const user = await createUser({ name: "Alice" });
  const retrieved = await getUser(user.id);
  expect(retrieved.name).toBe("Alice");
});

// BAD: Bypasses interface to verify via implementation detail
test("createUser saves to database", async () => {
  await createUser({ name: "Alice" });
  const row = await db.query("SELECT * FROM users WHERE name = ?", ["Alice"]);
  expect(row).toBeDefined();
});
```

```typescript
// GOOD: Tests the outcome
test("user can checkout with valid cart", async () => {
  const cart = createCart();
  cart.add(product);
  const result = await checkout(cart, paymentMethod);
  expect(result.status).toBe("confirmed");
});

// BAD: Tests the mechanism
test("checkout calls paymentService.process", async () => {
  const mockPayment = vi.spyOn(paymentService, "process");
  await checkout(cart, payment);
  expect(mockPayment).toHaveBeenCalledWith(cart.total);
});
```

**Red flags** for implementation-coupled tests:

- Mocking internal collaborators (not external boundaries)
- Testing private methods
- Asserting on call counts or call order
- Test breaks on refactor without behavior change
- Test name describes HOW ("calls X") not WHAT ("user can Y")

---

## Mocking philosophy

Mock at **system boundaries** only:

- External APIs (payment, email, third-party services)
- Databases (sometimes — prefer a test database when available)
- Time / randomness
- File system (sometimes)

**Do not mock** your own modules, internal collaborators, or anything you control. If you need to mock an internal module to test something, that's a signal the interface design needs work — not that you need more mocks.
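For example, here is a minimal sketch of a boundary mock done this way (the `PaymentClient` interface, the `checkout` function, and the Vitest imports are illustrative assumptions, not fixed conventions): the external payment provider is replaced with a hand-rolled fake, and the test still asserts only on the observable outcome.

```typescript
import { test, expect } from "vitest"; // assumes Vitest, as in the examples above

// Hypothetical boundary: the payment provider lives outside our system.
interface PaymentClient {
  charge(amountCents: number): Promise<{ status: "confirmed" | "declined" }>;
}

// System under test: it receives the boundary dependency rather than constructing it.
async function checkout(order: { totalCents: number }, payments: PaymentClient) {
  return payments.charge(order.totalCents);
}

// A hand-rolled fake at the boundary. No spying, no call-count assertions.
const approvingPayments: PaymentClient = {
  charge: async () => ({ status: "confirmed" }),
};

test("checkout confirms when the provider approves the charge", async () => {
  const result = await checkout({ totalCents: 4200 }, approvingPayments);
  expect(result.status).toBe("confirmed"); // outcome, not mechanism
});
```

Because the fake implements the same interface as the real client, the test still exercises the real `checkout` code path end-to-end.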
**Design for mockability at boundaries:**

```typescript
// Easy to mock — dependency is injected
function processPayment(order, paymentClient) {
  return paymentClient.charge(order.total);
}

// Hard to mock — dependency is internal
function processPayment(order) {
  const client = new StripeClient(process.env.STRIPE_KEY);
  return client.charge(order.total);
}
```

---

## Vertical slicing (anti-pattern: horizontal slices)

**DO NOT** write all tests first, then all implementation. This is "horizontal slicing" — it produces weak tests:

- Tests written in bulk test _imagined_ behavior, not _actual_ behavior
- You end up testing the _shape_ of things rather than user-facing behavior
- Tests become insensitive to real changes — they pass when behavior breaks, fail when behavior is fine
- You commit to test structure before understanding the implementation

**Correct approach** — vertical slices via tracer bullets:

```
WRONG (horizontal):
  RED: test1, test2, test3, test4, test5
  GREEN: impl1, impl2, impl3, impl4, impl5

RIGHT (vertical):
  RED→GREEN: test1→impl1
  RED→GREEN: test2→impl2
  RED→GREEN: test3→impl3
```

Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.

---

## Tracer bullet

Start with ONE test that proves ONE path works end-to-end:

```
RED: Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
```

This is your tracer bullet — it proves the path works before you invest in breadth. Then add behaviors incrementally, one test at a time.

Rules:

- One test at a time
- Only enough code to pass the current test
- Don't anticipate future tests
- Keep tests focused on observable behavior
- **Never refactor while RED** — get to GREEN first

---

## Per-cycle checklist

```
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive an internal refactor
[ ] Code is minimal for this test
[ ] No speculative features added
```
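As a closing illustration, here is one possible first tracer-bullet cycle measured against that checklist (the in-memory `createUser`/`getUser` pair is a hypothetical minimal implementation, just enough to turn the first RED test green):

```typescript
import { test, expect } from "vitest"; // assumes Vitest, matching earlier examples

// GREEN: the smallest implementation the first test forces into existence.
// No validation, no persistence, nothing the test didn't demand.
let nextId = 1;
const users = new Map<number, { id: number; name: string }>();

async function createUser(input: { name: string }) {
  const user = { id: nextId++, name: input.name };
  users.set(user.id, user);
  return user;
}

async function getUser(id: number) {
  return users.get(id);
}

// RED came first: this test failed before the code above existed,
// and the code above is the minimal change that makes it pass.
test("created user is retrievable", async () => {
  const user = await createUser({ name: "Alice" });
  const retrieved = await getUser(user.id);
  expect(retrieved?.name).toBe("Alice");
});
```

The next cycle would add exactly one more behavior (say, rejecting an empty name), written only after this slice is GREEN.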