---
name: test-driven-development
description: Use when implementing any logic, fixing any bug, or changing any behavior. Use when you need to prove that code works, when a bug report arrives, or when you're about to modify existing functionality.
---

# Test-Driven Development

## Overview

Write a failing test before writing the code that makes it pass. For bug fixes, reproduce the bug with a test before attempting a fix. Tests are proof — "seems right" is not done.

A codebase with good tests is an AI agent's superpower; a codebase without tests is a liability.

## When to Use

- Implementing any new logic or behavior
- Fixing any bug (the Prove-It Pattern)
- Modifying existing functionality
- Adding edge case handling
- Any change that could break existing behavior

**When NOT to use:** Pure configuration changes, documentation updates, or static content changes that have no behavioral impact.

## The TDD Cycle

```
     RED                    GREEN                  REFACTOR
Write a test         Write minimal code         Clean up the
that fails     ──→   to make it pass      ──→   implementation   ──→   (repeat)
     │                      │                        │
     ▼                      ▼                        ▼
Test FAILS            Test PASSES             Tests still PASS
```

### Step 1: RED — Write a Failing Test

Write the test first. It must fail. A test that passes immediately proves nothing.

```typescript
// RED: This test fails because createTask doesn't exist yet
describe('TaskService', () => {
  it('creates a task with title and default status', async () => {
    const task = await taskService.createTask({ title: 'Buy groceries' });

    expect(task.id).toBeDefined();
    expect(task.title).toBe('Buy groceries');
    expect(task.status).toBe('pending');
    expect(task.createdAt).toBeInstanceOf(Date);
  });
});
```

### Step 2: GREEN — Make It Pass

Write the minimum code to make the test pass. Don't over-engineer:

```typescript
// GREEN: Minimal implementation
export async function createTask(input: { title: string }): Promise<Task> {
  const task = {
    id: generateId(),
    title: input.title,
    status: 'pending' as const,
    createdAt: new Date(),
  };
  await db.tasks.insert(task);
  return task;
}
```

### Step 3: REFACTOR — Clean Up

With tests green, improve the code without changing behavior:

- Extract shared logic
- Improve naming
- Remove duplication
- Optimize if necessary

Run tests after every refactor step to confirm nothing broke.
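For example, once the createTask test above is green, task construction can be pulled out for reuse. A minimal sketch of a behavior-preserving extraction (the buildTask helper is hypothetical, introduced only for illustration):

```typescript
// REFACTOR: extract task construction; behavior is unchanged,
// so the RED/GREEN test above must still pass.
function buildTask(title: string): Task {
  return {
    id: generateId(),
    title,
    status: 'pending' as const,
    createdAt: new Date(),
  };
}

export async function createTask(input: { title: string }): Promise<Task> {
  const task = buildTask(input.title);
  await db.tasks.insert(task);
  return task;
}
```

Run the suite immediately after the extraction; if it stays green, the refactor preserved behavior.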
## The Prove-It Pattern (Bug Fixes)

When a bug is reported, **do not start by trying to fix it.** Start by writing a test that reproduces it.

```
Bug report arrives
        │
        ▼
Write a test that demonstrates the bug
        │
        ▼
Test FAILS (confirming the bug exists)
        │
        ▼
Implement the fix
        │
        ▼
Test PASSES (proving the fix works)
        │
        ▼
Run full test suite (no regressions)
```

**Example:**

```typescript
// Bug: "Completing a task doesn't update the completedAt timestamp"

// Step 1: Write the reproduction test (it should FAIL)
it('sets completedAt when task is completed', async () => {
  const task = await taskService.createTask({ title: 'Test' });
  const completed = await taskService.completeTask(task.id);

  expect(completed.status).toBe('completed');
  expect(completed.completedAt).toBeInstanceOf(Date); // This fails → bug confirmed
});

// Step 2: Fix the bug
export async function completeTask(id: string): Promise<Task> {
  return db.tasks.update(id, {
    status: 'completed',
    completedAt: new Date(), // This was missing
  });
}

// Step 3: Test passes → bug fixed, regression guarded
```

## Test Hierarchy

Write tests at the lowest level that captures the behavior:

```
Level 1: Unit Tests
├── Pure logic, isolated functions
├── No I/O, no network, no database
├── Fast: milliseconds per test
├── Live next to source code: Button.test.tsx
└── Example: "formatDate returns ISO string for valid dates"

Level 2: Integration Tests
├── Component interactions, API boundaries
├── May use test database, mock services
├── Medium speed: seconds per test
├── Live next to source code or in tests/
└── Example: "POST /api/tasks creates a task in the database"

Level 3: End-to-End Tests
├── Full user flows through the real application
├── Uses browser automation (Playwright, Cypress)
├── Slow: 10+ seconds per test
├── Live in e2e/ or specs/ directory
└── Example: "User can register, create a task, and mark it complete"
```

**Decision guide:**

```
Is it pure logic with no side effects?                  → Unit test
Does it cross a boundary (API, database, file system)?  → Integration test
Is it a critical user flow that must work end-to-end?   → E2E test (limit these to critical paths)
```

## Writing Good Tests

### Test Behavior, Not Implementation

```typescript
// Good: Tests what the function does
it('returns tasks sorted by creation date, newest first', async () => {
  const tasks = await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });

  expect(tasks[0].createdAt.getTime())
    .toBeGreaterThan(tasks[1].createdAt.getTime());
});

// Bad: Tests how the function works internally
it('calls db.query with ORDER BY created_at DESC', async () => {
  await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });

  expect(db.query).toHaveBeenCalledWith(
    expect.stringContaining('ORDER BY created_at DESC')
  );
});
```

### Use the Arrange-Act-Assert Pattern

```typescript
it('marks overdue tasks when deadline has passed', () => {
  // Arrange: Set up the test scenario
  const task = createTask({
    title: 'Test',
    deadline: new Date('2025-01-01'),
  });

  // Act: Perform the action being tested
  const result = checkOverdue(task, new Date('2025-01-02'));

  // Assert: Verify the outcome
  expect(result.isOverdue).toBe(true);
});
```

### One Assertion Per Concept

```typescript
// Good: Each test verifies one behavior
it('rejects empty titles', () => { ... });
it('trims whitespace from titles', () => { ... });
it('enforces maximum title length', () => { ... });

// Bad: Everything in one test
it('validates titles correctly', () => {
  expect(() => createTask({ title: '' })).toThrow();
  expect(createTask({ title: ' hello ' }).title).toBe('hello');
  expect(() => createTask({ title: 'a'.repeat(256) })).toThrow();
});
```

### Name Tests Descriptively

```typescript
// Good: Reads like a specification
describe('TaskService.completeTask', () => {
  it('sets status to completed and records timestamp', ...);
  it('throws NotFoundError for non-existent task', ...);
  it('is idempotent — completing an already-completed task is a no-op', ...);
  it('sends notification to task assignee', ...);
});

// Bad: Vague names
describe('TaskService', () => {
  it('works', ...);
  it('handles errors', ...);
  it('test 3', ...);
});
```

## Test Anti-Patterns to Avoid

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Testing implementation details | Tests break when refactoring even if behavior is unchanged | Test inputs and outputs, not internal structure |
| Flaky tests (timing, order-dependent) | Erode trust in the test suite | Use deterministic assertions, isolate test state |
| Testing framework code | Wastes time testing third-party behavior | Only test YOUR code |
| Snapshot abuse | Large snapshots nobody reviews, break on any change | Use snapshots sparingly and review every change |
| No test isolation | Tests pass individually but fail together | Each test sets up and tears down its own state (see the sketch below) |
| Mocking everything | Tests pass but production breaks | Mock at boundaries, not everywhere |
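The isolation fix in the table is mostly mechanical: reset shared state around every test so tests pass in any order. A minimal sketch, assuming a Jest-style runner and a hypothetical db.tasks.clear() helper for the test database:

```typescript
describe('TaskService.listTasks', () => {
  // Isolation: every test starts from a known-empty table, so these
  // tests pass individually, together, and in any order.
  beforeEach(async () => {
    await db.tasks.clear(); // hypothetical: empties the test database table
  });

  it('returns no tasks for an empty database', async () => {
    const tasks = await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });
    expect(tasks).toHaveLength(0);
  });

  it('returns the single task that was created', async () => {
    await createTask({ title: 'Only task' });
    const tasks = await listTasks({ sortBy: 'createdAt', sortOrder: 'desc' });
    expect(tasks).toHaveLength(1);
  });
});
```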
## When to Use Subagents for Testing

For complex bug fixes, spawn a subagent to write the reproduction test:

```
Main agent: "Spawn a subagent to write a test that reproduces this bug:
[bug description]. The test should fail with the current code."

Subagent: Writes the reproduction test

Main agent: Verifies the test fails, then implements the fix,
then verifies the test passes.
```

This separation ensures the test is written without knowledge of the fix, making it more robust.

## Common Rationalizations

| Rationalization | Reality |
|---|---|
| "I'll write tests after the code works" | You won't. And tests written after the fact test implementation, not behavior. |
| "This is too simple to test" | Simple code gets complicated. The test documents the expected behavior. |
| "Tests slow me down" | Tests slow you down now. They speed you up every time you change the code later. |
| "I tested it manually" | Manual testing doesn't persist. Tomorrow's change might break it with no way to know. |
| "The code is self-explanatory" | Tests ARE the specification. They document what the code should do, not what it does. |
| "It's just a prototype" | Prototypes become production code. Tests from day one prevent the "test debt" crisis. |

## Red Flags

- Writing code without any corresponding tests
- Tests that pass on the first run (they may not be testing what you think)
- "All tests pass" but no tests were actually run
- Bug fixes without reproduction tests
- Tests that test framework behavior instead of application behavior
- Test names that don't describe the expected behavior
- Skipping tests to make the suite pass

## Verification

After completing any implementation:

- [ ] Every new behavior has a corresponding test
- [ ] All tests pass: `npm test`
- [ ] Bug fixes include a reproduction test that failed before the fix
- [ ] Test names describe the behavior being verified
- [ ] No tests were skipped or disabled
- [ ] Coverage hasn't decreased (if tracked)
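For the last check, one low-friction option is to let the test runner enforce a coverage floor so a decrease fails the suite. A sketch assuming Jest, which matches the `npm test` examples here (the numbers are placeholders, not recommendations; adjust for Vitest or another runner):

```typescript
// jest.config.ts — fails the suite when coverage drops below the floor
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    global: { branches: 80, functions: 80, lines: 80, statements: 80 },
  },
};

export default config;
```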