---
name: test-driven-development
type: workflow
description: "Forces the strict Red-Green-Refactor development cycle. Requires one failing behavior test through a public interface before each implementation slice."
argument-hint: "[task-description-or-file]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Bash
context: fork
effort: 3
agent: lead-programmer
when_to_use: "When implementing ANY approved feature, bug fix, algorithmic logic, or when executing atomic tasks generated by the planning phase."
---

# Test-Driven Development (TDD)

## Purpose

TDD acts as the ultimate truth verifier. It prevents Agents from blindly inserting implementation code based on assumptions. By strictly enforcing the **RED-GREEN-REFACTOR** cycle, we guarantee that all code changes are demonstrably functional and logically sound before moving on to the next task.

Good tests verify observable behavior through the system's public interface: commands, APIs, UI interactions, exported functions, or other supported entry points. They must not couple to private helpers, internal call order, temporary data structures, or implementation-only collaborators. A refactor that preserves behavior should not break the test.

TDD here is strictly vertical. One behavior slice, one failing test, one minimum implementation. Bulk test authoring is not "getting ahead"; it is horizontal slicing and usually produces brittle tests for imagined behavior.

---

## Workflow (Strict Process)

**ABSOLUTE DIRECTIVE**: Under NO circumstances are you allowed to write the final implementation logic before a test has explicitly failed in the terminal environment.

**PRECONDITION**: TDD is an execution workflow, not a planning bypass.
Before writing the RED test, confirm the applicable `using-sdd` pre-code gate is satisfied:

- Fast Gate for a tiny explicit fix
- Spec Gate for an approved spec task
- Plan Gate for an approved implementation-plan task
- Override Gate when the user explicitly accepted skipping the normal gate

If no gate is satisfied, stop and request the missing approval or clarification.

### 1. RED Phase (Write the Failing Test)

- Read the specification for the current atomic task.
- Select exactly one behavior for the next vertical slice. Do not write a batch of tests for imagined future behavior.
- Create or modify the appropriate test file (e.g., `*.test.ts`, `*_test.go`, `test_*.py`) to assert the expected behavior through the public interface.
- **EXECUTE**: Use the terminal to run the test suite.
- **GATE**: Capture the terminal output. You MUST see the test **FAIL**. If the test passes immediately, your test is flawed or asserting the wrong state.

### 2. GREEN Phase (Make it Pass)

- Write the absolute *minimum* amount of source code necessary to fulfill the test requirement. Do not over-engineer or add "nice-to-have" features yet.
- Do not anticipate later slices. The only acceptable implementation is the smallest change that makes the current behavior test pass.
- **EXECUTE**: Use the terminal to run the test suite again.
- **GATE**: Capture the terminal output. You MUST see the test **PASS**.

### 3. REFACTOR Phase (Clean it Up)

- Optimize your implementation code (clean up naming, extract duplicate logic, improve types/interfaces) without altering behavior.
- **EXECUTE**: Run the test suite a final time to ensure your refactoring did not break the green state.

### 4. Repeat Vertically

Repeat RED -> GREEN -> REFACTOR for the next behavior only after the previous slice is green. This tracer-bullet pattern keeps each test tied to real behavior learned from the previous implementation step.
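As a minimal sketch of one vertical slice, assuming a pytest-style runner and a hypothetical `slugify` task (the function name and asserted behavior are illustrative, not taken from any real spec):

```python
# Hypothetical slice: "slugify lowercases a title and replaces spaces with
# hyphens". All names below are illustrative assumptions, not from a spec.

# GREEN-phase implementation: the smallest change that satisfies the single
# behavior asserted below. Punctuation, Unicode, and collapsing repeated
# separators each wait for their own future RED -> GREEN cycle.
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-")

# RED-phase test, written and run FIRST: before slugify() exists, the runner
# reports a failure (NameError/ImportError), which is the required RED
# evidence captured before any implementation is written.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"
```

In a real slice the test lives in its own file (e.g., a hypothetical `test_slugify.py`) and the RED run happens before `slugify` is defined; both halves are shown together here only for compactness.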
Do not use horizontal slicing:

```text
Wrong: RED all tests -> GREEN all implementation
Right: RED one behavior -> GREEN one behavior -> REFACTOR -> repeat
```

If you feel tempted to queue several tests first, stop and restate the next single behavior in one sentence. If that sentence contains "and", you are probably slicing too wide.

---

## Output Format

Your final response to the user MUST contain the evidence of your execution in the following format:

````markdown
# 🧪 TDD Execution: [Feature/Task Name]

## 🔴 RED Phase Evidence

- **Test added**: `[test_function_name]`
- **Terminal Output**:

```bash
[Insert the failing test output trace here - do not fake this]
```

## 🟢 GREEN Phase Evidence

- **Implementation**: Modified `[file_name]`
- **Terminal Output**:

```bash
[Insert the passing test output trace here]
```

**Status**: ✅ Code verified. Awaiting next command.
````

---

## Anti-Rationalizations

Be aware of the lazy logic that an Agent typically uses to skip testing. If a thought on the left occurs, YOU MUST apply the rebuttal on the right:

| Excuse (Agent's Lazy Rationalization) | Rebuttal & Correct Action |
| :--- | :--- |
| "I'll just write the implementation first because it's super easy, and add the tests after." | **CRITICAL REJECTION.** Writing code before a test guarantees brittle code and untested edge cases. You MUST write the failing test first. No exceptions. |
| "The project doesn't seem to have a test framework set up, so I will just skip TDD and do a console.log." | **REJECTED.** `console.log` is not an automated test. If a test runner (Jest, PyTest, Go test, Vitest) is missing, STOP working and ask the User for permission to install and configure one immediately. |
| "I don't need to run the final terminal command to check if it's green. I can read the logic and I know it works." | **REJECTED.** Code does not exist until the compiler/test-runner proves it exists. You cannot hallucinate terminal outputs. Run the actual command. |
| "Writing the failing test is safe before approval." | **REJECTED.** RED tests are execution. Confirm the pre-code gate before editing tests. |
| "I'll write all RED tests first, then implement them together." | **REJECTED.** Bulk RED is horizontal slicing. Write one behavior test, make it pass, then continue. |
| "I know future slices already, so I can pre-bake their tests now." | **REJECTED.** Future slices change as the current implementation teaches you where the real seam is. Stay on one behavior. |
| "Testing private helpers is faster than going through the public API." | **REJECTED.** Private-helper tests couple to implementation. Test the supported interface unless no behavior seam exists; if no seam exists, report that design problem. |

---

## Verification Gates

Do not conclude the turn unless you have:

- [ ] Confirmed the `using-sdd` pre-code gate before editing tests or production code.
- [ ] Executed the test runner during the **RED** phase and captured a definitive FAILING state log.
- [ ] Executed the test runner during the **GREEN** phase and captured a PASSING log.
- [ ] Verified the test exercises observable behavior through a public interface.
- [ ] Avoided horizontal slicing by completing only one RED -> GREEN behavior cycle at a time.
- [ ] Left no queued RED tests for future slices in the worktree.
- [ ] Displayed both terminal outputs (Red & Green) as indisputable evidence to the User.

---

## Edge Cases

- **[UI / Canvas Rendering]**: If the task involves purely visual UI/CSS updates that cannot be easily unit tested, fall back to structured DOM tests (e.g., Testing Library query checks) or explicit manual testing steps listed for the user. Do not skip testing entirely.

---

## Related Skills

- `planning-and-task-breakdown` — TDD is the engine used to execute the tasks planned by this skill.
- `spec-driven-development` — TDD uses the spec as its single source of truth for what tests need to be written.