---
name: tdd
description: >
  Use when implementing any feature or fix outside code-forge workflow —
  enforces Red-Green-Refactor cycle with mandatory test-first discipline.
  Supports three modes: (1) Standalone — ad-hoc TDD for quick changes,
  (2) Auto-Analysis — runs the full spec-forge:test-cases analysis pipeline
  (project profile, four-layer deep scan, multi-dimensional coverage) then
  implements all cases via TDD, (3) Driven — reads a test-cases.md document
  and implements each case via TDD.
---

# Code Forge — TDD

## ⚡ Execution Entry Point

@../shared/execution-entrypoint.md

**For this skill:** start at **Step 0 (Determine Mode)**. If you catch yourself about to say "falling back to manual TDD", STOP and go to the indicated step.

---

Test-Driven Development enforcement for any code change, with built-in code analysis.

## When to Use

- Writing code outside of the code-forge:impl workflow (ad-hoc changes, quick fixes)
- Adding tests to existing code that lacks coverage
- Implementing test cases from a spec-forge:test-cases document
- Any new feature, bug fix, or behavior change that needs test discipline

**Note:** code-forge:impl already enforces TDD internally. This skill is for work outside that workflow.

## Iron Law

**NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST.**

No exceptions. Not for "simple" changes. Not for "obvious" fixes. Not when under time pressure.

## Design Discipline (Mandatory, Applies to All Modes)

Before any RED step in any mode (Driven, Auto-Analysis, Standalone), you MUST run the design-first pre-code checklist: read the relevant subsystem, consider the optimal interface-stable design, and decide whether to refactor existing code or add new code. The TDD cycle's REFACTOR step is the second enforcement point — every GREEN must be followed by a real consideration of whether the new code is the cleanest shape, not the most expedient one.

This discipline is the upstream defense against patch-first development. Read it once at the start of every session and again whenever you are tempted to add a new branch / wrapper / parallel module instead of refactoring:

@../shared/design-first.md

## Step 0: Determine Mode

Examine the arguments to determine the operating mode:

| Argument | Mode | Behavior |
|----------|------|----------|
| `@docs/.../test-cases.md` | **Driven Mode** | Read test cases document, implement each case via TDD |
| `@src/services/payment.ts` or specific code path | **Auto-Analysis Mode** | Analyze specified code, design cases, implement via TDD |
| Feature name or description (e.g., "add validation to user signup") | **Standalone Mode** | Classic TDD — write tests for the described change |
| Empty (no arguments) | **Auto-Analysis Mode** | Scan project for coverage gaps, design cases, implement |

## Driven Mode — Implementing from Test Cases Document

When a `test-cases.md` file is provided (generated by `spec-forge:test-cases`):

### D.1 Read and Parse

1. Read the test-cases document
2. Extract all test cases (TC-MODULE-NNN entries)
3. Identify which are already implemented (check existing test files for matching test names/IDs)
4. Filter to unimplemented cases
5. Sort by priority: P0 first, then P1, then P2
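As a concrete reference for steps 2-3, here is a minimal sketch of the extraction and matching pass, assuming IDs follow the `TC-MODULE-NNN` pattern shown in this document (e.g. `TC-AUTH-001`) and that implemented tests embed the ID in their test name. The `glob` dev dependency, file paths, and helper names are illustrative assumptions, not part of the skill.

```typescript
// Sketch only: pull TC IDs out of test-cases.md and mark those already
// implemented by scanning existing test files for the same ID in a test name.
import { readFileSync } from "node:fs";
import { globSync } from "glob"; // assumed dev dependency

const TC_ID = /TC-[A-Z]+-\d{3}/g; // assumes the TC-MODULE-NNN convention

function extractCaseIds(testCasesPath: string): string[] {
  const doc = readFileSync(testCasesPath, "utf8");
  return [...new Set(doc.match(TC_ID) ?? [])];
}

function findImplementedIds(testGlob = "**/*.{test,spec}.{ts,js}"): Set<string> {
  const implemented = new Set<string>();
  for (const file of globSync(testGlob, { ignore: "node_modules/**" })) {
    for (const id of readFileSync(file, "utf8").match(TC_ID) ?? []) {
      implemented.add(id);
    }
  }
  return implemented;
}

// Remaining = all cases minus those whose ID already appears in a test file.
const all = extractCaseIds("docs/feature/test-cases.md"); // illustrative path
const done = findImplementedIds();
const remaining = all.filter((id) => !done.has(id));
```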
### D.2 Confirm Scope

Present to user:

- "{N} test cases found, {X} already implemented, {Y} remaining"
- "Implement: (A) all remaining, (B) P0 only, (C) P0 + P1, (D) specific modules?"

### D.3 Implement Loop

For each test case in scope:

1. **Read the case** — extract preconditions, steps, expected result, not-expected, test infra
2. **Set up test infrastructure** — if Test Infra is "Real DB", configure TestContainers or test database; if "Mock external", set up mock for the specified third-party service; if "Temp dir", create temp directory; if "N/A", no special setup needed
3. **RED** — Write a failing test that matches the case specification (a sketch of this mapping appears after D.5)
   - Test name should include TC ID: `test("TC-AUTH-001: create user with valid email returns 201", ...)`
   - Preconditions become test setup (seed data, auth context, config)
   - Steps become test actions
   - Expected result becomes assertions
   - "Not Expected" becomes negative assertions where applicable
4. **VERIFY RED** — Run the test, confirm it fails correctly
5. **GREEN** — Write minimal production code to make it pass (if the code already exists and passes, the case was already covered — note and move on)
6. **VERIFY GREEN** — Run all tests, confirm clean pass
7. **REFACTOR** — Clean up if needed
8. **Report** — "TC-AUTH-001: DONE (test passes, implementation complete)"

### D.4 Progress Tracking

After each case, display progress:

```
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Create user with duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Create user with invalid email format (P1) — next
[ ] TC-AUTH-030: Create user should NOT bypass email validation (P1)
```

Ask: "Continue with next case, skip, or pause?"

### D.5 Completion

After all cases are implemented:

- Run full test suite
- Report: total cases implemented, all tests passing, coverage change
- Suggest: "Run `/code-forge:verify` to confirm completion"
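To make the RED step in D.3 concrete, here is one way a case could translate into a test, assuming a Vitest-style runner with supertest as a dev dependency. `createApp`, `resetTestDatabase`, the `/users` route, and the response fields are hypothetical project details used only to illustrate the mapping.

```typescript
// Hedged sketch: one possible shape for TC-AUTH-001 as a test.
// createApp and resetTestDatabase are assumed project helpers.
import { beforeEach, expect, test } from "vitest";
import request from "supertest";
import { createApp } from "../src/app";            // assumed app factory
import { resetTestDatabase } from "./helpers/db";  // assumed test helper

beforeEach(async () => {
  await resetTestDatabase(); // precondition: empty users table
});

test("TC-AUTH-001: create user with valid email returns 201", async () => {
  const app = createApp(); // precondition: default config, no auth context

  // Steps → test actions
  const res = await request(app)
    .post("/users")
    .send({ email: "ada@example.com", name: "Ada" });

  // Expected result → assertions
  expect(res.status).toBe(201);
  expect(res.body.email).toBe("ada@example.com");

  // "Not Expected" → negative assertion (e.g. no secret echoed back)
  expect(res.body).not.toHaveProperty("passwordHash");
});
```

The same mapping applies unchanged to the A.2 loop in Auto-Analysis Mode.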
## Auto-Analysis Mode — Scan and Test

When the user points to code or says "help me write tests" without a test-cases document.

**Iron Rule: Auto-Analysis uses the SAME full analysis as spec-forge:test-cases.** The only difference is the output — auto-analysis produces code directly instead of a document. The analysis quality must be identical.

### A.0 Full Test Case Analysis (same as spec-forge:test-cases Steps 1-5)

Execute the **complete** spec-forge:test-cases analysis pipeline. The full workflow is defined in the spec-forge test-cases-generation skill (`spec-forge/skills/test-cases-generation/SKILL.md`). The essential steps are inlined below — follow them exactly:

**Step 1 — Determine Input Mode and Project Profile**

- Determine input mode: Scan / Code / Spec (from user arguments)
- Detect project profile: Web API / CLI Tool / Frontend App / AI Agent / Data Pipeline / Function Library / SDK
- Detect: has database? has auth? has external APIs?
- Output explicit profile with rationale

**Step 2 — Deep Scan and Extract (Four Layers)**

- Use the language-specific deep extraction strategy (Python / TypeScript / Go / Rust / Java)
- Extract ALL testable units across four layers:
  - **Interface**: public API surface, type contracts, trait/interface boundaries
  - **Logic**: branch paths, error chains, state transitions, validation rules
  - **Architecture**: module structure, layer boundaries, dependency direction
  - **Relationships**: call graphs, data flow, event propagation, trait implementations
- Scan existing tests to determine current coverage
- Run scan verification (file coverage ≥ 90%, module tree completeness, re-export tracking)
- Produce structured Functional Inventory with all four layers per unit

**Step 3 — Detect Dimensions**

- Apply built-in dimensions: Coverage Depth (L1/L2/L3)
- Auto-detect project-specific dimensions (Auth Context, Trigger Mode, Input Source, etc.)

**Step 4 — Confirm Scope with User**

- Present Profile confirmation: "I detected this as **{profile}** ({rationale}). Correct?"
- Present scope: "{N} testable units, {X} have tests, {Y} don't. Cover: all / uncovered / specific modules?"
- Present detected dimensions for confirmation
- Ask for business rules the code can't reveal

**Step 5 — Design Test Cases**

- Per testable unit, generate at minimum:
  - 1 × L1 (Happy Path)
  - 2 × L2 (Boundary + Error)
  - 1 × L3 (Negative — what should NOT happen)
- For interacting units: pairwise combination cases (L1 both succeed + L2 one fails + L3 should not combine)
- For auto-detected dimensions: cross with coverage depth using risk-based prioritization
- Apply conditional sections:
  - Data Integrity cases (only if project has database)
  - Security cases (only if project has auth or handles user input)
  - Performance cases (only if project has latency/throughput requirements)
- Assign priorities: P0 (critical path) / P1 (important) / P2 (nice-to-have)
- Build coverage matrix internally: unit × depth, dimension coverage, combination coverage, gap analysis

**Result**: A complete set of structured test cases in memory — identical quality to what spec-forge:test-cases would produce as a document.

### A.1 Optional: Save Test Cases Document

Ask the user: "Save the test cases as `docs/{feature}/test-cases.md` for future reference? (Y/n)"

- If yes → write the document following the spec-forge:test-cases template, then continue to A.2
- If no → keep in memory, continue to A.2

### A.2 Implement via TDD

For each test case (sorted by priority: P0 → P1 → P2), follow the same TDD cycle as Driven Mode:

1. **Read the case** — extract preconditions, steps, expected result, not-expected, test infra
2. **Set up test infrastructure** — if Test Infra is "Real DB", configure TestContainers; if "Mock external", set up mock; if "Temp dir", create temp directory; if "N/A", no setup
3. **RED** — Write a failing test matching the case specification
   - Test name should include TC ID: `test("TC-AUTH-001: create user with valid email returns 201", ...)`
   - Preconditions → test setup; Steps → test actions; Expected result → assertions; Not Expected → negative assertions
4. **VERIFY RED** — Run the test, confirm it fails correctly
5. **GREEN** — Write minimal production code to make it pass
6. **VERIFY GREEN** — Run all tests, confirm clean pass
7. **REFACTOR** — Clean up if needed
8. **Report** — "TC-AUTH-001: DONE"

### A.3 Progress Tracking

After each case, display progress (same format as Driven Mode D.4):

```
TDD Progress: {completed}/{total} ({percentage}%)
[x] TC-AUTH-001: Create user with valid email (P0) — DONE
[x] TC-AUTH-010: Duplicate email rejected (P0) — DONE
[ ] TC-AUTH-011: Invalid email format (P1) — next
```

Ask: "Continue with next case, skip, or pause?"

### A.4 Completion

After all cases are implemented:

- Run full test suite
- Report: total cases implemented, all tests passing, coverage statistics
- If test cases were saved to file (A.1), report the file path
- Suggest: "Run `/code-forge:verify` to confirm completion"

## Standalone Mode — Classic TDD

For ad-hoc changes where the user describes what to build or fix:

### Workflow

```
RED (write failing test) → VERIFY RED → GREEN (minimal code) → VERIFY GREEN → REFACTOR → REPEAT
```

### The Cycle

Complete each phase fully before moving to the next.
#### 1. RED — Write a Failing Test

- One minimal test showing the desired behavior
- Clear, descriptive test name
- Use real code, not mocks (unless unavoidable: external APIs, time-dependent behavior)
- One behavior per test

#### 2. VERIFY RED — Watch It Fail (MANDATORY)

Run the test. Confirm:

- It **fails** (not errors)
- The failure message describes the missing behavior
- It fails because the feature is missing, not because of typos or setup issues

If the test **passes**: you're testing existing behavior. Rewrite the test.
If the test **errors**: fix the error, re-run until it fails correctly.

#### 3. GREEN — Write Minimal Code

- Simplest code that makes the test pass
- No extra features, no "while I'm here" improvements
- No premature abstractions — three similar lines beat a premature helper

#### 4. VERIFY GREEN — Watch It Pass (MANDATORY)

Run the test. Confirm:

- The new test **passes**
- All other tests **still pass**
- Output is clean (no warnings, no errors)

If the new test **fails**: fix the code, not the test.
If other tests **fail**: fix them now, before proceeding.

#### 5. REFACTOR — Clean Up (After Green Only)

- Remove duplication, improve names, extract helpers
- Keep all tests green throughout
- Do NOT add new behavior during refactor
- **Apply design-first here.** Look at the GREEN code in the context of the surrounding subsystem. Did GREEN push you toward a patch (new branch, new wrapper, parallel path) when a small refactor of the existing structure would have been cleaner? If so, refactor now while the test is green. The REFACTOR step is not optional — it is the moment design-first is enforced inside the TDD cycle. See `@../shared/design-first.md` for the discipline.

#### 6. REPEAT

Go back to Step 1 for the next behavior.

## Decision Rules

| If you're about to... | Instead... | Why |
|----------------------|-----------|-----|
| Write production code without a test | STOP — write the failing test first | Tests written after implementation pass immediately and prove nothing |
| Skip testing because the change is "simple" | Write the test — it will be quick if it's truly simple | Simple code has the sneakiest bugs (off-by-one, null edge cases) |
| Apply a quick fix without a regression test | Write the test, then fix | Untested fixes become permanent regressions |
| Continue with code that wasn't test-driven | Consider rewriting test-first | Sunk cost — untested code is a liability regardless of time spent |

## External Dependency Rules

**Principle: test your own dependencies for real; only mock what you don't control.**

| Your Dependency | Approach |
|----------------|----------|
| Own database | Real DB (TestContainers, test instance, SQLite in-memory) |
| Own file system | Real temp directory |
| Own cache / message queue | Real (TestContainers, embedded) |
| External third-party API | Mock / stub acceptable |
| Non-deterministic input (time, random) | Inject controlled values |

- For projects **without** a database or external I/O: most tests are pure unit tests — no special infra needed
- For write operations: verify state after the operation (DB query / file check / store assertion)
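A hedged sketch of two rows from the table above: a real temp directory for the code's own file system, and an injected clock instead of mocking `Date`. The `saveReport` function, its signature, and the file naming are illustrative assumptions, not an existing project API; only Node's standard library and a Vitest-style runner are assumed.

```typescript
// Sketch: real temp dir + injected clock; verify state after the write.
import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { afterEach, beforeEach, expect, test } from "vitest";

// Hypothetical unit under test: writes a dated report file.
async function saveReport(dir: string, body: string, now: () => Date): Promise<string> {
  const file = join(dir, `report-${now().toISOString().slice(0, 10)}.txt`);
  await writeFile(file, body, "utf8");
  return file;
}

let dir: string;
beforeEach(async () => { dir = await mkdtemp(join(tmpdir(), "report-")); }); // real temp dir
afterEach(async () => { await rm(dir, { recursive: true, force: true }); });

test("saveReport writes a file named after the injected date", async () => {
  const fixedNow = () => new Date("2024-05-01T00:00:00Z"); // controlled time, no Date mock
  const file = await saveReport(dir, "ok", fixedNow);

  expect(file.endsWith("report-2024-05-01.txt")).toBe(true);
  expect(await readFile(file, "utf8")).toBe("ok"); // verify state after the write
});
```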
## Example

```
Task: Add isPalindrome(str) function

1. RED — Write test:

   test("isPalindrome returns true for 'racecar'", () => {
     expect(isPalindrome("racecar")).toBe(true);
   });

2. VERIFY RED — Run: npm test
   ✗ ReferenceError: isPalindrome is not defined   ← fails correctly

3. GREEN — Minimal code:

   function isPalindrome(str) {
     return str === str.split("").reverse().join("");
   }

4. VERIFY GREEN — Run: npm test
   ✓ isPalindrome returns true for 'racecar'   ← passes
   42 passed, 0 failed

5. REFACTOR — (no changes needed)

6. REPEAT — next test: edge case with empty string
```

**Test runner detection:** Check `package.json` scripts, `pytest.ini`, `Cargo.toml`, `go.mod`, or `Makefile` for the project's test command before starting the cycle. Use the same runner consistently.

## Verification Checklist

Before claiming work is complete:

- [ ] Every new function/method has at least one test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for the expected reason (not errors)
- [ ] Wrote minimal code per test (no gold-plating)
- [ ] All tests pass with clean output
- [ ] Edge cases and error paths covered
- [ ] Mocks used only when unavoidable
- [ ] Database-touching tests use a real database

## When Stuck

- Test too complicated to write → design is too complicated, simplify first
- Must mock everything → code is too coupled, extract interfaces
- Test setup is huge → extract test helpers or fixtures (see the sketch below)
- No test-cases document and unsure what to test → run `/spec-forge:test-cases` first to generate a structured case set
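For the "test setup is huge" case, one common remedy is a small test-data builder so each test states only the fields it cares about. A minimal sketch; the `User` shape and its defaults are illustrative, not a real project type.

```typescript
// Hedged sketch of a test-data builder that shrinks repetitive setup.
interface User {
  id: string;
  email: string;
  role: "admin" | "member";
  createdAt: Date;
}

// Each test overrides only what matters to it; everything else gets a sane default.
export function makeUser(overrides: Partial<User> = {}): User {
  return {
    id: "u-1",
    email: "user@example.com",
    role: "member",
    createdAt: new Date("2024-01-01T00:00:00Z"),
    ...overrides,
  };
}

// Usage in a test: const admin = makeUser({ role: "admin" });
```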