---
name: writing-quality-tests
description: Guide for writing robust, high-signal automated tests. Use when asking "How do I test this effectively?", fixing flaky tests, refactoring test suites, or deciding between unit/integration/E2E strategies. Distinct from TDD (process); this skill focuses on test quality and architecture.
license: Complete terms in LICENSE.txt
metadata:
  author: eder
  version: "1.0"
---

# Writing Quality Tests

## Overview

High-signal tests prove behavior, not implementation. While TDD focuses on the *process* of writing tests first, this skill focuses on the *artifact*: making tests stable, explicit, and valuable long-term.

**Core rule:** If a test is nondeterministic or tied to internals, it is debt. Fix it.

## When to Use

- **New features**: "I need to add tests for this new API endpoint/component."
- **Bug fixes**: "Help me write a regression test for this bug before fixing it."
- **Flaky tests**: "This test fails randomly on CI. How do I make it deterministic?"
- **Refactoring**: "I want to refactor this legacy code but the tests are brittle. How do I improve them first?"
- **Slow tests**: "The test suite takes too long. How can I speed it up or mock dependencies effectively?"
- **Test Design**: "Should I use a unit test or integration test for this logic?"
- **Review**: "Check these tests for maintainability, coverage, and clarity."
- Not for manual exploratory testing or load/perf-only work.

## Non-Negotiables

- Deterministic: same input -> same result; no hidden dependence on time, network, or randomness
- Behavioral oracles: assertions map to business behavior or contracts, never incidental internals
- Minimal coupling: tests fail for product changes, not helper refactors
- Focused scope: one behavior per test; isolated fixtures; clear names
- Fast feedback: prefer fast layers; cache expensive setup; parallelize safely

## Workflow

0) Prove it fails: capture the regression input or wished-for case and watch the test fail first (or reproduce the bug) before code changes.
1) Clarify behavior: preconditions, action, postconditions, invariants. Capture regression input if fixing a bug.
2) Pick level: unit for pure logic; contract for external calls; integration for seams; E2E only to prove flows or contracts end-to-end.
3) Design oracle: assert outputs, state, events, and invariants; avoid implementation details or transient UI.
4) Shape fixtures: use builders/factories; avoid globals; randomize with seeds only when helpful and log the seed.
5) Write the test: AAA (arrange-act-assert) or GWT (given-when-then); table-driven for variants; property-based for algebraic invariants.
6) Validate: run the focused test first, then the suite. If flaky, hunt nondeterminism (time, randomness, order, network) and remove it.
7) Document intent: the name states the behavior; the failure message points to the expected contract.

## Patterns to Prefer

- Boundary and mutation pairs: min/max/empty/null plus one mutated variation to prove invariants.
- Table-driven cases: enumerate input/output pairs to avoid duplicate tests and improve diffability.
- Property-based checks: algebraic properties (idempotence, reversibility, ordering), round-trips, monotonic counters.
- Contracts at seams: mock at boundaries you own; for third-party calls, pin to contract tests or recorded fixtures.
- Guarded goldens: only for complex structured output; require explicit review of golden updates.

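As one possible illustration of the table-driven and property-style patterns above, the sketch below uses only the standard library (`unittest` and a seeded `random.Random`). The `clamp` function, the `ClampTest` class, and the chosen range are hypothetical stand-ins, not part of any real project; in a pytest or Hypothesis code base the same idea would normally use those tools' own parametrization and generators.

```python
import random
import unittest


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    # Each row is (case name, input value, expected result) for the range [0, 10].
    # The rows pair each boundary with a value just outside it.
    CASES = [
        ("below minimum", -1, 0),
        ("at minimum", 0, 0),
        ("inside range", 5, 5),
        ("at maximum", 10, 10),
        ("above maximum", 11, 10),
    ]

    def test_clamps_to_range(self):
        for name, value, expected in self.CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(name):
                self.assertEqual(clamp(value, 0, 10), expected)

    def test_clamp_is_idempotent(self):
        seed = 12345  # fixed seed keeps the run reproducible
        rng = random.Random(seed)
        for _ in range(200):
            value = rng.randint(-1000, 1000)
            once = clamp(value, 0, 10)
            # The failure message carries the seed and input needed to reproduce.
            self.assertEqual(clamp(once, 0, 10), once, f"seed={seed} value={value}")
```

Adding a new variant is a one-row change to `CASES`, which keeps diffs small, and the idempotence check states an invariant rather than a list of hand-picked outputs.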
## Coverage Strategy

- Coverage is opt-in: never run coverage unless explicitly requested by the user in the current session (e.g., "improve coverage on file X to Y%"). PM/teammate/CI pressure does not override this rule.
- Pyramid discipline: many unit tests, fewer integration, very few E2E. Use E2E to prove cross-service flow or UI contract.
- Change-based coverage: every test should fail without the code change and pass with it; capture the regression input/output.
- Critical paths first: auth, billing, migrations, data loss, irreversible actions. Add invariants that must never be violated.
- Data and time: cover time zones, DST, leap years, ordering, pagination, idempotency, and retry semantics.
- Observability: log seeds for randomized tests; emit diagnostics on failure (inputs, seed, environment versions).

**Example (explicit coverage request):** User: "improve coverage on file X to 80%". Run targeted coverage for that file only, add behavior-driven tests to hit missing branches, and avoid coverage runs outside that request.

```bash
pytest --cov=path/to/file.py --cov-report=term-missing
```

## Flake Prevention

- Remove time races: replace sleeps with waits on explicit conditions; freeze or inject clocks.
- Isolate state: fresh fixtures per test; unique temp dirs/ports; clean databases; no shared singletons.
- Control randomness: seed RNG, capture seed in failure output, prefer deterministic builders.
- Network and IO: stub external calls; if unavoidable, record/replay; set tight timeouts and retries with jitter disabled in tests.
- Parallel safety: ensure fixtures are parallel-safe or mark tests serial; avoid global mutable state.

## Review Checklist

- Name states behavior and level (e.g., "adds item to cart (integration)").
- Single reason to fail; assertions map to user-visible behavior or contract.
- Fixtures minimal and local; builders hide irrelevant details; no shared hidden state.
- Negative and edge cases present; regression case for the original bug captured.
- Tests run quickly; slow/expensive flows justified and focused.

## Hygiene (adaptable patterns)

- Structure: Given–When–Then or AAA so intent is obvious.
- Hypothesis: fix generators or code instead of suppressing health checks; log seeds for repro.
- Async correctness: use real async paths/fakes; don’t hide missing awaits with sync doubles.
- Assertion scope: assert behavior/contract fields; avoid brittle full-payload snapshots unless testing a contract.
- Coverage as health, not blocker: focus on low-coverage behavior-heavy files; be pragmatic with legacy or infra-heavy areas.

### Marks (for selective runs)

- unit: isolated logic with external deps mocked
- contract/integration: cross-component seams with real wiring or adapters
- async: true async paths; avoid sync fakes masking awaits
- property: Hypothesis-based invariants in dedicated property files
- slow: >1s or real infra; justify and keep focused

## Common Anti-Patterns

- Brittle UI or text snapshots without intent; prefer semantic assertions or scoped snapshots.
- Over-mocking internals; mocking within the module under test; asserting call order that is not part of the contract.
- Sleep-based waits; reliance on wall-clock time; unseeded randomness.
- Combined scenarios covering multiple behaviors in one test; global fixtures that hide setup.
- Golden files updated blindly; tests that assert logging implementation rather than outcomes.
- Running coverage by default instead of waiting for explicit coverage requests.

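To make the sleep and wall-clock anti-pattern concrete, here is a minimal sketch of clock injection in plain Python. `Session` and `FakeClock` are hypothetical names invented for this example; the assumption is that the code under test accepts a callable returning the current time rather than calling `datetime.now()` itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class Session:
    """Hypothetical class under test: a session that expires after a TTL."""

    created_at: datetime
    ttl: timedelta
    now: Callable[[], datetime]  # injected clock; the class never reads wall-clock time

    def is_expired(self) -> bool:
        return self.now() - self.created_at >= self.ttl


class FakeClock:
    """Test double whose time only moves when the test advances it."""

    def __init__(self, start: datetime) -> None:
        self.current = start

    def __call__(self) -> datetime:
        return self.current

    def advance(self, delta: timedelta) -> None:
        self.current += delta


def test_session_expires_exactly_at_ttl():
    # Arrange: a fixed, timezone-aware start time instead of "whenever the test runs".
    clock = FakeClock(datetime(2024, 3, 10, 1, 59, tzinfo=timezone.utc))
    session = Session(created_at=clock(), ttl=timedelta(minutes=30), now=clock)

    # Act and assert: one second before the TTL the session is still valid.
    clock.advance(timedelta(minutes=29, seconds=59))
    assert not session.is_expired()

    # Boundary: exactly at the TTL the session is expired. No sleep is needed.
    clock.advance(timedelta(seconds=1))
    assert session.is_expired()
```

Because the test controls time, it runs in microseconds and gives the same result on any machine, which is the property the flake-prevention rules are after.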
## Red Flags - Stop and Fix

- Tests pass or fail intermittently
- Assertions tied to private methods or call order instead of observable behavior
- Unseeded randomness, sleeps instead of explicit waits, or shared mutable fixtures
- Golden updates accepted without review of intent
- A test never failed before the code change
- Running coverage without the user explicitly asking
- Running coverage due to PM/teammate/CI pressure
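
The sketch below shows one way to keep assertions on observable behavior rather than on private methods or call order. `SignupService` and `InMemoryMailer` are hypothetical, and the mailer is assumed to be a boundary the team owns, so a small hand-written fake stands in for it.

```python
class InMemoryMailer:
    """Hand-written fake for an owned boundary; records what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))


class SignupService:
    """Hypothetical service under test."""

    def __init__(self, mailer):
        self._mailer = mailer
        self._users = set()

    def register(self, email):
        key = self._normalize(email)
        if key in self._users:
            raise ValueError("email already registered")
        self._users.add(key)
        self._mailer.send(key, "Welcome")

    def is_registered(self, email):
        return self._normalize(email) in self._users

    def _normalize(self, email):
        return email.strip().lower()


def test_register_sends_one_welcome_email():
    mailer = InMemoryMailer()
    service = SignupService(mailer)

    service.register("  Ada@Example.com ")

    # Observable outcomes only: no assertions on _users or _normalize.
    assert service.is_registered("ada@example.com")
    assert mailer.sent == [("ada@example.com", "Welcome")]


def test_duplicate_registration_is_rejected():
    mailer = InMemoryMailer()
    service = SignupService(mailer)
    service.register("ada@example.com")

    try:
        service.register("ADA@example.com")
    except ValueError:
        pass
    else:
        raise AssertionError("expected duplicate registration to be rejected")

    # The contract: a rejected signup sends no second email.
    assert len(mailer.sent) == 1
```

Both tests would keep passing if `_normalize` were inlined or `_users` changed to a different data structure, so they fail only when the signup behavior itself changes.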