---
name: testing-philosophy
user-invocable: false
description: "Apply testing philosophy: test behavior not implementation, minimize mocks, AAA structure, coverage for confidence not percentage. Use when writing tests, reviewing test quality, discussing TDD, or evaluating test strategies."
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
---

# Testing Philosophy

Universal principles for writing effective tests. They are language-agnostic and apply across testing frameworks and languages.

## Test Thinking

Before writing tests, commit to a clear approach:

- **What is the ONE behavior this test suite must verify?** If you can't answer clearly, the production code needs refactoring.
- **Behavior or implementation?** Tests should survive refactoring. If you're testing how, not what, you're coupling to implementation.
- **What failure would make you distrust this code?** Test that scenario first.

**CRITICAL**: You are capable of identifying subtle behavioral contracts that most tests miss. Don't write generic happy-path tests. Find the edge cases that matter, the error handling that fails silently, the state transitions that corrupt data.

## Core Principle

**Test behavior, not implementation.**

Tests should verify what code does, not how it does it. Implementation can change; behavior should remain stable.
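
A minimal TypeScript sketch of the distinction. The `Cart` class and its methods are hypothetical, invented here for illustration; the point is that the test touches only the public contract (`add`, `total`) and never the private storage.

```typescript
import { strictEqual } from "node:assert";

// Hypothetical example class: the public contract is add() and total().
class Cart {
  // Internal storage. It could become an array or a running sum
  // without changing anything callers can observe.
  private readonly lines = new Map<string, { priceCents: number; qty: number }>();

  add(sku: string, priceCents: number, qty = 1): void {
    const existing = this.lines.get(sku);
    this.lines.set(sku, { priceCents, qty: (existing ? existing.qty : 0) + qty });
  }

  total(): number {
    let sum = 0;
    this.lines.forEach((line) => {
      sum += line.priceCents * line.qty;
    });
    return sum;
  }
}

// Behavior test: asserts only on what total() returns.
function shouldSumLineTotalsWhenSameSkuAddedTwice(): void {
  const cart = new Cart();
  cart.add("apple", 50);
  cart.add("apple", 50, 2);
  cart.add("pear", 120);

  strictEqual(cart.total(), 270); // 3 * 50 + 120
}

shouldSumLineTotalsWhenSameSkuAddedTwice();
```

An implementation-coupled version would reach into `lines` and assert on the `Map` contents. It would fail as soon as the storage changed, even though `total()` still returned the same value.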

## Test-First Workflow (Canon TDD)

**When to TDD**:
- ✅ Core domain logic, algorithms, business rules
- ✅ Well-defined requirements
- ✅ Production code (not prototypes)
- ✅ AI-assisted development (tests guard against hallucinations)
- ❌ UI prototyping, exploration, fuzzy requirements

**Canon TDD Pattern** (Kent Beck, 2023):
1. **Write test list** - enumerate all scenarios (happy, edge, error)
2. **Turn one into failing test** - focus on interface design
3. **Make it pass** - minimal implementation
4. **Refactor** - improve design while green
5. **Repeat** until list empty
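
One cycle of the list above might look like the following sketch. The `slugify` function and the scenarios are hypothetical, and the test list is kept as plain strings only to make the steps visible; a TODO file or comments would serve equally well.

```typescript
import { strictEqual } from "node:assert";

// Step 1: the test list (scenarios, not yet tests).
const testList: string[] = [
  "should lowercase the input",
  "should replace spaces with hyphens",
  "should return an empty string when input is empty",
];

// Step 2: turn the FIRST item into a concrete test.
// Written before slugify exists, it fails until step 3 is done.
function shouldLowercaseTheInput(): void {
  strictEqual(slugify("Hello"), "hello");
}

// Step 3: minimal implementation, only enough to pass the test above.
// Hyphenation and empty-input handling wait for their own cycles.
function slugify(input: string): string {
  return input.toLowerCase();
}

shouldLowercaseTheInput();

// Step 5: one item is done; the rest stay on the list for later cycles.
const remaining = testList.slice(1);
```

Step 4 (refactor) is a no-op here because the implementation is a single line; it becomes relevant once later items force more code into `slugify`.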

**AI-Assisted TDD**:
- AI generates test list from requirements
- AI implements code to pass tests (human reviews)
- Tests are specifications in executable form
- Commit tests separately before implementation

**NEVER test**:
- Private method internals (test through public API)
- Mock call counts unless the count IS the behavior
- Internal state unless state IS the contract
- Framework code (trust the framework)

---

## What and When to Test

### Testing Boundaries

**Test at module boundaries (public API):**

**Unit Tests:**
- Pure functions (deterministic input → output)
- Isolated modules (single unit of behavior)
- Business logic (calculations, validations, transformations)

**Integration Tests:**
- Module interactions (how components work together)
- API contracts (request/response shapes)
- Workflows (multi-step processes)

**E2E Tests:**
- Critical user journeys (end-to-end flows)
- Happy path + critical errors only
- Not every feature needs E2E

### What to Test

✅ **Always test:**
- Public API (what callers depend on)
- Business logic (critical rules, calculations)
- Error handling (failure modes)
- Edge cases (boundaries, null, empty)

❌ **Don't test:**
- Private implementation details
- Third-party libraries (already tested)
- Simple getters/setters (unless they have logic)
- Framework code (trust the framework)

### TDD: Sometimes, When It Adds Value

**Use TDD for:**
- Complex logic (algorithms, business rules)
- Well-defined requirements (you know what to build)
- Critical functionality (high-risk code)

**Skip TDD for:**
- UI prototyping (exploring design)
- Exploratory work (discovering requirements)
- Simple CRUD (straightforward logic)

**When in doubt:** Write the test afterward if TDD feels like overhead.

### Coverage Philosophy: Meaningful > Percentage

**Don't chase coverage percentages.**

✅ **Good coverage:**
- Critical paths tested (happy + error cases)
- Edge cases covered (boundary values, null, empty)
- Confidence in refactoring

❌ **Bad coverage:**
- High % but testing wrong things
- Testing implementation details
- Brittle tests that break on refactor

**Remember:** Untested code is legacy code. But 100% coverage doesn't guarantee quality.

---

## Mocking and Test Structure

### Mocking Philosophy: Minimize Mocks

**Prefer real objects when fast and deterministic.**

**When to Mock:**

**ALWAYS mock:**
- External APIs, third-party services
- Network calls
- Non-deterministic behavior (time, randomness)

**USUALLY mock:**
- Databases (or use in-memory/test DB for integration)
- File system (depends on speed needs)

**SOMETIMES mock:**
- Slow operations (if they slow tests significantly)

**NEVER mock:**
- Your own domain logic (test it directly)
- Simple data structures
- Internal collaborators (modules in your own codebase)

**Red flag:** >3 mocks in a test suggests coupling to implementation.
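
For non-deterministic inputs such as time, one common alternative to a mocking library is to pass the source in as a parameter. The sketch below assumes a hypothetical `isExpired` function and `Clock` type; neither comes from a real codebase.

```typescript
import { strictEqual } from "node:assert";

// A clock is just a function returning the current time.
type Clock = () => Date;

// Hypothetical domain function. Production callers omit `now` and get the
// real clock; tests pass a fixed one so results never depend on wall time.
function isExpired(expiresAt: Date, now: Clock = () => new Date()): boolean {
  return now().getTime() >= expiresAt.getTime();
}

const fixedNow: Clock = () => new Date("2024-01-15T12:00:00Z");

strictEqual(isExpired(new Date("2024-01-15T11:59:59Z"), fixedNow), true);
strictEqual(isExpired(new Date("2024-01-15T12:00:01Z"), fixedNow), false);
// Boundary: expiry at exactly "now" counts as expired under this sketch's rule.
strictEqual(isExpired(new Date("2024-01-15T12:00:00Z"), fixedNow), true);
```

The domain logic itself runs for real; only the time source is replaced.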

### Internal vs External: The Mock Boundary

**NEVER mock internal collaborators:**
- Functions/modules in your own codebase (`@/lib/*`, `./utils/*`, `../../convex/lib/*`)
- Custom hooks (`@/hooks/*`)
- Domain logic helpers

**WHY:** Mocking internal code:
- Hides integration bugs between modules
- Couples tests to implementation details
- Creates false confidence ("tests pass but prod breaks")
- Requires test updates when internals change

**INSTEAD:** Mock only at system boundaries:
- Third-party libraries (framework, SDK)
- External APIs (network calls)
- Browser/runtime APIs
- Non-deterministic sources

**Pattern:** If the mock path starts with `@/`, `./`, or `../`, stop and reconsider.
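
A sketch of where the boundary falls in practice. `PaymentGateway`, `applyDiscount`, and `checkout` are hypothetical names, and the gateway is kept synchronous for brevity. The external service is replaced by a hand-rolled fake, while the internal collaborator runs for real.

```typescript
import { deepStrictEqual, strictEqual } from "node:assert";

// System boundary: a hypothetical client for an external payment provider.
interface PaymentGateway {
  charge(amountCents: number): { ok: boolean };
}

// Internal collaborator: plain domain logic, exercised for real in the test.
function applyDiscount(subtotalCents: number, percent: number): number {
  return Math.round(subtotalCents * (1 - percent / 100));
}

function checkout(
  subtotalCents: number,
  discountPercent: number,
  gateway: PaymentGateway,
): "paid" | "declined" {
  const amount = applyDiscount(subtotalCents, discountPercent);
  return gateway.charge(amount).ok ? "paid" : "declined";
}

// Fake for the boundary only; it records the amounts it was asked to charge.
function fakeGateway(ok: boolean) {
  const charged: number[] = [];
  const gateway: PaymentGateway = {
    charge(amountCents) {
      charged.push(amountCents);
      return { ok };
    },
  };
  return { gateway, charged };
}

function shouldChargeDiscountedAmountWhenDiscountApplies(): void {
  const { gateway, charged } = fakeGateway(true);

  const status = checkout(10000, 15, gateway);

  strictEqual(status, "paid");
  deepStrictEqual(charged, [8500]);
}

shouldChargeDiscountedAmountWhenDiscountApplies();
```

Because `applyDiscount` is not mocked, a bug in the discount calculation or in how `checkout` wires it up would fail this test.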

### Test Structure: AAA (Arrange, Act, Assert)

**Clear three-phase structure:**

```
// Arrange: Set up test data, mocks, preconditions
setup test data
configure mocks
establish preconditions

// Act: Execute the behavior being tested
result = performAction()

// Assert: Verify expected outcome
verify result matches expectation
```

**Guidelines:**
- Visual separation between phases (blank lines)
- One logical assertion per test (multiple assert statements are fine when they verify the same behavior)
- Keep Arrange simple (complex setup is a signal to simplify the production code)
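
A concrete TypeScript instance of the three phases. The `calculateTotal` implementation and the `LineItem` shape are assumptions made for this example, and the test is a plain function so the sketch runs without a test framework.

```typescript
import { strictEqual } from "node:assert";

interface LineItem {
  priceCents: number;
  quantity: number;
}

// Hypothetical function under test.
function calculateTotal(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.priceCents * item.quantity, 0);
}

function shouldReturnTotalWhenAllItemsValid(): void {
  // Arrange
  const items: LineItem[] = [
    { priceCents: 250, quantity: 2 },
    { priceCents: 100, quantity: 1 },
  ];

  // Act
  const total = calculateTotal(items);

  // Assert
  strictEqual(total, 600);
}

shouldReturnTotalWhenAllItemsValid();
```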

### Test Naming: Descriptive Sentences

**Pattern:** "should [expected behavior] when [condition]"

**Examples:**
- "should return total when all items valid"
- "should throw error when user not found"
- "should return zero when cart is empty"
- "should retry on network failure"

**Guidelines:**
- Be specific about what's being tested
- State expected behavior clearly
- Don't use a "test" prefix where the framework doesn't require one (it is redundant in test files)
- Read like documentation

---

## Exclusions Are Last Resort

Before adding to any exclusion list, exhaust these options:

### Coverage Exclusions

Don't exclude files from coverage as a first response to CI failure.

**Before excluding, try:**
1. Can the function be exported and tested with mocked dependencies?
2. Can code be refactored to separate testable logic from runtime infrastructure?
3. Is there a pattern in the codebase for testing similar code?

**Example:** `convex/http.ts` webhook handlers were initially excluded but are now tested by:
- Exporting handler functions
- Creating a mock `ActionCtx` with `vi.fn()` for `runMutation`
- Testing business logic separately from the `httpAction` wrapper

**When exclusion IS appropriate:**
- Truly untestable runtime code (cryptographic verification with no seams)
- Auto-generated code that's not maintained
- Third-party code copied into repo (test at integration level instead)

Always add a comment explaining WHY the exclusion is necessary.
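
The "export the handler, stub the context" approach can be sketched as follows. Everything here is hypothetical: `MutationRunner` is a stand-in for whatever slice of the runtime context the handler uses (it is not the real Convex `ActionCtx` type), and a hand-rolled recording stub takes the place of `vi.fn()` so the sketch runs without a test framework.

```typescript
import { deepStrictEqual, strictEqual } from "node:assert";

// Hypothetical minimal shape of the context the handler depends on.
interface MutationRunner {
  runMutation(name: string, args: Record<string, unknown>): void;
}

// Exported business logic with no HTTP wrapper around it,
// so a test can call it directly.
export function handleWebhookEvent(
  ctx: MutationRunner,
  event: { type: string; userId: string },
): { status: number } {
  if (event.type !== "user.created") {
    return { status: 400 };
  }
  ctx.runMutation("users:create", { userId: event.userId });
  return { status: 200 };
}

// Recording stub: captures each call instead of hitting a real backend.
function recordingCtx() {
  const calls: Array<{ name: string; args: Record<string, unknown> }> = [];
  const ctx: MutationRunner = {
    runMutation(name, args) {
      calls.push({ name, args });
    },
  };
  return { ctx, calls };
}

function shouldCreateUserWhenUserCreatedEventArrives(): void {
  const { ctx, calls } = recordingCtx();

  const response = handleWebhookEvent(ctx, { type: "user.created", userId: "u1" });

  strictEqual(response.status, 200);
  deepStrictEqual(calls, [{ name: "users:create", args: { userId: "u1" } }]);
}

shouldCreateUserWhenUserCreatedEventArrives();
```

The thin wrapper that adapts an HTTP request to `handleWebhookEvent` stays small enough that excluding it, if still necessary, hides very little logic.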

### ESLint Disables

- Fix the code if possible
- Prefer `eslint-disable-next-line` over file-wide disables
- Always add explanation comment: `// eslint-disable-next-line rule-name -- reason`
- Consider: is the linter telling you something important?

### TypeScript Assertions

- `as any` hides type errors; fix the underlying type issue
- `@ts-expect-error` requires explanation comment
- `@ts-ignore` should be avoided (use `@ts-expect-error` instead)
- Consider: is the type system revealing a design flaw?
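
As an illustration of a justified `@ts-expect-error`, the sketch below deliberately passes a wrongly typed argument to check a runtime guard. `parsePort` is a hypothetical function invented for this example.

```typescript
import { strictEqual, throws } from "node:assert";

// Hypothetical function with a runtime guard for callers that bypass the types
// (plain JavaScript callers, parsed JSON, and so on).
function parsePort(value: number): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value < 1 || value > 65535) {
    throw new RangeError(`invalid port: ${String(value)}`);
  }
  return value;
}

strictEqual(parsePort(443), 443);

// The directive states WHY a type error is expected on the next line.
// If the signature later widens to accept strings, the compiler reports
// the directive as unused, which prompts a cleanup.
throws(() => {
  // @ts-expect-error -- deliberately passing a string to exercise the runtime guard
  parsePort("8080");
}, RangeError);
```

An `as any` cast at the same spot would silence the error permanently and would not flag itself if the signature changed.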

### Test Skips

- `.skip()` is for temporary WIP, not permanent exclusion
- Flaky tests should be fixed, not skipped
- If a test can't pass, the code or test needs refactoring

---

## Test Quality and Smells

### Test Smells (Anti-Patterns)

❌ **Too many mocks** (>3 mocks)
- Indicates coupling to implementation
- Test becomes brittle, changes with internals

❌ **Brittle assertions**
- Asserting exact strings when substring would work
- Asserting exact ordering when order doesn't matter
- Over-specifying expected values
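
A sketch of looser assertions that still pin down the behavior. `describeMissingUser` and `activeTags` are hypothetical functions; `match` and `deepStrictEqual` come from Node's built-in `assert` module.

```typescript
import { deepStrictEqual, match } from "node:assert";

// Hypothetical functions under test.
function describeMissingUser(id: string): string {
  return `User ${id} was not found. Check the ID and try again.`;
}

function activeTags(): string[] {
  return ["beta", "admin", "staff"];
}

// Pattern match instead of exact string: rewording the hint sentence
// does not break the test, but dropping the ID or the reason does.
match(describeMissingUser("u42"), /u42.*not found/);

// Order-insensitive comparison: sort a copy before comparing, because
// the contract here is "which tags", not "in which order".
deepStrictEqual(activeTags().slice().sort(), ["admin", "beta", "staff"]);
```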

❌ **Unclear test intent**
- Can't tell what's being tested from reading test
- Vague test names
- Hidden test logic in helpers

❌ **Testing implementation details**
- Testing private methods directly
- Asserting internal state
- Mocking your own classes

❌ **Flaky tests**
- Pass/fail randomly
- Timing dependencies
- Shared mutable state between tests

❌ **Slow tests**
- Unit tests >100ms
- Integration tests >1s
- Slows development feedback loop

❌ **One giant test**
- Testing multiple behaviors in single test
- Hard to understand failures
- Breaks single responsibility for tests

❌ **Magic values**
- Unexplained constants
- Unclear test data
- No context for why values matter

### Test Quality Priorities

**Readable > DRY**

Tests are documentation. Clarity trumps reuse.

✅ **Good test practices:**
- Each test understandable in isolation
- Explicit setup visible in test
- Some duplication okay for clarity
- Descriptive variable names (even if verbose)

❌ **Over-DRY tests:**
- Extract helpers that hide test logic
- Shared setup that obscures what's being tested
- Reuse at expense of clarity

**Test length:**
- No hard limit
- Unit tests: Usually <50 lines
- Integration tests: Can be longer (setup needed)
- Long test? Ask whether it is testing too much, or whether the production code needs simplifying.

### Edge Cases: Required for Critical Paths

**Always test critical functionality:**
- Boundary values (0, 1, -1, max, min)
- Empty inputs (empty array, empty string, null)
- Error conditions (invalid input, missing data, failures)

**Ask:** "What could break? What do users depend on?"
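
Boundary values lend themselves to a small table of cases. The sketch below assumes a hypothetical `clampPercent` function; the table covers each boundary and its immediate neighbors, and an invalid input is checked separately.

```typescript
import { strictEqual, throws } from "node:assert";

// Hypothetical function: clamps a percentage into the range [0, 100].
function clampPercent(value: number): number {
  if (Number.isNaN(value)) {
    throw new RangeError("percent must be a number");
  }
  return Math.min(100, Math.max(0, value));
}

// Each row is [input, expected]: just below, at, and just above each boundary.
const boundaryCases: Array<[number, number]> = [
  [-1, 0],
  [0, 0],
  [1, 1],
  [99, 99],
  [100, 100],
  [101, 100],
];

for (const [input, expected] of boundaryCases) {
  strictEqual(clampPercent(input), expected, `clampPercent(${input})`);
}

// Error condition: invalid input fails loudly instead of returning NaN.
throws(() => clampPercent(Number.NaN), RangeError);
```

The message argument on `strictEqual` names the failing row, which keeps a table-driven test readable when one case breaks.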

**Opportunistic edge cases:**
- Nice-to-have features
- Non-critical paths
- When you find bugs (add regression test)

---

## Quick Reference

### Testing Decision Tree

**Should I write a test?**
1. Is this public API? → Yes, test it
2. Is this critical business logic? → Yes, test it
3. Is this error handling? → Yes, test it
4. Is this private implementation? → No, test through public API
5. Is this a framework feature? → No, trust framework
6. Will this test give confidence? → Yes, write it

**Should I use TDD?**
1. Requirements clear? → Consider TDD
2. Complex logic? → Consider TDD
3. Exploring solution? → Skip TDD, test after
4. Simple CRUD? → Skip TDD, test after

**Should I mock this?**
1. External system? → Mock it
2. Non-deterministic? → Mock it
3. My domain logic? → Don't mock, test it
4. >3 mocks already? → Refactor, too coupled

### Test Checklist

**Before writing test:**
- [ ] What behavior am I testing?
- [ ] What's the happy path?
- [ ] What edge cases matter?
- [ ] Can I test this without mocks?

**After writing test:**
- [ ] Is test name descriptive?
- [ ] Is AAA structure clear?
- [ ] Does the test verify behavior (not implementation)?
- [ ] Will test break only if behavior changes?
- [ ] Is test fast (<100ms for unit)?

---

## Philosophy

**"Tests are a safety net, not a security blanket."**

Good tests give confidence to refactor. Bad tests give false confidence and slow development.

**Test the contract, not the implementation:**
- Contract: What code promises to do
- Implementation: How code does it

**Tests should:**
- Verify behavior works
- Document how to use code
- Enable refactoring with confidence
- Fail only when behavior breaks

**Tests should NOT:**
- Duplicate production code
- Test framework features
- Prevent all refactoring
- Replace thinking about design

**Remember:** The goal is confidence, not coverage. Write tests that make you confident the code works, not tests that make metrics happy.