---
name: testing-patterns
description: |
  Agent-based declarative testing with YAML test specs. Tests run in sub-agents to preserve
  main context while executing many tests. Supports MCP servers, APIs, and browser automation.

  Use when: testing MCP servers, running integration tests, validating tool behavior after changes,
  or creating regression test suites. Keywords: yaml tests, agent testing, mcp test, integration tests.
license: MIT
---

# Testing Patterns

A pragmatic approach to testing that emphasises:
- **Live testing** over mocks
- **Agent execution** to preserve context
- **YAML specs** as documentation and tests
- **Persistent results** committed to git

## Philosophy

This is **not traditional TDD**. Instead:

1. **Test in production/staging** with good logging
2. **Use agents** to run tests (keeps main context clean)
3. **Define tests declaratively** in YAML (human-readable, version-controlled)
4. **Focus on integration** (real servers, real data)

### Why Agent-Based Testing?

Running 50 tests in the main conversation would consume your entire context window. By delegating to a sub-agent:

- Main context stays clean for development
- Agent can run many tests without context pressure
- Results come back as a summary
- Failed tests get detailed investigation

## Commands

| Command | Purpose |
|---------|---------|
| `/create-tests` | Discover project, generate test specs + testing agent |
| `/run-tests` | Execute tests via agent(s), report results |
| `/coverage` | Generate coverage report and identify uncovered code paths |

**Quick workflow:**
```
/create-tests        → Generates tests/specs/*.yaml + .claude/agents/test-runner.md
/run-tests           → Spawns agent, runs all tests, saves results
/run-tests api       → Run only specs matching "api"
/run-tests --failed  → Re-run only failed tests
/coverage            → Run tests with coverage, analyse gaps
/coverage --threshold 80  → Fail if below 80%
```

## Getting Started in a New Project

This skill provides the **pattern and format**. Claude designs the actual tests based on your project context.

**What happens when you ask "Create tests for this project":**

1. **Discovery** - Claude examines the project:
   - What MCP servers are configured?
   - What APIs or tools exist?
   - What does the code do?

2. **Test Design** - Claude creates project-specific tests:
   - Test cases for the actual tools/endpoints
   - Expected values based on real behavior
   - Edge cases relevant to this domain

3. **Structure** - Using patterns from this skill:
   - YAML specs in `tests/` directory
   - Optional testing agent in `.claude/agents/`
   - Results saved to `tests/results/`

**Example:**

```
You: "Create tests for this MCP server"

Claude: [Discovers this is a Google Calendar MCP]
        [Sees tools: calendar_events, calendar_create, calendar_delete]
        [Designs test cases:]

        tests/calendar-events.yaml:
        - list_upcoming_events (expect: array, count_gte 0)
        - search_by_keyword (expect: contains search term)
        - invalid_date_range (expect: error status)

        tests/calendar-mutations.yaml:
        - create_event (expect: success, returns event_id)
        - delete_nonexistent (expect: error, contains "not found")
```

**The skill teaches Claude:**
- How to structure YAML test specs
- What validation rules are available
- How to create testing agents
- When to use parallel execution

**Your project provides:**
- What to actually test
- Expected values and behaviors
- Domain-specific edge cases

## YAML Test Spec Format

```yaml
name: Feature Tests
description: What these tests validate

# Optional: defaults applied to all tests
defaults:
  tool: my_tool_name
  timeout: 5000

tests:
  - name: test_case_name
    description: Human-readable purpose
    tool: tool_name  # Override default if needed
    params:
      action: search
      query: "test input"
    expect:
      contains: "expected substring"
      not_contains: "should not appear"
      status: success
```

### Validation Rules

| Rule | Description | Example |
|------|-------------|---------|
| `contains` | Response contains string | `contains: "from:john"` |
| `not_contains` | Response doesn't contain | `not_contains: "error"` |
| `matches` | Regex pattern match | `matches: "after:\\d{4}"` |
| `json_path` | Check value at JSON path | `json_path: "$.results[0].name"` |
| `equals` | Exact value match | `equals: "success"` |
| `status` | Check success/error | `status: success` |
| `count_gte` | Array length >= N | `count_gte: 1` |
| `count_eq` | Array length == N | `count_eq: 5` |
| `type` | Value type check | `type: array` |

See `references/validation-rules.md` for complete documentation.

## Creating a Testing Agent

Testing agents inherit MCP tools from the session. Create an agent that:

1. Reads YAML test specs
2. Executes tool calls with params
3. Validates responses against expectations
4. Reports results

### Agent Template

**CRITICAL**: Do NOT specify a `tools` field if you need MCP access. When you specify ANY tools, it becomes an allowlist and `"*"` is interpreted literally (not as a wildcard). Omit `tools` entirely to inherit ALL tools from the parent session.

```yaml
---
name: my-tester
description: |
  Tests [domain] functionality. Reads YAML test specs and validates responses.
  Use when: testing after changes, running regression tests.
# tools field OMITTED - inherits ALL tools from parent (including MCP)
model: sonnet
---

# [Domain] Tester

## How It Works

1. Find test specs: `tests/*.yaml`
2. Parse and execute each test
3. Validate responses
4. Report pass/fail summary

## Test Spec Location

tests/
├── feature-a.yaml
├── feature-b.yaml
└── results/
    └── YYYY-MM-DD-HHMMSS.md

## Execution

For each test:
1. Call tool with params
2. Capture response
3. Apply validation rules
4. Record PASS/FAIL

## Reporting

Save results to `tests/results/YYYY-MM-DD-HHMMSS.md`
```

See `templates/test-agent.md` for complete template.

## Results Format

Test results are saved as markdown for git history:

```markdown
# Test Results: feature-name
**Date**: 2026-02-02 14:30
**Commit**: abc1234
**Summary**: 8/9 passed (89%)

## Results

- test_basic_search - PASSED (0.3s)
- test_with_filter - PASSED (0.4s)
- test_edge_case - FAILED

## Failed Test Details

### test_edge_case
- **Expected**: Contains "expected value"
- **Actual**: Response was empty
- **Params**: `{ action: search, query: "" }`
```

Save to: `tests/results/YYYY-MM-DD-HHMMSS.md`

## Workflow

### 1. Create Test Specs

```yaml
# tests/search.yaml
name: Search Tests
defaults:
  tool: my_search_tool

tests:
  - name: basic_search
    params: { query: "hello" }
    expect: { status: success, count_gte: 0 }

  - name: filtered_search
    params: { query: "hello", filter: "recent" }
    expect: { contains: "results" }
```

### 2. Create Testing Agent

Copy `templates/test-agent.md` and customise for your domain.

### 3. Run Tests

```
"Run the search tests"
"Test the API after my changes"
"Run regression tests for gmail-mcp"
```

### 4. Review Results

Results saved to `tests/results/`. Commit them for history:

```bash
git add tests/results/
git commit -m "Test results: 8/9 passed"
```

## Parallel Test Execution

Run multiple test agents simultaneously to speed up large test suites:

```
"Run these test suites in parallel:
- Agent 1: tests/auth/*.yaml
- Agent 2: tests/search/*.yaml
- Agent 3: tests/api/*.yaml"
```

Each agent:
- Has its own context (won't bloat main conversation)
- Can run 10-50 tests independently
- Returns a summary when done
- Inherits MCP tools from parent session

**Why parallel agents?**
- 50 tests in main context = context exhaustion
- 50 tests across 5 agents = clean context + faster execution
- Each agent reports pass/fail summary, not every test detail

**Batching strategy:**
- Group tests by feature area or MCP server
- 10-20 tests per agent is ideal
- Too few = overhead of spawning not worth it
- Too many = agent context fills up

## MCP Testing

For MCP servers, the testing agent inherits configured MCPs:

```bash
# Configure MCP first
claude mcp add --transport http gmail https://gmail.mcp.example.com/mcp

# Then test
"Run tests for gmail MCP"
```

Example MCP test spec:

```yaml
name: Gmail Search Tests
defaults:
  tool: gmail_messages

tests:
  - name: search_from_person
    params: { action: search, searchQuery: "from John" }
    expect: { contains: "from:john" }

  - name: search_with_date
    params: { action: search, searchQuery: "emails from January 2026" }
    expect: { matches: "after:2026" }
```

## API Testing

For REST APIs, use Bash tool:

```yaml
name: API Tests
defaults:
  timeout: 5000

tests:
  - name: health_check
    command: curl -s https://api.example.com/health
    expect: { contains: "ok" }

  - name: get_user
    command: curl -s https://api.example.com/users/1
    expect:
      json_path: "$.name"
      type: string
```

## Browser Testing

For browser automation, use Playwright tools:

```yaml
name: UI Tests

tests:
  - name: login_page_loads
    steps:
      - navigate: https://app.example.com/login
      - snapshot: true
    expect: { contains: "Sign In" }

  - name: form_submission
    steps:
      - navigate: https://app.example.com/form
      - type: { ref: "#email", text: "test@example.com" }
      - click: { ref: "button[type=submit]" }
    expect: { contains: "Success" }
```

## Tips

1. **Start with smoke tests**: Basic connectivity and auth
2. **Test edge cases**: Empty results, errors, special characters
3. **Use descriptive names**: `search_with_date_filter` not `test1`
4. **Group related tests**: One file per feature area
5. **Add after bugs**: Every fixed bug gets a regression test
6. **Commit results**: Create history of test runs

## What This Is NOT

- Not a Jest/Vitest replacement (use those for unit tests)
- Not enforcing TDD (use what works for you)
- Not a test runner library (the agent IS the runner)
- Not about mocking (we test real systems)

## When to Use

| Scenario | Use This | Use Traditional Testing |
|----------|----------|------------------------|
| MCP server validation | Yes | No |
| API integration | Yes | Complement with unit tests |
| Browser workflows | Yes | Complement with component tests |
| Unit testing | No | Yes (Jest/Vitest) |
| Component testing | No | Yes (Testing Library) |
| Type checking | No | Yes (TypeScript) |

## Related Resources

- `templates/test-spec.yaml` - Generic test spec template
- `templates/test-agent.md` - Testing agent template
- `references/validation-rules.md` - Complete validation rule reference