# Test-Design Skill Spec

> [🇬🇧 English](./SKILL-DESIGN.md) • [🇯🇵 日本語](./SKILL-DESIGN.ja.md)

How kiwa structures **Claude Code skills for automating and standardizing test design** across the contract (Foundry / Hardhat) and dApp e2e (Playwright + kiwa fixture) layers. This document is the single source of truth referenced by Phase E implementation PRs.

## TL;DR

Test design is split into 3 layers:

| Layer | Skill (planned) | Purpose |
|---|---|---|
| **1. Generic test design** | `/kiwa-design` | Given a feature spec / API / screen / code / DB schema, produce a risk list, test viewpoints, test cases, priorities, and an automation policy |
| **2. Test-runner specialization** | `/kiwa-forge`, `/kiwa-hardhat`, `/kiwa-play` (refactor) | Convert Layer 1 output into actual `*.t.sol` / `*.test.ts` / `*.spec.ts` files |
| **3. Documentation** | docs cookbook + skill reference | Show how to chain the layers for a complete dApp test pyramid |

A single user-facing skill invocation walks the full 5-step flow. The diagram draws Step 5 as two nodes, prioritization followed by automation strategy:

```mermaid
flowchart LR
  A[Skill invocation] --> B[Organize input]
  B --> C[Identify quality risks]
  C --> D[Select test viewpoints]
  D --> E[Generate test cases]
  E --> F[Prioritize]
  F --> G[Decide automation strategy]
```

---

## Why this spec exists

Test design is fundamentally a **specification activity**, not a coding activity. The team would lose hours per feature if every developer rewrote "what to test" from scratch.

kiwa has already shown that test-spec-first design (`Step 1.5` in `/kiwa-play`) reduces false positives and accelerates implementation. Phase E generalizes that pattern across **contract / integration / e2e** layers and across **Foundry / Hardhat / Playwright** runners.

Phase E does not invent a new test taxonomy.
It standardizes how Claude Code skills emit:

- Risk-based test selection
- Viewpoint coverage (normal / abnormal / boundary / state-transition / permission / validation / idempotency / concurrency / performance / security)
- Test cases in a uniform `TC-XXX` table
- Priority (high / medium / low) by impact criteria
- Automation recommendation (auto / manual + tooling)

## 5-step skill flow

Every test-design skill works through the following steps **in this order**.

### 1. Organize input

The skill must enumerate, for the target feature:

- Feature name + one-sentence summary
- User actions (UI / CLI / API client perspective)
- API contract (HTTP method / path / request / response)
- DB updates (tables touched, columns mutated, transaction boundary)
- Permission model (roles, scopes, multi-tenant isolation)
- External integrations (third-party APIs, blockchain RPC, webhooks)
- Failure modes (timeouts, retries, partial state, idempotency keys)

If any of the above is missing from the spec, list it under **"Insufficient spec"** in the output. The skill must not invent missing values.

### 2. Identify quality risks

For each input element, score the risk against 5 criteria (high / medium / low for each):

| Criterion | High example |
|---|---|
| Revenue impact | Checkout / billing / settlement |
| Security impact | Auth bypass, signature forgery |
| Data destruction risk | Irreversible write, no soft delete |
| Usage frequency | Every page load, every transaction |
| Past incident history | Bug filed in the last 6 months |

The skill emits a **risk summary table** and feeds it into Step 3.

### 3. Select test viewpoints

For each feature, select the applicable viewpoints from the 10-item catalog:

| # | Viewpoint | Apply when |
|---|---|---|
| 1 | Happy path | Always |
| 2 | Failure path | Any external dependency exists |
| 3 | Boundary value | Numeric input, string length, time range |
| 4 | State transition | State machine, status field, finite states |
| 5 | Permission | Auth-gated routes, role-based UI |
| 6 | Input validation | User-typed input, API payload |
| 7 | Idempotency | Webhook handler, payment flow, blockchain tx |
| 8 | Concurrency | Race conditions, multi-tab, multi-user |
| 9 | Performance | High-traffic endpoint, large payload |
| 10 | Security | Auth, signing, encryption, secret handling |

Selected viewpoints become test-case categories in Step 4.

### 4. Generate test cases

Each test case is one row in the unified output table. The fields are listed below, with values from a single example case:

| Field | Content |
|---|---|
| Test ID | `TC-001` |
| Test level | Unit / Integration / E2E |
| Viewpoint | Boundary value |
| Precondition | User logged in |
| Input | Name string at exact character limit |
| Steps | Call `PUT /api/profile` with the input |
| Expected | 200 OK, DB stores correctly normalized value |
| Priority | High |
| Automation | Recommended |

The skill emits one such row per case. Cases are grouped by viewpoint and sorted by priority within each group.

### 5. Prioritize + automation strategy

Priority assignment is based on the risk summary from Step 2 + the viewpoint from Step 3:

- **High**: Any cell scored "high" in revenue / security / data destruction
- **Medium**: No "high" in those three, but at least one "high" in frequency / past incidents
- **Low**: All criteria "low"

Automation defaults by test level:

- **Unit tests**: Always automate (fast feedback, deterministic)
- **Integration tests**: Automate the primary API paths, skip exhaustive edge cases unless production-critical
- **E2E tests**: Automate only the critical user flows (login, checkout, on-chain transaction), defer rarely-used flows to manual verification

The skill's final output appends:

- "Tests recommended for automation": sorted by priority
- "Tests okay for manual verification": with an explanation per case
- "Insufficient spec": items the skill could not resolve

## Output format

Every test-design skill emits this markdown skeleton:

```markdown
## Target feature
## Spec summary
## Main quality risks
## Recommended test composition
## Test viewpoints
## Test cases
## Tests recommended for automation
## Tests okay for manual verification
## Insufficient spec
```

These 9 sections are mandatory and their order is fixed. A skill must emit a `(none)` placeholder for an empty section rather than omit it.

## Layer 2 specialization

When Layer 1 (`/kiwa-design`) finishes, Layer 2 skills convert the generic output into runner-specific code:

| Layer 2 skill | Conversion target |
|---|---|
| `/kiwa-forge` | `test/*.t.sol` files, `forge test` execution, `forge coverage` evaluation |
| `/kiwa-hardhat` | `test/*.test.ts` files, `npx hardhat test` execution, `hardhat-coverage` evaluation |
| `/kiwa-play` (refactored) | `tests/*.spec.ts` + `tests/prepare-env.ts`, Playwright execution, 4-round flake check |

Each Layer 2 skill consumes a Layer 1 spec file (`.context/spec/test-spec-{module}.md`) and writes the implementation. The Layer 1 spec acts as the contract between design and implementation.
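Because the Layer 1 spec is the contract, a Layer 2 skill could check the file before generating any code. The sketch below is one hypothetical way to do that in TypeScript: it verifies that the nine mandatory `##` headings from the output format are present and appear in the fixed order. The names `REQUIRED_SECTIONS` and `validateLayer1Spec` are assumptions made for this example and are not part of kiwa. The line-based heading scan is deliberately naive and does not skip headings that appear inside fenced code blocks.

```typescript
// The nine mandatory headings, in the fixed order given under "Output format".
const REQUIRED_SECTIONS: readonly string[] = [
  "Target feature",
  "Spec summary",
  "Main quality risks",
  "Recommended test composition",
  "Test viewpoints",
  "Test cases",
  "Tests recommended for automation",
  "Tests okay for manual verification",
  "Insufficient spec",
];

// Hypothetical helper: returns a list of problems, or [] if the spec conforms.
function validateLayer1Spec(markdown: string): string[] {
  // Collect level-2 headings ("## Title") in document order.
  // "### Sub" does not match because its third character is "#", not a space.
  const headings: string[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    if (line.startsWith("## ")) {
      headings.push(line.slice(3).trim());
    }
  }

  const problems: string[] = [];

  // Every mandatory section must be present, even if its body is "(none)".
  for (const name of REQUIRED_SECTIONS) {
    if (headings.indexOf(name) === -1) {
      problems.push(`missing section: ${name}`);
    }
  }

  // Compare the mandatory headings found in the file against the expected
  // order. A duplicated heading also fails this check.
  const found = headings.filter((h) => REQUIRED_SECTIONS.indexOf(h) !== -1);
  const expected = REQUIRED_SECTIONS.filter((s) => headings.indexOf(s) !== -1);
  if (found.join("|") !== expected.join("|")) {
    problems.push("sections are out of order");
  }

  return problems;
}

// Usage: a skeleton with "(none)" placeholders passes the check.
const skeleton = REQUIRED_SECTIONS.map((s) => `## ${s}\n(none)`).join("\n\n");
console.log(validateLayer1Spec(skeleton));
```

Returning a list of problems rather than throwing on the first one lets the calling skill report every gap in a single pass, in the same spirit as the "Insufficient spec" section.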
## Skill prompt template

All test-design skills use this skeleton in `SKILL.md`:

```text
You are a specialist in application test design.
Given the inputs (spec, code, API definitions, screen layouts),
identify quality risks and produce a complete test plan.

Always perform these steps:
1. Summarize the spec.
2. Identify quality risks.
3. Classify into unit / integration / E2E layers.
4. Cover viewpoints: normal / abnormal / boundary / state transition / permission / security / performance / regression.
5. For each test case emit: precondition, input, steps, expected outcome, priority, automation recommendation.
6. Flag any spec gaps that prevent confident testing.
7. Sort output by priority, high first.
```

Layer 2 skills extend the prompt with runner-specific instructions (e.g. "convert Layer 1 cases into Foundry `forge test` patterns including invariant / fuzz where viewpoint = boundary").

## Use cases this serves

The same 9-section output covers all four review activities:

| Activity | How the spec helps |
|---|---|
| Design review | Reviewers verify the test plan before implementation starts |
| Pre-implementation review | TDD-style "write tests first" works directly from the spec |
| PR review | Reviewer checks that the PR covers all High-priority cases |
| QA viewpoint check | QA team has a categorized checklist instead of ad-hoc exploration |

## Roadmap

| Phase | Scope | Target file |
|---|---|---|
| **E-1** | SKILL-DESIGN.md spec (this document) | `docs/SKILL-DESIGN.md` |
| **E-2** | `/kiwa-design` skill (Layer 1) | `.claude/skills/kiwa-design/` |
| **E-3** | `/kiwa-play` refactor (Layer 2 e2e) | Existing skill, integrate Layer 1 |
| **E-4** | `/kiwa-forge` skill | `.claude/skills/kiwa-forge/` |
| **E-5** | `/kiwa-hardhat` skill | `.claude/skills/kiwa-hardhat/` |
| **E-6** | Cookbook chapter linking the layers | `docs/{ja,en}/cookbook/kiwa-design-flow.md` |

Phases are sequenced D → A → B style: spec → most valuable skill (Layer 1) → integration into the
existing e2e skill, then community contributions for Foundry / Hardhat runners.

## Out of scope

- Test execution scheduling (CI integration is left to the user's CI tool)
- Mutation testing or formal verification (separate tooling)
- Test data generation libraries (use faker / fast-check / fuzz harnesses as the user prefers)
- Cross-skill memory / state (each skill invocation is stateless except for the optional `.context/spec/` artifact)

## See also

- [`docs/MOCK-DESIGN.md`](./MOCK-DESIGN.md): Wallet / SDK mock fidelity spec (related concept for "what to fake")
- [`.claude/skills/kiwa-play/SKILL.md`](../.claude/skills/kiwa-play/SKILL.md): Existing e2e skill that already follows part of this flow
- [`.claude/skills/kiwa-play/references/adversarial-pitfalls.md`](../.claude/skills/kiwa-play/references/adversarial-pitfalls.md): Self-check checklist for false positives