# Skill chain tutorial โ€” generate contract tests + e2e tests with four skills > [๐Ÿ‡ฌ๐Ÿ‡ง English](./skill-chain-tutorial.md) โ€ข [๐Ÿ‡ฏ๐Ÿ‡ต ๆ—ฅๆœฌ่ชž](./skill-chain-tutorial.ja.md) Walk through kiwa's skill chain (`/kiwa-design` at Layer 1 โ†’ `/kiwa-forge` / `/kiwa-hardhat` at Layer 2 โ†’ `/kiwa-play` at Layer 3) end-to-end โ€” **generate the spec, produce contract tests + e2e tests, and run them**. The worked example uses the real `examples/nextjs-token-gating` contracts. Written from the retrofit perspective (adding tests to a dApp + contracts that already ship), but the same steps apply to fresh TDD-first development too. Treat the existing implementation as the de-facto specification, infer the viewpoints from it, and fill in the missing tests. ## Overall diagram ```mermaid graph TD A[Existing contract + dApp shipping] -->|current code| B["/kiwa-design Layer 1"] B -->|--layer contract --input *.sol| C[tests/spec/contract/test-spec-X.md] B -->|--layer e2e --input app/ + tests/| D[tests/spec/e2e/test-spec-X.md] C -->|9-column table parse| E["/kiwa-forge"] C -->|9-column table parse| F["/kiwa-hardhat"] D -->|9-column table parse| G["/kiwa-play --mode extend"] E --> H[Write test/X.t.sol after the fact] F --> I[Write test/X.test.ts after the fact] G --> J[Write tests/X.spec.ts after the fact] H -->|forge test| K[real contract behavior == test PASS] I -->|npx hardhat test| L[real contract behavior == test PASS] J -->|playwright test 4 rounds| M[existing UI behavior == test PASS] ``` Core idea of the 3-layer chain โ€” the **9-column table inside the Layer 1 output (`tests/spec/{contract,e2e}/test-spec-{module}.md`) acts as the single source of truth**. The three Layer 2 skills (Foundry / Hardhat / Playwright) read the same file and mechanically translate it into runner-specific helpers. In the retrofit flow the "target functionality" section is populated by grep extraction from the existing code. ## Full worked example: nextjs-token-gating (retrofitting an existing dApp) ### Step 0: Survey the existing contract / dApp `examples/nextjs-token-gating/` already contains: ```bash ls examples/nextjs-token-gating/contracts/ tests/ # contracts/: GateNFT.sol + GatedContent.sol # tests/: gating.spec.ts (existing e2e tests) + prepare-env.ts + fixture.ts ``` Functions / errors in the real contracts (what the skill greps for you): ```bash grep -E "function |event |error " contracts/*.sol # GateNFT.sol: mint() / transferFrom() / NotOwner / InvalidRecipient # GatedContent.sol: getSecret() / grantTimedAccess(user, ttl) / hasAccess() / isGated() # NotGated / InvalidTtl / Accessed / TimedAccessGranted ``` Existing test count: ```bash grep -cE "^test\(|^test\.describe\(" tests/*.spec.ts # gating.spec.ts: 8 tests (T-GT-000 through T-GT-007) ``` There is no formal spec โ€” the "behavioral spec of the contract" is scattered across docstrings and the actual code. Retrofit testing here "writes the behavior down + adds missing viewpoints". ### Step 1: Reverse-engineer a contract-side spec with Layer 1 ```text /kiwa-design --layer contract --module token-gating --input examples/nextjs-token-gating/contracts/GatedContent.sol The Layer 1 skill: - reads the input .sol and greps for function / event / error - also inspects the sibling contract (GateNFT.sol) through the IGateNFT interface - reverse-engineers "target functionality" / "spec summary" / "permission model" / "failure modes" from docstrings and real code - scores quality risk on the 5 criteria, decides apply/skip for each of the 13 viewpoints - generates a 9-column table at one case per row, grouped by viewpoint, sorted by priority ``` `tests/spec/contract/test-spec-token-gating.md` is produced. The "Insufficient spec" section records the docstring ambiguities (cleanup timing for `timedAccessExpiry`, the duplicate-grant behavior, presence of a max supply, pause functionality). ### Step 2: Reverse-engineer an e2e-side spec with Layer 1 ```text /kiwa-design --layer e2e --module token-gating --input examples/nextjs-token-gating/ The Layer 1 skill: - reads the existing tests/gating.spec.ts and treats the 8 existing tests as "current coverage" - extracts the relationship between contract and UI (app/page.tsx or the inline HTML fixture) - records viewpoints not covered by the existing tests (partial permission verification, multi-grantee simultaneous expiry, self-grant bypass) as "new tests to add" ``` `tests/spec/e2e/test-spec-token-gating.md` is produced, listing existing test IDs (T-GT-NNN) alongside the new test IDs (TC-NNN). ### Step 3: Write contract tests after the fact with Layer 2 (Foundry) ```text /kiwa-forge --module token-gating --gas-report ``` The skill: - reads `tests/spec/contract/test-spec-token-gating.md` - translates viewpoint groupings (1 happy / 2 failure / 3 boundary / 4 state / 5 permission / 10 security) into Solidity test functions, annotated with `// ่ฆณ็‚น N: {name}` comments - viewpoint 3 โ†’ `testFuzz_grantTimedAccess_Boundary` (`bound(ttl, 1, 365 days)`), viewpoint 4 โ†’ `invariant_TimedAccessExpiryNonZero` + Handler, viewpoint 10 โ†’ `test_SelfGrantBypassDefense` + `test_TransferRevokesAll_MultiGrantee` - **writes `test/GatedContent.t.sol` after the fact** (creates a new file because no prior .t.sol exists; otherwise lives alongside existing files) - runs `forge build` to compile, then `forge test --gas-report` - **tests PASS** = the current behavior of the real contract is now recorded as the test's "expected outcome" - **tests FAIL** = a bug surfaces (docstring vs real code mismatch, or a misreading in the spec) ### Step 3': Write contract tests after the fact with Layer 2 (Hardhat, in parallel) ```text /kiwa-hardhat --module token-gating --gas-report ``` Reads the same `tests/spec/contract/test-spec-token-gating.md` and writes `test/GatedContent.test.ts` in Hardhat shape, in parallel. Foundry-leaning and Hardhat-leaning developers can hold **parallel tests with the same test IDs** sourced from one spec. Viewpoints and case IDs (TC-001 through TC-013) line up across both layers. ### Step 4: Extend the e2e tests with Layer 2 in `extend` mode (Playwright) ```text /kiwa-play --mode extend --example nextjs-token-gating ``` The skill: - in Step 1.5.B, reads `tests/spec/e2e/test-spec-token-gating.md` - recognises the 8 existing tests (T-GT-000 through T-GT-007) as "current coverage" and guarantees zero regression - **appends** the missing viewpoints (partial permission verification, multi-grantee simultaneous expiry, self-grant bypass) as new tests (TC-008 onwards) into `tests/gating.spec.ts` - runs `pnpm test` four times in a row to confirm zero flakes (all 8 existing + N new tests pass) ### Step 5: Coverage evaluation (required, completion blocked on shortfall) `forge test` / `npx hardhat test` passing is **not enough** for completion; the chain must also measure coverage and check thresholds. As an OSS-publication baseline the default thresholds are: | Metric | Default threshold | Rationale | |---|---|---| | Lines | 90% | Cover the primary paths fully | | Statements | 90% | Statement-level coverage | | **Branches** | **80%** | 100% on Solidity require/revert/short-circuit is impractical | | Funcs | 90% | Cover every public / external function | ```bash # Foundry forge coverage --report summary # Hardhat npx hardhat coverage ``` Measured values across kiwa's three real examples (achieved in PR #185): | Example | Runner | Lines | Statements | Branches | Funcs | |---|---|---|---|---|---| | nextjs-token-gating | Foundry | 100% | 97.73% | 87.50% | 100% | | mint-nft | Foundry | 97.70% | 94.57% | 83.33% | 95.24% | | mint-nft | Hardhat | 93.75% | 92.86% | 80.56% | 100% | | defi-swap | Foundry | 100% | 97.62% | 87.50% | 100% | All four metrics clear the default threshold (Lines/Stmts/Funcs โ‰ฅ 90%, Branches โ‰ฅ 80%), satisfying the publication baseline. #### Action when coverage falls short | Result | Action | |---|---| | All four metrics โ‰ฅ threshold | Done, write the test-passed marker | | Any metric below threshold | **Not done**. Append "{metric} {N}% < {threshold}% under-covered" to the Layer 1 "Insufficient spec" section, and enumerate uncovered viewpoints / error paths / events as bullets | | `forge coverage` / `hardhat coverage` itself fails | Do not treat the test pass as completion; report the cause (silent skip is forbidden) | ### Step 6: The completed test pyramid After Step 5 coverage success: | Layer | Runner | Output file | Viewpoints covered | Existing vs new | |---|---|---|---|---| | contract unit | Foundry | `test/GatedContent.t.sol` | All 10 (fuzz + invariant) | All new (no prior .t.sol) | | contract unit | Hardhat | `test/GatedContent.test.ts` | All 10 (fast-check + chai) | All new (no prior .test.ts) | | dApp e2e | Playwright | `tests/gating.spec.ts` | 1 / 2 / 4 / 5 / 10 | 8 existing + N new (extend) | The existing e2e tests are preserved while the viewpoint gaps are filled, and brand-new contract tests are added. The same Layer 1 spec keeps test IDs in sync across both layers. ## Viewpoint ร— helper mapping cheat sheet A quick reference for the 3 layers ร— 13 viewpoint helper mapping: | Viewpoint | Foundry | Hardhat | Playwright | |---|---|---|---| | 1. Happy path | `test_*` | `it()` + chai expect | `test()` happy path | | 2. Failure path | `vm.expectRevert(Error.selector)` | `revertedWithCustomError(c, 'Error')` | mock RPC injection (`createRpcHandler`) | | 3. Boundary | `testFuzz_*` + `vm.assume` / `bound` | `fast-check` `asyncProperty` | parameterized `test.describe.each` | | 4. State transition | `invariant_*` + Handler pattern | `beforeEach` state seed + `describe.each` | seed state via Playwright fixture | | 5. Permission | `vm.prank(role)` | `c.connect(signer)` | switch wallet account (`makeClients(port, OTHER_PK)`) | | 6. Input validation | `testFuzz_*` + revert assertion | `fc.string()` + revert assertion | `getByTestId` form assertion | | 7. Idempotency | call twice โ†’ second `vm.expectRevert` | call twice โ†’ second `expect(...)` revert | retry test (`test.describe.serial`) | | 8. Concurrency | tx ordering test (`vm.warp`) | `Promise.allSettled([tx1, tx2])` | multi-tab (`context.newPage()`) | | 9. Performance | `forge test --gas-report` | `hardhat-gas-reporter` | Playwright trace + perf metrics | | 10. Security | `invariant_NoReentrancy` + `vm.signature` | signature recovery + role assertion | end-to-end signature flow (`verifyMessage`) | For the full reference, see each Layer 2 skill's `references/{foundry,hardhat,playwright}-mapping.md`. ## Retrofitting vs new (TDD-style) development | Comparison | New development (TDD) | Retrofitting (the primary use case in this chapter) | |---|---|---| | `/kiwa-design` input | A feature spec document (no code yet) | Existing `.sol` / `app/` / `tests/` passed via `--input`, mined by grep | | Layer 1 "target functionality" section | Authored from the spec | Summarised from grep results | | Layer 1 "insufficient spec" section | Lists ambiguities from the spec | Lists undocumented behavior / implicit assumptions / untested paths | | When Layer 2 runs `forge test` | Expects RED (tests are written first) | Expects PASS (records the real behavior as canonical) | | When tests FAIL | Fix the implementation (TDD GREEN) | A bug surfaces (existing behavior differs from expectations) | | Relationship with existing tests | (n/a) | `--mode extend` guarantees zero regression on the existing test count | ## False-positive self-check checklist Hotspots where false positives can enter a 3-layer chain, plus how to defend: - **Missing precondition in the Layer 1 spec** โ€” even when the "precondition" column reads `(none)`, an implicit contract state requirement (for example "NFT must already be held before granting") can cause "state corruption masked as test pass" in Layer 2. When `(none)` is used in Layer 1, verify it is intentional - **Layer 2 parser miss (column shift)** โ€” if the 9-column header does not match the SSOT (`docs/SKILL-DESIGN.ja.md` Step 4), Layer 2 reads a different column as the viewpoint. Run `grep -c "ใƒ†ใ‚นใƒˆ ID | ใƒ†ใ‚นใƒˆใƒฌใƒ™ใƒซ | ใƒ†ใ‚นใƒˆ่ฆณ็‚น"` to confirm the 9-column header before committing - **Partial verification of viewpoint 5 (permissions)** โ€” checking only `hasAccess(user)` and ignoring the `grantor` / `msg.sender` paths lets a self-grant bypass slip through. Always exercise every entry point (grantor / grantee / third party) - **Time-warp side effects in viewpoint 4 (state transition)** โ€” Foundry `vm.warp` and Hardhat `time.increaseTo` both leak time into the next test if you forget to reset. Restore the fixture from `loadFixture` / `snapshotChain` in `setUp` - **Race conditions in parallel runs** โ€” for viewpoint 8, prefer `Promise.allSettled` over `Promise.all` (so a single reject does not collapse the rest); Foundry is synchronous, so parallel races are not even expressible - **The trap of treating real behavior as canonical** โ€” even when `forge test` passes, "the real contract has a bug and the test froze that bug into place" remains possible. Cross-check the Layer 1 "main quality risks" section and confirm with the spec author whether "real behavior == intended spec" The full nine patterns plus a five-question self-check live in `.claude/skills/kiwa-play/references/adversarial-pitfalls.md`. ## Related links - Phase E SSOT: [`docs/SKILL-DESIGN.md`](../../docs/SKILL-DESIGN.md) - Layer 1: [.claude/skills/kiwa-design/SKILL.md](../../.claude/skills/kiwa-design/SKILL.md) - Layer 2 Foundry: [.claude/skills/kiwa-forge/SKILL.md](../../.claude/skills/kiwa-forge/SKILL.md) - Layer 2 Hardhat: [.claude/skills/kiwa-hardhat/SKILL.md](../../.claude/skills/kiwa-hardhat/SKILL.md) - Layer 2 Playwright: [.claude/skills/kiwa-play/SKILL.md](../../.claude/skills/kiwa-play/SKILL.md) - Related cookbook chapters: [snapshot-revert.md](../../docs/en/cookbook/snapshot-revert.md) (snapshot pattern shared between Layer 1 / Layer 2), [custom-error-revert.md](../../docs/en/cookbook/custom-error-revert.md) (viewpoint 2 failure-path helper)