---
name: "test"
description: "Test construction and execution for the skaile-dev monorepo. Two modes: 'run' executes the test suite for one or more packages and reports results; 'construct' generates new tests for recently implemented code. Knows the full test stack: Vitest 3.2.4 (agent-framework + forge/L4-project + forge/L4-assistant + _scripts), Vitest 4.1 (forge/L5-concept), Jest (platform backend), Vitest (platform frontend), Playwright (E2E), and how to run each via the Bun workspace. Coverage is collected under Bun with @vitest/coverage-istanbul (not v8) and ratcheted against the committed baseline via _scripts/check-coverage-ratchet.ts."
metadata:
  tags:
    - "testing"
    - "vitest"
    - "jest"
    - "playwright"
    - "bun"
    - "monorepo"
    - "skaile-development"
  source: "MERGED"
  stage: "beta"
prerequisites:
  files:
    - path: "package.json"
      gate: hard
      description: "Monorepo root package.json required"
user_inputs:
  dialog:
    - id: "mode"
      label: "Mode: 'run' (execute tests) or 'construct' (generate new tests)"
      type: "select"
      options:
        - "run"
        - "construct"
      required: true
      default: "run"
    - id: "scope"
      label: "Package(s) to test (comma-separated, or 'all' for full suite)"
      type: "text"
      required: false
      default: "all"
      hint: "e.g. 'forge/L4-project', 'platform/backend', 'agent-framework/cli'"
    - id: "filter"
      label: "Test name filter (for 'run' mode — runs only matching tests)"
      type: "text"
      required: false
    - id: "level"
      label: "Level (for 'construct' mode)"
      type: "select"
      options:
        - "unit"
        - "integration"
        - "e2e"
        - "auto"
      required: false
      default: "auto"
      hint: "auto = infer from changed files; else delegates to test-unit / test-integration / test-e2e"
  files: []
---

# Run Tests — Test Construction and Execution

## Canonical References (read first)

The skaile-dev monorepo has a committed layered test strategy. Before writing or changing tests, read (at minimum) the concept spec and the phase plan — they own the terminology (L0–L5 layers), coverage policy, and CI lanes that this skill enforces.

- **Concept / design spec:** `_devlog/specs/2026-04-22-test-concept-design.md` — layer taxonomy, shared infrastructure, coverage policy.
- **Phase plan:** `_devlog/plans/2026-04-22-test-gap-fill.md` — phase-by-phase gap-fill with per-package targets.
- **User-facing docs:** `/testing/` (Starlight, source at `docs/src/content/docs/testing.md`) — CI lanes table, ratchet description, local commands.
- **Coverage baseline:** `_devlog/reports/coverage-baseline-2026-04-22/summary.json` — ratchet compares this against the CI run's `coverage-summary.json`.
- **Shared helpers:** `agent-framework/test-utils/` (see its `CLAUDE.md`) — always use these instead of reinventing temp-dir cleanup, mock drivers, in-memory transports, or WebSocket doubles.
- **CI lanes:** `.github/workflows/test-fast.yml`, `.github/workflows/test-full.yml`, `.github/workflows/test-e2e.yml` — three workflows that run on every PR.

## Overview

Manages the test lifecycle for the skaile-dev monorepo. Works in two modes:

| Mode | What It Does |
|------|-------------|
| `run` | Execute the test suite for specified packages, report results, triage failures |
| `construct` | Generate new tests for recently implemented code — thin wrapper that delegates to `test-unit`, `test-integration`, or `test-e2e` based on the `level` input |

For any non-trivial test authoring (setting up infra from scratch, generating a full plan, adding a new Playwright suite), prefer the dedicated skills:

- **`test-plan`** — generate a per-package `TEST_PLAN.md` from CLAUDE.md + source
- **`test-unit`** — scaffold unit infra + generate unit tests
- **`test-integration`** — scaffold integration infra (DB / temp-dir) + generate tests
- **`test-e2e`** — scaffold Playwright (web) or CLI harness + generate journeys

`test construct` is retained for quick, already-configured packages where the user just wants a few tests for recently changed files.
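
As a rough illustration of how the `construct` wrapper might pick a delegate from the `level` input, here is a minimal sketch. The function name, the path patterns, and the precedence used for `auto` are all hypothetical; the skill itself only states that `auto` infers the level from the changed files.

```typescript
// Hypothetical sketch: names and path heuristics are illustrative assumptions,
// not the skill's real routing logic.
type Level = "unit" | "integration" | "e2e" | "auto";
type TargetSkill = "test-unit" | "test-integration" | "test-e2e";

// Assumed path patterns; a real repo would need its own rules.
const E2E_HINTS: RegExp[] = [/(^|\/)pages\//, /(^|\/)bin\//];
const INTEGRATION_HINTS: RegExp[] = [/(^|\/)server\/api\//, /(^|\/)db\//, /driver/i];

function routeConstruct(level: Level, changedFiles: string[]): TargetSkill {
  // Explicit levels map one-to-one onto the dedicated skills.
  if (level === "unit") return "test-unit";
  if (level === "integration") return "test-integration";
  if (level === "e2e") return "test-e2e";

  // level === "auto": infer from the changed files (assumed precedence:
  // e2e hints first, then integration hints, else unit).
  const matches = (hints: RegExp[]): boolean =>
    changedFiles.some((file) => hints.some((hint) => hint.test(file)));
  if (matches(E2E_HINTS)) return "test-e2e";
  if (matches(INTEGRATION_HINTS)) return "test-integration";
  return "test-unit";
}
```

For example, `routeConstruct("auto", ["src/utils/slug.ts"])` falls through to `test-unit` because no hint matches.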

## Test Stack by Package

The root `vitest.config.ts` aggregates every agent-framework package plus `_scripts/`. A single `bun x --bun vitest run` from the repo root runs them all. Packages that need a different environment (happy-dom for Vue composables, Nitro shims for forge routes, vitest 4.1 for forge/L5-concept) keep their own `vitest.config.ts` and their own `bun run test` script; those are scoped runs.

| Package | Framework | Run Command (from skaile-dev root) |
|---------|-----------|-----------------------------------|
| `agent-framework/*` (all packages under this tree) | Vitest 3.2.4 | `bun x --bun vitest run` |
| `_scripts/` (check-coverage-ratchet etc.) | Vitest 3.2.4 | `bun x --bun vitest run` |
| `forge/L4-project` | Vitest 3.2.4 + `happy-dom` | `bun run --filter @skaile/forge-project test` |
| `forge/L4-assistant` | Vitest 3.2.4 + `happy-dom` | `bun run --filter @skaile/forge-assistant test` |
| `forge/L5-concept` | Vitest 4.1 | `bun run --filter @skaile/forge-concept test` |
| `forge/common-backend` | Vitest 3.2.4 | root `bun x --bun vitest run` (included via root config) |
| `forge/common-ui` | Vitest 3.2.4 (unit) + Playwright CT (e2e) | `bun x --bun vitest run` (unit); `cd forge/common-ui && bun run test:e2e` (CT) |
| `agent-framework/tui` | Vitest 3.2.4 | `bun run --filter @skaile/agent-tui test` |
| `platform/backend` | Jest | `bun run --filter ./platform/backend test` |
| `platform/frontend` | Vitest | `bun run --filter ./platform/frontend test` |
| `platform/e2e` | Playwright | `bun run --filter ./platform/e2e test:e2e` |
| `forge/L4-project/tests/e2e/` | Playwright | `cd forge/L4-project && bun run test:e2e` |

**Never** run `vitest` from inside a submodule/package with `bun` — always invoke from the skaile-dev root so the workspace overrides resolve every `@skaile/*` dependency locally. The forge Nuxt-app scoped runs are the only exception because they use their own vitest config with setup files.
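
The table above can be read as a lookup from package path to run command. The sketch below shows one possible encoding for unit-level runs; the command strings are copied from the table, while the helper name `resolveRunCommand` and the prefix-matching rule are assumptions made for illustration.

```typescript
// Commands are copied from the table above; the lookup helper itself is a
// hypothetical sketch, not part of the repo.
const ROOT_RUN = "bun x --bun vitest run";

// Packages that keep their own config and must be run with a scoped command.
const SCOPED_COMMANDS: Record<string, string> = {
  "forge/L4-project": "bun run --filter @skaile/forge-project test",
  "forge/L4-assistant": "bun run --filter @skaile/forge-assistant test",
  "forge/L5-concept": "bun run --filter @skaile/forge-concept test",
  "agent-framework/tui": "bun run --filter @skaile/agent-tui test",
  "platform/backend": "bun run --filter ./platform/backend test",
  "platform/frontend": "bun run --filter ./platform/frontend test",
  "platform/e2e": "bun run --filter ./platform/e2e test:e2e",
};

// Prefixes whose unit tests are aggregated by the root vitest.config.ts.
const ROOT_PREFIXES = [
  "agent-framework/",
  "_scripts",
  "forge/common-backend",
  "forge/common-ui",
];

function resolveRunCommand(pkg: string): string {
  // Scoped entries win over the root run (e.g. agent-framework/tui).
  const scoped = SCOPED_COMMANDS[pkg];
  if (scoped !== undefined) return scoped;
  if (ROOT_PREFIXES.some((prefix) => pkg.startsWith(prefix))) return ROOT_RUN;
  throw new Error(`No run command known for package: ${pkg}`);
}
```

Checking the scoped map first keeps exceptions such as `agent-framework/tui` from being swallowed by the broader `agent-framework/` prefix. Playwright suites (`test:e2e`) are left out of this sketch.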

**forge/L5-concept** uses Vitest 4.1, which differs from the root's 3.2.4; it is NOT wired into the shared root config or the coverage ratchet. Run it only with its own scoped command.

## When to Use

- After any code change — verify nothing broke (`mode=run`)
- After implementing a new feature — add test coverage (`mode=construct`)
- Triaging CI failures — investigate specific failing tests (`mode=run, filter=`)
- Before opening a PR — full suite pass is required

## When NOT to Use

- For E2E test setup or generation — use `test-e2e`
- For integration test setup or generation — use `test-integration`
- For unit test setup or generation — use `test-unit`
- For generating a per-package test plan — use `test-plan`
- For acceptance-criteria verification against `_concept/` specs — use `verify`

---

ROLE
  Test executor and constructor for the skaile-dev monorepo — runs tests, triages failures, generates new tests from source analysis.

READS
  package.json (root + per-package) — workspace config, test scripts
  /CLAUDE.md — package conventions for test patterns
  /vitest.config.ts or jest.config.* — test configuration
  existing test files — patterns to follow when constructing
  recently changed source files (construct mode) — what to test

MUST read existing test files before constructing new ones — match conventions exactly
MUST run tests before reporting results — never report without running
MUST triage failures: distinguish regressions from new failures vs.
  infrastructure issues
MUST distinguish test failures (bugs) from test construction errors (bad test code)
NEVER modify a test to make it pass — fix the source code instead
NEVER skip failing tests by commenting them out
NEVER report "all passing" without actually running the suite

EMIT [test] started mode= scope=

# ── Mode: run ─────────────────────────────────────────────────────

IF mode = run

  STEP 1: Resolve scope
    IF scope = all
      - Run full monorepo test suite
    ELSE
      - Parse scope into package list
      - Derive correct run command per package from the test stack table above

  STEP 2: Execute
    FOR EACH package in scope:
      - $ [--reporter=verbose] [--testNamePattern=] 2>&1 | tail -80
      - Capture: total, passed, failed, skipped, duration
      EMIT [test] package_result package= total= passed= failed=

    COMPACT CHECKPOINT (every 3 packages or when scope=all):
      After completing each group of 3 packages, pause and call /compact before continuing.
      This prevents context from growing unboundedly across many test iterations and is the single most impactful token-saving action in this skill.

  STEP 3: Triage failures
    FOR EACH failing test:
      - Read the test file and the test case
      - Read the source file(s) it tests
      - Classify failure:
        - REGRESSION: previously passing test now fails due to code change
        - NEW_FAILURE: test was already failing before this change (check git blame)
        - INFRA_ISSUE: database not available, port conflict, missing env var
        - BAD_TEST: test itself is incorrectly written (assertion is wrong, not the source)
      - Report classification with evidence

  STEP 4: Present results report

```
## Test Results —

### Summary
| Package | Total | Passed | Failed | Skipped | Duration |
|---------|-------|--------|--------|---------|----------|
| forge/L4-project | 42 | 42 | 0 | 0 | 1.2s |
| platform/backend | 156 | 154 | 2 | 0 | 8.4s |
...
| **Total** | **N** | **N** | **N** | **N** | **Ns** |

### Failures (if any)
| Test | Package | File | Classification | Summary |
|------|---------|------|----------------|---------|
| "creates session token" | platform/backend | ... | REGRESSION | token field renamed in schema |

### Recommended Actions
- Fix: :
- Investigate:
```

  EMIT [test] run_complete total= passed= failed=

  IF any REGRESSION failures
    - Report: "These tests were passing before. Fix before proceeding."
  IF any BAD_TEST failures
    - Report: "These tests appear to have incorrect assertions. Review before fixing source."
  IF any INFRA_ISSUE failures
    - Report: "Infrastructure issues detected — resolve environment before re-running."

# ── Mode: construct ───────────────────────────────────────────────

IF mode = construct

  STEP 0: Route to the right quality skill (preferred path)
    FIRST CHECK: does the target package have `tests/TEST_PLAN.md`?
      IF NO → delegate to `test-plan` with target= first.
        A TEST_PLAN ties the per-package work to the layer taxonomy from the concept spec and enumerates the cases each test file must cover.
        After test-plan returns, continue with the level-appropriate test-* skill below.
      IF YES → proceed to level routing.

    Then, by level:
      IF level = unit or level = auto + changed files look like pure logic (no I/O imports):
        → Delegate to `test-unit` with target= and STOP
      IF level = integration or level = auto + changed files include API routes / DB handlers / subprocess drivers:
        → Delegate to `test-integration` with target= and STOP
      IF level = e2e or level = auto + changed files include pages/ or CLI bins:
        → Delegate to `test-e2e` with target= and STOP

    Only fall through to the legacy in-place construction below when the user explicitly wants a quick one-file addition in an already-configured package.

  DISPATCH HYGIENE (construct mode):
    When scope covers multiple packages, do NOT dispatch one agent per package sequentially.
    Group packages by L-layer (see CLAUDE.md layer table) and dispatch one agent per layer group in parallel.
    Apply MVC prompts: each agent receives only its package's CLAUDE.md excerpt and the relevant source files — not the full parent context.
    See references/sub-agent-dispatch.md.

  STEP 0.5: Patterns new tests MUST follow
    - **Temp directories:** always import `makeTempDir` from `@skaile/test-utils` (not raw `mkdtempSync` + `afterEach(rmSync)`). The helper registers cleanup via Vitest's `onTestFinished` when available and falls back to `afterEach` otherwise — it has been adopted across 23 agent-framework test files.
    - **Mock drivers:** use `makeMockDriver` from `@skaile/test-utils`.
    - **In-memory transport (A.2 / A.3 pattern):** use `makeInMemoryTransport` for transport/client round-trip tests.
    - **WebSocket client doubles:** use `MockWebSocket` + `installMockWebSocket` from `@skaile/test-utils/mock-websocket`. Install it on `globalThis.WebSocket` in `beforeEach` and tear down in `afterEach`.
    - **Fixture workspaces:** use `loadFixtureWorkspace` from `@skaile/test-utils`; fixture files live under `tests/fixtures/` (never `__fixtures__` or `test/fixtures`).
    - **Vue composables in forge apps** (L5 unit tier): the forge Nuxt app's `vitest.config.ts` must set `environment: "happy-dom"` and list `happy-dom` as a devDependency. `forge/L4-project/vitest.config.ts` is the reference.
    - **Nitro route integration tests** (forge L5 integration tier): synthesize an h3 event via `tests/_setup/h3-event.ts` + install Nitro globals via `tests/_setup/nitro-globals.ts` as a `setupFiles` entry. See `forge/L4-project/tests/api-*.test.ts` for the canonical pattern (import the route handler dynamically, mock `@skaile/forge-common-backend` at the package boundary, call the handler with a synthetic event).
    - **Bridge / subprocess drivers** (L3 integration tier): use the fake-binary harness pattern in `agent-framework/bridge/tests/omp-driver.test.ts`.
      The driver is redirected at a fake Node script via the `OMP_BRIDGE_BIN` / `OMP_BRIDGE_PREARGS` env vars; the fake controls ready/exit/stderr via `FAKE_OMP_*` env vars. Guarded by `SKAILE_SPAWN_TESTS=1` in the full lane.
    - **Docker / lab integration** (L3): guarded by `SKAILE_DOCKER_TESTS=1`.
    - **E2E spawn-harness** (L4 CLI): spawn the compiled `skaile` binary via `bun x` and assert on stdout / exit code per subcommand. Files live under `agent-framework/cli/tests/e2e/.test.ts`.
    - **Playwright** (L5 E2E): `.spec.ts` suffix under `tests/e2e/`. Each forge app owns its own `playwright.config.ts` and runs via `bun run test:e2e`.
    - **Playwright Component Testing (CT)** (composable libraries): when a Vue composable uses TipTap, ProseMirror, or any other library that requires real browser DOM beyond what happy-dom provides, use `@playwright/experimental-ct-vue`. Config at `playwright-ct.config.ts`, fixtures in `tests/e2e/fixtures/`, spec files as `*.spec.ts` under `tests/e2e/`. Requires the `resolveVueCompilerDom()` Vite plugin (see `forge/common-ui/playwright-ct.config.ts`). Run with `bun run test:e2e`. Recording: `bun run test:e2e:record -- http://localhost:` — opens `playwright codegen` against the CT dev server, records interactions as spec code.

  STEP 0.6: Mocking gotchas (project-specific — do not fight these)
    - `better-sqlite3` is not yet Bun-compatible. Forge route handlers that call `getDb()` via `@skaile/forge-common-backend` must mock `createDb` (and any other boundary functions the route pulls in) at the package boundary:

      ```ts
      // mockCreateDb and mockGetSessionUser are mock functions defined elsewhere in the test file.
      vi.mock("@skaile/forge-common-backend", async () => {
        // Keep the real exports and override only the boundary functions.
        const actual = await vi.importActual(
          "@skaile/forge-common-backend",
        );
        return { ...actual, createDb: mockCreateDb, getSessionUser: mockGetSessionUser };
      });
      ```

      For forge/L5-concept routes, the mock must return a chainable stub (`.select().from().get() → { count: 1 }`) because `getDb()` runs a seed check on first call and will otherwise execute the seed path.
    - Under Bun + Vitest, `vi.mock` of a same-package relative util sometimes misses. Register three specifier forms (two relatives + the absolute path via `new URL(... , import.meta.url).pathname`). This is documented in the D.2 forge/L4-assistant update on 2026-04-22.
    - Two test files are `describe.skip` under Bun because of `vi.mock` + dynamic `import()` limitations: `agent-framework/bridge/tests/codex-driver.test.ts` and `agent-framework/runner/tests/session-builder.test.ts`. They pass under plain `bun x vitest run` (Node). **Do not try to "fix" them**; the skip is intentional and documented.

  STEP 1: Identify what needs tests
    IF scope is provided → look for untested code in those packages
    ELSE
      - Use `git diff --name-only HEAD~1..HEAD` to find recently changed source files
      - Filter to testable files: source files (not tests, not configs)

  STEP 2: Discover test environment per package
    - Read existing test files for the package
    - Read vitest.config.ts / jest.config.* for configuration
    - Note: test file naming convention (`*.test.ts` or `*.spec.ts`)
    - Note: test directory convention (colocated or `__tests__/`)
    - Note: import style, mock patterns, assertion library
    - Read 2–3 existing tests to internalize the style

  STEP 3: Identify testable units
    FOR EACH changed source file:
      - Identify exported functions, composables, classes, API handlers, store actions
      - For each: determine what behavior is observable from outside (not implementation)
      - Classify as: unit-testable, integration-testable, or E2E-only
      - Map to existing tests: is there already a test file for this unit?

  STEP 4: Generate test files
    FOR EACH unit without test coverage:
      - Create test file following the package's naming and placement convention
      - Structure: describe blocks map to exported units; it blocks map to behaviors
      - Test names reference observable behavior, not internal method names
      - Use the same mocking patterns as existing tests
      - Prioritize: happy path, error path, boundary values
      - Do NOT test internal implementation details

    Test structure template (adapt to framework):

```typescript
describe('', () => {
  describe('', () => {
    it('should ', () => {
      // Arrange
      // Act
      // Assert — test the observable contract, not internals
    })

    it('should handle ', () => {
      // ...
    })
  })
})
```

  STEP 5: Run generated tests
    - $
    IF tests fail due to missing setup (mocks, providers) → fix setup
    IF tests fail because the source has a bug → report it; do NOT fix the test
    IF tests fail because the test is wrong → fix the test

  STEP 6: Present construction report

```
## Test Construction Report

### Generated
| File | Package | Tests | Units Covered |
|------|---------|-------|---------------|
| src/composables/useWorkspace.test.ts | forge/L4-project | 8 | 2 |

### Coverage Added
| Unit | File | Was Tested | Now Tested |
|------|------|-----------|-----------|
| useWorkspace | src/composables/useWorkspace.ts | No | Yes |

### Skipped (E2E-only)
| Unit | Reason |
|------|--------|
| WorkspacePage.vue | Full browser interaction — needs Playwright |

### Results
- New tests: N
- Passing: N
- Failing (source bugs): N (see below)
```

  EMIT [test] construct_complete files_created= tests_generated= passing=

CHECKLIST
  - [ ] Existing test patterns read before constructing new tests
  - [ ] Tests match package naming and placement conventions
  - [ ] Tests cover observable behavior, not implementation details
  - [ ] All generated tests run without infrastructure errors
  - [ ] Failures triaged (regression vs. new vs. infra vs.
    bad test)
  - [ ] Source bugs reported (tests not modified to hide them)

---

## Token Hygiene (read before running)

Test runs are the largest single driver of context growth. Follow these rules to keep sessions from ballooning:

**Truncate output.** Every test run appends to context. Always pipe to `| tail -80` unless you need to see full output for triage. The summary line at the end is what matters — the rest is noise once tests pass.

**Compact between packages.** When running scope=all or more than 3 packages, call `/compact` in the conversation after each package group finishes. Do not run the entire monorepo suite in one uninterrupted loop without compaction.

**Split wide scopes into separate sessions.** For test gap-fill or construct work spanning many packages, prefer one session per L-layer group:

- Session A: L0–L2 (types, core, resolver, flow-engine, bridge/pure, common-ui)
- Session B: L3 (transport, client, session, store, asset-manager, sdk, tui)
- Session C: L4+ (connectors, runner, bridge/drivers, lab, cli, forge apps)

**Do not re-read CLAUDE.md or spec files on every iteration.** Read once, keep in context. If you've already loaded a package's CLAUDE.md this session, don't read it again.
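
The truncation and compaction rules above can be pictured with two small helpers. This is only a sketch: `tailLines` mimics what `| tail -80` does to captured output, and `groupForCompaction` splits a package list into groups of three so that a compaction point falls after each group. Both names are hypothetical and not part of the repo.

```typescript
// Hypothetical helpers illustrating the token-hygiene rules; not repo code.

// Keep only the last `maxLines` lines of captured test output,
// similar in spirit to piping through `tail -80`.
function tailLines(output: string, maxLines = 80): string {
  if (maxLines <= 0) return "";
  const lines = output.trimEnd().split("\n");
  return lines.slice(-maxLines).join("\n");
}

// Split a package list into fixed-size groups; the caller would run one
// group, compact, then continue with the next group.
function groupForCompaction<T>(packages: T[], groupSize = 3): T[][] {
  if (groupSize <= 0) throw new Error("groupSize must be positive");
  const groups: T[][] = [];
  for (let i = 0; i < packages.length; i += groupSize) {
    groups.push(packages.slice(i, i + groupSize));
  }
  return groups;
}
```

With four packages and the default group size, `groupForCompaction` yields one group of three and one group of one, so compaction would happen once between them.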

## Bun Workspace Test Commands (Quick Reference)

```bash
# Run all agent-framework + _scripts tests — truncate output to summary
bun x --bun vitest run 2>&1 | tail -80

# Run with watch mode (development)
bun x --bun vitest

# Run with test name filter
bun x --bun vitest run -t "workspace rename" 2>&1 | tail -40

# Run a single test file (full output is fine for a single file)
bun x --bun vitest run agent-framework/core/src/manifest.test.ts

# Run a forge Nuxt app's suite (their own vitest config — happy-dom + Nitro shims)
bun run --filter @skaile/forge-project test 2>&1 | tail -60
bun run --filter @skaile/forge-assistant test 2>&1 | tail -60
bun run --filter @skaile/forge-concept test 2>&1 | tail -60   # vitest@4.1, scoped only

# Run platform backend (Jest)
bun run --filter ./platform/backend test 2>&1 | tail -80

# Run platform / forge-project E2E (Playwright)
bun run --filter ./platform/e2e test:e2e 2>&1 | tail -60
(cd forge/L4-project && bun run test:e2e 2>&1 | tail -60)

# Run forge/common-ui Playwright CT (TipTap/ProseMirror browser tests)
(cd forge/common-ui && bun run test:e2e 2>&1 | tail -20)

# Record new CT tests (codegen)
# Terminal 1: cd forge/common-ui && bun run test:e2e:ui
# Terminal 2: cd forge/common-ui && bun run test:e2e:record -- http://localhost:3101

# Run with coverage (istanbul under Bun — mirrors test-full.yml CI lane)
bun x --bun vitest run \
  --coverage.enabled \
  --coverage.provider=istanbul \
  --coverage.reporter=text-summary \
  --coverage.reporter=json-summary \
  --coverage.reportsDirectory=_devlog/reports/coverage-ci \
  2>&1 | tail -40

# Check the ratchet against the committed baseline
bun _scripts/check-coverage-ratchet.ts
# Options: --ci --baseline --tolerance
# Exit codes: 0 pass, 1 regression, 2 invalid input
```

**Do NOT use `@vitest/coverage-v8` under Node** for this repo.
The v8 provider needs Node's inspector, and under Node the connectors package breaks because `connector-registry.ts` uses `require('./adapters/memory.js')` at runtime — which Bun polyfills but Node's ESM loader does not. Istanbul instruments source at load time and works identically under Bun and Node; we pick Bun because the fast-lane tests already pass under Bun and we want one consistent runtime story. See `.github/workflows/test-full.yml` for the full CI rationale.

## Known Constraints

These are the current hard constraints of the skaile-dev test framework. Do NOT try to route around them — they are documented here so the skill won't flail.

| Constraint | Where | Workaround |
|---|---|---|
| `better-sqlite3` has no Bun build | forge/L4-project, forge/L5-concept, forge/L4-assistant | Mock `createDb` at the `@skaile/forge-common-backend` boundary. For forge/L5-concept, also return a chainable stub so the seed-check short-circuits. |
| `vi.mock` misses same-package relatives under Bun+Vitest | forge Nitro route tests | Register three specifier forms: `./foo`, `../foo`, and the absolute path via `new URL("../foo.ts", import.meta.url).pathname`. |
| `vi.mock` + dynamic `import()` unstable under Bun | `bridge/tests/codex-driver.test.ts`, `runner/tests/session-builder.test.ts` | These files are `describe.skip` under Bun and pass under plain `bun x vitest run` (Node). Leave them skipped — do not "fix" them. |
| `@vitest/coverage-v8` requires Node's inspector | whole monorepo | Use `@vitest/coverage-istanbul` under Bun. See test-full.yml. |
| Connectors package `require('./adapters/memory.js')` at runtime | `agent-framework/connectors` | Works under Bun; breaks under Node ESM. Run under `bun x --bun vitest run`. |
| forge/L5-concept uses vitest 4.1 (root is 3.2.4) | `forge/L5-concept` | Run scoped via `bun run --filter @skaile/forge-concept test`; do not include in the root v8/istanbul coverage run. |
| Bun.serve request handler body | `transport/src/server.ts` | Not reachable under Node. Validated by the 4 Bun-only integration tests in `ws-server.test.ts` (which are `describe.skip` unless running under `--bun`). |

## Post-Test Actions (construct or run mode)

After finishing test work — especially construct mode — recommend:

1. **Check the ratchet.** Regenerate coverage and compare against the baseline:

   ```bash
   bun x --bun vitest run \
     --coverage.enabled --coverage.provider=istanbul \
     --coverage.reporter=json-summary \
     --coverage.reportsDirectory=_devlog/reports/coverage-ci
   bun _scripts/check-coverage-ratchet.ts
   ```

   If the ratchet reports `baseline-improved` for a package and the gain is intentional, update that package's entry in `_devlog/reports/coverage-baseline-2026-04-22/summary.json` in the same PR.

2. **Record the change.** Call the `devlog` skill so `_devlog/DEVLOG.md` captures the testing work (new test files added, coverage moved, known-skip list changes).

3. **Before opening a PR**, run the local equivalent of test-full.yml (via the `quality` skill in `mode=full`) — see that skill for the canonical pre-PR sequence.

## Common Mistakes

| Mistake | What to do instead |
|---------|-------------------|
| Modifying tests to make them pass | Fix the source code — tests are the spec |
| Running only the new tests | Run the full package suite to catch regressions |
| Constructing tests without reading existing ones | Always read 2–3 existing tests first — conventions vary |
| Testing implementation details | Test observable behavior only |
| Skipping infra failure investigation | INFRA failures mask real bugs — fix environment first |
| Using `mkdtempSync` + manual cleanup | Use `makeTempDir()` from `@skaile/test-utils` |
| Using `@vitest/coverage-v8` under Node | Use istanbul under Bun (see Bun Workspace Test Commands) |
| "Fixing" the two skipped Bun tests | They're skipped on purpose; pass under Node. Leave them alone. |

## Integration

- **Called by:** `implement` (after each task and before finish), `audit` (as a pre-analysis gate), `quality`
- **Calls:** `test-plan`, `test-unit`, `test-integration`, `test-e2e` (from construct mode — test-plan first when no TEST_PLAN.md exists, then the level-specific skill)
- **Delegates to:** `test-plan` → `test-unit` / `test-integration` / `test-e2e` — all in the skaileup-evaluate domain, no external dependencies
- **Followed by:** `_scripts/check-coverage-ratchet.ts` (ratchet), then `devlog` (record)
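
To make the ratchet step concrete, the following sketch shows how a baseline-versus-current comparison with a tolerance could work. It is not the code in `_scripts/check-coverage-ratchet.ts`: the `CoverageMap` shape (package name to line-coverage percentage), the function names, and the rule that a missing package counts as 0% are all assumptions. Only the `baseline-improved` label and the 0/1 exit codes come from this document; exit code 2 (invalid input) is not modeled.

```typescript
// Hypothetical sketch of a coverage ratchet; NOT the implementation of
// _scripts/check-coverage-ratchet.ts. The data shape is an assumption.
type Verdict = "ok" | "regression" | "baseline-improved";

// Assumed shape: package name -> line coverage percentage (0 to 100).
type CoverageMap = Record<string, number>;

interface RatchetResult {
  pkg: string;
  baselinePct: number;
  currentPct: number;
  verdict: Verdict;
}

function checkRatchet(
  baseline: CoverageMap,
  current: CoverageMap,
  tolerance = 0,
): RatchetResult[] {
  return Object.entries(baseline).map(([pkg, baselinePct]): RatchetResult => {
    // A package missing from the current run is treated as 0% (assumption).
    const currentPct = current[pkg] ?? 0;
    let verdict: Verdict = "ok";
    if (currentPct < baselinePct - tolerance) verdict = "regression";
    else if (currentPct > baselinePct + tolerance) verdict = "baseline-improved";
    return { pkg, baselinePct, currentPct, verdict };
  });
}

// Mirrors the documented exit codes: 0 pass, 1 regression.
function exitCodeFor(results: RatchetResult[]): 0 | 1 {
  return results.some((r) => r.verdict === "regression") ? 1 : 0;
}
```

A drop that stays within the tolerance is reported as `ok`, so small run-to-run noise would not fail the check, while any package flagged `baseline-improved` is a candidate for a baseline update in the same PR.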