# Testing Pipelines

Squid has two testing approaches, and neither needs an agent runtime.

1. **YAML Tests**: write `.test.yaml` files alongside your pipelines, run with `squid test`
2. **TypeScript Tests**: use the `TestRunner` API with vitest/jest

---

## YAML Tests (recommended)

The simplest way to test pipelines. Write test cases in YAML, mock any step, assert on results.

### Quick start

Create `deploy.test.yaml` next to `deploy.yaml`:

```yaml
pipeline: ./deploy.yaml

tests:
  - name: "deploys when approved"
    mode: sandbox
    args:
      env: staging
      image: app:v2
    mocks:
      run:
        build: { output: { built: true } }
        test: { output: { passed: true } }
      spawn:
        reviewer: { output: { score: 95 } }
    gates:
      approve: true
    assert:
      status: completed
      steps:
        deploy: completed

  - name: "skips deploy when rejected"
    mode: sandbox
    gates:
      approve: false
    assert:
      steps:
        deploy: skipped
```

Run:

```bash
squid test                    # auto-discovers all *.test.yaml
squid test deploy.test.yaml   # specific file
```

Output:

```
deploy.test.yaml
  ✓ deploys when approved (2ms)
  ✓ skips deploy when rejected (1ms)

1 suite(s), 2 test(s)
  2 passed
```

### Test modes

| Mode | `run` steps | `spawn` steps | `gate` steps | Use case |
|------|------------|---------------|--------------|----------|
| **`sandbox`** | Mocked (nothing executes) | Mocked | Mock decisions | Unit testing pipeline logic, conditions, branching, data flow |
| **`integration`** | Execute for real (unless mocked) | Mocked | Mock decisions | Testing actual shell scripts, real commands |

**`sandbox`** is the default. Use it for:

- Testing that conditions route correctly
- Testing that gates block/allow the right steps
- Testing branch/loop/restart logic
- Testing data flow between steps

**`integration`** runs real shell commands. Use it when:

- Your `run` steps have actual scripts you want to verify
- You want to test that `echo` / `jq` / other tools produce correct output
- You can still mock individual `run` steps (e.g., mock `kubectl` but run `jq`)

### Test file format

```yaml
pipeline: ./path/to/pipeline.yaml    # REQUIRED: relative to this test file

tests:                               # REQUIRED: array of test cases
  - name: "test case name"           # REQUIRED: descriptive name
    mode: sandbox                    # sandbox (default) | integration
    args:                            # pipeline arguments
      key: value
    env:                             # environment overrides
      KEY: value
    mocks:                           # mock step outputs
      run:                           # mock run steps
        stepId:
          output: { ... }            # what to return as output
          stdout: "raw text"         # raw stdout (optional)
          status: completed          # completed (default) | failed
          error: "message"           # error message (when status: failed)
      spawn:                         # mock spawn steps
        stepId:
          output: { ... }
          status: accepted           # accepted (default) | error
          error: "message"
    gates:                           # gate decisions
      stepId: true                   # true = approve, false = reject
    assert:                          # REQUIRED: what to check
      status: completed              # pipeline status
      output: { ... }                # pipeline final output
      steps:                         # per-step assertions
        stepId: completed            # step status (shorthand)
```

### Assertion reference

```yaml
assert:
  # Pipeline level
  status: completed              # completed | failed | halted | cancelled
  output: { key: value }         # exact match on pipeline output

  # Step level (alternative forms; YAML keys must be unique, so use one form per step ID)
  steps:
    build: completed                                # status shorthand: completed | failed | skipped
    build: { status: completed }                    # status object form
    build: { output: { image: "app:v2" } }          # exact output match
    build: { outputContains: "app" }                # output contains substring
    build: { outputPath: image, equals: "app:v2" }  # nested field check
```

### Mocking run steps

In **sandbox** mode, all `run` steps are mocked automatically (they return `{ command, sandbox: true }`).
Add explicit mocks to control what output they return:

```yaml
mocks:
  run:
    build:
      output: { image: "app:v2", tag: "latest" }
    test:
      output: { passed: 42, failed: 0 }
    flaky-step:
      status: failed
      error: "connection timeout"
```

In **integration** mode, `run` steps execute for real **unless** you mock them:

```yaml
mode: integration
mocks:
  run:
    # Mock the dangerous one, let the rest run
    deploy:
      output: { deployed: true }
```

### Mocking spawn steps

Spawn steps are always mocked in both modes (no real agent calls):

```yaml
mocks:
  spawn:
    researcher:
      output: { findings: ["a", "b", "c"] }
    reviewer:
      output: { score: 85, feedback: "looks good" }
    failing-agent:
      status: error
      error: "agent crashed"
```

Unmocked spawn steps return `{ mocked: true }` by default.

### Gate decisions

```yaml
gates:
  approve: true      # approve this gate
  dangerous: false   # reject this gate
```

Unmocked gates are auto-approved by default.

### Example: Testing a multi-agent pipeline

```yaml
pipeline: ./multi-agent-dev.yaml

tests:
  - name: "full happy path"
    mode: sandbox
    args:
      feature: "add auth"
      repo: /workspace
    mocks:
      spawn:
        architect:
          output: { plan: "add JWT", files: ["auth.ts"] }
        backend-coder:
          output: { files: ["src/auth.ts"] }
        frontend-coder:
          output: { files: ["src/Login.tsx"] }
        test-writer:
          output: { files: ["test/auth.test.ts"] }
        reviewer:
          output: { criticalIssues: 0, summary: "LGTM" }
        doc-writer:
          output: { docs: ["API.md"] }
      run:
        run-tests:
          output: { passed: true, coverage: 92 }
        create-pr:
          output: { pr: "#42" }
    gates:
      plan-review: true
      deploy-approval: true
    assert:
      status: completed
      steps:
        architect: completed
        backend-coder: completed
        reviewer: completed
        create-pr: completed

  - name: "plan rejected"
    mode: sandbox
    args:
      feature: "bad idea"
      repo: /workspace
    mocks:
      spawn:
        architect:
          output: { plan: "risky" }
    gates:
      plan-review: false
    assert:
      status: completed
      steps:
        backend-coder: skipped
```

### Example: Integration testing with real scripts

```yaml
pipeline: ./sub-build.yaml

tests:
  - name: "echo commands produce correct output"
    mode: integration
    args:
      target: prod
    assert:
      status: completed
      steps:
        compile:
          outputContains: "app-prod"
        lint: completed
        summary: completed
```

---

## TypeScript Tests

For programmatic testing with vitest, jest, or any test framework.

### TestRunner API

```typescript
import { createTestRunner } from "squid/testing";
import { parseFile } from "squid";

const pipeline = parseFile("my-pipeline.yaml");

const result = await createTestRunner()
  .mockSpawn("research", { output: { findings: ["a", "b"] } })
  .approveGate("review")
  .rejectGate("dangerous-gate")
  .withArgs({ env: "test" })
  .withEnv({ API_KEY: "test-key" })
  .run(pipeline);

expect(result.status).toBe("completed");
result.assertStepCompleted("research");
result.assertStepSkipped("dangerous-action");
```

### API Reference

| Method | Description |
|--------|-------------|
| `mockSpawn(stepId, { output })` | Mock a spawn step's result |
| `mockSpawnHandler(stepId, fn)` | Dynamic handler for spawn mock |
| `approveGate(stepId)` | Auto-approve a gate |
| `rejectGate(stepId)` | Auto-reject a gate |
| `overrideStep(stepId, result)` | Replace any step's result entirely |
| `withArgs(args)` | Set pipeline arguments |
| `withEnv(env)` | Set environment variables |
| `run(pipeline)` | Execute and return `TestResult` |

### TestResult

```typescript
interface TestResult extends RunResult {
  capturedSteps: Array<{ step: Step; result: StepResult }>;
  getStepResult(stepId: string): StepResult | undefined;
  assertStepCompleted(stepId: string): void;  // throws if not completed
  assertStepSkipped(stepId: string): void;    // throws if not skipped
}
```

### Example: vitest test suite

```typescript
import { describe, it, expect } from "vitest";
import { createTestRunner } from "squid/testing";
import { parseFile } from "squid";

const pipeline = parseFile("deploy.yaml");

describe("deploy pipeline", () => {
  it("deploys when approved", async () => {
    const result = await createTestRunner()
      .approveGate("approve")
      .withArgs({ env: "staging", image: "app:v2" })
      .run(pipeline);

    expect(result.status).toBe("completed");
    result.assertStepCompleted("deploy");
  });

  it("skips deploy when rejected", async () => {
    const result = await createTestRunner()
      .rejectGate("approve")
      .withArgs({ env: "staging", image: "app:v2" })
      .run(pipeline);

    result.assertStepSkipped("deploy");
  });

  it("branches on review score", async () => {
    const result = await createTestRunner()
      .mockSpawn("reviewer", { output: { criticalIssues: 3 } })
      .run(pipeline);

    result.assertStepCompleted("fix-bugs");
  });
});
```

---

## Execution Modes

| Mode | `run` steps | `spawn` steps | `gate` steps | Use |
|------|------------|---------------|--------------|-----|
| `run` | Execute | Real agent calls | Halt for approval | Production |
| `dry-run` | Skip (show command) | Skip | Skip | Preview |
| `test` | Execute | Mocked | Auto-approve | Legacy TS tests |
| **`sandbox`** | **Mocked** | **Mocked** | **Mock decisions** | **YAML unit tests** |
| **`integration`** | **Execute (unless mocked)** | **Mocked** | **Mock decisions** | **YAML integration tests** |

---

## End-to-End Tests

E2E tests run pipelines against **real agent CLIs** (Claude Code, OpenClaw, OpenCode). They validate the full round-trip: Squid invokes the CLI, the agent processes the task, output is parsed back into the pipeline.

### Prerequisites

```bash
# Claude Code adapter
claude --version          # must be installed and authenticated

# OpenClaw adapter
openclaw --version        # must be installed
openclaw config           # must be authenticated

# OpenClaw gateway must be running for agent spawns to work:
openclaw status           # check if gateway is running
openclaw gateway run --bind loopback --port 18789   # start if not running
# Or start via the OpenClaw macOS app (menubar icon)

# OpenCode adapter
opencode --version        # must be installed
```

### Running E2E tests

E2E tests are disabled by default.
Enable with `SQUID_E2E=1`:

```bash
# Run all e2e tests
SQUID_E2E=1 npx vitest run test/e2e.test.ts

# Run a specific feature
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "parallel"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "code review loop"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "sub-pipeline"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "gate"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "loop"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "error recovery"
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "mixed"

# Run only Claude Code tests (no OpenClaw required)
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "claude-code"

# Run only OpenClaw tests
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "openclaw"
```

Tests auto-skip if the required CLI is not found.

### E2E test coverage

| Test | Pipeline | Feature Tested | Adapters | Timeout |
|------|----------|---------------|----------|---------|
| **Basic spawn** | `e2e-claude-code.yaml` | Single agent spawn + JSON output parsing | Claude Code | 2 min |
| **Data flow** | `e2e-claude-code.yaml` | Output from step A available in step B | Claude Code | 2 min |
| **OpenClaw spawn** | `e2e-openclaw.yaml` | OpenClaw CLI invocation + JSON extraction from payloads envelope | OpenClaw | 3 min |
| **Code review loop** | `e2e-code-review-loop.yaml` | Coder → reviewer → `restart:` loop until score threshold met | Claude Code | 5 min |
| **Restart exhaustion** | `e2e-code-review-loop.yaml` | `maxRestarts` reached → branch routes to "rejected" | Claude Code | 10 min |
| **Parallel agents** | `e2e-parallel-agents.yaml` | `parallel:` branches, `merge: object`, concurrent spawns | Claude Code | 3 min |
| **Sub-pipeline** | `e2e-sub-pipeline.yaml` | `pipeline:` step calls child YAML, arg passing, output propagation | Claude Code | 3 min |
| **Gate + resume** | `e2e-gate-resume.yaml` | Gate auto-approves in `test` mode; halts with resume token in `run` mode | Claude Code | 3 min |
| **Loop over items** | `e2e-loop-items.yaml` | `loop:` iterates list items through agent, `collect` results | Claude Code | 5 min |
| **Error recovery** | `e2e-error-recovery.yaml` | `branch:` on agent confidence, fallback agent on failure | Claude Code | 3 min |
| **Mixed adapters** | `e2e-mixed-adapters.yaml` | Claude Code writes code, OpenClaw reviews it in same pipeline | Claude Code + OpenClaw | 5 min |

### E2E pipeline files

All E2E pipelines are in `skills/squid-pipeline/examples/e2e/`:

```
e2e/
├── e2e-claude-code.yaml             # Basic spawn
├── e2e-openclaw.yaml                # OpenClaw spawn
├── e2e-code-review-loop.yaml        # Restart loop (coder/reviewer)
├── e2e-code-review-loop.test.yaml   # Sandbox test for the loop
├── e2e-parallel-agents.yaml         # Parallel branches + merge
├── e2e-sub-pipeline.yaml            # Parent pipeline
├── e2e-sub-pipeline-child.yaml      # Child pipeline (called by parent)
├── e2e-gate-resume.yaml             # Gate halt + resume
├── e2e-loop-items.yaml              # Loop with agent per item
├── e2e-error-recovery.yaml          # Branch-based error fallback
└── e2e-mixed-adapters.yaml          # Claude Code + OpenClaw in one pipeline
```

Each pipeline can also be run directly with `squid run`:

```bash
# Run a single e2e pipeline manually
squid run skills/squid-pipeline/examples/e2e/e2e-parallel-agents.yaml \
  --args-json '{"topic": "benefits of testing"}' -v

# Dry-run to see what would execute without calling agents
squid run skills/squid-pipeline/examples/e2e/e2e-code-review-loop.yaml --dry-run

# Validate all e2e pipelines
for f in skills/squid-pipeline/examples/e2e/e2e-*.yaml; do
  squid validate "$f"
done
```

### Per-adapter testing

**Claude Code only** (no OpenClaw gateway needed):

```bash
SQUID_E2E=1 npx vitest run test/e2e.test.ts \
  -t "claude-code|code review loop|parallel|sub-pipeline|gate|loop|error recovery"
```

**OpenClaw only** (needs running gateway):

```bash
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "openclaw"
```

**Both adapters** (Claude Code + OpenClaw gateway):

```bash
SQUID_E2E=1 npx vitest run test/e2e.test.ts -t "mixed"
```

### Writing new E2E tests

1. Create a pipeline in `skills/squid-pipeline/examples/e2e/e2e-.yaml`
2. Validate with `squid validate`
3. Add a test case in `test/e2e.test.ts` using `it.skipIf(!shouldRun)`
4. Use `parseFile()` + `runPipeline()`, the same API as production
5. Set generous timeouts (agents are slow: 60-300s per spawn)
6. Log outputs with `console.log` for debugging
7. Handle failures gracefully (e.g., OpenClaw gateway not running)

```typescript
// Fragment: assumes E2E_ENABLED, HAS_CLAUDE, e2eDir, resolve, parseFile, and
// runPipeline are already defined or imported in the surrounding test file.
describe("e2e: my feature", () => {
  const shouldRun = E2E_ENABLED && HAS_CLAUDE;

  it.skipIf(!shouldRun)("does the thing", async () => {
    const pipeline = parseFile(resolve(e2eDir, "e2e-my-feature.yaml"));
    const result = await runPipeline(pipeline, {
      args: { key: "value" },
    });

    console.log("Status:", result.status);
    console.log("Output:", JSON.stringify(result.results.step?.output, null, 2));

    expect(result.status).toBe("completed");
    expect(result.results.step?.status).toBe("completed");
  }, 180_000); // 3 min timeout
});
```

---

## Tips

1. **Start with sandbox mode**: test logic first, then add integration tests for scripts
2. **Test the happy path first**: all spawns succeed, all gates approved
3. **Test rejection paths**: reject gates, verify conditional skips
4. **Test error handling**: mock steps as `status: failed`, verify branch routing
5. **Test restart loops**: mock spawn outputs that improve across iterations
6. **Mock only what's needed**: unmocked run steps return `{ command, sandbox: true }`, unmocked spawns return `{ mocked: true }`
7. **Keep test files next to pipelines**: `deploy.yaml` + `deploy.test.yaml`
8. **Use integration mode sparingly**: only for testing actual shell commands
9. **Use e2e tests for adapter validation**: run `SQUID_E2E=1` after changing adapters or JSON parsing