# E2E test cases End-to-end scenarios that wire the **real** worker + supervisor + TUI modules together the way the daemon does (worker metric sink → supervisor SQLite → control API), using a fake Copilot provider so no live network/token is needed. Spec: [`copilot-reverse.e2e.test.ts`](./copilot-reverse.e2e.test.ts). **Policy: every code update must keep all e2e cases green.** Run with `npm run test:e2e` (`npm test` also runs them — the suite is included in the default vitest run). | ID | Scenario | Expected result | |----|----------|-----------------| | EP-01 | Anthropic `POST /v1/messages` with `stream: true` | SSE contains `message_start`, a `text` delta, and `message_stop` | | EP-02 | OpenAI `POST /v1/chat/completions` | `choices[0].message.content` is the assistant text | | EP-03 | `POST /v1/messages/count_tokens` | `200` with `input_tokens > 0` (lets clients time auto-compaction) | | EP-04 | `/v1/messages` carrying an Anthropic **server-side tool** (`web_search_20250305`) | request completes `200` — the built-in tool is dropped, so the client never hangs waiting for a `tool_result` | | EP-05 | A failing provider stream | worker emits `event: error` (not a silent close) **and** the supervisor records the failure with its message, visible at `GET /api/requests` / dashboard | | EP-06 | `GET /` on the supervisor | `200` HTML dashboard page | | EP-07 | `/logs` slash command | lists recent request errors with their messages | | EP-08 | `/dashboard` and `/report` slash commands | open the dashboard URL and a prefilled GitHub issue URL in the browser | | EP-09 | `/reset-claude` after `setup` wrote config | removes exactly the `ANTHROPIC_*` keys copilot-reverse added, preserving the rest | | EP-10 | two concurrent Anthropic streams | each `message_start` carries a UNIQUE message id (no dedupe-to-first) | | EP-11 | Anthropic stream | `message_start` seeds a non-zero `input_tokens` estimate (context bar not stuck at 0%) | | EP-12 | provider returns usage | `message_delta` reports `input_tokens` (prompt − cached), `output_tokens`, `cache_read_input_tokens` | | EP-13 | OpenAI stream with usage | a usage chunk with `total_tokens` is emitted before `[DONE]` | | EP-14 | OpenAI stream fails mid-flight | an `error` chunk is emitted, not a silent close | | EP-15 | dated Anthropic id (`claude-opus-4-8-20251101`) | fuzzy-matched to the available Copilot model | | EP-16 | `claude-opus-4.8[1m]` request | the `[1m]` suffix is stripped before forwarding | | EP-17 | Anthropic image block | round-trips through the proxy as image content (vision) | | EP-18 | mixed text+tool stream | text@0, tool@1, `stop_reason=tool_use` | | EP-19 | non-stream tool_use response | maps to Anthropic `tool_use` content | | EP-20 | OpenAI assistant tool_call + tool result | both reach the provider as canonical blocks | | EP-21 | failed request | error message persists in `request_log`, queryable via `/api/requests` | | EP-22 | control API | exposes status, doctor, requests endpoints | | EP-23 | fresh db | round-trips a recorded request (migration-safe schema) | | EP-24 | `setup-claude` global | HUD status reports configured (user scope); `[1m]` + window written | | EP-25 | `setup-codex` | writes native `~/.codex/config.toml` with `model_context_window` | | EP-26 | reset after 1M setup | removes every key including the 1M-window keys | ### Codex `/responses` (EP-27 … EP-38) The OpenAI Responses API end-to-end through a booted worker (Codex speaks only this). Hermetic — fake provider, no network. Spec: same `copilot-reverse.e2e.test.ts`, `describe("E2E: Codex /responses")`. | ID | Scenario | Expected result | |----|----------|-----------------| | EP-27 | non-stream `/openai/responses` | `object:"response"`, `status:"completed"`, an `output_text` message item, `usage` totals | | EP-28 | streaming `/openai/responses` | ordered `response.created → output_item.added → content_part.added → output_text.delta → …done → response.completed`, monotonic `sequence_number`, final `usage` | | EP-29 | streaming tool call | `function_call` item + `function_call_arguments.delta/.done`, args reassemble | | EP-30 | non-stream tool call | maps to a `function_call` output item with `call_id`/`name`/`arguments` | | EP-31 | prior `function_call` + `function_call_output` in `input` | round-trips to the provider as `tool_use` + `tool_result` | | EP-32 | `input_image` content part | round-trips to the provider as an image block | | EP-33 | `instructions` | becomes a `system` message | | EP-34 | hosted `web_search` tool + a function tool | `web_search` passes through as a hostedTool; the function tool is kept | | EP-35 | expired token | `401` with `error.type:"error"` (login hint) | | EP-36 | mid-stream failure | a `data: {"type":"error"}` frame, not a silent close | | EP-37 | a `/responses` request | recorded in the supervisor `request_log` with `endpoint:"/openai/responses"` | | EP-38 | `gpt-4o[1m]` model | the `[1m]` suffix is stripped before forwarding | ## What each case protects (regressions it would catch) - **EP-01/EP-02** — core proxy translation (OpenAI/Anthropic ⇄ Copilot canonical), streaming framing. - **EP-03** — the count_tokens endpoint Claude Code relies on; missing → 404 → compaction mis-times. - **EP-04** — the "infinite hang on server-side tools" class of bug (agent-maestro #163/#150). - **EP-05** — the headline fixes: no silent stream-close, and request-error capture end-to-end. - **EP-06** — dashboard route stays mounted. - **EP-10/EP-11/EP-12/EP-13** — the usage/id fixes: unique message id (different asks → different answers), non-zero context bar, real token usage. - **EP-15/EP-16** — model resolution (fuzzy match + 1M `[1m]` suffix). - **EP-17** — vision passthrough (images not dropped). - **EP-18/EP-19/EP-20** — tool-call translation in both directions. - **EP-24/EP-25/EP-26** — the full setup→status→reset lifecycle for both clients. ## Live integration tests (opt-in, real Copilot) [`copilot-live.integration.test.ts`](./copilot-live.integration.test.ts) hits the REAL Copilot endpoints end-to-end (GitHub token exchange → worker → adapter → api.githubcopilot.com). It is **not** part of `npm test` — run it with `npm run test:integration`. Every case auto-skips when no GitHub login is on disk (so CI stays hermetic). Coverage: token exchange, model discovery (incl. a real 1M-window model), OpenAI completion, Anthropic streaming with **different questions → different answers** (the unique-id regression guard), real `message_delta` usage, and count_tokens. - **EP-07/EP-08/EP-09** — TUI command wiring: logs/error visibility, dashboard/report, config reset. ## Real CLI Docker e2e (opt-in, real `claude` + `codex`) The fullest test: the **actual `claude` and `codex` CLIs** drive the **real worker daemon** inside a Linux container, with a real GitHub token (and optional WebIQ key) mounted. See [`docker/README.md`](./docker/README.md) — built/run via `e2e/docker/Dockerfile.cli` + `cli-e2e.sh`, not part of `npm test`. It writes a markdown report after each run. Checks: | Scenario | Path | Passes when | |----------|------|-------------| | `codex exec` | `/openai/responses` | the model returns `CODEX_OK` | | `claude -p` | `/anthropic/v1/messages` | the model returns `CLAUDE_OK` | | `claude` web search | gateway `web_search` loop → WebIQ | a grounded answer (a Rust `1.x` version), no error | | codex multi-line | `/openai/responses` | two-line reply preserved (`LINE_ONE`/`LINE_TWO`) | | claude constrained | `/anthropic/v1/messages` | `6*7` → `42` | | `[1m]` model id | resolveModel strip | `gpt-4o[1m]` still answers `ONEM_OK` | | model discovery | `/anthropic/v1/models` | picker gets dashed `claude-opus-4-8[1m]`, no dotted ids leak | | canonical opus | `/anthropic/v1/messages` | `claude-opus-4-8[1m]` resolves to Copilot opus + answers `OPUS_OK` | | setup default model | `claudeCopilotReverseEnv` | the default ANTHROPIC_MODEL is dashed `claude-opus-4-8[1m]` + answers `DEFAULT_OK` | ## HTTP edge-case Docker e2e (hermetic — no real Copilot) Boots the **real** worker (:7891) + supervisor (:7890) and drives them over HTTP on a dummy token, so error paths, supervision lifecycle, and the crash-guard regression run without a real token or quota. `e2e/docker/Dockerfile.http` + `http-e2e.mjs`; runs on every CI push. Checks: malformed JSON→400, >20mb→413, unknown route→404, models/healthz/count_tokens shapes, status/doctor/requests/dashboard, restart recovery, dead-socket broadcast churn survival, and a deterministic `EventBus` isolation guard that **fails on a reverted PR #8** (throwing subscriber must not escape `emit`). It also checks model discovery: `/anthropic/v1/models` advertises Claude families as dashed canonical ids + display + a `[1m]` badge (`claude-opus-4-8[1m]`) and never leaks Copilot's dotted ids — so Claude Code's native picker lights up. Real round-trips run only when a real token is mounted. This black-box path caught two bugs nothing else did: a Codex tool-translation `400` (a `custom`/ `tool_search` tool forwarded nameless → Copilot rejects → "stream closed before response.completed"), and empty terminal Responses events (`output_*.done` carried no text → Codex rendered nothing).