# Integration test architecture The integration tests verify the full request pipeline. An HTTP client sends a request to the proxy, the proxy creates a Copilot SDK session that talks to a mock LLM server, and the proxy streams the response back in the correct SSE format. ```mermaid flowchart LR Test["Test client
(fetch)"] Proxy["xcode-copilot-server
(Fastify)"] SDK["Copilot SDK
(BYOK mode)"] Mock["llm-mock-server
(deterministic rules)"] Test -->|"POST /v1/chat/completions
POST /v1/messages
POST /v1/responses"| Proxy Proxy -->|"SDK session events"| SDK SDK -->|"OpenAI / Anthropic / Responses
wire format"| Mock Mock -->|"SSE / JSON"| SDK SDK -->|"assistant.message_delta
session.idle"| Proxy Proxy -->|"SSE in matching format"| Test ``` ## How it works The Copilot SDK supports [BYOK (Bring Your Own Key)](https://github.com/github/copilot-sdk) providers. Instead of talking to GitHub's backend, the SDK sends requests to a custom endpoint. We point it at [`llm-mock-server`](https://github.com/theblixguy/llm-mock-server), which returns deterministic responses based on pattern-matching rules. This means the tests exercise the real SDK session lifecycle (event subscriptions, streaming, session reuse) without needing GitHub auth or making real API calls. A dummy token is enough to start the SDK CLI process. ## Setup [`setup.ts`](../test/integration/setup.ts) runs once per test file via `beforeAll`/`afterAll`. 1. Starts `llm-mock-server` on a random port with shared rules 2. Starts `CopilotService` with a dummy GitHub token 3. Exports `startServer()` which creates a proxy instance pointed at the mock via BYOK The mock rules are simple input-output pairs. ```text "hello" -> "Hello from mock!" "capital of France" -> "The capital of France is Paris." /what word/i -> "The word was banana." "think about life" -> { text: "The answer is 42.", reasoning: "..." } "say nothing" -> "" (no match) -> "I'm a mock server." ``` ## Per-provider BYOK config Each provider uses the correct wire format between the SDK and the mock. | Provider | BYOK type | BYOK baseUrl | Notes | | -------- | --------- | ------------ | ----- | | OpenAI | `openai` | `mock.url/v1` | SDK appends `/chat/completions` | | Claude | `anthropic` | `mock.url` | SDK appends `/v1/messages`. Needs dummy `apiKey` | | Codex | `openai` + `wireApi: "responses"` | `mock.url/v1` | SDK appends `/responses` | The `allowedCliTools: ["test"]` config prevents the SDK from attaching its built-in tools to BYOK requests. Without this, the SDK sends ~30 tool definitions that fail the mock's strict schema validation. ## Test structure ```text test/integration/ setup.ts shared mock rules, service lifecycle, helpers openai.test.ts OpenAI Chat Completions endpoint claude.test.ts Anthropic Messages endpoint codex.test.ts Responses API endpoint test/streaming-integration.test.ts SDK-level tests that mock the CopilotSession directly. Covers error handling, compaction, reasoning block structure, tool bridge, and MCP routes. ``` Each integration test file defines a `PATH` (the endpoint path), `msg()` (builds a minimal valid request), `byok()` (returns the BYOK provider config), and `textFrom()` (extracts text content from the provider's SSE format). ## What's tested ### Integration tests (via llm-mock-server) Per-provider coverage: - Basic streaming response with correct SSE format and content-type - System message / instructions passthrough - Multi-turn conversation (incremental prompts via session reuse) - Reasoning reply text extraction - Fallback response for unmatched messages - Empty response handling - Schema validation (missing required fields, invalid types, non-streaming rejection) - Usage stats recording across single and multiple requests - User-agent guard rejection (wrong and missing user-agent) - File pattern exclusion (excluded code blocks stripped from prompt) - Health endpoint ### SDK-level tests (via mocked CopilotSession) These test things that llm-mock-server can't simulate: - Session error mid-stream (no deltas, partial deltas) - Prompt send failure (session.send() rejection) - Context compaction events - Reasoning block structure (Claude thinking blocks, Codex reasoning summary events) - Tool execution event logging - Tool bridge (Claude tool_use blocks, Codex function_call items) - MCP JSON-RPC routes (initialize, tools/list, tools/call, notifications)