# Architecture ## Design Principles 1. **CLI wrappers, not direct API** — each MCP server wraps a CLI tool (`codex`, `gemini`, `qwen`) rather than calling APIs directly. This means auth is handled by the CLI (OAuth flows, token refresh) and we don't manage API keys for primary providers. 2. **Spawn, not exec** — all servers use Node.js `spawn()` instead of `exec()` to avoid shell injection risks. Prompts are passed as arguments or via stdin, never interpolated into shell strings. 3. **Fail fast, fallback clean** — error detection happens at the MCP server level. If a provider returns a quota error, the server returns a structured `isError: true` response with the error type. The orchestrator (Claude Code + Concilium skill) handles fallback routing. 4. **Each server is standalone** — you can use `mcp-openai` alone for OpenAI access, without the Concilium skill. The skill is an orchestration layer on top. ## Flow Diagram ``` User asks Claude Code a hard question │ ▼ Claude Code tries to solve it │ ┌────┴────┐ │ Solved? │ └────┬────┘ Yes │ No (after 3 attempts) ▼ │ Done ▼ ┌─────────────────┐ │ AI Concilium │ │ (skill trigger) │ └────────┬────────┘ │ ┌────────┴────────┐ │ Formulate │ │ problem (<500c)│ └────────┬────────┘ │ ┌────────┴────────────────────┐ │ PARALLEL │ ▼ ▼ ┌───────────┐ ┌───────────┐ │ OpenAI │ │ Gemini │ │ MCP call │ │ MCP call │ └─────┬─────┘ └─────┬─────┘ │ │ ▼ ▼ ┌───────────┐ ┌───────────┐ │ Response A│ │ Response B│ │ or Error │ │ or Error │ └─────┬─────┘ └─────┬─────┘ │ │ │ (on error) │ (on error) ▼ ▼ ┌───────────┐ ┌───────────┐ │ Qwen │ │ Qwen │ │ (fallback)│ │ (fallback)│ └─────┬─────┘ └─────┬─────┘ │ (on error) │ (on error) ▼ ▼ ┌───────────┐ ┌───────────┐ │ DeepSeek │ │ DeepSeek │ │ (fallback)│ │ (fallback)│ └─────┬─────┘ └─────┬─────┘ │ │ └───────────┬─────────────┘ │ ▼ ┌────────────────┐ │ Synthesize │ │ responses │ └───────┬────────┘ │ ┌───────┴───────┐ │ Consensus? │ └───────┬───────┘ Yes │ No ▼ │ Apply ▼ Iterate (optional) ``` ## Error Detection Patterns Each MCP server implements `detectError(output)` that checks CLI output for known patterns: ### OpenAI (`mcp-openai`) ``` "usage limit" / "hit your usage limit" → QUOTA_EXCEEDED "not supported when using codex" → MODEL_NOT_SUPPORTED "auth" + ("expired" / "login") → AUTH_EXPIRED ``` ### Gemini (`mcp-gemini`) ``` "quota" / "rate limit" / "resource_exhausted" → QUOTA_EXCEEDED "authentication" / "not authenticated" → AUTH_REQUIRED ``` ### Qwen (`mcp-qwen`) ``` "no auth type is selected" → AUTH_NOT_CONFIGURED "quota" / "rate limit" / "insufficient_quota" → QUOTA_EXCEEDED "authentication" / "invalid api key" → AUTH_EXPIRED "model not found" / "model is not available" → MODEL_NOT_AVAILABLE ``` ## Timeout Handling All servers use the same SIGTERM/SIGKILL pattern: ```javascript // 1. Set timeout const timer = setTimeout(() => { killed = true; proc.kill("SIGTERM"); // Graceful shutdown setTimeout(() => { if (!proc.killed) proc.kill("SIGKILL"); // Force kill after 5s }, 5000); }, timeoutMs); // 2. Clean up on close proc.on("close", () => { clearTimeout(timer); if (killed) reject(new Error("timeout")); else resolve({ stdout, stderr, exitCode }); }); ``` Default timeouts: - `openai_chat`: 180s - `openai_review`: 120s - `gemini_chat`: 90s - `gemini_analyze`: 180s - `qwen_chat`: 120s ## Why CLI Wrappers vs Direct API? | Aspect | CLI Wrapper | Direct API | |--------|-------------|------------| | Auth | CLI handles OAuth flows | Need API keys | | Cost | Free tiers via OAuth | Pay per token | | Maintenance | CLI updates automatically | SDK version pinning | | Reliability | CLI tested by provider | Custom error handling | | Flexibility | Limited to CLI features | Full API access | We chose CLI wrappers because: 1. **No API keys for primary providers** — OAuth via CLI is free/cheap 2. **Less code** — no SDK dependencies, no token management 3. **Provider-tested** — CLI tools are maintained by the providers themselves 4. **Simple** — spawn a process, get stdout, done