# Architecture

## Design Principles

1. **CLI wrappers, not direct API**: each MCP server wraps a CLI tool (`codex`, `gemini`, `qwen`) rather than calling APIs directly. The CLI handles auth (OAuth flows, token refresh), so we don't manage API keys for the primary providers.

2. **Spawn, not exec**: all servers use Node.js `spawn()` instead of `exec()`. `exec()` runs its command string through a shell, while `spawn()` passes an argument array directly to the process, which avoids shell injection. Prompts are passed as arguments or via stdin, never interpolated into shell strings.

3. **Fail fast, fall back cleanly**: error detection happens at the MCP server level. If a provider returns a known error such as a quota error, the server returns a structured `isError: true` response with the error type. The orchestrator (Claude Code + Concilium skill) handles fallback routing.

4. **Each server is standalone**: you can use `mcp-openai` alone for OpenAI access, without the Concilium skill. The skill is an orchestration layer on top.
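
To make the spawn principle concrete, here is a minimal sketch of a process runner that passes the prompt either as an argument or via stdin. The helper name `runCli` and its signature are hypothetical, and CommonJS `require` is assumed; the actual servers may structure this differently.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: run a CLI without a shell and collect its output.
function runCli(command, args, input) {
  return new Promise((resolve, reject) => {
    // No `shell: true`, so each entry in `args` reaches the process verbatim.
    const proc = spawn(command, args, { stdio: ["pipe", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    proc.stdout.on("data", (chunk) => { stdout += chunk; });
    proc.stderr.on("data", (chunk) => { stderr += chunk; });
    proc.on("error", reject); // e.g. the binary is not installed
    proc.on("close", (exitCode) => resolve({ stdout, stderr, exitCode }));
    proc.stdin.on("error", () => {}); // ignore EPIPE if the child never reads stdin
    proc.stdin.end(input ?? "");      // optional prompt via stdin
  });
}
```

Because no shell is involved, a prompt containing characters such as `;` or `$(...)` is delivered as plain text. A real server would call this with the provider CLI name and that CLI's own flags, which are not shown here.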

## Flow Diagram

```
User asks Claude Code a hard question
         │
         ▼
Claude Code tries to solve it
         │
    ┌────┴────┐
    │ Solved? │
    └────┬────┘
    Yes  │  No (after 3 attempts)
    ▼    │
  Done   ▼
    ┌─────────────────┐
    │  AI Concilium   │
    │ (skill trigger) │
    └────────┬────────┘
             │
    ┌────────┴────────┐
    │  Formulate      │
    │  problem (<500c)│
    └────────┬────────┘
             │
    ┌────────┴────────────────────┐
    │         PARALLEL            │
    ▼                             ▼
┌───────────┐             ┌───────────┐
│  OpenAI   │             │  Gemini   │
│  MCP call │             │  MCP call │
└─────┬─────┘             └─────┬─────┘
      │                         │
      ▼                         ▼
┌───────────┐             ┌───────────┐
│ Response A│             │ Response B│
│ or Error  │             │ or Error  │
└─────┬─────┘             └─────┬─────┘
      │                         │
      │ (on error)              │ (on error)
      ▼                         ▼
┌───────────┐             ┌───────────┐
│   Qwen    │             │   Qwen    │
│ (fallback)│             │ (fallback)│
└─────┬─────┘             └─────┬─────┘
      │ (on error)              │ (on error)
      ▼                         ▼
┌───────────┐             ┌───────────┐
│ DeepSeek  │             │ DeepSeek  │
│ (fallback)│             │ (fallback)│
└─────┬─────┘             └─────┬─────┘
      │                         │
      └───────────┬─────────────┘
                  │
                  ▼
         ┌────────────────┐
         │   Synthesize   │
         │   responses    │
         └───────┬────────┘
                 │
         ┌───────┴───────┐
         │  Consensus?   │
         └───────┬───────┘
         Yes     │  No
         ▼       │
       Apply     ▼
              Iterate (optional)
```
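
The routing in the diagram can be illustrated in code. In this project the orchestrator is Claude Code following the Concilium skill, so the sketch below only illustrates the logic and is not how the routing is actually implemented. The function names `callWithFallback` and `consult`, and the provider object shape (`name`, `call`), are hypothetical.

```javascript
// Try each provider in order and return the first non-error result.
async function callWithFallback(chain, prompt) {
  const failures = [];
  for (const provider of chain) {
    const result = await provider.call(prompt);
    if (!result.isError) {
      return { provider: provider.name, text: result.text, failures };
    }
    failures.push({ provider: provider.name, errorType: result.errorType });
  }
  return { provider: null, text: null, failures }; // every provider failed
}

// Run the two primary branches in parallel, each with its own fallback chain.
async function consult(prompt, { openai, gemini, qwen, deepseek }) {
  const [a, b] = await Promise.all([
    callWithFallback([openai, qwen, deepseek], prompt),
    callWithFallback([gemini, qwen, deepseek], prompt),
  ]);
  // Keep only the branches that produced a response to synthesize.
  return [a, b].filter((r) => r.provider !== null);
}
```

Each branch records which providers failed and why, so the synthesis step can tell whether a response came from a primary provider or from a fallback.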

## Error Detection Patterns

Each MCP server implements a `detectError(output)` function that scans the CLI output for known patterns and maps each match to an error type:

### OpenAI (`mcp-openai`)
```
"usage limit" / "hit your usage limit" → QUOTA_EXCEEDED
"not supported when using codex"       → MODEL_NOT_SUPPORTED
"auth" + ("expired" / "login")         → AUTH_EXPIRED
```

### Gemini (`mcp-gemini`)
```
"quota" / "rate limit" / "resource_exhausted" → QUOTA_EXCEEDED
"authentication" / "not authenticated"         → AUTH_REQUIRED
```

### Qwen (`mcp-qwen`)
```
"no auth type is selected"                     → AUTH_NOT_CONFIGURED
"quota" / "rate limit" / "insufficient_quota"  → QUOTA_EXCEEDED
"authentication" / "invalid api key"           → AUTH_EXPIRED
"model not found" / "model is not available"   → MODEL_NOT_AVAILABLE
```
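
The pattern tables above could be implemented roughly as follows. This is a sketch under assumptions: case-insensitive substring matching and first-match-wins ordering are guesses, and the extra `provider` argument is used only to show all three tables in one block (each real server exposes a one-argument `detectError(output)`). The helper `toToolResult` is hypothetical; the `isError` and `content` fields follow the MCP tool result shape.

```javascript
// Build a matcher that is true when any of the needles occurs in the text.
const any = (...needles) => (text) => needles.some((n) => text.includes(n));

// Rules are checked in order; the first match wins (an assumption).
const RULES = {
  openai: [
    ["QUOTA_EXCEEDED", any("usage limit")],
    ["MODEL_NOT_SUPPORTED", any("not supported when using codex")],
    ["AUTH_EXPIRED", (t) => t.includes("auth") && any("expired", "login")(t)],
  ],
  gemini: [
    ["QUOTA_EXCEEDED", any("quota", "rate limit", "resource_exhausted")],
    ["AUTH_REQUIRED", any("authentication", "not authenticated")],
  ],
  qwen: [
    ["AUTH_NOT_CONFIGURED", any("no auth type is selected")],
    ["QUOTA_EXCEEDED", any("quota", "rate limit", "insufficient_quota")],
    ["AUTH_EXPIRED", any("authentication", "invalid api key")],
    ["MODEL_NOT_AVAILABLE", any("model not found", "model is not available")],
  ],
};

// Return the matching error type, or null when no known pattern is found.
function detectError(provider, output) {
  const text = output.toLowerCase();
  const hit = (RULES[provider] || []).find(([, matches]) => matches(text));
  return hit ? hit[0] : null;
}

// Wrap CLI output in an MCP-style tool result, flagging detected errors.
function toToolResult(provider, output) {
  const errorType = detectError(provider, output);
  if (errorType === null) {
    return { content: [{ type: "text", text: output }] };
  }
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify({ provider, errorType }) }],
  };
}
```

Substring matching is simple but can misfire when a normal answer happens to mention a phrase such as "rate limit", so a real detector may also want to check the exit code or restrict matching to stderr.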

## Timeout Handling

All servers use the same SIGTERM/SIGKILL pattern: send SIGTERM when the timeout expires, then SIGKILL if the process has not exited 5 seconds later:

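```javascript
// Assumes `proc` is a spawned ChildProcess, `stdout`/`stderr` are accumulated
// elsewhere, and this runs inside a Promise executor (`resolve`, `reject`).
let timedOut = false;
let forceKillTimer;

// 1. Set timeout
const timer = setTimeout(() => {
  timedOut = true;
  proc.kill("SIGTERM");           // Graceful shutdown
  forceKillTimer = setTimeout(() => {
    // `proc.killed` only means a signal was sent, so check for an actual exit.
    if (proc.exitCode === null && proc.signalCode === null) {
      proc.kill("SIGKILL");       // Force kill after 5s
    }
  }, 5000);
}, timeoutMs);

// 2. Clean up on close
proc.on("close", (exitCode) => {
  clearTimeout(timer);
  clearTimeout(forceKillTimer);
  if (timedOut) reject(new Error("timeout"));
  else resolve({ stdout, stderr, exitCode });
});
```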

Default timeouts:
- `openai_chat`: 180s
- `openai_review`: 120s
- `gemini_chat`: 90s
- `gemini_analyze`: 180s
- `qwen_chat`: 120s
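
As a configuration sketch, the defaults above could live in a lookup table keyed by tool name. The names `DEFAULT_TIMEOUTS_MS` and `resolveTimeoutMs` are hypothetical, and whether the servers accept a per-call override is an assumption.

```javascript
// Default timeouts per tool, in milliseconds (values from the list above).
const DEFAULT_TIMEOUTS_MS = {
  openai_chat: 180 * 1000,
  openai_review: 120 * 1000,
  gemini_chat: 90 * 1000,
  gemini_analyze: 180 * 1000,
  qwen_chat: 120 * 1000,
};

// Use a positive per-call override if given (assumed feature), else the default.
function resolveTimeoutMs(tool, overrideMs) {
  if (Number.isFinite(overrideMs) && overrideMs > 0) return overrideMs;
  const fallback = DEFAULT_TIMEOUTS_MS[tool];
  if (fallback === undefined) throw new Error(`Unknown tool: ${tool}`);
  return fallback;
}
```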

## Why CLI Wrappers vs Direct API?

| Aspect | CLI Wrapper | Direct API |
|--------|-------------|------------|
| Auth | CLI handles OAuth flows | Need API keys |
| Cost | Free tiers via OAuth | Pay per token |
| Maintenance | CLI updates automatically | SDK version pinning |
| Reliability | CLI tested by provider | Custom error handling |
| Flexibility | Limited to CLI features | Full API access |

We chose CLI wrappers because:
1. **No API keys for primary providers**: OAuth via the CLI gives access to free or low-cost tiers
2. **Less code**: no SDK dependencies, no token management
3. **Provider-tested**: CLI tools are maintained by the providers themselves
4. **Simple**: spawn a process, read stdout, done