--- title: "Troubleshooting" version: 3.8.40 lastUpdated: 2026-06-28 --- # Troubleshooting > **For Users**: Looking for quick fixes? See the [Quick Reference](#quick-reference) below. 🌐 **Languages:** 🇺🇸 [English](./TROUBLESHOOTING.md) | 🇧🇷 [Português (Brasil)](../i18n/pt-BR/docs/guides/TROUBLESHOOTING.md) | 🇪🇸 [Español](../i18n/es/docs/guides/TROUBLESHOOTING.md) | 🇫🇷 [Français](../i18n/fr/docs/guides/TROUBLESHOOTING.md) | 🇮🇹 [Italiano](../i18n/it/docs/guides/TROUBLESHOOTING.md) | 🇷🇺 [Русский](../i18n/ru/docs/guides/TROUBLESHOOTING.md) | 🇨🇳 [中文 (简体)](../i18n/zh-CN/docs/guides/TROUBLESHOOTING.md) | 🇩🇪 [Deutsch](../i18n/de/docs/guides/TROUBLESHOOTING.md) | 🇮🇳 [हिन्दी](../i18n/in/docs/guides/TROUBLESHOOTING.md) | 🇹🇭 [ไทย](../i18n/th/docs/guides/TROUBLESHOOTING.md) | 🇺🇦 [Українська](../i18n/uk-UA/docs/guides/TROUBLESHOOTING.md) | 🇸🇦 [العربية](../i18n/ar/docs/guides/TROUBLESHOOTING.md) | 🇯🇵 [日本語](../i18n/ja/docs/guides/TROUBLESHOOTING.md) | 🇻🇳 [Tiếng Việt](../i18n/vi/docs/guides/TROUBLESHOOTING.md) | 🇧🇬 [Български](../i18n/bg/docs/guides/TROUBLESHOOTING.md) | 🇩🇰 [Dansk](../i18n/da/docs/guides/TROUBLESHOOTING.md) | 🇫🇮 [Suomi](../i18n/fi/docs/guides/TROUBLESHOOTING.md) | 🇮🇱 [עברית](../i18n/he/docs/guides/TROUBLESHOOTING.md) | 🇭🇺 [Magyar](../i18n/hu/docs/guides/TROUBLESHOOTING.md) | 🇮🇩 [Bahasa Indonesia](../i18n/id/docs/guides/TROUBLESHOOTING.md) | 🇰🇷 [한국어](../i18n/ko/docs/guides/TROUBLESHOOTING.md) | 🇲🇾 [Bahasa Melayu](../i18n/ms/docs/guides/TROUBLESHOOTING.md) | 🇳🇱 [Nederlands](../i18n/nl/docs/guides/TROUBLESHOOTING.md) | 🇳🇴 [Norsk](../i18n/no/docs/guides/TROUBLESHOOTING.md) | 🇵🇹 [Português (Portugal)](../i18n/pt/docs/guides/TROUBLESHOOTING.md) | 🇷🇴 [Română](../i18n/ro/docs/guides/TROUBLESHOOTING.md) | 🇵🇱 [Polski](../i18n/pl/docs/guides/TROUBLESHOOTING.md) | 🇸🇰 [Slovenčina](../i18n/sk/docs/guides/TROUBLESHOOTING.md) | 🇸🇪 [Svenska](../i18n/sv/docs/guides/TROUBLESHOOTING.md) | 🇵🇭 [Filipino](../i18n/phi/docs/guides/TROUBLESHOOTING.md) | 🇨🇿 [Čeština](../i18n/cs/docs/guides/TROUBLESHOOTING.md) Common problems and solutions for OmniRoute. --- ## Quick Reference **New to OmniRoute?** Start here — these solve 90% of problems: | I see this | What it means | What to do | | ----------------------- | ----------------------------------- | ------------------------------------------------------------------------------------------------- | | "Can't connect" | OmniRoute isn't running | Run `omniroute` or `docker restart omniroute` | | "Invalid API key" | Your key is wrong or expired | Re-copy the key from the provider's website | | "Rate limit exceeded" | You're sending too many requests | Wait 1 minute, or use `model: "auto"` for automatic fallback | | "Quota exceeded" | You've used up your free/paid quota | Connect more providers, or use free providers (Kiro, Pollinations) | | "Slow responses" | Provider is busy or far away | Use `model: "auto/fast"` or connect a faster provider (Groq, Cerebras) | | "Wrong provider used" | `auto` picked a different provider | That's normal! `auto` picks the best one. Force a specific provider with `model: "openai/gpt-4o"` | | "502 Bad Gateway" | Provider is down | Wait and retry, or use `model: "auto"` to switch providers | | "401 Unauthorized" | Your credentials are wrong | Check your API key or re-authenticate with OAuth | | "429 Too Many Requests" | Rate limited | Wait 1 minute, or connect more providers | **Still stuck?** See the [detailed troubleshooting](#detailed-troubleshooting) below, or ask on [Discord](https://discord.gg/EkzRkpzKYt). --- ## Detailed Troubleshooting --- ## Quick Fixes | Problem | Solution | | --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | First login not working | Set `INITIAL_PASSWORD` in `.env` (no hardcoded default) | | Dashboard opens on wrong port | Set `PORT=20128` and `NEXT_PUBLIC_BASE_URL=http://localhost:20128` | | No logs written to disk | Set `APP_LOG_TO_FILE=true` and verify call log capture is enabled | | EACCES: permission denied | Set `DATA_DIR=/path/to/writable/dir` to override `~/.omniroute` | | Routing strategy not saving | Update to the latest v3.x release (Zod schema fix for settings persistence shipped in earlier versions) | | Login crash / blank page | Check Node.js version — see [Node.js Compatibility](#nodejs-compatibility) below | | `dlopen` / `slice is not valid mach-o file` (macOS) | Run `cd $(npm root -g)/omniroute/app && npm rebuild better-sqlite3 && omniroute` — see [macOS native module rebuild](#macos-native-module-rebuild) below | | Proxy "fetch failed" | Ensure proxy config is set at the correct level — see [Proxy Issues](#proxy-issues) below | --- ## Node.js Compatibility ### Login page crashes or shows "Module self-registration" error **Cause:** You are running a Node.js version outside OmniRoute's approved secure runtime floor. The most common case is running an older Node 22 or 24 patch level that falls below the patched security floor OmniRoute requires. **Symptoms:** - Login page shows a blank screen or a server error - Console shows `Error: Module did not self-register` or similar native binding errors - The login page shows an **orange warning banner** with your Node version if the runtime is outside the supported secure policy **Fix:** 1. Install a supported Node.js LTS release (recommended: Node.js 24.x): ```bash nvm install 24 nvm use 24 ``` 2. Verify your version: `node --version` should show `v24.0.0` or newer on the 24.x LTS line 3. Reinstall OmniRoute: `npm install -g omniroute` 4. Restart: `omniroute` > **Supported secure versions:** `>=22.22.2 <23` or `>=24.0.0 <27`. Node.js 24.x LTS (Krypton) and Node.js 26 are fully supported. ### macOS: `dlopen` / "slice is not valid mach-o file" **Cause:** After a global `npm install -g omniroute`, the `better-sqlite3` native binary inside the package may have been compiled for a different architecture or Node.js ABI than what is running locally. This is common on macOS (both Apple Silicon and Intel) when the pre-built binary does not match your environment. **Symptoms:** - Server fails immediately on startup with a `dlopen` error - Error contains `slice is not valid mach-o file` - Full example: ``` dlopen(/Users//.nvm/versions/node/v24.14.1/lib/node_modules/omniroute/app/node_modules/better-sqlite3/build/Release/better_sqlite3.node, 0x0001): tried: '...' (slice is not valid mach-o file) ``` **Fix — rebuild for your local environment (no Node.js downgrade required):** ```bash cd $(npm root -g)/omniroute/app npm rebuild better-sqlite3 omniroute ``` > **Note:** This recompiles the native binding against your local Node.js version and CPU architecture, resolving the binary mismatch. The officially supported runtime range is **`>=22.22.2 <23` or `>=24.0.0 <27`** (`SUPPORTED_NODE_RANGE` in `src/shared/utils/nodeRuntimeSupport.ts`, aligned with the `package.json` `engines` field). Node.js 24.x LTS (Krypton) and Node.js 26 are fully supported with `better-sqlite3` v12.x. --- ## Proxy Issues ### Provider validation shows "fetch failed" **Cause:** The API key validation endpoint (`POST /api/providers/validate`) was previously bypassing proxy configuration, causing failures in environments that require proxy routing. **Fix (v3.5.5+):** This is now fixed. Provider validation routes through `runWithProxyContext`, honoring provider-level and global proxy settings automatically. ### Token health check fails with "fetch failed" **Cause:** Background OAuth token refresh was not resolving proxy configuration per connection. **Fix (v3.5.5+):** The token health check scheduler now resolves proxy config per connection before attempting refresh. Update to v3.5.5+. ### SOCKS5 proxy returns "invalid onRequestStart method" **Cause:** On Node.js 22, the undici@8 dispatcher is incompatible with Node's built-in `fetch()` implementation. **Fix (v3.5.5+):** OmniRoute now uses undici's own `fetch()` function when a proxy dispatcher is active, ensuring consistent behavior. Update to v3.5.5+. ### MITM proxy under WSL: desktop apps on the Windows host are not intercepted **Cause:** The MITM proxy and its CA certificate install into the environment where OmniRoute runs. Under WSL that environment is the Linux guest, while the AI desktop apps (Kiro, Trae, Copilot, Zed, …) run on the Windows host. The host apps do not trust the guest's certificate store and do not route through the guest's system proxy, so desktop interception does not engage there. **Recommendation:** Run OmniRoute natively on the same OS as the desktop apps you want to intercept (Windows for Windows apps; macOS/Linux likewise). Keeping OmniRoute inside WSL while targeting host apps requires manually trusting the generated CA certificate on the Windows host and pointing each host app's network/proxy settings at the WSL proxy endpoint — an unsupported, fragile setup. --- ## Provider Issues ### "Language model did not provide messages" **Cause:** Provider quota exhausted. **Fix:** 1. Check dashboard quota tracker 2. Use a combo with fallback tiers 3. Switch to cheaper/free tier ### Rate Limiting **Cause:** Subscription quota exhausted. **Fix:** - Add fallback: `cc/claude-opus-4-6 → glm/glm-4.7 → if/kimi-k2-thinking` - Use GLM/MiniMax as cheap backup ### OAuth Token Expired OmniRoute auto-refreshes tokens. If issues persist: 1. Dashboard → Provider → Reconnect 2. Delete and re-add the provider connection ### Kiro multi-account: second account invalidates the first **Cause:** Kiro's backend enforces a single active session per OIDC client registration. When two accounts share the same registered client (connections imported before v3.8.0), refreshing one account's token invalidates the other's refresh token. **Fix (v3.8.0+):** Re-import affected connections. Starting with v3.8.0, every new Kiro connection created via **Import Token**, **Google/GitHub social login**, or **Auto-Import** automatically registers its own dedicated OIDC client. The connection is therefore fully isolated and refreshing one account has no effect on any other account. Connections that were imported _before_ v3.8.0 do not carry a per-connection client registration. Those connections continue to use the shared social-auth refresh endpoint. To gain isolation, delete the old connection from Dashboard → Providers and re-add it via any of the three import flows. For full details and step-by-step instructions for adding two Kiro accounts side by side, see [`docs/guides/KIRO_SETUP.md`](./KIRO_SETUP.md). --- ## Cloud Issues ### Cloud Sync Errors 1. Verify `BASE_URL` points to your running instance (e.g., `http://localhost:20128`) 2. Verify `CLOUD_URL` points to your cloud endpoint (e.g., `https://omniroute.dev`) 3. Keep `NEXT_PUBLIC_*` values aligned with server-side values ### Cloud `stream=false` Returns 500 **Symptom:** `Unexpected token 'd'...` on cloud endpoint for non-streaming calls. **Cause:** Upstream returns SSE payload while client expects JSON. **Workaround:** Use `stream=true` for cloud direct calls. Local runtime includes SSE→JSON fallback. ### Cloud Says Connected but "Invalid API key" 1. Create a fresh key from local dashboard (`/api/keys`) 2. Run cloud sync: Enable Cloud → Sync Now 3. Old/non-synced keys can still return `401` on cloud --- ## Docker Issues ### CLI Tool Shows Not Installed 1. Check runtime fields: `curl http://localhost:20128/api/cli-tools/runtime/codex | jq` 2. For portable mode: use image target `runner-cli` (bundled CLIs) 3. For host mount mode: set `CLI_EXTRA_PATHS` and mount host bin directory as read-only 4. If `installed=true` and `runnable=false`: binary was found but failed healthcheck ### Quick Runtime Validation ```bash curl -s http://localhost:20128/api/cli-tools/codex-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}' curl -s http://localhost:20128/api/cli-tools/claude-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}' curl -s http://localhost:20128/api/cli-tools/openclaw-settings | jq '{installed,runnable,commandPath,runtimeMode,reason}' ``` --- ## Cost Issues ### High Costs 1. Check usage stats in Dashboard → Usage 2. Switch primary model to GLM/MiniMax 3. Use free tier (Qoder, Kiro) for non-critical tasks 4. Set cost budgets per API key: Dashboard → API Keys → Budget --- ## Debugging ### Enable Log Files Set `APP_LOG_TO_FILE=true` in your `.env` file. Application logs are written under `logs/`. Request artifacts are stored under `${DATA_DIR}/call_logs/` when the call log pipeline is enabled in settings. When pipeline capture is enabled, set `CALL_LOG_PIPELINE_CAPTURE_STREAM_CHUNKS=false` to omit stream chunk payloads, or tune `CALL_LOG_PIPELINE_MAX_SIZE_KB` to change the artifact cap in KB. ### Check Provider Health ```bash # Health dashboard http://localhost:20128/dashboard/health # API health check curl http://localhost:20128/api/monitoring/health ``` ### Runtime Storage - Main state: `${DATA_DIR}/storage.sqlite` (providers, combos, aliases, keys, settings) - Usage: SQLite tables in `storage.sqlite` (`usage_history`, `call_logs`, `proxy_logs`) + optional `${DATA_DIR}/call_logs/` - Application logs: `/logs/...` (when `APP_LOG_TO_FILE=true`) - Call log artifacts: `${DATA_DIR}/call_logs/YYYY-MM-DD/...` when the call log pipeline is enabled The Request Logs page's **Clean history** action clears `call_logs`, legacy `request_detail_logs`, and the local `${DATA_DIR}/call_logs/` artifact directory. --- ## Circuit Breaker Issues ### Provider stuck in OPEN state When a provider's circuit breaker is OPEN, requests are blocked until the cooldown expires. **Fix:** 1. Go to **Dashboard → Settings → Resilience** 2. Check the circuit breaker card for the affected provider 3. Click **Reset All** to clear all breakers, or wait for the cooldown to expire 4. Verify the provider is actually available before resetting ### Provider keeps tripping the circuit breaker If a provider repeatedly enters OPEN state: 1. Check **Dashboard → Health → Provider Health** for the failure pattern 2. Go to **Settings → Resilience → Provider Profiles** and increase the failure threshold 3. Check if the provider has changed API limits or requires re-authentication 4. Review latency telemetry — high latency may cause timeout-based failures --- ## Audio Transcription Issues ### "Unsupported model" error - Ensure you're using the correct prefix: `deepgram/nova-3` or `assemblyai/best` - Verify the provider is connected in **Dashboard → Providers** ### Transcription returns empty or fails - Check supported audio formats: `mp3`, `wav`, `m4a`, `flac`, `ogg`, `webm` - Verify file size is within provider limits (typically < 25MB) - Check provider API key validity in the provider card --- ## Translator Debugging Use **Dashboard → Translator** to debug format translation issues: | Mode | When to Use | | ---------------- | -------------------------------------------------------------------------------------------- | | **Playground** | Compare input/output formats side by side — paste a failing request to see how it translates | | **Chat Tester** | Send live messages and inspect the full request/response payload including headers | | **Test Bench** | Run batch tests across format combinations to find which translations are broken | | **Live Monitor** | Watch real-time request flow to catch intermittent translation issues | ### Common format issues - **Thinking tags not appearing** — Check if the target provider supports thinking and the thinking budget setting - **Tool calls dropping** — Some format translations may strip unsupported fields; verify in Playground mode - **System prompt missing** — Claude and Gemini handle system prompts differently; check translation output - **SDK returns raw string instead of object** — Resolved in v1.x; response sanitizer strips non-standard fields (`x_groq`, `usage_breakdown`, etc.) that cause OpenAI SDK Pydantic validation failures. If you still see this on v3.x+, please file an issue. - **GLM/ERNIE rejects `system` role** — Resolved in v1.x; role normalizer automatically merges system messages into user messages for incompatible models. If you still see this on v3.x+, please file an issue. - **`developer` role not recognized** — Resolved in v1.x; automatically converted to `system` for non-OpenAI providers. If you still see this on v3.x+, please file an issue. - **`json_schema` not working with Gemini** — Resolved in v1.x; `response_format` is now converted to Gemini's `responseMimeType` + `responseSchema`. If you still see this on v3.x+, please file an issue. --- ## Resilience Settings ### Auto rate-limit not triggering - Auto rate-limit only applies to API key providers (not OAuth/subscription) - Verify **Settings → Resilience → Provider Profiles** has auto-rate-limit enabled - Check if the provider returns `429` status codes or `Retry-After` headers ### Tuning exponential backoff Provider profiles support these settings: - **Base delay** — Initial wait time after first failure (default: 1s) - **Max delay** — Maximum wait time cap (default: 30s) - **Multiplier** — How much to increase delay per consecutive failure (default: 2x) ### Anti-thundering herd When many concurrent requests hit a rate-limited provider, OmniRoute uses mutex + auto rate-limiting to serialize requests and prevent cascading failures. This is automatic for API key providers. --- ## Optional RAG / LLM failure taxonomy (16 problems) Some OmniRoute users place the gateway in front of RAG or agent stacks. In those setups it is common to see a strange pattern: OmniRoute looks healthy (providers up, routing profiles ok, no rate limit alerts) but the final answer is still wrong. In practice these incidents usually come from the downstream RAG pipeline, not from the gateway itself. If you want a shared vocabulary to describe those failures you can use the WFGY ProblemMap, an external MIT license text resource that defines sixteen recurring RAG / LLM failure patterns. At a high level it covers: - retrieval drift and broken context boundaries - empty or stale indexes and vector stores - embedding versus semantic mismatch - prompt assembly and context window issues - logic collapse and overconfident answers - long chain and agent coordination failures - multi agent memory and role drift - deployment and bootstrap ordering problems The idea is simple: 1. When you investigate a bad response, capture: - user task and request - route or provider combo in OmniRoute - any RAG context used downstream (retrieved documents, tool calls, etc) 2. Map the incident to one or two WFGY ProblemMap numbers (`No.1` … `No.16`). 3. Store the number in your own dashboard, runbook, or incident tracker next to the OmniRoute logs. 4. Use the corresponding WFGY page to decide whether you need to change your RAG stack, retriever, or routing strategy. Full text and concrete recipes live here (MIT license, text only): [WFGY ProblemMap README](https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md) You can ignore this section if you do not run RAG or agent pipelines behind OmniRoute. --- ## v3.8.0 Known Issues Issues specific to the v3.8.0 release and their current workarounds. If a fix lands in a later patch, the entry will be updated or removed. ### Windsurf OAuth flow fails with 401 **Symptoms:** - "401 unauthorized" while completing the Windsurf OAuth flow from the dashboard - Windsurf provider card stays in "needs reconnection" state after the callback **Causes:** - `WINDSURF_FIREBASE_API_KEY` env var missing or empty - `WINDSURF_API_KEY` misconfigured or pointing at a stale token - Local firewall/proxy blocking the OAuth callback **Fix:** 1. Verify both `WINDSURF_FIREBASE_API_KEY` and `WINDSURF_API_KEY` are set in `.env` 2. Restart OmniRoute so the new env values are picked up 3. Re-run the OAuth flow from **Dashboard → Providers → Windsurf → Reconnect** ### Devin CLI auth failures **Symptoms:** - "Devin CLI not found" or "auth failed" when invoking Devin-backed tools - CLI runtime check reports `installed=false` **Causes:** - `CLI_DEVIN_BIN` points to a path that does not exist - Devin CLI is not installed on the host **Fix:** 1. Install the Devin CLI for your platform 2. Set `CLI_DEVIN_BIN=/usr/local/bin/devin` (or the real path) in `.env` 3. Restart OmniRoute and re-test from **Dashboard → CLI Tools** ### Model cooldown stuck (manual reset) **Symptoms:** - A model stays listed in cooldown even after the expiration time has passed - Requests still skip the model in combo routing despite the timestamp being in the past **Manual reset:** - **Dashboard:** **Settings → Model Cooldowns** → click **Re-enable** on the affected card - **API:** `DELETE /api/resilience/model-cooldowns` with management auth headers ### Command Code provider connection fails with 403 **Symptoms:** - 403 when testing the Command Code provider connection - The provider card shows "unauthorized" after a fresh add **Cause:** The OAuth flow did not complete (callback not received or token not persisted). **Fix:** - Run `omniroute providers` from the CLI to re-trigger the OAuth flow, or - Re-run OAuth from **Dashboard → Providers → Command Code → Reconnect** ### ModelScope returns aggressive 429 cooldowns **Symptoms:** - Very short or immediate cooldowns on ModelScope after a small burst of requests - Combo routing skips ModelScope earlier than expected **Cause:** ModelScope emits provider-specific `Retry-After` headers. v3.8.0 ships dedicated handling for those headers, so older versions misread them as generic rate-limit hints. **Fix:** - Ensure you are on v3.8.0 or later - Verify the `useUpstream429BreakerHints` toggle is enabled under **Settings → Resilience** ### OMNIROUTE_WS_BRIDGE_SECRET missing in production **Symptoms:** - 401 on every Codex/Responses WebSocket bridge request when running on a remote production host - WebSocket bridge handshake closes immediately after connect **Cause:** The `OMNIROUTE_WS_BRIDGE_SECRET` env var is missing from the production environment. **Fix:** 1. Generate a random secret: `openssl rand -hex 32` 2. Set `OMNIROUTE_WS_BRIDGE_SECRET=` in the production server env (and any client that talks to the bridge) 3. Restart OmniRoute ### Responses API: background mode degraded to synchronous **Symptoms:** - Warning logged: `background mode degraded to synchronous` - A `background: true` request returns a normal synchronous response instead of a background job handle **Cause:** v3.8.0 intentionally degrades `background: true` on the Responses API to synchronous execution while emitting a warning. Full async background execution is a future deliverable. **Fix:** - Adjust the client to call without `background`, or - Wait for a later release that ships full async background mode (track the changelog) --- ## Still Stuck? - **GitHub Issues**: [github.com/diegosouzapw/OmniRoute/issues](https://github.com/diegosouzapw/OmniRoute/issues) - **Architecture**: See [`docs/architecture/ARCHITECTURE.md`](../architecture/ARCHITECTURE.md) for internal details - **API Reference**: See [`docs/reference/API_REFERENCE.md`](../reference/API_REFERENCE.md) for all endpoints - **Health Dashboard**: Check **Dashboard → Health** for real-time system status - **Translator**: Use **Dashboard → Translator** to debug format issues