---
layout: '@/layouts/Doc.astro'
title: 'Local AI Apps on ALCF: Argo, Inference Endpoints, and One Gateway'
date: 2026-06-27
date-created: 2026-06-27
date-modified: today
description: 'Wiring Claude Code, OpenCode, and the Hermes One desktop app through a single local llm-rosetta gateway that reaches both ALCF Argo (via argo-shim) and ALCF Inference Endpoints (Sophia/Metis), making Claude, GPT, and open models reachable from any client.'
---

ALCF exposes multiple useful model gateways. [Argo][argo] gives me Claude and
GPT-family frontier models through an internal endpoint that's only reachable
from inside the lab network. The newer [ALCF Inference Endpoints][alcf-infer]
service exposes open models on Sophia and Metis through a public
OpenAI-compatible API protected by Globus auth. I wanted to use both from
_local_ desktop apps on my Mac ([Claude Code][claude-code],
[OpenCode][opencode], and the [Hermes One][hermes] desktop app) without exposing
ports, juggling client-specific API keys, or paying a third-party provider.

This post documents the full stack I ended up with: a single local `llm-rosetta`
gateway that fans out to either `argo-shim` (for Argo) or ALCF's native
inference API (for Sophia/Metis), while presenting one localhost API to every
client. Everything runs on `127.0.0.1`, bills through ALCF, and survives token
rotation.

[argo]: https://www.alcf.anl.gov/
[alcf-infer]: https://docs.alcf.anl.gov/services/inference-endpoints/
[claude-code]: https://docs.claude.com/en/docs/claude-code
[opencode]: https://opencode.ai
[hermes]: https://github.com/NousResearch/hermes-agent

> **TL;DR**: `argo-shim` turns the SSH-gated Argo service into a localhost API.
> `llm-rosetta` sits in front of it, translates OpenAI ⇆ Anthropic, and also
> exposes ALCF's native Sophia/Metis inference endpoints. Point Claude Code,
> OpenCode, Hermes, or anything OpenAI-compatible at one local port and route by
> model name.
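
To make "route by model name" concrete, here is a minimal Python sketch of the idea. It is not `llm-rosetta`'s actual implementation: the `Route` type, the `ROUTES` table, and the `resolve` function are hypothetical names used only for illustration, and the entries simply mirror the gateway config shown in step 4 of this post.

```python
from typing import NamedTuple


class Route(NamedTuple):
    provider: str        # which upstream handles the request
    wire_format: str     # API dialect spoken to that upstream
    upstream_model: str  # model ID the upstream expects


# Hypothetical routing table; entries mirror the gateway config in this post.
ROUTES = {
    "claude-opus-4-8": Route("argo", "anthropic", "Claude Opus 4.8"),
    "anthropic/claude-opus-4-8": Route("argo", "anthropic", "Claude Opus 4.8"),
    "gpt-5.5": Route("argo-openai", "openai_chat", "GPT-5.5"),
    "alcf-sophia/openai/gpt-oss-120b": Route(
        "alcf-sophia", "openai_chat", "openai/gpt-oss-120b"
    ),
}


def resolve(model: str) -> Route:
    """Look up the upstream for a client-supplied model name."""
    try:
        return ROUTES[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None
```

The point of the sketch is that the client only ever chooses a model name; which upstream, which wire format, and which upstream model ID get used is decided entirely by the table.
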
## The problem

Argo speaks the **Anthropic Messages API** for Claude models (`/v1/messages`)
and the **OpenAI Chat Completions API** for GPT/Gemini
(`/v1/chat/completions`), authenticated with an `x-api-key` header. The ALCF
Inference Endpoints service speaks OpenAI-compatible chat/completions directly,
but needs a Globus access token.

That leaves four problems to solve:

1. Argo (`apps.inside.anl.gov`) is only reachable through an SSH jump host
   behind MFA.
2. Different clients want different things: Claude Code speaks Anthropic;
   OpenCode and Hermes speak OpenAI; some send `Authorization: Bearer`, some
   send `x-api-key`.
3. Argo's OpenAI endpoint has two undocumented quirks (more on those later) that
   make it return `HTTP 500` unless you massage the request.
4. The inference service has its own auth lifecycle: Globus access tokens expire
   and must be present when the rosetta gateway starts.

The end-state architecture solves all of this with one local front door
(`llm-rosetta`) and one shim for the SSH-gated Argo path:

```text
    ┌─────────────┐ ┌─────────────┐         ┌─────────────┐ ┌─────────────┐
    │ Claude Code │ │   vtcode    │         │  OpenCode   │ │ Hermes One  │
    └─────────────┘ └─────────────┘         └─────────────┘ └─────────────┘
           │ Anthropic     │ Anthropic             │ OpenAI        │ OpenAI
           └───────┬───────┘                       └───────┬───────┘
                   │                                       │
┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴├ Anthropic-native clients              │
┊                  │ can skip rosetta entirely             │
┊                  └───────────────────┬───────────────────┘
┊                                      │
┊                                      ▼
┊                       ┌─────────────────────────────┐
┊                       │  llm-rosetta gateway :8765  │
┊                       │     OpenAI ⇆ Anthropic      │
┊                       └───────┬─────────────┬───────┘
┊                          Argo │             │ ALCF Inference
┊                 Claude, GPT/o │             │ Sophia, Metis
┊                               ▼             ▼
┊        ┌──────────────────────────┐  ┌────────────────────────────────┐
└╴╴╴╴╴╴╴▶│     argo-shim :25940     │  │   inference-api.alcf.anl.gov   │
         │     (auth + fixups)      │  │  /resource_server/{cluster}/…  │
         └────────────┬─────────────┘  └───────────────┬────────────────┘
                      │ SSH tunnel :25939              │
                      ▼                                ▼
       ALCF Argo (apps.inside.anl.gov)     Sophia / Metis endpoints
```

- **Claude Code**, **OpenCode**,
  **Hermes**, and **vtcode** can all point at rosetta if you want one
  dashboard/request log.
- The Anthropic-native clients (**Claude Code**, **vtcode**) can also skip
  rosetta and talk to `argo-shim` directly, since `argo-shim` already speaks
  Anthropic `/v1/messages` (the dotted path above). Routing them through rosetta
  is only for the unified request log.
- Argo Claude models route through rosetta → `argo-shim` → Argo `/messages`.
- Argo GPT/o-series route through rosetta → `argo-shim` → Argo
  `/chat/completions`.
- ALCF inference models route through rosetta directly to
  `inference-api.alcf.anl.gov` (no SSH tunnel or `argo-shim` needed).

## The pieces

### `argo-shim`: tunnel + auth + fixups

[`argo-shim`][argo-shim] (by [n-getty][n-getty]) is a single-file Python proxy
that:

1. Manages an SSH tunnel to `apps.inside.anl.gov:443`.
2. Listens on `127.0.0.1:25940` and rewrites any path to `/argoapi/...` before
   forwarding upstream.
3. Authenticates local clients with a random token it writes into
   `~/.claude/settings.json` (so Claude Code picks it up automatically).

[argo-shim]: https://github.com/n-getty/argo-shim
[n-getty]: https://github.com/n-getty

To make the OpenAI path work and let OpenAI-format clients authenticate, I made
two small additive patches (covered in [Gotchas](#gotchas) below), both since
[contributed upstream](#contributing-back). Neither touches the Anthropic
`/v1/messages` path that Claude Code uses.

### `llm-rosetta` + the gateway

[`llm-rosetta`][rosetta] (by [Oaklight][oaklight]) is an LLM API translation
layer with an optional HTTP gateway. It converts between OpenAI Chat
Completions, Anthropic Messages, and Google GenAI formats via a central
intermediate representation. The gateway routes by model name:

- A request for Argo `claude-*` → translated **OpenAI → Anthropic** → posted to
  `argo-shim`'s `/v1/messages`.
- A request for Argo `gpt-*` / `o*` → forwarded as **OpenAI Chat Completions**
  to `argo-shim`'s `/argoapi/v1/chat/completions`.
- A request for `alcf-sophia/*` or `alcf-metis/*` → forwarded directly to the
  ALCF Inference Endpoints service with a Globus access token.

[rosetta]: https://github.com/Oaklight/llm-rosetta
[oaklight]: https://github.com/Oaklight

### Hermes + the Hermes One desktop app

[Hermes][hermes] is a tool-calling agent. The **Hermes One** desktop app
(Electron) spawns a local backend that reads `~/.hermes/config.yaml`. Point its
`custom` provider at the rosetta gateway and you get every Argo model in a
native chat UI.

## Reproducing it

Everything below assumes **macOS** with [`uv`][uv] installed and working SSH
access to the ALCF jump host.

[uv]: https://docs.astral.sh/uv/

### 1. Install and start `argo-shim`

```bash
# A recent argo-shim already includes the bearer-auth + user-injection support
# (see "Contributing back"). Install from PyPI, or from a local checkout.
uv tool install argo-shim

# If your ALCF username differs from your local login name, tell the shim:
export CELS_USERNAME=

# Start it (creates the SSH tunnel, writes a token to ~/.claude/settings.json)
argo-shim
```

`argo-shim` listens on `127.0.0.1:25940`, derives a per-user port for the SSH
tunnel, and prints the token. Each restart **rotates** the token (important
later).

Quick sanity check (the token lives in `~/.claude/settings.json`):

```bash
TOKEN=$(python3 -c "import json;print(json.load(open('$HOME/.claude/settings.json'))['apiKeyHelper'].split()[-1])")

# Claude via Anthropic Messages: should return JSON
curl -sS -H "x-api-key: $TOKEN" -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  http://127.0.0.1:25940/argoapi/v1/messages \
  -d '{"model":"Claude Sonnet 4.5","max_tokens":20,
       "messages":[{"role":"user","content":"say hi"}]}'
```

### 2. Claude Code (Anthropic, native)

Claude Code needs no extra config; `argo-shim` writes the base URL and token
into `~/.claude/settings.json` for you:

```json
{
  "apiKeyHelper": "echo ",
  "env": { "ANTHROPIC_BASE_URL": "http://127.0.0.1:25940/argoapi" }
}
```

That's it. `claude` now routes through Argo.

### 3. OpenCode (Anthropic-format, via a custom provider)

OpenCode reads `~/.config/opencode/opencode.json`. Add an Anthropic-compatible
custom provider pointed at the shim (OpenCode's `@ai-sdk/anthropic` sends
`x-api-key`, which the shim wants):

```json
{
  "provider": {
    "argo": {
      "npm": "@ai-sdk/anthropic",
      "name": "Argo (via argo-shim)",
      "options": {
        "baseURL": "http://127.0.0.1:25940/argoapi/v1",
        "apiKey": "{env:ARGO_SHIM_TOKEN}",
        "headers": { "anthropic-version": "2023-06-01" }
      },
      "models": {
        "claudeopus48": { "name": "claude-4.8-opus" }
      }
    }
  }
}
```

Export the token so `{env:ARGO_SHIM_TOKEN}` resolves (see
[Token rotation](#token-rotation) for a helper that does this automatically):

```bash
export ARGO_SHIM_TOKEN=$(python3 -c "import json;print(json.load(open('$HOME/.claude/settings.json'))['apiKeyHelper'].split()[-1])")
```

### 4. `llm-rosetta` gateway (for OpenAI-only clients)

Install the gateway:

```bash
uv tool install "llm-rosetta[gateway]"
```

Create `~/.config/llm-rosetta-gateway/config.jsonc`. The Argo providers point at
the local shim; the ALCF inference providers point directly at the public
OpenAI-compatible endpoint:

```jsonc
{
  "providers": {
    // Claude models: Anthropic Messages format.
    // base_url has NO /v1: the anthropic template appends /v1/messages.
    "argo": {
      "type": "anthropic",
      "api_key": "${ARGO_SHIM_TOKEN}",
      "base_url": "http://127.0.0.1:25940",
    },
    // GPT/o models: OpenAI Chat Completions.
    // base_url includes /argoapi/v1: the template appends /chat/completions.
    "argo-openai": {
      "type": "openai_chat",
      "api_key": "${ARGO_SHIM_TOKEN}",
      "base_url": "http://127.0.0.1:25940/argoapi/v1",
    },
    // ALCF Inference Endpoints: already OpenAI-compatible.
    // ${ALCF_INFERENCE_TOKEN} is substituted when the gateway starts.
    "alcf-sophia": {
      "type": "openai_chat",
      "api_key": "${ALCF_INFERENCE_TOKEN}",
      "base_url": "https://inference-api.alcf.anl.gov/resource_server/sophia/vllm/v1",
    },
    "alcf-metis": {
      "type": "openai_chat",
      "api_key": "${ALCF_INFERENCE_TOKEN}",
      "base_url": "https://inference-api.alcf.anl.gov/resource_server/metis/api/v1",
    },
  },
  "models": {
    // Register each model in BOTH forms clients send: bare ("claude-opus-4-8")
    // and vendor-prefixed ("anthropic/claude-opus-4-8"). Declare `capabilities`
    // explicitly: the default is ["text"], which makes rosetta strip images.
    "claude-opus-4-8": {
      "provider": "argo",
      "upstream_model": "Claude Opus 4.8",
      "capabilities": ["text", "vision", "tools", "reasoning"],
    },
    "anthropic/claude-opus-4-8": {
      "provider": "argo",
      "upstream_model": "Claude Opus 4.8",
      "capabilities": ["text", "vision", "tools", "reasoning"],
    },
    "gpt-5.5": {
      "provider": "argo-openai",
      "upstream_model": "GPT-5.5",
      "capabilities": ["text", "vision", "tools", "reasoning"],
    },
    "openai/gpt-5.5": {
      "provider": "argo-openai",
      "upstream_model": "GPT-5.5",
      "capabilities": ["text", "vision", "tools", "reasoning"],
    },
    // ALCF inference models can be registered by exact upstream ID and/or a
    // cluster-prefixed alias to make filtering easier in dashboards.
    "alcf-sophia/openai/gpt-oss-120b": {
      "provider": "alcf-sophia",
      "upstream_model": "openai/gpt-oss-120b",
      "capabilities": ["text", "tools", "reasoning"],
    },
    "alcf-metis/gpt-oss-120b": {
      "provider": "alcf-metis",
      "upstream_model": "gpt-oss-120b",
      "capabilities": ["text"],
    },
    // … repeat for every model you want …
  },
  // No server.api_key: the gateway binds to 127.0.0.1 only, so no auth needed.
  "server": { "host": "127.0.0.1", "port": 8765 },
}
```

If you only use Argo models, start it directly:

```bash
llm-rosetta-gateway --no-banner   # listens on 127.0.0.1:8765
```

If you also want **ALCF Inference Endpoints**, authenticate once with Globus and
export a short-lived access token before starting/restarting the gateway:

```bash
# First-time / monthly-ish auth; opens a Globus browser flow.
uv run --with globus-sdk --with openai \
  ~/.config/llm-rosetta-gateway/inference_auth_token.py authenticate

# Refresh/export the access token for this shell, then restart rosetta so
# ${ALCF_INFERENCE_TOKEN} gets substituted into the config.
alcf-inference-token
rosetta-gateway restart
```

For my day-to-day post-shim-restart ritual I use:

```bash
argo-rosetta-sync --with-inference
```

Test the translation (OpenAI request in, Claude answer out):

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-opus-4-8",
       "messages":[{"role":"user","content":"say hi"}],"max_tokens":20}'
```

And test an ALCF inference model (if `ALCF_INFERENCE_TOKEN` was set before the
gateway started):

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"alcf-sophia/openai/gpt-oss-120b",
       "messages":[{"role":"user","content":"say hi"}],"max_tokens":20}'
```

The gateway also ships a web admin panel at `http://127.0.0.1:8765/admin/` with
live metrics and request logs.

> **Heads up**: the admin panel's **Fetch from Provider** button re-discovers
> Argo's full model list and writes entries that _don't work_ (some Gemini
> variants still 500 upstream; spaced display-names get mapped to the wrong
> provider). Manage the model list from the config file instead. I keep a
> [regen script](#a-regen-script) for exactly this.

### 5. Hermes One desktop app

Hermes' backend reads `~/.hermes/config.yaml`.
Point its `custom` provider at the rosetta gateway:

```yaml
model:
  base_url: 'http://127.0.0.1:8765/v1'
  default: 'claude-opus-4-7'
  provider: 'custom'
```

Because the rosetta gateway runs **no-auth** (loopback only), no API key is
needed. Hermes' "API Key required for remote endpoints, optional for localhost"
note applies.

In the desktop app: **Settings → Providers → Local / Others**, set:

| Field        | Value                         |
| ------------ | ----------------------------- |
| **Model**    | `claude-opus-4-7`             |
| **Base URL** | `http://127.0.0.1:8765/v1`    |
| **API Key**  | _(optional, gateway is open)_ |

Click **Done**, restart the app, start a New Chat. 🎉

## Gotchas

These are the non-obvious problems I hit. Two of them needed small, **additive**
patches to `argo-shim` (both now [upstream](#contributing-back)); the Anthropic
`/v1/messages` path Claude Code uses is untouched. The rest were configuration
or upstream-app quirks.

### 1. OpenAI models need a `user` field

Argo's `/chat/completions` returns a bare `HTTP 500` for GPT/Gemini requests…
_unless_ you include a `user` field set to a valid ALCF username. With it, the
500 turns into a real completion. The fix: have the shim auto-inject it.

```python
# In argo-shim, alongside the existing /messages handling:
if method == "POST" and body and "/chat/completions" in self.path:
    req = json.loads(body)
    if isinstance(req, dict) and not (req.get("user") or "").strip():
        req["user"] = ARGO_USER  # $ARGO_USER / $CELS_USERNAME / login
        body = json.dumps(req).encode()
```

This one took a while to find: `curl` worked but the real clients all returned
500. The difference was that none of my `curl` tests happened to send a `user`
field that Argo accepted. Once I tried a valid ALCF username, GPT-4o, GPT-5, and
the o-series all came alive.

### 2. OpenAI clients send `Authorization: Bearer`, not `x-api-key`

The shim originally accepted only `x-api-key`. rosetta's `openai_chat` provider
authenticates with `Authorization: Bearer `.
Teach the shim to accept both:

```python
client_key = self.headers.get("x-api-key", "")
if not client_key:
    auth = self.headers.get("Authorization", "")
    if auth.lower().startswith("bearer "):
        client_key = auth[7:].strip()
```

### 3. Hermes' `custom` provider and model-name prefixes

Two Hermes-specific quirks:

- The `custom` provider **doesn't read `CUSTOM_API_KEY`** for the actual
  request: only an inline `api_key` in `config.yaml` (which the desktop UI
  _strips_ on save). Running the gateway no-auth sidesteps this entirely.
- Hermes prepends a vendor prefix to model names (`anthropic/claude-opus-4-8`),
  then matches that exact string against the endpoint's `/v1/models` list. So
  the gateway must register **both** the bare and vendor-prefixed forms of every
  model, hence the duplicated entries in the config above.

### 4. The capabilities default silently drops images

The rosetta admin panel showed every model with a single `text` capability
badge, even though Claude and GPT-4o obviously do vision. I assumed it was
cosmetic. It isn't.

If a model entry doesn't declare a `capabilities` list, rosetta defaults it to
`["text"]`. And a model **without `"vision"` has images stripped from the
request** before it's forwarded upstream (`enforce_vision()` replaces them with
text placeholders). So pasting an image into Hermes would have silently dropped
it: no error, just a model that "couldn't see" the image.

The fix is to declare real capabilities per model in the gateway config:

```jsonc
"claude-opus-4-8": {
  "provider": "argo",
  "upstream_model": "Claude Opus 4.8",
  "capabilities": ["text", "vision", "tools", "reasoning"]
}
```

My [regen script](#a-regen-script) now sets these per model family (Claude 4.x
and the GPT-5/o-series get `text + vision + tools + reasoning`; GPT-4o/4.1 get
`text + vision + tools`). After regenerating, images flow through to Argo
instead of being quietly discarded.

### 5. Duplicated responses in Hermes One

After everything worked, Hermes One started showing **every reply twice**: once
as a slightly-reworded partial, then again in full. My first instinct was that
the proxy chain was double-emitting. It wasn't.

I tested the rosetta gateway directly: streaming, non-streaming, and a fresh
non-stream call, and every response came back **clean and singular**. The shim
and Argo were innocent. The duplication only appeared _inside the Hermes desktop
renderer_, and only for chatty tool-calling turns (the offending turn had
`tool_turns=10`).

The culprit was Hermes' `display.interim_assistant_messages: true`, which
renders the model's _interim commentary between tool calls_ **and** the final
answer. With a verbose reasoning model (GPT-5.5) those two are near-identical,
so you see the reply twice with slightly different wording.

The fix is one line in `~/.hermes/config.yaml`:

```yaml
display:
  interim_assistant_messages: false
```

The meta-lesson, again: when something looks broken, **bisect the layers**. A
two-minute `curl` against the gateway saved me from "fixing" a proxy that was
working perfectly.

### 6. ALCF inference models appear even when auth is missing

The ALCF inference providers use `"api_key": "${ALCF_INFERENCE_TOKEN}"` in the
rosetta config. That substitution happens **when the gateway starts**. If the
env var is missing, the models still appear in `/v1/models` and the admin UI,
but calls return 401 because the literal placeholder (or no useful bearer token)
gets sent upstream.

The fix is to refresh the Globus token before restarting rosetta:

```bash
alcf-inference-token
rosetta-gateway restart
```

Or, after an `argo-shim` restart, do the whole thing:

```bash
argo-rosetta-sync --with-inference
```

### 7. Gemini is partially available

I originally thought Gemini was completely broken because I kept hitting
`'NoneType' object is not iterable` after fixing the `user` field. Re-testing
more carefully showed a more nuanced picture.

Important distinction: **these are Argo Gemini models, not Google GenAI API
models.** Argo exposes them through its OpenAI-compatible `/chat/completions`
endpoint, so in rosetta they use the `openai_chat` provider type, not the
`google` provider type.

- `Gemini 2.5 Pro` works through Argo's OpenAI `/chat/completions` path.
- `Gemini 2.5 Flash` works too.
- `Gemini 3.5 Flash` also works.
- `Gemini 3.1 Flash Lite` works **only if the request omits `max_tokens` or uses
  `max_completion_tokens`**; with the usual OpenAI-style `max_tokens` field it
  still returns the upstream `NoneType` 500.

So the config includes the working models as `gemini-2.5-pro`,
`gemini-2.5-flash`, `gemini-3.5-flash`, and their `google/...` aliases, while
omitting `Gemini 3.1 Flash Lite` until I add a model-specific request transform.

## Token rotation

Every `argo-shim` restart mints a new token. To keep everything in sync I use
shell helpers in `~/.config/zsh/functions.zsh`. These are the actual pieces you
need, not magic commands hidden elsewhere.

```bash
# Re-export the live shim token into the env vars various clients read.
refresh-argo-token() {
  local settings="$HOME/.claude/settings.json"
  local token
  token=$(python3 -c "import json; print(json.load(open('$settings'))['apiKeyHelper'].split()[-1])") || return 1
  export ARGO_SHIM_TOKEN="$token"     # OpenCode, vtcode, direct shim clients
  export ANTHROPIC_API_KEY="$token"   # Anthropic-native clients
  export ANTHROPIC_BASE_URL="http://127.0.0.1:25940/v1"
  print "ARGO_SHIM_TOKEN refreshed (${token:0:8}…)"
}

# Point Claude Code at rosetta, not directly at argo-shim, so Claude Code
# traffic appears in the rosetta dashboard. argo-shim rewrites Claude settings
# back to :25940/argoapi on startup, so run this after starting argo-shim.
route-claude-through-rosetta() {
  python3 - <<'PY'
import json, pathlib
p = pathlib.Path.home() / ".claude" / "settings.json"
d = json.load(open(p))
env = d.setdefault("env", {})
env["ANTHROPIC_BASE_URL"] = "http://127.0.0.1:8765"
for key in ("NO_PROXY", "no_proxy"):
    vals = [x.strip() for x in env.get(key, "").split(",") if x.strip()]
    for val in ("localhost", "127.0.0.1"):
        if val not in vals:
            vals.append(val)
    env[key] = ",".join(vals)
p.write_text(json.dumps(d, indent=2) + "\n")
PY
}

# Export a fresh Globus access token for ALCF Inference Endpoints. First run may
# require an interactive browser auth flow (see below).
alcf-inference-token() {
  local helper="$HOME/.config/llm-rosetta-gateway/inference_auth_token.py"
  mkdir -p "$(dirname "$helper")"
  if [[ ! -r "$helper" ]]; then
    curl -fsSL "https://raw.githubusercontent.com/argonne-lcf/inference-endpoints/refs/heads/main/inference_auth_token.py" -o "$helper" || return 1
  fi
  local token
  token=$(uv run --with globus-sdk --with openai "$helper" get_access_token) || return 1
  export ALCF_INFERENCE_TOKEN="$token"
  print "ALCF_INFERENCE_TOKEN refreshed (${token:0:8}…)"
}

# Minimal rosetta restart helper. My full version also has start/stop/status,
# but this is the essential part: regenerate config if needed, then restart.
rosetta-gateway-restart() {
  pkill -f llm-rosetta-gateway 2>/dev/null || true
  sleep 1
  nohup llm-rosetta-gateway --no-banner >> "$HOME/.hermes/logs/rosetta-gateway.log" 2>&1 &
}

# One-shot recovery after starting/restarting argo-shim.
argo-rosetta-sync() {
  refresh-argo-token || return 1
  [[ "$1" == "--with-inference" ]] && alcf-inference-token
  # If you keep a regen script, run it here. Otherwise ensure config.jsonc uses
  # the current $ARGO_SHIM_TOKEN / $ALCF_INFERENCE_TOKEN before restarting.
  rosetta-gateway-restart || return 1
  route-claude-through-rosetta || return 1
}
```

So the startup ritual after a reboot is:

```bash
argo-shim                      # in a terminal: creates tunnel, rotates token
refresh-argo-token             # sync Argo token into env
rosetta-gateway-restart        # restart rosetta after config/token changes
route-claude-through-rosetta   # optional: send Claude Code through rosetta too
# then open the Hermes One app
```

If I also want the ALCF Inference Endpoints in that same gateway:

```bash
alcf-inference-token      # export Globus access token
rosetta-gateway restart   # reload config with ${ALCF_INFERENCE_TOKEN}
```

Or one bundle after an `argo-shim` restart:

```bash
argo-rosetta-sync --with-inference
```

### A regen script

Clicking the admin panel's **Fetch from Provider** button re-pollutes the model
list (it re-discovers Argo's full set and writes broken entries; see the
heads-up above), so I keep a script for whenever that happens. It regenerates a
known-good `config.jsonc` from small Python maps of
`{ alias: (upstream_model_id, capabilities) }`: it registers bare +
vendor-prefixed forms for Argo, adds `alcf-sophia/*` and `alcf-metis/*` aliases,
attaches the right capabilities (so images aren't stripped), pulls the live Argo
token, and restarts the gateway:

```bash
rosetta-gateway regen-models   # rebuild model list + restart
```

Editing the model maps at the top of that script is the one place to add or
remove models.

## Contributing back

Two fixes I'd made to `argo-shim` were general improvements rather than local
workarounds, so they went back upstream as [a PR][argo-pr]:

1. **Accept `Authorization: Bearer `** in addition to `x-api-key`, so any
   OpenAI-format client can authenticate to the shim.
2. **Auto-inject the `user` field** on `/chat/completions`, so OpenAI/Gemini
   models stop returning `HTTP 500`. The value resolves from `$ARGO_USER`, then
   `$CELS_USERNAME`, then the login user: no hardcoded usernames.
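
For readers who want to experiment with the two fixes outside the shim, here is a minimal standalone sketch of both as pure functions. It is not the code from the PR: `extract_client_key` and `inject_user` are hypothetical helper names, the headers are assumed to arrive as a plain `dict`, and (as an assumption of this sketch) a non-string or blank `user` value is treated the same as a missing one.

```python
import json


def extract_client_key(headers: dict) -> str:
    """Return the client token from x-api-key, else from Authorization: Bearer."""
    # HTTP header names are case-insensitive; normalize keys for a plain dict.
    lowered = {k.lower(): v for k, v in headers.items()}
    key = lowered.get("x-api-key", "")
    if key:
        return key
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[7:].strip()
    return ""


def inject_user(path: str, body: bytes, default_user: str) -> bytes:
    """Add a `user` field to /chat/completions JSON bodies that lack one."""
    if "/chat/completions" not in path or not body:
        return body
    try:
        req = json.loads(body)
    except ValueError:
        return body  # not valid JSON: pass through untouched
    if not isinstance(req, dict):
        return body  # e.g. a top-level array: nothing to inject into
    user = req.get("user")
    if isinstance(user, str) and user.strip():
        return body  # caller already supplied a usable value
    req["user"] = default_user
    return json.dumps(req).encode()
```

Keeping both steps free of any HTTP-server state makes the edge cases (missing header, non-dict body, blank `user`) easy to exercise in isolation before wiring them into a request handler.
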

[argo-pr]: https://github.com/n-getty/argo-shim/pull/9

A nice footnote: the PR got an automated review flagging a real edge case: a
non-dict JSON body (e.g. a top-level array) would crash the handler, since the
injection code assumed `req.get(...)` always worked. Guarding on
`isinstance(req, dict)` and treating a blank `user` as missing made it robust.
That's the `isinstance` check you see in the snippet above.

## Wrapping up

The result: **Claude Code, OpenCode, vtcode, and the Hermes One desktop app**
all talking to one local gateway. Argo models go through the SSH-backed shim;
ALCF inference models go straight to Sophia/Metis with Globus auth. From the
client side it all looks like `http://127.0.0.1:8765/v1`.

The two reusable building blocks are worth calling out:

- **[`argo-shim`][argo-shim]** turns a finicky, MFA-gated internal endpoint into
  a clean localhost API, plus the small auth/format fixups that make Argo's
  Claude and GPT paths behave like normal provider endpoints.
- **[`llm-rosetta`][rosetta]** is the universal adapter: any client format in,
  any provider format out. It also becomes a convenient control plane for mixing
  providers (Argo via the shim, Sophia/Metis via ALCF inference, and whatever
  else comes next) behind one localhost base URL.

If you're at a lab with similar internal gateways, this pattern generalizes: put
a small shim in front of anything weird, point a translation gateway at all of
your upstreams, and give every local client a single localhost base URL.
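
As a closing illustration of the "single localhost base URL" idea, here is a small Python sketch that sends the same OpenAI-style request to the gateway for any registered model, using only the standard library. `build_chat_request` and `ask` are hypothetical helper names, and the sketch assumes the gateway returns the usual Chat Completions response shape (`choices[0].message.content`), which is not verified here.

```python
import json
import urllib.request

# The rosetta gateway address used throughout this post.
GATEWAY = "http://127.0.0.1:8765/v1"


def build_chat_request(model: str, prompt: str, max_tokens: int = 20) -> urllib.request.Request:
    """Build an OpenAI-style chat request; only the model name selects the upstream."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{GATEWAY}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the request and return the reply text (assumes an OpenAI-shaped response)."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=60) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the gateway running, `ask("claude-opus-4-8", "say hi")` and `ask("alcf-sophia/openai/gpt-oss-120b", "say hi")` would differ only in the model string, which is the whole point of putting one gateway in front of both upstreams.
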