---
layout: '@/layouts/Doc.astro'
title: 'Local AI Apps on ALCF: Argo, Inference Endpoints, and One Gateway'
date: 2026-06-27
date-created: 2026-06-27
date-modified: today
description: 'Wiring Claude Code, OpenCode, and the Hermes One desktop app through a single local llm-rosetta gateway that reaches both ALCF Argo (via argo-shim) and ALCF Inference Endpoints (Sophia/Metis), making Claude, GPT, and open models reachable from any client.'
---

ALCF exposes two useful model gateways. [Argo][argo] gives me Claude and
GPT-family frontier models through an internal endpoint that's only reachable
from inside the lab network. The newer [ALCF Inference Endpoints][alcf-infer]
service exposes open models on Sophia and Metis through a public
OpenAI-compatible API protected by Globus auth.

I wanted to use both from _local_ desktop apps on my Mac ([Claude Code][claude-code],
[OpenCode][opencode], and the [Hermes One][hermes] desktop app) without exposing
ports, juggling client-specific API keys, or paying a third-party provider.

This post documents the full stack I ended up with: a single local
`llm-rosetta` gateway that fans out to either `argo-shim` (for Argo) or ALCF's
native inference API (for Sophia/Metis), while presenting one localhost API to
every client. Everything runs on `127.0.0.1`, bills through ALCF, and survives
token rotation.

[argo]: https://www.alcf.anl.gov/
[alcf-infer]: https://docs.alcf.anl.gov/services/inference-endpoints/
[claude-code]: https://docs.claude.com/en/docs/claude-code
[opencode]: https://opencode.ai
[hermes]: https://github.com/NousResearch/hermes-agent

> **TL;DR**: `argo-shim` turns the SSH-gated Argo service into a localhost API.
> `llm-rosetta` sits in front of it, translates OpenAI ⇆ Anthropic, and also
> exposes ALCF's native Sophia/Metis inference endpoints. Point Claude Code,
> OpenCode, Hermes, or anything OpenAI-compatible at one local port and route by
> model name.

## The problem

Argo speaks the **Anthropic Messages API** for Claude models
(`/v1/messages`) and the **OpenAI Chat Completions API** for GPT/Gemini
(`/v1/chat/completions`), authenticated with an `x-api-key` header. The ALCF
Inference Endpoints service speaks OpenAI-compatible chat/completions directly,
but needs a Globus access token.

So there are four problems to solve:

1. Argo (`apps.inside.anl.gov`) is only reachable through an SSH jump host behind
   MFA.
2. Different clients want different things: Claude Code speaks Anthropic;
   OpenCode and Hermes speak OpenAI; some send `Authorization: Bearer`, some
   send `x-api-key`.
3. Argo's OpenAI endpoint has two undocumented quirks (more on those later) that
   make it return `HTTP 500` unless you massage the request.
4. The inference service has its own auth lifecycle: Globus access tokens expire
   and must be present when the rosetta gateway starts.

The end-state architecture solves all of this with one local front door
(`llm-rosetta`) and one shim for the SSH-gated Argo path:

```text
      ┌─────────────┐ ┌─────────────┐         ┌─────────────┐ ┌─────────────┐
      │ Claude Code │ │    vtcode   │         │   OpenCode  │ │  Hermes One │
      └─────────────┘ └─────────────┘         └─────────────┘ └─────────────┘
            │ Anthropic     │ Anthropic             │ OpenAI        │ OpenAI
            └───────┬───────┘                       └───────┬───────┘
                    │                                       │
 ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴├  Anthropic-native clients             │
 ┊                  │  can skip rosetta entirely            │
 ┊                  └───────────────────┬───────────────────┘
 ┊                                      │
 ┊                                      ▼
 ┊                       ┌─────────────────────────────┐
 ┊                       │  llm-rosetta gateway :8765  │
 ┊                       │      OpenAI ⇆ Anthropic     │
 ┊                       └───────┬─────────────┬───────┘
 ┊                   Argo        │             │ ALCF Inference
 ┊               Claude, GPT/o   │             │ Sophia, Metis
 ┊                               ▼             ▼
 ┊        ┌──────────────────────────┐  ┌────────────────────────────────┐
 └╴╴╴╴╴╴╴▶│     argo-shim :25940     │  │   inference-api.alcf.anl.gov   │
          │     (auth + fixups)      │  │  /resource_server/{cluster}/…  │
          └────────────┬─────────────┘  └───────────────┬────────────────┘
                       │ SSH tunnel :25939              │
                       ▼                                ▼
         ALCF Argo (apps.inside.anl.gov)      Sophia / Metis endpoints
```

- **Claude Code**, **OpenCode**, **Hermes**, and **vtcode** can all point at
  rosetta if you want one dashboard/request log.
- The Anthropic-native clients (**Claude Code**, **vtcode**) can also skip
  rosetta and talk to `argo-shim` directly, since `argo-shim` already speaks
  Anthropic `/v1/messages` (the dotted path above). Routing them through rosetta
  is only for the unified request log.
- Argo Claude models route through rosetta → `argo-shim` → Argo `/messages`.
- Argo GPT/o-series route through rosetta → `argo-shim` → Argo
  `/chat/completions`.
- ALCF inference models route through rosetta directly to
  `inference-api.alcf.anl.gov` (no SSH tunnel or `argo-shim` needed).

## The pieces

### `argo-shim`: tunnel + auth + fixups

[`argo-shim`][argo-shim] (by [n-getty][n-getty]) is a single-file Python proxy
that:

1. Manages an SSH tunnel to `apps.inside.anl.gov:443`.
2. Listens on `127.0.0.1:25940` and rewrites any path to `/argoapi/...`
   before forwarding upstream.
3. Authenticates local clients with a random token it writes into
   `~/.claude/settings.json` (so Claude Code picks it up automatically).

[argo-shim]: https://github.com/n-getty/argo-shim
[n-getty]: https://github.com/n-getty

To make the OpenAI path work and let OpenAI-format clients authenticate, I made
two small additive patches (covered in [Gotchas](#gotchas) below), both since
[contributed upstream](#contributing-back). Neither touches the Anthropic
`/v1/messages` path that Claude Code uses.

### `llm-rosetta` + the gateway

[`llm-rosetta`][rosetta] (by [Oaklight][oaklight]) is an LLM API translation
layer with an optional HTTP gateway. It converts between OpenAI Chat
Completions, Anthropic Messages, and Google GenAI formats via a central
intermediate representation. The gateway routes by model name:

- A request for Argo `claude-*` → translated **OpenAI → Anthropic** → posted
  to `argo-shim`'s `/v1/messages`.
- A request for Argo `gpt-*` / `o*` → forwarded as **OpenAI Chat Completions**
  to `argo-shim`'s `/argoapi/v1/chat/completions`.
- A request for `alcf-sophia/*` or `alcf-metis/*` → forwarded directly to the
  ALCF Inference Endpoints service with a Globus access token.
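
Conceptually, that routing is an exact-match lookup from the model name the client sends to a provider and an upstream model ID. A minimal sketch of the idea, not rosetta's actual implementation: `MODELS` and `route()` are hypothetical names, and the entries mirror the gateway config shown later in this post.

```python
# Hypothetical routing table: client-facing name -> (provider, upstream model).
MODELS = {
    "claude-opus-4-8": ("argo", "Claude Opus 4.8"),
    "gpt-5.5": ("argo-openai", "GPT-5.5"),
    "alcf-sophia/openai/gpt-oss-120b": ("alcf-sophia", "openai/gpt-oss-120b"),
}


def route(model: str) -> tuple[str, str]:
    """Return (provider, upstream_model) for an exact model-name match."""
    try:
        return MODELS[model]
    except KeyError:
        # An unregistered name is an error rather than a silent fallback.
        raise LookupError(f"model not registered: {model!r}") from None
```

Because the match is exact, a client that sends `anthropic/claude-opus-4-8` instead of `claude-opus-4-8` needs its own entry.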

[rosetta]: https://github.com/Oaklight/llm-rosetta
[oaklight]: https://github.com/Oaklight

### Hermes + the Hermes One desktop app

[Hermes][hermes] is a tool-calling agent. The **Hermes One** desktop app
(Electron) spawns a local backend that reads `~/.hermes/config.yaml`. Point its
`custom` provider at the rosetta gateway and you get every model registered in
the gateway in a native chat UI.

## Reproducing it

Everything below assumes **macOS** with [`uv`][uv] installed and working SSH
access to the ALCF jump host.

[uv]: https://docs.astral.sh/uv/

### 1. Install and start `argo-shim`

```bash
# A recent argo-shim already includes the bearer-auth + user-injection support
# (see "Contributing back"). Install from PyPI, or from a local checkout.
uv tool install argo-shim

# If your ALCF username differs from your local login name, tell the shim:
export CELS_USERNAME=<your-alcf-username>

# Start it (creates the SSH tunnel, writes a token to ~/.claude/settings.json)
argo-shim
```

`argo-shim` listens on `127.0.0.1:25940`, derives a per-user port for the SSH
tunnel, and prints the token. Each restart **rotates** the token (important
later).

Quick sanity check (the token lives in `~/.claude/settings.json`):

```bash
TOKEN=$(python3 -c "import json;print(json.load(open('$HOME/.claude/settings.json'))['apiKeyHelper'].split()[-1])")

# Claude via Anthropic Messages: should return JSON
curl -sS -H "x-api-key: $TOKEN" -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  http://127.0.0.1:25940/argoapi/v1/messages \
  -d '{"model":"Claude Sonnet 4.5","max_tokens":20,
       "messages":[{"role":"user","content":"say hi"}]}'
```
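
The one-liner above relies on `apiKeyHelper` having the form `echo <token>`, which is the shape shown in the settings file in the next section. If that ever changes, a slightly more defensive version fails loudly instead of exporting garbage. This is a sketch; `parse_shim_token` and `read_shim_token` are hypothetical helper names.

```python
import json
from pathlib import Path


def parse_shim_token(settings_text: str) -> str:
    """Extract the token from an apiKeyHelper of the form 'echo <token>'."""
    settings = json.loads(settings_text)
    parts = str(settings.get("apiKeyHelper", "")).split()
    if len(parts) != 2 or parts[0] != "echo":
        raise ValueError("apiKeyHelper is not of the form 'echo <token>'")
    return parts[1]


def read_shim_token(path=None) -> str:
    """Read the token from ~/.claude/settings.json (or an explicit path)."""
    path = Path(path) if path else Path.home() / ".claude" / "settings.json"
    return parse_shim_token(path.read_text())
```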

### 2. Claude Code (Anthropic, native)

Claude Code needs no extra config; `argo-shim` writes the base URL and token
into `~/.claude/settings.json` for you:

```json
{
    "apiKeyHelper": "echo <ROTATING_TOKEN>",
    "env": {
        "ANTHROPIC_BASE_URL": "http://127.0.0.1:25940/argoapi"
    }
}
```

That's it. `claude` now routes through Argo.

### 3. OpenCode (Anthropic-format, via a custom provider)

OpenCode reads `~/.config/opencode/opencode.json`. Add an Anthropic-compatible
custom provider pointed at the shim (OpenCode's `@ai-sdk/anthropic` sends
`x-api-key`, which the shim wants):

```json
{
    "provider": {
        "argo": {
            "npm": "@ai-sdk/anthropic",
            "name": "Argo (via argo-shim)",
            "options": {
                "baseURL": "http://127.0.0.1:25940/argoapi/v1",
                "apiKey": "{env:ARGO_SHIM_TOKEN}",
                "headers": { "anthropic-version": "2023-06-01" }
            },
            "models": {
                "claudeopus48": { "name": "claude-4.8-opus" }
            }
        }
    }
}
```

Export the token so `{env:ARGO_SHIM_TOKEN}` resolves (see
[Token rotation](#token-rotation) for a helper that does this automatically):

```bash
export ARGO_SHIM_TOKEN=$(python3 -c "import json;print(json.load(open('$HOME/.claude/settings.json'))['apiKeyHelper'].split()[-1])")
```

### 4. `llm-rosetta` gateway (for OpenAI-only clients)

Install the gateway:

```bash
uv tool install "llm-rosetta[gateway]"
```

Create `~/.config/llm-rosetta-gateway/config.jsonc`. The Argo providers point
at the local shim; the ALCF inference providers point directly at the public
OpenAI-compatible endpoint:

```jsonc
{
    "providers": {
        // Claude models: Anthropic Messages format.
        // base_url has NO /v1: the anthropic template appends /v1/messages.
        "argo": {
            "type": "anthropic",
            "api_key": "${ARGO_SHIM_TOKEN}",
            "base_url": "http://127.0.0.1:25940",
        },
        // GPT/o models: OpenAI Chat Completions.
        // base_url includes /argoapi/v1: the template appends /chat/completions.
        "argo-openai": {
            "type": "openai_chat",
            "api_key": "${ARGO_SHIM_TOKEN}",
            "base_url": "http://127.0.0.1:25940/argoapi/v1",
        },
        // ALCF Inference Endpoints: already OpenAI-compatible.
        // ${ALCF_INFERENCE_TOKEN} is substituted when the gateway starts.
        "alcf-sophia": {
            "type": "openai_chat",
            "api_key": "${ALCF_INFERENCE_TOKEN}",
            "base_url": "https://inference-api.alcf.anl.gov/resource_server/sophia/vllm/v1",
        },
        "alcf-metis": {
            "type": "openai_chat",
            "api_key": "${ALCF_INFERENCE_TOKEN}",
            "base_url": "https://inference-api.alcf.anl.gov/resource_server/metis/api/v1",
        },
    },
    "models": {
        // Register each model in BOTH forms clients send: bare ("claude-opus-4-8")
        // and vendor-prefixed ("anthropic/claude-opus-4-8"). Declare `capabilities`
        // explicitly: the default is ["text"], which makes rosetta strip images.
        "claude-opus-4-8": {
            "provider": "argo",
            "upstream_model": "Claude Opus 4.8",
            "capabilities": ["text", "vision", "tools", "reasoning"],
        },
        "anthropic/claude-opus-4-8": {
            "provider": "argo",
            "upstream_model": "Claude Opus 4.8",
            "capabilities": ["text", "vision", "tools", "reasoning"],
        },
        "gpt-5.5": {
            "provider": "argo-openai",
            "upstream_model": "GPT-5.5",
            "capabilities": ["text", "vision", "tools", "reasoning"],
        },
        "openai/gpt-5.5": {
            "provider": "argo-openai",
            "upstream_model": "GPT-5.5",
            "capabilities": ["text", "vision", "tools", "reasoning"],
        },

        // ALCF inference models can be registered by exact upstream ID and/or a
        // cluster-prefixed alias to make filtering easier in dashboards.
        "alcf-sophia/openai/gpt-oss-120b": {
            "provider": "alcf-sophia",
            "upstream_model": "openai/gpt-oss-120b",
            "capabilities": ["text", "tools", "reasoning"],
        },
        "alcf-metis/gpt-oss-120b": {
            "provider": "alcf-metis",
            "upstream_model": "gpt-oss-120b",
            "capabilities": ["text"],
        },
        // … repeat for every model you want …
    },
    // No server.api_key: the gateway binds to 127.0.0.1 only, so no auth needed.
    "server": { "host": "127.0.0.1", "port": 8765 },
}
```

If you only use Argo models, start it directly:

```bash
llm-rosetta-gateway --no-banner   # listens on 127.0.0.1:8765
```

If you also want **ALCF Inference Endpoints**, authenticate once with Globus and
export a short-lived access token before starting/restarting the gateway:

```bash
# First-time / monthly-ish auth; opens a Globus browser flow.
uv run --with globus-sdk --with openai \
  ~/.config/llm-rosetta-gateway/inference_auth_token.py authenticate

# Refresh/export the access token for this shell, then restart rosetta so
# ${ALCF_INFERENCE_TOKEN} gets substituted into the config.
alcf-inference-token
rosetta-gateway restart
```

For my day-to-day post-shim-restart ritual I use:

```bash
argo-rosetta-sync --with-inference
```

Test the translation (OpenAI request in, Claude answer out):

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-opus-4-8",
       "messages":[{"role":"user","content":"say hi"}],"max_tokens":20}'
```

And test an ALCF inference model (if `ALCF_INFERENCE_TOKEN` was set before the
gateway started):

```bash
curl -sS http://127.0.0.1:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"alcf-sophia/openai/gpt-oss-120b",
       "messages":[{"role":"user","content":"say hi"}],"max_tokens":20}'
```

The gateway also ships a web admin panel at
`http://127.0.0.1:8765/admin/` with live metrics and request logs.

> **Heads up**: the admin panel's **Fetch from Provider** button re-discovers
> Argo's full model list and writes entries that _don't work_ (some Gemini
> variants still 500 upstream; spaced display-names get mapped to the wrong
> provider). Manage the model list from the config file instead. I keep a
> [regen script](#a-regen-script) for exactly this.

### 5. Hermes One desktop app

Hermes' backend reads `~/.hermes/config.yaml`. Point its `custom` provider at
the rosetta gateway:

```yaml
model:
    base_url: 'http://127.0.0.1:8765/v1'
    default: 'claude-opus-4-7'
    provider: 'custom'
```

Because the rosetta gateway runs **no-auth** (loopback only), no API key is
needed. Hermes' "API Key required for remote endpoints, optional for
localhost" note applies.

In the desktop app: **Settings → Providers → Local / Others**, set:

| Field        | Value                         |
| ------------ | ----------------------------- |
| **Model**    | `claude-opus-4-7`             |
| **Base URL** | `http://127.0.0.1:8765/v1`    |
| **API Key**  | _(optional, gateway is open)_ |

Click **Done**, restart the app, start a New Chat. 🎉

## Gotchas

These are the non-obvious problems I hit. Two of them needed small,
**additive** patches to `argo-shim` (both now
[upstream](#contributing-back)); the Anthropic `/v1/messages` path Claude Code
uses is untouched. The rest were configuration or upstream-app quirks.

### 1. OpenAI models need a `user` field

Argo's `/chat/completions` returns a bare `HTTP 500` for GPT/Gemini requests…
_unless_ you include a `user` field set to a valid ALCF username. With it, the
500 turns into a real completion. The fix: have the shim auto-inject it.

```python
# In argo-shim, alongside the existing /messages handling:
if method == "POST" and body and "/chat/completions" in self.path:
    req = json.loads(body)
    if isinstance(req, dict) and not (req.get("user") or "").strip():
        req["user"] = ARGO_USER          # $ARGO_USER / $CELS_USERNAME / login
        body = json.dumps(req).encode()
```

This one took a while to find, because the bare 500 gives no hint about what is
missing. None of my `curl` tests happened to send a `user` field that Argo
accepted, so they failed exactly like the real clients did. Once I tried a valid
ALCF username, GPT-4o, GPT-5, and the o-series all came alive.
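
For testing the behavior outside the shim, the same injection can be written as a standalone function. This is a sketch rather than the shim's code: `inject_user` is a hypothetical name, and forwarding non-JSON or non-object bodies unchanged is an assumption about the safest fallback.

```python
import json


def inject_user(body: bytes, default_user: str) -> bytes:
    """Add a `user` field to a JSON object body if it is missing or blank."""
    try:
        req = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return body  # not JSON: forward untouched
    if not isinstance(req, dict):
        return body  # e.g. a top-level array
    user = req.get("user")
    if isinstance(user, str) and user.strip():
        return body  # the client already supplied a user
    req["user"] = default_user
    return json.dumps(req).encode()
```

A request that already names a user is left byte-for-byte intact, so the shim only changes requests that would otherwise fail.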

### 2. OpenAI clients send `Authorization: Bearer`, not `x-api-key`

The shim originally accepted only `x-api-key`. rosetta's `openai_chat`
provider authenticates with `Authorization: Bearer <key>`. Teach the shim to
accept both:

```python
client_key = self.headers.get("x-api-key", "")
if not client_key:
    auth = self.headers.get("Authorization", "")
    if auth.lower().startswith("bearer "):
        client_key = auth[7:].strip()
```
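
The same logic as a self-contained function, useful for unit-testing the header handling. This is a sketch under assumptions: `extract_client_key` and `is_authorized` are hypothetical names, the headers arrive as a plain dict, and the constant-time comparison is my addition rather than something the shim is known to do.

```python
import hmac


def extract_client_key(headers: dict) -> str:
    """Return the token from x-api-key, else from Authorization: Bearer."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    key = lowered.get("x-api-key", "").strip()
    if key:
        return key
    scheme, _, value = lowered.get("authorization", "").partition(" ")
    return value.strip() if scheme.lower() == "bearer" else ""


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Compare the presented token to the expected one in constant time."""
    presented = extract_client_key(headers)
    if not presented or not expected_token:
        return False
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```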

### 3. Hermes' `custom` provider and model-name prefixes

Two Hermes-specific quirks:

- The `custom` provider **doesn't read `CUSTOM_API_KEY`** for the actual
  request: only an inline `api_key` in `config.yaml` (which the desktop UI
  _strips_ on save). Running the gateway no-auth sidesteps this entirely.
- Hermes prepends a vendor prefix to model names
  (`anthropic/claude-opus-4-8`), then matches that exact string against the
  endpoint's `/v1/models` list. So the gateway must register **both** the bare
  and vendor-prefixed forms of every model, hence the duplicated entries in
  the config above.

### 4. The capabilities default silently drops images

The rosetta admin panel showed every model with a single `text` capability
badge, even though Claude and GPT-4o obviously do vision. I assumed it was
cosmetic. It isn't.

If a model entry doesn't declare a `capabilities` list, rosetta defaults it to
`["text"]`. And a model **without `"vision"` has images stripped from the
request** before it's forwarded upstream (`enforce_vision()` replaces them with
text placeholders). So pasting an image into Hermes would have silently dropped
it: no error, just a model that "couldn't see" the image.

The fix is to declare real capabilities per model in the gateway config:

```jsonc
"claude-opus-4-8": {
  "provider": "argo",
  "upstream_model": "Claude Opus 4.8",
  "capabilities": ["text", "vision", "tools", "reasoning"]
}
```

My [regen script](#a-regen-script) now sets these per model family (Claude 4.x
and the GPT-5/o-series get `text + vision + tools + reasoning`; GPT-4o/4.1 get
`text + vision + tools`). After regenerating, images flow through to Argo
instead of being quietly discarded.
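
To make the failure mode concrete, here is roughly what that kind of capability enforcement does to an OpenAI-style message. It is an illustration only, not rosetta's `enforce_vision()` source: `strip_images` and the placeholder text are hypothetical.

```python
PLACEHOLDER_TEXT = "[image omitted]"  # hypothetical placeholder wording


def strip_images(messages: list, capabilities: list) -> list:
    """Replace image parts with a text placeholder unless 'vision' is declared."""
    if "vision" in capabilities:
        return messages
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):  # multi-part content may carry images
            parts = [
                {"type": "text", "text": PLACEHOLDER_TEXT}
                if isinstance(part, dict) and part.get("type") == "image_url"
                else part
                for part in content
            ]
            msg = {**msg, "content": parts}  # copy; leave the input untouched
        cleaned.append(msg)
    return cleaned
```

Nothing in that path raises an error, which is why the symptom is a model that simply never mentions the image.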

### 5. Duplicated responses in Hermes One

After everything worked, Hermes One started showing **every reply twice**:
once as a slightly-reworded partial, then again in full. My first instinct was
that the proxy chain was double-emitting.

It wasn't. I tested the rosetta gateway directly: streaming, non-streaming, and
a fresh non-stream call, and every response came back **clean and singular**.
The shim and Argo were innocent. The duplication only appeared _inside the
Hermes desktop renderer_, and only for chatty tool-calling turns (the offending
turn had `tool_turns=10`).

The culprit was Hermes' `display.interim_assistant_messages: true`, which
renders the model's _interim commentary between tool calls_ **and** the final
answer. With a verbose reasoning model (GPT-5.5) those two are near-identical,
so you see the reply twice with slightly different wording. The fix is one line
in `~/.hermes/config.yaml`:

```yaml
display:
    interim_assistant_messages: false
```

The meta-lesson: when something looks broken, **bisect the layers**.
A two-minute `curl` against the gateway saved me from "fixing" a proxy that was
working perfectly.

### 6. ALCF inference models appear even when auth is missing

The ALCF inference providers use `"api_key": "${ALCF_INFERENCE_TOKEN}"` in the
rosetta config. That substitution happens **when the gateway starts**. If the
env var is missing, the models still appear in `/v1/models` and the admin UI,
but calls 401 because the literal placeholder (or no useful bearer token) gets
sent upstream.

The fix is to refresh the Globus token before restarting rosetta:

```bash
alcf-inference-token
rosetta-gateway restart
```

Or, after an `argo-shim` restart, do the whole thing:

```bash
argo-rosetta-sync --with-inference
```
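
Since a missing variable fails silently at startup, a small preflight check can catch it before the gateway is restarted. A sketch using only the standard library; `missing_env_vars` is a hypothetical helper, and it assumes placeholders always have the `${NAME}` shape used in the config above. It checks presence only, not whether the Globus token has expired.

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text: str, env=None) -> list:
    """Names referenced as ${NAME} in the config that are unset or empty."""
    env = os.environ if env is None else env
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]
```

Running it over `config.jsonc` and refusing to restart when the list is non-empty turns a confusing upstream 401 into an immediate local error.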

### 7. Gemini is partially available

I originally thought Gemini was completely broken because I kept hitting
`'NoneType' object is not iterable` after fixing the `user` field. Re-testing
more carefully showed a more nuanced picture.

Important distinction: **these are Argo Gemini models, not Google GenAI API
models.** Argo exposes them through its OpenAI-compatible
`/chat/completions` endpoint, so in rosetta they use the `openai_chat`
provider type, not the `google` provider type.

- `Gemini 2.5 Pro` works through Argo's OpenAI `/chat/completions` path.
- `Gemini 2.5 Flash` works too.
- `Gemini 3.5 Flash` also works.
- `Gemini 3.1 Flash Lite` works **only if the request omits `max_tokens` or
  uses `max_completion_tokens`**; with the usual OpenAI-style `max_tokens` field
  it still returns the upstream `NoneType` 500.

So the config includes the working models as `gemini-2.5-pro`,
`gemini-2.5-flash`, `gemini-3.5-flash`, and their `google/...` aliases, while
omitting `Gemini 3.1 Flash Lite` until I add a model-specific request transform.
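
The transform itself would be small. A sketch of what it might look like, assuming it runs on the parsed request body somewhere before the request reaches Argo; `adapt_token_limit` and the `NEEDS_MAX_COMPLETION_TOKENS` set are hypothetical names, and where it would live (shim or gateway) is undecided.

```python
# Hypothetical set of upstream model names that reject `max_tokens`.
NEEDS_MAX_COMPLETION_TOKENS = {"Gemini 3.1 Flash Lite"}


def adapt_token_limit(req: dict) -> dict:
    """Rename max_tokens to max_completion_tokens for the affected models."""
    if req.get("model") not in NEEDS_MAX_COMPLETION_TOKENS:
        return req
    if "max_tokens" not in req:
        return req
    out = {k: v for k, v in req.items() if k != "max_tokens"}
    # Keep an explicit max_completion_tokens if the client already sent one.
    out.setdefault("max_completion_tokens", req["max_tokens"])
    return out
```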

## Token rotation

Every `argo-shim` restart mints a new token. To keep everything in sync I use
shell helpers in `~/.config/zsh/functions.zsh`. These are the actual pieces you
need, not magic commands hidden elsewhere.

```bash
# Re-export the live shim token into the env vars various clients read.
refresh-argo-token() {
  local settings="$HOME/.claude/settings.json"
  local token
  token=$(python3 -c "import json; print(json.load(open('$settings'))['apiKeyHelper'].split()[-1])") || return 1
  export ARGO_SHIM_TOKEN="$token"          # OpenCode, vtcode, direct shim clients
  export ANTHROPIC_API_KEY="$token"        # Anthropic-native clients
  export ANTHROPIC_BASE_URL="http://127.0.0.1:25940/argoapi"
  print "ARGO_SHIM_TOKEN refreshed (${token:0:8}…)"
}

# Point Claude Code at rosetta, not directly at argo-shim, so Claude Code
# traffic appears in the rosetta dashboard. argo-shim rewrites Claude settings
# back to :25940/argoapi on startup, so run this after starting argo-shim.
route-claude-through-rosetta() {
  python3 - <<'PY'
import json, pathlib
p = pathlib.Path.home() / ".claude" / "settings.json"
d = json.load(open(p))
env = d.setdefault("env", {})
env["ANTHROPIC_BASE_URL"] = "http://127.0.0.1:8765"
for key in ("NO_PROXY", "no_proxy"):
    vals = [x.strip() for x in env.get(key, "").split(",") if x.strip()]
    for val in ("localhost", "127.0.0.1"):
        if val not in vals:
            vals.append(val)
    env[key] = ",".join(vals)
p.write_text(json.dumps(d, indent=2) + "\n")
PY
}

# Export a fresh Globus access token for ALCF Inference Endpoints. First run may
# require an interactive browser auth flow (see below).
alcf-inference-token() {
  local helper="$HOME/.config/llm-rosetta-gateway/inference_auth_token.py"
  mkdir -p "$(dirname "$helper")"
  if [[ ! -r "$helper" ]]; then
    curl -fsSL "https://raw.githubusercontent.com/argonne-lcf/inference-endpoints/refs/heads/main/inference_auth_token.py" -o "$helper" || return 1
  fi
  local token
  token=$(uv run --with globus-sdk --with openai "$helper" get_access_token) || return 1
  export ALCF_INFERENCE_TOKEN="$token"
  print "ALCF_INFERENCE_TOKEN refreshed (${token:0:8}…)"
}

# Minimal rosetta restart helper. My full version also has start/stop/status,
# but this is the essential part: stop any running gateway, then start a fresh
# one so it re-reads config.jsonc and the current environment.
rosetta-gateway-restart() {
  pkill -f llm-rosetta-gateway 2>/dev/null || true
  sleep 1
  mkdir -p "$HOME/.hermes/logs"   # the log redirect fails if this is missing
  nohup llm-rosetta-gateway --no-banner >> "$HOME/.hermes/logs/rosetta-gateway.log" 2>&1 &
}

# One-shot recovery after starting/restarting argo-shim.
argo-rosetta-sync() {
  refresh-argo-token || return 1
  if [[ "$1" == "--with-inference" ]]; then
    alcf-inference-token || return 1   # stop here if the Globus token refresh fails
  fi
  # If you keep a regen script, run it here. Otherwise ensure config.jsonc uses
  # the current $ARGO_SHIM_TOKEN / $ALCF_INFERENCE_TOKEN before restarting.
  rosetta-gateway-restart || return 1
  route-claude-through-rosetta || return 1
}
```

So the startup ritual after a reboot is:

```bash
argo-shim                    # in a terminal: creates tunnel, rotates token
refresh-argo-token           # sync Argo token into env
rosetta-gateway-restart      # restart rosetta after config/token changes
route-claude-through-rosetta # optional: send Claude Code through rosetta too
# then open the Hermes One app
```
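
Before opening a client, it can help to confirm that both listeners are actually up. A small sketch using only the standard library; `is_listening` is a hypothetical helper, and the ports are the ones used throughout this post.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable
        return False


if __name__ == "__main__":
    for name, port in (("argo-shim", 25940), ("llm-rosetta", 8765)):
        state = "up" if is_listening("127.0.0.1", port) else "DOWN"
        print(f"{name:12s} 127.0.0.1:{port} {state}")
```

This only proves the ports are open; it says nothing about whether the tokens behind them are current.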

If I also want the ALCF Inference Endpoints in that same gateway:

```bash
alcf-inference-token      # export Globus access token
rosetta-gateway restart   # reload config with ${ALCF_INFERENCE_TOKEN}
```

Or one bundle after an `argo-shim` restart:

```bash
argo-rosetta-sync --with-inference
```

### A regen script

Clicking the admin panel's **Fetch from Provider** button re-pollutes the model
list (it re-discovers Argo's full set and writes broken entries; see the
heads-up above). For whenever that happens, I keep a script that regenerates a
known-good `config.jsonc` from small Python maps of
`{ alias: (upstream_model_id, capabilities) }`. It registers bare and
vendor-prefixed forms for Argo, adds `alcf-sophia/*` and `alcf-metis/*`
aliases, attaches the right capabilities (so images aren't stripped), pulls
the live Argo token, and restarts the gateway:

```bash
rosetta-gateway regen-models   # rebuild model list + restart
```

Editing the model maps at the top of that script is the one place to add or
remove models.
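
The core of such a script is short. A minimal sketch of the model-expansion step only, not my actual script: `ARGO_MODELS` and `build_models` are hypothetical names, and the map carries a vendor prefix and provider alongside the upstream ID and capabilities so that both registered forms can be derived from one entry.

```python
import json

# Hypothetical map: alias -> (vendor prefix, provider, upstream model, capabilities).
ARGO_MODELS = {
    "claude-opus-4-8": ("anthropic", "argo", "Claude Opus 4.8",
                        ["text", "vision", "tools", "reasoning"]),
    "gpt-5.5": ("openai", "argo-openai", "GPT-5.5",
                ["text", "vision", "tools", "reasoning"]),
}


def build_models(model_map: dict) -> dict:
    """Register each alias in bare and vendor-prefixed form."""
    models = {}
    for alias, (vendor, provider, upstream, caps) in model_map.items():
        for name in (alias, f"{vendor}/{alias}"):
            models[name] = {
                "provider": provider,
                "upstream_model": upstream,
                "capabilities": list(caps),  # always explicit, never the default
            }
    return models


if __name__ == "__main__":
    print(json.dumps({"models": build_models(ARGO_MODELS)}, indent=4))
```

Generating both names from one entry keeps the bare and prefixed forms from drifting apart when a model is added or its capabilities change.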

## Contributing back

Two fixes I'd made to `argo-shim` were general improvements rather than local
workarounds, so they went back upstream as [a PR][argo-pr]:

1. **Accept `Authorization: Bearer <token>`** in addition to `x-api-key`, so
   any OpenAI-format client can authenticate to the shim.
2. **Auto-inject the `user` field** on `/chat/completions`, so OpenAI/Gemini
   models stop returning `HTTP 500`. The value resolves from `$ARGO_USER`, then
   `$CELS_USERNAME`, then the login user: no hardcoded usernames.

[argo-pr]: https://github.com/n-getty/argo-shim/pull/9

A nice footnote: the PR got an automated review that flagged a real edge case.
A non-dict JSON body (e.g. a top-level array) would crash the handler, since the
injection code assumed `req.get(...)` always worked. Guarding on
`isinstance(req, dict)` and treating a blank `user` as missing made it robust.
That's the `isinstance` check you see in the snippet above.

## Wrapping up

The result: **Claude Code, OpenCode, vtcode, and the Hermes One desktop app**
all talking to one local gateway. Argo models go through the SSH-backed shim;
ALCF inference models go straight to Sophia/Metis with Globus auth. From the
client side it all looks like `http://127.0.0.1:8765/v1`.

The two reusable building blocks are worth calling out:

- **[`argo-shim`][argo-shim]** turns a finicky, MFA-gated internal endpoint
  into a clean localhost API, plus the small auth/format fixups that make Argo's
  Claude and GPT paths behave like normal provider endpoints.
- **[`llm-rosetta`][rosetta]** is the universal adapter: any client format in,
  any provider format out. It also becomes a convenient control plane for mixing
  providers (Argo via the shim, Sophia/Metis via ALCF inference, and whatever
  else comes next) behind one localhost base URL.

If you're at a lab with similar internal gateways, this pattern generalizes:
put a small shim in front of anything weird, point a translation gateway at all
of your upstreams, and give every local client a single localhost base URL.