--- title: Multiple Workers --- # Multiple Workers SMG can route across many workers simultaneously — local inference servers, remote cloud APIs, or a mix of both. This guide covers how to add workers and balance traffic across them.
#### Before you begin - Completed the [Getting Started](index.md) guide
## Supported Worker Types SMG connects to workers over HTTP or gRPC, and supports both local inference servers and remote API providers: | Worker Type | Protocol | Example URL | |-------------|----------|-------------| | SGLang | HTTP / gRPC | `http://worker:8000` or `grpc://worker:50051` | | vLLM | gRPC | `grpc://worker:50051` | | TensorRT-LLM | gRPC | `grpc://worker:50051` | | OpenAI (GPT) | HTTP | `https://api.openai.com` | | Anthropic (Claude) | HTTP | `https://api.anthropic.com` | | xAI (Grok) | HTTP | `https://api.x.ai` | | Google (Gemini) | HTTP | `https://generativelanguage.googleapis.com` | | Any OpenAI-compatible API | HTTP | `https://your-provider.com` | ## Static Workers via CLI Pass multiple URLs to `--worker-urls`: ```bash smg \ --worker-urls http://worker1:8000 http://worker2:8000 http://worker3:8000 \ --policy round_robin \ --host 0.0.0.0 \ --port 30000 ``` For gRPC workers, use the `grpc://` scheme and provide `--model-path` so the gateway can load the tokenizer: ```bash smg \ --worker-urls grpc://worker1:50051 grpc://worker2:50052 \ --model-path meta-llama/Llama-3.1-8B-Instruct \ --policy round_robin ``` See [gRPC Workers](grpc-workers.md) for details on what gRPC mode enables. ## Cloud API Workers Route to cloud providers by setting `--backend` and passing the provider URL. API keys are either passed by the caller through the `Authorization` header (BYOK) or stored on the worker record when registering via the admin API. SMG auto-detects the provider (OpenAI, Anthropic, xAI, Gemini) from the model name and applies the correct API transformations. === "OpenAI" ```bash export OPENAI_API_KEY=sk-... smg \ --backend openai \ --worker-urls https://api.openai.com \ --host 0.0.0.0 \ --port 30000 ``` === "Anthropic" ```bash export ANTHROPIC_API_KEY=sk-ant-... smg \ --backend openai \ --worker-urls https://api.anthropic.com \ --host 0.0.0.0 \ --port 30000 ``` === "xAI (Grok)" ```bash export XAI_API_KEY=xai-... smg \ --backend openai \ --worker-urls https://api.x.ai \ --host 0.0.0.0 \ --port 30000 ``` === "Gemini" ```bash export GEMINI_API_KEY=... smg \ --backend openai \ --worker-urls https://generativelanguage.googleapis.com \ --host 0.0.0.0 \ --port 30000 ``` ## Dynamic Workers with IGW Mode In Inference Gateway (IGW) mode, SMG starts with no workers and you add or remove them at runtime via the REST API: ```bash smg --enable-igw --host 0.0.0.0 --port 30000 ``` ### Add a worker ```bash curl -X POST http://localhost:30000/workers \ -H "Content-Type: application/json" \ -d '{"url": "http://worker1:8000"}' ``` Response: ```json { "status": "accepted", "worker_id": "a1b2c3d4", "url": "http://worker1:8000", "location": "/workers/a1b2c3d4", "message": "Worker addition queued for background processing" } ``` ### List workers ```bash curl http://localhost:30000/workers ``` ### Remove a worker ```bash curl -X DELETE http://localhost:30000/workers/{worker_id} ``` ### Worker configuration options The `POST /workers` endpoint accepts additional fields: ```json { "url": "http://worker:8000", "api_key": "optional-key", "runtime": "sglang", "worker_type": "regular", "priority": 50, "cost": 1.0, "labels": {"region": "us-east"} } ``` | Field | Default | Description | |-------|---------|-------------| | `url` | (required) | Worker URL (`http://`, `grpc://`, or `https://` for cloud) | | `api_key` | — | API key for authenticated workers | | `runtime` | (auto-detect) | Runtime: `sglang`, `vllm`, `trtllm`, `mlx`, or `external` | | `worker_type` | `regular` | Type: `regular`, `prefill`, or `decode` | | `priority` | `50` | Routing priority (0–100, higher = preferred) | | `cost` | `1.0` | Cost multiplier for cost-aware routing | | `labels` | `{}` | Arbitrary metadata | ## Verify ```bash # List connected workers curl http://localhost:30000/workers # Check health curl http://localhost:30000/health # Send a request curl http://localhost:30000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ## Next Steps - [Monitoring](monitoring.md) — Track request rates, latency, and worker health - [gRPC Workers](grpc-workers.md) — Enable tokenization, chat templates, and tool parsing at the gateway - [PD Disaggregation](pd-disaggregation.md) — Separate prefill and decode onto specialized workers