---
title: Service Discovery
---

# Service Discovery

SMG automatically discovers and registers workers in Kubernetes environments, eliminating manual worker URL management and enabling dynamic scaling.

---

## Overview
### :material-kubernetes: Native Kubernetes

Watch pods matching label selectors with automatic registration and removal.

### :material-sync: Dynamic Scaling

Workers are automatically added and removed as pods scale up or down.

### :material-filter: Label Selectors

Target specific workers using Kubernetes label selectors.

### :material-swap-horizontal: PD Support

Separate discovery for prefill and decode workers in disaggregated deployments.

---

## How It Works
![Service Discovery Architecture](../../assets/images/service-discovery.svg)
### Discovery Flow

1. **Watch Pods**: SMG creates a Kubernetes watcher for pods matching the configured label selector.
2. **Filter Events**: Only pods matching the selector (or, in PD mode, the prefill or decode selector) are processed.
3. **Handle Events**: Pod creation triggers an `AddWorker` job; pod deletion triggers a `RemoveWorker` job.
4. **Register Workers**: Workers are added to the registry, and health checks start immediately.
5. **Track State**: A HashSet of discovered pods prevents duplicate registrations.

---

## Configuration

### Basic Setup

```bash
smg \
  --service-discovery \
  --selector app=sglang-worker \
  --service-discovery-namespace inference \
  --service-discovery-port 8000
```

### Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--service-discovery` | `false` | Enable Kubernetes service discovery |
| `--selector` | - | Label selector for worker pods (required) |
| `--service-discovery-namespace` | (all namespaces) | Kubernetes namespace to watch |
| `--service-discovery-port` | `80` | Port SMG uses to connect to discovered workers |

---

## Label Selectors

SMG uses Kubernetes label selectors to identify worker pods.

### Simple Selector

Match pods with a single label:

```bash
smg --service-discovery --selector app=vllm
```

Matches pods with the label `app=vllm`.

### Multiple Labels

Match pods that carry several labels by passing multiple `key=value` pairs:

```bash
smg --service-discovery --selector app=sglang environment=production
```

Matches pods with both `app=sglang` AND `environment=production`.

---

## PD Disaggregation Discovery

For prefill-decode (PD) disaggregated deployments, use a separate selector for each worker type.
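To make the selector semantics and the discovery flow concrete, here is a minimal Python sketch. It is not SMG's implementation (SMG itself is written in Rust). It assumes equality-only selectors in which every `key=value` pair must match, as described under Multiple Labels, and it models watch events as simple `ADDED`/`DELETED` records. The `PodEvent` and `Discovery` types, the `handle` method, and the job tuples are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of the discovery flow; NOT SMG's actual (Rust) code.
from dataclasses import dataclass, field


def parse_selector(pairs):
    """Turn ["app=sglang", "role=prefill"] into {"app": "sglang", "role": "prefill"}."""
    selector = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"invalid selector term: {pair!r}")
        selector[key] = value
    return selector


def matches(selector, labels):
    """AND semantics: the pod must carry every key=value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class PodEvent:
    """Simplified stand-in for a Kubernetes pod watch event (hypothetical)."""
    kind: str  # "ADDED" or "DELETED"
    name: str
    ip: str
    labels: dict


@dataclass
class Discovery:
    port: int
    selector: dict = field(default_factory=dict)          # regular mode
    prefill_selector: dict = field(default_factory=dict)  # PD mode
    decode_selector: dict = field(default_factory=dict)   # PD mode
    pd_mode: bool = False
    known: set = field(default_factory=set)   # pod names already registered
    jobs: list = field(default_factory=list)  # emitted (job, url, role) tuples

    def classify(self, labels):
        """Return the worker role for a pod, or None if it should be ignored."""
        if self.pd_mode:
            if matches(self.prefill_selector, labels):
                return "prefill"
            if matches(self.decode_selector, labels):
                return "decode"
            return None
        return "regular" if matches(self.selector, labels) else None

    def handle(self, event):
        role = self.classify(event.labels)
        if role is None:
            return  # Filter Events: labels match no configured selector
        url = f"http://{event.ip}:{self.port}"
        if event.kind == "ADDED" and event.name not in self.known:
            self.known.add(event.name)  # Track State: prevents duplicate registration
            self.jobs.append(("AddWorker", url, role))
        elif event.kind == "DELETED" and event.name in self.known:
            self.known.discard(event.name)
            self.jobs.append(("RemoveWorker", url, role))


if __name__ == "__main__":
    d = Discovery(port=8000, selector=parse_selector(["app=sglang-worker"]))
    labels = {"app": "sglang-worker"}
    d.handle(PodEvent("ADDED", "sglang-worker-0", "10.0.0.5", labels))
    d.handle(PodEvent("ADDED", "sglang-worker-0", "10.0.0.5", labels))  # duplicate, ignored
    d.handle(PodEvent("ADDED", "other-0", "10.0.0.9", {"app": "other"}))  # filtered out
    d.handle(PodEvent("DELETED", "sglang-worker-0", "10.0.0.5", labels))
    print(d.jobs)
    # prints [('AddWorker', 'http://10.0.0.5:8000', 'regular'), ('RemoveWorker', 'http://10.0.0.5:8000', 'regular')]
```

In this sketch, checking the prefill selector before the decode selector is an arbitrary choice. When pods are labeled correctly, each pod matches exactly one of the two selectors, so the order does not matter.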
### Configuration

```bash
smg \
  --service-discovery \
  --pd-disaggregation \
  --prefill-selector app=sglang role=prefill \
  --decode-selector app=sglang role=decode \
  --service-discovery-namespace inference
```

### Parameters

| Parameter | Description |
|-----------|-------------|
| `--prefill-selector` | Label selector for prefill workers |
| `--decode-selector` | Label selector for decode workers |

### Worker Labels

Label each pod so that it matches exactly one of the two selectors:

```yaml
# Prefill worker
apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-0
  labels:
    app: sglang
    role: prefill
spec:
  containers:
    - name: sglang
      image: lmsysorg/sglang:latest
      args: ["--dp-size", "1", "--disaggregation-mode", "prefill"]
---
# Decode worker
apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-0
  labels:
    app: sglang
    role: decode
spec:
  containers:
    - name: sglang
      image: lmsysorg/sglang:latest
      args: ["--dp-size", "1", "--disaggregation-mode", "decode"]
```

---

## Required RBAC

SMG needs permissions to get, list, and watch pods in the target namespace.

### Role

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: smg-discovery
  namespace: inference
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```

### RoleBinding

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: smg-discovery
  namespace: inference
subjects:
  - kind: ServiceAccount
    name: smg
    namespace: inference
roleRef:
  kind: Role
  name: smg-discovery
  apiGroup: rbac.authorization.k8s.io
```

### ServiceAccount

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: smg
  namespace: inference
```

### Cross-Namespace Discovery

To discover workers across multiple namespaces, use a ClusterRole and ClusterRoleBinding, and omit `--service-discovery-namespace` so that SMG watches all namespaces:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: smg-discovery
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: smg-discovery
subjects:
  - kind: ServiceAccount
    name: smg
    namespace: inference
roleRef:
  kind: ClusterRole
  name: smg-discovery
  apiGroup: rbac.authorization.k8s.io
```

---

## Complete Deployment Example

### SMG Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: smg
  namespace: inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: smg
  template:
    metadata:
      labels:
        app: smg
    spec:
      serviceAccountName: smg
      containers:
        - name: smg
          image: ghcr.io/lightseekorg/smg:latest
          args:
            - --service-discovery
            - --selector=app=sglang-worker
            - --service-discovery-namespace=inference
            - --service-discovery-port=8000
            - --policy=cache_aware
          ports:
            - containerPort: 8000
              name: http
```

!!! tip "Engine images"
    For all-in-one deployments where each pod runs both the gateway and an engine, use an engine image tag (e.g., `ghcr.io/lightseekorg/smg:{smg_version}-{engine}-{engine_version}`). See [Getting Started](../../getting-started/index.md#install) for available tags.

### Worker StatefulSet

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sglang-worker
  namespace: inference
spec:
  serviceName: sglang-worker
  replicas: 3
  selector:
    matchLabels:
      app: sglang-worker
  template:
    metadata:
      labels:
        app: sglang-worker
    spec:
      containers:
        - name: sglang
          image: lmsysorg/sglang:latest
          args:
            - --model-path=meta-llama/Llama-3.1-8B-Instruct
            - --port=8000
          ports:
            - containerPort: 8000
```

---

## Worker Lifecycle

### Registration Flow

1. **Pod Created**: Kubernetes creates a new worker pod.
2. **Watch Event**: SMG receives the pod creation event.
3. **Capability Query**: SMG queries the worker's `/model_info` endpoint, falling back to the deprecated `/get_model_info` if the new path returns 404.
4. **Registration**: The worker is added to the registry.
5. **Health Check**: Background health checks begin.

### Removal Flow

1. **Pod Terminating**: Kubernetes begins pod termination.
2. **Watch Event**: SMG receives the pod deletion event.
3. **Drain**: SMG stops sending new requests to the worker.
4. **Removal**: The worker is removed from the registry.

### Worker States

| State | Description | Receives Traffic |
|-------|-------------|------------------|
| **Pending** | Just registered, not yet proven healthy locally | No |
| **Ready** | Locally verified and passing health checks | Yes |
| **NotReady** | Previously `Ready`, now failing readiness checks; not removed unless configured | No |
| **Failed** | Sustained liveness failure; removed when `--remove-unhealthy-workers` is set | No |

---

## Monitoring

### Metrics

| Metric | Description |
|--------|-------------|
| `smg_discovery_workers_discovered` | Workers known via discovery |
| `smg_discovery_registrations_total` | Worker registration events |
| `smg_discovery_deregistrations_total` | Worker deregistration events |
| `smg_discovery_sync_duration_seconds` | Duration of each periodic reconciliation cycle |

### Logs

```bash
# Enable discovery debug logging
RUST_LOG=smg::discovery=debug smg --service-discovery ...
```

Example log output:

```
[INFO] Watching pods in namespace 'inference' with selector 'app=sglang-worker'
[INFO] Discovered new pod: sglang-worker-0 (10.0.0.5:8000)
[INFO] Registered worker: http://10.0.0.5:8000
[INFO] Discovered new pod: sglang-worker-1 (10.0.0.6:8000)
[INFO] Registered worker: http://10.0.0.6:8000
```

---

## Troubleshooting

| Symptom | Cause | Solution |
|---------|-------|----------|
| No workers discovered | Wrong selector | Verify that pod labels match the selector |
| RBAC error | Missing permissions | Apply the Role and RoleBinding (or ClusterRole and ClusterRoleBinding) |
| Workers not ready | Health check failing | Check the worker's health endpoint |
| Stale workers | Watch disconnected | Check connectivity to the Kubernetes API |

### Verify Discovery

```bash
# Check discovered workers via the admin API
curl http://smg:30000/workers | jq .

# Check that pod labels match the selector
kubectl get pods -n inference -l app=sglang-worker

# Verify RBAC
kubectl auth can-i watch pods -n inference --as=system:serviceaccount:inference:smg
```

---

## What's Next?
### :material-swap-horizontal: PD Disaggregation

Learn about prefill-decode separation.

[PD Disaggregation →](../routing/pd-disaggregation.md)

### :material-scale-balance: Load Balancing

Configure routing policies for discovered workers.

[Load Balancing →](../routing/load-balancing.md)

### :material-heart-pulse: Health Checks

Configure health monitoring for workers.

[Health Checks →](../reliability/health-checks.md)