---
title: High Availability
---
# High Availability
SMG supports high-availability cluster deployments using mesh networking for fault tolerance, scalability, and zero-downtime updates.
---
## Overview
### :material-shield-check: Fault Tolerance
Continue serving requests when individual router nodes fail. Automatic failover with zero manual intervention.
### :material-arrow-expand-all: Scalability
Distribute load across multiple router instances. Add nodes without downtime.
### :material-sync: State Synchronization
Share worker states, policy configurations, and rate limits across the cluster in real time.
### :material-rocket-launch: Zero Downtime Updates
Perform rolling updates without service interruption. Graceful shutdown with request draining.
---
## Mesh Architecture

### :material-connection: Gossip Protocol
SWIM-based protocol for membership and failure detection.
- 1-second heartbeat interval
- Automatic peer discovery
- Failure detection in seconds
### :material-crown: Cluster Coordination
Node coordination for cluster operations.
- Membership tracking
- Node status management
- Graceful shutdown coordination
### :material-database-sync: CRDT Stores
Conflict-free Replicated Data Types for eventual consistency.
- No coordination locks
- Partition tolerant
- Automatic conflict resolution
### :material-share-variant: State Replication
Real-time synchronization of all cluster state.
- Worker registry
- Rate limit counters
- Cache-aware routing trees
---
## Configuration
### Quick Start
Enable mesh networking with minimal flags:
```bash
# Start first node
smg --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.1 --mesh-port 39527
# Start second node, joining the first
smg --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.2 --mesh-port 39528 --mesh-peer-urls 10.0.0.1:39527
```
### Command Line Options
| Flag | Default | Description |
|------|---------|-------------|
| `--enable-mesh` | `false` | Enable mesh networking for HA deployments |
| `--mesh-server-name` | `Mesh_<4 random chars>` | Unique identifier for this node in the cluster |
| `--mesh-host` | `0.0.0.0` | Bind address for mesh communication |
| `--mesh-advertise-host` | `--mesh-host` | Routable address advertised to mesh peers |
| `--mesh-port` | `39527` | Port for mesh gRPC communication |
| `--mesh-peer-urls` | (none) | Initial peer URLs for cluster bootstrap; only the first entry is used as the bootstrap peer |
| `--router-selector` | (none) | Label selector for Kubernetes pod discovery (e.g. `app=smg tier=router`) |
!!! note "`--mesh-advertise-host` is required when `--mesh-host` is an unspecified address"
    If `--mesh-host` is an unspecified bind address such as `0.0.0.0` (the default), the gateway refuses to start unless `--mesh-advertise-host` is set to a routable node IP. This prevents peers from trying to dial an unroutable address.
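The startup rule in the note above can be approximated with Python's standard `ipaddress` module. This is an illustrative sketch rather than SMG's actual validation code: the function name `resolve_advertise_host` is hypothetical, and treating non-IP hostnames as dialable is an assumption.

```python
import ipaddress
from typing import Optional


def resolve_advertise_host(mesh_host: str, advertise_host: Optional[str] = None) -> str:
    """Return the address peers should dial, or raise if it is unroutable.

    Hypothetical helper: falls back to mesh_host when no advertise host is given.
    """
    candidate = advertise_host or mesh_host
    try:
        unspecified = ipaddress.ip_address(candidate).is_unspecified
    except ValueError:
        # Not an IP literal (for example a DNS name); assume peers can resolve it.
        unspecified = False
    if unspecified:
        raise ValueError(
            f"{candidate} is an unspecified address; "
            "set --mesh-advertise-host to a routable IP"
        )
    return candidate
```

With this rule, `resolve_advertise_host("0.0.0.0", "10.0.0.1")` yields `10.0.0.1`, while `resolve_advertise_host("0.0.0.0")` raises.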
### Python Entrypoint
`--enable-mesh` is also available in the Python entrypoint used by the Docker image. When
`--mesh-host` is left at `0.0.0.0`, set `--mesh-advertise-host` to a routable address such as
the pod IP:
```bash
smg launch --enable-mesh --mesh-host 0.0.0.0 --mesh-advertise-host 10.0.0.11 --mesh-port 39527
```
### Basic Configuration
**Node 1** (Bootstrap)
```bash
smg --enable-mesh \
--mesh-server-name node1 \
--mesh-host 0.0.0.0 \
--mesh-advertise-host 10.0.0.11 \
--mesh-port 39527 \
--host 0.0.0.0 \
--port 8000
```
**Node 2** (Join)
```bash
smg --enable-mesh \
--mesh-server-name node2 \
--mesh-host 0.0.0.0 \
--mesh-advertise-host 10.0.0.12 \
--mesh-port 39527 \
--mesh-peer-urls 10.0.0.11:39527 \
--host 0.0.0.0 \
--port 8000
```
**Node 3** (Join)
```bash
smg --enable-mesh \
--mesh-server-name node3 \
--mesh-host 0.0.0.0 \
--mesh-advertise-host 10.0.0.13 \
--mesh-port 39527 \
--mesh-peer-urls 10.0.0.11:39527 \
--host 0.0.0.0 \
--port 8000
```
!!! tip "Only the first peer bootstraps membership"
    Pass exactly one reachable peer in `--mesh-peer-urls`; additional values are currently ignored. The remaining peers are learned via gossip after the initial connection.
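To make the bootstrap pattern concrete, the sketch below generates the per-node argument lists for the three nodes shown above. The helper `mesh_args` is hypothetical and uses only flags from the options table; it passes a single bootstrap peer to every joining node.

```python
from typing import List


def mesh_args(index: int, node_ips: List[str], mesh_port: int = 39527) -> List[str]:
    """Build the smg mesh flags for node `index` (0-based) in `node_ips`."""
    args = [
        "smg", "--enable-mesh",
        "--mesh-server-name", f"node{index + 1}",
        "--mesh-host", "0.0.0.0",
        "--mesh-advertise-host", node_ips[index],
        "--mesh-port", str(mesh_port),
    ]
    if index > 0:
        # Joining nodes point at the first node only; other peers arrive via gossip.
        args += ["--mesh-peer-urls", f"{node_ips[0]}:{mesh_port}"]
    return args


ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
for i in range(len(ips)):
    print(" ".join(mesh_args(i, ips)))
```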
---
## Gossip Protocol
### State Synchronization
SMG uses a SWIM-based gossip protocol for cluster membership and state propagation:
1. **Ping/Ping-Req**: Each node periodically pings random peers to check health
2. **State Sync**: Healthy nodes exchange state information during pings
3. **Failure Detection**: Unreachable nodes are marked as suspected, then down
4. **Broadcast**: Status changes are broadcast to all cluster members
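The probe cycle above can be sketched as a small state transition function. This is a simplified, textbook SWIM-style round, not SMG's implementation: the `probe` function, its callback parameters, and the absence of a suspicion timeout are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Status(Enum):
    ALIVE = "ALIVE"
    SUSPECTED = "SUSPECTED"
    DOWN = "DOWN"


def probe(
    target: str,
    members: Dict[str, Status],
    ping: Callable[[str], bool],
    ping_req: Callable[[str, str], bool],
    helpers: Iterable[str],
) -> Status:
    """Run one probe of `target` and record the resulting status in `members`."""
    if ping(target):
        members[target] = Status.ALIVE
        return members[target]
    # Direct ping failed: suspect the node, then ask helpers to probe it indirectly.
    members[target] = Status.SUSPECTED
    if any(ping_req(helper, target) for helper in helpers):
        members[target] = Status.ALIVE
    else:
        members[target] = Status.DOWN
    return members[target]
```

A real implementation would keep the node in `SUSPECTED` for a timeout window and broadcast each status change; the sketch collapses that into a single call to keep the transitions visible.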
### Node Status States
| Status | Description |
|--------|-------------|
| `INIT` | Node is starting up |
| `ALIVE` | Node is healthy and reachable |
| `SUSPECTED` | Node may be unreachable (failed ping) |
| `DOWN` | Node confirmed unreachable (failed ping-req) |
| `LEAVING` | Node is gracefully shutting down |
### Failure Detection Timing
| Phase | Duration | Action |
|-------|----------|--------|
| Ping | 1s interval | Direct probe to peer |
| Down | After missed pings | Remove from active cluster |
---
## State Synchronization
### Synchronized State Types
### :material-server: Worker Registry
All nodes share worker discovery and health status.
- Worker URLs and metadata
- Health check results
- Circuit breaker states
### :material-speedometer: Rate Limits
Cluster-wide rate limiting coordination.
- Token bucket state
- Request counters
- Quota synchronization
### :material-tree: Routing Trees
Cache-aware routing state shared across nodes.
- Radix tree operations
- Prefix match data
- LRU eviction coordination
### :material-cog: Policy State
Routing policy configuration and state.
- Policy parameters
- Load balancing weights
- Session affinity mappings
### Cache-Aware State Sync
Cache-aware routing policy state is synchronized across mesh nodes. This ensures that KV cache routing decisions are consistent across all routers in the cluster, preventing redundant cache misses and enabling optimal prefix reuse regardless of which router handles the request.
!!! info "Cluster introspection endpoints"
    Use the HA management API under `/ha/*` (listed below) to inspect cluster membership, synchronized worker state, and policy state.
### CRDT Implementation
SMG uses several CRDT types for conflict-free synchronization:
| CRDT Type | Used For | Merge Strategy |
|-----------|----------|----------------|
| G-Counter | Request counts | Sum of all increments |
| PN-Counter | Token buckets | Sum of increments minus sum of decrements |
| LWW-Register | Worker state | Last-writer-wins by timestamp |
| OR-Set | Worker sets | Union with tombstones |
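As a rough illustration of why these types merge without coordination, here is a minimal sketch of two of them in their textbook state-based form. The class names and fields are hypothetical and SMG's internal representation may differ. Note that a state-based G-Counter merges by taking the per-node maximum, and its value is the sum across nodes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter with one monotonically increasing slot per node."""
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, node: str, amount: int = 1) -> None:
        self.counts[node] = self.counts.get(node, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-node maximum makes merge commutative, associative, and idempotent.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


@dataclass
class LWWRegister:
    """Last-writer-wins register: the entry with the newest timestamp survives."""
    value: str
    timestamp: float

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Ties on timestamp are broken by value so every replica picks the same winner.
        return max(self, other, key=lambda r: (r.timestamp, r.value))


a, b = GCounter(), GCounter()
a.increment("node1", 3)
b.increment("node2", 2)
print(a.merge(b).value)  # 5
```

Because `merge` is order-independent and idempotent, replicas can exchange state in any order, or more than once, and still converge. The timestamp comparison in `LWWRegister` is also why clock skew shows up as state inconsistency.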
---
## Deployment Patterns
### Three-Node Cluster (Minimum HA)
**Characteristics**
- Tolerates 1 node failure
- Quorum of 2 for leader election
- Recommended for most deployments
**Configuration**
```bash
# All nodes: point at one reachable peer for bootstrap.
# NODE_IP is a placeholder for this node's routable address.
smg --enable-mesh \
--mesh-advertise-host "$NODE_IP" \
--mesh-peer-urls node1:39527 \
--worker-urls http://worker1:8000 http://worker2:8000
```
### Five-Node Cluster (Higher Availability)
**Characteristics**
- Tolerates 2 node failures
- Quorum of 3 for leader election
- Suitable for critical workloads
**Configuration**
```bash
# All nodes: point at one reachable peer for bootstrap.
# NODE_IP is a placeholder for this node's routable address.
smg --enable-mesh \
--mesh-advertise-host "$NODE_IP" \
--mesh-peer-urls node1:39527 \
--worker-urls http://worker1:8000 http://worker2:8000
```
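The quorum and failure-tolerance figures for the three- and five-node clusters follow from simple majority arithmetic. The sketch below assumes a strict majority quorum; the function names are illustrative.

```python
def majority_quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a majority remains."""
    return nodes - majority_quorum(nodes)


for n in (3, 4, 5):
    print(n, majority_quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A four-node cluster tolerates no more failures than a three-node one, which is why odd node counts are the usual choice.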
---
## Kubernetes Deployment
### StatefulSet Configuration
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: smg
spec:
  serviceName: smg-mesh
  replicas: 3
  selector:
    matchLabels:
      app: smg
  template:
    metadata:
      labels:
        app: smg
    spec:
      containers:
        - name: smg
          image: ghcr.io/lightseekorg/smg:latest
          args:
            - --enable-mesh
            - --mesh-server-name=$(POD_NAME)
            - --mesh-host=0.0.0.0
            - --mesh-advertise-host=$(POD_IP)
            - --mesh-port=39527
            - --mesh-peer-urls=smg-0.smg-mesh:39527
            # WORKER_URLS is not defined in this manifest; supply it as an env var.
            - --worker-urls=$(WORKER_URLS)
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # POD_IP is referenced by --mesh-advertise-host above.
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          ports:
            - containerPort: 8000
              name: http
            - containerPort: 39527
              name: mesh
```
!!! tip "Engine images"
    For all-in-one deployments where each pod runs both gateway and engine, use an engine image tag (e.g., `ghcr.io/lightseekorg/smg:{smg_version}-{engine}-{engine_version}`). See [Getting Started](../../getting-started/index.md#install) for available tags.
### Headless Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: smg-mesh
spec:
  clusterIP: None
  selector:
    app: smg
  ports:
    - port: 39527
      name: mesh
```
### Kubernetes Pod Discovery
Use `--router-selector` to enable automatic pod discovery via the Kubernetes API. SMG will find and join other router pods matching the given label selector, removing the need for static `--mesh-peer-urls`:
```bash
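# POD_IP is a placeholder for the pod's routable address (required while --mesh-host is 0.0.0.0).
smg --enable-mesh --mesh-advertise-host "$POD_IP" --service-discovery --router-selector app=smg tier=router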
```
!!! tip "Label Selectors"
    The `--router-selector` flag accepts space-separated `key=value` pairs that map to Kubernetes label selectors. All matching pods with an exposed mesh port are automatically added as peers.
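The mapping from space-separated pairs to a Kubernetes label selector can be sketched as follows. The helpers `parse_selector` and `matches` are hypothetical and only illustrate equality-based matching; they are not SMG's discovery code.

```python
from typing import Dict, List


def parse_selector(pairs: List[str]) -> Dict[str, str]:
    """Turn ["app=smg", "tier=router"] into {"app": "smg", "tier": "router"}."""
    selector: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key] = value
    return selector


def matches(selector: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """A pod matches when every selector pair is present in its labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


selector = parse_selector("app=smg tier=router".split())
# Kubernetes writes the equivalent equality-based selector with commas.
print(",".join(f"{k}={v}" for k, v in selector.items()))  # app=smg,tier=router
```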
---
## HA Management API
### Health Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/ha/health` | GET | Node health status |
| `/ha/status` | GET | Cluster status information |
| `/ha/workers` | GET | Worker states across cluster |
| `/ha/policies` | GET | Policy states across cluster |
| `/ha/shutdown` | POST | Graceful shutdown trigger |
### Cluster Status Response
```json
{
  "node_name": "node1",
  "node_count": 3,
  "nodes": [
    {"name": "node1", "address": "node1:39527", "status": "1", "version": 1},
    {"name": "node2", "address": "node2:39527", "status": "1", "version": 1},
    {"name": "node3", "address": "node3:39527", "status": "1", "version": 1}
  ],
  "stores": {
    "membership_count": 3,
    "worker_count": 0,
    "policy_count": 0,
    "app_count": 0
  }
}
```
The `status` field is emitted as the stringified discriminant of the `NodeStatus` prost enum: `"0"` INIT, `"1"` ALIVE, `"2"` SUSPECTED, `"3"` DOWN, `"4"` LEAVING.
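Clients that consume this response may want to turn the stringified discriminant back into a name. A minimal sketch, using the mapping listed above; the Python `NodeStatus` enum and `decode_status` helper are illustrative and not part of SMG:

```python
from enum import IntEnum


class NodeStatus(IntEnum):
    INIT = 0
    ALIVE = 1
    SUSPECTED = 2
    DOWN = 3
    LEAVING = 4


def decode_status(raw: str) -> NodeStatus:
    """Convert the string emitted in /ha/status (e.g. "1") to a NodeStatus."""
    return NodeStatus(int(raw))


print(decode_status("1").name)  # ALIVE
```

An unknown discriminant raises `ValueError`, which makes a mismatch between client and server versions visible instead of silently mislabeling a node.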
---
## Monitoring
### Mesh Metrics
| Metric | Description |
|--------|-------------|
| `router_mesh_peer_connections` | Number of active peer connections |
| `router_mesh_peer_reconnects_total` | Total number of peer reconnections |
| `router_mesh_batches_total` | Total state update batches sent/received |
| `router_mesh_bytes_total` | Total bytes transmitted in mesh |
| `router_mesh_convergence_ms` | State convergence time across the mesh |
| `router_mesh_snapshot_trigger_total` | Total number of snapshot triggers |
### Alerting Rules
```yaml
groups:
  - name: smg-mesh
    rules:
      # router_mesh_peer_connections is a per-peer gauge (0/1, labeled by "peer").
      # count(router_mesh_peer_connections == 1) gives the number of active peer links.
      # "or vector(0)" keeps the alert firing when no link is active, because count()
      # over an empty result returns no sample.
      # Adjust the threshold to match your expected peer count (e.g. N-1 for an N-node cluster).
      - alert: SMGClusterDegraded
        expr: (count(router_mesh_peer_connections == 1) or vector(0)) < 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "SMG cluster has fewer than expected peer connections"
      - alert: SMGNodeDown
        expr: router_mesh_peer_connections == 0
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "SMG mesh node {{ $labels.peer }} is down"
```
---
## Best Practices
### :material-numeric-3-circle: Odd Node Counts
Use 3, 5, or 7 nodes to avoid split-brain scenarios during network partitions.
### :material-earth: Availability Zones
Distribute nodes across availability zones for resilience against zone failures.
### :material-network: Network Latency
Keep mesh nodes in the same region (< 10 ms RTT) for optimal state sync performance.
### :material-monitor: Monitoring
Monitor `count(router_mesh_peer_connections == 1)` to track active peer links and alert when the count drops below your expected threshold.
---
## Troubleshooting
### Common Issues
| Symptom | Cause | Solution |
|---------|-------|----------|
| Node stuck in INIT | Cannot reach peers | Check firewall rules for mesh port |
| Frequent leader elections | Network instability | Increase gossip timeouts |
| State inconsistency | Clock skew | Synchronize NTP across nodes |
| High sync latency | Large state | Increase sync interval |
### Debug Logging
```bash
RUST_LOG=smg::mesh=debug smg --enable-mesh ...
```
### Verify Cluster Health
```bash
# Check cluster status (-s hides curl's progress output when piping to jq)
curl -s http://node1:8000/ha/status | jq
# Check individual node health
curl -s http://node1:8000/ha/health | jq
# Check worker states
curl -s http://node1:8000/ha/workers | jq
# Check policy states
curl -s http://node1:8000/ha/policies | jq
```
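For scripted checks, the same status endpoint can be polled and summarized from Python. This sketch assumes the `/ha/status` response shape shown earlier on this page; `fetch_status` and `unhealthy_nodes` are hypothetical helper names, and `"1"` is the `ALIVE` discriminant.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_status(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET {base_url}/ha/status and return the parsed JSON body."""
    with urlopen(f"{base_url}/ha/status", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_nodes(status: Dict[str, Any], expected: int) -> List[str]:
    """List problems: nodes that are not ALIVE, plus a note if nodes are missing."""
    problems = [n["name"] for n in status["nodes"] if n["status"] != "1"]
    if status["node_count"] < expected:
        problems.append(f"only {status['node_count']} of {expected} nodes visible")
    return problems


# Example (requires a reachable node):
#   print(unhealthy_nodes(fetch_status("http://node1:8000"), expected=3))
```

An empty list means every expected node is visible and reports `ALIVE`.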
---
## What's Next?
### :material-power: Graceful Shutdown
Allow in-flight requests to complete during shutdown.
[Graceful Shutdown →](../reliability/graceful-shutdown.md)
### :material-electric-switch: Circuit Breakers
Isolate failing workers to prevent cascade failures.
[Circuit Breakers →](../reliability/circuit-breakers.md)
### :material-chart-box: Metrics Reference
Complete list of mesh networking metrics.
[Metrics Reference →](../../reference/metrics.md)