--- title: Retries --- # Retries SMG implements automatic retries with exponential backoff to handle transient failures gracefully without overwhelming recovering services. --- ## Overview

### :material-refresh: Automatic Retries Transparently retry failed requests to different workers without client intervention.

### :material-chart-timeline: Exponential Backoff Space out retry attempts with increasing delays to give services time to recover.

### :material-shuffle-variant: Jitter Add randomness to backoff timing to prevent thundering herd problems.

### :material-filter: Smart Selection Only retry on transient error codes that are likely to succeed on retry.

--- ## Why Retries? Transient failures are common in distributed systems: - **Network timeouts**: Temporary network congestion or packet loss - **Worker overload**: Temporary capacity limits (429 responses) - **Intermittent errors**: Brief service interruptions during deployments - **Connection issues**: Worker restart or network partition Without retries, every transient failure becomes a client-visible error. With retries, SMG handles these automatically. --- ## Exponential Backoff with Jitter SMG uses exponential backoff with jitter to space out retry attempts: ``` delay = initial_backoff_ms * (backoff_multiplier ^ attempt) delay = min(delay, max_backoff_ms) delay = delay * (1 + random(-jitter_factor, +jitter_factor)) ``` ### Example Progression With default settings (no jitter for clarity): | Attempt | Calculated Delay | |---------|------------------| | 1 | 50ms | | 2 | 75ms | | 3 | 112ms | | 4 | 168ms | | 5 | 253ms | !!! note "Zero-based indexing" The `attempt` variable uses 0-based indexing internally. Attempt 1 in the table corresponds to `attempt=0` in the calculation. ### Why Jitter? Without jitter, if multiple requests fail simultaneously, they all retry at exactly the same time—potentially overwhelming the recovering service. Jitter spreads out retries randomly to prevent this "thundering herd" problem. --- ## Retryable Status Codes SMG automatically retries requests that fail with these status codes: | Code | Meaning | Why Retryable | |------|---------|---------------| | `408` | Request Timeout | Temporary network issue | | `429` | Too Many Requests | Worker temporarily overloaded | | `500` | Internal Server Error | Transient server issue | | `502` | Bad Gateway | Upstream temporarily unavailable | | `503` | Service Unavailable | Service temporarily down | | `504` | Gateway Timeout | Upstream timeout | Requests with other status codes (e.g., 400 Bad Request, 401 Unauthorized) are **not retried** because they would likely fail again. --- ## Configuration ```bash smg \ --worker-urls http://w1:8000 http://w2:8000 \ --retry-max-retries 5 \ --retry-initial-backoff-ms 50 \ --retry-max-backoff-ms 30000 \ --retry-backoff-multiplier 1.5 \ --retry-jitter-factor 0.2 ``` ### Parameters | Parameter | Default | Description | |-----------|---------|-------------| | `--retry-max-retries` | `5` | Maximum number of retry attempts | | `--retry-initial-backoff-ms` | `50` | Initial delay before first retry (milliseconds) | | `--retry-max-backoff-ms` | `30000` | Maximum backoff delay (milliseconds) | | `--retry-backoff-multiplier` | `1.5` | Multiplier applied to delay after each retry | | `--retry-jitter-factor` | `0.2` | Random jitter factor (0.0-1.0) to prevent thundering herd | | `--disable-retries` | `false` | Disable automatic retries entirely | --- ## Recommended Configurations

### :material-lightning-bolt: Latency-Sensitive Minimal retries for interactive applications. ```bash smg \ --retry-max-retries 2 \ --retry-initial-backoff-ms 10 \ --retry-max-backoff-ms 100 ``` **Use when**: Real-time chat, interactive UIs

### :material-server-network: High-Availability Balanced retries for production workloads. ```bash smg \ --retry-max-retries 3 \ --retry-initial-backoff-ms 100 \ --retry-backoff-multiplier 2.0 ``` **Use when**: Production APIs, multi-worker deployments

### :material-cog: Batch Processing Aggressive retries for offline workloads. ```bash smg \ --retry-max-retries 10 \ --retry-initial-backoff-ms 100 \ --retry-max-backoff-ms 60000 \ --retry-backoff-multiplier 2.0 ``` **Use when**: Batch inference, non-interactive pipelines

### :material-close-circle: No Retries Disable retries entirely. ```bash smg --disable-retries ``` **Use when**: Client handles retries, testing failure scenarios

--- ## Interaction with Circuit Breakers Retries and circuit breakers work together: | Circuit State | Retry Behavior | |---------------|----------------| | **Closed** | Normal retries to the worker | | **Open** | Worker skipped; retry goes to different worker | | **Half-Open** | Limited test requests; failures don't count against retry budget | When a circuit is **open**: - Requests are rejected immediately (no retry to that worker) - If other healthy workers exist, the retry goes to them - If all circuits are open, the request fails --- ## Monitoring ### Metrics | Metric | Description | |--------|-------------| | `smg_worker_retries_total` | Total retry attempts by worker type and endpoint | | `smg_worker_retries_exhausted_total` | Requests that exhausted all retries by worker type and endpoint | | `smg_worker_retry_backoff_seconds` | Histogram of backoff delays | ### Useful PromQL Queries

#### Retry Rate ```promql # Retries per second rate(smg_worker_retries_total[5m]) # Retries exhausted per second rate(smg_worker_retries_exhausted_total[5m]) ```

#### Backoff Distribution ```promql # Average backoff delay rate(smg_worker_retry_backoff_seconds_sum[5m]) / rate(smg_worker_retry_backoff_seconds_count[5m]) # 99th percentile backoff histogram_quantile(0.99, smg_worker_retry_backoff_seconds_bucket) ```

### Alert Thresholds | Metric | Warning | Critical | Action | |--------|---------|----------|--------| | Retry rate | >10/sec | >50/sec | Investigate worker health | | Retry success rate | <80% | <50% | Check for persistent failures | | Avg backoff | >5s | >15s | Workers may be overloaded | --- ## Tuning Guidelines | Symptom | Potential Adjustment | |---------|---------------------| | Excessive latency from retries | Reduce `--retry-max-retries`, decrease backoff times | | Thundering herd on recovery | Increase `--retry-jitter-factor` | | Retries exhausted too quickly | Increase `--retry-max-retries`, `--retry-max-backoff-ms` | | Clients seeing too many errors | Increase retry count, check worker health | --- ## What's Next?

### :material-electric-switch: Circuit Breakers Isolate failing workers to prevent cascade failures. [Circuit Breakers →](circuit-breakers.md)

### :material-heart-pulse: Health Checks Proactive worker monitoring and failure detection. [Health Checks →](health-checks.md)

### :material-traffic-light: Rate Limiting Protect workers from overload with token bucket rate limiting. [Rate Limiting →](rate-limiting.md)