--- title: Monitoring --- # Monitoring Set up Prometheus monitoring, OpenTelemetry tracing, and Grafana dashboards for SMG.

#### Before you begin - Completed the [Getting Started](index.md) guide - Prometheus server (or follow steps below to deploy) - Grafana (optional, for dashboards) - OTLP collector (optional, for distributed tracing)

--- ## Enable Metrics SMG exposes Prometheus metrics on a dedicated port with a 6-layer metric hierarchy. ### Start SMG with metrics ```bash smg \ --worker-urls http://worker:8000 \ --prometheus-port 29000 \ --prometheus-host 0.0.0.0 ``` ### Verify metrics endpoint ```bash curl http://localhost:29000/metrics ``` You should see Prometheus-formatted metrics: ``` # HELP smg_http_requests_total Total HTTP requests # TYPE smg_http_requests_total counter smg_http_requests_total{method="POST",path="/v1/chat/completions"} 1234 ... ``` --- ## OpenTelemetry Tracing SMG supports distributed tracing via OpenTelemetry. ### Enable tracing ```bash smg \ --worker-urls http://worker:8000 \ --enable-trace \ --otlp-traces-endpoint localhost:4317 ``` ### Configuration | Flag | Default | Description | |------|---------|-------------| | `--enable-trace` | `false` | Enable OpenTelemetry tracing | | `--otlp-traces-endpoint` | `localhost:4317` | OTLP gRPC collector endpoint | ### Trace propagation SMG automatically propagates W3C TraceContext headers to workers: - `traceparent` — Trace ID and span ID - `tracestate` — Vendor-specific trace data --- ## Prometheus Configuration ### Basic configuration ```yaml title="prometheus.yml" global: scrape_interval: 15s evaluation_interval: 15s scrape_configs: - job_name: 'smg' static_configs: - targets: ['localhost:29000'] metrics_path: /metrics ``` ### Kubernetes ServiceMonitor For Prometheus Operator: ```yaml title="smg-servicemonitor.yaml" apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: name: smg namespace: inference labels: app: smg spec: selector: matchLabels: app: smg endpoints: - port: metrics interval: 15s path: /metrics namespaceSelector: matchNames: - inference ``` --- ## Key Metrics by Layer ### Layer 1: HTTP Metrics | Metric | Type | Description | |--------|------|-------------| | `smg_http_requests_total` | Counter | Requests by method, path | | `smg_http_request_duration_seconds` | Histogram | Request latency | | `smg_http_responses_total` | Counter | Responses by path, status_code, error_code | | `smg_http_connections_active` | Gauge | Active connections | | `smg_http_rate_limit_total` | Counter | Rate limit decisions | ### Layer 2: Router Metrics | Metric | Type | Description | |--------|------|-------------| | `smg_router_requests_total` | Counter | Requests by router_type, model, endpoint | | `smg_router_ttft_seconds` | Histogram | Time to first token (gRPC) | | `smg_router_tpot_seconds` | Histogram | Time per output token (gRPC) | | `smg_router_tokens_total` | Counter | Tokens by type (input/output) | | `smg_router_stage_duration_seconds` | Histogram | Pipeline stage durations | ### Layer 3: Worker Metrics | Metric | Type | Description | |--------|------|-------------| | `smg_worker_health` | Gauge | Health status (1=healthy, 0=unhealthy) | | `smg_worker_requests_active` | Gauge | Active requests per worker | | `smg_worker_cb_state` | Gauge | Circuit breaker state | | `smg_worker_retries_total` | Counter | Retry attempts | ### Layer 5: MCP Metrics | Metric | Type | Description | |--------|------|-------------| | `smg_mcp_tool_calls_total` | Counter | Tool invocations by tool_name, result | | `smg_mcp_tool_duration_seconds` | Histogram | Tool execution time | | `smg_mcp_servers_active` | Gauge | Active MCP servers | [View all metrics →](../reference/metrics.md) --- ## Grafana Dashboards ### Essential panels **Request Rate** ```promql sum(rate(smg_http_requests_total[5m])) ``` **P99 Latency** ```promql histogram_quantile(0.99, rate(smg_http_request_duration_seconds_bucket[5m])) ``` **Error Rate** ```promql sum(rate(smg_http_responses_total{status_code=~"5.."}[5m])) / sum(rate(smg_http_responses_total[5m])) ``` **`/v1/responses` Success Rate** ```promql sum(rate(smg_http_responses_total{path="/v1/responses",status_code=~"2.."}[5m])) / sum(rate(smg_http_responses_total{path="/v1/responses"}[5m])) ``` **Time to First Token (TTFT)** ```promql histogram_quantile(0.5, rate(smg_router_ttft_seconds_bucket[5m])) ``` **Tokens per Second** ```promql sum(rate(smg_router_tokens_total[5m])) ``` **Worker Health** ```promql sum(smg_worker_health) ``` --- ## Alerting Rules ```yaml title="smg-alerts.yaml" groups: - name: smg rules: - alert: SMGHighErrorRate expr: | sum(rate(smg_http_responses_total{status_code=~"5.."}[5m])) / sum(rate(smg_http_responses_total[5m])) > 0.05 for: 5m labels: severity: critical annotations: summary: "High error rate on SMG" description: "Error rate is {{ $value | humanizePercentage }}" - alert: SMGWorkerUnhealthy expr: smg_worker_health == 0 for: 1m labels: severity: warning annotations: summary: "SMG worker unhealthy" description: "Worker {{ $labels.worker }} is unhealthy" - alert: SMGHighLatency expr: | histogram_quantile(0.99, rate(smg_http_request_duration_seconds_bucket[5m])) > 5 for: 5m labels: severity: warning annotations: summary: "High latency on SMG" description: "P99 latency is {{ $value }}s" - alert: SMGCircuitBreakerOpen expr: smg_worker_cb_state == 1 for: 1m labels: severity: critical annotations: summary: "Circuit breaker open" description: "Circuit breaker for {{ $labels.worker }} is open" - alert: SMGHighTTFT expr: | histogram_quantile(0.95, rate(smg_router_ttft_seconds_bucket[5m])) > 2 for: 5m labels: severity: warning annotations: summary: "High time to first token" description: "P95 TTFT is {{ $value }}s" - alert: SMGRateLimitRejections expr: rate(smg_http_rate_limit_total{result="rejected"}[5m]) > 10 for: 5m labels: severity: warning annotations: summary: "High rate limit rejections" description: "{{ $value }} rejections/sec" ``` --- ## Useful Queries ### Request analysis ```promql # Request rate by endpoint sum by (path) (rate(smg_http_requests_total[5m])) # Success rate sum(rate(smg_http_responses_total{status_code="200"}[5m])) / sum(rate(smg_http_responses_total[5m])) # Latency percentiles histogram_quantile(0.50, rate(smg_http_request_duration_seconds_bucket[5m])) histogram_quantile(0.95, rate(smg_http_request_duration_seconds_bucket[5m])) histogram_quantile(0.99, rate(smg_http_request_duration_seconds_bucket[5m])) ``` ### LLM performance ```promql # Tokens per second by model sum by (model) (rate(smg_router_tokens_total[5m])) # TTFT by model histogram_quantile(0.5, sum by (model, le) (rate(smg_router_ttft_seconds_bucket[5m]))) # Input/output token ratio sum(rate(smg_router_tokens_total{token_type="output"}[5m])) / sum(rate(smg_router_tokens_total{token_type="input"}[5m])) ``` ### Worker analysis ```promql # Load distribution smg_worker_requests_active / ignoring(worker) group_left sum(smg_worker_requests_active) # Unhealthy workers count(smg_worker_health == 0) # Circuit breaker states count by (worker) (smg_worker_cb_state == 1) ``` ### MCP tool analysis ```promql # Tool success rate sum(rate(smg_mcp_tool_calls_total{result="success"}[5m])) / sum(rate(smg_mcp_tool_calls_total[5m])) # Most used tools topk(10, sum by (tool_name) (rate(smg_mcp_tool_calls_total[5m]))) # Slowest tools topk(5, histogram_quantile(0.95, sum by (tool_name, le) (rate(smg_mcp_tool_duration_seconds_bucket[5m])))) ``` --- ## Verification ```bash # Check metrics are being scraped curl -s http://prometheus:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job=="smg")' # Query a metric curl -s 'http://prometheus:9090/api/v1/query?query=smg_http_requests_total' | jq # Check alerts curl -s http://prometheus:9090/api/v1/alerts | jq ``` --- ## Troubleshooting ??? question "Metrics endpoint not responding" 1. Verify SMG is running with `--prometheus-port`: ```bash ps aux | grep smg ``` 2. Check the port is listening: ```bash netstat -tlnp | grep 29000 ``` 3. Check firewall rules allow access ??? question "Traces not appearing" 1. Verify OTLP endpoint is reachable: ```bash curl http://localhost:4317 ``` 2. Check SMG was started with `--enable-trace` 3. Verify collector is receiving spans ??? question "Missing metrics" 1. Ensure the feature generating metrics is enabled 2. Some metrics only appear for specific router types (e.g., TTFT is gRPC-only) 3. Verify metric name spelling in queries --- ## What's Next? - [Configure Logging](logging.md) — Structured log aggregation - [Configure TLS](tls.md) — Secure client-to-gateway traffic - [Metrics Reference](../reference/metrics.md) — Complete metrics documentation