---
title: Metrics Reference
---

# Metrics Reference

Complete reference for Prometheus metrics exposed by SMG. Metrics are organized in six layers matching the request lifecycle.

---

## Metrics Endpoint

Metrics are exposed on the Prometheus port (default: `29000`):

```bash
curl http://localhost:29000/metrics
```

The same listener also serves a WebSocket stream of real-time metric updates at `/ws/metrics` (used by the TUI and dashboards that need live state).

Configure via CLI:

```bash
smg --prometheus-port 29000 --prometheus-host 0.0.0.0
```

---

## Layer 1: HTTP Metrics

Metrics for incoming HTTP requests at the gateway edge.

### `smg_http_requests_total`

Total HTTP requests received by the gateway.

| Type | Labels |
|------|--------|
| Counter | `method`, `path` |

```promql
# Request rate by endpoint
sum by (path) (rate(smg_http_requests_total[5m]))

# Total request rate
sum(rate(smg_http_requests_total[5m]))
```

---

### `smg_http_request_duration_seconds`

HTTP request duration from receipt to response.

| Type | Labels |
|------|--------|
| Histogram | `method`, `path` |

```promql
# P99 latency by endpoint
histogram_quantile(0.99,
  sum by (path, le) (rate(smg_http_request_duration_seconds_bucket[5m])))

# Average latency
rate(smg_http_request_duration_seconds_sum[5m])
  / rate(smg_http_request_duration_seconds_count[5m])
```

---

### `smg_http_responses_total`

HTTP responses by path, status, and error code.

| Type | Labels |
|------|--------|
| Counter | `path`, `status_code`, `error_code` |

```promql
# Error rate (5xx responses)
sum(rate(smg_http_responses_total{status_code=~"5.."}[5m]))
  / sum(rate(smg_http_responses_total[5m]))

# Success rate (HTTP 200 only)
sum(rate(smg_http_responses_total{status_code="200"}[5m]))
  / sum(rate(smg_http_responses_total[5m]))

# Success rate for /v1/responses
sum(rate(smg_http_responses_total{path="/v1/responses",status_code=~"2.."}[5m]))
  / sum(rate(smg_http_responses_total{path="/v1/responses"}[5m]))
```

---

### `smg_http_connections_active`

Currently active HTTP connections.

| Type | Labels |
|------|--------|
| Gauge | None |

---

### `smg_http_inflight_request_age_count`

Distribution of in-flight request ages for Grafana heatmaps.

| Type | Labels |
|------|--------|
| Gauge | `gt`, `le` |

Age buckets (seconds): 30, 60, 180, 300, 600, 1200, 3600, 7200, 14400, 28800, 86400

---

### `smg_http_rate_limit_total`

Rate limiting decisions.

| Type | Labels |
|------|--------|
| Counter | `result` |

Values: `allowed`, `rejected`

```promql
# Rejection rate
sum(rate(smg_http_rate_limit_total{result="rejected"}[5m]))
  / sum(rate(smg_http_rate_limit_total[5m]))
```

---

## Layer 2: Router Metrics

Metrics for request routing and processing.

### `smg_router_requests_total`

Requests processed by the router.

| Type | Labels |
|------|--------|
| Counter | `router_type`, `backend_type`, `connection_mode`, `model`, `endpoint`, `streaming` |

Router types: `openai`, `http`, `grpc`

Backend types: `regular`, `pd`, `external`, `harmony`

Endpoints: `chat`, `generate`, `responses`, `completions`, `rerank`, `embeddings`, `classify`, `messages`, `realtime`, `realtime_sessions`, `realtime_client_secrets`, `realtime_transcription`

Streaming: `true`, `false`

```promql
# Request rate by model
sum by (model) (rate(smg_router_requests_total[5m]))

# Streaming vs non-streaming
sum by (streaming) (rate(smg_router_requests_total[5m]))
```

---

### `smg_router_request_duration_seconds`

Total router request duration.

| Type | Labels |
|------|--------|
| Histogram | `router_type`, `backend_type`, `connection_mode`, `model`, `endpoint` |

---

### `smg_router_request_errors_total`

Router errors by type.

| Type | Labels |
|------|--------|
| Counter | `router_type`, `backend_type`, `connection_mode`, `model`, `endpoint`, `error_type` |

Error types: `no_workers`, `timeout`, `backend_error`, `validation_error`, `internal_error`

```promql
# Error rate by type
sum by (error_type) (rate(smg_router_request_errors_total[5m]))
```

---

### `smg_router_stage_duration_seconds`

Duration of individual pipeline stages (gRPC mode only).

| Type | Labels |
|------|--------|
| Histogram | `router_type`, `stage` |

Stage names are emitted by the gRPC pipeline (e.g., `tokenize`, `route`, `inference`, `detokenize`, `tool_parse`).

```promql
# Tokenization latency
histogram_quantile(0.99,
  rate(smg_router_stage_duration_seconds_bucket{stage="tokenize"}[5m]))
```

---

### `smg_router_ttft_seconds`

Time to first token (gRPC streaming only).

| Type | Labels |
|------|--------|
| Histogram | `router_type`, `backend_type`, `model`, `endpoint` |

```promql
# P50 TTFT by model
histogram_quantile(0.5,
  sum by (model, le) (rate(smg_router_ttft_seconds_bucket[5m])))
```

---

### `smg_router_tpot_seconds`

Time per output token (gRPC streaming only).

| Type | Labels |
|------|--------|
| Histogram | `router_type`, `backend_type`, `model`, `endpoint` |

```promql
# Average TPOT
rate(smg_router_tpot_seconds_sum[5m])
  / rate(smg_router_tpot_seconds_count[5m])
```

---

### `smg_router_tokens_total`

Token counts by type.

| Type | Labels |
|------|--------|
| Counter | `router_type`, `backend_type`, `model`, `endpoint`, `token_type` |

Token types: `input`, `output`

```promql
# Tokens per second
sum by (token_type) (rate(smg_router_tokens_total[5m]))

# Input/output ratio
sum(rate(smg_router_tokens_total{token_type="output"}[5m]))
  / sum(rate(smg_router_tokens_total{token_type="input"}[5m]))
```

---

### `smg_router_generation_duration_seconds`

Total generation time (first token to last token).

| Type | Labels |
|------|--------|
| Histogram | `router_type`, `backend_type`, `model`, `endpoint` |

---

### `smg_router_upstream_responses_total`

HTTP responses from upstream workers.

| Type | Labels |
|------|--------|
| Counter | `router_type`, `status_code`, `error_code` |

---

## Layer 3: Worker Metrics

Metrics for worker pool management and resilience.

### `smg_worker_pool_size`

Number of workers in the pool.

| Type | Labels |
|------|--------|
| Gauge | `worker_type`, `connection_mode`, `model` |

---

### `smg_worker_connections_active`

Active connections per worker pool.

| Type | Labels |
|------|--------|
| Gauge | `worker_type`, `connection_mode` |

---

### `smg_worker_requests_active`

Active requests per worker.

| Type | Labels |
|------|--------|
| Gauge | `worker` |

```promql
# Load distribution across workers (each worker's share of all active requests)
smg_worker_requests_active
  / on() group_left sum(smg_worker_requests_active)
```

---

### `smg_worker_health`

Worker health status.

| Type | Labels | Values |
|------|--------|--------|
| Gauge | `worker` | `1` = healthy, `0` = unhealthy |

```promql
# Count healthy workers
sum(smg_worker_health)

# Alert on unhealthy workers
smg_worker_health == 0
```

---

### `smg_worker_health_checks_total`

Health check results.

| Type | Labels |
|------|--------|
| Counter | `worker_type`, `result` |

Results: `success`, `failure`

---

### `smg_worker_selection_total`

Worker selection events by load balancer.

| Type | Labels |
|------|--------|
| Counter | `worker_type`, `connection_mode`, `model`, `policy` |

---

### `smg_worker_errors_total`

Worker-level errors by type.

| Type | Labels |
|------|--------|
| Counter | `worker_type`, `connection_mode`, `error_type` |

---

### Circuit Breaker Metrics

#### `smg_worker_cb_state`

Circuit breaker state per worker.

| Type | Labels | Values |
|------|--------|--------|
| Gauge | `worker` | `0` = closed, `1` = open, `2` = half-open |

```promql
# Workers with open circuits
count(smg_worker_cb_state == 1)
```

#### `smg_worker_cb_transitions_total`

Circuit breaker state transitions.

| Type | Labels |
|------|--------|
| Counter | `worker`, `from`, `to` |

#### `smg_worker_cb_outcomes_total`

Request outcomes tracked by circuit breaker.

| Type | Labels |
|------|--------|
| Counter | `worker`, `outcome` |

Outcomes: `success`, `failure`

#### `smg_worker_cb_consecutive_failures`

Consecutive failures per worker.

| Type | Labels |
|------|--------|
| Gauge | `worker` |

#### `smg_worker_cb_consecutive_successes`

Consecutive successes per worker.

| Type | Labels |
|------|--------|
| Gauge | `worker` |

---

### Retry Metrics

#### `smg_worker_retries_total`

Retry attempts.

| Type | Labels |
|------|--------|
| Counter | `worker_type`, `endpoint` |

#### `smg_worker_retries_exhausted_total`

Requests that exhausted all retries.

| Type | Labels |
|------|--------|
| Counter | `worker_type`, `endpoint` |

#### `smg_worker_retry_backoff_seconds`

Retry backoff durations by attempt number.

| Type | Labels |
|------|--------|
| Histogram | `attempt` |

---

## Layer 4: Discovery Metrics

Metrics for service discovery.

### `smg_discovery_registrations_total`

Worker registrations.

| Type | Labels |
|------|--------|
| Counter | `source`, `result` |

Sources: `static`, `kubernetes`, `consul`, `manual`

---

### `smg_discovery_deregistrations_total`

Worker deregistrations.

| Type | Labels |
|------|--------|
| Counter | `source`, `reason` |

---

### `smg_discovery_sync_duration_seconds`

Discovery sync duration.

| Type | Labels |
|------|--------|
| Histogram | `source` |

---

### `smg_discovery_workers_discovered`

Workers discovered per source.

| Type | Labels |
|------|--------|
| Gauge | `source` |

---

## Layer 5: MCP Tool Metrics

Metrics for Model Context Protocol tool execution.

### `smg_mcp_tool_calls_total`

MCP tool invocations.

| Type | Labels |
|------|--------|
| Counter | `model`, `tool_name`, `result` |

Results: `success`, `error`

```promql
# Tool success rate
sum(rate(smg_mcp_tool_calls_total{result="success"}[5m]))
  / sum(rate(smg_mcp_tool_calls_total[5m]))

# Most used tools
topk(10, sum by (tool_name) (rate(smg_mcp_tool_calls_total[5m])))
```

---

### `smg_mcp_tool_duration_seconds`

Tool execution duration.

| Type | Labels |
|------|--------|
| Histogram | `model`, `tool_name` |

---

### `smg_mcp_servers_active`

Active MCP servers.

| Type | Labels |
|------|--------|
| Gauge | None |

---

### `smg_mcp_tool_iterations_total`

Tool loop iterations in Responses API.

| Type | Labels |
|------|--------|
| Counter | `model` |

---

## Layer 6: Database Metrics

Metrics for storage operations.

### `smg_db_operations_total`

Database operations.

| Type | Labels |
|------|--------|
| Counter | `storage_type`, `operation`, `result` |

Storage types: `response`, `conversation`, `conversation_item`

Operations: `get`, `put`, `delete`, `list`

---

### `smg_db_operation_duration_seconds`

Database operation duration.

| Type | Labels |
|------|--------|
| Histogram | `storage_type`, `operation` |

---

### `smg_db_connections_active`

Active database connections.

| Type | Labels |
|------|--------|
| Gauge | `storage_type` |

---

### `smg_db_items_stored`

Items stored in database.

| Type | Labels |
|------|--------|
| Counter | `storage_type` |

---

## Cache Routing Metrics

### `smg_manual_policy_cache_entries`

Entries in the cache-aware routing cache.

| Type | Labels |
|------|--------|
| Gauge | None |

---

### `smg_worker_routing_keys_active`

Active routing keys per worker (used by cache-aware policies).

| Type | Labels |
|------|--------|
| Gauge | `worker` |

---

### `smg_manual_policy_branch_total`

Manual policy execution branch counts for routing decisions.

| Type | Labels |
|------|--------|
| Counter | `branch` |

---

### `smg_consistent_hashing_policy_branch_total`

Consistent hashing policy execution branch counts for routing decisions.

| Type | Labels |
|------|--------|
| Counter | `branch` |

---

### `smg_prefix_hash_policy_branch_total`

Prefix hash policy execution branch counts for routing decisions.

| Type | Labels |
|------|--------|
| Counter | `branch` |

---

## Dashboard Queries Summary

| Metric | Query |
|--------|-------|
| Request rate | `sum(rate(smg_http_requests_total[5m]))` |
| Error rate | `sum(rate(smg_http_responses_total{status_code=~"5.."}[5m])) / sum(rate(smg_http_responses_total[5m]))` |
| P99 latency | `histogram_quantile(0.99, sum by (le) (rate(smg_http_request_duration_seconds_bucket[5m])))` |
| TTFT P50 | `histogram_quantile(0.5, sum by (le) (rate(smg_router_ttft_seconds_bucket[5m])))` |
| Tokens/sec | `sum(rate(smg_router_tokens_total[5m]))` |
| Healthy workers | `sum(smg_worker_health)` |
| Open circuits | `count(smg_worker_cb_state == 1)` |
| Rate limit rejections | `sum(rate(smg_http_rate_limit_total{result="rejected"}[5m]))` |
| MCP tool success rate | `sum(rate(smg_mcp_tool_calls_total{result="success"}[5m])) / sum(rate(smg_mcp_tool_calls_total[5m]))` |

---

## Histogram Buckets

The default histogram buckets (29 buckets from 1ms to 7200s) apply to every metric whose name ends with `duration_seconds`:

```
0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5,
5.0, 10.0, 15.0, 30.0, 45.0, 60.0, 90.0, 120.0, 180.0, 240.0,
300.0, 480.0, 900.0, 1200.0, 1800.0, 2700.0, 3600.0, 5400.0, 7200.0
```

Configure custom buckets via CLI:

```bash
smg --prometheus-duration-buckets 0.01 0.1 0.5 1 5 10
```
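
Bucket boundaries determine how precise the `histogram_quantile` results in this reference can be. The Python sketch below is an illustration only, not SMG code: the function names `cumulative_counts` and `estimate_quantile` are hypothetical, and the interpolation is a simplified approximation of what Prometheus does for classic histograms. It uses the default bucket list above to show how a quantile is estimated from cumulative bucket counts.

```python
import math

# Default duration buckets from this page (upper bounds, in seconds).
DEFAULT_BUCKETS = [
    0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5,
    5.0, 10.0, 15.0, 30.0, 45.0, 60.0, 90.0, 120.0, 180.0, 240.0,
    300.0, 480.0, 900.0, 1200.0, 1800.0, 2700.0, 3600.0, 5400.0, 7200.0,
]


def cumulative_counts(observations, bounds=DEFAULT_BUCKETS):
    """Return cumulative counts per upper bound, plus a final +Inf bucket."""
    counts = [sum(1 for v in observations if v <= b) for b in bounds]
    counts.append(len(observations))  # the implicit le="+Inf" bucket
    return counts


def estimate_quantile(q, counts, bounds=DEFAULT_BUCKETS):
    """Estimate quantile q by linear interpolation inside the matching bucket.

    Hypothetical helper: a simplified approximation of histogram_quantile.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, c in enumerate(counts) if c >= rank)
    if idx == len(bounds):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[-1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    prev = counts[idx - 1] if idx > 0 else 0
    in_bucket = counts[idx] - prev
    if in_bucket == 0:
        return lower
    return lower + (bounds[idx] - lower) * (rank - prev) / in_bucket


if __name__ == "__main__":
    # 90 fast requests at 0.2s and 10 slow requests at 3.0s.
    samples = [0.2] * 90 + [3.0] * 10
    counts = cumulative_counts(samples)
    print("P50 estimate:", estimate_quantile(0.5, counts))
    print("P99 estimate:", estimate_quantile(0.99, counts))
```

With these samples the P99 estimate is about 4.75s even though every slow request took 3.0s, because all ten slow observations fall into the 2.5s to 5.0s bucket and the estimate is interpolated within it. Narrower custom buckets around the latencies you care about reduce this error.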