---
name: setup-prometheus-monitoring
description: >
  Configure Prometheus for time-series metrics collection, including scrape
  configurations, service discovery, recording rules, and federation patterns
  for multi-cluster deployments. Use when setting up centralized metrics
  collection for microservices, implementing time-series monitoring for
  application and infrastructure, establishing a foundation for SLO/SLI
  tracking and alerting, or migrating from legacy monitoring solutions to a
  modern observability stack.
license: MIT
allowed-tools: Read Write Edit Bash Grep Glob
metadata:
  author: Philipp Thoss
  version: "1.0"
  domain: observability
  complexity: intermediate
  language: multi
  tags: prometheus, monitoring, metrics, scrape, recording-rules
---

# Setup Prometheus Monitoring

Configure a production-ready Prometheus deployment with scrape targets, recording rules, and federation.

## When to Use

- Setting up centralized metrics collection for microservices or distributed systems
- Implementing time-series monitoring for application and infrastructure metrics
- Establishing a foundation for SLO/SLI tracking and alerting
- Consolidating metrics from multiple Prometheus instances via federation
- Migrating from legacy monitoring solutions to a modern observability stack

## Inputs

- **Required**: List of scrape targets (services, exporters, endpoints)
- **Required**: Retention period and storage requirements
- **Optional**: Existing service discovery mechanism (Kubernetes, Consul, EC2)
- **Optional**: Recording rules for pre-aggregated metrics
- **Optional**: Federation hierarchy for multi-cluster setups

## Procedure

### Step 1: Install and Configure Prometheus

Create the base Prometheus configuration with global settings and scrape intervals.

```bash
# Create Prometheus directory structure
mkdir -p /etc/prometheus/{rules,file_sd}
mkdir -p /var/lib/prometheus

# Download Prometheus (adjust version as needed)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvf prometheus-2.48.0.linux-amd64.tar.gz
sudo cp prometheus-2.48.0.linux-amd64/{prometheus,promtool} /usr/local/bin/
```

Create `/etc/prometheus/prometheus.yml`:

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    cluster: 'production'
    region: 'us-east-1'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

# Load recording and alerting rules
rule_files:
  - "rules/*.yml"

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          env: 'production'

  # Node exporter for host metrics
  - job_name: 'node'
    static_configs:
      - targets:
          - 'node1:9100'
          - 'node2:9100'
        labels:
          env: 'production'

  # Application metrics with file-based service discovery
  - job_name: 'app-services'
    file_sd_configs:
      - files:
          - '/etc/prometheus/file_sd/services.json'
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [env]
        target_label: environment
```

**Expected:** Prometheus starts successfully, web UI accessible at `http://localhost:9090`, targets listed under Status > Targets.
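As a quick sanity check before moving on, you can hit the health and targets endpoints from the command line (a minimal sketch assuming the default listen address and that `jq` is installed):

```bash
# Liveness and readiness endpoints return 200 once Prometheus is serving
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9090/-/healthy
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9090/-/ready

# Summarize discovered targets and their health via the HTTP API
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
```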
**On failure:**

- Check syntax with `promtool check config /etc/prometheus/prometheus.yml`
- Verify file permissions: `sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus`
- Check logs: `journalctl -u prometheus -f`

### Step 2: Configure Service Discovery

Set up dynamic target discovery to avoid manual target management.

For **Kubernetes** environments, add to `scrape_configs`:

```yaml
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Only scrape pods with the prometheus.io/scrape annotation
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Use custom port if specified (joins the address and the port annotation)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    # Add namespace as label
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    # Add pod name as label
    - source_labels: [__meta_kubernetes_pod_name]
      target_label: kubernetes_pod_name
```

For **file-based** service discovery, create `/etc/prometheus/file_sd/services.json`:

```json
[
  {
    "targets": ["web-app-1:8080", "web-app-2:8080"],
    "labels": {
      "job": "web-app",
      "env": "production",
      "team": "platform"
    }
  },
  {
    "targets": ["api-service-1:9090", "api-service-2:9090"],
    "labels": {
      "job": "api-service",
      "env": "production",
      "team": "backend"
    }
  }
]
```

For **Consul** service discovery:

```yaml
- job_name: 'consul-services'
  consul_sd_configs:
    - server: 'consul.example.com:8500'
      services: []  # Empty list means discover all services
  relabel_configs:
    - source_labels: [__meta_consul_service]
      target_label: job
    - source_labels: [__meta_consul_tags]
      regex: '.*,monitoring,.*'
      action: keep
```

**Expected:** Dynamic targets appear in the Prometheus UI and are updated automatically when services scale or change.

**On failure:**

- Kubernetes: Verify RBAC permissions with `kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus`
- File SD: Validate JSON syntax with `python -m json.tool /etc/prometheus/file_sd/services.json`
- Consul: Test connectivity with `curl http://consul.example.com:8500/v1/catalog/services`

### Step 3: Create Recording Rules

Pre-aggregate expensive queries for dashboard performance and alerting efficiency.
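To illustrate the payoff, here is the same request-rate query issued twice: once as the raw expression every dashboard panel would otherwise evaluate, and once against the pre-computed series recorded by the rule file created below. A minimal sketch against the Prometheus HTTP API; the metric names match those defined in the next file:

```bash
# Raw aggregation: recomputed from all http_requests_total series on every query
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=sum by (job, endpoint, method) (rate(http_requests_total[5m]))'

# Recording rule: a cheap lookup of the series Prometheus already materialized
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=job:http_requests:rate5m'
```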
Create `/etc/prometheus/rules/recording_rules.yml`:

```yaml
groups:
  - name: api_aggregations
    interval: 30s
    rules:
      # Calculate request rate per endpoint (5m window)
      - record: job:http_requests:rate5m
        expr: |
          sum by (job, endpoint, method) (
            rate(http_requests_total[5m])
          )

      # Calculate error rate percentage
      - record: job:http_errors:rate5m
        expr: |
          sum by (job) (
            rate(http_requests_total{status=~"5.."}[5m])
          )
          /
          sum by (job) (
            rate(http_requests_total[5m])
          ) * 100

      # P95 latency by endpoint
      - record: job:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (job, endpoint, le) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )

  - name: resource_aggregations
    interval: 1m
    rules:
      # CPU usage by instance
      - record: instance:cpu_usage:ratio
        expr: |
          1 - avg by (instance) (
            rate(node_cpu_seconds_total{mode="idle"}[5m])
          )

      # Memory usage percentage
      - record: instance:memory_usage:ratio
        expr: |
          1 - (
            node_memory_MemAvailable_bytes
            /
            node_memory_MemTotal_bytes
          )

      # Disk usage by mount point
      - record: instance:disk_usage:ratio
        expr: |
          1 - (
            node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.*"}
            /
            node_filesystem_size_bytes{fstype!~"tmpfs|fuse.*"}
          )
```

Validate and reload:

```bash
# Validate rules syntax
promtool check rules /etc/prometheus/rules/recording_rules.yml

# Reload Prometheus configuration (without restart)
curl -X POST http://localhost:9090/-/reload

# Or send SIGHUP signal
sudo killall -HUP prometheus
```

**Expected:** Recording rules evaluate successfully, new series are visible in Prometheus under the `job:` and `instance:` prefixes, and dashboard query performance improves.

**On failure:**

- Check rule syntax with `promtool check rules`
- Verify the evaluation interval matches data availability
- Check for missing source metrics: `curl http://localhost:9090/api/v1/targets`
- Review logs for evaluation errors: `journalctl -u prometheus | grep -i error`

### Step 4: Configure Storage and Retention

Optimize storage for retention requirements and query performance.

Edit `/etc/systemd/system/prometheus.service`:

```ini
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --storage.tsdb.retention.time=30d \
  --storage.tsdb.retention.size=50GB \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=:9090 \
  --web.enable-lifecycle \
  --web.enable-admin-api
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target
```

Key storage flags:

- `--storage.tsdb.retention.time=30d`: Keep 30 days of data
- `--storage.tsdb.retention.size=50GB`: Limit storage to 50GB (whichever limit is hit first applies)
- `--storage.tsdb.wal-compression`: Enable WAL compression (reduces disk I/O)
- `--web.enable-lifecycle`: Allow config reload via HTTP POST
- `--web.enable-admin-api`: Enable snapshot and delete APIs

Enable and start:

```bash
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
```

**Expected:** Prometheus retains metrics according to policy, disk usage stays within limits, old data is automatically pruned.
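To gauge whether 30 days will actually fit in the allocated space, you can extrapolate from the current ingestion rate. This is a rough sketch only; the ~2 bytes per sample figure is a common rule of thumb and real compression varies with series churn and cardinality:

```bash
# Samples currently being appended per second (averaged over the last hour)
RATE=$(curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(prometheus_tsdb_head_samples_appended_total[1h])' \
  | jq -r '.data.result[0].value[1]')

# Approximate on-disk size for 30 days at ~2 bytes per sample
echo "$RATE" | awk '{printf "~%.1f GB for 30d retention\n", $1 * 2 * 86400 * 30 / 1e9}'
```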
**On failure:**

- Monitor disk usage: `du -sh /var/lib/prometheus`
- Check TSDB stats: `curl http://localhost:9090/api/v1/status/tsdb`
- Verify retention settings: `curl http://localhost:9090/api/v1/status/runtimeinfo | jq .data.storageRetention`
- Force cleanup (deletes **all** series; last resort only): `curl -g -X POST 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~".+"}'`

### Step 5: Set Up Federation (Multi-Cluster)

Configure hierarchical Prometheus instances to aggregate metrics across clusters.

On **edge Prometheus** instances (one per cluster), ensure external labels are set:

```yaml
global:
  external_labels:
    cluster: 'production-east'
    datacenter: 'us-east-1'
```

On the **central Prometheus** instance, add a federation scrape config:

```yaml
scrape_configs:
  - job_name: 'federate-production'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        # Aggregate only pre-computed recording rules
        - '{__name__=~"job:.*"}'
        # Include alert states
        - '{__name__=~"ALERTS.*"}'
        # Include critical infrastructure metrics
        - 'up{job=~".*"}'
    static_configs:
      - targets:
          - 'prometheus-east.example.com:9090'
          - 'prometheus-west.example.com:9090'
        labels:
          env: 'production'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        regex: 'prometheus-(.*).example.com.*'
        target_label: cluster
        replacement: '$1'
```

Federation best practices:

- Use `honor_labels: true` to preserve the original labels
- Federate only recording rules and aggregates (not raw metrics)
- Set appropriate scrape intervals (longer than the edge Prometheus evaluation interval)
- Use `match[]` to filter metrics (avoid federating everything)

**Expected:** Central Prometheus shows federated metrics from all clusters, queries can span multiple regions, minimal data duplication.

**On failure:**

- Verify federation endpoint accessibility: `curl -g 'http://prometheus-east.example.com:9090/federate?match[]={__name__=~"job:.*"}' | head -20`
- Check for label conflicts (central vs edge external labels)
- Monitor federation lag: compare timestamp differences
- Review match patterns: `curl http://localhost:9090/api/v1/label/__name__/values | jq .data | grep "job:"`

### Step 6: Implement High Availability (Optional)

Deploy redundant Prometheus instances with identical scrape configurations for failover.

Use **Thanos** or **Cortex** for true HA, or a simple load-balanced setup:

```yaml
# prometheus-1.yml and prometheus-2.yml (identical apart from the labels below)
global:
  scrape_interval: 15s
  external_labels:
    prometheus: 'prometheus-1'  # Different per instance
    replica: 'A'

# Use the --web.external-url flag for each instance
# prometheus-1: --web.external-url=http://prometheus-1.example.com:9090
# prometheus-2: --web.external-url=http://prometheus-2.example.com:9090
```

Configure Grafana to query through the load-balanced endpoint:

```json
{
  "name": "Prometheus-HA",
  "type": "prometheus",
  "url": "http://prometheus-lb.example.com",
  "jsonData": {
    "httpMethod": "POST",
    "timeInterval": "15s"
  }
}
```

Use HAProxy or nginx for load balancing:

```nginx
upstream prometheus_backend {
    server prometheus-1.example.com:9090 max_fails=3 fail_timeout=30s;
    server prometheus-2.example.com:9090 max_fails=3 fail_timeout=30s;
}

server {
    listen 9090;

    location / {
        proxy_pass http://prometheus_backend;
        proxy_set_header Host $host;
    }
}
```

**Expected:** Query requests are balanced across instances, failover is automatic if one instance goes down, and no data is lost during a single-instance failure.
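Before relying on failover, it helps to confirm the two replicas see the same targets. A small sketch using bash process substitution and the hostnames assumed above; empty `diff` output means the target lists match:

```bash
# Compare the active target lists of both replicas; any output indicates drift
diff \
  <(curl -s http://prometheus-1.example.com:9090/api/v1/targets | jq '[.data.activeTargets[].scrapeUrl] | sort') \
  <(curl -s http://prometheus-2.example.com:9090/api/v1/targets | jq '[.data.activeTargets[].scrapeUrl] | sort')
```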
**On failure:**

- Verify both instances are scraping the same targets (slight time skew is acceptable)
- Check for configuration drift between instances
- Monitor deduplication in queries (Grafana shows duplicate series)
- Review load balancer health checks

## Validation

- [ ] Prometheus web UI accessible at the expected endpoint
- [ ] All configured scrape targets showing as UP in Status > Targets
- [ ] Service discovery dynamically adding/removing targets as expected
- [ ] Recording rules evaluating successfully (no errors in logs)
- [ ] Metrics retention matching configured time/size limits
- [ ] Federation (if configured) pulling metrics from edge instances
- [ ] Queries returning expected metric cardinality (not excessive)
- [ ] Disk usage stable and within the allocated storage budget
- [ ] Configuration reload working via HTTP endpoint or SIGHUP
- [ ] Prometheus self-monitoring metrics available (up, scrape duration, etc.)

## Common Pitfalls

- **High cardinality metrics**: Avoid labels with unbounded values (user IDs, timestamps, UUIDs). Drop them with relabeling at scrape time and use recording rules to pre-aggregate what dashboards query.
- **Scrape interval mismatch**: Recording rules should evaluate at intervals equal to or greater than the scrape interval to avoid gaps.
- **Federation overload**: Federating all metrics creates massive data duplication. Only federate aggregated recording rules.
- **Missing relabel configs**: Without proper relabeling, service discovery can create confusing or duplicate labels.
- **Retention too short**: Set retention longer than your longest dashboard time window to avoid "no data" gaps.
- **No resource limits**: Prometheus can consume excessive memory with high cardinality. Set per-job `sample_limit`s, cap query cost with `--query.max-samples`, and monitor heap usage.
- **Disabled lifecycle endpoint**: Without `--web.enable-lifecycle`, config reloads require full restarts, causing scrape gaps.

## Related Skills

- `configure-alerting-rules` - Define alerting rules based on Prometheus metrics and route to Alertmanager
- `build-grafana-dashboards` - Visualize Prometheus metrics with Grafana dashboards and panels
- `define-slo-sli-sla` - Establish SLO/SLI targets using Prometheus recording rules and error budget tracking
- `instrument-distributed-tracing` - Complement metrics with distributed tracing for deeper observability