---
name: grafana-dashboard
description: Create professional Grafana dashboards with visualizations, templating, and alerts. Use when building monitoring dashboards, creating data visualizations, or setting up operational insights.
---

# Grafana Dashboard

## Overview

Design and implement Grafana dashboards that combine multiple visualization types, template variables, and drill-down links for operational monitoring.

## When to Use

- Creating monitoring dashboards
- Building operational insights
- Visualizing time-series data
- Creating drill-down dashboards
- Sharing metrics with stakeholders

## Instructions

### 1. **Grafana Dashboard JSON**

The outer `dashboard` key is the payload shape used by `POST /api/dashboards/db`. The panels use the legacy `graph` type; on Grafana 8 and later, `timeseries` is its replacement.

```json
{
  "dashboard": {
    "title": "Application Performance",
    "description": "Real-time application metrics",
    "tags": ["production", "performance"],
    "timezone": "utc",
    "refresh": "30s",
    "templating": {
      "list": [
        {
          "name": "datasource",
          "type": "datasource",
          "query": "prometheus"
        },
        {
          "name": "service",
          "type": "query",
          "datasource": "prometheus",
          "query": "label_values(requests_total, service)"
        }
      ]
    },
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum by (method) (rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "{{ method }}"
          }
        ],
        "yaxes": [
          {
            "format": "reqps",
            "label": "Requests per Second"
          }
        ]
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "graph",
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(rate(requests_total{status_code=~\"5..\",service=\"$service\"}[5m])) / sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "Error Rate"
          }
        ]
      },
      {
        "id": 3,
        "title": "Response Latency (p95)",
        "type": "graph",
        "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket{service=\"$service\"}[5m])))",
            "legendFormat": "p95"
          }
        ]
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "stat",
        "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "expr": "sum(active_connections{service=\"$service\"})"
          }
        ]
      }
    ]
  }
}
```

### 2. **Grafana Provisioning Configuration**

```yaml
# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1

providers:
  - name: 'Dashboards'
    orgId: 1
    folder: 'Production'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      path: /var/lib/grafana/dashboards
```

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    uid: prometheus   # fixed uid so alert rules can reference this datasource
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: '30s'
```

### 3. **Grafana Alert Configuration**

The rule below fires when the share of 5xx responses stays above 5% for five minutes. Query `A` computes the error ratio, expression `B` reduces it to a single value, and expression `C` applies the threshold.

```yaml
# /etc/grafana/provisioning/alerting/alerts.yaml
apiVersion: 1

groups:
  - orgId: 1
    name: application_alerts
    folder: Production
    interval: 1m
    rules:
      - uid: alert_high_error_rate
        title: High Error Rate
        condition: C
        data:
          # A: fraction of requests answered with a 5xx status
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus   # uid set in prometheus.yaml
            model:
              refId: A
              expr: 'sum(rate(requests_total{status_code=~"5.."}[5m])) / sum(rate(requests_total[5m]))'
          # B: reduce the series to its last value
          - refId: B
            datasourceUid: __expr__
            model:
              refId: B
              type: reduce
              expression: A
              reducer: last
          # C: alert when the reduced value exceeds 0.05
          - refId: C
            datasourceUid: __expr__
            model:
              refId: C
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    type: gt
                    params: [0.05]
        for: 5m
        annotations:
          description: 'Error rate is {{ $values.B.Value }}'
        labels:
          severity: critical
          team: platform
```

### 4. **Grafana API Client**

```javascript
// grafana-api-client.js
const axios = require('axios');

class GrafanaClient {
  // apiKey: a service account token (or legacy API key), sent as a Bearer token
  constructor(baseUrl, apiKey) {
    this.baseUrl = baseUrl;
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      }
    });
  }

  // dashboard: the dashboard model itself (the object under the
  // "dashboard" key in section 1); it is wrapped here for the API.
  async createDashboard(dashboard) {
    const response = await this.client.post('/api/dashboards/db', {
      dashboard: dashboard,
      overwrite: true
    });
    return response.data;
  }

  async getDashboard(uid) {
    const response = await this.client.get(`/api/dashboards/uid/${uid}`);
    return response.data;
  }

  // Creates a Grafana-managed alert rule through the alerting provisioning API.
  async createAlert(alert) {
    const response = await this.client.post('/api/v1/provisioning/alert-rules', alert);
    return response.data;
  }

  // type=dash-db limits the search results to dashboards (no folders).
  async listDashboards() {
    const response = await this.client.get('/api/search', {
      params: { type: 'dash-db' }
    });
    return response.data;
  }
}

module.exports = GrafanaClient;
```

### 5. **Docker Compose Setup**

```yaml
version: '3.8'

services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Fail fast instead of falling back to a default password
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:?Set GRAFANA_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_SERVER_ROOT_URL: http://grafana.example.com
    volumes:
      - ./provisioning:/etc/grafana/provisioning
      # Dashboard JSON files read by the file provider in dashboards.yaml
      - ./dashboards:/var/lib/grafana/dashboards
      - grafana_storage:/var/lib/grafana
    depends_on:
      - prometheus

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_storage:/prometheus

volumes:
  grafana_storage:
  prometheus_storage:
```

## Best Practices

### ✅ DO

- Use meaningful dashboard titles
- Add documentation panels
- Organize panels into rows
- Use variables for flexibility
- Set appropriate refresh intervals
- Include runbook links in alerts
- Test alerts before deploying
- Use consistent color schemes
- Version control dashboard JSON

### ❌ DON'T

- Overload dashboards with too many panels
- Mix different time ranges without justification
- Create alerts without runbooks
- Ignore alert noise
- Use inconsistent metric naming
- Set refresh intervals too short
- Forget to configure datasources
- Leave default passwords

## Visualization Types

- **Graph**: Time-series trends
- **Stat**: Single value with thresholds
- **Gauge**: Percentage or usage
- **Heatmap**: Pattern detection
- **Bar Chart**: Category comparison
- **Pie Chart**: Composition
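
Panel definitions for these visualization types can be generated rather than written by hand. The following is a minimal sketch, assuming Grafana's 24-column dashboard grid and the panel type ids `timeseries`, `stat`, `gauge`, `heatmap`, `barchart`, and `piechart` used by recent Grafana releases. `buildPanels`, its option names, and the shape of its input objects are hypothetical helpers for illustration, not part of any Grafana library.

```javascript
// Hypothetical helper: builds panel JSON and lays panels out left to right.
// Assumption: panel type ids as used by recent Grafana releases.
const KNOWN_TYPES = new Set([
  'timeseries', // time-series trends (successor of the legacy "graph" panel)
  'stat',       // single value with thresholds
  'gauge',      // percentage or usage
  'heatmap',    // pattern detection
  'barchart',   // category comparison
  'piechart',   // composition
]);

const GRID_COLUMNS = 24; // Grafana dashboards use a 24-column grid

function buildPanels(specs, { width = 12, height = 8 } = {}) {
  const perRow = Math.floor(GRID_COLUMNS / width);
  return specs.map((spec, index) => {
    if (!KNOWN_TYPES.has(spec.type)) {
      throw new Error(`Unknown panel type: ${spec.type}`);
    }
    return {
      id: index + 1,
      title: spec.title,
      type: spec.type,
      gridPos: {
        x: (index % perRow) * width,            // column offset within the row
        y: Math.floor(index / perRow) * height, // row offset
        w: width,
        h: height,
      },
      targets: [{ refId: 'A', expr: spec.expr }],
    };
  });
}

// Example input: four panels in a two-column layout.
const panels = buildPanels([
  { title: 'Request Rate', type: 'timeseries',
    expr: 'sum(rate(requests_total{service="$service"}[5m]))' },
  { title: 'Error Rate', type: 'timeseries',
    expr: 'sum(rate(requests_total{status_code=~"5..",service="$service"}[5m]))' },
  { title: 'Response Latency (p95)', type: 'timeseries',
    expr: 'histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket{service="$service"}[5m])))' },
  { title: 'Active Connections', type: 'stat',
    expr: 'sum(active_connections{service="$service"})' },
]);

console.log(JSON.stringify(panels, null, 2));
```

The resulting array can be assigned to the `panels` field of a dashboard model before it is saved or committed to version control. Computing `gridPos` from the panel index keeps the layout consistent when panels are added or reordered.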