---
name: observability
description: Guidelines for implementing observability in applications.
allowed-tools: Read, Write, Edit
metadata:
  scope: [root]
  auto_invoke: "Working with observability"
---

# Observability and Monitoring Practices

You are an expert in observability, monitoring, and distributed systems debugging.

## Logging Best Practices

- Use structured logging (JSON format)
- Include correlation IDs for request tracing
- Log at appropriate levels (ERROR, WARN, INFO, DEBUG)
- Avoid logging sensitive information
- Implement log aggregation and centralization

## Metrics Implementation

- Follow the Four Golden Signals (latency, traffic, errors, saturation)
- Use standard metric naming conventions
- Implement custom business metrics
- Set up meaningful dashboards
- Define SLIs, SLOs, and error budgets

## Distributed Tracing

- Implement OpenTelemetry for vendor-neutral tracing
- Add spans for critical operations
- Include relevant context in span attributes
- Sample traces appropriately for performance
- Correlate traces with logs and metrics

## Alerting Strategy

- Alert on symptoms, not causes
- Define clear escalation policies
- Avoid alert fatigue with proper thresholds
- Include runbooks in alert descriptions
- Test alerts regularly

## Implementation Examples

### Structured Logging

```javascript
// Good: Structured logging with context
logger.info({
  event: 'user_login',
  userId: user.id,
  correlationId: req.correlationId,
  duration: Date.now() - startTime,
  metadata: {
    ipAddress: req.ip,
    userAgent: req.headers['user-agent']
  }
});
```

### Metrics

```javascript
// Good: Metric with labels
metrics.increment('api_requests_total', {
  method: req.method,
  endpoint: req.route.path,
  status: res.statusCode
});
```

## Performance Monitoring

- Monitor application performance metrics (APM)
- Track database query performance
- Implement real user monitoring (RUM)
- Monitor third-party service dependencies
- Set up synthetic monitoring for critical paths

## Best Practices

- Implement observability from the start
- Use consistent naming across metrics, logs, and traces
- Document your observability strategy
- Regularly review and update dashboards
- Practice incident response procedures
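
The logging guidance above (structured JSON, correlation IDs, standard levels, no sensitive data) can be sketched as a small request-scoped logger. This is a minimal illustration under stated assumptions, not a prescribed implementation: `createRequestLogger`, `redact`, and the `SENSITIVE_KEYS` list are hypothetical names introduced here, and a production service would more likely wrap an established logging library.

```javascript
// Hypothetical list of field names treated as sensitive; adjust to your data model.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'ssn']);

// True only for plain object literals, so Dates and class instances pass through untouched.
const isPlainObject = (value) =>
  value !== null &&
  typeof value === 'object' &&
  Object.getPrototypeOf(value) === Object.prototype;

// Recursively replace the values of sensitive keys with a placeholder.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (!isPlainObject(value)) return value;
  return Object.fromEntries(
    Object.entries(value).map(([key, val]) =>
      SENSITIVE_KEYS.has(key.toLowerCase()) ? [key, '[REDACTED]'] : [key, redact(val)]
    )
  );
}

// Build a logger bound to a single request's correlation ID.
// `write` receives each JSON line; it defaults to stdout via console.log.
function createRequestLogger(correlationId, write = console.log) {
  const emit = (level) => (event, fields = {}) =>
    write(
      JSON.stringify({
        // Caller fields are spread first so they cannot overwrite the core fields below.
        ...redact(fields),
        timestamp: new Date().toISOString(),
        level,
        event,
        correlationId,
      })
    );
  return {
    error: emit('ERROR'),
    warn: emit('WARN'),
    info: emit('INFO'),
    debug: emit('DEBUG'),
  };
}

// Usage: capture output in an array instead of writing to stdout.
const lines = [];
const log = createRequestLogger('req-123', (line) => lines.push(line));
log.info('user_login', {
  userId: 42,
  password: 'hunter2',
  metadata: { token: 'abc', userAgent: 'example-agent' },
});
```

Binding the correlation ID once per request keeps every log line for that request searchable by the same ID, and redacting at the logger boundary means individual call sites do not each have to remember which fields are sensitive.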