---
name: NestJS Observability
description: Structured logging (Pino) and Prometheus metrics.
metadata:
  labels: [nestjs, logging, monitoring, pino]
  triggers:
    files: ['main.ts', '**/*.module.ts']
    keywords: [nestjs-pino, Prometheus, Logger, reqId]
---

# Observability Standards

## **Priority: P1 (OPERATIONAL)**

Logging, monitoring, and observability patterns for production applications.

- **Standard**: Use `nestjs-pino` for high-performance JSON logging.
- **Why**: Node's built-in `console.log` can block the event loop, depending on the output destination, and produces unstructured text.
- **Configuration**:
  - **Redaction**: Mandatory masking of sensitive fields (`password`, `token`, `email`).
  - **Context**: Always inject `Logger` and set the context to the owning class (e.g. `LoginService`).

## Tracing (Correlation)

- **Request ID**: Every log line **must** include a `reqId` (Request ID).
  - `nestjs-pino` handles this automatically using `AsyncLocalStorage`.
- **Propagation**: Pass `x-request-id` to downstream microservices and database queries so that a request can be traced end to end.

## API Overhead & Database Benchmarking

- **Execution Bucket Strategy**: When performance profiling is enabled, use global interceptors combined with `AsyncLocalStorage` to split request latency into logical buckets and expose them.
- **Headers**: Expose the metrics via HTTP headers on the response for immediate feedback during development or testing:
  - `X-Response-Duration-Ms` (Total execution time)
  - `X-DB-Execution-Ms` (Time spent exclusively in database queries, tracked via TypeORM loggers)
  - `X-API-Overhead-Ms` (Time spent in NestJS interceptors, guards, and serialization)
- **Security**: Only enable performance headers and detailed SQL benchmarking in development or when a specific feature flag (`ENABLE_PERFORMANCE_BENCHMARK`) is explicitly active.

## Metrics

- **Exposure**: Use `@willsoto/nestjs-prometheus` to expose `/metrics` for Prometheus scraping.
- **Key Metrics**:
  1. `http_request_duration_seconds` (Histogram)
  2. `db_query_duration_seconds` (Histogram)
  3. `memory_usage_bytes` (Gauge)

## Health Checks

- **Terminus**: Implement explicit, separate logic for "Liveness" (I'm alive) vs "Readiness" (I can take traffic).
- **DB Check**: `TypeOrmHealthIndicator` / `PrismaHealthIndicator`.
- **Memory Check**: Fail if heap usage exceeds 300 MB (prevent crash loops).
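
The liveness/readiness split and the 300 MB heap threshold above can be illustrated with a minimal, framework-free sketch. Everything named here (`heapCheck`, `pingCheck`, `runChecks`, `liveness`, `readiness`) is hypothetical and exists only for illustration. Placing the heap check in both probes and the database ping only in readiness is an assumption, as is the 200/503 status mapping. In a real NestJS application, `@nestjs/terminus` (for example `HealthCheckService` together with `MemoryHealthIndicator.checkHeap` and the database indicators listed above) would likely take over this role.

```typescript
// Framework-free sketch of liveness vs. readiness. All names are hypothetical.
type CheckResult = { name: string; ok: boolean; detail?: string };
type Check = () => CheckResult | Promise<CheckResult>;

// 300 MB heap threshold taken from the Memory Check rule.
const HEAP_LIMIT_BYTES = 300 * 1024 * 1024;

// Memory check: fails once the used heap exceeds the limit.
// The heap reader is injectable so the check can be exercised deterministically.
function heapCheck(
  limitBytes: number = HEAP_LIMIT_BYTES,
  heapUsed: () => number = () => process.memoryUsage().heapUsed,
): () => CheckResult {
  return () => {
    const used = heapUsed();
    return { name: 'memory_heap', ok: used <= limitBytes, detail: `${used} bytes used` };
  };
}

// Dependency check: wraps any async ping (e.g. a `SELECT 1`) and turns a
// thrown error into a failed result instead of an unhandled rejection.
function pingCheck(name: string, ping: () => Promise<void>): Check {
  return async () => {
    try {
      await ping();
      return { name, ok: true };
    } catch (err) {
      return { name, ok: false, detail: err instanceof Error ? err.message : String(err) };
    }
  };
}

// Runs every check and maps the outcome to an HTTP status:
// 200 when all checks pass, 503 as soon as one fails.
async function runChecks(
  checks: Check[],
): Promise<{ status: 200 | 503; results: CheckResult[] }> {
  const results = await Promise.all(checks.map((check) => check()));
  return { status: results.every((r) => r.ok) ? 200 : 503, results };
}

// Liveness ("I'm alive"): process-local checks only (assumption).
const liveness = () => runChecks([heapCheck()]);

// Readiness ("I can take traffic"): process-local checks plus dependencies.
const readiness = (dbPing: () => Promise<void>) =>
  runChecks([heapCheck(), pingCheck('database', dbPing)]);
```

A controller could expose `liveness()` and `readiness(dbPing)` on two separate routes and return the computed `status`. Keeping dependency checks out of the liveness probe means a database outage takes the instance out of rotation without causing the orchestrator to restart it.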