--- name: performance-profiling description: Application performance profiling and bottleneck identification — Node.js profiling, Chrome DevTools, flame graphs, memory leak detection, CPU profiling, React rendering performance. Activate on "profiling", "performance bottleneck", "flame graph", "memory leak", "slow app", "CPU profiling", "heap snapshot", "React re-renders", "EXPLAIN ANALYZE", "event loop lag", "clinic.js", "Core Web Vitals". NOT for infrastructure monitoring or observability (use logging-observability), load testing (use a load-testing skill), or database schema optimization. allowed-tools: Read,Write,Edit,Bash,Grep,Glob metadata: category: DevOps & Site Reliability tags: - performance - profiling - performance-bottleneck - flame-graph pairs-with: - skill: react-performance-optimizer reason: React DevTools profiling identifies component re-render bottlenecks for optimization - skill: caching-strategies reason: Profiling reveals cache miss patterns that inform caching architecture decisions - skill: postgresql-optimization reason: Database query profiling with EXPLAIN ANALYZE identifies slow query bottlenecks - skill: logging-observability reason: Performance metrics and traces collected by observability systems feed profiling analysis --- # Performance Profiling Find where your application actually spends time before touching a line of code. Covers the full stack: Node.js CPU and memory profiling, browser flame graphs, React render profiling, and database query analysis. The discipline here is profile first, optimize second — premature optimization is not a workflow, it is a guess. ## When to Use **Use for**: - Diagnosing slow Node.js applications (CPU-bound, I/O-bound, memory pressure) - Generating and reading flame graphs to find hot code paths - Detecting memory leaks via heap snapshots and growth trends - Profiling React component render performance with React Profiler - Measuring browser rendering performance (Core Web Vitals, layout thrashing, long tasks) - Database query profiling with EXPLAIN ANALYZE - Measuring event loop utilization and latency **NOT for**: - Infrastructure monitoring, distributed tracing, or log aggregation (use `logging-observability`) - Load testing and capacity planning (a separate domain) - Network latency analysis between services (use distributed tracing tools) - Database schema design optimization (separate from query profiling) --- ## Core Decision: Where Is My App Slow? ```mermaid flowchart TD Start[App is slow. Where?] --> Layer{Which layer?} Layer -->|Backend| Backend{What kind?} Layer -->|Frontend/browser| Browser{What symptom?} Layer -->|Unknown| Measure[Instrument first — add timing logs] Backend -->|CPU pegged, slow responses| CPU[CPU Profiling] Backend -->|Memory growing, crashes| Mem[Memory / Heap Profiling] Backend -->|Fast CPU, slow I/O| IO{I/O type?} IO -->|Database queries| DB[EXPLAIN ANALYZE + query profiler] IO -->|Network calls| Network[Trace external calls, add timeouts] IO -->|File system| FS[Check event loop utilization] Browser -->|Slow initial load| Lighthouse[Lighthouse + bundle analysis] Browser -->|Janky scrolling, animations| Rendering[Chrome Performance tab — layout thrashing] Browser -->|Slow after interaction| React{React app?} React -->|Yes| ReactProfiler[React Profiler + why-did-you-render] React -->|No| JS[Chrome Performance — long tasks, main thread blocking] CPU --> FlameGraph[Generate flame graph with 0x or clinic flame] Mem --> HeapSnap[Take heap snapshots before/after suspected leak] FS --> ELU[clinic bubbles — event loop utilization] ``` --- ## Node.js: CPU Profiling ### V8 Inspector (Built-in) ```bash # Attach inspector and capture a CPU profile node --inspect src/index.js # Or start paused and wait for DevTools node --inspect-brk src/index.js ``` Then open `chrome://inspect` in Chrome, click the target, go to the **Profiler** tab, and record while sending load to the server. ### 0x: Flame Graphs from the Terminal ```bash npm install -g 0x # Profile a script (runs it, generates flame graph) 0x -- node src/index.js # Profile with a load generator running simultaneously 0x -- node src/server.js & npx autocannon -d 30 http://localhost:3000/api/heavy ``` 0x generates an interactive HTML flame graph. The **widest stacks** are where time is spent. Look for: - Functions that appear wide near the bottom (called frequently by everything) - Unexpected width in library code (serialization, template engines, parsers) - Idle / `[idle]` blocks — I/O wait, not CPU (look elsewhere for those) ### Clinic.js Suite ```bash npm install -g clinic # Doctor: overview of what is wrong clinic doctor -- node src/server.js # Flame: CPU flame graph (wraps 0x) clinic flame -- node src/server.js # Bubbles: event loop utilization clinic bubbles -- node src/server.js ``` Clinic Doctor gives you a triage view: CPU, memory, event loop, and handles. Start here when you do not know what kind of bottleneck you have. ### Event Loop Utilization (ELU) ```js const { performance } = require('perf_hooks'); // Sample ELU every 5 seconds let last = performance.eventLoopUtilization(); setInterval(() => { const current = performance.eventLoopUtilization(); const diff = performance.eventLoopUtilization(current, last); console.log(`ELU: ${(diff.utilization * 100).toFixed(1)}%`); last = current; }, 5000); ``` ELU above 80% means the event loop is saturated — CPU-bound work or sync blocking. ELU near 0% with slow responses means I/O wait (network, disk, database). --- ## Node.js: Memory Profiling ### Heap Snapshots ```bash # Take heap snapshot via CLI node --inspect src/index.js # In chrome://inspect → Memory tab → Take Heap Snapshot ``` **Three-snapshot technique for leak detection**: 1. Snapshot after startup (baseline) 2. Snapshot after N requests (warm) 3. Snapshot after 2N requests (growth) Compare Snapshot 3 to Snapshot 2 — objects that grew proportionally to request count are leaking. ### Common Leak Patterns **Closure captures** — Variables captured in long-lived closures that should have been released: ```js // LEAK: handler is registered but never removed emitter.on('data', (chunk) => { processedData.push(chunk); // processedData grows unbounded }); // FIX: remove listener when done, or use once() emitter.once('data', handler); // or const handler = (chunk) => { ... }; emitter.on('data', handler); // later: emitter.off('data', handler); ``` **Growing caches without eviction**: ```js // LEAK: cache grows forever const cache = new Map(); app.get('/user/:id', async (req, res) => { if (!cache.has(req.params.id)) { cache.set(req.params.id, await db.getUser(req.params.id)); } res.json(cache.get(req.params.id)); }); // FIX: use LRU cache with max size const LRU = require('lru-cache'); const cache = new LRU({ max: 1000, ttl: 1000 * 60 * 5 }); ``` **WeakRef and FinalizationRegistry** (for intentional weak references): ```js const cache = new Map(); function cacheValue(key, obj) { const ref = new WeakRef(obj); const registry = new FinalizationRegistry((k) => cache.delete(k)); registry.register(obj, key); cache.set(key, ref); } ``` --- ## Anti-Pattern: Optimizing Without Profiling **Novice**: "This function looks expensive, I'll rewrite it in a more efficient algorithm." **Expert**: Rewrote the wrong function. Profiling would have shown that this function is called once per startup and contributes 0.1% of runtime. The actual bottleneck was JSON serialization in the response handler, called 10,000 times per second. Optimization effort must follow measurement, never intuition. **Detection**: The "optimized" code is measurably faster in microbenchmark isolation but production p99 latency is unchanged. --- ## Anti-Pattern: Micro-Benchmarking in Isolation **Novice**: Writes a benchmark comparing two sorting algorithms on an array of 1000 items, concludes Algorithm B is 2x faster, rewrites production code. **Expert**: Micro-benchmarks measure JIT-compiled hot paths under artificial conditions. Real workloads have different data shapes, mixed call patterns, GC pressure, and I/O interspersed. The JIT may optimize the benchmark differently than the real call site. Profile the actual application under real load — or at minimum, profile with realistic data shapes and call patterns embedded in the actual application code path. **The test**: Does your benchmark run in a tight loop 10,000 times before measuring? If yes, V8 has JIT-compiled it differently than it will compile the real code, which runs cold at startup and is called with varied inputs. --- ## React Rendering Performance ### React Profiler (DevTools) 1. Open React DevTools → Profiler tab 2. Click "Record" 3. Perform the slow interaction 4. Stop recording 5. Examine the flame chart — bars represent components, width represents render time Key columns: **"Why did this render?"** shows which prop or state change triggered each render. ### why-did-you-render ```bash npm install @welldone-software/why-did-you-render ``` ```js // src/wdyr.js (import before React) import React from 'react'; if (process.env.NODE_ENV === 'development') { const whyDidYouRender = require('@welldone-software/why-did-you-render'); whyDidYouRender(React, { trackAllPureComponents: true }); } ``` ```js // Mark a specific component for tracking MyExpensiveComponent.whyDidYouRender = true; ``` This logs to the console every time a component re-renders with the same props — exposing unnecessary renders caused by reference equality failures. ### Common React Performance Patterns ```js // Memoize expensive components const ExpensiveList = React.memo(({ items, onSelect }) => { return items.map(item => ); }); // Stable callback references — prevent re-renders downstream const handleSelect = useCallback((id) => { setSelected(id); }, []); // no deps: stable forever // Memoize expensive computations const sortedItems = useMemo(() => { return [...items].sort((a, b) => a.name.localeCompare(b.name)); }, [items]); // Virtualize long lists import { FixedSizeList } from 'react-window'; {({ index, style }) => } ``` --- ## Database Query Profiling ### PostgreSQL EXPLAIN ANALYZE ```sql -- Wrap any query in EXPLAIN (ANALYZE, BUFFERS) to see execution plan EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.*, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at > NOW() - INTERVAL '30 days' GROUP BY u.id; ``` Read the output bottom-up. Each node shows: - `actual time=X..Y` — startup time to first row, total time for all rows - `rows=N` — actual rows returned - `loops=N` — how many times this node executed **Red flags**: - `Seq Scan` on large tables — missing index - `rows=1000` estimated vs `rows=1` actual — stale statistics, run `ANALYZE` - `Hash Join` with large hash batches — memory pressure, tune `work_mem` - `Nested Loop` on large outer result — cartesian product risk ### Finding Slow Queries in Production ```sql -- Enable pg_stat_statements extension CREATE EXTENSION IF NOT EXISTS pg_stat_statements; -- Top 10 slowest queries by total time SELECT query, calls, total_exec_time / 1000 AS total_seconds, mean_exec_time AS mean_ms, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; ``` --- ## Browser Profiling See `references/browser-profiling.md` for the full Chrome Performance tab workflow, Core Web Vitals measurement, and layout thrashing diagnosis. --- ## Bottleneck Classification Rules When the user provides profiling data, classify and rank bottlenecks using these rules. Process signals in priority order — higher-priority signals override lower ones. ### Priority 1: Database (check first — it's the bottleneck 70% of the time) | Signal | Classification | |--------|---------------| | Any query >500ms | **Critical** — `type: database`. Next step: Run `EXPLAIN ANALYZE` on the query. Look for sequential scans on large tables (missing index) and N+1 patterns (same query repeated with different IDs). | | Multiple queries >100ms per request | **High** — `type: database`. Next step: Aggregate query count per endpoint. If >5 queries per request, look for N+1 or missing JOINs. Consider a query count budget per endpoint. | | Query count >20 per page load | **High** — `type: database`. Even if individual queries are fast, connection overhead and round-trip latency compound. Next step: Batch with `WHERE id IN (...)` or use a DataLoader pattern. | ### Priority 2: Event Loop (Node.js-specific — the most underdiagnosed bottleneck) | Signal | Classification | |--------|---------------| | ELU >0.8 | **Critical** — `type: cpu`. The event loop is saturated. Next step: Run `clinic flame` or `--prof` to find synchronous hot paths. Common culprits: JSON.parse on large payloads, synchronous crypto, regex backtracking. | | ELU >0.5 with slow p99 latency | **High** — `type: cpu`. Event loop contention is causing tail latency. Next step: Look for blocking operations that run infrequently but hold the loop when they do (large sorts, template rendering, PDF generation). | | ELU <0.2 with slow responses | **This is NOT a CPU problem.** `type: io`. Next step: The app is waiting on something external (DB, API calls, file system). Trace outbound requests with `clinic bubbleprof` or add timing logs to external calls. | ### Priority 3: Memory | Signal | Classification | |--------|---------------| | Heap growth rate >10MB/min sustained | **Critical** — `type: memory`. Memory leak will OOM the process. Next step: Take two heap snapshots 5 minutes apart, compare in Chrome DevTools, look for growing object counts (retained size). Common suspects: event listener accumulation, closures capturing request objects, unbounded caches. | | Heap growth proportional to request rate (resets on GC) | **Medium** — `type: memory`. Not a leak, just high allocation pressure. Next step: Check for unnecessary object creation in hot paths (cloning large objects, building strings with concatenation). Reduce allocation, don't chase GC. | | `suspects` array from heap analysis | List each suspect with its retained size. **High** if any single object retains >50MB. Next step: Trace the retainer tree to find why it's not being collected. | ### Priority 4: React Rendering (frontend) | Signal | Classification | |--------|---------------| | Component render time >16ms | **High** — `type: rendering`. Dropping frames. Next step: Check if the component re-renders on every parent render (missing `React.memo` or unstable props). Profile with React DevTools "Why did this render?" | | >5 re-renders per user interaction | **Medium** — `type: rendering`. Next step: Check for state updates that trigger cascading re-renders. Move state closer to where it's used, or split context providers. | | Large component tree (>500 components mounted) | **Medium** — `type: rendering`. Next step: Virtualize lists (`react-window`), lazy-load off-screen components, check for unnecessary mount/unmount cycles. | ### Priority 5: CPU (non-event-loop) | Signal | Classification | |--------|---------------| | Single function >30% of `selfTime` in CPU profile | **High** — `type: cpu`. Hot function dominates. Next step: Read the function. If it's in your code, optimize it. If it's in a library, check if you're calling it unnecessarily or with pathologically large input. | | Flame graph shows wide, flat profile (no single hot function) | **Medium** — `type: cpu`. Death by a thousand cuts. Next step: Look for patterns — are many functions doing similar work? This often means redundant computation (computing the same derived value multiple times per request). | ### Output Ranking After classifying all signals, rank the bottleneck list by: 1. **Severity** (critical first) 2. **Actionability** (clear next step ranks higher than vague "investigate further") 3. **Estimated impact** — "Adding an index will reduce this query from 800ms to 5ms" is more useful than "This might help" Always include `estimatedImpact` as a concrete prediction: "Eliminating N+1 queries should reduce request count from 47 to 3, cutting endpoint latency by ~200ms" — not "should improve performance." --- ## References - `references/node-profiling.md` — Consult for detailed Node.js profiling: --inspect flags, clinic.js commands, heap snapshot analysis, event loop monitoring, stream backpressure diagnosis - `references/browser-profiling.md` — Consult for browser performance: Chrome Performance tab workflow, Lighthouse CI integration, React Profiler deep-dive, Core Web Vitals measurement, layout thrashing patterns