---
name: when-profiling-performance-use-performance-profiler
version: 1.0.0
description: Comprehensive performance profiling, bottleneck detection, and optimization system
author: Claude Code
category: performance
complexity: HIGH
tags: [performance, profiling, optimization, benchmarking, mece]
agents:
  - performance-analyzer
  - performance-benchmarker
  - coder
  - optimizer
components:
  - subagent
  - slash-command
  - mcp-tool
dependencies:
  - claude-flow@alpha
  - perf (Linux)
  - instruments (macOS)
  - clinic.js (Node.js)
---

# Performance Profiler Skill

## Overview

**When profiling performance, use performance-profiler** to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.

## MECE Breakdown

### Mutually Exclusive Components:
1. **Baseline Phase**: Establish current performance metrics
2. **Detection Phase**: Identify bottlenecks and hot paths
3. **Analysis Phase**: Root cause analysis and impact assessment
4. **Optimization Phase**: Generate and prioritize recommendations
5. **Implementation Phase**: Apply optimizations with agent assistance
6. **Validation Phase**: Benchmark improvements and verify gains

### Collectively Exhaustive Coverage:
- **CPU Profiling**: Function execution time, hot paths, call graphs
- **Memory Profiling**: Heap usage, allocations, leaks, garbage collection
- **I/O Profiling**: File system, database, network latency
- **Network Profiling**: Request timing, bandwidth, connection pooling
- **Concurrency**: Thread utilization, lock contention, async operations
- **Algorithm Analysis**: Time complexity, space complexity
- **Cache Analysis**: Hit rates, cache misses, invalidation patterns
- **Database**: Query performance, N+1 problems, index usage

## Features

### Core Capabilities:
- Multi-dimensional performance profiling (CPU, memory, I/O, network)
- Automated bottleneck detection with prioritization
- Real-time profiling and historical analysis
- Flame graph generation for visual analysis
- Memory leak detection and heap snapshots
- Database query optimization
- Algorithmic complexity analysis
- A/B comparison of before/after optimizations
- Production-safe profiling with minimal overhead
- Integration with APM tools (New Relic, DataDog, etc.)

### Profiling Modes:
- **Quick Scan**: 30-second lightweight profiling
- **Standard**: 5-minute comprehensive analysis
- **Deep**: 30-minute detailed investigation
- **Continuous**: Long-running production monitoring
- **Stress Test**: Load-based profiling under high traffic

## Usage

### Slash Command:
```bash
/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
```

### Subagent Invocation:
```javascript
Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
```

### MCP Tool:
```javascript
mcp__performance-profiler__analyze({
  project_path: "./app",
  profiling_mode: "standard",
  targets: ["cpu", "memory", "io"],
  generate_optimizations: true
})
```

## Architecture

### Phase 1: Baseline Measurement
1. Establish current performance metrics
2. Define performance budgets
3. Set up monitoring infrastructure
4. Capture baseline snapshots

### Phase 2: Bottleneck Detection
1. CPU profiling (sampling or instrumentation)
2. Memory profiling (heap analysis)
3. I/O profiling (syscall tracing)
4. Network profiling (packet analysis)
5. Database profiling (query logs)

### Phase 3: Root Cause Analysis
1. Correlate metrics across dimensions
2. Identify causal relationships
3. Calculate performance impact
4. Prioritize issues by severity

### Phase 4: Optimization Generation
1. Algorithmic improvements
2. Caching strategies
3. Parallelization opportunities
4. Database query optimization
5. Memory optimization
6. Network optimization

### Phase 5: Implementation
1. Generate optimized code with coder agent
2. Apply database optimizations
3. Configure caching layers
4. Implement parallelization

### Phase 6: Validation
1. Run benchmark suite
2. Compare before/after metrics
3. Verify no regressions
4. Generate performance report

## Output Formats

### Performance Report:
```json
{
  "project": "my-app",
  "profiling_mode": "standard",
  "duration_seconds": 300,
  "baseline": {
    "requests_per_second": 1247,
    "avg_response_time_ms": 123,
    "p95_response_time_ms": 456,
    "p99_response_time_ms": 789,
    "cpu_usage_percent": 67,
    "memory_usage_mb": 512,
    "error_rate_percent": 0.1
  },
  "bottlenecks": [
    {
      "type": "cpu",
      "severity": "high",
      "function": "processData",
      "time_percent": 34.5,
      "calls": 123456,
      "avg_time_ms": 2.3,
      "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
    }
  ],
  "optimizations": [...],
  "estimated_improvement": {
    "throughput_increase": "3.2x",
    "latency_reduction": "68%",
    "memory_reduction": "45%"
  }
}
```

### Flame Graph:
Interactive SVG flame graph showing the call stack with time proportions

### Heap Snapshot:
Memory allocation breakdown with retention paths

### Optimization Report:
Prioritized list of actionable improvements with code examples

## Examples

### Example 1: Quick CPU Profiling
```bash
/profile ./my-app --mode quick --target cpu
```

### Example 2: Deep Memory Analysis
```bash
/profile ./my-app --mode deep --target memory --detect-leaks
```

### Example 3: Full Stack Optimization
```bash
/profile ./my-app --mode standard --target all --optimize --benchmark
```

### Example 4: Database Query Optimization
```bash
/profile ./my-app --mode standard --target io --database --explain-queries
```

## Integration with Claude-Flow

### Coordination Pattern:
```javascript
// Step 1: Initialize profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })

// Step 2: Spawn specialized agents
[Parallel Execution]:
  Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
  Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
  Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
  Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
  Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")

// Step 3: Implementation agent applies optimizations
[Sequential Execution]:
  Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
  Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
```

## Configuration

### Default Settings:
```json
{
  "profiling": {
    "sampling_rate_hz": 99,
    "stack_depth": 128,
    "include_native_code": false,
    "track_allocations": true
  },
  "thresholds": {
    "cpu_hot_path_percent": 10,
    "memory_leak_growth_mb": 10,
    "slow_query_ms": 100,
    "slow_request_ms": 1000
  },
  "optimization": {
    "auto_apply": false,
    "require_approval": true,
    "run_tests_before": true,
    "run_benchmarks_after": true
  },
  "output": {
    "flame_graph": true,
    "heap_snapshot": true,
    "call_tree": true,
    "recommendations": true
  }
}
```

## Profiling Techniques

### CPU Profiling:
- **Sampling**: Periodic stack sampling (low overhead)
- **Instrumentation**: Function entry/exit hooks (accurate but higher overhead)
- **Tracing**: Event-based profiling

### Memory Profiling:
- **Heap Snapshots**: Point-in-time memory state
- **Allocation Tracking**: Record all allocations
- **Leak Detection**: Compare snapshots over time
- **GC Analysis**: Garbage collection patterns

### I/O Profiling:
- **Syscall Tracing**: Track system calls (strace, dtrace)
- **File System**: Monitor read/write operations
- **Database**: Query logging and EXPLAIN ANALYZE
- **Network**: Packet capture and request timing

### Concurrency Profiling:
- **Thread Analysis**: CPU utilization per thread
- **Lock Contention**: Identify blocking operations
- **Async Operations**: Promise/callback timing

## Performance Optimization Strategies

### Algorithmic:
- Reduce time complexity (O(n²) → O(n log n))
- Use appropriate data structures
- Eliminate unnecessary work
- Memoization and dynamic programming

### Caching:
- In-memory caching (Redis, Memcached)
- CDN for static assets
- HTTP caching headers
- Query result caching

### Parallelization:
- Multi-threading
- Worker pools
- Async I/O
- Batching operations

### Database:
- Add missing indexes
- Optimize queries
- Reduce N+1 queries
- Connection pooling
- Read replicas

### Memory:
- Object pooling
- Reduce allocations
- Stream processing
- Compression

### Network:
- Connection keep-alive
- HTTP/2 or HTTP/3
- Compression
- Request batching
- Rate limiting

## Performance Budgets

### Frontend:
- Time to First Byte (TTFB): < 200ms
- First Contentful Paint (FCP): < 1.8s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Total Blocking Time (TBT): < 200ms
- Cumulative Layout Shift (CLS): < 0.1

### Backend:
- API Response Time (p50): < 100ms
- API Response Time (p95): < 500ms
- API Response Time (p99): < 1000ms
- Throughput: > 1000 req/s
- Error Rate: < 0.1%
- CPU Usage: < 70%
- Memory Usage: < 80%

### Database:
- Query Time (p50): < 10ms
- Query Time (p95): < 50ms
- Query Time (p99): < 100ms
- Connection Pool Utilization: < 80%

## Best Practices

1. Profile production workloads when possible
2. Use production-like data volumes
3. Profile under realistic load
4. Measure multiple times for consistency
5. Focus on p95/p99, not just averages
6. Optimize bottlenecks in order of impact
7. Always benchmark before and after
8. Monitor for regressions in CI/CD
9. Set up continuous profiling
10. Track performance over time

## Troubleshooting

### Issue: High CPU usage but no obvious hot path
**Solution**: Check for excessive small function calls, increase the sampling rate, or switch to instrumentation

### Issue: Memory grows continuously
**Solution**: Run a heap snapshot comparison to identify leak sources

### Issue: Slow database queries
**Solution**: Use EXPLAIN ANALYZE, check for missing indexes, and analyze query plans

### Issue: High latency but low CPU
**Solution**: Profile I/O operations and check for blocking synchronous calls

## See Also

- PROCESS.md - Detailed step-by-step profiling workflow
- README.md - Quick start guide
- subagent-performance-profiler.md - Agent implementation details
- slash-command-profile.sh - Command-line interface
- mcp-performance-profiler.json - MCP tool schema
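
## Appendix: Illustrative Sketches

The best practices above recommend focusing on p95/p99 rather than averages. As an illustration, here is a minimal sketch of computing tail percentiles from raw latency samples using the nearest-rank method; the `percentile` helper is illustrative only, not part of this skill's API.

```javascript
// Compute the p-th percentile (0-100) of an array of numeric samples
// using the nearest-rank method. Illustrative helper, not part of the
// profiler's API.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Example: response times in milliseconds from a benchmark run.
// The average (~70ms) hides the 480ms outlier that p95/p99 expose.
const latencies = [12, 15, 14, 110, 13, 16, 12, 480, 14, 13];
console.log(percentile(latencies, 50)); // median
console.log(percentile(latencies, 95)); // tail latency
console.log(percentile(latencies, 99)); // worst-case tail
```

Production profilers typically use streaming quantile estimators (e.g. HDR histograms) rather than sorting all samples, but the nearest-rank form shows why tails and averages diverge.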
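
Among the algorithmic optimization strategies listed above, memoization is the simplest to show concretely: cache the results of a pure function so repeated calls with the same argument do no recomputation. The `memoize` helper below is a hypothetical sketch for illustration, not part of the profiler.

```javascript
// Wrap a pure single-argument function with a result cache.
// Hypothetical helper for illustration only.
function memoize(fn) {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg);
  };
}

// Naive recursive Fibonacci is O(2^n); with memoization each value
// is computed once, making the recursion effectively O(n).
const fib = memoize((n) => (n < 2 ? n : fib(n - 1) + fib(n - 2)));
console.log(fib(40)); // returns instantly with memoization
```

A CPU profile of the naive version would show the duplicated recursive calls dominating the flame graph; after memoization that hot path disappears.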