--- name: linux-perf description: Linux perf profiler skill for CPU performance analysis. Use when collecting sampling profiles with perf record, generating perf report, measuring hardware counters (cache misses, branch mispredicts, IPC), identifying hot functions, or feeding perf data into flamegraph tools. Activates on queries about perf, Linux performance counters, PMU events, off-CPU profiling, perf stat, perf annotate, or sampling-based profiling on Linux. --- # Linux perf ## Purpose Guide agents through `perf` for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation. ## Triggers - "Which function is consuming the most CPU?" - "How do I measure cache misses / IPC?" - "How do I use `perf` to find hotspots?" - "How do I generate a flamegraph from perf data?" - "perf shows `[unknown]` or `[kernel]` frames" ## Workflow ### 1. Prerequisites ```bash # Install sudo apt install linux-perf # Debian/Ubuntu (version-matched) sudo dnf install perf # Fedora/RHEL # Check permissions # By default perf requires root or paranoid level ≤ 1 cat /proc/sys/kernel/perf_event_paranoid # 2 = only CPU stats (not kernel), 1 = user+kernel, 0 = all, -1 = no restrictions # Temporarily lower (session only) sudo sysctl -w kernel.perf_event_paranoid=1 # Persistent echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf sudo sysctl -p /etc/sysctl.d/99-perf.conf ``` Compile the target with debug symbols for useful frame data: ```bash gcc -g -O2 -fno-omit-frame-pointer -o prog main.c # -fno-omit-frame-pointer: essential for frame-pointer-based unwinding # Alternative: compile with DWARF CFI and use --call-graph=dwarf ``` ### 2. perf stat — quick counters ```bash # Basic hardware counters perf stat ./prog # With specific events perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog # Wall-clock comparison: N runs perf stat -r 5 ./prog # Attach to existing process perf stat -p 12345 sleep 10 ``` Interpret `perf stat` output: - **IPC** (instructions per cycle) < 1.0: memory-bound or stalled pipeline - **cache-miss rate** > 5%: significant cache pressure - **branch-miss rate** > 5%: branch predictor struggling ### 3. perf record — sampling ```bash # Default: sample at 1000 Hz (cycles event) perf record -g ./prog # Specify frequency perf record -F 999 -g ./prog # Specific event perf record -e cache-misses -g ./prog # Attach to running process perf record -F 999 -g -p 12345 sleep 30 # Off-CPU profiling (time spent waiting) perf record -e sched:sched_switch -ag sleep 10 # DWARF call graphs (better for binaries without frame pointers) perf record -F 999 --call-graph=dwarf ./prog # Save to named file perf record -o myapp.perf.data -g ./prog ``` ### 4. perf report — interactive analysis ```bash perf report # reads perf.data perf report -i myapp.perf.data perf report --no-children # self time only (not cumulative) perf report --sort comm,dso,sym # sort by fields perf report --stdio # non-interactive text output ``` Navigation in TUI: - `Enter` — expand a symbol - `a` — annotate (show assembly with hit counts) - `s` — show source (needs debug info) - `d` — filter by DSO (library) - `t` — filter by thread - `?` — help ### 5. perf annotate — hot instructions ```bash # Show assembly with hit percentages perf annotate sym_name # From report: press 'a' on a symbol # Or directly: perf annotate -i perf.data --symbol=hot_function --stdio ``` High hit count on a `mov` or `vmovdqa` suggests a cache miss at that load. ### 6. perf top — live profiling ```bash # Live top, like 'top' but for functions sudo perf top -g # Filter by process sudo perf top -p 12345 ``` ### 7. Feed into flamegraphs ```bash # Generate perf script output perf script > out.perf # Use Brendan Gregg's FlameGraph tools git clone https://github.com/brendangregg/FlameGraph ./FlameGraph/stackcollapse-perf.pl out.perf > out.folded ./FlameGraph/flamegraph.pl out.folded > flamegraph.svg # Open flamegraph.svg in browser ``` See `skills/profilers/flamegraphs` for reading flamegraphs and interpreting results. ### 8. Common issues | Problem | Cause | Fix | |---------|-------|-----| | `Permission denied` | `perf_event_paranoid` too high | Lower paranoid level or run with `sudo` | | `[unknown]` frames | Missing frame pointers or debug info | Recompile with `-fno-omit-frame-pointer` or use `--call-graph=dwarf` | | `[kernel]` everywhere | Kernel symbols not visible | Use `sudo perf record`; install `linux-image-$(uname -r)-dbgsym` | | `No kallsyms` | Kernel symbols unavailable | `echo 0 | sudo tee /proc/sys/kernel/kptr_restrict` | | Empty report for short program | Program exits too fast | Use `-F 9999` or instrument longer workload | | DWARF unwinding slow | Large DWARF stack | Limit with `--call-graph dwarf,512` | ### 9. Useful events ```bash # List all available events perf list # Common hardware events cycles instructions cache-references cache-misses branch-instructions branch-misses stalled-cycles-frontend stalled-cycles-backend # Software events context-switches cpu-migrations page-faults # Tracepoints (requires root) sched:sched_switch syscalls:sys_enter_read ``` For a counter reference and interpretation guide, see [references/events.md](references/events.md). ## Related skills - Use `skills/profilers/flamegraphs` for SVG flamegraph generation and reading - Use `skills/profilers/valgrind` for cache simulation and memory profiling - Use `skills/compilers/gcc` or `skills/compilers/clang` for PGO from perf data (AutoFDO)