--- name: performance-review description: Detects time and space complexity hotspots via AST scan. Use when code feels slow, before performance-sensitive merges, or to find O(n²) regressions. alwaysApply: false category: code-quality tags: - performance - complexity - algorithms - ast - static-analysis tools: [] usage_patterns: - hotspot-detection - complexity-review - pre-merge-performance-check complexity: advanced model_hint: deep estimated_tokens: 280 progressive_loading: true dependencies: - pensive:shared modules: - modules/time-complexity.md - modules/space-complexity.md - modules/gauntlet-integration.md - modules/kuva-visualization.md --- ## Table of Contents - [Quick Start](#quick-start) - [When to Use](#when-to-use) - [When NOT to Use](#when-not-to-use) - [Required TodoWrite Items](#required-todowrite-items) - [Workflow](#workflow) - [Tiered Analysis](#tiered-analysis) - [Output Format](#output-format) - [Cross-Plugin Dependencies](#cross-plugin-dependencies) - [Supporting Modules](#supporting-modules) # Performance Review Static-analysis review of time and space complexity hotspots. The skill runs in three escalating tiers. Tier 1 uses Python's stdlib `ast` and always runs. Tier 2 uses gauntlet's tree-sitter parser to extend detection across languages when gauntlet is installed. Tier 3 uses the gauntlet code graph to upgrade severity when hotspots reach other hotspots transitively. If gauntlet is missing, Tiers 2 and 3 no-op and Tier 1 still produces useful findings on Python source. ## Quick Start ```bash /performance-review # scan changed files /performance-review path/to/file.py # scan one file /performance-review --tier 1 # force Tier 1 only ``` Programmatic use: ```python from pensive.skills.performance_review import PerformanceReviewSkill skill = PerformanceReviewSkill() result = skill.analyze(context, "src/module.py") for f in result.issues: print(f"[{f.severity}] {f.file}:{f.line} {f.message}") ``` ## When to Use - Pre-merge review of code that runs on user-scaled inputs. - Triage of a function that "feels slow" before reaching for a profiler. - Audit a refactor for newly introduced O(n²) patterns. - Guardrail for AI-generated code where nested-loop hot spots are common. ## When NOT to Use - The target needs **runtime** measurement (memory profile, CPU time on real data). Use `Skill(parseltongue:python-performance)` instead: that skill drives `cProfile`, `py-spy`, and benchmarks. - General refactoring guidance not focused on hotspots: use `Skill(pensive:code-refinement)` whose `algorithm-efficiency` module covers broader optimization patterns. This skill detects; that skill teaches. - Architecture-level performance (sharding, caching layers, queue placement): use `Skill(pensive:architecture-review)`. ## Required TodoWrite Items 1. `perf-review:context-established` 2. `perf-review:scan-complete` 3. `perf-review:findings-categorized` 4. `perf-review:integration-checked` 5. `perf-review:report-generated` ## Workflow ### Step 1: Context (`perf-review:context-established`) - Identify target files. If invoked with no argument, use `git diff --name-only`. If invoked with a path, scope to that. - Note language(s) involved. Tier 1 covers Python; non-Python files need gauntlet for Tier 2 coverage. ### Step 2: Tier 1 AST scan (`perf-review:scan-complete`) Load `modules/time-complexity.md` for the time-side patterns and `modules/space-complexity.md` for space-side. Each module documents the AST shape of every detector. For each Python target file, call: ```python from pensive.skills.performance_review import PerformanceReviewSkill result = PerformanceReviewSkill().analyze(context, path) ``` The visitor walks the AST once and emits `ReviewFinding` records. ### Step 3: Categorize and rank (`perf-review:findings-categorized`) Group findings by severity: - **HIGH**: O(n²) or worse on input-sized iterables (T1, T2). - **MEDIUM**: Unbounded allocation or per-iteration overhead (T3, T4, S1, S3). - **LOW**: Style-level inefficiencies (T5, T6, S2). - **CRITICAL**: Reserved for Tier-3 transitive upgrades. Within a severity, sort by file then line. Suppress findings the user has explicitly marked acceptable (TODO/comment markers) at module-load time of the target. ### Step 4: Tier 2/3 enrichment (`perf-review:integration-checked`) Load `modules/gauntlet-integration.md` for the contract. If gauntlet is installed, run Tier 2 on non-Python files that were skipped at Step 2. If a `.gauntlet/graph.db` exists in the working tree, run Tier 3 to upgrade severities based on transitive hotspot reachability. If gauntlet is missing, this step is a no-op and the report notes "Tier 2/3 not available: install gauntlet for multi-language and call-chain coverage." ### Step 5: Report (`perf-review:report-generated`) Emit a markdown report: ``` ## Performance Review: ### HIGH () - src/foo.py:42: Nested loop over the same iterable 'items'. Suggestion: sort + two pointers, or hash-set membership. ### MEDIUM () - ... ### LOW () - ... Tier coverage: 1 (always) | 2 (gauntlet ✓/✗) | 3 (graph ✓/✗) ``` The report is informational. Apply fixes via `Skill(pensive:code-refinement)` or hand-merge. ## Tiered Analysis | Tier | Source | When it runs | What it covers | |------|--------|--------------|----------------| | 1 | stdlib `ast` | Always (Python source only) | T1-T6, S1-S3 | | 2 | `gauntlet.treesitter_parser` | When gauntlet importable | Same patterns adapted to JS/TS, Go, Rust, Java, C/C++ | | 3 | `gauntlet.graph.GraphStore` | When `.gauntlet/graph.db` exists | Severity upgrade via transitive call chains | ## Output Format Findings use the shared `ReviewFinding` dataclass from `pensive.skills.base`: ```python ReviewFinding( file="src/module.py", line=42, severity="HIGH", # LOW | MEDIUM | HIGH | CRITICAL category="time", # time | space message="Nested loop over the same iterable 'items'.", suggestion="Sort + two pointers, or hash-set membership.", code_snippet="", ) ``` This shape matches every other pensive review skill, so the findings can flow into `Skill(pensive:unified-review)` without translation. ## Cross-Plugin Dependencies | Dependency | Required? | Effect when missing | |------------|-----------|---------------------| | `gauntlet.treesitter_parser` | Optional | Tier 2 returns []; Python coverage unchanged | | `gauntlet.graph.GraphStore` | Optional | Tier 3 returns []; severities are not upgraded | The optional-import contract follows the precedent in `plugins/leyline/src/leyline/tokens.py:25-32` and `plugins/gauntlet/hooks/pr_blast_radius.py:52-56`: try-import to module-level sentinels, then early-return on `None` inside each tier helper. See `modules/gauntlet-integration.md` for the exact code shape. ## Supporting Modules - `modules/time-complexity.md`: T1-T6 detector patterns and AST shapes. - `modules/space-complexity.md`: S1-S3 detector patterns. - `modules/gauntlet-integration.md`: Tier 2/3 contract, fallback semantics, examples. - `modules/kuva-visualization.md`: Rendering benchmark data as charts with kuva (criterion, pytest-benchmark, ad-hoc tables). Covers when chart evidence satisfies proof-of-work requirements. ## Verification A perf-review finding is only useful if the caller can confirm it is real. Use this checklist before treating any finding as worth fixing: 1. **Reproduce under a profiler.** Run `cProfile`, `py-spy`, or the language-specific equivalent on the hotspot. The findings pinpoint AST shapes; the profiler validates the runtime impact. 2. **Re-run the failing benchmark.** If `benches/` exists, the hotspot should show up in numbers, not just AST scans. 3. **Compare numbers before and after the proposed fix.** The fix is wrong if numbers do not move. Capture both timings as evidence references like `[E1]` (before) and `[E2]` (after). When 3+ data points exist, render a kuva chart and attach it to the PR — see `modules/kuva-visualization.md`. 4. **Sample two or three reported hotspots manually.** Findings can be true at the AST level and false at the call-graph level when callers short-circuit. Manual sampling catches that. The `Skill(imbue:proof-of-work)` discipline applies: claims like "the hotspot is fixed" require evidence, not assertion. ## Testing A test file already lives at `plugins/pensive/tests/skills/test_performance_review.py` covering the AST-shape detectors. Two rules for changes here: - **Add a new detector with a test.** Any new T-* or S-* pattern added to the modules ships with a test that has the smallest AST sample exercising it. - **Add a regression test for any false positive removed.** When the skill stops firing on a shape that used to look hot, the reason should appear as a test case so the regression is discoverable later. The Iron Law applies: a new detector without a failing test first is a request to skip TDD on a code-analysis component, which is exactly the place where TDD pays off most. ## Exit Criteria - [ ] A perf-review report file exists for the requested target. - [ ] Every finding carries a severity label and a concrete suggestion the caller can act on. - [ ] Time-complexity (T1-T6) and space-complexity (S1-S3) detectors have been run; tier coverage is reported. - [ ] Tier 2 (gauntlet treesitter) and Tier 3 (graph store) contracts honor the optional-import sentinel: missing modules return `[]` rather than raising. - [ ] Each new detector ships with a smallest-AST test that fails before the detector exists; each removed false positive ships with a regression test. - [ ] Findings flow into `Skill(pensive:unified-review)` without translation when invoked from the unified entry point.