---
name: polish
description: Analyze a codebase for improvements across multiple dimensions — test coverage gaps, documentation quality, performance, API ergonomics, correctness. Use when the user wants to find what's missing or could be better in their project.
argument-hint: "[dimension ...] (e.g. tests, docs, perf, api, correctness — or empty for all)"
disable-model-invocation: true
allowed-tools:
  - Bash(cargo test *)
  - Bash(cargo check *)
  - Bash(cargo clippy *)
  - Agent
  - Read
  - Glob
  - Grep
---

## Repo state

- VCS: !`test -d .jj && echo "jj" || echo "git"`
- Language: !`test -f Cargo.toml && echo "rust" || (test -f package.json && echo "node" || (test -f go.mod && echo "go" || echo "unknown"))`

## Step 1: Determine scope

Available dimensions:

| Dimension | What it covers |
|---|---|
| `tests` | Coverage gaps, untested public API, missing edge cases, flaky patterns |
| `docs` | Crate/module docs, doc comments on public items, README, examples |
| `perf` | Unnecessary allocations, hot-path inefficiency, lock contention, algorithmic issues |
| `api` | Ergonomics, misuse resistance, consistency, naming, type-level invariants |
| `correctness` | Error handling, safety invariants, panic/crash paths, resource leaks, thread safety |

If `$ARGUMENTS` is empty, run **all five** dimensions. If `$ARGUMENTS` names specific dimensions (e.g. `/polish tests docs`), run only those. Accept common aliases: `test`→`tests`, `doc`/`documentation`→`docs`, `performance`→`perf`, `ergonomics`→`api`, `sound`/`soundness`/`safety`/`unsafe`→`correctness`. If `$ARGUMENTS` contains something else, treat it as custom focus instructions and pass it verbatim (delimited by triple backticks) to all agents.

## Step 2: Launch parallel analysis agents

Launch one Agent subagent per selected dimension, all in a single message. Each agent receives the shared analysis guidelines (below) followed by its dimension-specific guidelines. Each agent should use Read, Glob, and Grep extensively to examine the full codebase — this is not diff-scoped.

### Shared analysis guidelines

Include the following in every subagent prompt.

Analyze the **entire codebase**, not just recent changes. Read every source file relevant to your dimension. The goal is to find what's missing, incomplete, or improvable — not to review a specific change.

**What to flag** — gaps or issues that: (a) would meaningfully improve the project if addressed; (b) are specific and actionable; (c) aren't nit-picks or style preferences; (d) the author would likely agree with.

**Don't flag** — things that are fine as-is for the project's current scope, speculative future needs, or patterns that are idiomatic even if imperfect.

**Priorities** — tag each finding: [P0] blocking issue, must fix; [P1] high-value, should do; [P2] worth doing; [P3] nice-to-have.

**Format** — number findings with the dimension prefix (T1, D1, F1, A1, C1, …). For each: priority tag, file path with line number where relevant, one-paragraph explanation. Be specific — "add tests for X" not "improve test coverage." Report in under 500 words.

### Agent: Tests (prefix: T)

Identify gaps in test coverage and test quality.

- List every public function/method/type. For each, note whether it has direct test coverage.
- Flag untested public API surface — these are the highest priority.
- Flag untested error paths and edge cases.
- Check for missing integration tests of key workflows (especially any documented in specs/README).
- Flag flaky test patterns: time-dependent, order-dependent, non-deterministic.
- Note if property-based/fuzz testing would add value for any component.
- Check that tests actually assert meaningful properties (not just "doesn't panic").

Don't suggest tests for trivial getters/setters or boilerplate impls. Focus on behavioral gaps.
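For calibration, a minimal sketch of the kind of weak or flaky test this agent should flag; `backoff_ms` is an invented stand-in, not taken from any real codebase:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical function under test (stand-in for real library code).
fn backoff_ms(attempt: u32) -> u64 {
    100 * 2u64.pow(attempt.min(6))
}

// Pattern to flag: the input depends on wall-clock time (non-deterministic)
// and nothing is asserted, so the test only proves "doesn't panic".
#[test]
fn weak_and_time_dependent() {
    let secs = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let _ = backoff_ms((secs % 8) as u32);
}

// Stronger replacement: deterministic inputs, meaningful properties asserted.
#[test]
fn backoff_doubles_and_caps() {
    assert_eq!(backoff_ms(1), 200);
    assert_eq!(backoff_ms(10), backoff_ms(6)); // capped, so no overflow
}
```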
### Agent: Documentation (prefix: D)

Evaluate documentation completeness and quality.

- Check for crate/module/package-level docs (the overview a new user sees first).
- List every public item without a doc comment.
- Check README: does it exist? Does it have a usage example? Is it accurate?
- Check for runnable examples (doctests, examples/ directory).
- Evaluate existing doc comments: do they explain *why*, not just *what*? Are they accurate?
- Check for stale/misleading comments that contradict the code.
- Note if a CHANGELOG exists (relevant if the project is published).

Don't flag missing docs on items where the name and type signature are self-explanatory.

### Agent: Performance (prefix: F — not P, to avoid collision with priority tags)

Look for performance issues and optimization opportunities.

- Scan for unnecessary allocations in hot paths (e.g. building collections where iteration would do, redundant string copies).
- Check for algorithmic inefficiency (O(n²) where O(n) is possible, etc.).
- Look for unnecessary copies/clones where borrows, moves, or references would work.
- Check lock granularity and contention potential.
- Note any I/O patterns that could be batched or buffered.
- Check buffer/capacity sizing — are defaults reasonable? Is there unnecessary resizing?
- Note any existing benchmarks; flag hot paths that should be benchmarked but aren't.

Don't flag micro-optimizations that don't matter at the project's scale. Focus on issues that would matter with realistic workloads.

### Agent: API Ergonomics (prefix: A)

Evaluate the public API surface for usability and safety.

- Is the API hard to misuse? Can invalid states be constructed?
- Are error types informative? Can callers distinguish failure modes they care about?
- Is the API consistent (naming, parameter order, return types)?
- Are there missing convenience methods that would reduce boilerplate for common use cases?
- Check standard trait/interface implementations: are expected capabilities (equality, hashing, serialization, debug printing, cloning) present where users would need them?
- Look for builder pattern opportunities, or builder patterns that aren't pulling their weight.
- Check for missing conversions/coercions that would improve interop with the ecosystem.
- Is the type-level API making good use of the type system? (parse-don't-validate, newtype wrappers, tagged unions over stringly-typed fields, etc.)

Don't suggest API changes that would bloat the surface area for hypothetical use cases.
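To make the type-level guidance concrete, a minimal parse-don't-validate sketch; the `Port` newtype and both `connect` functions are invented for illustration, not part of any codebase under review:

```rust
// Stringly-typed: every caller can pass an unvalidated value, so each
// layer must re-validate (or silently forget to).
fn connect_stringly(port: &str) -> Result<(), String> {
    let n: u16 = port.parse().map_err(|_| format!("invalid port: {port}"))?;
    if n == 0 {
        return Err("port 0 is reserved".to_string());
    }
    Ok(())
}

// Parse-don't-validate with a newtype: an invalid Port cannot be
// constructed, so downstream code needs no defensive checks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(u16);

impl Port {
    pub fn new(n: u16) -> Result<Self, String> {
        if n == 0 {
            Err("port 0 is reserved".to_string())
        } else {
            Ok(Port(n))
        }
    }
}

fn connect(_port: Port) -> Result<(), String> {
    // `_port` is valid by construction.
    Ok(())
}

fn main() -> Result<(), String> {
    connect_stringly("8080")?;
    connect(Port::new(8080)?)
}
```

The same pattern generalizes to IDs, paths, and other fields that would otherwise stay stringly-typed.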
### Agent: Correctness (prefix: C)

Audit for correctness and safety issues in the existing codebase.

- Check error handling: are errors silently dropped, swallowed, or logged-and-continued? Are there crash-on-error patterns (unwrap, assert, panic, throw-without-catch) in library/non-test code?
- Look for resource leaks: unclosed handles, missing cleanup on error paths, unbounded allocations.
- Verify thread/concurrency safety: data races, missing synchronization, lock ordering issues.
- Check for invariant violations: can public API calls put internal state into an invalid configuration?
- Audit unsafe or unchecked code blocks: are safety invariants documented? Is there a safe alternative?
- Look for TODOs/FIXMEs that flag known correctness concerns.

This is not a review of a diff — audit the code as it stands today.

## Step 3: Synthesize findings

Once all agents return:

1. Collect all findings, keeping dimension prefixes.
2. Sort by priority (P0 first, then P1, P2, P3).
3. Deduplicate: if two agents flagged the same issue, merge and keep the higher priority.
4. Present the combined analysis grouped by priority tier.
5. End with a summary: one line per dimension stating the main gap, plus an overall assessment of project maturity.
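One possible shape for the combined report; every ID, path, and description below is a placeholder, not a real finding:

```text
## P0
- C1 [P0] <path>:<line>: <one-paragraph explanation>

## P1
- T3 [P1] <path>: <one-paragraph explanation>
- A1 [P1] <path>:<line>: <one-paragraph explanation>

## P2 / P3
- ...

## Summary
- tests: <main gap in one line>
- docs: <main gap in one line>
- perf: <main gap in one line>
- api: <main gap in one line>
- correctness: <main gap in one line>

Overall: <one-line assessment of project maturity>
```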