---
name: project-profiler
version: 2.0.0
description: |
  Generate an LLM-optimized project profile for any git repository.
  Outputs docs/{project-name}.md covering architecture, core abstractions,
  usage guide, design decisions, and recommendations.
  Trigger: "/project-profiler", "profile this project", "為專案建側寫"
user_invocable: true
argument_hint: "[target directory, default .]"
allowed-tools: [Read, Write, Edit, Grep, Glob, Bash, Task, WebSearch, WebFetch]
---

# project-profiler

Generate an **LLM-optimized project profile** — a judgment-rich document that lets any future LLM answer within 60 seconds:

1. **What are the core abstractions?**
2. **Which modules to modify for feature X?**
3. **What is the biggest risk/debt?**
4. **When should / shouldn't you use this?**

This is NOT a codebase map (directory + module navigation) or a diff schematic. This is **architectural judgment**: design tradeoffs, usage patterns, and when NOT to use.

---

## Model Strategy

- **Opus**: Orchestrator — runs all phases, writes the final profile. Does NOT read source code directly (except in direct mode).
- **Sonnet**: Subagents — read source code files, analyze patterns, report structured findings.
- All subagents launch in a **single message** (parallel, never sequential).

---

## Phase 0: Preflight

### 0.1 Target & Project Name

Determine the target directory (use argument if provided, else `.`).

Extract project name from the first available source:

1. `package.json` → `name`
2. `pyproject.toml` → `[project] name`
3. `Cargo.toml` → `[package] name`
4. `go.mod` → module path (last segment)
5. Directory name as fallback

### 0.2 Run Scanner

```bash
uv run {SKILL_DIR}/scripts/scan-project.py {TARGET_DIR} --format summary
```

Capture the summary output. This provides:

- Project metadata (name, version, license, deps count)
- Tech stack (languages, frameworks, package manager)
- Language distribution (top 5 by tokens)
- Entry points (CLI, API, library)
- Project features (dockerfile, CI, tests, codebase_map)
- **Detected conditional sections** (Storage, Embedding, Infrastructure, etc.)
- **Workspaces** (monorepo packages, if any)
- Top 20 largest files
- Directory structure (depth 3)

For debugging or when full file details are needed, use `--format json` instead.

### 0.3 Git Metadata

Run these commands (use Bash tool):

```bash
# Recent commits
git -C {TARGET_DIR} log --oneline -20

# Contributors
git -C {TARGET_DIR} log --format="%aN" | sort -u | head -20

# Version tags
git -C {TARGET_DIR} tag --sort=-v:refname | head -5

# First commit date
git -C {TARGET_DIR} log --format="%aI" --reverse | head -1
```

### 0.4 Check Existing CODEBASE_MAP

If `docs/CODEBASE_MAP.md` exists, note its presence. The profile will reference it rather than duplicating directory structure.

### 0.5 Token Budget → Execution Mode

Based on `total_tokens` from scanner, choose execution mode:

| Total Tokens | Mode | Strategy |
|-------------|------|----------|
| **≤ 80k** | **Direct** | **Skip subagents. Opus reads all files directly and performs all analysis in a single context.** |
| 80k – 200k | 2 agents | Agent AB (Core + Architecture + Design), Agent C (Usage + Patterns + Deployment) |
| 200k – 400k | 3 agents | Agent A (Core + Design), Agent B (Architecture + Patterns), Agent C (Usage + Deployment) |
| > 400k | 3 agents | Agent A, Agent B, Agent C — each ≤150k tokens, with overflow files assigned to lightest agent |

**Why 80k threshold**: Opus has 200k context. At ≤80k source tokens, loading all files + scanner output + git metadata + writing the profile all fit comfortably. Subagent overhead (spawn + communication + wait) adds 2-3 minutes for zero benefit.
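
The threshold logic reduces to a lookup. A minimal sketch (illustrative only; the `total_tokens` field name and the return shape are assumptions, not part of the scanner's contract):

```python
# Illustrative sketch of the Phase 0.5 mode selection.
# Assumes the scanner summary exposes an integer `total_tokens`.
def choose_execution_mode(total_tokens: int) -> dict:
    """Map the scanner's token count to an execution mode and agent split."""
    if total_tokens <= 80_000:
        return {"mode": "direct", "agents": []}            # Opus reads files itself
    if total_tokens <= 200_000:
        return {"mode": "subagents", "agents": ["AB", "C"]}
    if total_tokens <= 400_000:
        return {"mode": "subagents", "agents": ["A", "B", "C"]}
    # > 400k: three agents, each capped at 150k tokens of assigned files
    return {"mode": "subagents", "agents": ["A", "B", "C"], "per_agent_cap": 150_000}
```
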
**Direct mode workflow**: Skip Phase 2 entirely. After Phase 0+1, proceed to Phase 3 (read scanner `detected_sections` directly), then Phase 4, then Phase 5. Read files on-demand during synthesis — do NOT pre-read all files; read only what's needed for each section.

---

## Phase 1: Community & External Data

> Run in parallel with Phase 2 subagent launches (or with Phase 3 in direct mode).

### 1.1 GitHub Stats

Parse owner/repo from `.git/config` remote origin URL:

```bash
git -C {TARGET_DIR} remote get-url origin
```

Extract `owner/repo` from the URL. Then:

```bash
gh api repos/{owner}/{repo} --jq '{stars: .stargazers_count, forks: .forks_count, open_issues: .open_issues_count}'
```

If `gh` is unavailable or not a GitHub repo → fill with `N/A`. Do not fail.

### 1.2 Package Downloads

**npm** (if `package.json` exists):

```
WebFetch https://api.npmjs.org/downloads/point/last-month/{package_name}
```

**PyPI** (if `pyproject.toml` exists):

```
WebFetch https://pypistats.org/api/packages/{package_name}/recent
```

If fetch fails → fill with `N/A`.

### 1.3 License

Read from (in order): LICENSE file → package metadata field → `N/A`.

### 1.4 Maturity Assessment

Calculate from:

- **Git history length**: first commit date → now
- **Release count**: number of version tags
- **Contributor count**: unique authors

| Criteria | Score |
|----------|-------|
| < 3 months, < 3 releases, 1-2 contributors | experimental |
| 3-12 months, 3-10 releases, 2-5 contributors | growing |
| 1-3 years, 10-50 releases, 5-20 contributors | stable |
| > 3 years, > 50 releases, > 20 contributors | mature |

Use the **lowest** matching tier (conservative estimate).

---

## Phase 2: Parallel Deep Exploration

> **Direct mode (≤80k tokens): SKIP this entire phase.** Proceed to Phase 3. Opus reads files directly during synthesis.

Launch Sonnet subagents using the `Task` tool. **All subagents must be launched in a single message.**

Assign files to each agent based on the token budget from Phase 0.5. Use the scanner output to determine which files go to which agent.

### File Assignment Strategy

**If workspaces detected** (monorepo):

1. Group files by workspace package
2. Assign complete packages to agents (never split a package across agents)
3. Agent A gets packages with core business logic
4. Agent B gets packages with infrastructure/shared libraries
5. Agent C gets packages with CLI/API/SDK surface + docs

**If no workspaces** (single project):

1. Sort all files by path
2. Group by top-level directory
3. Assign groups to agents based on their responsibility:
   - Agent A gets: core source files (src/lib, core/, models/, types/) + README, CHANGELOG
   - Agent B gets: architecture files (routes/, middleware/, config/, entry points) + tests/
   - Agent C gets: integration files (API, CLI, SDK, examples/, docs/) + .github/
4. If files don't fit neatly, distribute remaining to agents under budget
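
A greedy sketch of step 4 for the single-project case (illustrative only; it assumes the scanner's `--format json` output lists files with `path` and `tokens` fields, and it omits the responsibility-based routing of step 3):

```python
# Illustrative sketch: group files by top-level directory, then hand each
# group to the least-loaded agent that still has budget headroom.
from collections import defaultdict

def assign_files(files: list[dict], budget_per_agent: int = 150_000) -> dict[str, list[str]]:
    """Distribute file groups across agents A/B/C without splitting a directory."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for f in files:
        groups[f["path"].split("/", 1)[0]].append(f)

    agents: dict[str, list[str]] = {"A": [], "B": [], "C": []}
    load = {"A": 0, "B": 0, "C": 0}
    for top, members in sorted(groups.items()):
        size = sum(f["tokens"] for f in members)
        # Prefer agents still under budget; otherwise fall back to the lightest one.
        candidates = [a for a in agents if load[a] + size <= budget_per_agent] or list(agents)
        target = min(candidates, key=lambda a: load[a])
        agents[target].extend(f["path"] for f in members)
        load[target] += size
    return agents
```
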
### Agent A: Core Abstractions + Design Decisions

```
Task prompt for Agent A — subagent_type: "general-purpose", model: "sonnet"

## Mission
Identify the most architecturally significant abstractions AND key design decisions in this codebase.

## Files to Read
{LIST_OF_ASSIGNED_FILES}

Also read: README.md, CHANGELOG.md (if they exist and not already assigned)

## Output Format

### Part 1: Core Abstractions
Report the TOP 10-15 most architecturally significant abstractions, ranked by fan-in (how many other files reference them). If the project has fewer than 15 meaningful abstractions, report all.

For EACH abstraction:

#### {Name}
- **Purpose**: {≤15 words}
- **Defined in**: `{file_path}:ClassName` or `{file_path}:function_name`
- **Type**: {class / interface / type / trait / struct / protocol}
- **Public methods/fields**: {exact_count}
- **Adapters/implementations**: {count} — {names with file paths}
- **Imported by**: {count} files
- **Key pattern**: {factory / singleton / strategy / observer / none}

### Part 2: Design Decisions
For EACH decision (identify 3-5):

#### {Decision Title}
- **Problem**: {what needed solving}
- **Choice made**: {what was chosen}
- **Evidence**: `{file_path}:ClassName` or `{file_path}:function_name` — {relevant code pattern}
- **Alternatives NOT chosen**: {what else could have been done}
- **Why not**: {concrete reason — performance / complexity / ecosystem / team preference}
- **Tradeoff**: {what is gained} vs. {what is lost}

### Part 3: Architecture Risks
For EACH risk (identify 2-4):

- **Risk**: {specific description}
- **Location**: `{file_path}:SymbolName`
- **Impact**: {what breaks if this goes wrong}
- **Mitigation**: {how to fix or reduce risk}

### Part 4: Recommendations
For EACH recommendation (identify 2-4):

- **Current state**: `{file_path}` — {what exists now}
- **Problem**: {specific issue — not "could be better"}
- **Fix**: {concrete action — not "consider refactoring"}
- **Effect**: {measurable outcome}

## Rules
- Every number must come from actual code (count imports, count methods)
- No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
- Every claim needs a `file:SymbolName` reference (NOT line numbers — they break on next commit)
- Each decision must have a "why NOT the alternative" answer
- Report the TOTAL count of abstractions found
```
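
The fan-in ranking in Part 1 can be approximated with a plain-text search over import statements. A rough sketch (illustrative only; it assumes ES-module imports in a TypeScript project, and a language-aware parser would be more accurate):

```python
# Illustrative sketch: count how many .ts files import a given symbol.
# Assumes single-line ES-module imports; not a substitute for real parsing.
import re
from pathlib import Path

def fan_in(symbol: str, root: str = ".") -> int:
    """Count source files that import `symbol` at least once."""
    pattern = re.compile(rf"import\s+.*\b{re.escape(symbol)}\b.*\bfrom\b")
    count = 0
    for path in Path(root).rglob("*.ts"):
        if "node_modules" in path.parts:
            continue
        if pattern.search(path.read_text(encoding="utf-8", errors="ignore")):
            count += 1
    return count
```
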
### Agent B: Architecture + Code Quality Patterns

```
Task prompt for Agent B — subagent_type: "general-purpose", model: "sonnet"

## Mission
Map the system topology, layer boundaries, data flow paths, AND code quality patterns.

## Files to Read
{LIST_OF_ASSIGNED_FILES}

## Output Format

### Part 1: Topology
- **Architecture style**: {monolith / microservices / serverless / library / CLI tool / plugin system}
- **Entry points**: {list with file paths}
- **Layer count**: {N}

### Part 2: Layers (table)
| Layer | Modules | Files | Responsibility |
|-------|---------|-------|---------------|

### Part 3: Data Flow Paths
For each major user-facing operation:

1. **{Operation name}**: {step1_module} → {step2_module} → ... → {result}
   - Evidence: `{file:SymbolName}` for each step

### Part 4: Mermaid Diagram Elements
Provide raw data for Mermaid diagrams:
- Nodes: {module_name} — {file_path}
- Edges: {from} → {to} — {relationship_type: imports/calls/extends}

### Part 5: Module Dependencies (structured)
For each module:
- **{module_name}** (`{path}`): imports [{dep1}, {dep2}, ...]

### Part 6: Boundary Violations
List any cases where a lower layer imports from a higher layer.

### Part 7: Code Quality Patterns
- **Error handling**: {strategy and consistency — e.g., "try/catch at controller layer, custom AppError class"}
- **Logging**: {framework and coverage — e.g., "winston, structured JSON, covers all API routes"}
- **Testing**: {framework, coverage level, patterns — e.g., "vitest, 47 test files, unit + integration"}
- **Type safety**: {strict / partial / none — e.g., "strict TypeScript with no `any` casts"}

## Rules
- Every number must come from actual code
- No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
- Every claim needs a `file:SymbolName` reference (NOT line numbers)
- Focus on HOW data moves, not WHAT the code does
```

### Agent C: Usage + Deployment + Security

```
Task prompt for Agent C — subagent_type: "general-purpose", model: "sonnet"

## Mission
Document all consumption interfaces, deployment modes, security surface, and AI agent integration points.

## Files to Read
{LIST_OF_ASSIGNED_FILES}

## Output Format

### Part 1: Consumption Interfaces
For each interface found:
- **Type**: {Python SDK / TS SDK / REST API / MCP / CLI / Vercel AI SDK / Library import}
- **Entry point**: `{file_path}:ClassName` or `{file_path}:function_name`
- **Public surface**: {N} exported functions/classes/endpoints
- **Example usage**: {minimal code snippet from docs/examples or inferred from exports}

### Part 2: Configuration
| Source | Path | Key Settings |
|--------|------|-------------|

### Part 3: Deployment Modes
| Mode | Evidence | Prerequisites |
|------|----------|--------------|

### Part 4: AI Agent Integration
- **MCP tools**: {count and names, if any}
- **Function calling schemas**: {count, if any}
- **Tool definitions**: {count, if any}
- **SDK integration**: {Vercel AI SDK / LangChain / LlamaIndex / custom}

### Part 5: Security Surface
- **API key handling**: {how and where}
- **Auth mechanism**: {type and file}
- **CORS config**: {if applicable}
- **Data at rest**: {encrypted / plaintext / N/A}
- **PII handling**: {anonymized / logged / none detected}

### Part 6: Performance & Cost Indicators
| Metric | Value | Source |
|--------|-------|--------|
| {LLM calls per request} | {N} | `{file:SymbolName}` |
| {Cache strategy} | {type} | `{file:SymbolName}` |
| {Rate limiting} | {config} | `{file:SymbolName}` |

## Rules
- Every number must come from actual code
- No subjective language (no "well-designed", "elegant", "robust", "clean", "優雅", "完美", "強大")
- Every claim needs a `file:SymbolName` reference (NOT line numbers)
- Include BOTH documented and undocumented interfaces
```

---

## Phase 3: Conditional Section Detection

Read the scanner's `detected_sections` output from Phase 0.2. This is the **primary** detection source — the scanner checks dependency manifests and file presence automatically.

**Cross-reference** with subagent reports (skip in direct mode) for additional evidence richness. If a subagent reports a pattern not caught by the scanner (e.g., concurrency via raw `Promise.all` without a library dependency), add it.

Refer to `references/section-detection-rules.md` for the full pattern reference.
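
The dependency-based side of this detection amounts to a manifest lookup. The sketch below shows the shape only (illustrative; the actual section names and trigger dependencies are defined by the scanner and `references/section-detection-rules.md`, not here, and file-presence checks such as a Dockerfile are handled separately):

```python
# Illustrative sketch of dependency-to-section hints; the real rules live in
# references/section-detection-rules.md and may differ from these examples.
SECTION_HINTS = {
    "Storage Layer": {"prisma", "sqlalchemy", "mongoose", "typeorm"},
    "Embedding Pipeline": {"openai", "sentence-transformers", "cohere"},
    "Knowledge Graph": {"neo4j", "networkx"},
}

def detect_sections(dependencies: set[str]) -> list[str]:
    """Return conditional sections whose hint dependencies appear in the manifest."""
    deps = {d.lower() for d in dependencies}
    return [section for section, hints in SECTION_HINTS.items() if hints & deps]
```
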
Record results as a checklist:

```
- [x] Storage Layer — scanner detected: prisma in dependencies
- [ ] Embedding Pipeline — not detected
- [x] Infrastructure Layer — scanner detected: Dockerfile present
- [ ] Knowledge Graph — not detected
- [ ] Scalability — not detected
- [x] Concurrency — Agent B reported: Promise.all pattern in src/worker.ts
```

---

## Phase 4: Synthesis & Draft

### 4.1 Merge Reports

**Subagent mode**: Combine all subagent outputs into a working document.

**Direct mode**: Read key files on-demand as you write each section. Do NOT pre-read all files. For each section, read only the files relevant to that section's analysis.

Cross-validate:

- Core abstractions ↔ Architecture layers: each abstraction belongs to a layer
- Architecture data flow ↔ Usage interfaces: flows end at documented interfaces
- Design decisions ↔ Code evidence: decisions are backed by found patterns

### 4.2 Generate Mermaid Diagrams + Structured Dependencies

Using Agent B's raw data (or direct file analysis in direct mode), create:

**Architecture Topology** (`graph TB`):
- Each node = actual module/directory
- Each edge = import/dependency relationship
- Label edges with relationship type
- Group nodes by layer using subgraph

**Data Flow** (`sequenceDiagram`):
- Each participant = actual module
- Each arrow = actual function call or event
- Cover the primary user-facing operation

**Structured Module Dependencies** (text, below each Mermaid diagram):
- Provide a machine-parseable dependency list as fallback for LLM readers
- Format: `- **{module_name}** (\`{path}\`): imports [{dep1}, {dep2}, ...]`

### 4.3 Fill Output Template

Follow `references/output-template.md` exactly. Fill each section:

| Section | Primary Source | Secondary Source |
|---------|---------------|-----------------|
| 1. Project Identity | Scanner metadata + Phase 1 | Git metadata |
| 2. Architecture | Agent B (Parts 1-6) | Agent A (abstractions per layer) |
| 3. Core Abstractions | Agent A (Part 1) | Agent B (layer context) |
| 4. Conditional | Phase 3 detection + relevant agents | — |
| 5. Usage Guide | Agent C (Parts 1-4) | Scanner entry_points |
| 6. Performance & Cost | Agent C (Part 6) + Agent B | — |
| 7. Security & Privacy | Agent C (Part 5) | — |
| 8. Design Decisions | Agent A (Part 2) | Agent B (architecture context) |
| 8.5 Code Quality & Patterns | Agent B (Part 7) | Agent A (supporting observations) |
| 9. Recommendations | Agent A (Part 4) | Agents B/C (supporting evidence) |

### 4.4 Write Output

Write the profile to `docs/{project-name}.md` using the Write tool.

---

## Phase 5: Quality Gate

Read `references/quality-checklist.md` and verify the output.

### 5.1 Banned Language Scan

Search the written file for any word from the banned list:

**English:**

```
well-designed, elegant, elegantly, robust, clean, impressive, state-of-the-art, cutting-edge, best-in-class, beautifully, carefully crafted, thoughtfully, well-thought-out, well-architected, nicely, cleverly, sophisticated, powerful, seamless, seamlessly, intuitive, intuitively
```

**Chinese:**

```
優雅、完美、強大、直觀、無縫、精心、巧妙、出色、卓越、先進、高效、靈活、穩健、簡潔
```

If found → replace with verifiable descriptions and re-write.

### 5.2 Number Audit

Scan for all numeric claims. Each must have a traceable source. Remove or fix any "approximately", "around", "roughly", "several", "many", "numerous".
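
Checks 5.1 and 5.2 can be mechanized with a simple scan before manual review. A minimal sketch (illustrative only; the word lists here are abbreviated, and the fenced lists above remain authoritative):

```python
# Illustrative sketch of a combined scan for 5.1 (banned language) and 5.2
# (vague quantifiers). Abbreviated lists; the ones above are authoritative.
from pathlib import Path

BANNED = ["well-designed", "elegant", "robust", "seamless", "優雅", "完美", "強大"]
VAGUE = ["approximately", "around", "roughly", "several", "many", "numerous"]

def quality_scan(profile_path: str) -> dict[str, int]:
    """Count occurrences of banned or vague words for manual review.
    Substring matching over-reports (e.g. 'workaround' matches 'around'),
    which is acceptable for a flag-and-review pass."""
    text = Path(profile_path).read_text(encoding="utf-8").lower()
    return {w: text.count(w.lower()) for w in BANNED + VAGUE if w.lower() in text}
```
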
### 5.3 Structure Verification

- [ ] Every `##` section starts with `>` blockquote summary
- [ ] No directory tree duplicated from CODEBASE_MAP.md
- [ ] No file extension enumeration (use percentages)
- [ ] No generic concluding paragraph
- [ ] At least one Mermaid diagram in Architecture section
- [ ] Structured module dependency list below each Mermaid diagram
- [ ] All Mermaid nodes reference actual modules

### 5.4 Core Question Test

For each of the 4 core questions, locate the specific answer in the output:

1. Core abstractions → Section 3
2. Module to modify → Section 2 Layer Boundaries table
3. Biggest risk → Section 9 first recommendation
4. When to use/not use → Section 1 positioning line

### 5.5 Evidence Audit

- Section 3: every abstraction has `file:SymbolName` reference
- Section 8: every decision has `file:SymbolName` + alternative + tradeoff
- Section 8.5: code quality patterns have framework names + coverage facts
- Section 9: every recommendation has `file_path` + specific problem + concrete fix

If any check fails → fix the issue in the file and re-verify.

---

## Output

After all phases complete, report to the user:

```
Profile generated: docs/{project-name}.md

- {total_files} files scanned ({total_tokens} tokens)
- {N} core abstractions identified
- {N} design decisions documented
- {N} recommendations
- Conditional sections: {list of included sections or "none"}
```