---
name: cartographer
description: "Codebase mapping and documentation using parallel AI subagents. Invoke for: map this codebase, document architecture, understand codebase, onboarding to new project, create CODEBASE_MAP.md, generate architecture diagrams."
---

# Cartographer

> Map and document codebases of any size using parallel AI subagents.

Creates `docs/CODEBASE_MAP.md` with architecture diagrams, file purposes, dependencies, and navigation guides. Updates `CLAUDE.md` with a summary.

## Triggers

Activate when the user says: "map this codebase", "cartographer", "/cartographer", "create codebase map", "document the architecture", "understand this codebase", or when onboarding to a new project.

## Critical Principle

**"Opus orchestrates, Sonnet reads."**

Never have Opus read codebase files directly. Always delegate file reading to Sonnet subagents, even for small codebases. Opus plans the work, spawns subagents, and synthesizes their reports.

## Process

### 1. Check for Existing Map

First, check whether `docs/CODEBASE_MAP.md` already exists.

**If the map exists:**

1. Read the `last_mapped` timestamp from the map's frontmatter.
2. Check for changes since the last map:
   - If git is available, run `git log --oneline --since="<last_mapped>"`, substituting the `last_mapped` value.
   - If git is not available, run the scanner and compare file counts and paths.
3. If significant changes are detected, proceed to update mode.
4. If there are no changes, tell the user the map is current.

**If the map does not exist:** proceed to full mapping.

### 2. Scan the Codebase

Run the scanner script to get an overview:

```bash
# Option 1: If uv is available (preferred)
uv run ~/.claude/skills/cartographer/scripts/scan-codebase.py . --format json

# Option 2: Direct execution
~/.claude/skills/cartographer/scripts/scan-codebase.py . --format json

# Option 3: Explicit python3
python3 ~/.claude/skills/cartographer/scripts/scan-codebase.py . --format json
```

**Install tiktoken if missing:**

```bash
pip install tiktoken
# or with uv:
uv pip install tiktoken
```

The output provides:

- Complete file tree with token counts per file
- Total token budget needed
- Skipped files (binary, too large)

### 3. Plan Subagent Assignments

Analyze the scan output to divide the work among subagents.

**Token budget per subagent:** ~150,000 tokens (a safe margin under Sonnet's 200k context limit)

**Grouping strategy:**

1. Group files by directory/module (keeps related code together).
2. Balance token counts across groups.
3. Prefer more subagents with smaller chunks (150k max each).

**For small codebases (<100k tokens):** still use a single Sonnet subagent. Opus orchestrates and Sonnet reads; Opus never reads the codebase directly.

**Example assignment:**

```
Subagent 1: src/api/, src/middleware/    (~120k tokens)
Subagent 2: src/components/, src/hooks/  (~140k tokens)
Subagent 3: src/lib/, src/utils/         (~100k tokens)
Subagent 4: tests/, docs/                (~80k tokens)
```

### 4. Spawn Sonnet Subagents in Parallel

Use the Task tool with `subagent_type: "Explore"` and `model: "sonnet"` for each group.

**CRITICAL: Spawn all subagents in a SINGLE message with multiple Task tool calls.**

Each subagent prompt should:

1. List the specific files/directories to read.
2. Request analysis of:
   - Purpose of each file/module
   - Key exports and public APIs
   - Dependencies (what it imports)
   - Dependents (what imports it, if discoverable)
   - Patterns and conventions used
   - Gotchas or non-obvious behavior
3. Request output as structured markdown.

**Example subagent prompt:**

```
You are mapping part of a codebase. Read and analyze these files:
- src/api/routes.ts
- src/api/middleware/auth.ts
- src/api/middleware/rateLimit.ts
[... list all files in this group]

For each file, document:
1. **Purpose**: One-line description
2. **Exports**: Key functions, classes, types exported
3. **Imports**: Notable dependencies
4. **Patterns**: Design patterns or conventions used
5. **Gotchas**: Non-obvious behavior, edge cases, warnings

Also identify:
- How these files connect to each other
- Entry points and data flow
- Any configuration or environment dependencies

Return your analysis as markdown with clear headers per file/module.
```

### 5. Synthesize Reports

Once all subagents complete, synthesize their outputs:

1. **Merge** all subagent reports.
2. **Deduplicate** overlapping analysis.
3. **Identify cross-cutting concerns** (shared patterns, common gotchas).
4. **Build the architecture diagram** showing module relationships.
5. **Extract key navigation paths** for common tasks.

### 6. Write CODEBASE_MAP.md

Create `docs/CODEBASE_MAP.md` with this structure:

````markdown
---
last_mapped: YYYY-MM-DDTHH:MM:SSZ
total_files: N
total_tokens: N
---

# Codebase Map

> Auto-generated by Cartographer. Last mapped: [date]

## System Overview

[2-3 paragraph summary of what this codebase does]

## Architecture

```mermaid
graph TB
    subgraph Client
        Web[Web App]
    end

    subgraph API
        Server[API Server]
        Auth[Auth Middleware]
    end

    subgraph Data
        DB[(Database)]
        Cache[(Cache)]
    end

    Web --> Server
    Server --> Auth
    Server --> DB
    Server --> Cache
```

[Adapt diagram to match actual architecture]

## Directory Structure

[Tree with purpose annotations]

## Module Guide

### [Module Name]

**Purpose**: [description]
**Entry point**: [file]
**Key files**:

| File | Purpose | Tokens |
|------|---------|--------|

**Exports**: [key APIs]
**Dependencies**: [what it needs]
**Dependents**: [what needs it]

[Repeat for each module]

## Data Flow

```mermaid
sequenceDiagram
    participant User
    participant Web
    participant API
    participant DB

    User->>Web: Action
    Web->>API: Request
    API->>DB: Query
    DB-->>API: Result
    API-->>Web: Response
    Web-->>User: Update UI
```

[Create diagrams for: auth flow, main data operations, etc.]

## Conventions

[Naming patterns, code style, architectural rules]

## Gotchas

[Non-obvious behaviors, warnings, things that trip people up]

## Navigation Guide

**To add a new API endpoint**: [files to touch]
**To add a new component**: [files to touch]
**To modify auth**: [files to touch]
**To add a database migration**: [files to touch]

[etc. based on codebase type]
````

### 7. Update CLAUDE.md

Add or update the codebase summary in `CLAUDE.md`:

```markdown
## Codebase Overview

[2-3 sentence summary]

**Stack**: [key technologies]
**Structure**: [high-level layout]

For detailed architecture, see [docs/CODEBASE_MAP.md](docs/CODEBASE_MAP.md).
```

If `AGENTS.md` exists, update it the same way.

## Update Mode

When updating an existing map:

1. Identify changed files from git or from a scanner diff.
2. Spawn subagents only for the changed modules.
3. Merge the new analysis into the existing map.
4. Update the `last_mapped` timestamp.
5. Preserve unchanged sections.

## Token Budget Reference

| Model  | Context Window | Safe Budget per Subagent |
|--------|----------------|--------------------------|
| Sonnet | 200,000        | 150,000                  |
| Opus   | 200,000        | 100,000                  |
| Haiku  | 200,000        | 100,000                  |

Always use Sonnet subagents: they offer the best balance of capability and cost for file analysis.

## Troubleshooting

**Scanner fails with tiktoken error:**

```bash
pip install tiktoken
# or with uv:
uv pip install tiktoken
```

**Python not found:** Try `python3`, `python`, or use `uv run`, which handles Python automatically.
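That fallback order can be automated with a small wrapper. This is a hypothetical sketch, not part of Cartographer: the function names `pick_launcher` and `run_scanner` are illustrative, and it assumes the scanner lives at the path used in step 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Cartographer): run the scanner with
# whichever launcher is on PATH, trying uv first, then python3, then python.
SCANNER="$HOME/.claude/skills/cartographer/scripts/scan-codebase.py"

# Print the first available launcher; return 1 if none is found.
pick_launcher() {
  if command -v uv >/dev/null 2>&1; then
    echo "uv run"
  elif command -v python3 >/dev/null 2>&1; then
    echo "python3"
  elif command -v python >/dev/null 2>&1; then
    echo "python"
  else
    echo "no uv, python3, or python found on PATH" >&2
    return 1
  fi
}

# Run the scanner on the given directory (default: current directory).
run_scanner() {
  local launcher
  launcher=$(pick_launcher) || return 1
  # $launcher is left unquoted on purpose so "uv run" splits into two words.
  $launcher "$SCANNER" "${1:-.}" --format json
}
```

Usage: `run_scanner .` or `run_scanner path/to/repo`. Checking `uv` first mirrors the preference order in step 2, since `uv run` can also provide the Python interpreter itself.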
**Codebase too large even for subagents:**

- Increase the number of subagents
- Focus on `src/` directories and skip vendored code
- Use the `--max-tokens` flag to skip huge files

**Git not available:**

- Fall back to file count/path comparison
- Store a hash of the file list in the map frontmatter for change detection

## Output

After completion, report what was created:

- `docs/CODEBASE_MAP.md` - full architecture documentation
- Updated `CLAUDE.md` with a summary

If cartographer helped you, consider starring: https://github.com/kingbootoshi/cartographer