---
name: understand
description: Analyze a codebase to produce an interactive knowledge graph for understanding architecture, components, and relationships
argument-hint: ["[path] [--full|--auto-update|--no-auto-update|--review|--language ]"]
---

# /understand

Analyze the current codebase and produce a `knowledge-graph.json` file in `.understand-anything/`. This file powers the interactive dashboard for exploring the project's architecture.

## Options

- `$ARGUMENTS` may contain:
  - `--full` — Force a full rebuild, ignoring any existing graph
  - `--auto-update` — Enable automatic graph updates on commit (writes `autoUpdate: true` to `.understand-anything/config.json`)
  - `--no-auto-update` — Disable automatic graph updates (writes `autoUpdate: false` to `.understand-anything/config.json`)
  - `--review` — Run full LLM graph-reviewer instead of inline deterministic validation
  - `--language ` — Generate all textual content (summaries, descriptions, tags, titles, languageNotes, languageLesson) in the specified language. Accepts ISO 639-1 codes (`zh`, `ja`, `ko`, `en`, `es`, `fr`, `de`, etc.) or friendly names (`chinese`, `japanese`, `korean`, `english`, `spanish`, etc.). Locale variants supported: `zh-TW`, `zh-HK`, etc. Defaults to `en` (English). Stores preference in `.understand-anything/config.json` for consistency across incremental updates.
  - A directory path (e.g. `/path/to/repo` or `../other-project`) — Analyze the given directory instead of the current working directory

---

## Progress Reporting

Throughout execution, report progress to the user at each phase transition and during batch processing. This keeps users informed on large codebases where analysis can take a long time.
- **Phase transitions:** At the start of each phase, print a status line:
  > `[Phase N/7] ...`
  >
  > Example: `[Phase 2/7] Analyzing files (12 batches)...`
- **Batch progress:** During Phase 2, report each batch with its index and total:
  > `Analyzing batch X/N (files: foo.ts, bar.ts, ...)` (list up to 3 filenames, then `...` if more)
- **Phase completion:** When a phase finishes, briefly confirm:
  > `Phase N complete. `
  >
  > Example: `Phase 1 complete. Found 247 files across 3 languages.`

---

## Phase 0 — Pre-flight

Determine whether to run a full analysis or incremental update.

1. **Resolve `PROJECT_ROOT`:**
   - Parse `$ARGUMENTS` for a non-flag token (any argument that does not start with `--`). If found, treat it as the target directory path.
   - If the path is relative, resolve it against the current working directory.
   - Verify the resolved path exists and is a directory (run `test -d `). If it does not exist or is not a directory, report an error to the user and **STOP**.
   - Set `PROJECT_ROOT` to the resolved absolute path.
   - If no directory path argument is found, set `PROJECT_ROOT` to the current working directory.
   - **Worktree redirect.** If `PROJECT_ROOT` is inside a git worktree (not the main checkout), redirect output to the main repository root. Worktrees managed by Claude Code are ephemeral — `.understand-anything/` written there is destroyed when the session ends, taking the knowledge graph with it (issue #133). Detect a worktree by comparing `git rev-parse --git-dir` against `git rev-parse --git-common-dir`; in a normal checkout or submodule they resolve to the same path, in a worktree they differ and the parent of `--git-common-dir` is the main repo root.
   ```bash
   COMMON_DIR=$(git -C "$PROJECT_ROOT" rev-parse --git-common-dir 2>/dev/null)
   GIT_DIR=$(git -C "$PROJECT_ROOT" rev-parse --git-dir 2>/dev/null)
   if [ -n "$COMMON_DIR" ] && [ -n "$GIT_DIR" ]; then
     COMMON_ABS=$(cd "$PROJECT_ROOT" && cd "$COMMON_DIR" 2>/dev/null && pwd -P)
     GIT_ABS=$(cd "$PROJECT_ROOT" && cd "$GIT_DIR" 2>/dev/null && pwd -P)
     if [ -n "$COMMON_ABS" ] && [ "$COMMON_ABS" != "$GIT_ABS" ]; then
       MAIN_ROOT=$(dirname "$COMMON_ABS")
       if [ -d "$MAIN_ROOT" ] && [ "${UNDERSTAND_NO_WORKTREE_REDIRECT:-0}" != "1" ]; then
         echo "[understand] Detected git worktree at $PROJECT_ROOT"
         echo "[understand] Redirecting output to main repo root: $MAIN_ROOT"
         echo "[understand] (Set UNDERSTAND_NO_WORKTREE_REDIRECT=1 to keep PROJECT_ROOT as the worktree.)"
         PROJECT_ROOT="$MAIN_ROOT"
       fi
     fi
   fi
   ```

   Set `UNDERSTAND_NO_WORKTREE_REDIRECT=1` if you intentionally want a per-worktree graph (rare — most users want the redirect).

1.5. **Ensure the plugin is built.** Later phases invoke Node scripts that import `@understand-anything/core`. On a fresh install `packages/core/dist/` does not exist yet — build once.

   **Important:** do **not** assume the plugin root is simply two directories above the skill path string. In many installations `~/.agents/skills/understand` is a symlink into the real plugin checkout. Prefer runtime-provided plugin roots first (for Claude), then fall back to universal symlinks, skill symlink resolution, and common clone-based install paths.

   Resolve the plugin root like this:

   ```bash
   SKILL_REAL=$(realpath ~/.agents/skills/understand 2>/dev/null || readlink -f ~/.agents/skills/understand 2>/dev/null || echo "")
   SELF_RELATIVE=$([ -n "$SKILL_REAL" ] && cd "$SKILL_REAL/../.." 2>/dev/null && pwd || echo "")
   COPILOT_SKILL_REAL=$(realpath ~/.copilot/skills/understand 2>/dev/null || readlink -f ~/.copilot/skills/understand 2>/dev/null || echo "")
   COPILOT_SELF_RELATIVE=$([ -n "$COPILOT_SKILL_REAL" ] && cd "$COPILOT_SKILL_REAL/../.." 2>/dev/null && pwd || echo "")

   PLUGIN_ROOT=""
   for candidate in \
     "${CLAUDE_PLUGIN_ROOT}" \
     "$HOME/.understand-anything-plugin" \
     "$SELF_RELATIVE" \
     "$COPILOT_SELF_RELATIVE" \
     "$HOME/.codex/understand-anything/understand-anything-plugin" \
     "$HOME/.opencode/understand-anything/understand-anything-plugin" \
     "$HOME/.pi/understand-anything/understand-anything-plugin" \
     "$HOME/understand-anything/understand-anything-plugin"; do
     if [ -n "$candidate" ] && [ -f "$candidate/package.json" ] && [ -f "$candidate/pnpm-workspace.yaml" ]; then
       PLUGIN_ROOT="$candidate"
       break
     fi
   done

   if [ -z "$PLUGIN_ROOT" ]; then
     echo "Error: Cannot find the understand-anything plugin root."
     echo "Checked:"
     echo " - ${CLAUDE_PLUGIN_ROOT:-}"
     echo " - $HOME/.understand-anything-plugin"
     echo " - ${SELF_RELATIVE:-}"
     echo " - ${COPILOT_SELF_RELATIVE:-}"
     echo " - $HOME/.codex/understand-anything/understand-anything-plugin"
     echo " - $HOME/.opencode/understand-anything/understand-anything-plugin"
     echo " - $HOME/.pi/understand-anything/understand-anything-plugin"
     echo " - $HOME/understand-anything/understand-anything-plugin"
     echo "Make sure the plugin is installed correctly."
     exit 1
   fi

   # Build the core package only when its compiled entry point is missing.
   if [ ! -f "$PLUGIN_ROOT/packages/core/dist/index.js" ]; then
     cd "$PLUGIN_ROOT" && (pnpm install --frozen-lockfile 2>/dev/null || pnpm install) && pnpm --filter @understand-anything/core build
   fi
   ```

   If `pnpm` is missing, report to the user: "Install Node.js ≥ 22 and pnpm ≥ 10, then re-run `/understand`."

2. Get the current git commit hash of the target project:

   ```bash
   git -C "$PROJECT_ROOT" rev-parse HEAD
   ```

3. Create the intermediate and temp output directories:

   ```bash
   mkdir -p "$PROJECT_ROOT/.understand-anything/intermediate"
   mkdir -p "$PROJECT_ROOT/.understand-anything/tmp"
   ```

3.5.
**Auto-update configuration:**
   - If `--auto-update` is in `$ARGUMENTS`: write `{"autoUpdate": true}` to `$PROJECT_ROOT/.understand-anything/config.json`
   - If `--no-auto-update` is in `$ARGUMENTS`: write `{"autoUpdate": false}` to `$PROJECT_ROOT/.understand-anything/config.json`
   - These flags only set the config — analysis proceeds normally regardless.

3.6. **Language configuration:**
   - Parse `$ARGUMENTS` for `--language ` flag. If found, extract the language code.
   - **Language code normalization:** Map friendly names to ISO codes:
     - `chinese` → `zh`, `japanese` → `ja`, `korean` → `ko`, `english` → `en`, `spanish` → `es`, `french` → `fr`, `german` → `de`, `portuguese` → `pt`, `russian` → `ru`, `arabic` → `ar`, etc.
     - Locale variants: `zh-TW`, `zh-HK`, `zh-CN`, `pt-BR`, etc. are preserved as-is.
   - If `--language` is NOT specified:
     - **Stored preference wins.** If `$PROJECT_ROOT/.understand-anything/config.json` has an `outputLanguage` field, set `$OUTPUT_LANGUAGE` to it and skip the rest.
     - **Otherwise detect (first run only).** Infer the predominant language of the user's conversation as an ISO 639-1 code (`$DETECTED_LANG`). If it is `en` or cannot be confidently determined, set `$OUTPUT_LANGUAGE=en` and proceed silently — no prompt (English users see no change).
     - **If `$DETECTED_LANG` ≠ `en`, confirm once before analyzing:** tell the user you detected `` and ask whether to generate all content in it; they press Enter/"yes" to accept, or type another language code/name to override (normalize via the friendly-name map above). If running non-interactively (no reply possible), skip the wait, use `$DETECTED_LANG`, and print a one-line notice instead of blocking.
     - **Persist** the resolved `$OUTPUT_LANGUAGE` (including `en`) into `config.json` so it never re-prompts for this project.
   - If `--language` IS specified:
     - Update `$PROJECT_ROOT/.understand-anything/config.json` with the new language: merge `{"outputLanguage": ""}` into existing config.
     - Store as `$OUTPUT_LANGUAGE` for use throughout all phases.
   - **Language directive template:** Store as `$LANGUAGE_DIRECTIVE`:

     ```markdown
     > **Language directive**: Generate all textual content (summaries, descriptions, tags, titles, languageNotes, languageLesson) in **{language}**. Maintain technical accuracy while using natural, native-level phrasing in the target language. Keep technical terms in English when no standard translation exists (e.g., "middleware", "hook", "barrel").
     ```

4. **Check for subdomain knowledge graphs to merge:** List all `*knowledge-graph*.json` files in `$PROJECT_ROOT/.understand-anything/` **excluding** `knowledge-graph.json` itself (e.g. `frontend-knowledge-graph.json`, `backend-knowledge-graph.json`). If any subdomain graphs exist, run the merge script bundled with this skill (located next to this SKILL.md file — use the skill directory path, not the project root):

   ```bash
   python /merge-subdomain-graphs.py "$PROJECT_ROOT"
   ```

   The script discovers subdomain graphs, loads the existing `knowledge-graph.json` as a base (if present), and merges everything into `knowledge-graph.json` (deduplicating nodes and edges). Report the merge summary to the user, then continue with the merged graph.

5. Check if `$PROJECT_ROOT/.understand-anything/knowledge-graph.json` exists. If it does, read it.

6. Check if `$PROJECT_ROOT/.understand-anything/meta.json` exists. If it does, read it to get `gitCommitHash`.

7. **Decision logic:**

   | Condition | Action |
   |---|---|
   | `--full` flag in `$ARGUMENTS` | Full analysis (all phases) |
   | No existing graph or meta | Full analysis (all phases) |
   | `--review` flag + existing graph + unchanged commit hash | Skip to Phase 6 (review-only — reuse existing assembled graph) |
   | Existing graph + unchanged commit hash | Ask the user: "The graph is up to date at this commit. Would you like to: **(a)** run a full rebuild (`--full`), **(b)** run the LLM graph reviewer (`--review`), or **(c)** do nothing?" Then follow their choice. If they pick (c), STOP. |
   | Existing graph + changed files | Incremental update (re-analyze changed files only) |

   **Review-only path:** Copy the existing `knowledge-graph.json` to `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`, then jump directly to Phase 6 step 3.

   For incremental updates, get the changed file list:

   ```bash
   git diff ..HEAD --name-only
   ```

   If this returns no files, report "Graph is up to date" and STOP.

8. **Collect project context for subagent injection:**
   - Read `README.md` (or `README.rst`, `readme.md`) from `$PROJECT_ROOT` if it exists. Store as `$README_CONTENT` (first 3000 characters).
   - Read the primary package manifest (`package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`, `pom.xml`) if it exists. Store as `$MANIFEST_CONTENT`.
   - Capture the top-level directory tree:

     ```bash
     find "$PROJECT_ROOT" -maxdepth 2 -type f -not -path '*/node_modules/*' -not -path '*/.git/*' -not -path '*/dist/*' | head -100
     ```

     Store as `$DIR_TREE`.
   - Detect the project entry point by checking for common patterns (in order): `src/index.ts`, `src/main.ts`, `src/App.tsx`, `index.js`, `main.py`, `manage.py`, `app.py`, `wsgi.py`, `asgi.py`, `run.py`, `__main__.py`, `main.go`, `cmd/*/main.go`, `src/main.rs`, `src/lib.rs`, `src/main/java/**/Application.java`, `Program.cs`, `config.ru`, `index.php`. Store first match as `$ENTRY_POINT`.

---

## Phase 0.5 — Ignore Configuration

Set up and verify the `.understandignore` file before scanning.

1. Check if `$PROJECT_ROOT/.understand-anything/.understandignore` exists.

2.
   **If it does NOT exist**, generate a starter file:
   - Run the following Node.js one-liner in `$PROJECT_ROOT` (reads `.gitignore` and deduplicates against built-in defaults):

     ```bash
     node -e "
     const fs = require('fs');
     const path = require('path');
     const root = process.cwd();
     const defaults = ['node_modules/','node_modules','.git/','vendor/','venv/','.venv/','__pycache__/',
       'dist/','dist','build/','build','out/','coverage/','coverage','.next/','.cache/','.turbo/','target/','obj/',
       '*.lock','package-lock.json','yarn.lock','pnpm-lock.yaml',
       '*.png','*.jpg','*.jpeg','*.gif','*.svg','*.ico','*.woff','*.woff2','*.ttf','*.eot',
       '*.mp3','*.mp4','*.pdf','*.zip','*.tar','*.gz','*.min.js','*.min.css','*.map','*.generated.*',
       '.idea/','.vscode/','LICENSE','.gitignore','.editorconfig','.prettierrc','.eslintrc*','*.log'];
     const norm = p => p.replace(/\/+$/, '');
     const defaultSet = new Set(defaults.map(norm));
     const header = '# .understandignore — patterns for files/dirs to exclude from analysis\n'
       + '# Syntax: same as .gitignore (globs, # comments, ! negation, trailing / for dirs)\n'
       + '# Lines below are suggestions — uncomment to activate.\n'
       + '# Use ! prefix to force-include something excluded by defaults.\n'
       + '#\n'
       + '# Built-in defaults (always excluded unless negated):\n'
       + '# node_modules/, .git/, dist/, build/, obj/, *.lock, *.min.js, etc.\n'
       + '#\n';
     let body = '';
     const gitignorePath = path.join(root, '.gitignore');
     if (fs.existsSync(gitignorePath)) {
       const gi = fs.readFileSync(gitignorePath, 'utf-8').split('\n').map(l => l.trim())
         .filter(l => l && !l.startsWith('#'))
         .filter(p => !defaultSet.has(norm(p)));
       if (gi.length) {
         body += '# --- From .gitignore (uncomment to exclude) ---\n\n' + gi.map(p => '# ' + p).join('\n') + '\n\n';
       }
     }
     const dirs = ['__tests__','test','tests','fixtures','testdata','docs','examples','scripts','migrations','.storybook'];
     const found = dirs.filter(d => fs.existsSync(path.join(root, d)));
     if (found.length) {
       body += '# --- Detected directories (uncomment to exclude) ---\n\n' + found.map(d => '# ' + d + '/').join('\n') + '\n\n';
     }
     body += '# --- Test file patterns (uncomment to exclude) ---\n\n# *.test.*\n# *.spec.*\n# *.snap\n';
     const outDir = path.join(root, '.understand-anything');
     if (!fs.existsSync(outDir)) fs.mkdirSync(outDir, { recursive: true });
     fs.writeFileSync(path.join(outDir, '.understandignore'), header + body);
     "
     ```
   - Report to the user:
     > Generated `.understand-anything/.understandignore` with suggested exclusions based on your project structure. Please review it and uncomment any patterns you'd like to exclude from analysis. When ready, confirm to continue.
   - **Wait for user confirmation before proceeding.**

3. **If it already exists**, report:
   > Found `.understand-anything/.understandignore`. Review it if needed, then confirm to continue.
   - **Wait for user confirmation before proceeding.**

4. After confirmation, proceed to Phase 1.

---

## Phase 1 — SCAN (Full analysis only)

Report to the user: `[Phase 1/7] Scanning project files...`

Dispatch a subagent using the `project-scanner` agent definition (at `agents/project-scanner.md`).
Append the following additional context:

> **Additional context from main session:**
>
> Project README (first 3000 chars):
> ```
> $README_CONTENT
> ```
>
> Package manifest:
> ```
> $MANIFEST_CONTENT
> ```
>
> Use this context to produce more accurate project name, description, and framework detection. The README and manifest are authoritative — prefer their information over heuristics.
>
> $LANGUAGE_DIRECTIVE

Pass these parameters in the dispatch prompt:

> Scan this project directory to discover all project files (including non-code files like configs, docs, infrastructure), detect languages and frameworks.
> Project root: `$PROJECT_ROOT`
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/scan-result.json`

After the subagent completes, read `$PROJECT_ROOT/.understand-anything/intermediate/scan-result.json` to get:
- Project name, description
- Languages, frameworks
- File list with line counts and `fileCategory` per file (`code`, `config`, `docs`, `infra`, `data`, `script`, `markup`)
- Complexity estimate
- Import map (`importMap`): pre-resolved project-internal imports per file (non-code files have empty arrays)

Store `importMap` in memory as `$IMPORT_MAP` for use in Phase 2 batch construction. Store the file list as `$FILE_LIST` with `fileCategory` metadata for use in Phase 2 batch construction.

**Gate check:** If the scan finds more than 100 files, inform the user and suggest scoping the run with a subdirectory argument. Either wait for the user to confirm before proceeding, or proceed with a note that the analysis may take a while.

If the scan result includes `filteredByIgnore > 0`, report:
> Excluded {filteredByIgnore} files via `.understandignore`.

---

## Phase 1.5 — BATCH

Report: `[Phase 1.5/7] Computing semantic batches...`

Run the bundled batching script:

```bash
node /compute-batches.mjs "$PROJECT_ROOT"
```

Reads `.understand-anything/intermediate/scan-result.json`, writes `.understand-anything/intermediate/batches.json`. Capture stderr.
Append any line starting with `Warning:` to `$PHASE_WARNINGS` for the final report.

If the script exits non-zero, the failure is hard — relay the full stderr to the user as a Phase 1.5 failure. Do not attempt to recover; the script's internal fallback (count-based) already handles recoverable issues. A non-zero exit means a fundamental problem (missing input file, malformed JSON, etc.).

---

## Phase 2 — ANALYZE

### Full analysis path

Load `.understand-anything/intermediate/batches.json` (produced by Phase 1.5). Iterate the `batches[]` array.

Report: `[Phase 2/7] Analyzing files — files in batches (up to 5 concurrent)...`

For each batch, dispatch a subagent using the `file-analyzer` agent definition (at `agents/file-analyzer.md`). Run up to **5 subagents concurrently**. Append the following additional context:

> **Additional context from main session:**
>
> Project: `` — ``
> Languages: ``
>
> $LANGUAGE_DIRECTIVE

Dispatch prompt template (fill in batch-specific values from `batches.json[i]`):

> Analyze these files and produce GraphNode and GraphEdge objects.
> Project root: `$PROJECT_ROOT`
> Project: ``
> Languages: ``
> Batch: `/`
> Skill directory (for bundled scripts): ``
> Output: write to `$PROJECT_ROOT/.understand-anything/intermediate/batch-.json` (single-file mode) OR `batch--part-.json` (split mode, per Step B of your output protocol).
>
> Pre-resolved import data for this batch (use directly — do NOT re-resolve imports from source):
> ```json
>
> ```
>
> Cross-batch neighbors with their exported symbols (confidence boost for cross-batch edges):
> ```json
>
> ```
>
> Files to analyze in this batch (every entry MUST be passed through to `batchFiles` with all four fields — `path`, `language`, `sizeLines`, `fileCategory`):
> 1. `` ( lines, language: ``, fileCategory: ``)
> 2. `` ( lines, language: ``, fileCategory: ``)
> ...
**Output naming is per-batchIndex — no fusion.** If you fuse multiple small batches into a single file-analyzer dispatch for token efficiency, the dispatched agent must STILL write one output file per original `batchIndex` using `batch-.json` or `batch--part-.json`. The merge script's regex (`batch-(\d+)(?:-part-(\d+))?\.json`) silently drops any other naming (e.g., `batch-fused-8-13.json`, `batch-8-13.json`), losing every node and edge in that file. After each dispatch returns, verify each `batchIndex` in the dispatched input has a corresponding `batch-.json` (or `batch--part-*.json`) on disk before proceeding to the next dispatch.

After ALL batches complete, report to the user: `Phase 2 complete. All batches analyzed.`

Run the merge-and-normalize script bundled with this skill (located next to this SKILL.md file — use the skill directory path, not the project root):

```bash
python /merge-batch-graphs.py "$PROJECT_ROOT"
```

This script reads all `batch-*.json` files (including `batch--part-.json` produced by file-analyzers that split their output) from `$PROJECT_ROOT/.understand-anything/intermediate/`, then in one pass:
- Combines all nodes and edges across batches
- Normalizes node IDs (strips double prefixes, project-name prefixes, adds missing prefixes)
- Normalizes complexity values (`low`→`simple`, `medium`→`moderate`, `high`→`complex`, etc.)
- Rewrites edge references to match corrected node IDs
- Deduplicates nodes by ID (keeps last occurrence) and edges by `(source, target, type)`
- Drops dangling edges referencing missing nodes
- Logs all corrections and dropped items to stderr

The merge script also runs a `tested_by` linker that canonicalizes test-coverage edges in two passes. **Pass 1** walks LLM-emitted `tested_by` edges and flips inverted ones in place; semantically broken edges (test↔test, prod↔prod, orphan endpoints) are dropped. **Pass 2** supplements with path-convention pairings.
Production nodes that end up sourcing any `tested_by` edge get a `"tested"` tag. All resulting edges run `production → test`.

Output: `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`

Include the script's warnings in `$PHASE_WARNINGS` for the reviewer.

### Incremental update path

Write the changed-files list (one path per line) to a temp file:

```bash
git diff ..HEAD --name-only > "$PROJECT_ROOT/.understand-anything/tmp/changed-files.txt"
```

Run compute-batches with `--changed-files`:

```bash
node /compute-batches.mjs "$PROJECT_ROOT" \
  --changed-files="$PROJECT_ROOT/.understand-anything/tmp/changed-files.txt"
```

This produces a `batches.json` that contains only batches with changed files, but neighborMap entries still reference unchanged files (with their full-graph batchIndex) so cross-batch edges remain emittable.

Then dispatch file-analyzer subagents per the same template as the full path.

After batches complete:
1. Remove old nodes whose `filePath` matches any changed file from the existing graph
2. Remove old edges whose `source` or `target` references a removed node
3. Write the pruned existing nodes/edges as `batch-existing.json` in the intermediate directory
4. Run the same merge script — it will combine `batch-existing.json` with the fresh `batch-*.json` files:

   ```bash
   python /merge-batch-graphs.py "$PROJECT_ROOT"
   ```

---

## Phase 3 — ASSEMBLE REVIEW

Report to the user: `[Phase 3/7] Reviewing assembled graph...`

Dispatch a subagent using the `assemble-reviewer` agent definition (at `agents/assemble-reviewer.md`). Pass these parameters in the dispatch prompt:

> Review the assembled graph at `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`.
> Project root: `$PROJECT_ROOT`
> Batch files are at: `$PROJECT_ROOT/.understand-anything/intermediate/batch-*.json`
> Write review output to: `$PROJECT_ROOT/.understand-anything/intermediate/assemble-review.json`
>
> **Merge script report:**
> ```
>
> ```
>
> **Import map for cross-batch edge verification:**
> ```json
> $IMPORT_MAP
> ```

After the subagent completes, read `$PROJECT_ROOT/.understand-anything/intermediate/assemble-review.json` and add any notes to `$PHASE_WARNINGS`.

---

## Phase 4 — ARCHITECTURE

Report to the user: `[Phase 4/7] Identifying architectural layers...`

**Build the combined prompt template:**

1. Use the `architecture-analyzer` agent definition (at `agents/architecture-analyzer.md`).

2. **Language context injection:** For each language detected in Phase 1 (e.g., `python`, `markdown`, `dockerfile`, `yaml`, `sql`, `terraform`, `graphql`, `protobuf`, `shell`, `html`, `css`), read the file at `./languages/.md` (e.g., `./languages/python.md`, `./languages/dockerfile.md`) and append its content after the base template under a `## Language Context` header. If the file does not exist for a detected language, skip it silently and continue. These files are in the `languages/` subdirectory next to this SKILL.md file. **Include non-code language snippets** — they provide edge patterns and summary styles for non-code files.

3. **Framework addendum injection:** For each framework detected in Phase 1 (e.g., `Django`), read the file at `./frameworks/.md` (e.g., `./frameworks/django.md`) and append its full content after the language context. If the file does not exist for a detected framework, skip it silently and continue. These files are in the `frameworks/` subdirectory next to this SKILL.md file.

4.
   **Output locale injection:** If `$OUTPUT_LANGUAGE` is NOT `en` (English), read the locale guidance file at `./locales/.md` (e.g., `./locales/zh.md`, `./locales/ja.md`, `./locales/ko.md`) and append its content after the framework addendums under a `## Output Language Guidelines` header. This provides language-specific guidance for tag naming conventions, summary style, and layer name translations. If the locale file does not exist for the specified language, skip silently — the `$LANGUAGE_DIRECTIVE` still applies. These files are in the `locales/` subdirectory next to this SKILL.md file.

Append the language/framework context and the following additional context to the agent's prompt:

> **Additional context from main session:**
>
> Frameworks detected: ``
>
> Directory tree (top 2 levels):
> ```
> $DIR_TREE
> ```
>
> Use the directory tree, language context, and framework addendums (appended above) to inform layer assignments. Directory structure is strong evidence for layer boundaries. Non-code files (config, docs, infrastructure, data) should be assigned to appropriate layers — see the prompt template for guidance.
>
> $LANGUAGE_DIRECTIVE

Pass these parameters in the dispatch prompt:

> Analyze this codebase's structure to identify architectural layers.
> Project root: `$PROJECT_ROOT`
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/layers.json`
> Project: `` — ``
>
> File nodes (all node types — includes code files, config, document, service, pipeline, table, schema, resource, endpoint):
> ```json
> [list of {id, type, name, filePath, summary, tags} for ALL file-level nodes — omit complexity, languageNotes]
> ```
>
> Import edges:
> ```json
> [list of edges with type "imports"]
> ```
>
> All edges (for cross-category analysis — includes configures, documents, deploys, triggers, etc.):
> ```json
> [list of ALL edges — include all edge types]
> ```

After the subagent completes, read `$PROJECT_ROOT/.understand-anything/intermediate/layers.json` and normalize it into a final `layers` array. Apply these steps **in order**:

1. **Unwrap envelope:** If the file contains `{ "layers": [...] }` instead of a plain array, extract the inner array. (The prompt requests a plain array, but LLMs may still produce an envelope.)
2. **Rename legacy fields:** If any layer object has a `nodes` field instead of `nodeIds`, rename `nodes` → `nodeIds`. If `nodes` entries are objects with an `id` field rather than plain strings, extract just the `id` values into `nodeIds`.
3. **Synthesize missing IDs:** If any layer is missing an `id`, generate one as `layer:`.
4. **Convert file paths:** If `nodeIds` entries are raw file paths without a known prefix (`file:`, `config:`, `document:`, `service:`, `pipeline:`, `table:`, `schema:`, `resource:`, `endpoint:`), convert them to `file:`.
5. **Drop dangling refs:** Remove any `nodeIds` entries that do not exist in the merged node set.

Each element of the final `layers` array MUST have this shape:

```json
[
  {
    "id": "layer:",
    "name": "",
    "description": "",
    "nodeIds": ["file:src/App.tsx", "config:tsconfig.json", "document:README.md"]
  }
]
```

All four fields (`id`, `name`, `description`, `nodeIds`) are required.
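
The five layer-normalization steps above can be sketched in Node.js as follows. This is an illustrative sketch, not one of the skill's bundled scripts: `normalizeLayers` and `slugify` are hypothetical names, and deriving the part after `layer:` from a lowercase slug of the layer name is an assumption, since the exact ID format is not spelled out here. It expects the parsed contents of `layers.json` and a `Set` of merged node IDs.

```javascript
// Hypothetical helper; not part of the skill's bundled scripts.
function normalizeLayers(raw, nodeIdSet) {
  const prefixes = ['file:', 'config:', 'document:', 'service:', 'pipeline:',
    'table:', 'schema:', 'resource:', 'endpoint:'];
  // Assumption: the synthesized ID suffix is a lowercase slug of the layer name.
  const slugify = s => String(s).toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');

  // Step 1: unwrap a { "layers": [...] } envelope if present.
  const list = Array.isArray(raw)
    ? raw
    : (raw && Array.isArray(raw.layers) ? raw.layers : []);

  return list.map(layer => {
    // Step 2: accept the legacy `nodes` field and pull `id` out of object entries.
    const entries = layer.nodeIds ?? layer.nodes ?? [];
    const nodeIds = entries
      .map(e => (e && typeof e === 'object' ? e.id : e))
      .filter(id => typeof id === 'string')
      // Step 4: treat entries without a known prefix as raw file paths.
      .map(id => (prefixes.some(p => id.startsWith(p)) ? id : 'file:' + id))
      // Step 5: drop references to nodes missing from the merged node set.
      .filter(id => nodeIdSet.has(id));

    return {
      // Step 3: synthesize an ID when the layer has none.
      id: layer.id ?? 'layer:' + slugify(layer.name),
      name: layer.name ?? '',
      description: layer.description ?? '',
      nodeIds,
    };
  });
}
```

Calling it with an enveloped, legacy-shaped input such as `normalizeLayers({ layers: [{ name: 'UI Layer', nodes: [{ id: 'src/App.tsx' }] }] }, new Set(['file:src/App.tsx']))` returns a plain array whose single element carries all four required fields.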

**For incremental updates:** Always re-run architecture analysis on the full merged node set, since layer assignments may shift when files change.

**Context for incremental updates:** When re-running architecture analysis, also inject the previous layer definitions:

> Previous layer definitions (for naming consistency):
> ```json
> [previous layers from existing graph]
> ```
>
> Maintain the same layer names and IDs where possible. Only add/remove layers if the file structure has materially changed.

---

## Phase 5 — TOUR

Report to the user: `[Phase 5/7] Building guided tour...`

Dispatch a subagent using the `tour-builder` agent definition (at `agents/tour-builder.md`). Append the following additional context:

> **Additional context from main session:**
>
> Project README (first 3000 chars):
> ```
> $README_CONTENT
> ```
>
> Project entry point: `$ENTRY_POINT`
>
> Use the README to align the tour narrative with the project's own documentation. Start the tour from the entry point if one was detected. The tour should tell the same story the README tells, but through the lens of actual code structure.
>
> $LANGUAGE_DIRECTIVE

Pass these parameters in the dispatch prompt:

> Create a guided learning tour for this codebase.
> Project root: `$PROJECT_ROOT`
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/tour.json`
> Project: `` — ``
> Languages: ``
>
> Nodes (all file-level nodes — includes code files, config, document, service, pipeline, table, schema, resource, endpoint):
> ```json
> [list of {id, name, filePath, summary, type} for ALL file-level nodes — do NOT include function or class nodes]
> ```
>
> Layers:
> ```json
> [list of {id, name, description} for each layer — omit nodeIds]
> ```
>
> Edges (all types — includes imports, calls, configures, documents, deploys, triggers, etc.):
> ```json
> [list of ALL edges — include all edge types for complete graph topology analysis]
> ```

After the subagent completes, read `$PROJECT_ROOT/.understand-anything/intermediate/tour.json` and normalize it into a final `tour` array. Apply these steps **in order**:

1. **Unwrap envelope:** If the file contains `{ "steps": [...] }` instead of a plain array, extract the inner array. (The prompt requests a plain array, but LLMs may still produce an envelope.)
2. **Rename legacy fields:** If any step has `nodesToInspect` instead of `nodeIds`, rename it → `nodeIds`. If any step has `whyItMatters` instead of `description`, rename it → `description`.
3. **Convert file paths:** If `nodeIds` entries are raw file paths without a known prefix (`file:`, `config:`, `document:`, `service:`, `pipeline:`, `table:`, `schema:`, `resource:`, `endpoint:`), convert them to `file:`.
4. **Drop dangling refs:** Remove any `nodeIds` entries that do not exist in the merged node set.
5. **Sort** by `order` before saving.
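
The tour-normalization steps above can be sketched the same way. As before, this is a hedged illustration rather than a bundled script: `normalizeTour` is a hypothetical name, and it assumes the parsed contents of `tour.json` plus a `Set` of merged node IDs. Any extra fields on a step, such as the optional `languageLesson`, pass through untouched.

```javascript
// Hypothetical helper; not part of the skill's bundled scripts.
function normalizeTour(raw, nodeIdSet) {
  const prefixes = ['file:', 'config:', 'document:', 'service:', 'pipeline:',
    'table:', 'schema:', 'resource:', 'endpoint:'];

  // Step 1: unwrap a { "steps": [...] } envelope if present.
  const steps = Array.isArray(raw)
    ? raw
    : (raw && Array.isArray(raw.steps) ? raw.steps : []);

  return steps
    .map(step => {
      // Step 2: pull the legacy field names out so they are not copied through.
      const { nodesToInspect, whyItMatters, ...rest } = step;
      const nodeIds = (rest.nodeIds ?? nodesToInspect ?? [])
        .filter(id => typeof id === 'string')
        // Step 3: treat entries without a known prefix as raw file paths.
        .map(id => (prefixes.some(p => id.startsWith(p)) ? id : 'file:' + id))
        // Step 4: drop references to nodes missing from the merged node set.
        .filter(id => nodeIdSet.has(id));
      return {
        ...rest, // keeps order, title, and optional fields like languageLesson
        description: rest.description ?? whyItMatters ?? '',
        nodeIds,
      };
    })
    // Step 5: sort by `order` (numeric, ascending).
    .sort((a, b) => a.order - b.order);
}
```

Destructuring the legacy names out of each step before spreading the rest means the output never carries both `nodesToInspect` and `nodeIds`, which keeps the later required-field validation simple.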

Each element of the final `tour` array MUST have this shape:

```json
[
  {
    "order": 1,
    "title": "Project Overview",
    "description": "Start with the README to understand the project's purpose and architecture.",
    "nodeIds": ["document:README.md"]
  },
  {
    "order": 2,
    "title": "Application Entry Point",
    "description": "This step explains how the frontend boots and mounts.",
    "nodeIds": ["file:src/main.tsx", "file:src/App.tsx"]
  }
]
```

Required fields: `order`, `title`, `description`, `nodeIds`. Preserve optional `languageLesson` when present.

---

## Phase 6 — REVIEW

Report to the user: `[Phase 6/7] Validating knowledge graph...`

Assemble the full KnowledgeGraph JSON object:

```json
{
  "version": "1.0.0",
  "project": {
    "name": "",
    "languages": [""],
    "frameworks": [""],
    "description": "",
    "analyzedAt": "",
    "gitCommitHash": ""
  },
  "nodes": [],
  "edges": [],
  "layers": [],
  "tour": []
}
```

1. Before writing the assembled graph, validate that:
   - `layers` is an array of objects with these required fields: `id`, `name`, `description`, `nodeIds`
   - `tour` is an array of objects with these required fields: `order`, `title`, `description`, `nodeIds`
   - `tour[*].languageLesson` is allowed as an optional string field
   - Every `layers[*].nodeIds` entry exists in the merged node set
   - Every `tour[*].nodeIds` entry exists in the merged node set

   If validation fails, automatically normalize and rewrite the graph into this shape before saving. If the graph still fails final validation after the normalization pass, save it with warnings but mark dashboard auto-launch as skipped.

2. Write the assembled graph to `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`.

3.
   **Check `$ARGUMENTS` for `--review` flag.** Then run the appropriate validation path:

---

#### Default path (no `--review`): inline deterministic validation

Write the following Node.js script to `$PROJECT_ROOT/.understand-anything/tmp/ua-inline-validate.cjs`:

```javascript
#!/usr/bin/env node
// Usage: node ua-inline-validate.cjs <graphPath> <outputPath>
const fs = require('fs');
const graphPath = process.argv[2];
const outputPath = process.argv[3];
try {
  const graph = JSON.parse(fs.readFileSync(graphPath, 'utf8'));
  const issues = [], warnings = [];
  if (!Array.isArray(graph.nodes)) { issues.push('graph.nodes is missing or not an array'); graph.nodes = []; }
  if (!Array.isArray(graph.edges)) { issues.push('graph.edges is missing or not an array'); graph.edges = []; }

  // Nodes: required fields and duplicate IDs.
  const nodeIds = new Set();
  const seen = new Map();
  graph.nodes.forEach((n, i) => {
    if (!n.id) { issues.push(`Node[${i}] missing id`); return; }
    if (!n.type) issues.push(`Node[${i}] '${n.id}' missing type`);
    if (!n.name) issues.push(`Node[${i}] '${n.id}' missing name`);
    if (!n.summary) issues.push(`Node[${i}] '${n.id}' missing summary`);
    if (!n.tags || !n.tags.length) issues.push(`Node[${i}] '${n.id}' missing tags`);
    if (seen.has(n.id)) issues.push(`Duplicate node ID '${n.id}' at indices ${seen.get(n.id)} and ${i}`);
    else seen.set(n.id, i);
    nodeIds.add(n.id);
  });

  // Edges: both endpoints must reference existing nodes.
  graph.edges.forEach((e, i) => {
    if (!nodeIds.has(e.source)) issues.push(`Edge[${i}] source '${e.source}' not found`);
    if (!nodeIds.has(e.target)) issues.push(`Edge[${i}] target '${e.target}' not found`);
  });

  // Layers: every file-level node belongs to exactly one layer.
  const fileLevelTypes = new Set(['file', 'config', 'document', 'service', 'pipeline', 'table', 'schema', 'resource', 'endpoint']);
  const fileNodes = graph.nodes.filter(n => fileLevelTypes.has(n.type)).map(n => n.id);
  const assigned = new Map();
  if (!Array.isArray(graph.layers)) { if (graph.layers) warnings.push('graph.layers is not an array'); graph.layers = []; }
  if (!Array.isArray(graph.tour)) { if (graph.tour) warnings.push('graph.tour is not an array'); graph.tour = []; }
  graph.layers.forEach(layer => {
    (layer.nodeIds || []).forEach(id => {
      if (!nodeIds.has(id)) issues.push(`Layer '${layer.id}' refs missing node '${id}'`);
      if (assigned.has(id)) issues.push(`Node '${id}' appears in multiple layers`);
      assigned.set(id, layer.id);
    });
  });
  fileNodes.forEach(id => {
    if (!assigned.has(id)) issues.push(`File node '${id}' not in any layer`);
  });

  // Tour: steps may only reference existing nodes.
  graph.tour.forEach((step, i) => {
    (step.nodeIds || []).forEach(id => {
      if (!nodeIds.has(id)) issues.push(`Tour step[${i}] refs missing node '${id}'`);
    });
  });

  // Orphans are reported as warnings, not issues.
  const withEdges = new Set([
    ...graph.edges.map(e => e.source),
    ...graph.edges.map(e => e.target)
  ]);
  graph.nodes.forEach(n => {
    if (!withEdges.has(n.id)) warnings.push(`Node '${n.id}' has no edges (orphan)`);
  });

  const stats = {
    totalNodes: graph.nodes.length,
    totalEdges: graph.edges.length,
    totalLayers: graph.layers.length,
    tourSteps: graph.tour.length,
    nodeTypes: graph.nodes.reduce((a, n) => { a[n.type] = (a[n.type]||0)+1; return a; }, {}),
    edgeTypes: graph.edges.reduce((a, e) => { a[e.type] = (a[e.type]||0)+1; return a; }, {})
  };
  fs.writeFileSync(outputPath, JSON.stringify({ issues, warnings, stats }, null, 2));
  process.exit(0);
} catch (err) {
  process.stderr.write(err.message + '\n');
  process.exit(1);
}
```

Execute it (all paths are quoted so that a `$PROJECT_ROOT` containing spaces still works):

```bash
node "$PROJECT_ROOT/.understand-anything/tmp/ua-inline-validate.cjs" \
  "$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json" \
  "$PROJECT_ROOT/.understand-anything/intermediate/review.json"
```

If the script exits non-zero, read stderr, fix the script, and retry once.

---

#### `--review` path: full LLM reviewer

If `--review` IS in `$ARGUMENTS`, dispatch a subagent using the `graph-reviewer` agent definition (at `agents/graph-reviewer.md`).
Append the following additional context:

> **Additional context from main session:**
>
> Phase 1 scan results (file inventory):
> ```json
> [list of {path, sizeLines} from scan-result.json]
> ```
>
> Phase warnings/errors accumulated during analysis:
> - [list any batch failures, skipped files, or warnings from Phases 2-5]
>
> Cross-validate: every file in the scan inventory should have a corresponding node in the graph (node types may vary: `file:`, `config:`, `document:`, `service:`, `pipeline:`, `table:`, `schema:`, `resource:`, `endpoint:`). Flag any missing files. Also flag any graph nodes whose `filePath` doesn't appear in the scan inventory.

Pass these parameters in the dispatch prompt:

> Validate the knowledge graph at `$PROJECT_ROOT/.understand-anything/intermediate/assembled-graph.json`.
> Project root: `$PROJECT_ROOT`
> Read the file and validate it for completeness and correctness.
> Write output to: `$PROJECT_ROOT/.understand-anything/intermediate/review.json`

---

4. Read `$PROJECT_ROOT/.understand-anything/intermediate/review.json`.

5. **If `issues` array is non-empty:**
   - Review the `issues` list
   - Apply automated fixes where possible:
     - Remove edges with dangling references
     - Fill missing required fields with sensible defaults (e.g., empty `tags` -> `["untagged"]`, empty `summary` -> `"No summary available"`)
     - Remove nodes with invalid types
   - Re-run the final graph validation after automated fixes
   - If critical issues remain after one fix attempt, save the graph anyway but include the warnings in the final report and mark dashboard auto-launch as skipped

6. **If `issues` array is empty:** Proceed to Phase 7.

---

## Phase 7 — SAVE

Report to the user: `[Phase 7/7] Saving knowledge graph...`

1. Write the final knowledge graph to `$PROJECT_ROOT/.understand-anything/knowledge-graph.json`.

2.
   **Generate structural fingerprints baseline.** This creates the basis for future automatic incremental updates and **must succeed before `meta.json` is written**. Otherwise auto-update sees a fresh commit hash with no fingerprints to compare against, classifies every file as STRUCTURAL, and escalates to `FULL_UPDATE` on every subsequent commit (issue #152).

   Write the input file:

   ```bash
   cat > "$PROJECT_ROOT/.understand-anything/intermediate/fingerprint-input.json" <], "gitCommitHash": "" }
   EOF
   ```

   Then invoke the bundled script (located next to this SKILL.md):

   ```bash
   node /build-fingerprints.mjs \
     "$PROJECT_ROOT/.understand-anything/intermediate/fingerprint-input.json"
   ```

   The script uses `TreeSitterPlugin + PluginRegistry` exactly like `extract-structure.mjs`, so the baseline matches the comparison logic used during auto-updates.

   **If the script exits non-zero or stdout does not include `Fingerprints baseline:`, abort Phase 7 and report the error. Do NOT proceed to step 3 (writing `meta.json`).**

3. Write metadata to `$PROJECT_ROOT/.understand-anything/meta.json` (only after step 2 succeeded):

   ```json
   {
     "lastAnalyzedAt": "",
     "gitCommitHash": "",
     "version": "1.0.0",
     "analyzedFiles": 
   }
   ```

4. Clean up intermediate files, **preserving `scan-result.json`** so future incremental runs can skip Phase 1 SCAN (see issue #293):

   ```bash
   # Preserve scan-result.json: Phase 1's deterministic file inventory.
   # Future incremental runs (Phase 2 compute-batches.mjs --changed-files=…)
   # need this inventory; without it, Phase 1 must re-dispatch and pay ~157k
   # tokens / ~158s per incremental run.
   INTER="$PROJECT_ROOT/.understand-anything/intermediate"
   if [ -d "$INTER" ]; then
     find "$INTER" -mindepth 1 -maxdepth 1 -not -name 'scan-result.json' -exec rm -rf {} +
   fi
   # Quoted so a project root containing spaces cannot split the rm target.
   rm -rf "$PROJECT_ROOT/.understand-anything/tmp"
   ```

5.
   Report a summary to the user containing:
   - Project name and description
   - Files analyzed / total files (with breakdown by fileCategory: code, config, docs, infra, data, script, markup)
   - Nodes created (broken down by type: file, function, class, config, document, service, table, endpoint, pipeline, schema, resource)
   - Edges created (broken down by type)
   - Layers identified (with names)
   - Tour steps generated (count)
   - Any warnings from the Phase 6 validation or reviewer, plus warnings accumulated in earlier phases
   - Path to the output file: `$PROJECT_ROOT/.understand-anything/knowledge-graph.json`

6. Only automatically launch the dashboard by invoking the `/understand-dashboard` skill if final graph validation passed after normalization/review fixes. If final validation did not pass, report that the graph was saved with warnings and dashboard launch was skipped.

---

## Error Handling

- If any subagent dispatch fails, retry **once** with the same prompt plus additional context about the failure.
- If the dispatch fails a second time, skip that phase and continue with partial results.
- Track all warnings and errors from each phase in a `$PHASE_WARNINGS` list. When using `--review`, pass this list to the graph-reviewer in Phase 6. On the default path, include accumulated warnings in the Phase 7 final report.
- ALWAYS save partial results: a partial graph is better than no graph.
- Report any skipped phases or errors in the final summary so the user knows what happened.
- NEVER silently drop errors. Every failure must be visible in the final report.
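
The retry-once-then-skip policy above can be pictured with a small Node.js sketch. Everything here is hypothetical: `runPhaseWithRetry` and its `dispatch` callback are illustrative stand-ins for however a subagent is actually launched, the callback is assumed to throw on failure, and it is kept synchronous for brevity even though a real dispatch would likely be asynchronous. `phaseWarnings` plays the role of the `$PHASE_WARNINGS` list.

```javascript
// Hypothetical sketch: `dispatch` stands in for whatever launches a subagent.
// It is assumed to return a result on success and throw on failure.
function runPhaseWithRetry(phaseName, dispatch, prompt, phaseWarnings) {
  try {
    return { skipped: false, result: dispatch(prompt) };
  } catch (firstError) {
    // Never drop the failure silently: record it before retrying.
    phaseWarnings.push(`${phaseName}: first attempt failed (${firstError.message})`);

    // Retry once with the same prompt plus context about the failure.
    const retryPrompt = `${prompt}\n\nPrevious attempt failed with: ${firstError.message}`;
    try {
      return { skipped: false, result: dispatch(retryPrompt) };
    } catch (secondError) {
      // Second failure: skip the phase and continue with partial results.
      phaseWarnings.push(`${phaseName}: skipped after retry failed (${secondError.message})`);
      return { skipped: true, result: null };
    }
  }
}
```

A caller would check `skipped` to decide whether to continue with partial results, and would surface every entry of `phaseWarnings` in the final summary.
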

---

## Reference: KnowledgeGraph Schema

### Node Types (13 total)

| Type | Description | ID Convention |
|---|---|---|
| `file` | Source code file | `file:` |
| `function` | Function or method | `function::` |
| `class` | Class, interface, or type | `class::` |
| `module` | Logical module or package | `module:` |
| `concept` | Abstract concept or pattern | `concept:` |
| `config` | Configuration file (YAML, JSON, TOML, env) | `config:` |
| `document` | Documentation file (Markdown, RST, TXT) | `document:` |
| `service` | Deployable service definition (Dockerfile, K8s) | `service:` |
| `table` | Database table or migration | `table::` |
| `endpoint` | API endpoint or route definition | `endpoint::` |
| `pipeline` | CI/CD pipeline configuration | `pipeline:` |
| `schema` | Schema definition (GraphQL, Protobuf, Prisma) | `schema:` |
| `resource` | Infrastructure resource (Terraform, CloudFormation) | `resource:` |

### Edge Types (26 total)

| Category | Types |
|---|---|
| Structural | `imports`, `exports`, `contains`, `inherits`, `implements` |
| Behavioral | `calls`, `subscribes`, `publishes`, `middleware` |
| Data flow | `reads_from`, `writes_to`, `transforms`, `validates` |
| Dependencies | `depends_on`, `tested_by`, `configures` |
| Semantic | `related`, `similar_to` |
| Infrastructure | `deploys`, `serves`, `provisions`, `triggers` |
| Schema/Data | `migrates`, `documents`, `routes`, `defines_schema` |

### Edge Weight Conventions

| Edge Type | Weight |
|---|---|
| `contains` | 1.0 |
| `inherits`, `implements` | 0.9 |
| `calls`, `exports`, `defines_schema` | 0.8 |
| `imports`, `deploys`, `migrates` | 0.7 |
| `depends_on`, `configures`, `triggers` | 0.6 |
| `tested_by`, `documents`, `provisions`, `serves`, `routes` | 0.5 |
| All others | 0.5 (default) |
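
As a rough illustration of the weight conventions above, the table can be expressed as a lookup with a fallback. The names `EDGE_WEIGHTS`, `DEFAULT_EDGE_WEIGHT`, and `edgeWeight` are hypothetical and only show one way to apply the table; the values themselves are copied from it.

```javascript
// Weights copied from the Edge Weight Conventions table.
const EDGE_WEIGHTS = {
  contains: 1.0,
  inherits: 0.9, implements: 0.9,
  calls: 0.8, exports: 0.8, defines_schema: 0.8,
  imports: 0.7, deploys: 0.7, migrates: 0.7,
  depends_on: 0.6, configures: 0.6, triggers: 0.6,
  tested_by: 0.5, documents: 0.5, provisions: 0.5, serves: 0.5, routes: 0.5,
};

// "All others" row of the table.
const DEFAULT_EDGE_WEIGHT = 0.5;

// Hypothetical helper: returns the conventional weight for an edge type,
// falling back to the default for types the table does not list.
function edgeWeight(type) {
  return Object.prototype.hasOwnProperty.call(EDGE_WEIGHTS, type)
    ? EDGE_WEIGHTS[type]
    : DEFAULT_EDGE_WEIGHT;
}
```

The own-property check keeps inherited keys such as `toString` from being mistaken for edge types, so any unlisted type (for example `related` or `similar_to`) resolves to the default.
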