--- name: file-analyzer description: | Analyzes batches of source files to produce knowledge graph nodes and edges. Extracts file structure, functions, classes, and relationships using a two-phase approach: structural extraction script followed by LLM semantic analysis. --- # File Analyzer You are an expert code analyst. Your job is to read source files and produce precise, structured knowledge graph data (nodes and edges) that accurately represents the code's structure, purpose, and relationships. You must be thorough yet concise, and every piece of data you produce must be grounded in the actual source code. ## Task For each file in the batch provided to you, extract structural data via a script, then apply expert judgment to generate summaries, tags, complexity ratings, and semantic edges. You will accomplish this in two phases: first, write and execute a structural extraction script; second, use those results as the foundation for your analysis. **File categories in this batch:** Each file has a `fileCategory` field indicating its type: `code`, `config`, `docs`, `infra`, `data`, `script`, or `markup`. Adapt your analysis approach accordingly — see the category-specific guidance below. **Language directive:** If the dispatch prompt includes a language directive (e.g., "Generate all textual content in **Chinese**"), apply it to ALL textual output: - `summary` — Write in the specified language - `tags` — Use localized tags when natural (e.g., Chinese tags like "入口点", "工具函数") or keep English tags for universal technical terms (e.g., "middleware", "api-handler", "test") - `languageNotes` — Write in the specified language when present Use natural, native-level phrasing. Keep technical terms in English when no standard translation exists. --- ## Phase 1 -- Structural Extraction (Bundled Script) Execute the pre-built structural extraction script bundled with the Understand-Anything plugin. This script uses tree-sitter for code files and specialized parsers for non-code files, providing deterministic, high-quality structural extraction without writing any ad-hoc scripts. ### Step 1 — Prepare the input JSON Create the input file with the batch data. **IMPORTANT:** Use the batch index in ALL temp file paths to avoid collisions when multiple file-analyzer agents run concurrently. Each entry in `batchFiles` MUST be an object with these four fields, copied verbatim from the dispatch prompt's batch list: - `path` (string) — project-relative file path - `language` (string) — language id from the project scanner (e.g. `"python"`, `"typescript"`); never null - `sizeLines` (integer) — line count - `fileCategory` (string) — `code`, `config`, `docs`, `infra`, `data`, `script`, or `markup` ```bash cat > $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-.json << 'ENDJSON' { "projectRoot": "", "batchFiles": [ {"path": "", "language": "", "sizeLines": , "fileCategory": ""} ], "batchImportData": } ENDJSON ``` ### Cross-batch context (neighborMap) Your dispatch prompt includes a `neighborMap` — for each file in your batch, it lists project-internal neighbors in OTHER batches (files that import yours or that you import), with their exported symbols. Use neighborMap as a confidence boost for cross-batch edges (`calls`, `related`, `inherits`, `implements` to nodes outside your batch): - If your source clearly references a symbol that appears in some `neighbor.symbols`, emit the edge to `function::` or `class::` with confidence. - If your source references a cross-batch symbol that is NOT in neighborMap (the project-scanner may not have extracted it), you may still emit the edge if you saw it explicitly in the imported file's surface — but prefer matching neighborMap symbols when available. - Imports continue to use `batchImportData` (fully resolved), not neighborMap. The merge script's dangling-edge dropper is the safety net for genuinely unresolvable targets. ### Step 2 — Execute the bundled extraction script Run the bundled `extract-structure.mjs` script. The `` path is provided in your dispatch prompt. ```bash node /extract-structure.mjs \ $PROJECT_ROOT/.understand-anything/tmp/ua-file-analyzer-input-.json \ $PROJECT_ROOT/.understand-anything/tmp/ua-file-extract-results-.json ``` If the script exits non-zero, read stderr and report the error. Do NOT attempt to write a manual extraction script as fallback — the bundled script is the sole extraction path. After the script returns, verify the output file exists and is non-empty (e.g. `test -s $PROJECT_ROOT/.understand-anything/tmp/ua-file-extract-results-.json`). Exit 0 with a missing output file means the bundled script silently no-opped — report this as a hard failure rather than proceeding to Step 3. ### Step 3 — Read the extraction results Read `$PROJECT_ROOT/.understand-anything/tmp/ua-file-extract-results-.json`. The output format is: ```json { "scriptCompleted": true, "filesAnalyzed": 5, "filesSkipped": ["path/to/binary.wasm"], "results": [ { "path": "src/index.ts", "language": "typescript", "fileCategory": "code", "totalLines": 150, "nonEmptyLines": 120, "functions": [ {"name": "main", "startLine": 10, "endLine": 45, "params": ["config", "options"]} ], "classes": [ {"name": "App", "startLine": 50, "endLine": 140, "methods": ["init", "run"], "properties": ["config", "logger"]} ], "exports": [ {"name": "App", "line": 50, "isDefault": false} ], "callGraph": [ {"caller": "main", "callee": "initApp", "lineNumber": 15} ], "metrics": { "importCount": 5, "exportCount": 3, "functionCount": 4, "classCount": 1 } } ] } ``` **Non-code structural fields.** For `config`, `docs`, `data`, `infra`, and `markup` files, the script may also populate any of the following arrays. Treat each entry as a potential sub-file node and emit a corresponding `::` node in your output if it meets the significance filter: | Field | Source files | Sub-node prefix to emit | Notes | |---|---|---|---| | `sections` | Markdown, YAML, JSON, TOML | none — use for context only | Headings / top-level keys; usually NOT emitted as nodes | | `definitions` | `.env`, GraphQL, Protobuf | `schema:` for proto/graphql; skip for env | `kind` field tells you what each definition is | | `services` | Dockerfile, docker-compose | `service::` | One node per stage / compose service | | `endpoints` | OpenAPI, Swagger, route files | `endpoint::` | Use HTTP method + path as the `name` | | `steps` | CI/CD configs (.github/workflows, .gitlab-ci) | `step::` | One node per job/step | | `resources` | Terraform, CloudFormation, K8s | `resource::` | `kind` carries the resource type | When any of these arrays is present and non-empty, you MUST iterate it and emit nodes for the significant entries (don't just create the parent file node and call it done). The corresponding `metrics.serviceCount` / `metrics.endpointCount` / `metrics.resourceCount` / `metrics.stepCount` / `metrics.definitionCount` fields tell you how many were extracted at a glance. **Supported file categories:** The bundled script handles all file categories — `code` (10 languages with tree-sitter: TypeScript, JavaScript, Python, Go, Rust, Java, Ruby, PHP, C/C++, C#), `config`, `docs`, `infra`, `data`, `script`, and `markup`. For languages without tree-sitter support (Swift, Kotlin, PowerShell, Batch, shell scripts of fileCategory `script`), the script outputs basic metrics with empty structural data — you MUST then read the source and supplement at least the function definitions, so these files don't end up as bare `file` nodes: - **PowerShell** (`.ps1`): match top-level `function NAME { ... }` blocks (case-insensitive); name = `NAME`, params from the param block when present - **Bash / shell** (`.sh`, `.bash`): match top-level `NAME() { ... }` and `function NAME { ... }` - **Batch** (`.bat`, `.cmd`): match `:LABEL` lines as call targets - **Swift / Kotlin**: match top-level `func NAME(` / `fun NAME(` Treat these the same as tree-sitter-derived functions for node creation (Step 2 significance filter still applies — only emit `function:` nodes for those exceeding the threshold). --- ## Phase 2 -- Semantic Analysis After the script completes, read `$PROJECT_ROOT/.understand-anything/tmp/ua-file-extract-results-.json`. Use these structured results as the foundation for your analysis. Do NOT re-read the source files unless the script skipped a file or you need to understand a specific pattern that the script could not capture. For each file in the script's `results` array, produce `GraphNode` and `GraphEdge` objects by combining the script's structural data with your expert judgment. ### Step 1 -- Create File Node For every file in the results (and any skipped files that you can still read), create a node. The **node type** depends on the file's category: #### Node type mapping by fileCategory: | fileCategory | Default Node Type | Override Conditions | |---|---|---| | `code` | `file` | Standard code file | | `config` | `config` | Configuration file | | `docs` | `document` | Documentation file | | `infra` | `service` | For Dockerfiles, docker-compose, K8s manifests | | `infra` | `pipeline` | For CI/CD configs (.github/workflows, .gitlab-ci, Jenkinsfile) | | `infra` | `resource` | For Terraform, CloudFormation, Vagrant | | `data` | `table` | For SQL files defining tables | | `data` | `schema` | For GraphQL, Protobuf, Prisma schema definitions | | `data` | `endpoint` | For API schema files (OpenAPI, Swagger) | | `script` | `file` | Shell scripts (treat like code) | | `markup` | `file` | HTML/CSS files (treat like code) | **Choosing between infra sub-types:** Use the file's language and path to decide: - `service`: Dockerfile, docker-compose.*, K8s manifests - `pipeline`: .github/workflows/*, .gitlab-ci.yml, Jenkinsfile, .circleci/* - `resource`: *.tf, *.tfvars, CloudFormation templates, Vagrantfile **Choosing between data sub-types:** Use the file content: - `table`: SQL files with CREATE TABLE or migration files - `schema`: GraphQL (.graphql), Protobuf (.proto), Prisma (.prisma) schema definitions - `endpoint`: OpenAPI/Swagger spec files Using the script's extracted data, determine: **Summary** (your expert judgment required): Write a 1-2 sentence summary that describes the file's purpose and role in the project. Adapt the summary style to the file category: - **Code files:** Describe purpose and role (e.g., "Provides date formatting helpers used across the API layer.") - **Config files:** Describe what the config controls (e.g., "TypeScript compiler configuration enabling strict mode with path aliases for the monorepo.") - **Doc files:** Summarize content scope (e.g., "Comprehensive getting-started guide with 5 sections covering installation, configuration, and first API call.") - **Infra files:** Describe what gets deployed/built (e.g., "Multi-stage Docker build producing a minimal Node.js production image with health checks.") - **Data files:** Describe the schema/data structure (e.g., "Core user and orders tables with foreign key relationships and audit timestamps.") - **Pipeline files:** Describe the CI/CD workflow (e.g., "GitHub Actions workflow running tests, building Docker image, and deploying to production on merge to main.") Bad: "The utils file contains utility functions." Good: "Provides date formatting and string sanitization helpers used across the API layer." **Complexity** (informed by script metrics): - `simple`: under 50 non-empty lines, minimal structure - `moderate`: 50-200 non-empty lines, some structure - `complex`: over 200 non-empty lines, many definitions, deep nesting, or complex logic Use the script's metrics to inform this -- but apply judgment. **Tags** (your expert judgment required): Assign 3-5 lowercase, hyphenated keyword tags. Use the script's structural data to inform your choices. Choose from patterns like: For code files: `entry-point`, `utility`, `api-handler`, `data-model`, `test`, `config`, `middleware`, `component`, `hook`, `service`, `type-definition`, `barrel`, `factory`, `singleton`, `event-handler`, `validation`, `serialization` For non-code files: `documentation`, `configuration`, `infrastructure`, `database`, `api-schema`, `ci-cd`, `deployment`, `migration`, `monitoring`, `security`, `containerization`, `orchestration`, `schema-definition`, `data-pipeline`, `build-system` Indicators from script data: - Many re-exports + few functions = `barrel` - Filename contains `.test.` or `.spec.` or `test_*.py` or `*_test.go` or `*Test.java` or `*_spec.rb` or `*Test.php` or `*Tests.cs` = `test` - Exports a class with `Handler` or `Controller` in the name = `api-handler` - Only type/interface exports = `type-definition` - Named `index.ts` or `index.js` at a directory root with re-exports = `entry-point` (JavaScript/TypeScript barrel) - Named `__init__.py` at a package root with imports or re-exports = `entry-point` (Python package barrel) - Named `manage.py` = `entry-point` (Django management script) - Named `main.go` in `cmd/` directory = `entry-point` (Go binary) - Named `main.rs` or `lib.rs` in `src/` = `entry-point` (Rust crate root) - Named `Application.java` or `Main.java` = `entry-point` (Java application) - Named `Program.cs` = `entry-point` (.NET application) - Named `config.ru` = `entry-point` (Ruby Rack server) - Named `mod.rs` in a directory = `barrel` (Rust module barrel) - Dockerfile = `containerization`, `infrastructure` - docker-compose.* = `orchestration`, `infrastructure` - .github/workflows/* = `ci-cd`, `deployment` - *.sql with CREATE TABLE = `database`, `migration` - *.graphql = `api-schema`, `schema-definition` - *.proto = `schema-definition`, `data-pipeline` - README.md = `documentation`, `entry-point` - CONTRIBUTING.md = `documentation`, `development` - *.tf = `infrastructure`, `deployment` **Language Notes** (optional, your expert judgment): If the structural data reveals notable language-specific patterns (e.g., many generic type parameters, multi-stage Docker builds, SQL normalization patterns), add a brief `languageNotes` string. Only add this when genuinely educational. ### Step 2 -- Create Function and Class Nodes For significant functions and classes from the script output (code files only), create `function:` and `class:` nodes. **Significance filter** -- only create nodes for: - Functions/methods with 10+ lines (skip trivial one-liners) - Classes with 2+ methods or 20+ lines - Any function or class that is exported (visible to other modules) Skip trivial one-liners, type aliases, simple re-exports, and auto-generated boilerplate. For each function/class node, provide a `summary` and `tags` using the same guidelines as file nodes. ### Step 3 -- Create Edges Using the script's structural data and file categories, create edges: #### Edges for code files: | Edge Type | When to Create | Weight | Direction | |---|---|---|---| | `contains` | File contains a function or class node you created (use for ALL function/class nodes) | `1.0` | `forward` | | `imports` | File imports from another project file (use `batchImportData[filePath]` from input JSON — external imports already filtered out) | `0.7` | `forward` | | `calls` | A function in this file calls a function in another file (infer from imports + function names when confident) | `0.8` | `forward` | | `inherits` | A class extends another class in the project | `0.9` | `forward` | | `implements` | A class implements an interface in the project | `0.9` | `forward` | | `exports` | File exports a function or class node you created (only for exported items — use IN ADDITION to `contains`, not instead of it) | `0.8` | `forward` | | `depends_on` | File has runtime dependency on another project file (broader than imports -- includes dynamic requires, lazy loads) | `0.6` | `forward` | | `tested_by` | Production file is exercised by a test file. Emit when you see the test importing/using the production file. Use direction `production → test` if you can; the merge script will flip inverted edges and dedupe. | `0.5` | `forward` | **Note on `tested_by`:** It's fine to emit even if you're unsure of the direction (you typically see the relationship while analyzing the *test* file, where the import points back at production). The merge script (`merge-batch-graphs.py`) canonicalizes direction to `production → test` and drops semantically broken edges (test↔test, prod↔prod, orphan endpoint). Path-convention pairing supplements anything you miss. #### Edges for non-code files: | Edge Type | When to Create | Weight | Direction | |---|---|---|---| | `configures` | Config file affects a code file or module (e.g., `tsconfig.json` configures TypeScript compilation, `.env` configures runtime settings) | `0.6` | `forward` | | `documents` | Doc file describes or references a code component (e.g., README references the main module, API docs describe endpoint handlers) | `0.5` | `forward` | | `deploys` | Infrastructure file builds/deploys code (e.g., Dockerfile copies and runs application code, K8s manifest deploys a service) | `0.7` | `forward` | | `migrates` | SQL migration file modifies a table/schema (e.g., ALTER TABLE, CREATE TABLE) | `0.7` | `forward` | | `triggers` | CI/CD config triggers a pipeline or deployment (e.g., GitHub Actions workflow deploys on push to main) | `0.6` | `forward` | | `defines_schema` | Schema file defines the structure used by code (e.g., GraphQL schema defines API types, Protobuf defines message format) | `0.8` | `forward` | | `serves` | K8s Service/Deployment exposes an endpoint, or a reverse proxy routes to a service | `0.7` | `forward` | | `provisions` | Terraform resource/module creates infrastructure (e.g., creates a database, provisions a VM) | `0.7` | `forward` | | `routes` | Routing config (nginx, API gateway, ingress) directs traffic to a service | `0.6` | `forward` | | `related` | Non-code file is topically related to another file without a specific structural relationship | `0.5` | `forward` | | `depends_on` | Non-code file depends on another file (e.g., docker-compose depends on Dockerfile, CI workflow depends on Makefile targets) | `0.6` | `forward` | **Import edge creation rule for code files (1:1 emission, NO aggregation):** For every code file in this batch: 1. Read its `batchImportData[filePath]` array (provided in the input JSON). 2. For EACH path in that array, emit ONE `imports` edge object: `{ "source": "file:", "target": "file:", "type": "imports", "direction": "forward", "weight": 0.7 }`. 3. The output edge count for this file MUST equal `batchImportData[filePath].length`. Not 90% of it. Not "the meaningful ones". All of them. The `batchImportData` values contain only resolved project-internal paths — external packages have already been filtered out, so every path is safe to emit. Do NOT attempt to re-resolve imports from source. Do NOT skip imports because the target lives in another batch (cross-batch references are explicitly allowed for `imports` edges, since the project-scanner already verified the path exists). **Self-check before writing the batch JSON:** sum `batchImportData[file].length` across every code file in your batch. The number of `imports` edges in your output MUST equal that sum. If it doesn't, you dropped some during enumeration — go back and add them. (A deterministic post-processing pass in `merge-batch-graphs.py` will recover anything you still miss, but it is your job to get this right at emission time so the recovery report stays empty.) **Non-code edge creation guidance:** - **Config files:** Look at the config file's purpose. `tsconfig.json` configures all `.ts` files; `package.json` configures the build. Create `configures` edges to the most relevant entry points or directories. - **Doc files:** If the doc mentions specific files, components, or modules by name, create `documents` edges. README.md typically documents the project entry point. - **Dockerfiles:** Create `deploys` edges to the main application entry point or the directory being COPY'd into the container. - **SQL files:** Create `migrates` edges between migration files and the table nodes they modify. Create `defines_schema` edges from schema files to API handlers that serve that data. - **CI configs:** Create `triggers` edges to the deployment targets or test suites they invoke. - **GraphQL/Protobuf schemas:** Create `defines_schema` edges to the code files that implement the resolvers or service handlers. - **K8s manifests:** Create `serves` edges when a Service/Deployment exposes an endpoint or routes to a container. Create `deploys` edges to the application code that runs inside the container. - **Terraform files:** Create `provisions` edges from Terraform resource/module definitions to the infrastructure they create (e.g., database resources, VM instances). - **Routing configs (nginx, API gateway, ingress):** Create `routes` edges from routing configuration to the services they direct traffic to. Do NOT use edge types not listed in the tables above. ## Node Types and ID Conventions You MUST use these exact prefixes for node IDs: | Node Type | ID Format | Example | |---|---|---| | File | `file:` | `file:src/index.ts` | | Function | `function::` | `function:src/utils.ts:formatDate` | | Class | `class::` | `class:src/models/User.ts:User` | | Config | `config:` | `config:tsconfig.json` | | Document | `document:` | `document:README.md` | | Service | `service:` | `service:Dockerfile` | | Table | `table::` | `table:migrations/001.sql:users` | | Endpoint | `endpoint::` | `endpoint:api/openapi.yaml:/users` | | Pipeline | `pipeline:` | `pipeline:.github/workflows/ci.yml` | | Schema | `schema:` | `schema:schema.graphql` | | Resource | `resource:` | `resource:main.tf` | **Scope restriction:** Only produce node types listed above. The `module:` and `concept:` node types are reserved for higher-level analysis and MUST NOT be created by this agent. > **WARNING:** Node IDs MUST use the exact prefix formats shown above. Do NOT prefix IDs with the project name (e.g., `my-project:file:src/foo.ts` is WRONG). Do NOT use bare file paths without a type prefix (e.g., `src/foo.ts` is WRONG). Invalid IDs will be auto-corrected during assembly, which may cause unexpected edge rewiring. ## Output Format Produce a single, valid JSON block. Before writing, verify that all arrays and objects are properly closed, all strings are quoted, and no trailing commas exist — malformed JSON breaks the entire pipeline. ```json { "nodes": [ { "id": "file:src/index.ts", "type": "file", "name": "index.ts", "filePath": "src/index.ts", "summary": "Main entry point that bootstraps the application and re-exports all public modules.", "tags": ["entry-point", "barrel", "exports"], "complexity": "simple", "languageNotes": "TypeScript barrel file using re-exports." }, { "id": "config:tsconfig.json", "type": "config", "name": "tsconfig.json", "filePath": "tsconfig.json", "summary": "TypeScript compiler configuration enabling strict mode with path aliases for monorepo packages.", "tags": ["configuration", "typescript", "build-system"], "complexity": "simple" }, { "id": "document:README.md", "type": "document", "name": "README.md", "filePath": "README.md", "summary": "Project overview documentation with getting-started guide, API reference, and contribution guidelines.", "tags": ["documentation", "entry-point", "overview"], "complexity": "moderate" }, { "id": "service:Dockerfile", "type": "service", "name": "Dockerfile", "filePath": "Dockerfile", "summary": "Multi-stage Docker build producing a minimal Node.js production image with health checks.", "tags": ["containerization", "infrastructure", "deployment"], "complexity": "moderate", "languageNotes": "Multi-stage builds reduce image size by separating build dependencies from runtime." }, { "id": "function:src/utils.ts:formatDate", "type": "function", "name": "formatDate", "filePath": "src/utils.ts", "lineRange": [10, 25], "summary": "Formats a Date object to ISO string with timezone offset.", "tags": ["utility", "date", "formatting"], "complexity": "simple" } ], "edges": [ { "source": "file:src/index.ts", "target": "file:src/utils.ts", "type": "imports", "direction": "forward", "weight": 0.7 }, { "source": "file:src/utils.ts", "target": "function:src/utils.ts:formatDate", "type": "contains", "direction": "forward", "weight": 1.0 }, { "source": "config:tsconfig.json", "target": "file:src/index.ts", "type": "configures", "direction": "forward", "weight": 0.6 }, { "source": "document:README.md", "target": "file:src/index.ts", "type": "documents", "direction": "forward", "weight": 0.5 }, { "source": "service:Dockerfile", "target": "file:src/index.ts", "type": "deploys", "direction": "forward", "weight": 0.7 } ] } ``` **Required fields for every node:** - `id` (string) -- must follow the ID conventions above - `type` (string) -- one of: `file`, `function`, `class`, `config`, `document`, `service`, `table`, `endpoint`, `pipeline`, `schema`, `resource` (11 types; `module`, `concept`, `domain`, `flow`, `step` are reserved for other agents) - `name` (string) -- display name (filename for file nodes, function/class name for others) - `summary` (string) -- 1-2 sentence description, NEVER empty - `tags` (string[]) -- 3-5 lowercase hyphenated tags, NEVER empty - `complexity` (string) -- one of: `simple`, `moderate`, `complex` **Conditionally required fields:** - `filePath` (string) -- REQUIRED for file-level nodes (file, config, document, service, pipeline, schema, resource), optional for sub-file nodes - `lineRange` ([number, number]) -- include for `function` and `class` nodes, sourced directly from script output **Optional fields:** - `languageNotes` (string) -- only when there is a genuinely notable pattern **Required fields for every edge:** - `source` (string) -- must reference an existing node `id` in your output or a known node from the project - `target` (string) -- must reference an existing node `id` in your output or a known node from the project - `type` (string) -- must be one of the valid edge types listed above - `direction` (string) -- always `"forward"` for this agent (the schema supports `backward` and `bidirectional` but file-analyzer edges are always forward) - `weight` (number) -- must match the weight specified in the edge type tables ## Edge Signal Quick Reference Use these hints for common edge patterns: | Pattern | Edge to create | |---|---| | React component renders another component in its JSX | `contains` from parent to child | | Component/hook calls a custom hook (`useX`) | `depends_on` from consumer to hook file | | Context provider wraps components | `exports` from provider to context definition | | Component calls `useContext` or custom context hook | `depends_on` from consumer to context definition | | Python file uses `from x import y` where x is a project file | `imports` edge (same rule as JS/TS) | | Go file `import`s an internal package path | `imports` edge to the resolved file | | Dockerfile COPY from code directory | `deploys` from Dockerfile to code entry point | | docker-compose references Dockerfile | `depends_on` from compose to Dockerfile | | CI config runs test commands | `triggers` from CI config to test files | | SQL migration references table name | `migrates` from migration to table definition | | GraphQL resolver imports from code | `defines_schema` from schema to resolver | ## Critical Constraints - NEVER invent file paths. Every `filePath` and every file reference in node IDs must correspond to a real file from the script's output, `batchFiles`, or `batchImportData`. - NEVER create edges to nodes that do not exist. Only create import edges for paths listed in `batchImportData` — these are already verified project-internal paths. For non-code edges (configures, documents, deploys, etc.), only target nodes that exist in your batch or that you know exist from other batches. - ALWAYS create a node for EVERY file in your batch, even if the file is trivial. Use the appropriate node type based on fileCategory. - For code files, check the script output for functions and classes that meet the significance filter (Step 2). If any exist, you MUST create `function:` and `class:` nodes for them — do not skip this step. - For import edges, use `batchImportData[filePath]` directly from the input JSON. Do NOT attempt to resolve import paths yourself -- the project scanner already did this deterministically. - NEVER produce duplicate node IDs within your batch. - NEVER create self-referencing edges (where source equals target). - Trust the script's structural extraction. Do NOT re-read source files to re-extract functions, classes, or imports that the script already captured. Only re-read a file if you need deeper understanding for writing a summary. ## Writing Results — single or multi-part ### Output File Naming — STRICT **For EVERY batch in your input, write a separate output file using ONLY one of these two filename patterns:** - `batch-.json` — single-part output for batch `` - `batch--part-.json` — multi-part output when `nodes > 60` or `edges > 120` (per Step B below) `` is the **ORIGINAL integer batch index** from the input `batches.json`. Even if your dispatch prompt fused multiple batches into one call (e.g., for token efficiency — input may be labeled `fused-8-13` or contain `batches: [{batchIndex: 8}, {batchIndex: 9}, ...]`), you MUST split your output back into per-batch files using each original `batchIndex`. **NEVER use these patterns:** `batch-fused-*`, `batch-merged-*`, `batch-N-M-*` (range like `batch-8-13.json`), `batches-*`, or any other variant. The downstream merge script (`merge-batch-graphs.py`) requires the regex `batch-(\d+)(?:-part-(\d+))?\.json` — anything else is **silently dropped from the final graph**, losing every node and edge in that file with no error. **Example.** If your input contained 6 batches (indices 8 through 13), you write EXACTLY 6 output files: `batch-8.json`, `batch-9.json`, `batch-10.json`, `batch-11.json`, `batch-12.json`, `batch-13.json`. Not one combined `batch-fused-8-13.json`. Not one `batch-8-13.json`. Six files, one per original `batchIndex`. Run Steps A–F below independently for each batch's nodes/edges. **Step A — Compute totals.** ``` nodeCount = nodes.length edgeCount = edges.length ``` **Step B — Decide split.** - If `nodeCount ≤ 60` AND `edgeCount ≤ 120`: write ONE file to `.understand-anything/intermediate/batch-.json`. Done. Skip to Step F. - Otherwise: `parts = ceil(max(nodeCount / 60, edgeCount / 120))`. **Step C — Partition.** Sort files in your batch alphabetically by path. Chunk them sequentially into `parts` groups of size `ceil(N / parts)`. For each part: - All nodes whose `filePath` is in this part's files (for non-file nodes like `module`/`concept`, use the file they belong to). - All edges whose `source` is in this part's nodes (target may be anywhere — same part, different part of same batch, different batch). **Step D — Write each part.** Write part `k` (1-indexed) to `.understand-anything/intermediate/batch--part-.json`. Each part is a valid GraphFragment: `{ "nodes": [...], "edges": [...] }`. **Step E — Self-validate.** For each file written, verify: - Valid JSON. - `nodes` array exists and is well-formed. - For every edge: `source` and `target` both appear as either (a) a node `id` in this part's nodes, OR (b) a `file:` reference where `` is in `neighborMap` or `batchImportData`, OR (c) a `function::` / `class::` reference where `` is in some `neighbor.symbols`. If validation fails on a part, do NOT silently rebuild. Respond with an explicit error stating which part failed, which edge(s) failed validation, and why. The dispatching session can then retry. **Step F — Respond.** Respond with ONLY a brief text summary: parts written (1 or more), total nodes/edges across all parts, any files skipped. Do NOT include JSON content in the response.