--- name: swarm description: >- Dispatches many independent items in parallel: create a table, fan out to subagents, aggregate results. One row = one unit of work. compatibility: >- Requires @langchain/quickjs code interpreter with swarm_task PTC tool metadata: entrypoint: scripts/index.ts required-ptc-tools: swarm_task read_file write_file edit_file glob --- # Swarm Process many independent items in parallel. `create` builds a table handle; `run` fans work out across rows and merges results back. One row = one unit of work — swarm handles batching automatically. ## Flow 1. **Create.** Build a table from a source — files, a glob pattern, or pre-parsed records. One row per item. Returns a handle. 2. **Run.** Dispatch an `instruction` template across rows. Results are merged back into the table. Returns `{ completed, failed, skipped, failures }`. 3. **Aggregate.** Use `rows()` and plain JS to count, filter, or summarize. Do not spawn additional subagents for aggregation. 4. **Retry.** Re-run with `filter: { column: "", exists: false }` to reprocess only failed rows. ## Choosing a source **`glob` / `filePaths`** — one file = one row. Use when each file is an independent unit of work. Each row gets `{ id, file }`; the subagent reads the file itself via the `{file}` placeholder. **`tasks`** — pass pre-built records directly. Use when the data lives inside a file (JSONL, CSV, JSON array). Read and parse the file first inside `eval`, then pass the records. One record = one row — do not group multiple items into a single row. For small files (under ~500 lines), parse and create in one block: ```javascript const { create } = await import("@/skills/swarm"); const raw = await tools.readFile({ file_path: "/data.jsonl" }); const records = raw.trim().split("\n").map(l => JSON.parse(l)); const table = await create({ tasks: records }); console.log(table); ``` For large files, read in chunks of 500 lines to avoid truncation: ```javascript const { create } = await import("@/skills/swarm"); let records = []; let offset = 0; while (true) { const chunk = await tools.readFile({ file_path: "/data.txt", offset, limit: 500 }); const lines = chunk.split("\n").filter(l => l.trim()); for (const l of lines) { records.push({ id: `r${records.length}`, text: l }); } if (lines.length < 500) break; offset += 500; } const table = await create({ tasks: records }); console.log(table); ``` When the file is too large to parse and dispatch in one `eval` call, split across two blocks. Only the block that calls swarm functions needs the import: ```javascript // eval 1: parse only — no swarm import needed const raw = await tools.readFile({ file_path: "/data.jsonl" }); globalThis.records = raw.trim().split("\n").map(l => JSON.parse(l)); console.log(`Parsed ${globalThis.records.length} records`); ``` ```javascript // eval 2: create and dispatch const { create, run } = await import("@/skills/swarm"); const table = await create({ tasks: globalThis.records }); const result = await run(table.id, { instruction: "Classify {text}", responseSchema: { type: "object", properties: { label: { type: "string" } }, required: ["label"], }, }); console.log(result); ``` Passing `filePaths: ["/data.jsonl"]` would produce a table with **one row** pointing at the file — not one row per record inside it. ## When to use `subagentType` Omit `subagentType` for classification, extraction, labeling, and any task where a single model call with structured output is sufficient. This is the default and is significantly cheaper and faster — each dispatch is a direct model call, no tools, no iteration. Set `subagentType` when the task requires tools, file access, or multi-step reasoning. Each dispatch runs a full agentic loop with the named subagent. ```javascript // Direct model call — classification, no tools needed await run(table.id, { instruction: "Classify {text}", responseSchema: { type: "object", properties: { label: { type: "string" } }, required: ["label"] }, }); // Subagent — needs to read files and reason over multiple steps await run(table.id, { subagentType: "reviewer", instruction: "Review {file} for security issues.", responseSchema: { type: "object", properties: { finding: { type: "string" } }, required: ["finding"] }, }); ``` ## Instruction + context `instruction` is a per-item template with `{column}` placeholders. Placeholders are resolved by the framework — your column names appear in prompts as references to the values listed alongside, never as raw template syntax. Subagents do the work — do not process items yourself in JS and write the results into rows. `context` is free-form prose prepended to every subagent prompt. Use it for shared background: domain terms, classification rules, examples, etc. ```javascript const { create, run } = await import("@/skills/swarm"); const table = await create({ glob: "src/**/*.ts" }); const r = await run(table.id, { subagentType: "reviewer", instruction: "Review {file} for security issues. List findings or write 'no issues'.", context: "TypeScript Express backend using Prisma ORM. Focus on injection, auth bypass, path traversal.", responseSchema: { type: "object", properties: { review: { type: "string" } }, required: ["review"], }, }); console.log(r); // → { completed: 45, failed: 2, skipped: 0, failures: [...] } ``` ## Structured output `responseSchema` is required. Schema properties become top-level columns on each row and constrain what subagents can return. ```javascript const { run } = await import("@/skills/swarm"); await run(table.id, { instruction: "Classify: {text}", responseSchema: { type: "object", properties: { sentiment: { type: "string", enum: ["positive", "negative", "neutral"] }, }, required: ["sentiment"], }, }); // Row after: { id: "r1", text: "...", sentiment: "positive" } ``` ## Batching By default, swarm auto-batches to keep total dispatches under 10. For small tables (≤10 rows) each row gets its own subagent call. For larger tables, rows are grouped automatically. Set `batchSize` to control grouping: - **Number** — uniform batch size for all rows. `batchSize: 1` forces per-row dispatch; `batchSize: 20` groups in twenties. - **Function** — `(row, rowCount) => number`. Returns the desired batch size for each row. Rows with the same batch size are grouped together, then chunked. Allows mixed dispatch where some rows go solo and others batch. ```javascript const { create, run } = await import("@/skills/swarm"); const table = await create({ tasks: items }); // Complex items get individual attention; simple ones batch together await run(table.id, { instruction: "Analyze {text}", responseSchema: { type: "object", properties: { analysis: { type: "string" } }, required: ["analysis"], }, batchSize: (row) => (row.token_count > 1000 ? 1 : 10), }); ``` Batch sizes are clamped to [1, 50] after evaluation. ## Aggregation After `run()`, use `rows()` and plain JS — no additional subagents needed. ```javascript const { rows } = await import("@/skills/swarm"); const data = await rows(table.id, { columns: ["sentiment"] }); const counts = {}; data.forEach(r => { counts[r.sentiment] = (counts[r.sentiment] || 0) + 1 }); console.log(counts); // → { positive: 120, negative: 45, neutral: 35 } ``` ## Chaining passes `run` updates the table in place — chain calls to accumulate columns. ```javascript const { create, run } = await import("@/skills/swarm"); const table = await create({ tasks: interviews }); await run(table.id, { instruction: "Classify sentiment of {text}", responseSchema: { type: "object", properties: { sentiment: { type: "string", enum: ["positive", "negative", "neutral"] } }, required: ["sentiment"], }, }); await run(table.id, { filter: { column: "sentiment", equals: "negative" }, instruction: "Summarize why {text} had negative sentiment.", responseSchema: { type: "object", properties: { summary: { type: "string" } }, required: ["summary"], }, }); ``` ## Action-only tasks When subagents perform actions (write a file, apply a fix) rather than return data, use a simple schema with a status or marker field. The `exists: false` filter still works for retries. ```javascript const { create, run } = await import("@/skills/swarm"); const fixedSchema = { type: "object", properties: { fixed: { type: "string" } }, required: ["fixed"], }; const table = await create({ glob: "src/**/*.ts" }); await run(table.id, { subagentType: "fixer", instruction: "Add missing JSDoc to all exported functions in {file}.", responseSchema: fixedSchema, }); // retry any that failed await run(table.id, { subagentType: "fixer", instruction: "Add missing JSDoc to all exported functions in {file}.", responseSchema: fixedSchema, filter: { column: "fixed", exists: false }, }); ``` ## Filtering ```javascript { column: "status", equals: "done" } { column: "status", notEquals: "done" } { column: "category", in: ["A", "B"] } { column: "result", exists: false } // not yet processed { and: [filter1, filter2] } { or: [filter1, filter2] } ``` ## Technical notes - **Only import `@/skills/swarm` in blocks where you call swarm functions.** Data preparation (reading files, parsing, storing in `globalThis`) does not need the import. Destructure only what you use: `{ create }`, `{ run }`, `{ create, run }`, etc. - **Console output is capped at ~5 KB.** Never log raw file contents — log only counts and short samples. - **`readFile` inside `eval` returns raw content — no line-number prefixes.** Request at most 500 lines per call. For files with more than 500 lines, loop with incrementing `offset`. - **When building a table from a file, read it inside `eval`.** Data read inside the sandbox stays there; it never enters the agent's context window. - **Never write to `.swarm/` directly.** Always use `create()`. - **Everything the subagent needs must be in `instruction` + `context`.** Subagents can't see the agent's context. - **Row ids must be unique.** `create()` rejects sources that produce duplicate ids. For `tasks`, that's a caller-side responsibility; for `glob` / `filePaths`, ids are auto-disambiguated by parent directory. - **Unknown columns fail fast.** If `instruction` references `{foo}` and no matched row provides `foo`, `run()` throws before any subagent is dispatched. ## API Reference ### `create(source)` Create a table. Returns a handle `{ id, count, columns }`. | Source | Description | |--------|------------| | `{ glob: "src/**/*.ts" }` or `{ glob: ["src/**/*.ts", "lib/**/*.ts"] }` | Match files by one or more patterns. Columns: `id`, `file` | | `{ filePaths: ["a.ts", "b.ts"] }` | Explicit file list. Columns: `id`, `file` | | `{ tasks: [{ id: "t1", text: "..." }] }` | Custom rows. Each must have `id` | ### `run(tableId, options)` Dispatch work across rows. Returns `{ completed, failed, skipped, failures }`. | Option | Default | Description | |--------|---------|------------| | `instruction` | (required) | Template with `{column}` placeholders | | `responseSchema` | (required) | JSON Schema (`type: "object"`) — properties become row columns | | `context` | — | Prose prepended to every subagent prompt | | `filter` | — | Only dispatch matching rows | | `subagentType` | — | Name of subagent to dispatch to. When set, runs a full agentic loop. When omitted, runs a direct model call | | `batchSize` | auto | Number or `(row, rowCount) => number`. Auto caps dispatches at 10; `1` = per-row; function = per-row sizing | | `concurrency` | `10` | Max concurrent subagent dispatches (clamped to 1–10) | ### `rows(tableId, options?)` Retrieve rows. Use for inspection and JS-based aggregation. | Option | Description | |--------|------------| | `filter` | Only return matching rows | | `columns` | Project to specific columns | | `limit` | Max rows returned |