# @khoralabs/agent-capabilities

**Composable toolkits + policies → deterministic SHA-256 fingerprints** for static tool definitions and for the effective tool set at evaluation time—so you can correlate behavior with a **versioned capability snapshot** (logs, evals, storage).

## What it does

- **Composable graph**: `tool`, `toolkit`, `dynamicToolkit`; evaluate with `ToolkitContext` (`env`, optional `namespace` / `agentId` / `agentName`, optional `pipelineHooks` / `inheritedPipelineHooks`).
- **Pipeline hooks** (not part of static hashes): `onPolicyEvaluated` / `onToolExecuted`, merged via `mergeToolPipelineHooks`. Hooks attach at three levels: `hooks` on a `toolkit`, `hooks` on a `tool`, and `ToolkitContext.pipelineHooks` at runtime. Typical merge order: ancestor toolkit → tool → runtime. Member tool policies are usually evaluated once at the parent toolkit (deduped), so a leaf `tool`'s policy hooks run only when that tool evaluates a policy that is not already in the shared `PolicyResultMap`.
- **Policies**: async gates that prune tools at runtime; policies dedupe by object identity.
- **Template capabilities (`staticHash` on a registered agent)**: hash of the **root composable** plus **agent-level instruction lines** from `createRegisteredAgent` — the agent *definition* you ship. `staticContext` is **not** part of this hash; keep default merged context out of the template fingerprint.
- **Capability runtime (`runtimeHash`)**: hash of **enabled tools only**, after policies (sorted by tool name). Differs from the template when policy or environment changes which tools are in play.
- **Invocation binding (optional `invocationHash` on a `CapabilityLink`)**: a separate SHA-256 over a **host-normalized** plain object (e.g. `subjectId`, `personaSlug`, policy bundle id) via `computeInvocationContextHash` / `createCapabilityLink`. It captures the *run* or *tenant* slice without stuffing those fields into `staticInstructions` just to change hashes. Omit it when you do not need binding-level lineage.
- **Zero runtime dependencies** (`dependencies` is empty). **[Standard Schema](https://standardschema.dev)** `inputSchema`; hashed canonically — see [standard-schema guide](../../docs/standard-schema.md) and [hashing appendix](../../docs/hashing.md).
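To make the template-versus-runtime distinction concrete, here is a minimal sketch of a sorted-by-name fingerprint over enabled tools. It is illustrative only: `ToolRef` and `runtimeFingerprint` are hypothetical names, the canonical string format is an assumption, and the package's real canonical payload is described in the hashing appendix.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: a tool name paired with the hash of its static definition.
type ToolRef = { name: string; staticHash: string };

// SHA-256 over a canonical string; sorting by name makes input order irrelevant.
function runtimeFingerprint(enabled: ToolRef[]): string {
  const canonical = [...enabled]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((t) => `${t.name}:${t.staticHash}`)
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

const search: ToolRef = { name: "search", staticHash: "aaa" };
const refund: ToolRef = { name: "refund", staticHash: "bbb" };

const prod = runtimeFingerprint([search, refund]);
const prodReordered = runtimeFingerprint([refund, search]);
const staging = runtimeFingerprint([search]); // as if a policy pruned "refund"

console.log(prod === prodReordered); // true
console.log(prod === staging); // false
```

The same tool definitions yield two different runtime fingerprints once a policy removes a tool, which is why a runtime hash can differ while the template stays the same.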

This is **not** end-user authentication. `agentId` / `name` on `RegisteredAgent` are **your** labels for telemetry or storage.

## When to use it

- Tool lists change by **environment**, **feature flags**, or **deploys** — you need to know **which snapshot** ran (e.g. assistant gets different tools in staging vs prod).
- **Policies** gate tools — you need **runtime** capabilities, not only static.
- You want **stable ids** for dashboards, evals, or logs without ad hoc versioning.
- **Before/after** changing a tool’s schema or instructions — static hashes shift; use `diffToolRefs` / canonical payloads to compare.
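The before/after comparison can be pictured with a small sketch. This is not the package's `diffToolRefs` (its signature is not shown in this README); `HashByName` and `diffHashMaps` are hypothetical helpers that compare two name-to-hash maps.

```typescript
// Hypothetical shape: tool name -> static hash for one snapshot.
type HashByName = Record<string, string>;
type RefDiff = { added: string[]; removed: string[]; changed: string[] };

const hasKey = (m: HashByName, k: string): boolean =>
  Object.prototype.hasOwnProperty.call(m, k);

// Classify every tool name as added, removed, or changed between two snapshots.
function diffHashMaps(before: HashByName, after: HashByName): RefDiff {
  const diff: RefDiff = { added: [], removed: [], changed: [] };
  for (const name of Object.keys(after).sort()) {
    if (!hasKey(before, name)) diff.added.push(name);
    else if (before[name] !== after[name]) diff.changed.push(name);
  }
  for (const name of Object.keys(before).sort()) {
    if (!hasKey(after, name)) diff.removed.push(name);
  }
  return diff;
}

const v1: HashByName = { search: "a1", refund: "b1" };
const v2: HashByName = { search: "a2", export: "c1" };

const toolDiff = diffHashMaps(v1, v2);
console.log(toolDiff);
// { added: [ 'export' ], removed: [ 'refund' ], changed: [ 'search' ] }
```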

**When not to:** you only need a single fixed tool list forever and never compare runs—skip this and use your framework’s tools directly.

**Out of scope:** your database adapter, threads, transports. This package defines the **persistence contract** (`AgentCapabilitiesPersistence`, Smithy service) and a `:memory:` reference implementation; you implement the same interface for your production store (SQL, document DB, object storage metadata, etc.).

## Quick example

Full pipeline (matches how many apps record one evaluation):

```ts
import {
  computeRuntimeCapabilitiesFromEvaluation,
  toolkit,
  tool,
} from "@khoralabs/agent-capabilities";

const search = tool({
  name: "search",
  inputSchema: yourStandardSchema, // placeholder: supply any Standard Schema-compatible schema
  instructions: "…",
  handler: async () => {},
});

const root = toolkit([search], { name: "my-agent-tools" });

const { runtimeHash, toolRefs, evaluatedTools, nameToStaticHash } =
  await computeRuntimeCapabilitiesFromEvaluation(root, {
    env: { userTier: "pro" },
  });
// Build a CapabilityLink (optional invocation):
//   await createCapabilityLink({ agent, enabledToolNames: Object.keys(evaluatedTools),
//     nameToStaticHash, tools: evaluatedTools, invocationContext: { subjectId: "…" } });
// Or use computeFullCapabilityLink({ agent, ctx, invocationContext: { … } }).
```

Lower-level pieces: `collectToolStaticHashes(root)` → map of tool name → leaf hash; `evaluateComposable(root, ctx)` → tools; then `computeRuntimeHash(enabledNames, map, tools)` or `resolveRuntimeToolRefs(...)`.

More runnable scripts under `examples/` (see below). For Vercel AI SDK, use [`@khoralabs/agent-capabilities-ai-sdk`](../capabilities-ai-sdk).

## Declarative agents and sessions for implementors

**Single declaration.** Treat **`RegisteredAgent`** (from `createRegisteredAgent`) plus **`register(agent, { hooks, ctx, run })`** as one declaration of (1) *who* the agent is—root composable, static instructions, static context—and (2) *how* sessions are wired: optional **hooks**, **context** layers (`ctx`), and the **`run`** function. Registration is data-shaped; you are not reimplementing evaluation or the session machine.

**One orchestration implementation.** For a product, the only required **orchestration** at the session layer is a **`SessionRunner`**: implement **`run`** as `({ agent, input, context }) => output`. Everything else there is optional: **hooks** for cross-cutting behavior and **`ctx`** for merged static context and async resolvers. Session hooks wrap **one** invocation of `run`; they do not replace it.

**Attribution and telemetry.** See the [attribution and telemetry guide](../../docs/attribution-telemetry.md) for hook layers, the per-turn persist recipe, and `invocationContext` vs `sessionContext` vs merged `SessionContext`.

**Two hook layers** — bind functions to the right layer so “hooks” does not mean “rewrite the tool loop”:

1. **Toolkit pipeline hooks** — `onPolicyEvaluated` / `onToolExecuted`, merged via `mergeToolPipelineHooks`, on **`toolkit` / `tool`** definitions and optionally **`ToolkitContext.pipelineHooks`**. These run **inside** composable evaluation while policies and tools execute. Use for telemetry or side effects around policy/tool execution, not for substituting your own evaluation loop.

2. **Session hooks** — `onStart`, `onBeforeContext`, `onAfterContext`, `onBeforeRun`, `onAfterRun`, `onError` on **`register`** / **`createSession`**, or chained on the returned **`AgentSession`**. These run **around** building `SessionContext` and calling **`run`**. Use for session lifecycle, logging, or injecting fields before your runner evaluates affordances (e.g. building a `ToolkitContext` inside `run` or `onBeforeRun`).
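The toolkit-layer merge order (ancestor toolkit, then tool, then runtime) can be sketched as follows. This is a simplified stand-in, not `mergeToolPipelineHooks` itself: `PipelineHooks` is reduced to one synchronous callback and `mergeHooksInOrder` is a hypothetical name.

```typescript
// Hypothetical hook shape, reduced to a single callback for brevity.
type PipelineHooks = { onToolExecuted?: (toolName: string) => void };

// Compose layers so each defined callback fires in the order the layers are given.
function mergeHooksInOrder(...layers: Array<PipelineHooks | undefined>): PipelineHooks {
  const callbacks = layers
    .map((layer) => layer?.onToolExecuted)
    .filter((cb): cb is (toolName: string) => void => cb !== undefined);
  return {
    onToolExecuted: (toolName) => {
      for (const cb of callbacks) cb(toolName);
    },
  };
}

const calls: string[] = [];
const toolkitHooks: PipelineHooks = { onToolExecuted: (t) => { calls.push(`toolkit:${t}`); } };
const toolHooks: PipelineHooks = { onToolExecuted: (t) => { calls.push(`tool:${t}`); } };
const runtimeHooks: PipelineHooks = { onToolExecuted: (t) => { calls.push(`runtime:${t}`); } };

// An undefined layer (no hooks declared at that level) is skipped.
const mergedHooks = mergeHooksInOrder(toolkitHooks, toolHooks, undefined, runtimeHooks);
mergedHooks.onToolExecuted?.("search");

console.log(calls); // [ 'toolkit:search', 'tool:search', 'runtime:search' ]
```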

**Session API.** Call **`createSession(agentId)`** with the same string **`agentId`** you used at register time, then **`start(input)`**. Optional per-session overrides use the same `{ hooks, ctx, run }` shape.

**Session lifecycle (`start` order):** `onStart` → `onBeforeContext` (agent + input only) → merge `ctx` into `context` → `onAfterContext` → `onBeforeRun` → `run` → `onAfterRun` or `onError`. Use `onBeforeContext` for early setup. Use `onAfterRun` for attribution (`recordTurnAttribution`) once `run` has captured the turn.
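The lifecycle order above can be traced with a small, synchronous model. Everything here is an assumption made for illustration: `startSession` and `SessionHooks` are hypothetical, the real hooks may be async and receive richer arguments, and rethrowing after `onError` is a choice of this sketch rather than documented behavior.

```typescript
type HookName =
  | "onStart" | "onBeforeContext" | "onAfterContext"
  | "onBeforeRun" | "onAfterRun" | "onError";
type SessionHooks = Partial<Record<HookName, (info: unknown) => void>>;
type Ctx = Record<string, unknown>;

// Hypothetical, synchronous model of one start() call.
function startSession<I, O>(
  input: I,
  ctx: Ctx,
  run: (args: { input: I; context: Ctx }) => O,
  hooks: SessionHooks,
): O {
  hooks.onStart?.(input);
  hooks.onBeforeContext?.(input); // context has not been built yet
  const context: Ctx = { ...ctx };
  hooks.onAfterContext?.(context);
  hooks.onBeforeRun?.(context);
  let output: O;
  try {
    output = run({ input, context });
  } catch (err) {
    hooks.onError?.(err); // fires instead of onAfterRun
    throw err;
  }
  hooks.onAfterRun?.(output);
  return output;
}

const trace: string[] = [];
const mark = (name: string) => () => { trace.push(name); };
const hooks: SessionHooks = {
  onStart: mark("onStart"),
  onBeforeContext: mark("onBeforeContext"),
  onAfterContext: mark("onAfterContext"),
  onBeforeRun: mark("onBeforeRun"),
  onAfterRun: mark("onAfterRun"),
  onError: mark("onError"),
};

const reply = startSession("hi", { tier: "pro" }, ({ input, context }) => {
  trace.push("run");
  return `${input}:${String(context.tier)}`;
}, hooks);

console.log(trace.join(" > "));
// onStart > onBeforeContext > onAfterContext > onBeforeRun > run > onAfterRun
console.log(reply); // hi:pro
```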

**Optional “one declarative blob” later.** A small factory or type that bundles **`RegisteredAgent`** with default **`RegisterAgentOptions`** is only sugar on top of **`register`**; it does not change semantics.

## API overview

Grouped by role; full exports (including types like `ToolSpec`, `Composable`, `CapabilityLink`) are in [`src/index.ts`](src/index.ts).

### Composables and evaluation

- `tool` / `toolkit` / `dynamicToolkit`
- `evaluateComposable(composable, ctx)`
- `policy(id, evaluate, { executeBinding?: "snapshot" | "live" })` — default `live`; use `snapshot` with shared `resolvedPolicies` at AI SDK execute
- `gateToolPoliciesAtExecute` — execute-boundary policy gate (used by ai-sdk adapter)
- `mergeToolPipelineHooks` / `evaluatePolicyWithHooks` — optional telemetry; hooks are **not** hashed
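As a rough illustration of policies that dedupe by object identity, the sketch below caches each policy's verdict in a `Map` keyed by the policy object, so two tools sharing one policy trigger a single evaluation. It is a synchronous simplification with hypothetical names (`Policy`, `GatedTool`, `enabledTools`); the package's policies are async and its evaluation internals are not shown here.

```typescript
type Env = Record<string, unknown>;
type Policy = { id: string; evaluate: (env: Env) => boolean };
type GatedTool = { name: string; policies: Policy[] };

// Return the names of tools whose policies all pass, evaluating each policy object once.
function enabledTools(tools: GatedTool[], env: Env): string[] {
  const results = new Map<Policy, boolean>(); // keyed by object identity, not by id
  const check = (p: Policy): boolean => {
    const cached = results.get(p);
    if (cached !== undefined) return cached;
    const verdict = p.evaluate(env);
    results.set(p, verdict);
    return verdict;
  };
  return tools
    .filter((t) => t.policies.every((p) => check(p)))
    .map((t) => t.name)
    .sort();
}

let evaluations = 0;
const proOnly: Policy = {
  id: "pro-only",
  evaluate: (env) => {
    evaluations += 1;
    return env.userTier === "pro";
  },
};

const gated: GatedTool[] = [
  { name: "search", policies: [] },
  { name: "export", policies: [proOnly] },
  { name: "refund", policies: [proOnly] },
];

const proTools = enabledTools(gated, { userTier: "pro" });
const evaluationsAfterPro = evaluations;
const freeTools = enabledTools(gated, { userTier: "free" });

console.log(proTools); // [ 'export', 'refund', 'search' ]
console.log(evaluationsAfterPro); // 1
console.log(freeTools); // [ 'search' ]
```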

### Hashing and runtime snapshot

- `collectToolStaticHashes` / `computeRuntimeHash` / `resolveRuntimeToolRefs`
- `computeRuntimeCapabilitiesFromEvaluation` — one-shot evaluate + `nameToStaticHash` + runtime hash + `toolRefs` + `evaluatedTools`
- `hashToolSpecStatic` — dynamic-only / fallback tool static hash
- `hashPlainObject` / `schemaToHashInput`

### Invocation (binding lineage, optional)

- `normalizeInvocationContextForHash` / `invocationContextCanonicalPayload` / `computeInvocationContextHash`
- `computeFullCapabilityLink` — evaluate the agent’s root + `createCapabilityLink` in one call (optional `invocationContext`)

### Canonical payloads (debug / UI)

- `runtimeCapabilityCanonicalPayload` / `toolSpecCanonicalPayload` (invocation: `invocationContextCanonicalPayload`)

### Agent label + link

- `createRegisteredAgent` / `createCapabilityLink` (optional `invocationContext` / `invocationContextAllowlist`)

### Dashboard-style helpers

- `formatHashShort` / `diffToolRefs` / `diffCapabilityLinks` / `explainCapabilityLinkRelationship`
- `formatCapabilityDiffReport` / `bun run capability-diff` — compare two link or envelope JSON files; see [capability diff CLI](../../docs/capability-diff-cli.md)

### Persistence (Smithy contract + `:memory:`)

- `AgentCapabilitiesPersistence` — implement for your DB; see [persistence guide](../../docs/persistence.md)
- `createMemoryAgentCapabilitiesPersistence()` — `:memory:` backend (like SQLite `:memory:`)
- `recordTurnAttribution(persistence, { op, sessionId, link, envelope? })` — write link + optional envelope after capture
- `registeredAgentToRegistrationRow` / `capabilityLinkToRow` / `envelopeToRow` / `defaultOpContext`

### Session host (`createAgentRegistry`)

- `createAgentRegistry({ persistence? })` — defaults to `:memory:` persistence; session host + orchestration overlay
- `createToolRegistry` / `hashToolComposableStatic`
- `await createAgentRegistry().register(agent, { hooks, ctx, run })` — see [Declarative agents and sessions for implementors](#declarative-agents-and-sessions-for-implementors)
- `createAgentRegistry().createSession(agentId, { hooks, ctx, run, sessionId? })` — `agentId` matches `RegisteredAgent.agentId`
  - `session.onStart(...)` / `session.onBeforeContext(...)` / `session.onAfterContext(...)` / `session.onBeforeRun(...)` / `session.onAfterRun(...)` / `session.onError(...)`
  - `session.start(input)` runs with composed hooks and merged context (`session > registry > agent static`), then **`run`**
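Reading `session > registry > agent static` as "session wins over registry, which wins over agent static", the precedence can be sketched with object spread. This is an assumption-laden illustration: `mergeContext` is hypothetical, the merge is shallow, and the package may merge differently (for example, `ctx` can also carry async resolvers).

```typescript
type Ctx = Record<string, unknown>;

// Later spreads win, so layers are listed from lowest to highest precedence.
function mergeContext(agentStatic: Ctx, registry: Ctx, session: Ctx): Ctx {
  return { ...agentStatic, ...registry, ...session };
}

const mergedCtx = mergeContext(
  { locale: "en", tier: "free", region: "us" }, // agent static defaults
  { tier: "pro", region: "eu" },                // registry-level ctx
  { region: "ap" },                             // per-session override
);

console.log(mergedCtx); // { locale: 'en', tier: 'pro', region: 'ap' }
```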

### Optional host / UX helpers

Not required for hashing or persistence — see [host helpers guide](../../docs/host-helpers.md).

- `elapsedMs` — timing from `performance.now()`
- `createToolRegistry` — in-memory composable catalog (tests/examples)
- `withFormattedResults` — `{ ok, data? } | { ok: false, error }` wrapper
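For orientation, a result wrapper in the spirit of the `{ ok, data? } | { ok: false, error }` shape might look like the sketch below. It is not `withFormattedResults`: `formatResult` is a hypothetical synchronous helper, and representing `error` as a string is an assumption.

```typescript
// Assumed result shape; the error is flattened to a message string here.
type Formatted<T> = { ok: true; data?: T } | { ok: false; error: string };

// Wrap a function so thrown errors become { ok: false, error } values instead.
function formatResult<A extends unknown[], T>(
  fn: (...args: A) => T,
): (...args: A) => Formatted<T> {
  return (...args: A): Formatted<T> => {
    try {
      return { ok: true, data: fn(...args) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

const safeParse = formatResult((raw: string) => JSON.parse(raw) as unknown);

const good = safeParse('{"q":"docs"}');
const bad = safeParse("{not json");

console.log(good.ok, bad.ok); // true false
```

A wrapper like this keeps tool handlers from throwing across the tool boundary, so the caller can branch on `ok` rather than catching exceptions.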

### Capture one turn (persistence + same-turn LLM)

- `AGENT_SNAPSHOT_ENVELOPE_VERSION` — current `AgentSnapshotEnvelope.schemaVersion` (`"1"`); see [schema versions](../../docs/schema-versions.md)
- `captureAgentRuntimeSnapshot` — one evaluation pass → `AgentRuntimeSnapshot` + live `evaluatedTools` / `instructions` / `link` / `toolRefs`
- `captureAgentSnapshotEnvelope` — same pass → full `AgentSnapshotEnvelope` (optional `sessionContext`, `includeStatic`)
- `registeredAgentToWire` / `toolkitContextToWire` — wire helpers used by capture

## Capture one turn for persistence

For each message or job, call **`captureAgentSnapshotEnvelope`** (or **`captureAgentRuntimeSnapshot`** if you only need the runtime slice):

```ts
const { envelope, link, evaluatedTools, instructions } = await captureAgentSnapshotEnvelope({
  agent,
  ctx: { env: { userTier: "pro" }, agentId: agent.agentId, agentName: agent.name },
  invocationContext: { subjectId: "user-1" }, // optional third fingerprint
  sessionContext: { messageId: "msg-abc" },   // envelope.context (not hashed)
  policyMode: "authoritative",
});
// Persist envelope (JSON) or Smithy CapabilityLinkRow fields from link + toolRefs
// Use evaluatedTools + instructions for the LLM on this same turn
```

| Field | Role |
|-------|------|
| `invocationContext` | Hashed into `link.invocationHash` (tenant/subject/persona binding); see [invocation context](../../docs/invocation-context.md) |
| `sessionContext` | Stored in `envelope.context` only; not part of capability hashes |
| `runtime.toolkitContext` | JSON-safe `env` / `agentId` / `namespace` from `ToolkitContext` (hooks omitted) |
| `runtime.affordances` | Wire tools for storage/replay via `hydrateAffordances` |
| `evaluatedTools` | Live handlers for this turn (not persisted) |

Use **`captureAgentRuntimeSnapshot`** when the static template is unchanged and you only append runtime rows. Use **`computeFullCapabilityLink`** when you only need hashes without a full wire snapshot.

## Mapping to persistence

Hashes and wire payloads are computed in-process; durable storage uses [`AgentCapabilitiesPersistence`](../../docs/persistence.md) (Smithy `AgentCapabilitiesPersistenceService`). Host backends assign opaque ids (`registrationId`, `linkId`, etc.); row builders accept optional ids.

**What to store:** prefer `recordTurnAttribution` or a full **`AgentSnapshotEnvelope`** from `captureAgentSnapshotEnvelope`, or a **`CapabilityLink`** (includes `toolRefs`) plus wire affordances. `AgentRuntimeSnapshot` still exposes top-level `toolRefs` for envelope v1; they should match `link.toolRefs`. If you need forensics, persist the **same** `invocationContext` object you passed to capture (or store it in host `metadata`).

## Invocation context

Recommended keys and the split between hashed `invocationContext` and non-hashed `sessionContext` are documented in [docs/invocation-context.md](../../docs/invocation-context.md). Export: `InvocationContextRecommended`.
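The hashed/non-hashed split can be illustrated with a small sketch: keep only allowlisted keys, sort them, and hash the result. The helper names (`canonicalize`, `hashInvocation`) and the canonical JSON form are assumptions for illustration; the package's own normalization lives behind `normalizeInvocationContextForHash` and `computeInvocationContextHash`.

```typescript
import { createHash } from "node:crypto";

type Plain = Record<string, string | number | boolean | null>;

// Keep only allowlisted keys, then sort so insertion order cannot change the hash.
function canonicalize(ctx: Plain, allowlist: string[]): string {
  const entries = Object.keys(ctx)
    .filter((k) => allowlist.indexOf(k) !== -1)
    .sort()
    .map((k) => [k, ctx[k]]);
  return JSON.stringify(entries);
}

function hashInvocation(ctx: Plain, allowlist: string[]): string {
  return createHash("sha256").update(canonicalize(ctx, allowlist), "utf8").digest("hex");
}

const allow = ["subjectId", "personaSlug"];

const hashA = hashInvocation({ subjectId: "user-1", personaSlug: "support" }, allow);
// Different key order plus a session-style field that is not allowlisted:
const hashB = hashInvocation(
  { personaSlug: "support", subjectId: "user-1", messageId: "msg-abc" },
  allow,
);
const hashC = hashInvocation({ subjectId: "user-2", personaSlug: "support" }, allow);

console.log(hashA === hashB); // true
console.log(hashA === hashC); // false
```

Per-message fields such as `messageId` stay out of the binding hash in this sketch, mirroring the guidance that they belong in `sessionContext`.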

## Examples

```bash
bun run example:static
bun run example:dynamic
bun run example:capabilities
bun run example:diff
bun run example:session-attribution
```

`01-static-toolkit.ts` / `02-dynamic-toolkit.ts` — evaluate composables and map tools via `@khoralabs/agent-capabilities-ai-sdk`.

`05-session-attribution.ts` — session host with capture in `run` and `recordTurnAttribution` in `onAfterRun` (see [attribution and telemetry guide](../../docs/attribution-telemetry.md)).

## Tests

```bash
bun test
```