# Extensions

An **extension** is a single JavaScript or TypeScript file that contributes
to ferridriver at runtime. One file can contribute to three hosts:

- **MCP server** (`ferridriver mcp`) — registers tools via `defineTool(...)`.
- **BDD test runner** (`ferridriver bdd`) — registers Cucumber step
  definitions, hooks, and parameter types via `Given`/`When`/`Then`/
  `Before`/`After`/`defineParameterType`/...
- **Ad-hoc scripts** (`ferridriver run`, MCP `run_script`) — the same VM
  bindings the above two use.

The same file can serve all three. It branches on the `ferridriver.host`
global to decide what to contribute where.

> Companion document: `docs/plugin-architecture.md` records *why* the
> system is shaped this way (the comparison against VS Code / Deno / WASM /
> Rollup and the decisions deferred). This document is the *how*: the
> authoring contract and reference.

---

## Mental model

```
extension.ts ──► rolldown bundle (TS + node_modules + tree-shake)
             ──► QuickJS bytecode (compiled ONCE at startup)
             ──► content-hash cache (in-memory, process-local)
             ──► Module::load per session VM (no re-parse)
             ──► top-level defineTool()/Given() run → Rust ExtensionRegistry
```

Registration functions (`defineTool`, `Given`, `Before`, ...) are native
Rust functions, not JS shims. Calling them at the top level of your module
pushes an entry into a Rust-owned registry. Hosts then read back the kinds
they care about and invoke your handler natively — the MCP tool path and
the BDD step path use the exact same dispatch mechanism.

Implication: **all contribution happens as a side effect of the module's
top-level code running once.** There is no `activate()` / `onLoad()`
lifecycle hook — ES module top-level *is* your load hook.
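
To make the side-effect model concrete, here is a toy registry in plain TypeScript. It is a sketch of the control flow only: `ToyRegistry`, `Entry`, and `loadModule` are hypothetical names, and the real registry is owned by Rust.

```ts
// Toy model only: ToyRegistry, Entry and loadModule are hypothetical names.
// The real registry is Rust-owned; this just mirrors the control flow.
type Entry =
  | { kind: "tool"; name: string; handler: (args: unknown) => unknown }
  | { kind: "step"; pattern: string; handler: (...args: string[]) => unknown };

class ToyRegistry {
  readonly entries: Entry[] = [];

  // Stand-ins for the registration functions: each call only records an entry.
  defineTool = (name: string, handler: (args: unknown) => unknown): void => {
    this.entries.push({ kind: "tool", name, handler });
  };
  Given = (pattern: string, handler: (...args: string[]) => unknown): void => {
    this.entries.push({ kind: "step", pattern, handler });
  };

  // Each host reads back only the kind it consumes.
  toolNames(): string[] {
    const names: string[] = [];
    for (const e of this.entries) if (e.kind === "tool") names.push(e.name);
    return names;
  }
  stepPatterns(): string[] {
    const patterns: string[] = [];
    for (const e of this.entries) if (e.kind === "step") patterns.push(e.pattern);
    return patterns;
  }
}

// "Loading" is nothing more than running the module's top-level code once.
function loadModule(registry: ToyRegistry, topLevel: (r: ToyRegistry) => void): void {
  topLevel(registry);
}

const toyRegistry = new ToyRegistry();
loadModule(toyRegistry, ({ defineTool, Given }) => {
  defineTool("box.login", () => ({ cookie: "stub" }));
  Given("I am logged in as {string}", () => undefined);
});
```

After `loadModule` returns, the registry holds one tool and one step, and nothing else needs to be called for them to exist.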

---

## Detecting the host

`ferridriver.host` is a string set once per session: `"mcp"`, `"bdd"`, or
`"script"`. Use it so one file can ship a tool and its matching step
without registering the wrong thing in the wrong host:

```ts
if (ferridriver.host === "mcp") {
  defineTool({
    name: "box.login",
    description: "Log a test user in and return the session cookie",
    inputSchema: { type: "object", properties: { user: { type: "string" } }, required: ["user"] },
    allow: { net: ["*.box.com"] },
    handler: async ({ args, request }) => {
      const res = await request.post("https://api.box.com/login", { data: { user: args.user } });
      return { cookie: (await res.json()).cookie };
    },
  });
}

if (ferridriver.host === "bdd") {
  Given("I am logged in as {string}", async function (user: string) {
    await this.page.goto(`https://app.box.com/login?u=${user}`);
  });
}
```

Registering for the wrong host is harmless (the host ignores kinds it does
not consume) but wastes work and muddies intent — gate it.

---

## Authoring MCP tools

### `defineTool`

Two equivalent forms:

```ts
// Inline handler on the manifest object:
defineTool({
  name: "string",              // required, globally unique, dot-namespaced by convention
  description: "string",       // optional, surfaced in tools/list
  inputSchema: { ... },        // optional JSON Schema, surfaced in tools/list AND enforced
  exposeAsTool: true,          // optional, default false (see below)
  timeoutMs: 30000,            // optional per-invocation handler timeout (ms)
  allow: { ... },              // optional capability manifest (see below)
  handler: async (ctx) => { ... },
});

// Or manifest + separate handler:
defineTool(manifest, async (ctx) => { ... });
```

### `exposeAsTool`

- `false` (default): the tool is callable from other extension/script code
  as `await plugins["name"](args)`, but is **not** advertised in the MCP
  server's `tools/list`. Use for shared helpers.
- `true`: additionally promoted to a first-class MCP tool. `name`,
  `description`, and `inputSchema` become the tool's contract. The tool
  call and the `plugins[...]` binding route through the same handler.

### Handler context

The handler receives one object:

| Field      | Type                  | Notes |
|------------|-----------------------|-------|
| `args`     | the caller's argument | For a promoted tool, the MCP `arguments` object. |
| `page`     | `Page` \| undefined   | The live browser page for the session. |
| `context`  | `BrowserContext` \| undefined | The session's browser context. |
| `request`  | `HttpClient` \| undefined | HTTP client. Net-restricted if `allow.net` is non-empty. |
| `commands` | `PluginCommands`      | `.run(name, vars?)` runs a declared one-shot command; `.start`/`.status`/`.stop` manage `persistent` ones. |

Return any JSON-serialisable value; it becomes the tool result.

> When the manifest declares `inputSchema`, the caller's `args` are
> validated against it (full JSON Schema, via the `jsonschema` crate)
> **before** the handler runs; a non-conforming call is rejected as a
> tool error and the handler is never entered. You still get the parsed
> value as `args`: validation only gates, it does not coerce.
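
The difference between gating and coercing is easiest to see with a value of the wrong type. The sketch below uses a deliberately tiny stand-in validator (the hypothetical `gate` function, which only understands `required` and primitive `type`); the real server applies full JSON Schema.

```ts
// Simplified stand-in for schema gating. The real check is full JSON Schema;
// this hypothetical `gate` only understands `required` and primitive `type`.
type PrimitiveType = "string" | "number" | "boolean";

interface ToySchema {
  type: "object";
  properties: Record<string, { type: PrimitiveType }>;
  required?: string[];
}

function gate(schema: ToySchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) errors.push(`missing required property "${key}"`);
  }
  for (const key of Object.keys(schema.properties)) {
    const rule = schema.properties[key];
    if (rule !== undefined && key in args && typeof args[key] !== rule.type) {
      errors.push(`"${key}" must be ${rule.type}, got ${typeof args[key]}`);
    }
  }
  return errors; // empty: the handler would run, with `args` passed through untouched
}

const loginSchema: ToySchema = {
  type: "object",
  properties: { user: { type: "string" }, retries: { type: "number" } },
  required: ["user"],
};

const accepted = gate(loginSchema, { user: "ada", retries: 3 });
// "3" is a string: a gate rejects it instead of converting it to the number 3.
const notCoerced = gate(loginSchema, { user: "ada", retries: "3" });
const missingUser = gate(loginSchema, { retries: 1 });
```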

---

## Capabilities

`allow` is a declarative capability manifest, enforced in Rust at the
binding boundary. Commands are default-deny; HTTP becomes default-deny
as soon as `allow.net` is non-empty. The handler source alone cannot
grant itself authority it did not declare.

### `allow.commands` (alias: `allow.exec`)

A name → command map. The handler may only run commands it declared
(default-deny). Each value is a **shorthand string** (a `sh -c` line) or
a **spec object**:

```ts
defineTool({
  name: "git.sha",
  allow: {
    commands: {
      // shorthand: a shell line
      headSha: "git -C ${repo} rev-parse HEAD",
      // spec object: no shell, explicit policy
      clone: {
        run: ["git", "clone", "${url}", "${dest}"], // argv array → no shell
        timeoutMs: 60000,
        env: ["SSH_AUTH_SOCK"],   // else the child env is scrubbed
        cwd: "/tmp",
        output: "text",           // "text" | "json" | "lines"
      },
    },
  },
  handler: async ({ commands }) => {
    const sha = await commands.run("headSha", { repo: "/srv/app" });
    return { sha: sha.trim() };
  },
});
```

Spec fields (all optional except `run`): `run` (string ⇒ `sh -c`;
array ⇒ direct exec, no shell), `timeoutMs`, `env` (server env names to
pass through — otherwise only `PATH` is kept), `cwd`, `output`,
`persistent`.

One-shot semantics (`commands.run(name, vars?)`):

- An undeclared `name` throws. So does output past 8 MiB, a non-zero
  exit, or a timeout (on timeout the whole process group is killed).
- `${name}` is **strictly** substituted: every placeholder must have a
  supplied value and every value must be a string/number/boolean. A
  placeholder with no value, or an object/array value, throws; nothing
  is silently replaced with an empty string.
- Shell form single-quote-escapes each value; **argv form does not need
  to** — values are passed as literal arguments, so shell metacharacters
  in them are inert. Prefer argv unless you actually need a pipeline.
- `output` shapes stdout: `text` (trimmed string, default — no
  guessing), `json` (parsed; invalid JSON throws), `lines` (array of
  non-empty trimmed lines).
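
The rules in this list can be sketched in a few lines of TypeScript. Everything here (`shellQuote`, `substitute`, `renderShell`, `renderArgv`, `shape`) is a hypothetical helper written for illustration, not ferridriver's code, and the placeholder grammar `${word}` is an assumption.

```ts
// Sketch of the one-shot rules. Hypothetical helpers, not ferridriver's code.

// POSIX single-quote escaping: wrap in '...' and rewrite each ' as '\''.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Strict substitution: a placeholder with no supplied value throws,
// and so does a value that is not a string, number or boolean.
function substitute(template: string, vars: Record<string, unknown>, quote: boolean): string {
  return template.replace(/\$\{(\w+)\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`missing value for \${${name}}`);
    }
    const v = vars[name];
    if (typeof v !== "string" && typeof v !== "number" && typeof v !== "boolean") {
      throw new Error(`value for \${${name}} must be a string, number or boolean`);
    }
    return quote ? shellQuote(String(v)) : String(v);
  });
}

// Shell form: one `sh -c` line, every value single-quoted.
function renderShell(line: string, vars: Record<string, unknown>): string {
  return substitute(line, vars, true);
}

// Argv form: each element is substituted literally; no shell ever parses it.
function renderArgv(argv: string[], vars: Record<string, unknown>): string[] {
  return argv.map((part) => substitute(part, vars, false));
}

// Output shaping for captured stdout.
function shape(stdout: string, output: "text" | "json" | "lines"): unknown {
  if (output === "json") return JSON.parse(stdout); // invalid JSON throws
  if (output === "lines") {
    return stdout.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  }
  return stdout.trim();
}

const shellLine = renderShell("git -C ${repo} rev-parse HEAD", { repo: "/srv/my app" });
const argvForm = renderArgv(["git", "clone", "${url}", "${dest}"], { url: "u; ls", dest: "/tmp/d" });
```

In the argv result the `;` in `u; ls` is just a character inside one argument, which is why argv form needs no escaping.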

**Trust boundary.** A shell-form `run` line is author-supplied code with
the server process's authority (`$(…)`, `&&`, `|`, redirection live);
only the `${values}` are escaped. Argv form removes the shell entirely.
Never write a shell line that re-evaluates a value (`sh -c "${x}"`,
`eval ${x}`) — that defeats the escaping. Template = trusted code you
commit; values = untrusted data.

### Persistent commands (servers, watchers)

Declare `persistent: true` for a long-running process. It is managed
with a different verb set and its lifetime is the **session's**, not the
call's:

```ts
allow: { commands: { dev: { run: "npm run dev", persistent: true } } }
// ...
await commands.start("dev");          // { name, pid }; idempotent if up
const s = await commands.status("dev"); // { running, pid, exitCode, uptimeMs, stdout, stderr }
await commands.stop("dev");           // SIGKILLs the process group
```

- `run` on a `persistent` spec (or `start`/`status`/`stop` on a one-shot
  spec) throws — the kinds don't mix.
- The process **survives a script-VM rebuild** (timeout/OOM/browser
  relaunch) so a dev server keeps running across calls. It is killed
  when the session ends (idle-TTL reap, explicit close, server
  shutdown), on `stop`, or if it exits on its own.
- `status` returns the last ~64 KiB of stdout/stderr (a ring buffer — a
  chatty server won't grow memory unbounded). Max 16 persistent
  processes per session.
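
The bounded-output behaviour of `status` can be pictured as a buffer that only ever keeps the newest data. The sketch below is a minimal illustration under assumptions: `TailBuffer` is a hypothetical name, it counts characters rather than bytes, and the real buffer is internal to the server.

```ts
// Sketch of a bounded "keep only the newest output" buffer.
// TailBuffer is a hypothetical name; it counts characters, not bytes.
class TailBuffer {
  private text = "";
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(chunk: string): void {
    // Append, then keep only the newest `capacity` characters.
    this.text = (this.text + chunk).slice(-this.capacity);
  }

  snapshot(): string {
    return this.text;
  }
}

const tail = new TailBuffer(8);
tail.push("starting ");
tail.push("server on :3000");
// tail.snapshot() now holds only the last 8 characters written.
```

However much the process prints, memory stays bounded by the capacity, at the cost of losing older output.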

### `allow.net`

A host allow-list scoping the handler's HTTP — both the `request` client
and the global `fetch` (they share one core, so the list binds both).

- Empty / absent: HTTP is unrestricted (back-compat default).
- Non-empty: the tool's `request` binding and `fetch` both flip to
  **default-deny**. Each entry is an exact host (`api.box.com`) or a
  leading-wildcard suffix (`*.box.com`, which also matches the bare apex
  `box.com`). Any other host throws before the request is made. The
  policy follows the running handler: a tool calling another tool, or
  two tools running concurrently, each see only their own declared list.
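
The matching rule for entries can be sketched as a small predicate. `hostAllowed` is a hypothetical helper for illustration (the real check runs in Rust before any request is made), and the case-insensitive comparison is an assumption based on host names being case-insensitive.

```ts
// Sketch of the allow-list matching rule. hostAllowed is a hypothetical helper.
function hostAllowed(host: string, allowList: string[]): boolean {
  if (allowList.length === 0) return true; // empty or absent: unrestricted
  const h = host.toLowerCase(); // assumption: comparison ignores case
  return allowList.some((entry) => {
    const e = entry.toLowerCase();
    if (e.startsWith("*.")) {
      const apex = e.slice(2);
      // "*.box.com" matches any subdomain and also the bare apex "box.com".
      return h === apex || h.endsWith("." + apex);
    }
    return h === e; // exact host only
  });
}

const boxOnly = ["*.box.com"];
const viaWildcard = hostAllowed("api.box.com", boxOnly);
const bareApex = hostAllowed("box.com", boxOnly);
const lookalike = hostAllowed("evilbox.com", boxOnly);
```

Note that the suffix test includes the leading dot, so a lookalike such as `evilbox.com` does not slip through a `*.box.com` entry.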

`allow.net` scopes HTTP (`request` + `fetch`) **only**. `page`/`context`
browser navigation is a separate, deliberately ungated authority — an
automation tool must be able to navigate. There is no `fs` capability:
the handler context exposes no filesystem handle, so an `fs` scope would
gate nothing.

---

## Authoring BDD steps

Cucumber-js-shaped surface, native-backed:

```ts
Given("a user {string}", async function (name: string) { /* ... */ });
When("they click {word}", async function (sel: string) { /* ... */ });
Then("the title is {string}", async function (expected: string) { /* ... */ });

defineStep("...", async function () { /* ... */ }); // keyword-agnostic; And/But also map here
Before(async function () { /* ... */ });
Before("@tag", async function () { /* ... */ });          // tag-filtered
After(async function (s) {
  if (s.result.status === "FAILED") this.attach(await this.page.screenshot(), "image/png");
});
BeforeAll(async () => { /* ... */ });   AfterAll(async () => { /* ... */ });

defineParameterType({ name: "color", regexp: "red|green|blue", transformer: (s) => s.toUpperCase() });

setDefaultTimeout(10000);                 // ms; per-registry default
setWorldConstructor(class { /* ... */ }); // custom World (last call wins, per VM)
setDefinitionFunctionWrapper((fn) => fn); // wrap every step body (retry/trace)
```

Per-step / per-hook timeout via the options bag:

```ts
Given("slow thing", { timeout: 30000 }, async function () { /* ... */ });
Before({ timeout: 2000 }, async function () { /* ... */ });
```

The step `this` is the per-scenario **World**. Fixtures are installed on
it: `this.page`, `this.context`, `this.request`, `this.browser`, plus
`this.parameters` (Cucumber `--world-parameters`), `this.attach`,
`this.log`, `this.skip()`. A custom `setWorldConstructor` is invoked as
`new World({ parameters })`; fixtures are augmented onto the instance.

Step bodies return:

- (nothing) / resolved promise → **passed**
- string `"pending"` → **pending**
- string `"skipped"` or `this.skip()` → **skipped**
- throw → **failed** (error remapped to the original `.ts`/`.js` location
  via the rolldown source map, including the stack)
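
The mapping from a step body's outcome to a status can be written down as a small function. This is a sketch under assumptions: `statusOf` and `Outcome` are hypothetical names, `this.skip()` is modelled as a plain flag, and treating any other return value as passed is a guess that the list above does not confirm.

```ts
// Sketch of the outcome-to-status mapping. Hypothetical names throughout.
type Status = "passed" | "pending" | "skipped" | "failed";
type Outcome = { kind: "returned"; value: unknown } | { kind: "threw"; error: unknown };

function statusOf(outcome: Outcome, skipCalled: boolean = false): Status {
  if (outcome.kind === "threw") return "failed";
  if (skipCalled || outcome.value === "skipped") return "skipped";
  if (outcome.value === "pending") return "pending";
  return "passed"; // assumption: any other resolved value counts as passed
}

const passed = statusOf({ kind: "returned", value: undefined });
const pending = statusOf({ kind: "returned", value: "pending" });
const skippedByCall = statusOf({ kind: "returned", value: undefined }, true);
const failed = statusOf({ kind: "threw", error: new Error("boom") });
```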

`setParallelCanAssign` is accepted but inert: ferridriver parallelises at
the test-runner worker level (one VM per worker), not cucumber-js's
per-pickle scheduler.

> There is also a **built-in Rust step library**
> (`ferridriver-bdd/src/steps/*`, registered via
> `#[given]`/`#[when]`/inventory). That is the shipped step vocabulary,
> not the user extension surface: it is not loaded from your `.ts` files
> and is out of scope for this document.

---

## Discovery and configuration

Extensions are configured in the unified config file
(`ferridriver.toml`/`.yaml`/`.json`), top-level (both hosts load it):

```toml
# Files or directories. A directory is scanned RECURSIVELY for any
# source file (.js .cjs .mjs .jsx .ts .cts .mts .tsx). Used by the MCP
# server (tools) AND, bundled alongside BDD step files, by the test
# runner (steps).
extensions = ["./extensions", "./tools/box-login.ts"]

[scripting]
# Sandbox relaxations — default-deny, like allow.net.
# Names a script may read via process.env (intersected with the real
# environment; absent names stay absent — never invented). Empty ⇒
# process.env is {}.
allowEnv = ["HOME", "TZ"]

[test]
# JS/TS step-definition globs. Defaults to steps/**/*.{js,ts} and
# step_definitions/**/*.{js,ts} when empty.
steps = ["features/steps/**/*.ts"]
```

The `ferridriver bdd` runner bundles discovered step files **and** the
configured `extensions` into one module, so an extension's
`Given`/`When`/`Then` are available to tests exactly like a step file's.

Both discovery paths (MCP plugin loader and BDD runner) share one
accepted-extension set and one recursive walk, so a `.tsx`/`.cts`
extension is visible identically to both hosts.
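
A recursive walk with one shared suffix set can be sketched without touching the filesystem. The example below walks an in-memory tree; `ToyDir` and `discover` are hypothetical names, and the sorted traversal order is an assumption made only to keep the output deterministic.

```ts
// Sketch of "one accepted-extension set, one recursive walk".
// ToyDir and discover are hypothetical; a real walk reads the filesystem.
const SOURCE_EXTENSIONS = [".js", ".cjs", ".mjs", ".jsx", ".ts", ".cts", ".mts", ".tsx"];

// A directory maps names to children; a file is represented by `null`.
interface ToyDir {
  [name: string]: ToyDir | null;
}

function discover(dir: ToyDir, prefix: string = ""): string[] {
  const found: string[] = [];
  for (const name of Object.keys(dir).sort()) {
    const child = dir[name];
    if (child === undefined) continue;
    if (child === null) {
      // File: keep it only if its suffix is in the accepted set.
      if (SOURCE_EXTENSIONS.some((ext) => name.endsWith(ext))) found.push(prefix + name);
    } else {
      // Directory: recurse with the extended path prefix.
      found.push(...discover(child, prefix + name + "/"));
    }
  }
  return found;
}

const toyTree: ToyDir = {
  extensions: {
    "login.ts": null,
    "README.md": null,
    nested: { "widget.tsx": null, "legacy.cts": null },
  },
};
const discovered = discover(toyTree);
```

Because both hosts would call the same function with the same suffix list, a `.tsx` or `.cts` file cannot be visible to one and invisible to the other.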

---

## Node-ish APIs: `process` and `fetch`

So real npm packages run, scripts and handlers get a sandbox-safe
`process` and a standard `fetch`.

### `process`

Always available (no authority, real values): `platform`, `arch`,
`version`, `versions`, `release`, `argv` (`["ferridriver","script"]`),
`pid`, `nextTick`, `hrtime` (+ `hrtime.bigint()` -> BigInt ns),
`stdout`/`stderr` (`.write(chunk)` routes into the captured console —
`stdout`->log, `stderr`->error, one trailing newline trimmed; returns
`true`, `isTTY` is `false`), `cwd()` (returns the sandbox root, never
the real cwd). `nextTick(cb)` is a FIFO microtask (via
`queueMicrotask`), not Node's separate higher-priority queue — order
follows scheduling order.

- `process.env` — **default `{}`**. Only the names in `[scripting]`
  `allowEnv`, and only if set in the server's environment, appear; the
  object is frozen. A name you didn't list is simply absent — there is
  no way for a script to read an unlisted variable.
- `process.exit()` — throws (a script must never kill the server).
- `process.binding`/`dlopen`/`kill`/`chdir`/`setuid`/… — not present.
- `process.versions.node` — never present (`process.versions` is
  honest: `ferridriver` + `quickjs` only). This is not Node.
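
The `process.env` rule is an intersection followed by a freeze. A minimal sketch, assuming a hypothetical `buildEnv` helper (the real object is built by the runtime):

```ts
// Sketch of the allowEnv intersection. buildEnv is a hypothetical helper.
function buildEnv(
  allowEnv: string[],
  realEnv: Record<string, string | undefined>,
): Readonly<Record<string, string>> {
  const env: Record<string, string> = {};
  for (const name of allowEnv) {
    const value = realEnv[name];
    // Listed but unset names stay absent; nothing is invented.
    if (value !== undefined) env[name] = value;
  }
  return Object.freeze(env);
}

const serverEnv = { HOME: "/home/ci", SECRET_TOKEN: "s3cr3t" };
const scriptEnv = buildEnv(["HOME", "TZ"], serverEnv);
const emptyEnv = buildEnv([], serverEnv);
```

Here `scriptEnv` contains only `HOME`: `TZ` is listed but unset, and `SECRET_TOKEN` is set but not listed.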

### `fetch`

Web-standard `fetch(input, init?)` with the WHATWG globals `Headers`,
`Request`, and `Response` (constructible; `instanceof` works):

```ts
const r = await fetch("https://api.example.com/x", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: { hello: "world" },        // object ⇒ JSON; string ⇒ sent as-is
});
if (!r.ok) throw new Error(`HTTP ${r.status}`);
const data = await r.json();
```

- `Headers` follows the spec (case-insensitive, `, `-combined,
  `set-cookie` separate + `getSetCookie()`, real iterators, `forEach`).
- `Response` has `status`/`ok`/`statusText`/`url`/`redirected`/`type`/
  `bodyUsed`/`headers`, single-use `text()`/`json()`/`arrayBuffer()`,
  `clone()`, and static `Response.json()`/`error()`/`redirect()`.
- `Request` (`new Request(url|Request, init?)`) carries
  `url`/`method`/`headers`/`redirect`/`credentials`/`bodyUsed` and is
  accepted by `fetch`.
- `AbortController`/`AbortSignal` are standard
  (`controller.abort(reason?)`, `signal.aborted`/`reason`/
  `throwIfAborted()`/`onabort`/`addEventListener('abort')`,
  `AbortSignal.abort/timeout/any`); `fetch(url, { signal })` rejects an
  already-aborted call before I/O and cancels an in-flight request.
- `Response.body` is a `ReadableStream` that pulls chunks **live off the
  socket**, so a large/streamed body is not fully buffered. It supports
  `getReader().read()` -> `{value:Uint8Array,done}`,
  `for await (const chunk of res.body)`, `cancel()`, and `locked`;
  `text()`/`json()`/`arrayBuffer()` drain it on demand.
  `new ReadableStream({ start(c){ c.enqueue(x); c.close() } })` works too.
- `Blob` (`new Blob(parts, {type})`, `size`/`type`/`text()`/
  `arrayBuffer()`/`bytes()`/`slice()`/`stream()`) and `FormData`
  (`append`/`set`/`get`/`getAll`/`has`/`delete`/`keys`/`values`/
  `entries`/`forEach`) are accepted as `fetch` bodies: a `Blob` sends
  its bytes + type, a `FormData` is sent as `multipart/form-data`.
- Subset, for now: `clone()` of a not-yet-read streamed `Response`
  throws (no stream tee), no `ReadableStream` `pull`/`tee`/BYOB,
  `FormData` iteration is via `entries()`/`forEach` (arrays), and a
  `signal` set on a `Request` instance is not yet forwarded (pass it
  through `init.signal`).

The Playwright page-network `Request`/`Response` (from `page.on(...)`,
`route`, navigation) are unchanged but are not global constructors
(matching Playwright, which never globalised them) — the bare
`Request`/`Response` globals are the fetch classes.

`fetch` runs on the **same HTTP core as `request`**, so cookies/session
are shared and any `allow.net` restriction on a tool's `request` applies
to `fetch` the same way (no second stack, no bypass). `request` (the
Playwright-style API) stays; `fetch` is the standard entry point.

---

## The compile pipeline

1. **Discover** files (config + globs).
2. **Bundle** each with rolldown (oxc): resolves the whole import graph
   including `node_modules`, transpiles TS, tree-shakes, emits one ESM
   chunk with a hidden source map. Cache-miss bundles run concurrently.
3. **Compile** the chunk to QuickJS bytecode once, in a single throwaway
   runtime shared by the whole batch.
4. **Cache** bytecode + extracted manifests keyed by
   `hash(canonical path + file bytes)`. Unchanged files skip bundle +
   compile entirely on reload.
5. **Load** the bytecode into each session VM with `Module::load` — no
   re-parse, no resolver (imports are already inlined).
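
Step 4 is a content-addressed cache: the key depends on the path and the bytes, so an unchanged file is a hit and an edited file is a miss. The sketch below illustrates that idea only. `cacheKey` and `SourceCache` are hypothetical names, and 32-bit FNV-1a stands in for whatever hash the server actually uses.

```ts
// Sketch of a content-addressed, in-memory cache. Hypothetical names;
// 32-bit FNV-1a is a stand-in for the real (unspecified) hash.
function cacheKey(canonicalPath: string, bytes: Uint8Array): string {
  let h = 0x811c9dc5; // FNV-1a offset basis
  const mix = (byte: number): void => {
    h = Math.imul(h ^ byte, 0x01000193) >>> 0; // xor, then multiply by the FNV prime
  };
  for (let i = 0; i < canonicalPath.length; i++) {
    const unit = canonicalPath.charCodeAt(i);
    mix(unit & 0xff);
    mix(unit >>> 8);
  }
  bytes.forEach((b) => mix(b));
  return h.toString(16);
}

class SourceCache<T> {
  private readonly entries = new Map<string, T>();
  compiles = 0; // how many times the expensive path actually ran

  getOrCompile(path: string, bytes: Uint8Array, compile: () => T): T {
    const key = cacheKey(path, bytes);
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit; // unchanged file: skip bundle + compile
    this.compiles += 1;
    const value = compile();
    this.entries.set(key, value);
    return value;
  }
}

const demoCache = new SourceCache<string>();
const original = new Uint8Array([1, 2, 3]);
const edited = new Uint8Array([1, 2, 4]);
const firstLoad = demoCache.getOrCompile("/ext/a.ts", original, () => "bytecode-1"); // miss
const reload = demoCache.getOrCompile("/ext/a.ts", original, () => "unused");        // hit
const afterEdit = demoCache.getOrCompile("/ext/a.ts", edited, () => "bytecode-2");   // miss
```

Because the `Map` lives in process memory, restarting the process empties it, which matches the in-memory, process-local behaviour of the real cache.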

Consequences worth knowing as an author:

- **Imports work.** `import './helpers.ts'`, `import pkg from 'some-dep'` —
  all bundled and tree-shaken. No Node/Bun in the run path; QuickJS has no
  Node builtins (rolldown `platform: neutral`).
- **The bytecode cache is in-memory and process-local**, never written to
  disk — a requirement of the `unsafe Module::load` invariant (bytecode is
  interpreter-build- and process-specific). Restarting the server
  rebuilds it.
- **One bad file does not abort the batch.** Bundle/compile/manifest
  failures are reported per file and skipped; the server still starts.
- **Errors are source-mapped.** A thrown error in a bundled step is
  reported at the original `.ts:line:col`, stack included.

---

## State and lifetime

What you can rely on between calls, when running under the MCP server.

### Two ways to keep state

A *session* is identified by the `session` argument (`instance:context`,
default `"default"`). All `run_script` calls and all plugin tool calls
that share a session also share state:

- **`globalThis`** — anything you assign (`globalThis.cache = …`,
  `function f(){}`, `var x`) stays visible to later calls in the same
  session. Use it for rich in-session working state: parsed data,
  helper closures, accumulated results.
- **`vars`** — a small string→string store (`vars.set`, `vars.get`,
  `vars.has`, `vars.delete`, `vars.keys`). Use it for the few values
  that must *outlive a reset* of `globalThis` (see below): an auth token
  you captured once, a pagination cursor, a feature flag.

`page`, `context`, `request`, `browser` always reflect the session's
current browser — never cache them in `globalThis`; cache what you read
from them, not the handles.

### When `globalThis` resets (and `vars` does not)

`globalThis` is fast but not permanent. It is wiped — silently, you just
see a fresh global on the next call — when any of these happen:

- a call hits its timeout or runs the browser/runtime out of memory;
- the session's browser is relaunched or reconnected (a new browser
  session under the same name — old page references would be dead);
- the server is busy with many sessions and reclaims an idle one's
  working memory to serve others.

`vars` survives all of those for the life of the session. The session
itself (and its `vars`) ends only when it sits unused past the idle
timeout (default 30 minutes), is closed explicitly, or the server stops.

Rule of thumb: build freely in `globalThis`; copy into `vars` the
handful of things you cannot afford to recompute or re-fetch after a
reset.
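
One way to apply this rule of thumb is a two-level lookup: the fast in-memory cache first, then `vars`, then the expensive path. The sketch below runs against stand-ins so it is self-contained. `VarsLike`, `MemoryVars`, and `getToken` are hypothetical names; `VarsLike` mirrors only the `get`/`set`/`has` methods listed above, and returning `undefined` for a missing key is an assumption.

```ts
// Sketch of "cache in globalThis, persist the essentials in vars".
// VarsLike/MemoryVars are stand-ins for the real `vars` store.
interface VarsLike {
  get(key: string): string | undefined; // assumption: undefined when missing
  set(key: string, value: string): void;
  has(key: string): boolean;
}

class MemoryVars implements VarsLike {
  private readonly store = new Map<string, string>();
  get(key: string): string | undefined { return this.store.get(key); }
  set(key: string, value: string): void { this.store.set(key, value); }
  has(key: string): boolean { return this.store.has(key); }
}

function getToken(vars: VarsLike, cache: { token?: string }, login: () => string): string {
  if (cache.token !== undefined) return cache.token; // fast path: globals survived
  const stored = vars.has("token") ? vars.get("token") : undefined;
  if (stored !== undefined) {
    cache.token = stored; // globals were reset, vars were not
    return stored;
  }
  const fresh = login(); // nothing anywhere: pay the cost once
  vars.set("token", fresh);
  cache.token = fresh;
  return fresh;
}

let loginCalls = 0;
const login = (): string => {
  loginCalls += 1;
  return "token-" + loginCalls;
};
const sessionVars = new MemoryVars();
const firstToken = getToken(sessionVars, {}, login);
// A fresh cache object models a wiped globalThis within the same session.
const tokenAfterReset = getToken(sessionVars, {}, login);
```

In a real extension the `cache` argument would be an object hung off `globalThis` and `vars` would be the session store.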

### Isolation

Tools and scripts in one session share the *same* `globalThis` — it is
shared working space, not a sandbox between tools. Don't depend on
another tool's globals, and don't clobber built-ins
(`globalThis.JSON`, prototypes); a tool that does will break later
calls in that session. Different sessions never share state. Calls
within one session are serialised (no two run at once); different
sessions run independently.

### BDD

Under the test runner the model differs: one VM per worker, scenarios
parallel across workers and serial within one. The `World` (`this`) is
rebuilt per scenario; `setWorldConstructor` /
`setDefinitionFunctionWrapper` are per-VM (last call wins). `vars` /
`globalThis` continuity is not a BDD concept — use the `World` and
hooks.

### Imports

No cross-file or cross-plugin shared state beyond what you `import`
directly. Share helpers by importing them; there is no implicit
cross-plugin channel by design.

---

## Reference

### Manifest (`PluginManifest`)

| Field          | Wire (camelCase) | Default | Meaning |
|----------------|------------------|---------|---------|
| name           | `name`           | —       | Required, non-empty, unique across all loaded extensions. Binding/tool key. |
| description    | `description`    | none    | Shown in `tools/list`. |
| input schema   | `inputSchema`    | none    | JSON Schema; **enforced** — non-conforming calls rejected before the handler. |
| allow          | `allow`          | `{}`    | Capability manifest. |
| expose as tool | `exposeAsTool`   | `false` | Promote to a first-class MCP tool. |
| timeout ms     | `timeoutMs`      | none    | Per-invocation handler timeout (ms); enforced for every caller. |

### Capability manifest (`PluginAllow`)

| Field    | Wire        | Default | Meaning |
|----------|-------------|---------|---------|
| commands | `commands`  | `{}`    | name → command (shell string or spec object; `persistent` opt-in); alias `exec`. |
| net      | `net`       | `[]`    | host allow-list for `request` + `fetch`; empty = unrestricted. |

### Registration surface (JS globals)

`defineTool` · `Given` · `When` · `Then` · `defineStep` · `And` · `But` ·
`Before` · `After` · `BeforeAll` · `AfterAll` · `BeforeStep` · `AfterStep` ·
`defineParameterType` · `setDefaultTimeout` · `setDefinitionFunctionWrapper`
· `setWorldConstructor` · `setParallelCanAssign` (inert) · `ferridriver.host`

---

## What the runtime guarantees

What you can count on as an author:

1. **`inputSchema` is enforced.** If you declare one, a call whose
   arguments do not match it is rejected as a tool error *before* your
   handler runs, so the handler never receives input that violates the
   schema. A schema that is itself invalid JSON Schema is reported, not
   ignored. Still validate inside the handler any domain rules the
   schema cannot express.
2. **Tool names are unique and non-empty.** A duplicate or blank `name`
   fails that extension at load time. A name that collides with a
   built-in or another loaded tool is not exposed. Namespace your names
   (`vendor.area.action`).
3. **Tool failures are reported as errors.** When your handler throws,
   the caller gets an error result (not a "success" containing an error
   string), with the message first and the full detail after. (Plain
   `run_script` is different: it always succeeds and you inspect its
   `status` field.)
4. **`timeoutMs` is honoured for every caller** — whether the tool is
   invoked as a promoted MCP tool or by another extension. Without it,
   only the session-wide script timeout applies.
5. **Discovery is recursive and uniform.** A configured directory is
   scanned recursively; `.js .cjs .mjs .jsx .ts .cts .mts .tsx` are all
   accepted, the same way for the MCP server and the test runner. A file
   you name explicitly is used as-is.
6. **You can inspect what loaded.** The built-in `ferridriver_extensions`
   tool lists every loaded extension file, its tools, descriptions,
   whether each is exposed, its timeout, and its declared capabilities.

### Things to keep in mind

- **Shell-form `commands` are code, not config.** A string `run` (or
  shorthand) executes via `sh -c` with the *server process's*
  privileges — `$(…)`, `&&`, `|`, redirection are live. `${values}` are
  shell-escaped, but never write a line that re-interprets a value
  (`sh -c "${x}"`, `eval ${x}`): that defeats the escaping. **Argv form**
  (`run: ["cmd", "${arg}"]`) runs with no shell at all — prefer it; the
  trust-boundary concern simply disappears. Template = trusted code you
  commit; values = untrusted data (see *Capabilities*).
- `inputSchema` validation runs on every call. That is fine for tool
  call volumes; do not put megabyte schemas on a tool expecting
  thousands of calls per second.