## Fork 100 microVMs in 101 ms. BRANCH a live VM in 56 ms (v0.4 live mode). A microVM sandbox runtime for **AI agent fan-out**. Children fork from a warmed parent snapshot, inheriting its address space copy-on-write instead of cold-booting their own kernel. forkd is built on Firecracker. The parent VM boots once, imports your runtime (Python + your dependencies, a JIT-warmed JVM, an already-loaded ML model) and is paused to disk. Each child is a separate Firecracker process that `mmap`s the parent's memory image with `MAP_PRIVATE`; the kernel implements copy-on-write at the page level, so children share the parent's resident memory until they diverge. The result is two properties at once: per-child KVM isolation, and a spawn cost that's closer to `fork(2)` than to a cold-boot VM. forkd also supports **BRANCH**: pause a running sandbox, snapshot its in-flight state, and resume — all in ~150 ms — so an agent can fork mid-thought, not only at warm-up. v0.3.4 fixed a slow-path regression where repeated BRANCHes on the same parent ballooned from 150 ms to 2.7 s ([#146](https://github.com/deeplethe/forkd/issues/146)); the chain now stays flat (17.6× faster on the 6th consecutive BRANCH). **v0.4 live BRANCH** collapses the source-pause window from ~200 ms (Diff) to **56 ms p50 / 64 ms p90** on a 1.5 GiB source — measured on a real BRANCH workload, [`bench/live-fork-pause-window/RESULTS-v0.4.md`](./bench/live-fork-pause-window/RESULTS-v0.4.md). **3.6× faster pause** vs v0.3 Diff at p50, and the gap *widens* on slower storage because Live's pause is disk-independent (memory copy runs after resume, not during). With `wait: false` the caller returns in ~70 ms while the background copy completes asynchronously — a **200×** RT improvement for fire-and-forget BRANCH from agent code. Pass `--live` / `--no-wait` on the CLI, `mode: "live"` / `wait: false` on REST, or the same on the Python / TypeScript / MCP SDKs. ```python from forkd import Controller c = Controller() # Source must boot with live_fork=True (memfd-backed RAM, the prereq # for UFFD_WP to see writes from the running parent). parent = c.spawn_sandboxes("pyagent", n=1, live_fork=True)[0] # ... drive parent ... then BRANCH live + fire-and-forget: branch = c.branch_sandbox(parent["id"], mode="live", wait=False) # Returns after ~10 ms with status="writing"; poll list_snapshots # until status="ready" for the background copy to finish. ``` ```bash # CLI: spawn live-fork-capable children locally, then live-BRANCH the # daemon-tracked one. The two paths don't compose yet — daemon-side # spawn from the CLI is the next gap (see issue #209 for status). sudo -E forkd fork --tag pyagent -n 1 --per-child-netns --live-fork sudo -E forkd snapshot --from-sandbox --live --no-wait ``` Requires Linux ≥ 5.7, `vm.unprivileged_userfaultfd=1` (or `CAP_SYS_PTRACE`), and the vendored Firecracker fork from [deeplethe/firecracker:forkd-v0.4-mem-backend-shared-v1.12](https://github.com/deeplethe/firecracker/tree/forkd-v0.4-mem-backend-shared-v1.12) — `forkd doctor` probes both. Full design: [`DESIGN-v0.4.md`](./DESIGN-v0.4.md). Empirical PoC data: [`experiments/v0.4-*-poc/`](./experiments/). Tracking issue [#101](https://github.com/deeplethe/forkd/issues/101).
## Demo: branch a thinking agent A 24-second walkthrough of the LangGraph branch-and-fan-out demo — source agent runs a ReAct loop, gets BRANCHed mid-thought, three grandchildren each receive a different steering hint, all three produce divergent itineraries while inheriting the same prior reasoning state. ![forkd branch-and-fan-out demo](./docs/assets/demo-en.webp) Headline divergence: the source (no hint) picks Nishiki Market for Day 1; all three hinted children independently substitute Arashiyama Bamboo Grove. The cost-focused child also adds "may be pricey" annotations the others don't. **The model wasn't told to swap places** — each hint perturbed the next LLM call, the rest of the prior reasoning came along unchanged. Full mechanism + numbers + raw transcripts in [`recipes/langgraph-react/`](./recipes/langgraph-react/) and [`recipes/langgraph-react/DEMO.md`](./recipes/langgraph-react/DEMO.md). ### And: filesystem state, not just reasoning For the "but couldn't you just call the LLM 3 times in parallel?" objection, see [`recipes/coding-agent-fork/`](./recipes/coding-agent-fork/) — a 50 MiB binary blob travels byte-identically across all 4 sandboxes through a single BRANCH. Three grandchildren each apply a different fix to a buggy Python package; their `__pycache__/` and edits stay isolated, but the 50 MiB inheritance is shared. Bytes can't fit in a prompt. **3.3 s pause for the BRANCH operation.**
## Properties - **Hardware isolation.** Each child is its own Firecracker microVM backed by KVM. Escape requires a hypervisor or kernel vulnerability, not a `runc` regression. - **Warmed runtimes inherit for free.** Imports, JIT compilation, model weights, prefetched caches — anything the parent did is already resident in the child. - **Real Linux per child.** Multi-vCPU, full TCP networking, `apt install`, outbound HTTPS. Unlike function-level snapshot runtimes that trade single-vCPU + serial-I/O for raw spawn speed, forkd children can run real Python servers, model inference, or any workload that needs a full kernel. - **Multi-tenant by construction.** Per-child network namespace, per- child cgroup v2 memory limit, independent `/dev/urandom` re-seeded by `vmgenid` (Linux 5.20+). - **Built for agent fan-out.** AI agent workloads that fan out into many short-lived sandboxes — code-interpreter, tool-use, evaluation rollouts — are the design point. The warmed parent collapses the per-request `import numpy` / `import torch` cost across the entire cohort. - **Operable.** Daemon process owning state, REST API on Unix or TCP, Prometheus `/metrics`, append-only JSON audit log, systemd unit. - **Open source.** Apache 2.0, no vendor SDK.
## Benchmarks Same Linux host (Ubuntu 24.04, Linux 6.14, 20 vCPU, 30 GiB, KVM). Workload: spawn 100 sandboxes that each run `import numpy; numpy.zeros(5).tolist()`. ![Spawn time at N=100](./bench/chart-spawn-100.png) ![Host memory per sandbox](./bench/chart-memory-per.png) | Backend | Wall-clock at N=100 | Memory delta per sandbox | Notes | |---|---:|---:|---| | **forkd** | **101 ms** | **0.12 MiB** | fork-from-warm via snapshot CoW | | CubeSandbox¹ | 1.06 s | 5 MiB | RustVMM microVM, cold-boot (pool fast path) | | BoxLite² | 113.2 s | — | KVM microVM, cold-boot OCI rootfs | | OpenSandbox³ | 122.0 s | — | Docker runtime via abstraction layer | | Firecracker cold-boot | 759 ms | 84 MiB | raw VM boot, no orchestration | | gVisor (runsc) | 288.6 s | — | userspace kernel container | | Docker (runc) | 335.3 s | 4 MiB | standard container runtime | ¹ CubeSandbox: 1.06 s wall-clock is the **fast-path** N=100 figure on this host (1056 ± 14 ms over five runs, 100 % success every run), measured with a bench script that pre-warms Python's `ThreadPoolExecutor` to keep client-side lazy-init out of the timing. An earlier slow-path measurement on the same host returned 20.3 s with 77/100 success — that template had a 2 GiB writable-layer size that didn't match the default 1 GiB pool, so every sandbox went through a live `mkfs.ext4 + reflink-copy`; after the upstream maintainer at [#235](https://github.com/TencentCloud/CubeSandbox/issues/235) clarified the distinction, we added `2Gi` to `pool_default_format_size_list` and re-ran. The host runs cube **v0.2.0**, which carries a ~50 ms latency regression that [PR #234](https://github.com/TencentCloud/CubeSandbox/pull/234) fixes in v0.2.1; the value above is the v0.2.0 baseline. Cube advertises **<60 ms** single-instance cold-start on a 96 vCPU host; we did not retest that shape. See [`bench/CUBESANDBOX.md`](./bench/CUBESANDBOX.md) for the full methodology, both rows, and the `cmdTimeout` race we filed two PRs upstream against ([#236](https://github.com/TencentCloud/CubeSandbox/pull/236) / [#237](https://github.com/TencentCloud/CubeSandbox/pull/237)). ² BoxLite is optimised for one long-lived stateful Box per workload, not 100 concurrent fresh microVMs. The cold fan-out is included for direct comparability. See [`bench/BOXLITE.md`](./bench/BOXLITE.md). ³ OpenSandbox is an abstraction layer over Docker / K8s / gVisor / Kata / Firecracker; the number is for its default Docker runtime. See [`bench/OPENSANDBOX.md`](./bench/OPENSANDBOX.md). Reproduce: `bench/bench-spawn-100.sh` then `bench/generate_charts.py`. For one sandbox doing the same numpy expression two ways: | Call | Time | What it does | |---|---:|---| | `sandbox.eval("numpy.zeros(5).tolist()")` | 1 ms | Reuses the warmed Python in PID 1 | | `sandbox.commands.run("python3 -c '...'")` | 96 ms | Cold subprocess re-imports numpy |
## How it works ```mermaid flowchart TB %% ─── parent ─────────────────────────────────────────────── subgraph PARENT["Parent VM (booted once, warmed)"] direction TB runtime["PID 1
Python + numpy + your deps
imported into RAM"] end PARENT -- "pause + snapshot" --> SNAP["Snapshot on disk
memory.bin (CoW source)
vmstate (vCPU + devices)"] %% ─── controller ─────────────────────────────────────────── CLIENT["Client (CLI / Python SDK)"] -- "POST /v1/sandboxes n=100" --> CTL["forkd-controller
REST · auth · audit · /metrics"] CTL -- "restore_many_with(...)" --> SNAP %% ─── children ───────────────────────────────────────────── subgraph CHILDREN["100 Child Firecracker processes (kernel CoW per page)"] direction LR subgraph NS1["netns forkd-child-1"] C1["Child 1
mmap MAP_PRIVATE
cgroup memory.max"] end subgraph NS2["netns forkd-child-2"] C2["Child 2
mmap MAP_PRIVATE
cgroup memory.max"] end subgraph NSN["netns forkd-child-100"] CN["Child 100
mmap MAP_PRIVATE
cgroup memory.max"] end end SNAP -. "shared file
(read-mostly)" .-> C1 SNAP -. "shared file" .-> C2 SNAP -. "shared file" .-> CN %% ─── network ────────────────────────────────────────────── C1 -- "veth" --> BR["host bridge forkd-br0
MASQUERADE"] C2 -- "veth" --> BR CN -- "veth" --> BR BR --> UPLINK(("uplink → internet")) %% styling classDef parent fill:#e8f3ec,stroke:#4c956c,color:#1f2933; classDef snap fill:#fff3df,stroke:#d4a259,color:#1f2933; classDef ctl fill:#e6efff,stroke:#5b7dba,color:#1f2933; classDef child fill:#ffffff,stroke:#52606d,color:#1f2933; classDef net fill:#f1f3f5,stroke:#8d99ae,color:#1f2933; class PARENT,runtime parent; class SNAP snap; class CTL,CLIENT ctl; class NS1,NS2,NSN,C1,C2,CN child; class BR,UPLINK net; ``` See [`DESIGN.md`](./DESIGN.md) for the full design and the open problems the architecture leaves on the table.
## How forkd compares The sandbox-runtime space has a wide spread of designs. The table below summarises positioning of forkd against the most-cited open-source projects. Numbers in quotes are **as advertised by the upstream project** unless they match a row in our benchmark chart above. forkd does not measure other projects on workloads they were not designed for. | Project | Primitive | Cold-start (N=100) | Fork-from-warm | Quotas | Auth / TLS | License | |---|---|---|:---:|---|---|---| | **forkd** | Firecracker + snapshot CoW | **101 ms** | ✓ | cgroup `memory.max` | bearer + rustls | Apache 2.0 | | [CubeSandbox][cs] | RustVMM + KVM microVM | 1.06 s¹ | "coming soon" | <5 MiB / instance | not in OSS | Apache 2.0 | | [Daytona][dy] | OCI workspace | <90 ms² | ✗ | per workspace | API keys (platform) | **AGPL-3.0** | | [OpenSandbox][os] | Docker / K8s + gVisor / Kata / FC | 122 s | ✗ | via runtime | gateway (k8s) | Apache 2.0 | | [E2B][e2b] | Firecracker (in [infra][e2b-infra]) | not in OSS | ✗ | platform | API keys (cloud) | Apache 2.0 | | [BoxLite][bl] | KVM / Hypervisor.framework + OCI | 113 s | ✗ stateful Box | KVM + seccomp | egress policy only | Apache 2.0 | | Modal | proprietary snapshot fork | not public | ✓ | ✓ | ✓ | proprietary | | Firecracker raw | microVM only | 759 ms | manual | n/a | n/a | Apache 2.0 | | Docker (runc) | OCI container | 335 s | ✗ | cgroups | n/a | Apache 2.0 | | gVisor (runsc) | userspace kernel | 289 s | ✗ | cgroups | n/a | Apache 2.0 | ¹ Wall-clock at N=100 concurrent on this **bare-metal** host (`systemd-detect-virt: none`, i7-12700, 20 vCPU, no nested virt). This is the **fast-path** number — `pool_default_format_size_list` was extended to include the template's writable-layer size, so each sandbox reuses a pre-formatted pool entry rather than going through a live `mkfs.ext4 + reflink-copy`. 1056 ± 14 ms over five runs, 100 % success every run, measured with a bench script that pre-warms Python's `ThreadPoolExecutor` to keep client-side lazy-init out of the timing. Host runs cube **v0.2.0**, which carries a ~50 ms latency regression that [PR #234](https://github.com/TencentCloud/CubeSandbox/pull/234) fixes in v0.2.1 — the figure above is the v0.2.0 baseline. An earlier slow-path measurement on the same host (writable-layer size that didn't match the default pool) returned 20.3 s with 77/100 success — that mismatch was on our side and the maintainer corrected it at [#235](https://github.com/TencentCloud/CubeSandbox/issues/235). Cube advertises **<60 ms** single-instance cold-start (P99 200 ms at N=100 concurrent) under the fast-path configuration on a 96 vCPU host — that figure isn't disputed and we did not retest it here. Note also that this row compares **fork-from-warm (forkd)** with **cold-start (every other project)**; they're different operating points by design, not equivalent primitives. See [bench/CUBESANDBOX.md](./bench/CUBESANDBOX.md) for the full methodology, both rows, and the upstream cmdTimeout race we filed PRs [#236](https://github.com/TencentCloud/CubeSandbox/pull/236) / [#237](https://github.com/TencentCloud/CubeSandbox/pull/237) against. ² Daytona's advertised number; we did not measure it (workspace runtime, not a fan-out-comparable shape). [cs]: https://github.com/TencentCloud/CubeSandbox [dy]: https://github.com/daytonaio/daytona [os]: https://github.com/alibaba/OpenSandbox [e2b]: https://github.com/e2b-dev/E2B [e2b-infra]: https://github.com/e2b-dev/infra [bl]: https://github.com/boxlite-ai/boxlite **Where forkd fits.** - **Code interpreters and Jupyter-kernel sandboxes.** Each conversation turn or tool call spawns a fresh kernel; the warmed parent carries the SciPy / ML runtime, so per-request `import numpy` / `import torch` collapses to zero. This is the design point — the workload shape Anthropic / OpenAI / Modal code-interpreter products are all on. - **Evaluation harnesses.** Hundreds of repository checkouts or test rollouts in parallel — SWE-bench-style — without paying Docker cold-start per task. - **Per-user code execution at fan-out scale.** Many short-lived sandboxes sharing one warmed parent, each child KVM-isolated by construction. - **Untrusted-code execution in CI.** `git clone`, `pip install`, `pytest` inside a real Linux VM, not a container namespace. - **Self-hosted alternative to managed sandbox SaaS.** One Linux box with KVM, single-binary daemon, Apache 2.0 — no per-second cloud fees, no vendor lock-in. **Where the others fit better.** CubeSandbox: faster pure cold-start (<60 ms advertised). Daytona: workspace runtimes where each user owns one long-lived sandbox. OpenSandbox: one orchestration API across multiple isolation backends. BoxLite: embeddable, daemon-less, cross-platform (macOS via Hypervisor.framework). Modal: the closed- source managed system with the same primitive. **Where forkd is wrong.** Function-level snapshot runtimes that give up real Linux (single-vCPU, serial I/O only) beat forkd's ~100 ms by an order of magnitude — at the cost of not running real Python servers, `apt install`, or outbound HTTPS.
## Enterprise deployment FAQ Skim answers for platform / procurement teams scoping forkd: **Can we deploy on Kubernetes?** Yes — one forkd-controller Pod hosts N sandbox children; the K8s scheduler runs **once** at Pod creation regardless of fan-out (vs one Pod-per-sandbox in Kata / Firecracker-on-K8s designs). A starter manifest ships at [`packaging/k8s/`](./packaging/k8s/). Requires nodes with `/dev/kvm` + cgroup v2; managed K8s (GKE / EKS / AKS) typically needs a metal SKU or explicit nested-virt to qualify. **How many sandboxes fit in one Pod?** With a 512 MiB warmed Python+numpy parent, rough sizing: - **~1 actively-running agent per vCPU** (compute-bound bottleneck) - **~50 idle-pooled agents per 8 GiB Pod RAM** (process-state bottleneck, not memory) Measured CoW overhead at N=100 is **0.12 MiB / child** on top of the parent ([bench/](./bench/)), so memory rarely caps fan-out — vCPU + process count dominate. Heavier parents (browser, ML inference) hit the ceiling sooner; measure with yours. **How do existing agents connect?** - **REST** — `POST /v1/sandboxes n=100`, language-agnostic, bearer-token auth - **Python SDK** — `from forkd import Sandbox` (drop-in for `from e2b import Sandbox`) - **LangGraph / AutoGen / CrewAI** — through the Python SDK, no special glue - **MCP** — `pip install forkd-mcp` ships an MCP server for Claude Desktop / Claude Code / Cursor / Cline. See [`sdk/mcp/`](./sdk/mcp/) **Production case shapes (from production users + this repo's recipes):** - **AI code interpreter** — one warmed parent (SciPy / torch pre-imported), fork-per-conversation-turn. Recipe: [`e2b-codeinterpreter/`](./recipes/e2b-codeinterpreter/) - **SWE-bench-style parallel evals** — N parallel repo checkouts, each child runs `pytest` isolated. Recipe: [`coding-agent/`](./recipes/coding-agent/) - **Per-user code exec at scale** — shared warmed parent, child KVM-isolated per user - **Untrusted CI** — `git clone + pip install + pytest` inside a real Linux VM, not a container namespace - **Fork-per-test isolated databases** — recipe: [`postgres-fixture/`](./recipes/postgres-fixture/) — ready-to-query postgres at ~10 ms per child instead of ~2 s of fresh `initdb`
## Quick start Requires: x86_64 Linux with KVM, Ubuntu 22.04 or newer. Two steps to a real fork: set up the host (one-time), then `forkd pull` + fork (~30 s). The sections after that show alternative entry points for custom recipes. ### Confirm your host is ready ```bash # 1. CLI + daemon binaries (pre-built tarball, no Rust toolchain needed): curl -sSL https://github.com/deeplethe/forkd/releases/download/v0.3.4/forkd-v0.3.4-x86_64-linux.tar.gz \ | sudo tar -xz -C /usr/local/bin/ # 2. Host bring-up: sudo bash scripts/setup-host.sh # KVM + tap device, one-time sudo bash scripts/netns-setup.sh 3 # per-child network namespaces # 3. Sanity-check: forkd doctor # green-lights everything above # 4. (Optional) language clients for programmatic use: pip install forkd # Python SDK — calls the daemon over HTTP npm install @deeplethe/forkd # TypeScript SDK ``` `forkd doctor` runs 16 checks (KVM, hardware virt, cgroup v2, IP forward, tap, netns, Firecracker binary + version, Docker daemon, snapshot dir + disk space, kernel image, controller reachability, platform, plus `uffd_wp` + `memfd_create` for the v0.4 live-fork path) and emits specific fix hints for each non-pass. Run this first whenever something feels off. ![forkd doctor — 14 checks pass on a configured host](./docs/assets/doctor-14pass.webp) ### Run your first fork (recommended) ```bash # 14.5 MiB pack (Python 3.12 + LangGraph ready) → ~15 s download, sha256 verified. forkd pull deeplethe/langgraph-react # 3 children sharing the parent's memory, ~10 ms per child. sudo -E forkd fork --tag langgraph -n 3 --per-child-netns ``` See [`docs/HUB.md`](./docs/HUB.md) for the registry model + how to publish your own snapshot pack. ### Alternative: build from a Docker image `forkd from-image` wraps Docker pull → ext4 → boot + warmup → pause → register tag into a single verb. Use this when you need a recipe that isn't on the Hub yet: ```bash sudo -E forkd from-image python:3.12-slim \ --tag py-numpy \ --extra python3-numpy # 2-3 min the first time (Docker pull + ext4 + warmup); cached after that. sudo -E forkd fork --tag py-numpy -n 5 --per-child-netns ``` ### Probe your install's latency ```bash forkd bench --tag py-numpy --n 5 # forkd bench against snapshot py-numpy # spawn (n=1) 61 ms sb-...-0027 # exec round-trip 22 ms exit=0 # branch (diff=true) 287 ms pause_ms=234 diff_physical_bytes=393216 # fanout (n=5) 65 ms 13ms/child # cleanup 136 ms # ----- # total 571 ms ``` Run this against any snapshot to see how forkd actually performs on your hardware. Screenshot-friendly output. ### Alternative: build from source (advanced) ```bash # 1. Host setup: KVM, Firecracker, Rust, KSM, hugepages, tap device. sudo bash scripts/setup-host.sh sudo bash scripts/host-tap.sh cargo build --release sudo install -m 0755 target/release/{forkd,forkd-controller} /usr/local/bin/ # 2. Build a warmed rootfs from a Docker image. sudo bash scripts/build-rootfs.sh python:3.12-slim python-rootfs.ext4 1536 python3-numpy # 3. Fetch a kernel. curl -O https://s3.amazonaws.com/spec.ccfc.min/firecracker-ci/v1.10/x86_64/vmlinux-6.1.141 # 4. Run a one-shot sandbox. sudo -E forkd run --image python:3.12-slim --kernel ./vmlinux-6.1.141 \ -- python3 -c "import numpy; print(numpy.zeros(5).sum())" # 0.0 ``` ### Multi-child fork-out ```bash # Provision N per-child network namespaces (one-time per N). sudo bash scripts/netns-setup.sh 100 # Create a tagged parent snapshot. sudo forkd snapshot --tag pyagent \ --kernel ./vmlinux-6.1.141 \ --rootfs ./python-rootfs.ext4 \ --tap forkd-tap0 # Fork 100 children sharing the parent's memory. sudo -E forkd fork --tag pyagent -n 100 --per-child-netns --memory-limit-mib 256 # Talk to one of them. sudo forkd eval --child forkd-child-42 -- "numpy.zeros(100).sum()" ``` ### Python SDK In-guest agent (E2B-compatible): ```python from forkd import Sandbox # drop-in for `from e2b import Sandbox` with Sandbox() as sb: print(sb.commands.run("uname -a").stdout) print(sb.eval("numpy.zeros(5).tolist()")) # reuses warmed PID 1 ``` Controller daemon (lifecycle + branching): ```python from forkd import Controller c = Controller() # http://127.0.0.1:8889 children = c.spawn_sandboxes("pyagent", n=1, per_child_netns=True, live_fork=True) # v0.4: enables `mode="live"` later sb_id = children[0]["id"] # … drive sb_id via in-guest Sandbox, then branch before a risky step. # `mode="live"` collapses source pause to sub-50 ms (vs ~200 ms for diff); # `wait=False` returns after the source resumes (~10 ms), background # copy finishes asynchronously — poll list_snapshots for status="ready". branch = c.branch_sandbox(sb_id, tag="checkpoint-1", mode="live", wait=False) grandchildren = c.spawn_sandboxes(branch["tag"], n=5) # speculative fan-out ``` See [`docs/design/branching.md`](./docs/design/branching.md) for the fork-from-warm tree model and use cases. ### TypeScript SDK For Node.js 18+ agents (LangChain.js, agent-twins, anything JS-side): ```bash npm install @deeplethe/forkd ``` ```ts import { Controller } from '@deeplethe/forkd'; const ctrl = new Controller(); // reads FORKD_URL, FORKD_TOKEN from env const [parent] = await ctrl.spawnSandboxes({ snapshotTag: 'pyagent', n: 1, perChildNetns: true, liveFork: true, // v0.4: enables { mode: 'live' } later }); // ... drive parent via REST/HTTP ... // `mode: 'live'` + `wait: false` = source pauses sub-50 ms and the // caller returns after ~10 ms; the background memory copy completes // asynchronously, snapshot reaches `status: 'ready'` afterward. const branch = await ctrl.branchSandbox(parent.id, { mode: 'live', wait: false }); const kids = await ctrl.spawnSandboxes({ snapshotTag: branch.tag, n: 5, perChildNetns: true }); ``` Surface parity with the Python SDK: `spawnSandboxes` / `branchSandbox` take the same `prewarm` / `liveFork` / `mode` / `wait` / `measure_diff` options. See [`sdk/typescript/`](./sdk/typescript/). ### MCP server For Claude Desktop, Claude Code, Cursor, and any other [MCP](https://modelcontextprotocol.io/)-aware client: ```bash pip install forkd-mcp # then add to claude_desktop_config.json: # "mcpServers": { "forkd": { "command": "forkd-mcp" } } ``` The server exposes `spawn_sandboxes`, `exec_command`, `eval_code`, and five other tools — the agent can fork and drive forkd microVMs directly. See [`sdk/mcp/README.md`](./sdk/mcp/README.md). ### Framework integration recipes (host-side, no rootfs build) Drop-in patterns for the four agent frameworks most often used with forkd. Each is ~150-250 lines of Python with a `--dry-run` mode that exercises the forkd plumbing without an LLM key: | Recipe | Driver | Forkd-specific move | |---|---|---| | [`mcp-agent/`](./recipes/mcp-agent/) | Claude Desktop / Cursor / Cline (MCP) | End-to-end MCP protocol verification | | [`crewai-fanout/`](./recipes/crewai-fanout/) | CrewAI | N agents on N microVMs from one parent — per-agent isolation, ~24ms per child | | [`autogen-branch/`](./recipes/autogen-branch/) | AutoGen | Forkd-backed `CodeExecutor` + mid-conversation BRANCH that fans out N alternatives | | [`openai-swarm/`](./recipes/openai-swarm/) | OpenAI Swarm / Agents SDK | Handoff = BRANCH: agent B inherits agent A's full VM state | ### Pre-built rootfs recipes Skip the rootfs design step — pick one of the [`recipes/`](./recipes/) and run its `build.sh`: | Recipe | When to pick | |---|---| | [`python-numpy/`](./recipes/python-numpy/) | Reproduce the benchmark; lightest Python + numpy | | [`e2b-codeinterpreter/`](./recipes/e2b-codeinterpreter/) | AI code-interpreter agents (E2B SDK-compatible) | | [`jupyter-kernel/`](./recipes/jupyter-kernel/) | Notebook / SciPy stack pre-imported; ~1 ms per kernel | | [`coding-agent/`](./recipes/coding-agent/) | SWE-bench / coding agents with `git` + dev tools | | [`nodejs/`](./recipes/nodejs/) | JS / TS workloads, Playwright fan-out | | [`playwright-browser/`](./recipes/playwright-browser/) | Browser-driving agents (computer-use, web research, UI test gen). Fork warmed Chromium at ~10 ms | | [`agent-workbench/`](./recipes/agent-workbench/) | Kitchen sink — browser + VSCode + Jupyter + MCP | | [`postgres-fixture/`](./recipes/postgres-fixture/) | Fork-per-test isolated postgres; ready-to-query in ~10 ms instead of ~2 s of fresh initdb | ### Snapshot Hub (skip the rootfs build entirely) Once a parent snapshot is built (yours or someone else's), `forkd pack` ships it as a single `.tar.zst` file with a manifest + per-file sha256. Other hosts pull and resume forking in seconds, no Docker round-trip: ```bash # producer host forkd pack --tag pyagent --out pyagent.forkd-snapshot.tar.zst # 23× compression typical — pyagent's 512 MiB memory.bin → ~22 MiB on disk # upload to an R2 / S3 bucket forkd push --tag pyagent "https://" # consumer host forkd pull https://hub.example.com/pyagent.forkd-snapshot.tar.zst forkd fork --tag pyagent -n 100 --per-child-netns # already warm ``` `forkd images` lists local snapshots with their sizes. Integrity is verified on unpack via the manifest's sha256s.
## Operating in daemon mode The controller daemon owns the registry of snapshots and live sandboxes, exposes the REST API, and writes structured audit logs. Recommended for any deployment beyond local development. ```bash sudo install -m 0644 packaging/systemd/forkd-controller.service /etc/systemd/system/ sudo mkdir -p /etc/forkd sudo bash -c 'head -c 32 /dev/urandom | base64 > /etc/forkd/token' sudo chmod 600 /etc/forkd/token sudo systemctl enable --now forkd-controller ``` Then drive it over HTTP: ```bash TOKEN=$(sudo cat /etc/forkd/token) curl -H "Authorization: Bearer $TOKEN" -X POST http://127.0.0.1:8889/v1/sandboxes \ -H 'Content-Type: application/json' \ -d '{"snapshot_tag":"pyagent","n":5,"per_child_netns":true,"memory_limit_mib":256}' # [{"id":"sb-67a1b3-0000","pid":...,...}, ...] curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8889/metrics # forkd_sandboxes_active 5 ``` Full API reference: [`docs/API.md`](./docs/API.md). Operator runbook: [`docs/RUNBOOK.md`](./docs/RUNBOOK.md). Security posture: [`docs/SECURITY.md`](./docs/SECURITY.md).
## Repository layout ``` crates/ forkd-vmm/ Firecracker wrapper: BootConfig, Vm, Snapshot, cgroup forkd-cli/ `forkd` binary (snapshot, fork, run, exec, eval, pack/unpack/pull/push/images, cleanup) forkd-controller/ `forkd-controller` daemon: REST, registry, audit rootfs-init/ forkd-init.sh PID 1 inside the guest; mounts pseudo-fs, launches agent forkd-agent.py TCP server on :8888 inside the guest (ping/exec/eval) sdk/python/ E2B-compatible Python SDK sdk/mcp/ MCP server (`forkd-mcp`) — drive forkd from Claude Desktop / Claude Code / any MCP client scripts/ Host-side helpers (KVM, Firecracker, netns, rootfs) packaging/systemd/ systemd unit for the controller packaging/k8s/ Starter Kubernetes manifest for forkd-controller recipes/ Framework integration recipes (mcp-agent, crewai-fanout, autogen-branch, openai-swarm) + parent-rootfs recipes (python-numpy, e2b-codeinterpreter, jupyter-kernel, coding-agent, nodejs, playwright-browser, agent-workbench, postgres-fixture). See recipes/README.md. bench/ Benchmark harness, chart generators, results docs/ API.md, SECURITY.md, RUNBOOK.md ```
## Status Alpha. The fork-on-write primitive, controller daemon, REST API, auth, audit logging, cgroup memory limits, Prometheus metrics, and Python + TypeScript SDKs are in place and exercised by 25 unit + integration tests in CI. On-disk formats and API shapes may still change before 1.0. Production-readiness items not yet in this release: - Multi-node scheduling (one daemon = one host). - Default-deny egress on per-child netns (today: shared MASQUERADE rule; users wanting an allow-list policy add their own `iptables` rules per netns). - cpu.max, io.max, pids.max quotas beyond the existing `memory.max`. - Third-party security audit. Roadmap (v0.3 candidates + production-readiness gaps): [docs/ROADMAP.md](./docs/ROADMAP.md). Issue-level tracking: [GitHub issues](https://github.com/deeplethe/forkd/issues). Release notes per version: [CHANGELOG.md](./CHANGELOG.md). Security posture and past advisories: [docs/SECURITY.md](./docs/SECURITY.md). **v0.3 phase 1 shipped (v0.3.0 → v0.3.4)** — diff-snapshot BRANCH cuts source-pause window from **29.3 s to 205 ms (143×) on a 4 GiB SSD source** (idle); typical agent workload (30-300 MiB dirty) **6-15×**. Multi-BRANCH on the same sandbox works in v0.3.1 — 5 consecutive diff BRANCHes give a **14× aggregate** downtime reduction vs Full. v0.3.2 closes the SDK surface-parity gap: Python SDK `spawn_sandboxes(prewarm=...)` and `branch_sandbox(diff=..., measure_diff=...)` now match the REST and TypeScript SDK options. **v0.3.4 closes the multi-BRANCH pause anomaly ([#146](https://github.com/deeplethe/forkd/issues/146))** — a 30-line `posix_fallocate` fix bypasses ext4's writeback-throttle compounding, so the 6th consecutive BRANCH stays at ~150 ms instead of climbing to 2.7 s (17.6× faster). Full table and honest caveats in [`bench/pause-window/RESULTS-v0.3.md`](./bench/pause-window/RESULTS-v0.3.md); 75-trial sweep raw data in `bench/pause-window/*-sweep-*.csv`. Opt in by setting `"diff": true` on `POST /v1/sandboxes/:id/branch`, or use the `--diff` flag on `forkd snapshot --from-sandbox`. The win is the source's **downtime**, not the total BRANCH API latency: a background `cp` of the source's memory.bin runs in parallel with the source itself, then the diff window closes, source resumes, the diff is merged onto the pre-copied output. Source TCP connections, kvmclock, and timers see a ~200 ms gap instead of 29 s. Total BRANCH API time is still bandwidth-bound on the cp; the trade-off favors live BRANCH from a long-running agent. v0.3.1's multi-BRANCH support uses the previous BRANCH's output as the chain head — no separate shadow file, zero extra storage. If the user `DELETE`s an intermediate BRANCH snapshot, the chain detects the missing file and falls back to the source-tag base with a logged warning. Full design: [`docs/design/diff-snapshots.md`](./docs/design/diff-snapshots.md). **v0.4 live BRANCH** is wired up end-to-end on the user surface (REST `mode: "live"`, CLI `--live`, Python / TypeScript / MCP SDKs; PRs [#204](https://github.com/deeplethe/forkd/pull/204)–[#207](https://github.com/deeplethe/forkd/pull/207)). The mechanism: spawn with `live_fork: true` to back guest RAM with a memfd shared between Firecracker and the controller, then BRANCH with UFFD_WP capturing dirty pages out-of-band so the source resumes immediately after vCPU dump (sub-50 ms) and the memory copy completes asynchronously. We initially planned to stay on vanilla FC, but `mem_backend.backend_type: "Shared"` with `shared: true` is the one upstream gap we couldn't work around without `mmap MAP_SHARED` — hence the vendored fork at [deeplethe/firecracker:forkd-v0.4-mem-backend-shared-v1.12](https://github.com/deeplethe/firecracker/tree/forkd-v0.4-mem-backend-shared-v1.12). We have an open upstream proposal ([`FIRECRACKER-UPSTREAM-PROPOSAL.md`](./FIRECRACKER-UPSTREAM-PROPOSAL.md)); once that lands, the vendor requirement goes away. Tracking issue: [#101](https://github.com/deeplethe/forkd/issues/101). Bench numbers on a clean parent (`bench/live-fork-pause-window.md`) are in progress — Phase 6 E2E captured pause_ms = 41-48 ms in early tests but the parent had pre-baked guest Oopses contaminating the measurement. > **0.1.4 contains daemon security fixes.** Two HIGH-class > validation gaps in `POST /v1/sandboxes` (path-traversal via > `snapshot_tag`) and `packaging/k8s/` (placeholder bearer token > accepted at startup) affected 0.1.0–0.1.3 inclusive; users on > those versions should upgrade. Full advisories in > [docs/SECURITY.md#past-advisories](./docs/SECURITY.md#past-advisories). > 0.1.3 also fixed a CLI `--tag` path-traversal (affected 0.1.0–0.1.2).
## Contributing Pull requests welcome. Before opening one, please: 1. Open an issue describing what you want to change. APIs are still moving; we'd rather align early than ask you to rewrite the patch. 2. `cargo fmt --all && cargo clippy --all-targets -- -D warnings && cargo test --all` locally. 3. Sign-off your commits (`git commit -s`).
## Star history

## License Apache 2.0. See [LICENSE](./LICENSE) and [NOTICE](./NOTICE).