# Branch-and-fan-out demo — real results The forkd "fork a thinking agent" demo, end-to-end on real hardware. The latest clean run is in [`results-2026-05-18/`](./results-2026-05-18/); the earlier [`results-2026-05-17/`](./results-2026-05-17/) is the same mechanism with a less-capable model (Qwen2.5-7B) — kept for comparison so you can see what changes when you swap models. ## TL;DR for a tweet thread > 🍴 forkd just forked a running ReAct agent: **163 ms** pause on tmpfs-backed snapshot storage, **4 s** on the SATA SSD this demo recorded against. Same code, only the disk differs. > > A source agent had spent 2 steps gathering weather + place > data for a Kyoto + Osaka trip. We BRANCHed it and spawned 3 > grandchildren from the same cognitive state. Each got a > different steering hint — "be thorough", "be minimal", > "optimize for cost". > > All 3 produced **different** itineraries, inheriting the same > tool results, same conversation history, same Python heap. > The only thing that diverged was the next thought. > > Headline divergence: the parent (no hint) put Nishiki Market > on Day 1. All three hinted children dropped it and substituted > Arashiyama Bamboo Grove — a free outdoor activity. The > cost-focused child even annotated dining stops with "may be > pricey" warnings. > > This is the speculative-parallel-exploration primitive Modal > Sandboxes keeps closed-source. Now on KVM, open-source. ↓ ## The setup that produced the run - Host: yangdongxu-desktop, Ubuntu 24.04, Linux 6.14, 20 vCPU, 30 GiB RAM - forkd built from `demo/summary-show-in-flight` (see PR #66) - Source rootfs: `python:3.12-slim` + `requests`, ~206 MiB - LLM: **DeepSeek-V3** via SiliconFlow's OpenAI-compatible API - Task: "Plan a 2-day trip to Kyoto and Osaka. Use the tools to check weather and find places." ## Headline numbers | Metric | Value | |---|---| | Daemon-measured pause window | **4007 ms** (SATA SSD storage; see [RESULTS-v0.2.md](../../bench/pause-window/RESULTS-v0.2.md) for 163 ms on tmpfs) | | Memory image size | 513 MiB | | Grandchildren spawned | 3 | | Steering hints applied | 3 (one per child) | | Network retries this run | **0** (clean) | | Per-agent token cost | 1395–1546 | | Snapshot tag (auditable) | `langgraph-fork-1779037370` | ## The divergence at a glance | Agent | Hint | Day-1 afternoon (Kyoto) | Notable framing | |---|---|---|---| | **parent** | _(none — control)_ | **Nishiki Market** ($$) | baseline; no special framing | | **thorough** | "cultural depth, slow" | **Arashiyama Bamboo** (free) | replaced shopping w/ cultural-nature | | **minimal** | "daylight outside, no shopping" | **Arashiyama Bamboo** (free) | replaced shopping w/ outdoor | | **cost** | "avoid \$\$\$, prefer free or \$" | **Arashiyama Bamboo** (free) | + warning labels on $$ stops, explicit cost-optimization footer | Worth highlighting: the model wasn't told to "drop Nishiki Market" or "add Arashiyama". It chose to re-rank based on the hint. All three hinted children **independently agreed** on the substitution. Cost went further and added meta-commentary like "though dining options may be pricey" and an explicit "Cost Optimization" footer that the others didn't. ## Full itineraries See [`results-2026-05-18/summary.md`](./results-2026-05-18/summary.md) for the auto-generated render of all four agents' final answers. Raw per-event JSONL is in the same directory. ## What this validates 1. **The BRANCH primitive works on a real agent workload.** 4 s pause, 0 errors, all 4 agents completed cleanly with their respective post-branch reasoning. 2. **In-guest agents are pause-blind.** No socket errors, no timeouts at wake-up, no retries needed in this run. Same pattern we measured synthetically in [`bench/pause-window/RESULTS-v0.2.md`](../../bench/pause-window/RESULTS-v0.2.md), now confirmed on a real LLM agent. 3. **Hint-based perturbation post-branch is real.** Each child's NEXT LLM call sees a different system message; the inherited conversation history + tool results stay the same. This is the cheapest faithful model of speculative parallel exploration on a stateful agent. ## What the earlier run 9 shows (and what we learned from it) The first end-to-end run (committed in [`results-2026-05-17/`](./results-2026-05-17/)) used Qwen2.5-7B-Instruct. The mechanism worked but the model: - Had network retries on first call after restore (~90 s wall before reaching branch) - Occasionally emitted tool-call arguments as freeform content - Kept calling search_places past the point where it should have produced a final answer The hint side-channel STILL worked — the children's in-flight `think` events showed clear divergence (e.g. minimal's "Nishiki Market - food, $" vs the original "food, $$" — model self-downgraded the price). But the answers came out messy. The fix landed in PR #66: 1. Default model bumped to DeepSeek-V3 (much better tool discipline) 2. System prompt explicit about "use each tool at most twice, then stop calling tools" 3. `branch_after_step=2` (DeepSeek converges in 2 steps; the prior `=3` was unreachable) 4. `summarize.py` falls back to last `think` when no `answer` exists, so future flaky runs still tell a story run-12 (2026-05-18) reflects all of those. Same mechanism, cleaner output. ## Reproducing ```bash export FORKD_URL=http://127.0.0.1:8889 export FORKD_TOKEN=$(cat /etc/forkd/token) export SILICONFLOW_API_KEY=... bash recipes/langgraph-react/demo.sh ``` `recipes/langgraph-react/README.md` has the detailed recipe + design notes.