# turbo-surf — Rust port

Native-speed core of turbo-surf. Premise: turbo-dom ships as a pure Rust crate, so the browserless crawler is Rust too. The only piece that *must* stay JS is the `@playwright/test` drop-in façade (agents `import` it inside their own Node process); it's a thin shim over the napi addon — all the muscle is Rust.

turbo-dom is consumed from **crates.io** as the `turbo-dom-parser` crate (`{ package = "turbo-dom", version = "0.3.1" }`) — its pure-Rust `rtdom::Tree` (handle-based `u32` DOM, no napi/wasm boundary).

## Crates

| Crate | Scope |
|-------|-------|
| `turbo-surf-core` | Tier 1 — net / cookies / robots / url / frontier / crawl scheduling / cache / measure |
| `turbo-surf-page` | Tier 2 — `TurboNavigator` (fetch+parse over `rtdom::Tree`) |
| `turbo-surf-view` | extraction & views — extract / visible / aria / ax / locator / markdown / text / schema / query / xpath / hydration / dom-ops / actions |
| `turbo-surf-render` | Tier 3 — `deno_core` isolate + the rtdom↔V8 DOM binding (JS execution / hydration) |
| `turbo-surf-transform` | swc TS/JSX → classic JS for the render tier |
| `turbo-surf-napi` | the `.node` addon — in-process bridge from the core to Node (+ stateful `Session`) |
| `turbo-surf-mcp` | stdio JSON-RPC MCP server — native binary, full 60-tool surface (parity with the JS server) over a stateful session |

`cargo test` runs the full offline suite across the workspace (200+ tests); `cargo clippy --workspace --all-targets` and `cargo fmt` are clean.

## Tier 1 — `turbo-surf-core`

Direct ports of the JS modules, same behavior and edge cases:

- `url` — `resolve` / `canonicalize` (tracking-param strip, query sort, frag drop) / `is_http_url`. Frontier dedupe basis.
- `frontier` — canonical-dedup URL queue with depth + ring cursor.
- `robots` — robots.txt parse, per-agent grouping, longest-match Allow/Disallow with `*`/`$` wildcards (hand-rolled glob, no regex dep), TTL cache; injected `RobotsFetcher` → offline-testable.
- `cookies` — RFC 6265 subset `CookieJar` (domain/path scope, Secure, HttpOnly, Expires/Max-Age, SameSite; `storageState` round-trip). Times are `f64` ms (session cookie = `f64::INFINITY`). Self-contained HTTP-date parser (no chrono).
- `net` — `fetch_html` over reqwest (gzip/br, rustls, HTTP/2). Charset sniff, byte cap, content-type gate, CookieJar round-trip, manual per-hop redirect follow, and a **shared pooled `build_client()`** passed via `FetchOptions::client` so connections + TLS sessions are reused across pages.
- `crawl` — frontier-driven scheduling: global + per-host concurrency, per-host politeness, retry/backoff, depth/page caps, robots gate. Fetch+parse seam is the `Navigator` trait (tier-2 `Page` implements it).
- `cache` / `measure` — `ResponseCache` (304/storageState) and crawl summaries.

## Tier 2 — `turbo-surf-page`

`TurboNavigator` implements `crawl::Navigator` — fetches via `net::fetch_html`, parses with `Tree::parse`, projects a `Nav` (title + absolute-resolved links). The tier-1 `crawl::crawl(opts, nav)` driver runs unchanged over it. `parse_nav` is pure (no network) and offline-tested end to end against the scheduler.

## views — `turbo-surf-view`

The extraction/interaction surface over the same `rtdom::Tree`: `extract`, `visible` (cascade), `aria`/`ax`/`aria_snapshot`, `locator` (by_role/text/label), `markdown`, `text`, `schema`, `query`, `xpath`, `hydration`, `dom_ops` (checked/editable/css/select), `actions` (fill/submit/click-intent). All pure + offline-tested; a differential `tests/parity.rs` checks them against a committed JS golden.

## Tier 3 — `turbo-surf-render`

The JS-execution path, end to end over a **real DOM**. The page's own scripts run on a `deno_core` V8 isolate against a genuine `document`, mutate the turbo-dom tree in place, and the render returns the hydrated HTML (the Lane B contract).
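As a rough illustration of "mutate the tree in place, then return the hydrated HTML", the sketch below models a handle-based tree where a node is just a `u32` index into an arena. Every name here (`ToyTree`, `Handle`, `append_child`, `outer_html`, `hydrate_demo`) is hypothetical and stdlib-only; none of it is the real `rtdom::Tree` API, and text escaping and attributes are left out for brevity.

```rust
// Toy handle-based DOM: nodes live in a Vec and a handle is a u32 index.
// Illustration only; these names are assumptions, not rtdom's API.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Handle(u32);

enum Kind {
    Element { tag: String },
    Text(String),
}

struct Node {
    kind: Kind,
    children: Vec<Handle>,
}

struct ToyTree {
    nodes: Vec<Node>,
}

impl ToyTree {
    fn new() -> Self {
        ToyTree { nodes: Vec::new() }
    }

    // Allocate a node in the arena and hand back its index as the handle.
    fn alloc(&mut self, kind: Kind) -> Handle {
        let handle = Handle(self.nodes.len() as u32);
        self.nodes.push(Node { kind, children: Vec::new() });
        handle
    }

    fn element(&mut self, tag: &str) -> Handle {
        self.alloc(Kind::Element { tag: tag.to_string() })
    }

    fn text(&mut self, content: &str) -> Handle {
        self.alloc(Kind::Text(content.to_string()))
    }

    // In-place mutation: the caller only ever holds handles, never node references.
    fn append_child(&mut self, parent: Handle, child: Handle) {
        self.nodes[parent.0 as usize].children.push(child);
    }

    // Depth-first serialization (no escaping, no attributes in this sketch).
    fn serialize(&self, handle: Handle, out: &mut String) {
        let node = &self.nodes[handle.0 as usize];
        match &node.kind {
            Kind::Text(content) => out.push_str(content),
            Kind::Element { tag } => {
                out.push('<');
                out.push_str(tag);
                out.push('>');
                for &child in &node.children {
                    self.serialize(child, out);
                }
                out.push_str("</");
                out.push_str(tag);
                out.push('>');
            }
        }
    }

    fn outer_html(&self, handle: Handle) -> String {
        let mut out = String::new();
        self.serialize(handle, &mut out);
        out
    }
}

// Stand-in for a page script: starts from an empty mount point and fills it.
fn hydrate_demo() -> String {
    let mut tree = ToyTree::new();
    let root = tree.element("div");
    let paragraph = tree.element("p");
    let label = tree.text("hydrated");
    tree.append_child(paragraph, label);
    tree.append_child(root, paragraph);
    tree.outer_html(root)
}

fn main() {
    println!("{}", hydrate_demo()); // prints "<div><p>hydrated</p></div>"
}
```

Because a handle is a plain integer, it can be stored on the JS side and passed back to native code without borrowing into the tree, which is the property a native DOM binding would rely on.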
- Boots a **`deno_core` V8 isolate** (true isolate — host heap unreachable from guest; a runaway-execution **budget** watchdog terminates a wedged script).
- **The DOM is a native `rtdom`↔V8 binding** — `browser_env`, vendored from [turbo-test](../../turbo-test) (its battle-tested binding that runs React + Testing Library). A JS DOM node is a V8 object holding a turbo-dom handle in an internal field; methods/accessors are native callbacks straight onto `Tree`. No JS-DOM-in-JS-VM indirection. See [`src/browser_env.rs`](crates/turbo-surf-render/src/browser_env.rs) for the vendor/sync story (verbatim copy + a one-command re-vendor script; the turbo-surf-specific deltas — `install_html`/`document_html` and the env bootstrap — live separately and are never patched into the upstream file).
- The runtime in [`src/runtime.rs`](crates/turbo-surf-render/src/runtime.rs) grafts that binding onto deno_core's context, then layers the non-DOM `window` env a real page needs (`navigator`, `location`, virtual timers + real microtasks, `fetch`/XHR **over the tier-1 net stack**, `URL`, `crypto`(+`subtle`), `MessageChannel`, `ReadableStream`, `AbortController`, `BroadcastChannel`, `WebSocket`, `document.cookie` bridged to the shared `CookieJar`, observers, history) + the hydration pump (executes injected `