# Pylon: Function Runtime

How TypeScript functions actually execute in Pylon — and an honest evaluation
of whether the current choice is the right one.

## TL;DR

- **Engine:** Bun, run as a single child process per Pylon instance.
- **Protocol:** NDJSON over stdio. The Rust supervisor sends `call`
  messages, the Bun process replies with `db` / `stream` / `schedule` /
  `runFn` / `return` / `error` messages, all line-delimited JSON.
- **Concurrency model:** Single-threaded — every top-level function call
  serializes on `FnRunner::io_lock`. Bun is one process and the protocol
  isn't multiplexed at this layer.
- **Data access:** TypeScript code does **not** touch SQLite directly. Every
  `ctx.db.insert/get/query/...` is RPC'd back to Rust over stdio, executed
  against the Rust-owned `DataStore`, and the result is returned as a
  `db_result` message. Mutations hold the SQLite write lock for the entire
  duration of the handler.
- **Failure handling:** 30-second per-call timeout. On timeout the supervisor
  kills the Bun process and respawns it. The supervisor also respawns on any
  unexpected exit. A killed/respawned process loses all in-flight calls.
- **Why a subprocess and not embedded V8?** Two reasons: (1) Bun ships with
  TypeScript transformation, Node compat, and a fast cold start out of the
  box — embedding V8 means re-implementing all of that in Rust; (2) keeping
  the JS engine out-of-process means a buggy handler crashes only the Bun
  child, not the whole Pylon binary.

## How a function call flows

```
HTTP POST /api/fn/createOrder
        │
        ▼
┌──────────────────────────────┐
│  router (crates/router)      │  resolves /api/fn/* → fn dispatch
└──────────────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│  FnRunner::call              │  acquires io_lock (serializes top-level calls)
│  (crates/functions/runner)   │  acquires SQLite write lock if mutation
└──────────────────────────────┘
        │
        ▼  CallMessage(call_id, fn_name, fn_type, args, auth)
┌──────────────────────────────┐
│  Bun child process           │  resolves the handler, awaits ctx.* calls
│  (packages/functions/        │
│   runtime.ts)                │
└──────────────────────────────┘
        │
        ▼  DbOpMessage{op: "insert", entity: "Order", data: {...}}
┌──────────────────────────────┐
│  FnRunner::recv loop         │  pulls one message, executes against DataStore,
│  execute_db_op               │  sends back DbResultMessage{call_id, op_id, data}
└──────────────────────────────┘
        │
        ▼  ReturnMessage(call_id, value)
┌──────────────────────────────┐
│  FnRunner::call returns      │  releases write lock, releases io_lock
└──────────────────────────────┘
```

Per call the cost is:
- **One spawn-time handshake** (paid once at boot, ~50-150ms on a hot disk).
- **One `call` message + one `return` message** to the child.
- **N round-trips** for `N` `ctx.db.*` operations the handler makes — each
  is a stdio write + JSON parse on both sides.

## What runs where

| Concern | Lives in | Notes |
|---|---|---|
| HTTP routing | `crates/router` | Platform-agnostic; reused on Workers target. |
| SQLite writes / reads | `crates/runtime` | Single writer, N reader connections. |
| WebSocket fanout | `crates/runtime/ws.rs` | 16 sharded broadcast channels. |
| Function dispatch | `crates/functions/runner.rs` | Owns the Bun process + protocol. |
| Function definitions | `packages/functions/runtime.ts` | Parses `functions/*.ts`, registers handlers. |
| User handler code | `packages/functions/runtime.ts` (inside Bun) | One Bun process loads everything. |
| Auth / policy | `crates/auth`, `crates/policy` | Run in Rust before / around fn dispatch. |

## Concurrency, in detail

There is exactly one Bun child per Pylon instance. Inside that child, calls
*can* run concurrently in JS (they're just promises) — the Rust side just
won't *send* a second `call` message until the first one returns, because
`io_lock` serializes the top-level dispatch.

That decision is deliberate:
- The protocol isn't multiplexed at the message layer (there's no per-call
  inbox demux on the Rust side).
- The single Bun event loop already serializes JS execution, so true
  parallelism inside the child wouldn't buy much.
- Mutations hold the SQLite write lock for the whole handler anyway — the
  bottleneck is writes, not JS execution.

**Reads (queries)** could parallelize across the read pool, but currently
don't — every fn call goes through `io_lock`. This is the single biggest
performance limit at the function layer today; see *Limits* below.

## Limits (read this before benchmarking)

- **Top-level calls serialize.** Even read-only `query` functions queue
  behind the call ahead. ~10K small-handler calls/sec on an M2 Mac;
  workloads dominated by complex handlers will be lower.
- **DB ops are stdio JSON.** Every `ctx.db.get(id)` is a stdio round-trip
  with two JSON encode/decode cycles. Cheap (~10µs) but it adds up — a
  handler that does 100 sequential `ctx.db.get()` calls eats ~1ms in
  protocol overhead before any actual work.
- **No per-call resource limits.** A handler can `while(true)` and the
  whole process burns CPU until the 30s timeout fires, killing every other
  in-flight call.
- **One bug = whole runtime down.** A handler that segfaults Bun, exhausts
  memory, or hits a Bun bug takes the entire Pylon function layer offline
  until the supervisor respawns (~100ms). In a single-tenant deployment
  this is fine. In a managed multi-tenant setup it's a noisy-neighbor
  problem.
- **Hot reload requires full process restart.** Editing one function
  invalidates Bun's module cache for everything; the supervisor restarts
  the whole runtime.
- **No sandboxing.** Handlers can read the filesystem, open sockets, exec
  subprocesses — whatever Bun lets them do. Fine for self-host, dangerous
  for multi-tenant.

## Alternatives we evaluated

This is honest, not a pitch. Each option is real and we considered it.

### 1. Stay with Bun subprocess (current)

- **Pros:** Working today. Bun's TS transform is fast and free. Real Node
  API compat means handlers can `import` from npm without ceremony. Single
  binary + `bun` on PATH is the entire dependency story.
- **Cons:** All of *Limits* above.
- **Verdict:** Right call for self-host and 1-instance-per-tenant
  deployments. Stop here unless you specifically need one of the things
  below.

### 2. Multiple Bun workers (subprocess pool)

- **Pros:** Cheap upgrade path. Spin up `N` Bun processes, route calls by
  `call_id % N`. Read queries parallelize across workers. Fault isolation
  improves — one worker crashing only drops its in-flight calls. ~1 day
  of work, no architectural change.
- **Cons:** Doesn't solve the SQLite single-writer bottleneck (mutations
  still queue). Worker pool needs a supervisor + load balancer that
  handles per-worker handshake state. Memory footprint scales with `N`
  (~80MB per Bun worker baseline).
- **Verdict:** First upgrade we'd take if the function layer becomes a
  bottleneck under read-heavy workloads. Plan to ship behind
  `PYLON_FN_WORKERS=N` with default `1`.

### 3. `deno_core` / `rusty_v8` — embedded V8 isolates

- **Pros:** True per-call isolates (each function call could get its own
  fresh JS context). Shared-memory DB ops (no JSON serialization — pass
  `serde_json::Value` directly across the FFI boundary). Cheap to spawn
  (microseconds vs Bun's ~50ms cold start). Memory limits per isolate.
  V8 snapshots make cold starts faster still.
- **Cons:** Embedding V8 is non-trivial (~2-3 weeks to get to parity with
  the current Bun-based runtime). Binary size grows by ~15MB. No automatic
  Node API compat — handlers can't `import` arbitrary npm packages without
  us writing polyfills or restricting to a curated stdlib. TypeScript
  transform needs SWC (separate dependency, ~5MB).
- **Verdict:** The right choice if and when we ship a managed
  multi-tenant cloud. The isolation story is what makes per-tenant
  sandboxing tractable. Not worth the rebuild for self-host.

### 4. `workerd` — Cloudflare's open-source Workers runtime

- **Pros:** Battle-tested isolate-per-request semantics. V8-based. Async
  by design. If we want first-class Cloudflare Workers parity, building
  on workerd locally means handlers behave identically in dev and on the
  Workers deploy target.
- **Cons:** Heavy (~120MB binary). Designed for HTTP-shaped workloads, not
  RPC-shaped — embedding it for our `ctx.db` round-trip pattern would be
  fighting the grain. Less flexible than `deno_core` for non-Workers
  targets.
- **Verdict:** Compelling specifically for the Workers target. If we end
  up shipping `pylon deploy --target workers` as a serious option, we
  should evaluate using workerd in dev too so handler behavior stays
  consistent. Not the right choice for the general runtime.

### 5. `deno_runtime` — full Deno as an embedded library

- **Pros:** Most complete embedded option. Sandboxed by default
  (permissions). TypeScript native. Top-tier Node API compat.
- **Cons:** ~50MB binary impact. Performance overhead vs Bun on
  short-handler workloads. Cold start on first call is heavier than
  necessary. Adds a large dependency surface.
- **Verdict:** No clear win over either Bun-subprocess (simpler) or
  `deno_core` (lighter). Skip.

### 6. `wasmtime` + JS-on-Wasm

- **Pros:** Wasm sandboxing is best-in-class. Could run handlers from
  untrusted sources safely.
- **Cons:** JS-on-Wasm is 5-20x slower than V8. Not viable for the
  perf characteristics Pylon needs.
- **Verdict:** Not a serious option today.

### 7. QuickJS / Boa — pure-Rust embedded JS

- **Pros:** Tiny binary impact. Fully in-process.
- **Cons:** Both are 10-50x slower than V8. No real Node compat. TypeScript
  needs an external transformer.
- **Verdict:** Reasonable for an "edge function" lite mode where size
  matters more than speed. Not a primary runtime.

## Recommendation

For the next 12 months: **stay on Bun subprocess.** It's working, it's
fast, and the limits don't bite at the workloads Pylon's positioned for
(self-host + per-tenant deploys + the cloud free tier).

Two things to add when the bottleneck becomes real:

1. **Worker pool** behind a flag. Defaults to 1 for backward-compat.
   Apps with read-heavy workloads opt in. ~1 day of work.

2. **`deno_core` runtime as a second option** when (and only when) we
   ship a true multi-tenant managed cloud where untrusted handler code
   needs sandboxing. ~3 weeks of work; ship it as
   `PYLON_FN_RUNTIME=isolates`, keep Bun as the default.

The wrong move would be to swap engines speculatively — every option above
has real costs and the demo workloads don't need any of them. Pick the
upgrade when the data justifies it.