# Experiments

Running notes for nvpn/FIPS performance and reliability work. Keep entries short enough to compare later: date, build/commit, setup, result, and decision.

## 2026-05-14 - FSP rekey continuity fix

Setup:
- Pi, Windows VM, Linux VM, MacBook, and mini daemons were already on the stale-FMP-session drain fixes. Pi/Windows 90 second continuity runs still showed occasional loss near the default FSP rekey interval.

Result:
- FIPS 0.3.6 adds retained/resendable final `SessionMsg3` handling for FSP rekey, mirroring the initial XK establishment repair path. The FIPS unit suite passed with 1152 tests, plus the targeted rekey resend and decrypt-failure recovery tests.

Decision:
- Bump nvpn to `fips-endpoint` 0.3.6 and redeploy the daemons before treating any remaining Pi/Windows or Mac/mini continuity loss as a different bug.

## 2026-05-14 - Windows/Linux direct-LAN parity check

Setup:
- Ubuntu-dev and win11-dev on Vader's direct LAN, not routed through the nvpn mesh: Ubuntu `192.168.122.103:55140`, Windows `192.168.122.147:55141`. `tcpdump` confirmed FIPS UDP directly between those endpoints.
- Both nvpn peers used the LAN MTU profile with tunnel MTU 1290. The test tunnel routes were Ubuntu `10.44.99.179/32` and Windows `10.44.132.184/32`.
- Baselines on the same VMs were Windows WireGuardNT against Ubuntu BoringTun and Ubuntu kernel WireGuard.

Results:
- Baseline BoringTun/WireGuardNT: Windows to Linux about 572-649 Mbit/s, Linux to Windows about 1.26-1.42 Gbit/s.
- Baseline kernel WireGuard/WireGuardNT: Windows to Linux about 573-603 Mbit/s, Linux to Windows about 728-765 Mbit/s.
- Initial nvpn direct-LAN samples were about 193/237/203/330 Mbit/s Windows to Linux for 1/2/4/8 streams, and about 237/290/223/222 Mbit/s Linux to Windows (see the sweep sketch after this section).
- A Windows mesh-receive burst drain that batches ready FIPS packets into one Wintun write improved Linux to Windows to about 357/410/479/350 Mbit/s for 1/2/4/8 streams. Windows to Linux stayed in the same broad band at about 276/283/267/279 Mbit/s.
- A Windows send-side batch/drain experiment hurt Windows to Linux throughput and was reverted.
- Profiling with `FIPS_PERF=1 NVPN_PIPELINE_TRACE=1` was too expensive for headline throughput, but showed the remaining Windows sender gap clearly: Windows uses the FIPS core Tokio per-datagram UDP send path, with millisecond endpoint command waits under load, while Linux uses the raw batched/GSO sender path.
- Follow-up apples-to-apples userspace baseline, after adding Windows `wg-upstream-test --scoped-host` at `77c133d`: `tcpdump` during nvpn ping confirmed direct FIPS UDP on `enp1s0` between `192.168.122.103:36344` and `192.168.122.147:58013`. The baseline used Ubuntu `boringtun-cli 0.7.1` on `btbench` and Windows nvpn's BoringTun/Wintun WG upstream runtime, with tunnel IPs `10.88.0.1/32` and `10.88.0.2/32`.
- Current nvpn FIPS direct-LAN samples were about 260/259/261/259 Mbit/s Windows to Linux for 1/2/4/8 streams, and about 351/349/356/356 Mbit/s Linux to Windows.
- Userspace BoringTun/BoringTun samples on the same VMs were about 305/295/298/309 Mbit/s Windows to Linux for 1/2/4/8 streams, and about 347/419/492/517 Mbit/s Linux to Windows.

Decision:
- Keep the Windows receive-side Wintun write batching because it materially improves the Linux-to-Windows direction without changing protocol format.
- Do not keep send-side batching in nvpn. Kernel WireGuard/WireGuardNT is a useful ceiling but not the right baseline for nvpn FIPS.
- Against userspace BoringTun/BoringTun, current nvpn is close enough Windows-to-Linux (roughly 84-88% of BoringTun) that there is no obvious simple nvpn-side fix. Linux-to-Windows still falls behind BoringTun as stream count rises, so the next parity work should focus on FIPS core / Windows receive-path scaling rather than another small CLI-side batching tweak.
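The exact iperf3 invocations for these stream sweeps were not recorded here; a minimal sketch of the kind of sweep the 1/2/4/8-stream samples imply, assuming the standard iperf3 `-P` parallel-stream flag and the Windows test tunnel address from the setup list as the server:

```sh
# Hypothetical sweep matching the 1/2/4/8-stream samples above; the
# server address is the Windows test tunnel route from the setup list.
SERVER=10.44.132.184
for streams in 1 2 4 8; do
  iperf3 -c "$SERVER" -P "$streams" -t 15 --json > "tcp_${streams}streams.json"
  iperf3 -c "$SERVER" -P "$streams" -t 15 -R --json > "tcp_${streams}streams_reverse.json"
done
```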
## 2026-05-13 - `nvpn update` CLI e2e release gate

Setup:
- Added a local updater fixture that writes a file-backed release manifest and tarball containing a fake `nvpn` executable for the current platform target.
- The script runs `nvpn update --check`, then `nvpn update --force --path` into an artifact directory so the real daemon/CLI binary is never overwritten.

Result:
- `./scripts/e2e-update-cli.sh` passed locally.
- `NVPN_RELEASE_GATE_DOCKER_E2E=0 ./scripts/release-gate.sh` passed with the new updater e2e included after fmt, clippy, and workspace tests.

Decision:
- Keep the CLI updater e2e in the release gate as a cheap guard. The existing desktop updater e2e scripts still cover macOS/Linux/Windows GUI update flows separately, but they do not exercise the `nvpn update` CLI command.

## 2026-05-13 - routed FIPS fallback for stale pending sessions

Setup:
- Private FIPS mesh with reply-learned routing enabled.
- Regression modeled a destination that still has a sendable direct peer route while its end-to-end FSP session is stuck in `Initiating`.
- App endpoint bytes and TUN packets were both queued behind that stale session.
- Live MacBook could see VM peers directly, while mini could not complete direct NAT traversal to those same VM peers.
- Later live debugging showed mini was receiving signed lookup responses for the missing VM peers, so discovery/routing was no longer the blocker. The remaining symptom was repeated encrypted traffic from those peers while the mini-side FSP session was still waiting for handshake completion.

Result:
- Before FIPS `c1c71eb`, queued traffic returned without starting discovery, so a peer could remain `fips link pending` and not fall back through other mesh neighbors.
- FIPS `c1c71eb` now kicks reply-learned discovery for queued endpoint and TUN traffic whenever the existing session is not established.
- Before FIPS `e6662e7`, a transit node that had the target as a direct peer still did not hand the lookup to that target if it was not a tree neighbor.
- FIPS `e6662e7` forwards lookup requests to direct non-tree targets, allowing asymmetric paths such as mini -> MacBook -> VM.
- After that fix, mini could route at least one VM peer through FIPS, but two VM peers still remained `fips link pending`. The remaining gap was the origin/transit fallback still being limited to tree peers.
- FIPS `83fbf03` keeps tree/bloom routing as the primary lookup path, then lets reply-learned fallback ask any authenticated sendable peer when no tree/bloom candidate exists. Transit fallback excludes the previous hop and originator so the request ID still distinguishes originator from relay.
- Added unit coverage for both endpoint-data and TUN-packet branches, including the stale direct-route case, direct non-tree target forwarding, origin fallback to a non-tree sendable peer, and transit fallback without origin echo.
- FIPS `811eef3` keeps the final XK `SessionMsg3` around briefly after the initiator marks a session established, resends it on the normal handshake resend timer, and resends it again if a duplicate `SessionAck` arrives. A responder that sees early encrypted data while still waiting for msg3 also resends its `SessionAck`. The regression test drops the first msg3 and proves the responder establishes from the replacement. The final FIPS commit also includes clippy cleanup for the hot-path refactor, a CI fixture repair for dropped one-shot synthetic UDP handshakes, nextest serialization for synthetic localhost UDP node tests, and CI harness fixes for STUN-fault and DNS resolver tests.
- Strengthened `scripts/e2e-fips-routed-udp-docker.sh` to force the safe MTU profile, assert `utun100` MTU 1150, and move 1000-byte no-fragment ping payloads plus UDP payloads both ways while direct Alice/Bob underlay UDP is blocked.
- Added `scripts/e2e-fips-nat-safe-mtu-docker.sh`, which places Bob behind a Docker NAT, forces the safe MTU profile, verifies both peers show online via FIPS, and moves safe-MTU ping plus UDP payloads in both directions.
- `./scripts/e2e-fips-routed-udp-docker.sh` passed after the stricter safe-MTU and bidirectional data checks.
- `./scripts/e2e-fips-nat-safe-mtu-docker.sh` passed with Bob using `198.51.100.10 via 172.30.242.2` for the underlay route to Alice, Alice observing Bob through the NAT public address, and 976-byte UDP payload files received on both sides.
- `NVPN_RELEASE_GATE_DOCKER_E2E=0 ./scripts/release-gate.sh` passed against FIPS `811eef3`, including fmt, clippy, workspace tests, and the `nvpn update` CLI e2e.

Decision:
- Keep the explicit routed-FIPS and NAT safe-MTU Docker e2e tests in the nvpn release gate, and keep this FIPS unit coverage as the lower-level guard against stale direct/NAT session state and half-established XK sessions blocking mesh fallback.

## 2026-05-12 - macOS Wi-Fi to Ethernet, safe MTU

Setup:
- Local MacBook on Wi-Fi to Mac mini on Ethernet.
- FIPS core at `c7fb565` (`Revert "perf: parallelize fmp encryption with ordered send"`).
- Private mesh safe defaults: underlay UDP MTU 1280, tunnel MTU 1150.
- Both daemons built with local FIPS patches and ad-hoc signed.

Results:
- nvpn MacBook to mini UDP at 400 Mbit/s target: about 240 Mbit/s with near-zero loss.
- nvpn MacBook to mini TCP: about 200-223 Mbit/s depending on run.
- nvpn mini to MacBook TCP: about 345-356 Mbit/s.
- Tailscale MacBook to mini TCP at same time: about 292 Mbit/s.
- Tailscale MacBook to mini UDP at 400 Mbit/s target: reached about 400 Mbit/s but with about 3.6% loss.

Decision:
- Safe MTU is reliable but leaves LAN throughput on the table.
- Add an explicit LAN MTU/profile override (`mesh_mtu_profile = "lan"` or `NVPN_MESH_MTU_PROFILE=lan`) for controlled tests instead of making LAN-sized frames the global default.

## 2026-05-12 - fresh macOS Wi-Fi to Ethernet comparison

Setup:
- Local MacBook on Wi-Fi to Mac mini on Ethernet.
- Running daemons still at the stable safe-MTU build because launchd restart requires elevated `launchctl kickstart`.
- Direct LAN, Tailscale, and nvpn tested back-to-back with the same iperf3 server.

Results:
- Direct LAN TCP: MacBook to mini about 495 Mbit/s; mini to MacBook about 318 Mbit/s.
- Direct LAN UDP at 400 Mbit/s target: about 400 Mbit/s both directions with about 0.13% loss.
- Tailscale TCP: MacBook to mini about 299 Mbit/s; mini to MacBook about 332 Mbit/s.
- Tailscale UDP at 400 Mbit/s target: about 400 Mbit/s both directions; loss about 0.05% forward and 3.5% reverse.
- nvpn safe-MTU TCP: MacBook to mini about 188 Mbit/s; mini to MacBook about 323 Mbit/s.
- nvpn safe-MTU UDP at 400 Mbit/s target: MacBook to mini about 203 Mbit/s with about 5.1% loss; mini to MacBook about 393 Mbit/s with about 23.7% loss.

Observations:
- Direct LAN proves the path can carry 400 Mbit/s UDP and much higher forward TCP than nvpn currently achieves.
- Daemon logs show macOS `ENOBUFS` send backpressure during the UDP runs.
- The mini also had unrelated session AEAD recovery churn with another peer during the same window, so reliability work in that path may contaminate throughput samples until it is fixed.

Decision:
- The forward nvpn TCP gap is still worth fixing.
- Next live test is the explicit LAN MTU profile on both Macs after a privileged daemon restart. If it does not close most of the forward gap, the next target is macOS sender pacing/queueing rather than MTU.

## 2026-05-12 - explicit LAN MTU profile deployed

Setup:
- Local MacBook on Wi-Fi to Mac mini on Ethernet.
- nostr-vpn at `2509c9b` (`perf: add private mesh mtu test profile`).
- FIPS core at `c7fb565`.
- Both daemons built with local FIPS patches, ad-hoc signed, restarted through launchd, and configured with `mesh_mtu_profile = "lan"`.
- Live private mesh interface MTU was 1290 on both Macs.

15 second results:
- Direct LAN TCP: MacBook to mini about 499 Mbit/s; mini to MacBook about 437 Mbit/s.
- Direct LAN UDP at 400 Mbit/s target: MacBook to mini about 397 Mbit/s with 0% loss; mini to MacBook about 400 Mbit/s with about 1.3% loss.
- Tailscale TCP: MacBook to mini about 228 Mbit/s; mini to MacBook about 314 Mbit/s, both with thousands of retransmits.
- Tailscale UDP at 400 Mbit/s target: MacBook to mini about 400 Mbit/s with about 2.3% loss; mini to MacBook about 400 Mbit/s with about 0.04% loss.
- nvpn LAN-MTU TCP: MacBook to mini about 228 Mbit/s; mini to MacBook about 416 Mbit/s.
- nvpn LAN-MTU UDP at 400 Mbit/s target: MacBook to mini about 265 Mbit/s with near-zero loss; mini to MacBook about 400 Mbit/s with 0% loss.

90 second nvpn stability results:
- TCP MacBook to mini: about 234 Mbit/s, 2697 retransmits.
- TCP mini to MacBook: about 357 Mbit/s, 2626 retransmits.
- UDP MacBook to mini at 275 Mbit/s target: about 263 Mbit/s with about 0.055% loss.
- UDP mini to MacBook at 400 Mbit/s target: about 400 Mbit/s with about 1.0% loss.

Observations:
- The LAN MTU profile makes nvpn competitive with or faster than Tailscale for TCP on this sample and improves reverse UDP to line rate.
- Forward UDP from the Wi-Fi MacBook remains capped around 260-265 Mbit/s before the daemon hits macOS UDP send pressure. Earlier logs showed `No buffer space available` and `EncryptWorker channel full` on this path.
- The remaining gap is directional and sender-side: mini to MacBook is much faster over the same private mesh protocol and same tunnel MTU.

Decision:
- Keep 1280/1150 as the safe default and use the explicit LAN profile for controlled LAN tests.
- Continue investigating macOS sender pacing/queueing and packet-rate reduction for the MacBook-to-mini direction. Avoid retry/drop variants because previous tests increased loss and hurt TCP.

Follow-up 1452/1322 explicit override:
- Setting `mesh_underlay_udp_mtu = 1452` and `mesh_tunnel_mtu = 1322` kept both utuns up and improved MacBook-to-mini UDP at 400 Mbit/s target slightly, to about 273 Mbit/s with near-zero loss.
- MacBook-to-mini TCP regressed to about 194 Mbit/s in the same sample.
- Decision: do not promote 1452/1322 to the `lan` profile yet. It may be a useful one-off UDP test override, but the 1420/1290 profile is the better balanced default for now (see the config sketch below).
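For reference, a minimal sketch of the knobs discussed above, assuming the daemon reads TOML-style keys from its config file (the notes do not name the file, so the path below is a placeholder) and honours the `NVPN_MESH_MTU_PROFILE` environment override:

```sh
# NVPN_CONFIG is a placeholder for the daemon config file path.
cat >> "$NVPN_CONFIG" <<'EOF'
# Named LAN profile (underlay UDP MTU 1420 / tunnel MTU 1290).
mesh_mtu_profile = "lan"

# Or the explicit one-off override tested above (not promoted to a profile):
# mesh_underlay_udp_mtu = 1452
# mesh_tunnel_mtu = 1322
EOF

# Equivalent environment override for controlled tests:
#   NVPN_MESH_MTU_PROFILE=lan
```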
## 2026-05-12 - Darwin `sendmsg_x` batch send trial

Setup:
- FIPS experiment `985e1ab` used Darwin's private `sendmsg_x` syscall for connected UDP sockets, falling back to per-datagram `send(2)` if refused.
- Both Macs were rebuilt and restarted on the LAN MTU profile.

Results:
- The focused loopback unit test passed, proving the syscall is available on this macOS build.
- Short samples were mixed: one MacBook-to-mini TCP run reached about 247 Mbit/s, but UDP at 400 Mbit/s target fell to about 250 Mbit/s.
- A 90 second sample regressed every leg: MacBook-to-mini TCP about 176 Mbit/s, mini-to-MacBook TCP about 122 Mbit/s, MacBook-to-mini UDP at 275 Mbit/s target about 228 Mbit/s, and mini-to-MacBook UDP at 400 Mbit/s target about 289 Mbit/s.
- An adaptive fallback on partial/`ENOBUFS` batch sends did not recover the loss: forward TCP stayed near baseline and UDP at 275 Mbit/s lost packets.

Decision:
- Reverted in FIPS `e4edff2`. `sendmsg_x` reduces syscall count, but sustained Wi-Fi behavior is worse, likely because the kernel sees burstier UDP writes.
- Keep the conservative per-datagram macOS send loop until there is a pacing model that improves long runs, not just short TCP bursts.

## 2026-05-12 - mini disk-full interference

Setup:
- After repeated build/restart/iperf cycles, mini returned `iperf3` errors like `unable to create a new stream: No space left on device`.

Findings:
- The mini data volume had only about 116 MiB free.
- Generated Rust build artifacts were the main safe cleanup target, including about 20 GiB in `/Users/sirius/src/fips/target` and 6.4 GiB in `/Users/sirius/src/nostr-vpn/target`.
- Removing generated `target` directories restored about 32 GiB free.

Post-clean sanity:
- Direct LAN and Tailscale were healthy before cleanup, but iperf server creation and nvpn samples were unreliable while disk was full.
- After cleanup and reverting `sendmsg_x`, nvpn 10 second samples returned to expected ranges: MacBook-to-mini TCP about 203 Mbit/s, mini-to-MacBook TCP about 322 Mbit/s, MacBook-to-mini UDP at 275 Mbit/s target reached 275 Mbit/s with 0% loss, and mini-to-MacBook UDP at 400 Mbit/s target reached 400 Mbit/s with about 1.1% loss.

Decision:
- Treat very low nvpn samples while the mini is disk-full as contaminated.
- Keep at least tens of GiB free on remote bench hosts before interpreting daemon or iperf behavior.

## 2026-05-12 - LAN-sized MTU defaults trial

Setup:
- Same macOS Wi-Fi to Ethernet path.
- Private mesh defaults temporarily raised to underlay UDP MTU 1420 and tunnel MTU 1290.

Results:
- MacBook to mini UDP at 400 Mbit/s target improved to roughly 272-292 Mbit/s.
- Mini to MacBook UDP at 400 Mbit/s target could reach roughly 400 Mbit/s with low loss.
- MacBook to mini TCP improved to roughly 232 Mbit/s.
- Mini to MacBook TCP was roughly 315-392 Mbit/s.

Decision:
- Useful on clean LAN paths, but too optimistic for NAT traversal and nested tunnels. Restore 1280/1150 safe defaults until blackhole-safe probing exists.

## 2026-05-12 - ordered parallel FMP sender

Setup:
- FIPS experiment `85858a2` added a WireGuard-like parallel encrypt stage and per-destination ordered sender.
- Unit tests and `cargo test -p fips-core --lib` passed.

Results:
- Live MacBook to mini UDP at 400 Mbit/s target regressed to about 238.5 Mbit/s.
- Live MacBook to mini TCP regressed to about 182 Mbit/s.

Decision:
- Reverted in FIPS `c7fb565`. The extra per-packet ordering/channel overhead cost more than the parallel encryption helped on this path.
## 2026-05-12 - macOS send backpressure variants

Setup:
- macOS connected UDP send path under Wi-Fi pressure.

Results:
- Yield/retry on `WouldBlock`, `ENOBUFS`, and `ENOMEM` gives conservative throughput with near-zero loss.
- A bounded retry/drop variant caused high UDP loss and worse TCP behavior.
- A fixed 10 us sleep reduced throughput further.

Decision:
- Keep retry/yield behavior for reliability. Throughput work should focus on reducing per-packet work and using an explicit larger MTU on paths that can carry it, not dropping on macOS socket pressure.

## 2026-05-12 - nvpn mesh packet copies and utun write pressure

Setup:
- Local MacBook on Wi-Fi to Mac mini on Ethernet.
- LAN MTU profile on both Macs: underlay UDP MTU 1420, tunnel MTU 1290.
- FIPS core at `e4edff2` (Darwin `sendmsg_x` reverted).
- Both daemons built with local FIPS patches, ad-hoc signed, and restarted through launchd.

Results:
- `266595b` moved outbound FIPS mesh packets instead of cloning them. Best 15s samples after deploying to both Macs: MacBook-to-mini TCP about 255 Mbit/s, mini-to-MacBook TCP about 465 Mbit/s, MacBook-to-mini UDP at 275 Mbit/s target 0% loss, MacBook-to-mini UDP at 400 Mbit/s target about 283 Mbit/s with near-zero loss, and mini-to-MacBook UDP at 400 Mbit/s target about 400 Mbit/s with near-zero loss.
- A direct TUN-read-to-FIPS-send experiment removed the channel between TUN read and mesh send, but regressed MacBook-to-mini TCP to about 110 Mbit/s and added UDP loss. Decision: keep the channel decoupling.
- Long reverse UDP exposed silent utun write drops. Before the write fix, a 90s mini-to-MacBook UDP400 run lost about 19%; a 30s reproduction lost about 50%. Direct LAN and Tailscale over the same path sustained UDP400 with low loss (direct 90s reverse about 0.056% loss; Tailscale 90s reverse about 0.14%).
- `c00edda` adds raw TUN writes that wait for fd writability and retry `WouldBlock`, instead of using boringtun's `write4/write6` helpers that return `0` for every write error. Final targeted samples: 30s mini-to-MacBook UDP400 about 400 Mbit/s with about 1.1% loss; 30s MacBook-to-mini UDP275 about 275 Mbit/s with 0% loss.
- Full 90s no-drain/write-backpressure sample: MacBook-to-mini TCP about 250 Mbit/s, mini-to-MacBook TCP about 401 Mbit/s, MacBook-to-mini UDP400 about 259 Mbit/s with about 0.086% loss, and mini-to-MacBook UDP400 about 400 Mbit/s with variable loss (observed 0.004% to 7.6% across repeated runs).
- A receive `try_recv` burst-drain experiment made reverse UDP400 recover from catastrophic loss but hurt forward TCP/UDP and made the receive side too bursty. A separate bounded TUN-write queue also worsened 350-400 Mbit/s reverse UDP loss. Decision: do not keep either sub-experiment.
- Mini daemon rejoin smoke after `launchctl kickstart -k` (sketched at the end of this entry): local status already showed the direct UDP path on the first poll, and a follow-up `ping -c 3` over nvpn had 0% loss with about 7 ms average RTT. This is a smoke test only; it does not cover Wi-Fi roaming.

Observations:
- The remaining MacBook-to-mini UDP400 ceiling is still around 260-280 Mbit/s; this is the macOS/Wi-Fi sender side and still trails Tailscale's current UDP400 result on this LAN.
- Reverse UDP400 no longer collapses deterministically, but it is still less stable than Tailscale/direct LAN over 90s. The remaining loss is not explained by the rekey message counter: FIPS defaults to `node.rekey.after_messages = 2^48`; time-based rekey defaults to 120 seconds, so 90s runs do not normally cross the periodic rekey timer.
- Local daemon CPU during reverse UDP400 receive was about 70-74%, so the current reverse loss does not look like a single pegged userspace CPU core.

Decision:
- Keep owned packet movement and raw TUN write backpressure.
- Continue investigating the residual reverse UDP400 variability and the MacBook sender-side UDP400 ceiling before claiming parity with Tailscale.
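A minimal sketch of that rejoin smoke check; the launchd label, the status subcommand, and the peer address are assumptions, since the entry only records that the daemon was kicked with `launchctl kickstart -k`, local status was polled, and a three-packet ping was run over nvpn:

```sh
# Restart the mini daemon (label is a placeholder for the real launchd job).
sudo launchctl kickstart -k system/<nvpn-daemon-label>

# Poll local status until the peer shows a direct UDP path
# ("status" as a CLI subcommand is an assumption; the entry only says
# local status was polled).
nvpn status

# Short ping over the nvpn tunnel; expect 0% loss and single-digit-ms RTT.
ping -c 3 <peer-mesh-address>
```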
## 2026-05-12 - pipeline trace and connected UDP activation

Setup:
- Local MacBook on Wi-Fi to Mac mini on Ethernet.
- Both daemons built with `NVPN_PIPELINE_TRACE_DEFAULT=1` and `FIPS_PIPELINE_TRACE_DEFAULT=1`.
- FIPS added queue-wait counters for endpoint command, FMP worker, transport, endpoint event, connected-vs-wildcard UDP sends, and connected UDP activation.
- Compared current nvpn against same-window Tailscale over the same machines.

Results:
- Connected UDP was not reliably active for NAT-traversal sockets until the traversal socket was bound with `SO_REUSEADDR`/`SO_REUSEPORT` before FIPS adopted it. After the reuse fix, both app-resource and `~/.cargo/bin/nvpn` binaries showed `connected_udp_installed`, then steady `udp_send_connected` traffic.
- Current 20s TCP samples after the connected-UDP fix: nvpn MacBook-to-mini about 227 Mbit/s receiver, nvpn mini-to-MacBook about 350 Mbit/s receiver; same-window Tailscale was about 268 Mbit/s forward and 348 Mbit/s reverse.
- MacBook-to-mini forward traces still show sender-side Darwin UDP pressure: one 5s interval saw about `24k/s` FMP worker sends, about `15.7k/s` successful UDP send calls, and about `91k/s` `udp_send_backpressure` events with `fmp_worker_queue_wait` p95 near 134 ms.
- Reverse mini-to-MacBook traces were much cleaner, with high connected send rates and little or no backpressure, matching the throughput result.
- Queue-cap trials: 256 was too shallow and hurt reverse throughput; 32768 hid saturation and inflated latency/retransmits; 1024 is the best known balance for pushing back toward TUN without building a large userspace buffer.

Observations:
- The remaining MacBook-to-mini gap is not from crypto and not from the connected-UDP fast path being absent. `sample(1)` on the MacBook sender spent most active worker time in `sendto`, with ChaCha20-Poly1305 a secondary cost.
- Wireguard-go and boringtun do not expose an obvious Darwin send primitive that we are missing: non-Linux wireguard-go uses single-packet `WriteMsgUDP`, and boringtun uses a plain utun fd path. Tailscale's utun shows Darwin offload/channel flags (`TSO4`, `TSO6`, `CHANNEL_IO`, checksum offloads) that this daemon's boringtun-created utun does not.
- Linux does not show the same problem because the Linux path can use UDP GSO and TUN offloads/batching; Darwin currently has neither in this stack.

Decision:
- Keep the connected UDP reuse fix and macOS stale-socket clearing.
- Keep the 1024 encrypt-worker queue cap.
- Do not revive Darwin `sendmsg_x`, fixed 10 us sleep, bounded retry/drop, or 256-queue experiments; all regressed sustained runs.
- Next focused experiment is a macOS-only adaptive pause after repeated `ENOBUFS` bursts (`FIPS_SEND_BACKPRESSURE_SLEEP_AFTER`, `FIPS_SEND_BACKPRESSURE_SLEEP_MICROS`) to reduce spin-retry storms without sleeping on clean sends; the knob values eventually tried are sketched below. Long-term parity may require a Darwin utun offload/channel backend rather than more UDP send-loop tuning.
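The values eventually deployed for that experiment (per the next entry) can be written as plain environment settings. A sketch, assuming the FIPS sender reads them from the daemon environment; how the environment actually reaches the launchd job is covered in a later entry:

```sh
# Sleep 1 microsecond after 4 consecutive ENOBUFS/ENOMEM send results
# (the values used in the 2026-05-12 adaptive-pacing deployment below);
# clean sends keep the existing retry/yield behaviour.
export FIPS_SEND_BACKPRESSURE_SLEEP_AFTER=4
export FIPS_SEND_BACKPRESSURE_SLEEP_MICROS=1
```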
## 2026-05-12 - adaptive ENOBUFS pause and ordered macOS sender v2

Setup:
- Same MacBook Wi-Fi to Mac mini Ethernet path, LAN MTU profile, connected UDP active, pipeline tracing compiled on.
- First deployed adaptive macOS send pacing: retry/yield remains the default, but after four consecutive `ENOBUFS`/`ENOMEM` results the sender sleeps for 1 us before retrying.

Results:
- 20s same-window TCP after adaptive pacing: nvpn MacBook-to-mini about 248 Mbit/s receiver with 0 retransmits; nvpn mini-to-MacBook about 409 Mbit/s receiver with 2117 retransmits. Tailscale in the same window was about 301 Mbit/s forward and 355 Mbit/s reverse.
- Local sender traces confirmed the pause triggers under load: representative 5s intervals showed about `13k-16k/s` successful UDP sends, about `12k-18k/s` backpressure events, and about `3k-4.5k/s` micro-sleeps. This is much better than the earlier `80k-90k/s` retry storm, but still leaves large FMP worker queue waits.

Follow-up experiment:
- FIPS now has a macOS-only ordered sender v2: rx_loop still assigns counters sequentially, FMP encryption is spread across workers, and completed packets are serialized by a per-socket/per-destination sender thread. Linux keeps the existing GSO/sendmmsg path.
- This is different from the reverted `85858a2` experiment: the new version is only for macOS and only moves the FMP encrypt/send boundary, specifically to avoid making the Darwin UDP sender do AEAD work between kernel send attempts.

Decision:
- Adaptive pacing is worth keeping unless the ordered sender v2 shows a clear regression.
- The ordered sender v2 still needs deployment and live TCP/UDP comparison before it can replace the current macOS path.

## 2026-05-13 - macOS sender wake and receive hot-path cleanup

Setup:
- Same MacBook Wi-Fi to Mac mini Ethernet path, LAN MTU profile, connected UDP active. Pipeline tracing defaults are now runtime-env only so normal benches are trace-off unless explicitly enabled.
- Both launchd daemons were rebuilt from the same temporary FIPS ref, copied into `~/.cargo/bin/nvpn` and the app resource copy, ad-hoc signed, then restarted with `launchctl kickstart`.

Results:
- Trace-off plus corrected ENOBUFS drop accounting: nvpn about 221 Mbit/s MacBook-to-mini and about 249 Mbit/s mini-to-MacBook. Same-window Tailscale was about 290/359 Mbit/s; raw LAN was about 540/415 Mbit/s.
- macOS custom worker queue removed the earlier crossbeam `semaphore_signal_trap` hotspot. 20s TCP was about 205/348 Mbit/s; same-window Tailscale about 298/354 Mbit/s. `sample(1)` then showed the ordered sender completion condvar as the next sender-side cost.
- Splitting the ordered sender condvars gave about 215/428 Mbit/s in one clean window. Reverse beat the same-window Tailscale sample, but MacBook-to-mini still trailed. Sender samples were mostly Darwin UDP `sendto` plus completion wakeups; receiver samples still showed `npub_for_node_addr`/`encode_npub` on each delivered endpoint packet.
- Caching encoded npubs in the FIPS identity cache removed that receiver hot-path encode. It did not solve the sender bottleneck: one noisy window was nvpn 136/322 Mbit/s vs Tailscale 310/399 Mbit/s, while raw LAN was still healthy at 622/516 Mbit/s.
- Disabling the ordered sender and using the simpler hash-by-send-target worker path was not better: about 186/336 Mbit/s vs same-window Tailscale 263/380 Mbit/s. Keep it only as `FIPS_MACOS_ORDERED_SENDER=0` for comparison.
- Batching ordered-sender completions by worker batch kept the better ordered behavior and reduced clean-run retransmits: about 215/307 Mbit/s vs same-window Tailscale 258/342 Mbit/s. Profiling still showed sender wakeups because the high worker count means many batches contain only one packet.
- Capping the default macOS encrypt pool to four workers was rejected: forward improved slightly to about 225 Mbit/s, but reverse collapsed to about 198 Mbit/s while Tailscale was about 283/356 Mbit/s in the same window.
- Making the macOS worker queue signal only when its worker is actually parked is a keeper. Same-window 20s TCP improved to about 252/384 Mbit/s vs Tailscale 329/390 Mbit/s.
- Skipping endpoint identity registration on already-established sessions removed the remaining per-packet identity-cache work from the steady sender path. Repeated MacBook-to-mini samples remained variable at about 232-255 Mbit/s, so this was correctness/CPU cleanup rather than a sender ceiling fix.
- Moving established endpoint FSP encryption into the existing FMP worker job kept wire format and nonce ordering but removed the inner ChaCha20-Poly1305 seal from the rx-loop task. Final default-stride 20s sample: nvpn about 232/426 Mbit/s vs same-window Tailscale about 323/369 Mbit/s. Final 90s nvpn stability runs stayed up: about 223 Mbit/s MacBook-to-mini and about 391 Mbit/s mini-to-MacBook; both meshes still showed the tested peer reachable afterward.
- `FIPS_MACOS_WORKER_STRIDE=4` as a compiled default was rejected twice. Before the FSP worker it slightly helped reverse but did not help the weak direction; after the FSP worker it collapsed reverse throughput to about 198 Mbit/s.

Decision:
- Keep runtime-only tracing, fixed ENOBUFS drop accounting, macOS custom worker queue, parked-worker-only queue signalling, ordered sender as the default, batched ordered completions, encoded-npub identity cache entries, established endpoint identity-registration skip, endpoint FSP worker preseal, and idle pruning for stale macOS send flows.
- Do not cap macOS encrypt workers by default; use `FIPS_ENCRYPT_WORKERS=N` only for experiments.
- Keep `FIPS_MACOS_WORKER_STRIDE` only as an experiment knob; the compiled default remains 1.
- The current MacBook-to-mini gap is still the macOS sender side, not LAN capacity, not receive-side npub encode, and no longer the inner FSP AEAD. Next likely wins are a lower-wakeup dispatch/completion design, less frequent diagnostics under data-plane load, or a Darwin utun/channel/offload backend comparable to what Tailscale appears to get from its interface.

## 2026-05-13 - Darwin sender mode and socket option retest

Setup:
- Same macOS Wi-Fi sender to Ethernet mini receiver path with the LAN MTU profile still enabled.
- Connected UDP is disabled by default on macOS. Earlier per-peer connected sockets improved the syscall shape on Linux, but on Darwin they caused direct FMP liveness/fallback trouble under load; `netstat` should show only the wildcard UDP socket on macOS unless an explicit env override is being tested.
- For launchd env-var A/Bs, `kickstart` alone was not enough: the loaded job kept the old plist environment. Reliable env tests require editing `EnvironmentVariables`, then `launchctl bootout` + `bootstrap` (see the sketch below).
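A sketch of the reload sequence that made those env A/Bs stick, assuming a system-domain LaunchDaemon; the label and plist path are placeholders, and how `EnvironmentVariables` is edited (plutil, an editor, or regenerating the plist) is left open:

```sh
# 1. Edit EnvironmentVariables in the daemon plist, e.g. add the
#    FIPS_MACOS_ORDERED_SENDER=0 entry for the A/B under test.
# 2. Fully unload and reload the job; kickstart alone keeps the old
#    plist environment.
sudo launchctl bootout system/<nvpn-daemon-label>
sudo launchctl bootstrap system /Library/LaunchDaemons/<nvpn-daemon-label>.plist
```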
Results:
- Clean two-sided default restart with connected UDP off and ordered sender on: nvpn about 103-109 Mbit/s MacBook-to-mini and about 317 Mbit/s mini-to-MacBook. Same-window Tailscale was about 251/355 Mbit/s.
- `FIPS_PERF=1` on the sender showed the active path was `udp_send_wildcard`, not connected UDP, with no `udp_send_backpressure` events. Representative intervals were `udp_send_wildcard` about 10k-12k/s, `udp_send` average roughly 32-40 us, FMP encrypt average roughly 10-12 us, and worker/endpoint queue waits in the hundreds of microseconds.
- Properly reloaded `FIPS_MACOS_ORDERED_SENDER=0` improved the weak direction to about 146.8 Mbit/s with fewer retransmits, while reverse stayed strong at about 349.5 Mbit/s. This makes the simpler hash-by-send-target worker path the better Darwin default for this workload.
- `FIPS_MACOS_NET_SERVICE_TYPE=vi` on top of the simpler sender regressed the weak direction to about 109.9 Mbit/s. Keep Darwin service-class sockopts opt-in only.
- `FIPS_MACOS_WORKER_BATCH=8` on both Macs was worse than the default worker drain batch: about 126.6 Mbit/s MacBook-to-mini and about 277.0 Mbit/s mini-to-MacBook, with more retransmits in the weak direction. Default batch 32 in the same build reached about 157-160 Mbit/s MacBook-to-mini in short samples.
- `FIPS_MACOS_WORKER_BATCH=64` was rejected without a throughput run because the Mac-to-mini link stayed relayed for more than two discovery/rekey intervals after the two-sided launchd restart. Default batch 32 recovered direct again.
- On this machine Tailscale's utun does show `TSO4`, `TSO6`, `CHANNEL_IO`, and checksum offload flags while the boringtun-created nvpn utun does not. Stock wireguard-go and boringtun do not enable those flags through the plain utun fd path, so Tailscale likely has a Darwin interface/backend advantage outside the UDP crypto protocol itself.

Decision:
- Default macOS FMP sending to the hash-by-send-target worker path; keep `FIPS_MACOS_ORDERED_SENDER=1` as an experiment for AEAD-bound cases.
- Keep connected UDP default-on for Linux and default-off for macOS.
- Keep `FIPS_MACOS_WORKER_BATCH` as an experiment knob with default 32; smaller batches hurt this Wi-Fi/Ethernet path and 64 exposed restart/churn fragility.
- Do not promote `SO_NET_SERVICE_TYPE` on Darwin without a fresh path-specific win; it can make Wi-Fi sender pacing worse.
- The remaining forward gap is still sender-side packet-rate efficiency on Darwin plus a likely utun backend/offload gap versus Tailscale. `kqueue` write-ready handling is not the first fix while ENOBUFS is absent. The realistic next steps are lower-handoff sending, safer per-path interface/socket specialization, a Darwin utun backend that can match Tailscale's channel/offload flags, or FIPS-level packet coalescing, which would be a wire-format change.

## 2026-05-13 - mini Docker nvpn versus boringtun

Setup:
- Host: `Siriuss-Mac-mini`.
- nostr-vpn at `cc8e603` on `codex/nvpn-perf-test`; FIPS sibling checkout at `e036c0e` on `master`.
- Docker PATH for non-interactive shell: `export PATH=/usr/local/bin:/Applications/Docker.app/Contents/Resources/bin:$HOME/.docker/cli-plugins:/Applications/Docker.app/Contents/Resources/cli-plugins:$PATH`.
- Commands: `DURATION=10 PROJECT_NAME=nvpn-perf-mini scripts/perf-docker.sh` and `DURATION=10 PROJECT_NAME=nvpn-boringtun-mini WG_THREADS_LIST='1 4' scripts/perf-docker-boringtun.sh`.
- The e2e Docker image patched embedded FIPS to the sibling BuildKit context.

Results:
- nvpn TCP: single stream 2822 Mbit/s receiver, 4 streams 2941 Mbit/s receiver, 8 streams 3000 Mbit/s receiver.
- nvpn UDP: 200 Mbit/s target delivered 200 Mbit/s with 0% loss; 1000 Mbit/s target delivered 1000 Mbit/s with 0.035% loss.
- nvpn ping: 300/300 packets, avg 0.857 ms.
- boringtun `WG_THREADS=1` TCP: single stream 3340 Mbit/s receiver, 4 streams 3336 Mbit/s receiver, 8 streams 3350 Mbit/s receiver.
- boringtun `WG_THREADS=1` UDP: 200 Mbit/s target delivered 200 Mbit/s with 0.24% loss; 1000 Mbit/s target delivered 988 Mbit/s with 1.2% loss.
- boringtun `WG_THREADS=1` ping: 300/300 packets, avg 0.489 ms.
- boringtun `WG_THREADS=4` regressed badly: TCP receiver throughput was 1483/1359/1229 Mbit/s for 1/4/8 streams. UDP was near target, with 0.11% loss at 200 Mbit/s and 0.3% loss at 1000 Mbit/s.

Profiling:
- `scripts/perf-docker-cpu.sh` with `DURATION=20 PROJECT_NAME=nvpn-cpu-mini` showed nvpn using about 193-198% CPU on node-a and about 267-273% CPU on node-b during a 2794 Mbit/s single-stream TCP run.
- `FIPS_PERF=1 NVPN_PIPELINE_TRACE=1` on a kept Docker mesh showed the sender TUN read/write stages were not the bottleneck. Representative sender intervals: TUN read about 310k/s at about 0.4 us avg, TUN-to-mesh queue wait about 56-60 us avg, endpoint command wait about 67-72 us avg, and FMP worker queue wait about 300-350 us avg. The hot path is the ordered single-destination FIPS send pipeline and the queueing behind it.

Rejected variants:
- FIPS Linux encrypt worker batch 32 -> 64 preserved wire format but collapsed TCP to about 1094 Mbit/s receiver with huge retransmits and made UDP lossy (about 4.9% loss at 200 Mbit/s and 49% loss at 1000 Mbit/s). Reverted.
- nvpn TUN-to-mesh send drain 64 -> 32 did not improve TCP materially (2831/2930/2998 Mbit/s receiver for 1/4/8 streams) and worsened UDP loss versus baseline (0.11% at 200 Mbit/s, 0.33% at 1000 Mbit/s). Reverted.

Decision:
- Keep the FIPS protocol wire format unchanged.
- Do not increase Linux FIPS worker batch size; larger bursts trade a small syscall win for severe TCP retransmits and UDP loss in Docker.
- The fair TCP gap versus boringtun `WG_THREADS=1` remains, but nvpn is already better on the UDP loss samples. Next Linux/FIPS work should reduce the single-destination ordered send queue cost without making bursts larger, likely by removing a handoff or making the ordered worker cheaper rather than by coalescing packets.

## 2026-05-13 - macOS connected UDP default and sender profiling

Setup:
- Same MacBook Wi-Fi sender to Ethernet mini receiver path, using the direct LAN endpoint between the Macs.
- FIPS commits tested and pushed to the mini checkout: `090e241` removed unnecessary Darwin wildcard reuse when connected UDP is disabled, `2d18ab7` enabled connected UDP by default on macOS, and `3d566c7` aligned the Darwin listener reuse default so connected siblings can actually bind beside the live listener.
- Both daemons were rebuilt from the local FIPS path, ad-hoc signed, installed as `~/.cargo/bin/nvpn`, and restarted through launchd. Final cleanup restored a clean launchd environment with only `OSLogRateLimit` plus launchd's own service variables.

Results:
- Fresh same-window TCP after connected UDP default-on: nvpn MacBook-to-mini about 256 Mbit/s, Tailscale MacBook-to-mini about 341 Mbit/s; nvpn mini-to-MacBook about 404 Mbit/s, Tailscale mini-to-MacBook about 379 Mbit/s. Reverse can now match or beat Tailscale; the remaining gap is the MacBook Wi-Fi sender direction.
- Later samples varied with Wi-Fi conditions. One noisy post-restart window was nvpn about 169-181 Mbit/s while Tailscale was about 272-328 Mbit/s; direct LAN in the same period still reached about 559 Mbit/s MacBook-to-mini and about 421 Mbit/s mini-to-MacBook.
- Final clean-env 10s sanity after removing all A/B launchd variables: nvpn about 174/351 Mbit/s and same-window Tailscale about 232/349 Mbit/s.
- MTU-safe UDP payloads (`iperf3 -u -l 1200`) showed the same directional cap: nvpn MacBook-to-mini about 223 Mbit/s with near-zero loss, while Tailscale could carry a 300 Mbit/s target with low loss. This points at packet-rate / sender path efficiency, not crypto correctness or MTU blackholing.
- Runtime tracing on the MacBook sender showed nostr-vpn's TUN-to-mesh handoff was small: TUN-to-mesh queue wait mostly single-digit microseconds and mesh send about 2 us per packet. FIPS steady-state intervals were dominated by the single peer worker doing FMP/FSP AEAD plus connected UDP send: roughly 19k-20k connected UDP sends/s under tracing, FMP encrypt about 15 us avg, UDP send about 30 us avg, and FMP worker queue wait usually sub-millisecond.

Rejected A/Bs:
- `FIPS_MACOS_ORDERED_SENDER=1` default stride collapsed to about 8 Mbit/s. `FIPS_MACOS_WORKER_STRIDE=16` recovered to about 212 Mbit/s but added hundreds of retransmits and still did not beat the direct worker path.
- `FIPS_MACOS_WORKER_BATCH=1` fell to about 161 Mbit/s; batch 64 fell to about 115 Mbit/s with retransmits. Keep the default batch 32.
- Darwin `SO_NET_SERVICE_TYPE` remains opt-in; earlier service-class tests regressed this Wi-Fi sender path.
- Backpressure drop tuning is inconclusive. Drop-after-16 improved one noisy sample to about 215 Mbit/s with no retransmits, and drop-after-1 once reached about 226 Mbit/s with normal retransmits, but a repeat drop-after-1 run fell to about 168 Mbit/s with heavy retransmits while same-window Tailscale reached about 328 Mbit/s. Do not change the compiled default without better ENOBUFS/drop instrumentation.

Decision:
- Keep connected UDP default-on for macOS now that listener/peer reuse flags are aligned; it is the only small socket change that repeatedly improved the weak direction.
- Keep the direct hash-by-send-target worker as the macOS default. The existing ordered sender is useful as a reference experiment but is too expensive in its current BTreeMap/condvar form.
- The remaining gap is still Darwin Wi-Fi sender packet-rate efficiency. The next high-leverage implementation work should be a cheaper WireGuard-like route/nonce -> parallel encrypt -> ordered send design, or packet coalescing if the protocol can grow an option bit. Avoid more MTU/service-class/batch guessing until the instrumentation can count ENOBUFS, dropped bulk packets, and per-stage queue waits in the same run.

## 2026-05-13: direct-link failure must route through FIPS neighbors

Observation:
- On the MacBook, `hashtree-node.nvpn` was reachable directly, but on the mini it stayed `pending (fips link pending)` after repeated NAT traversal timeouts. Mini still had six healthy FIPS links, so endpoint traffic should have routed through the mesh instead of failing.

Finding:
- FIPS core already supports multi-hop EndpointData once route coordinates are known. The app embedding was still using tree routing defaults, which can strand first-contact traffic when the destination's direct link is down and bloom/coordinate state is incomplete.

Change:
- `nostr-vpn` now sets the embedded FIPS endpoint to reply-learned routing. That lets first-contact EndpointData discovery flood through established tree neighbors and learn the reverse path from the verified response.
- Promoted `scripts/e2e-fips-routed-udp-docker.sh` into `scripts/release-gate.sh` so release verification catches the app-level Docker version of this failure: Alice and Bob's direct UDP path is blocked and packets must pass both directions through Charlie.

## 2026-05-14: macOS lag without link loss traced to daemon bookkeeping

Observation:
- MacBook-to-mini screen sharing briefly lagged again while both nvpn daemons stayed running and reported fresh direct UDP links. Short pings over nvpn had no loss but showed bursts up to roughly 100-400 ms, while the same LAN underlay stayed around 5-13 ms.

Finding:
- `sample(1)` on both daemons showed active time in the daemon async loop writing runtime JSON state with `sync_all`/`fcntl`/`rename` and running macOS route/snapshot probes through `netstat`/`ipconfig`. This points at event-loop stalls and queueing, not FIPS crypto, MTU, or a broken direct link.

Change:
- Runtime status/control files now remain atomic but skip per-update fsync; the durable user config writer still fsyncs separately.
- Periodic macOS underlay route repair and network snapshot probes now run on Tokio's blocking pool so shell command latency does not occupy a runtime worker that is also servicing tunnel work.

## 2026-05-14: macOS sender batching after bookkeeping fix

Observation:
- With the daemon-loop stalls removed, the MacBook Wi-Fi sender still lagged Tailscale and one instrumented run collapsed to about 37 Mbit/s with 1194 TCP retransmits. Tailscale in the same window was about 272/348 Mbit/s.

Finding:
- FIPS perf counters showed AEAD work was small. The sender instead showed multi-millisecond outbound worker queue waits and occasional hundred-ms UDP send outliers; the receiver path was clean. This points at bursty Darwin UDP sending and kernel/radio pacing, not crypto or MTU.
- MTU was not the reason: nvpn utun was 1290 and Tailscale utun was 1280.
- Connected UDP still matters: disabling it with the same batch setting dropped MacBook-to-mini to about 149 Mbit/s. Ordered-sender mode stayed around 218 Mbit/s with more retransmits, so it remains opt-in.

Result:
- Runtime small-batch tests with connected UDP enabled recovered the collapse mode. Final no-perf 20 second forward sweeps were close: batch 8 about 215 Mbit/s, batch 32 about 214 Mbit/s, and batch 2 about 210 Mbit/s. A prior batch 2 run reached about 205 Mbit/s with 1 retransmit and about 356 Mbit/s mini-to-MacBook.
- FIPS core now defaults macOS direct-worker batches to 8 instead of 32. This shortens Darwin send bursts without forcing one worker wake per datagram. Forward is still below the best same-window Tailscale result, but the severe collapse mode is reduced and reverse remains at Tailscale level.

## Bench commands

Use both directions and record both TCP and UDP:

```sh
iperf3 -c <server> -t 15 --json
iperf3 -c <server> -t 15 -R --json
iperf3 -u -b 400M -c <server> -t 15 --json
iperf3 -u -b 400M -c <server> -t 15 -R --json
```

Compare with Tailscale on the same machines and with boringtun/wireguard-go where available. Once short runs look good, repeat with 90 second TCP/UDP runs and churn tests: daemon restart, peer rejoin, and network roaming.
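A sketch of the longer follow-up runs, reusing the `<server>` placeholder above; the UDP targets mirror the 275/400 Mbit/s rates used in the entries above, and churn checks (daemon restart, peer rejoin) follow the `launchctl kickstart -k` smoke sketched in the 2026-05-12 packet-copies entry:

```sh
# 90 second stability runs, both directions.
iperf3 -c <server> -t 90 --json
iperf3 -c <server> -t 90 -R --json
iperf3 -u -b 275M -c <server> -t 90 --json
iperf3 -u -b 400M -c <server> -t 90 -R --json
```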
## 2026-05-14: screenshare reconnect during dev daemon rebuild

Observation:
- A macOS-to-macOS screenshare session briefly went into reconnect while the nvpn mesh was being rebuilt and redeployed.

Finding:
- Both macOS launchd services were running the daemon directly from `target/release/nvpn`. A release build rewrites that file. The daemon detected `service executable changed on disk` and exited so launchd restarted it, matching the brief reconnect window.
- Separately, one macOS host stayed at 6/8 peers until Linux VM daemons were updated. Those daemons had the same product version but were still built with older FIPS code. Updating to `fips-core`/`fips-endpoint` 0.3.4 keeps stale traversal failures from disturbing already-active peers.

Change:
- `nvpn service install` now stages the daemon to a stable service executable path before writing service metadata: `/Library/PrivilegedHelperTools/