# Real-world cache hit — single user, single day A real Reasonix user shared their DeepSeek dashboard for **2026-05-01**. Used with permission, anonymized. ![DeepSeek usage dashboard, 2026-05-01](2026-05-01-deepseek-dashboard.png) ## The numbers | | Tokens | |---|---:| | Input — cache hit | 435,033,856 | | Input — cache miss | 767,616 | | Output | 179,763 | | **Day total** | **435,981,235** | **Cache hit ratio (input):** `435,033,856 / (435,033,856 + 767,616)` = **99.82%** ## Cost — using the prices Reasonix bills against (`src/telemetry/stats.ts`) USD per 1M tokens — `inputCacheHit / inputCacheMiss / output`: - `deepseek-v4-flash` — `0.028 / 0.139 / 0.278` - `deepseek-v4-pro` — `0.139 / 1.667 / 3.333` Assuming **v4-flash** (the project default): | | This user (99.82% hit) | Same workload, **0% cache** | |---|---:|---:| | Cache-hit input | $12.18 | — | | Cache-miss input | $0.11 | $60.58 | | Output | $0.05 | $0.05 | | **Total / day** | **$12.34** | **$60.63** | → Cache saved this user **$48.29**, or **~80%** off the un-cached baseline, on a single day. On **v4-pro** (5× the prefix-cache discount) the same workload would cost **~$62.35** vs **~$727.08** without cache — a **~91% saving**. ## "Isn't that just DeepSeek's prefix cache?" DeepSeek's API ships prefix caching enabled by default; the *cache* is theirs, the *hit rate* is the client's. Same API, different clients, very different hit rates: - DeepSeek's own web chat: 60–80% within a single conversation, drops to 0% on a new session (system prompt may differ). - Cherry Studio / Open WebUI / generic OpenAI-shape SDKs: typically 30–60% on long sessions — history gets reordered, tool specs get re-serialized, every drift breaks the prefix. - Cline / Continue and other XML-tool-call clients: lower still — every tool result inlines into the conversation, shifting bytes the cache keys on. 99.82% is what falls out of these four design choices in Reasonix: 1. **`ImmutablePrefix`** (`src/memory.ts`) — system prompt + tool specs are frozen at session start. Same byte sequence every turn. 2. **`AppendOnlyLog`** — turns only append. No reorder, no edit-in-place. 3. **`VolatileScratch`** — chain-of-thought / per-turn scratch lives outside the cached prefix so it never poisons the next hit. 4. **Auto-compact** — when context approaches the cap, older turns fold into a summary message *appended* to the prefix; the prefix itself isn't rewritten, so the cache survives the fold. DeepSeek gave us cacheable bytes. The four mechanisms above are how we keep the bytes cacheable. ## Reproduce The synthetic side of this lives in `benchmarks/tau-bench/` — same task set run through `CacheFirstLoop` vs a deliberately cache-hostile baseline. The real-world data above is what the synthetic numbers look like once a user runs the harness in anger. Submit your own dashboard screenshot if you want it anonymized and added here — open an issue.