# Problems 0CompactMem solves > Concrete pain points users hit with LLM agents, what we changed, and the > measured impact. Mainly useful for prospective contributors and people > evaluating whether 0CompactMem covers their case. Each section names a specific problem and the iteration that resolved it. You can `git log --grep="iter N"` for the actual change. --- ## No cross-session memory — starting from zero every time Every new conversation loses every previous decision, pitfall, and constraint. Significant warm-up time is wasted re-building context. **Solution.** Knowledge extracted at session end (decisions, reasoning chains, design constraints, quantitative evidence) → stored in `store.db` → retrieved and injected at the next session start. ``` Recall@3: +147% MRR: +320% A/B quality: +68% Session recall: 94.2% ``` --- ## High retrieval latency — noticeable lag on every prompt Subprocess-based retrieval was P50 ≈ 54 ms per keystroke. **Solution.** Persistent `retriever_daemon.py` over a Unix socket, plus a three-level cache (FTS5 result cache + two-level TLB). ``` P50: 54 ms → 0.1 ms (540×) ``` --- ## Context window fills up; forced compaction loses reasoning chains **Solution.** Layered compression — zram-style output-compression hints (`output_compressor.py`) + Context Pressure Governor with four watermark levels + swap eviction of low-frequency chunks. --- ## Session interruption loses "what I was doing" **Solution.** CRIU-style session checkpoint — unfinished intent extracted at `Stop`, persisted to `session_intents`, auto-injected at the next `SessionStart` (24h TTL). --- ## Architectural constraints scattered across history, easily violated **Solution.** Auto-detect constraint patterns (22 patterns) → store as `design_constraint` chunks with `importance = 0.95`, `oom_adj = -800` (never evicted) → auto-injected on every `UserPromptSubmit`. ``` 21 active constraints, top constraint retrieved ×2,043 times ``` --- ## Multiple agents overwriting each other's memory (iter 259) Concurrent sessions caused last-writer-wins races on shared files. **Solution.** Per-session `shadow_traces` and `session_intents` tables (`PRIMARY KEY = session_id`), per-session named files (`.shadow_trace.{sid[:16]}.json`). Verified by a 20-test isolation suite. --- ## Stop hook blocks on I/O-heavy transcript parsing (iter 260) `extractor.py` spent 50–150 ms on file I/O inside the synchronous `Stop` hook. **Solution.** `submit_extract_task()` enqueues to `ipc_msgq` (< 5 ms) → `extractor_pool.py` daemon processes via `ThreadPoolExecutor(3)`. Falls back gracefully if the pool isn't running. ``` Stop hook: 50–150 ms (sync) → < 5 ms (async queue) ``` --- ## Repeated injection wastes tokens (iter 359, 361) Without dedup, the same chunk is injected with full `raw_snippet` on every prompt in a long session — re-shipping content that's already in the model's working memory. **Solution: three-layer token-budget enforcement.** - **FULL → LITE demotion (iter 361)** — once a chunk has been injected with full format (summary + raw_snippet) in this session, subsequent injections are demoted to LITE (summary only). Once the LLM has seen the raw text, the marginal value of re-shipping it is zero. - **Session dedup (iter 359)** — chunks injected ≥ `session_dedup_threshold` (default 2) times are excluded from context entirely. - **Same-hash TLB bypass** — identical prompt hashes return the cached result immediately. Zero DB queries, zero new tokens. Measured (`tests/test_token_budget.py`): ``` Injection cost: ~44 tokens / call (avg 178 chars) FULL → LITE saving: ~62 tokens / repeat (-69.6% per re-injection) User re-explanation saved: ~300 tokens / call Net token ROI: +256 tokens / call Context cap enforced: ≤ 800 chars (max_context_chars sysctl) ```