--- title: "Compression Engines" version: 3.8.40 lastUpdated: 2026-06-28 --- # Compression Engines OmniRoute compression is built around engine contracts. A mode can run one engine directly (`caveman` or `rtk`) or a deterministic stacked pipeline that executes multiple engines in order. ## Modes | Mode | Engine path | Intended input | | ------------ | ---------------------------------- | -------------------------------------------- | | `off` | none | Exact prompt preservation | | `lite` | Caveman lite helpers | Low-risk always-on cleanup | | `standard` | Caveman | Natural-language prompt condensation | | `aggressive` | Caveman + history/tool summarizers | Long chat sessions | | `ultra` | Caveman + pruning helpers | Context-limit recovery | | `rtk` | RTK | Terminal, shell, build, test, and git output | | `stacked` | Pipeline, default `rtk -> caveman` | Mixed tool logs and prose, max savings | ## Engine Registry The registry lives in `open-sse/services/compression/engines/registry.ts`. Engines expose a shared contract: - `id`: stable engine id such as `caveman` or `rtk` - `apply(text, config)`: legacy execution path used by stacked pipelines - `compress(input, config)`: primary execution path returning text + stats - `getConfigSchema()`: returns the JSON-Schema-like shape of valid config - `validateConfig(config)`: returns `{ valid, errors[] }` Registration uses `registerCompressionEngine(engine)` (or `registerEngine` for advanced cases), which calls `assertValidEngine()` and `validateConfig(defaultConfig)` before accepting. Use `unregisterCompressionEngine(id)` to remove an engine at runtime. `strategySelector.ts` registers the built-in engines before compression runs. This lets preview, runtime compression, stacked mode, tests, and future engines use the same execution path. ### MCP description compression (related) A separate registry compresses MCP tool description metadata at registry-level — see `open-sse/mcp-server/descriptionCompressor.ts` and [MCP-SERVER.md](../frameworks/MCP-SERVER.md). It reuses Caveman rules but operates on tool metadata, not request payloads. ### Additional built-in engines Beyond Caveman, RTK, and LLMLingua-2, the registry ships several specialized lossless / structural engines (used by stacked pipelines, the playground, and tests): | Engine | Id | What it does | | ------------- | --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | CCR | `ccr` | Content-Compress-Retrieve (H4): replaces large contiguous text blocks with content-addressed references, so repeated/large blocks are sent once and referenced thereafter. | | headroom | `headroom` | SmartCrusher (H3 + N5): lossless tabular compaction of homogeneous JSON-array payloads into a columnar `[N rows]` form. | | ionizer | `ionizer` | Head/middle/tail row sampling for very large homogeneous blocks, storing the elided middle as a CCR content-addressed reference. | | session-dedup | `session-dedup` | Content-addressed cross-turn deduplication (TokenMizer-inspired): elides text already seen in earlier turns of the same session. | ## Caveman Caveman mode focuses on semantic condensation of normal prose: - preserves code blocks, URLs, JSON, paths, and structured data - removes filler, hedging, repeated context, and verbose connective phrasing - supports language-aware file rule packs in `open-sse/services/compression/rules/` - remains available through the legacy `standard`, `aggressive`, and `ultra` modes The dashboard surface is `Dashboard -> Context & Cache -> Caveman`. Caveman upstream reports `~75%` fewer output tokens, `65%` average output savings in benchmarks with a `22-87%` range, and a `~46%` input-compression tool. OmniRoute uses the Caveman input-side number when documenting stacked prompt/context savings; Caveman output mode remains a separate response-behavior feature. ## RTK RTK mode focuses on command and tool output: - detects output classes such as `git status`, `git branch`, `git diff`, Vitest/Jest/Pytest, Cargo/Go tests, TypeScript/Vite/Webpack builds, ESLint, npm audit/installs, Docker logs, shell `find`/`grep`, stack traces, and generic logs - applies 49 JSON filters from `open-sse/services/compression/engines/rtk/filters/` - supports the RTK-style declarative pipeline: ANSI stripping, replace, match-output short-circuit, strip/keep lines, per-line truncation, head/tail/max-line truncation, and on-empty fallback - supports trust-gated project filters in `.rtk/filters.json` and global filters in `DATA_DIR/rtk/filters.json` - strips ANSI sequences, progress noise, repeated lines, and unhelpful boilerplate - preserves actionable failures, warnings, summaries, changed files, and tail context - can optionally retain redacted raw output for recovery/debugging through authenticated management routes The dashboard surface is `Dashboard -> Context & Cache -> RTK`. Operational details for custom filters, trust, verify, and raw-output recovery live in [`RTK_COMPRESSION.md`](./RTK_COMPRESSION.md). RTK upstream reports `60-90%` savings for command-output compression. Its README example shows a 30-minute Claude Code session going from `~118,000` tokens to `~23,900`, or `79.7%` saved. ## LLMLingua-2 (Semantic Pruning) LLMLingua-2 mode performs **semantic token pruning** on prose using a small ONNX token classifier, complementing the rule-based Caveman and RTK engines: - compresses prose in non-system messages only; fenced code blocks and other preserved constructs are never altered - runs the `@atjsh/llmlingua-2` backend (ONNX via `@huggingface/transformers`) in a worker thread, so model inference never blocks the request event loop - is **stackable** (`stackPriority` 35): in a stacked pipeline it runs after the structural engines (CCR, session-dedup, headroom, Caveman) but before `ultra`, since semantic pruning is most effective on already-structurally-compressed text — e.g. `rtk -> caveman -> llmlingua` - **fail-opens on any error** (missing optional deps, worker spawn, model load, inference, or timeout) → the original text is returned unchanged, never an error Engine location: `open-sse/services/compression/engines/llmlingua/`. The dashboard surface is `Dashboard -> Context & Cache -> LLMLingua`. ### Models The default model is **TinyBERT** (`atjsh/llmlingua-2-js-tinybert-meetingbank`, ~57 MB, fast). A higher-accuracy **BERT-base** model (`Arcoldd/llmlingua4j-bert-base-onnx`, ~710 MB) is available via the engine config `model` field. `@huggingface/transformers` downloads the selected model lazily from the HuggingFace Hub into `${DATA_DIR}/models/llmlingua` on the first call (`modelStore.ts`); a `modelPath` config override points it at a local copy instead (offline / air-gapped installs). ### Optional dependencies & on-demand install The prunable LLMLingua runtime peer stack is **optional**. Three packages are declared as `optionalDependencies` in `package.json` and kept **external** by the production build (`scripts/build/prepublish.ts` does not bundle them): | Package | Version (pin) | Notes | | -------------------- | ------------- | ---------------------------------------------- | | `@atjsh/llmlingua-2` | `2.0.3` | Entry package; declares the others as peers | | `@tensorflow/tfjs` | `4.22.0` | Heaviest dep — dominates the ~800 MB footprint | | `js-tiktoken` | `^1.0.20` | Tokenizer | `@huggingface/transformers` is pinned at `3.5.2` as an **optional** dependency (shared with the local embeddings path and also traced into the standalone bundle). Keeping it optional prevents `onnxruntime-node` CUDA provider postinstall failures on CUDA 11 hosts from aborting the whole OmniRoute install; when the optional stack is absent, LLMLingua still fail-opens. Only the three packages above are prunable SLM peers. A standard `npm install` (dev) installs the optional stack automatically unless optional dependencies are omitted. **Why on-demand:** the npm-published package, the standalone bundle, and the Docker image ship **without** these deps to stay slim. When they are absent, the worker's dependency gate (a `@atjsh/llmlingua-2` resolve probe in `worker.ts`) fails and the engine **fail-opens silently** — selecting LLMLingua becomes a no-op (text returned unchanged, no error logged). To activate it in a pruned environment, install the optional stack: ```bash # pin to the versions declared in package.json optionalDependencies npm install @atjsh/llmlingua-2@2.0.3 @tensorflow/tfjs@4.22.0 js-tiktoken ``` Roughly **~800 MB** total: the TensorFlow.js + transformers runtimes dominate; the TinyBERT model adds ~57 MB downloaded at first use (not via npm). Per environment: - **Dev / `npm install`** — installed automatically unless you passed `--omit=optional` (or `--no-optional`). No action needed. - **Global npm (`npm i -g omniroute`) / standalone** — run the install command above inside the installed package directory, or reinstall without omitting optional deps. - **Docker** — add the install command in a derived image layer; the published image ships slim by design. - **VPS (PM2)** — install into the app's `node_modules`, then restart the process so the worker re-probes the gate. **Verify it is active:** with LLMLingua selected, real prose actually shrinks (the engine stops fail-opening), and the first request triggers the model download into `${DATA_DIR}/models/llmlingua`. The gate intentionally probes only `@atjsh/llmlingua-2` — the other peers are ESM-only and `require.resolve` throws on them even when present — so the worker still fail-opens if any peer is genuinely missing at `import()` time. ## Stacked Pipelines Stacked mode runs pipeline steps in order. The default is: ```txt rtk -> caveman ``` Use this for coding-agent sessions where a prompt combines command output with human or assistant prose. RTK reduces noisy tool logs first, then Caveman compresses remaining natural language. Pipeline steps are configured with `stackedPipeline` in compression settings or through compression combos. When both engines reduce the same eligible payload, savings compound: ```txt combined = 1 - (1 - RTK savings) * (1 - Caveman input savings) average = 1 - (1 - 0.80) * (1 - 0.46) = 89.2% range = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6% ``` ## MCP Accessibility Tree Filter The MCP accessibility-tree smart filter is a post-execution compression layer that runs on MCP **tool results**, not on prompts or context. It targets the verbose accessibility-tree and browser snapshot payloads returned by tools like Playwright, computer-use, and browser-automation MCP servers. ### What it does 1. **Noise stripping** — removes empty generic/text entries (`- generic:`, `- text: ""`) 2. **Sibling collapse** — when ≥ `collapseThreshold` (default 30) consecutive lines are structural repeats, collapses them into the first `collapseKeepHead` (default 10) lines + a count summary + the last `collapseKeepTail` (default 5) lines 3. **Ref preservation** — `[ref=eXX]` anchors required by Playwright/computer-use are never touched 4. **Hard truncation** — if the text after collapse still exceeds `maxTextChars` (default 50,000), truncates with a navigation hint so the agent can continue working ### Engine location ```txt open-sse/services/compression/engines/mcpAccessibility/ index.ts ← smartFilterText() entry point collapseRepeated.ts ← sibling-collapse algorithm constants.ts ← DEFAULT_MCP_ACCESSIBILITY_CONFIG ``` ### Configuration Controlled by `compression.mcpAccessibility` in global settings (migration 056). Default config: ```json { "enabled": true, "maxTextChars": 50000, "collapseThreshold": 30, "collapseKeepHead": 10, "collapseKeepTail": 5, "minLengthToProcess": 2000 } ``` The filter is only applied to tool-result payloads whose `type` is `"text"` and whose length exceeds `minLengthToProcess`. It does not affect prompt compression or request payloads. ### Expected savings 60–80% on browser snapshot tool results, depending on page complexity. The collapse algorithm is O(n) in line count and adds negligible latency. ### This filter vs the compression engines above | Aspect | Caveman / RTK / Stacked | MCP accessibility filter | | ----------- | ------------------------- | -------------------------------------- | | Target | Request prompts / context | MCP tool results | | Trigger | Compression mode setting | `compression.mcpAccessibility.enabled` | | Scope | All SSE messages | Tool results only | | Ref anchors | N/A | Preserved unconditionally | --- ## Compression Combos Compression combos are named compression profiles that can be assigned to routing combos: - `compression_combos`: stores mode, pipeline, RTK config, language config, and default marker - `compression_combo_assignments`: maps a compression combo to a routing combo - runtime integration resolves an assigned compression combo before generic combo overrides - analytics include `compression_combo_id` and `engine` Dashboard surface: `Dashboard -> Context & Cache -> Compression Combos`. ## API Surface | Route | Purpose | | -------------------------------------- | ---------------------------------------------------------------- | | `/api/settings/compression` | Global compression settings (includes `mcpAccessibility` config) | | `/api/compression/preview` | Preview any compression mode | | `/api/compression/language-packs` | List available Caveman language packs | | `/api/context/caveman/config` | Caveman settings alias | | `/api/context/rtk/config` | RTK defaults and settings | | `/api/context/rtk/filters` | RTK filter catalog | | `/api/context/rtk/test` | RTK preview/test endpoint | | `/api/context/rtk/raw-output/[id]` | Authenticated redacted raw-output recovery | | `/api/context/combos` | Compression combo CRUD | | `/api/context/combos/[id]/assignments` | Routing-combo assignment CRUD | | `/api/context/analytics` | Compression analytics alias | Management routes require management authentication or API-key policy checks. ## MCP Tools Compression exposes five MCP tools: | Tool | Scope | Purpose | | ----------------------------------- | ------------------- | -------------------------------- | | `omniroute_compression_status` | `read:compression` | Settings, analytics, cache stats | | `omniroute_compression_configure` | `write:compression` | Update global settings | | `omniroute_set_compression_engine` | `write:compression` | Set mode and optional pipeline | | `omniroute_list_compression_combos` | `read:compression` | List compression combos | | `omniroute_compression_combo_stats` | `read:compression` | Read combo/engine analytics | ## Known limitations - **LLMLingua-2 (SLM) requires co-located optional deps.** The worker only runs in a production build when `@atjsh/llmlingua-2` + peers are co-located into `dist/node_modules` (see `scripts/build/colocateOptionals.mjs`, #4286). Without them the engine fail-opens (returns the original text). Worker resolution no longer depends on `import.meta.url` (it dies in the standalone bundle) — it anchors on the runtime cwd / `argv[1]`. - **Caveman language packs `de` / `fr` / `ja` are partial.** They ship `context` + `filler` + `structural` rules but no `dedup` / `ultra` packs, so `ultra` intensity is no stronger than `full` for those languages (they use only their own rules — there is no silent fall-back to the English `dedup`/`ultra` rules, which would mangle foreign text). `en` / `es` / `id` / `pt-BR` are complete. Contributions of `dedup.json` + `ultra.json` for the partial packs are welcome. - **Stacked telemetry only lists engines that compressed.** A stacked-pipeline step whose engine ran but produced 0 % savings returns `stats:null` and so does not appear in `engineBreakdown` — indistinguishable from a step that was skipped. Distinguishing "ran, 0 %" from "skipped" would require a breakdown-model change and is deferred. ## Validation The focused gates for this area are: ```bash node --import tsx/esm --test tests/unit/compression/rtk-*.test.ts tests/unit/compression/pipeline-integration.test.ts tests/unit/compression/context-compression-api.test.ts node --import tsx/esm --test tests/unit/compression/*.test.ts tests/golden-set/*.test.ts tests/integration/compression-pipeline.test.ts tests/unit/api/compression/compression-api.test.ts node --import tsx/esm --test tests/unit/compression/mcpAccessibility*.test.ts npm run typecheck:core ```