# Prism Coder
**Give your AI agent memory that lasts.** Persistent sessions, knowledge graphs, and offline tool-routing β fully local and free.
[](https://www.npmjs.com/package/prism-mcp-server)
[](https://github.com/modelcontextprotocol/servers)
[](../../LICENSE)
[](https://huggingface.co/dcostenco)
Prism Coder is an [MCP server](https://modelcontextprotocol.io) that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight `prism-coder` model fleet (2Bβ27B) for fast, offline tool-routing β no cloud required.
**No account needed. No API keys. Runs on your machine.**
A paid subscription adds cloud sync, higher model tiers, and team features through the [Synalux portal](https://synalux.ai).
---
## Quickstart
The free tier needs no account, no API key, and no cloud. Add the server to your MCP client:
```json
{
"mcpServers": {
"prism": {
"command": "npx",
"args": ["-y", "prism-mcp-server"]
}
}
}
```
Open Claude Desktop or Cursor and your agent now has memory backed by a local SQLite database (`~/.prism-mcp/data.db`).
**Optional β local model fleet** for offline tool-routing. Pull whichever fits your hardware:
```bash
ollama pull dcostenco/prism-coder:2b # 2.3 GB Β· mobile / lightweight (99.1% routing accuracy)
ollama pull dcostenco/prism-coder:4b # 3.4 GB Β· verifier (100% accuracy)
ollama pull dcostenco/prism-coder:9b # 5.8 GB Β· default router (100% accuracy, Qwen3.5)
ollama pull dcostenco/prism-coder:27b # 16 GB Β· complex tasks (100% accuracy)
```
Prism detects both the namespaced (`dcostenco/prism-coder:9b`) and bare (`prism-coder:9b`) Ollama tags automatically.
---
## What it does
Your AI agent forgets everything between sessions. Prism fixes that β and adds verification, drift detection, and multi-agent coordination on top.
### Mind Palace β persistent memory that survives across sessions
Every conversation feeds a persistent store. The next session loads the right context automatically β no re-explaining.
The dashboard shows your current project state, pending TODOs, intent health, and a neural knowledge graph β all built automatically from your agent sessions.
### Knowledge Graph β semantic + keyword + graph search
Ask "what did I decide about the auth flow last month?" and get an answer with citations, combining vector similarity, full-text search, and graph traversal.
### Session History β immutable audit trail
Every session is logged with files changed, decisions made, and TODOs. Search, filter, and replay any past session.
### Inference Metrics β see where your tokens go
Every `prism_infer` call tracks which model handled it (local Ollama vs cloud) and how many tokens were consumed. When you save a session, Prism shows a summary:
```
π Inference Metrics (this session):
Total calls: 12 β Local: 10 (83%) | Cloud: 2 (17%)
Tokens: 8,420 in + 3,150 out = 11,570 total
Avg latency: 1,240ms
By model:
prism-coder:27b: 6 calls, 7,200 tokens, avg 1,800ms
prism-coder:9b: 4 calls, 2,870 tokens, avg 620ms
synalux-27b: 2 calls, 1,500 tokens, avg 1,100ms
```
Local calls use actual Ollama token counts (`prompt_eval_count` / `eval_count` from Ollama); cloud calls use char/4 estimates. Metrics are tracked locally β no portal dependency, no env vars, works offline. Per-call data is also forwarded to the Synalux portal as best-effort analytics (independent of the display).
### Session Drift Detection
Long agent sessions can wander from their original goal. `session_detect_drift` compares current work against the stated goal and returns `on_track / minor_drift / major_drift` so the agent can self-correct.
### Behavioral Verification β catch bad edits before they happen
AI agents apply patterns from checklists without understanding the real-world impact. The `verify_behavior` tool challenges the agent with a scenario it must answer **before** editing β forcing it to think through what the end user will experience.
```
Agent: "I'll revert this kitchen display change"
Prism: "β οΈ Scenario: A cook sees a 3-item ticket. One item is voided.
What should the cook see after the void?"
Agent: "The ticket stays visible with the remaining 2 items."
Prism: "Correct β your revert would hide the ticket entirely."
```
17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed β works in any MCP client.
### Time Travel
Roll back to any previous session state. Compare diffs between versions. Restore a known-good state with one click.
### Cognitive Routing
Three memory types, automatically sorted: **episodic** (what happened β session logs, decisions), **semantic** (what's true β facts, architecture), and **procedural** (how to do X β workflows, patterns). When you search, the router picks the right store instead of dumping everything.
### Multi-Agent Hivemind
Coordinate multiple AI agents working on the same project. Each agent has its own session, but they share memory through the knowledge graph. The Hivemind Radar shows real-time agent status, tasks, and activity.
### Neural Search
Search across all memories with highlighted results, knowledge graph editing, and memory density metrics.
---
## Local-first and privacy
The free tier runs entirely on your machine. Paid tiers add cloud sync through the Synalux portal, which is what enables cross-device memory and team sharing.
| | Local tier (free) | Cloud tier (paid) |
|---|---|---|
| Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
| Inference | Local Ollama models | Local models + cloud fallback |
| API keys required | None | Synalux subscription key |
| Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
| What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |
| Works offline | β
| Local features yes; sync/cloud no |
**Handling sensitive data.** All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the **local tier** for full air-gap, or use **Enterprise** which includes a HIPAA Business Associate Agreement.
---
## Models
The `prism-coder` fleet uses Qwen3.5 for MCP tool-routing AND general inference. The 9B and 27B are fine-tuned with LoRA (r=128, all 64 layers including DeltaNet); the 2B and 4B use stock Qwen3.5-4B at different quantization levels. The 27B scored 100% on BFCL function-calling and 100% on an internal 15-problem coding eval at $0 inference cost.
`prism_infer` supports three modes: `route` (tool routing, fast, nothink), `chat` (conversation with thinking), and `code` (code generation with thinking). In chat/code modes, the model uses `` blocks for chain-of-thought reasoning, which are stripped before the response is served. If the local model fails a quality gate (empty, think-only, or truncated), paid tiers automatically escalate to Claude via the Synalux portal.
| Model | Ollama tag | Size | [BFCL](https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v3_multi_turn.html) Accuracy | Role | Tier |
|---|---|---|---|---|---|
| Qwen3.5-4B Q3_K_M | `prism-coder:2b` | 2.3 GB | 99.1% Γ 3 seeds | iPhone / mobile first gate | Free |
| Qwen3.5-4B Q4_K_M | `prism-coder:4b` | 3.4 GB | 100% Γ 3 seeds | Verifier | Free |
| Qwen3.5-9B (LoRA) | `prism-coder:9b` | 5.8 GB | 100% Γ 3 seeds | Default router | Standard+ |
| Qwen3.5-27B (LoRA) | `prism-coder:27b` | 16 GB | 100% Γ 3 seeds | Quality tier (DeltaNet, 28.5 tok/s) | Advanced+ |
Weights: [huggingface.co/dcostenco](https://huggingface.co/dcostenco) (public GGUF). Latency depends on model size and hardware β see [Benchmarks](#benchmarks) to measure it on your own machine rather than trusting a printed number.
### Cascade
```
query β prism-coder:9b (local router, default)
β prism-coder:4b (grounding verifier)
β prism-coder:2b (iPhone / mobile, auto-selected by RAM)
β prism-coder:27b (complex tasks, on demand)
β cloud fallback (paid tiers, for max quality)
```
### Multi-Layer Verification
Every tool-grounded answer on paid tiers passes through deterministic L3 routing rules and an NLI grounding verifier before reaching the user. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) without the model-based NLI check.
| Layer | What | Model | Cost |
|---|---|---|---|
| **L1** | Crisis/medical safety gate | None (regex) | 0 ms |
| **L3-Tool** | Tool name remap + false-positive rejection | None (deterministic) | 0 ms |
| **L3-Tier0** | Integer grounding (set membership) | None (deterministic) | 0 ms |
| **L3-Tier2** | NLI verifier (claim β ENTAILED/NEUTRAL/CONTRADICTED) | prism-coder:2b | ~200 ms |
| **L4** | Hallucination judge (opt-out for clinical) | prism-coder:4b | ~500 ms |
Fail-closed on the verified path: when the grounding verifier runs (Standard tier and up), timeout, ambiguity, or missing evidence yields a refusal, not pass-through. Free-tier users get the deterministic L1/L3-Tool gates but not the NLI verifier.
---
## Benchmarks
**Reproduce every number yourself.** All evals are open-source and self-contained:
```bash
git clone https://github.com/dcostenco/prism-coder && cd prism-coder
pip install anthropic requests
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 9b 27b
```
**Routing eval (115 cases, 12 categories, 3-seed mean).** Routing accuracy includes the deterministic L3 correction layer β the same rules that run in production. On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Be honest with yourself about what that means: the eval is **near-saturated** for this taxonomy β it measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is **offline routing reliability at zero cost**, not that a 2.3 GB model rivals a frontier model in general.
| Model | Routing accuracy | Notes |
|---|---|---|
| prism-coder:2b (Q3_K_M) | 99.1% Γ 3 seeds | 1 failure: regexβknowledge_search |
| prism-coder:4b / 9b / 27b | 100% Γ 3 seeds | Perfect on all 115 cases |
| Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
**Memory uplift (LoCoMo-Plus, self-published).** A separate long-context dialogue benchmark ([dcostenco/Locomo-Plus](https://github.com/dcostenco/Locomo-Plus)) measures how much structured memory helps a base model retain multi-day context. Results show large gains when a model is paired with Prism memory versus running raw. Note this benchmark is authored, run, and LLM-judged by this project β treat it as a reproducible demonstration, not an independent third-party result, and run it yourself with the commands in that repo.
### Code Generation Quality (27B vs Claude Opus)
Three progressively harder Python tasks run through `prism_infer(mode:"code", think:true)` on the local 27B and compared with Claude Opus. Both produce correct, production-quality code. The 27B is slightly more verbose (docstrings, examples); Opus is slightly tighter (`__slots__`, early-exit DFS). On routine coding the 27B at $0 replaces cloud calls entirely.
| Task | Local 27B | Claude Opus | Verdict |
|------|-----------|-------------|---------|
| Fibonacci with memoization | `@lru_cache`, ValueError on negative, docstring | Nested `_fib` to keep cache private | Both correct, equivalent |
| LRU Cache (OrderedDict, O(1)) | `Any` keys, isinstance capacity check, `__repr__` | `Hashable` key type (more precise), same ops | Both correct, Opus marginally tighter |
| Trie with autocomplete | `.lower()` normalization, collect+sort+slice | `__slots__` on TrieNode, early-exit DFS at limit | Both correct, Opus slightly more optimized |
Local 27B output β Trie with autocomplete (hardest task)
```python
class TrieNode:
def __init__(self):
self.children: dict[str, 'TrieNode'] = {}
self.is_end_of_word: bool = False
class Trie:
def __init__(self):
self.root: TrieNode = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for char in word.lower():
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_of_word = True
def search(self, word: str) -> bool:
node = self._get_node(word.lower())
return node is not None and node.is_end_of_word
def starts_with(self, prefix: str) -> bool:
return self._get_node(prefix.lower()) is not None
def autocomplete(self, prefix: str, limit: int = 5) -> list[str]:
node = self._get_node(prefix.lower())
if node is None:
return []
results: list[str] = []
self._collect_words(node, prefix.lower(), results)
results.sort()
return results[:limit]
def _get_node(self, key: str) -> 'TrieNode | None':
node = self.root
for char in key:
if char not in node.children:
return None
node = node.children[char]
return node
def _collect_words(self, node: TrieNode, prefix: str, results: list[str]) -> None:
if node.is_end_of_word:
results.append(prefix)
for char, child in sorted(node.children.items()):
self._collect_words(child, prefix + char, results)
```
| Metric | Local 27B | Cloud (Opus) |
|--------|-----------|-------------|
| Latency (Trie task) | ~30s | ~8s |
| Cost | $0 | ~$0.05 |
| Think mode | Enabled (stripped before serving) | N/A |
| Quality gate | Passed (no escalation needed) | N/A |
### Cloud Escalation in Practice (`cloud_fallback: true`)
The same three tasks with `cloud_fallback: true` β the quality gate decides whether local output is good enough or needs cloud escalation.
| Task | used_cloud | Quality Gate | Latency | What happened |
|------|:----------:|-------------|---------|---------------|
| Fibonacci (simple) | **no** | Passed | 11s | 27B served directly, $0 |
| LRU Cache (medium) | **no** | Passed | 21s | 27B served directly, $0 |
| Trie (hard) | **yes** | `loop_detected` | 55s | 27B looped β gate caught it β escalated to cloud 27B |
The quality gate detected repeated sentences (β₯3 of the same sentence in β₯6 total) in the 27B's Trie output and escalated automatically. The cloud fallback returned clean code. On a second run of the same prompt, the 27B produced clean output without escalation β the loop is stochastic, not systematic.
**Takeaway:** for ~80β90% of coding tasks, the 27B handles everything locally at $0. The quality gate + cloud escalation exists as a safety net for the remaining cases where the local model loops, truncates, or produces empty output. Paid tiers get automatic escalation; free tier gets the local result with a warning.
---
## Why Prism Coder
### vs AI coding assistants
These tables are the maintainer's assessment as of June 2026. Verify claims that matter to you β products change fast.
| Feature | Prism Coder | GitHub Copilot | Cursor | Windsurf | Amazon Q | Devin |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| Local inference (open-weight) | β
| β | β | β | β | β |
| Works fully offline | β
(free tier) | β | β | β | β | β |
| Persistent cross-session memory | β
| β
| β | β | β | β |
| Session drift detection | β
| β | β | β | β | β |
| L3 grounding verifier | β
| β | β | β | β | β |
| Behavioral verification (pre-edit) | β
| β | β | β | β | β |
| MCP server (tools + memory) | β
| β | β | β | β | β |
| Web IDE | β
| β
| β | β | β
| β
|
| VS Code extension | β
| β
| β | β | β
| β |
| Flat-rate team pricing | β
| β (per-seat) | β (per-seat) | β | β | β |
| HIPAA BAA available | β
(Enterprise) | β | β | β | β | β |
### vs local AI / memory tools
| Feature | Prism Coder | Ollama | LM Studio | Mem0 | Zep |
|---|:---:|:---:|:---:|:---:|:---:|
| Local inference cascade | β
| β
| β
| β | β |
| Cloud fallback | β
| β | β | β | β |
| Persistent cross-session memory | β
| β | β | β
| β
|
| Knowledge ingestion (MCP + webhook) | β
| β | β | β | β |
| Cognitive routing (3-store) | β
| β | β | β | β |
| Session drift detection | β
| β | β | β | β |
| Native MCP server | β
| β | β | β | β |
| Web IDE + VS Code extension | β
| β | β | β | β |
### Pricing β flat-rate, not per-seat
| | **Prism Coder** | GitHub Copilot | Cursor | Amazon Q |
|---|:---:|:---:|:---:|:---:|
| **Individual** | **$19/mo** | $10/mo | $20/mo | $19/mo |
| **Team (5 devs)** | **$49/mo flat** | $95/mo | $200/mo | $95/mo |
| **Enterprise (25 devs)** | **$99/mo flat** | $195/mo | $1,000/mo | Custom |
---
## Plans
All on-device models are free to run locally via Ollama on every tier. A subscription gates **cloud** features, higher model ceilings, and increased limits. Local model ceilings are advisory β on-device models run on your Ollama regardless of plan; the ceiling gates cloud inference and `prism_infer` routing.
| | **Free** | **Standard** $19/mo | **Advanced** $49/mo | **Enterprise** $99/mo |
|---|---|---|---|---|
| Seats | 1 | 1 | up to 5 | up to 25 |
| Local model ceiling | up to 4b | up to 9b | up to 27b | up to 27b |
| Daily cloud inference | -- | 200 | 2,000 | 100,000 |
| Cloud Coder (Web IDE) | -- | 100/day | 1,000/day | 100,000/day |
| Cloud search | -- | 50/day | 500/day | 100,000/day |
| Max output tokens | 512 | 1,024 | 2,048 | 4,096 |
| Cloud fallback | -- | Claude Opus 4.7 | Claude Opus 4.7 | Priority + Opus 4.7 |
| Grounding verifier (fact-check AI output) | -- | β
| β
| β
|
| Memory sync (cloud) | -- | β
| β
| β
|
| Knowledge / session memory | limited | unlimited | unlimited | unlimited |
| Analytics dashboard | -- | β
| β
| β
|
| HIPAA BAA | -- | -- | -- | β
|
14-day free trial on paid plans. 25+ seats: [contact sales](https://synalux.ai/support)
---
## How agents use it
Prism exposes 40+ MCP tools. The core memory loop:
| Tool | What it does |
|---|---|
| `session_load_context` | Recover the prior session's state on boot |
| `session_save_ledger` | Append an immutable session log entry |
| `session_save_handoff` | Save live state for the next session |
| `knowledge_search` | Semantic + keyword search over all memories |
| `query_memory_natural` | Natural-language Q&A over the memory store |
| `session_detect_drift` | Detect when a session has drifted from its goal |
| `verify_behavior` | Pre-edit scenario challenge β catch bad changes before they happen |
| `knowledge_ingest` | Teach Prism a codebase or document |
| `prism_infer` | Local-first inference (route/chat/code modes, thinking, cloud escalation) |
| `inference_metrics` | Session delegation stats on demand (call count, tokens, local/cloud split) |
### `prism_infer` β local-first inference with cloud escalation
```typescript
prism_infer({
prompt: "Write a binary search in Python",
mode: "code", // "route" | "chat" | "code"
think: true, // enable reasoning (default: true for chat/code)
model_ceiling: "27b", // use the quality tier
})
// β 27B generates code locally ($0), with thinking for quality
// β If quality gate fails + paid tier β auto-escalate to Claude
```
| Mode | Think | Model | Use case |
|------|-------|-------|----------|
| `route` | Off (fast) | 9B default | MCP tool routing |
| `chat` | On | 27B preferred | Conversation, reasoning |
| `code` | On | 27B preferred | Code generation, debugging |
Full TypeScript signatures live in [`src/tools/`](../../src/tools/); architecture in [`docs/ARCHITECTURE.md`](../ARCHITECTURE.md).
### `inference_metrics` β see your local-model usage on demand
Call `inference_metrics` anytime mid-session to see how many `prism_infer` calls ran locally vs cloud, with actual token counts:
```
π Inference Metrics β local-model delegation (this session):
Total calls: 5 β Local: 5 (100%) | Cloud: 0 (0%)
Tokens: 1,240 in + 380 out = 1,620 total
Avg latency: 420ms
By model:
prism-coder:27b: 3 calls, 1,100 tokens, avg 520ms
prism-coder:9b: 2 calls, 520 tokens, avg 270ms
```
The same block also appears automatically in `session_save_ledger` and `session_save_handoff` responses at session end.
**Note:** This tracks `prism_infer` delegation only β not your host model's (Claude's) own token spend. For that, use Claude Code's `/cost` command.
### Local-model delegation (opt-in)
By default, your AI agent (Claude, Cursor, etc.) handles everything itself. You can optionally enable delegation so the agent offloads cheap, verifiable sub-tasks to local Ollama models at $0:
```bash
# Enable via Prism config
prism config set delegation_enabled true
```
When enabled, the agent's task router may delegate qualifying work β bulk classification, field extraction, mechanical formatting β to `prism_infer` instead of using cloud tokens. The agent always verifies the result and redoes it itself if quality is degraded.
**Guardrails:**
- **Off by default** β enforced in code, not just convention
- **Never delegates:** code/text that ships to the user, security/safety logic, planning/reasoning, anything where a silent quality drop isn't obvious
- **Always verifies:** checks `quality_gate_failed` and `used_cloud` before trusting local output
How Prism survives context compaction
The LLM context window is treated as ephemeral scratch space; durable state lives in the persistent store (SQLite locally, the portal in the cloud). Every session begins with a mandatory `session_load_context` call, so the agent is oriented before it writes a response. When a project exceeds a threshold (default 50 entries), `session_compact_ledger` summarizes old entries into a rollup, soft-archives the originals, and links them in the graph. See [`docs/COMPACTION.md`](../COMPACTION.md)
---
## CLI
```bash
prism load # load session context
prism save # save ledger + handoff
prism search # search code across repos (exact / regex / symbol / semantic)
prism review # AI code review β security, performance, style
prism scan # security scan β secrets, licenses, Dockerfile
prism push # push local SQLite to the cloud backend
prism register-models # alias dcostenco/prism-coder:* -> prism-coder:*
```
### `prism search` β semantic code search
### `prism review` β AI code review with HIPAA checks
### `prism scan` β security scanner for secrets, Dockerfiles, licenses
---
## Companions
Prism works alongside these tools β use whichever fits your workflow.
### Web IDE β Prism Coder
A browser-based IDE at [synalux.ai/coder](https://synalux.ai/coder). Import any GitHub repo and get:
- **Monaco editor** with multi-tab, split view, syntax highlighting, and VS Code keybindings
- **In-browser Node.js** via WebContainer (your code runs in the browser sandbox, not on a server)
- **Integrated terminal** β WebContainer shell in-browser; optional server PTY via WebSocket when connected to a dev server
- **AI Agent Mode** β describe a task and the agent creates files, runs type-checks, and verifies
- **Source control** β commit, branch, push/pull, stash, blame, tag management
- **Live Share** β real-time collaborative editing with session links
- **Node.js debugger** via Chrome DevTools Protocol
- **Tasks runner** (VS Code `tasks.json` compatible), **Problems panel** (Monaco diagnostics)
- **12-language i18n** β full UI localization
Standard+ plans get cloud AI and higher rate limits. Free tier works with local Ollama. Code execution uses the in-browser WebContainer by default; Live Share and the optional PTY terminal connect to external servers when explicitly enabled.
### VS Code Extension β Synalux
Memory-augmented AI inside VS Code with clinical practice management features. Install from the marketplace:
```bash
code --install-extension synalux-ai.synalux
```
[](https://marketplace.visualstudio.com/items?itemName=synalux-ai.synalux)
AI chat, voice input, SOAP note generator, team collaboration, and video calls β all inside VS Code. Routes through local Ollama by default; cloud on paid tiers.
Feature details
- **AI**: Chat participant (`@synalux`), multi-agent pipeline, voice input, model switching, 10 tones
- **Clinical**: SOAP note generator, role-based access, document signing, patient board
- **Collaboration**: Team chat, DMs, video calls, customer board, visual builder, DevContainers
- **Privacy**: Local Ollama by default. `preferLocal=true` tries local first. Enterprise BAA available.
### Prism AAC
Communication app for non-speaking users, powered by the on-device prism-coder fleet for phrase prediction. macOS / iOS / web.
See [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)
---
## Git Hooks (Portable)
Pre-commit and pre-push security hooks that work with any editor, any AI tool, and direct CLI. No Claude Code dependency.
```bash
# Install in all repos (one-time)
bash synalux-private/scripts/install-git-hooks.sh
# Or install manually in a single repo
cp hooks/pre-commit .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit
cp hooks/pre-push .git/hooks/pre-push && chmod +x .git/hooks/pre-push
```
| Hook | What it checks | Mode |
|------|----------------|------|
| `pre-commit` | Dead code, orphan services, scaffold code, missing auth | `PRECOMMIT_MODE=advisory\|block\|off` |
| `pre-push` | 19-rule security audit (SSRF, SQL injection, secrets, IDOR, etc.) | `PREPUSH_MODE=advisory\|block\|off` |
Default mode is `advisory` (warn but allow). Set `*_MODE=block` for hard enforcement. Hooks look for full audit scripts in the repo first (`hooks/lib/`), then `~/.claude/hooks/` fallback, then minimal inline checks.
---
## Self-hosting (Enterprise)
Run the full model stack on your own hardware β no cloud, full data sovereignty.
**Requirements:** Mac M2 Pro+ (48 GB recommended) or Linux + NVIDIA GPU, plus [Ollama](https://ollama.com).
```bash
ollama pull dcostenco/prism-coder:9b # default router
export LOCAL_LLM_URL=http://localhost:11434
```
Routing is automatic: `9b β 4b β cloud fallback` on desktop/server, `2b β cloud fallback` on mobile/iPhone. For iOS or another machine on the same network, run `OLLAMA_HOST=0.0.0.0 ollama serve` and point `LOCAL_LLM_URL` at the host's IP.
---
## Configuration reference
| Variable | Purpose | Default |
|---|---|---|
| `PRISM_STORAGE` | `local` / `synalux` / `supabase` / `auto` | `auto` |
| `PRISM_SYNALUX_API_KEY` | Paid-tier portal key (`synalux_sk_...`) | -- (local if unset) |
| `LOCAL_LLM_URL` | Ollama endpoint | `http://localhost:11434` |
| `PRISM_FORCE_LOCAL` | Force local SQLite regardless of credentials | `false` |
| `TELEMETRY_WRITE_TOKEN` | Portal analytics token (optional β metrics display works without it) | -- |
With no variables set, Prism runs fully local. Set `PRISM_SYNALUX_API_KEY` (and leave `PRISM_STORAGE=auto`) to use the cloud backend.
---
## Testing
```bash
npm test # full suite (vitest) β 95 files, 2841 tests
npm test -- --coverage # coverage report
```
Coverage spans HRR retrieval, knowledge ingestion, the inference cascade and grounding verifier, inference metrics, telemetry allowlist, delegation gate, compaction, the model picker, and storage round-trips.
---
## Migration: local to cloud
To move free-tier history into the paid portal:
```bash
node scripts/migrate-local-to-portal.mjs --dry-run # preview, no network
PRISM_SYNALUX_API_KEY=synalux_sk_... \
node scripts/migrate-local-to-portal.mjs # push ledger + handoffs
```
It reads `~/.prism-mcp/data.db` and POSTs entries to the portal. Ledger entries are append-only and de-duped server-side; handoffs use last-write-wins per project. Re-running on the same DB is safe. This is a one-shot migration, not a sync daemon β after it, set `PRISM_STORAGE=synalux` (or leave it on `auto`).
---
## License
| Product | License |
|---|---|
| **prism-mcp-server** (this repo) | [AGPL-3.0](../../LICENSE) |
| **VS Code extension** (synalux-ai.synalux) | BSL-1.1 |
| **Web IDE** (synalux.ai/coder) | Synalux Terms of Service |
| **Prism AAC** | AGPL-3.0 |
The AGPL-3.0 license covers the MCP server and its source code. The VS Code extension and Web IDE are separate products with their own licenses. Commercial hosted/managed deployment of the MCP server is available via the Synalux subscription.