# Distributed Architecture

Datahike's architecture is built on **immutable persistent data structures** that enable efficient distribution and collaboration. The database is fundamentally designed around two complementary approaches:

1. **Distributed Index Space (DIS)**: Share persistent indices across processes—readers access data directly without database connections
2. **Remote Procedure Calls (RPC)**: Centralize computation on a server for shared caching and simplified deployment

![Network topology](assets/network_topology.svg)

# Distributed Index Space (DIS)

**Distributed Index Space is Datahike's key architectural advantage.** It enables massive read scalability and powers collaborative systems by treating database snapshots as immutable values that can be shared like files.

## How it works

Datahike builds on **copy-on-write persistent data structures**, where each change creates a new structure that shares most of its data with previous versions. When you transact to a database:

1. New index nodes are written to the shared [storage backend](storage-backends.md) (S3, JDBC, file, etc.)
2. A new root pointer is published atomically
3. Readers pick up the new snapshot on next access—no active connections needed

This is similar to [Datomic](https://datomic.com), but **Datahike connections are lightweight and require no communication by default**. If you only need to read from a database (e.g., a dataset provided by a third party), you just need read access to the storage—no server setup required.
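As a sketch of this read path (the store path, attribute, and query below are illustrative, not from the source), a read-only consumer needs nothing more than a configuration pointing at the shared store:

```clojure
(require '[datahike.api :as d])

;; Illustrative config: any backend the reader can reach read-only works.
(def cfg {:store {:backend :file
                  :path "/shared/filesystem/store"}})

;; Connecting only resolves the current root pointer in storage;
;; no server or writer process is contacted.
(def conn (d/connect cfg))

;; Queries run locally against an immutable snapshot.
(d/q '[:find ?e ?v :where [?e :name ?v]] @conn)
```

Because `@conn` is an immutable snapshot, long-running reads are unaffected by transactions happening concurrently elsewhere.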
## Scaling and collaboration

The DIS model provides fundamental advantages for distributed systems:

- **Massive read scaling**: Add readers without coordination—they access persistent indices directly
- **Zero connection overhead**: No connection pooling, no network round-trips for reads
- **Snapshot isolation**: Each reader sees a consistent point-in-time view
- **Efficient sharding**: Create one database per logical unit (e.g., per customer, per project)—readers can join across databases locally
- **Offline-first capable**: Readers can cache indices locally and sync differentially when online

This architecture enables collaborative systems where multiple processes share access to evolving datasets without centralized coordination. The same design principles that enable DIS (immutability, structural sharing) also support more advanced distribution patterns, including CRDT-based merge strategies (see [replikativ](https://github.com/replikativ/replikativ)) and peer-to-peer synchronization (demonstrated with [dat-sync](https://github.com/replikativ/dat-sync)).

These capabilities are valuable even in centralized production environments: differential sync reduces bandwidth, immutable snapshots simplify caching and recovery, and the architecture naturally handles network partitions.

## Single writer model

Datahike uses a **single-writer, multiple-reader** model—the same architectural choice as Datomic, Datalevin, and XTDB. While multiple readers can access indices concurrently via DIS, write operations are serialized through a single writer process to ensure strong consistency and linearizable transactions.

To provide distributed write access, you configure a writer endpoint (HTTP server or Kabel WebSocket).
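The per-unit sharding pattern described above can be sketched as a local cross-database join. The configs, attribute names, and data below are hypothetical; the key point is that both snapshots are plain values passed to the query:

```clojure
(require '[datahike.api :as d])

;; Hypothetical: one database per logical unit, both on shared storage.
(def customers-db @(d/connect customers-cfg))
(def projects-db  @(d/connect projects-cfg))

;; Join the two databases locally on a shared value (:customer/id);
;; no server coordination is required for the read.
(d/q '[:find ?cust-name ?proj-name
       :in $c $p
       :where
       [$c ?c :customer/id ?cid]
       [$c ?c :customer/name ?cust-name]
       [$p ?proj :project/customer-id ?cid]
       [$p ?proj :project/name ?proj-name]]
     customers-db projects-db)
```

Note that the join is on an attribute value rather than on entity ids, since entity ids are local to each database.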
The writer:

- Serializes all transactions for strong consistency guarantees
- Publishes new index snapshots to the shared storage backend
- Allows unlimited readers to access the updated indices via DIS

**All readers continue to access data locally** from the distributed storage (shared filesystem, JDBC, S3, etc.) without connecting to the writer—they only contact it to submit transactions.

This model is supported by all Datahike clients: JVM, Node.js, browser, CLI, Babashka pod, and libdatahike. The client setup is simple: you just add a `:writer` entry to your database configuration, e.g.

```clojure
{:store {:backend :file
         :id #uuid "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
         :path "/shared/filesystem/store"}
 :keep-history? true
 :schema-flexibility :read
 :writer {:backend :datahike-server
          :url "http://localhost:4444"
          :token "securerandompassword"}}
```

You can then use the normal `datahike.api` as usual: all operations that change a database, e.g. `create-database`, `delete-database`, and `transact`, are sent to the server, while all other calls are executed locally.

### AWS Lambda

An example setup for running Datahike distributed on AWS Lambda without a server can be found [here](https://github.com/viesti/clj-lambda-datahike). It configures a singleton Lambda for write operations, while reader Lambdas can run as multiple instances and scale out. This setup can later be upgraded to dedicated servers on EC2 instances.

### Streaming writer (Kabel)

**Beta feature - please try it out and provide feedback.**

The Kabel writer provides **real-time reactive updates** via WebSockets, complementing the HTTP server's REST API. Where the HTTP server is ideal for conventional REST integrations (including non-Clojure clients), Kabel enables live synchronization: clients receive database updates as they happen, without polling.
The stack consists of:

- [kabel](https://github.com/replikativ/kabel) - WebSocket transport with middleware support
- [distributed-scope](https://github.com/replikativ/distributed-scope) - Remote function invocation with Clojure semantics
- [konserve-sync](https://github.com/replikativ/konserve-sync) - Differential store synchronization (only transmits changed data)

This setup is particularly useful for browser clients, where storage backends cannot be shared directly, and for applications requiring reactive UIs that update automatically when data changes on the server (see [JavaScript API](javascript-api.md)).

#### Server setup

The server owns the database and handles all write operations. It uses a file backend and broadcasts updates to connected clients via konserve-sync.

```clojure
(ns my-app.server
  (:require [datahike.api :as d]
            [datahike.kabel.handlers :as handlers]
            [datahike.kabel.fressian-handlers :as fh]
            [kabel.peer :as peer]
            [kabel.http-kit :refer [create-http-kit-handler!]]
            [konserve-sync.core :as sync]
            [is.simm.distributed-scope :refer [remote-middleware invoke-on-peer]]
            [superv.async :refer [S go-try]]))

;; ... (peer, middleware, and konserve-sync wiring elided in the source)

;; Clean up when shutting down
(d/release conn)
```
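Whichever writer transport is used, client code keeps going through the normal `datahike.api`. A minimal sketch, reusing the HTTP writer configuration shown earlier (store path, token, and data are illustrative):

```clojure
(require '[datahike.api :as d])

(def cfg {:store {:backend :file
                  :path "/shared/filesystem/store"}
          :schema-flexibility :read
          :writer {:backend :datahike-server
                   :url "http://localhost:4444"
                   :token "securerandompassword"}})

(def conn (d/connect cfg))

;; Forwarded to the writer endpoint:
(d/transact conn [{:name "Alice"} {:name "Bob"}])

;; Executed locally against the shared indices:
(d/q '[:find ?n :where [?e :name ?n]] @conn)
;; => #{["Alice"] ["Bob"]}

;; Clean up
(d/release conn)
```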