# Storage Backends

Datahike provides pluggable storage through [konserve](https://github.com/replikativ/konserve), allowing you to choose the backend that best fits your deployment model and performance requirements.

## Quick Reference

| Backend | Best For | Distribution | Durability | Write Throughput |
|---------|----------|--------------|------------|------------------|
| **File** | Unix tools, rsync, git-like workflows | Single machine | High | Good |
| **LMDB** | High-performance single machine | Single filesystem | High | Excellent |
| **Memory** | Testing, ephemeral data | Single process | None | Excellent |
| **JDBC** | Existing SQL infrastructure | Multi-machine | High | Good |
| **Redis** | High write throughput | Multi-machine | Medium | Excellent |
| **S3** | Distributed scale-out, cost-effective | Multi-region | Very high | Good |
| **GCS** | Google Cloud scale-out | Multi-region | Very high | Good |
| **DynamoDB** | Low latency, AWS-native | Multi-region | Very high | Excellent (expensive) |
| **IndexedDB** | Browser persistence | Browser | Medium | Good |

## Local Backends

### File Backend

**Use when**: You want to use Unix tools (rsync, git, backup scripts) to manage your database.

**Key advantage**: Deltas in persistent data structures translate directly into individual file deltas, making incremental backups and synchronization highly efficient.

```clojure
{:store {:backend :file
         :path "/var/lib/myapp/db"}}
```

**Characteristics**:

- Each immutable index fragment is stored as an individual file
- Efficient incremental backups with rsync
- Database directories can be versioned with git or similar tools
- Good for single-machine deployments
- Extensively tested and reliable, with no external dependencies
- Not ideal for databases with a lot of churn

### LMDB Backend

**Use when**: You need maximum performance on a single machine within a single filesystem.

**Key advantage**: Lightning-fast memory-mapped database with ACID transactions, optimized for read-heavy workloads.

```clojure
;; Requires: org.replikativ/datahike-lmdb
{:store {:backend :lmdb
         :path "/var/lib/myapp/db"}}
```

**Characteristics**:

- Memory-mapped for zero-copy reads
- Single filesystem only (not distributed)
- Excellent read performance
- Lower memory overhead than the file backend
- Well suited for very high churn of small changes
- Very low latency
- Stores data in one large file blob, which cannot be synced as efficiently as the file store

**Note**: The LMDB backend is available as a separate library: [datahike-lmdb](https://github.com/replikativ/datahike-lmdb), extending [konserve-lmdb](https://github.com/replikativ/lmdb).

### Memory Backend

**Use when**: Testing, development, or ephemeral data that doesn't need to survive process restarts.

```clojure
{:store {:backend :memory
         :id #uuid "550e8400-e29b-41d4-a716-446655440030"}}
```

**Characteristics**:

- No persistence: data is lost on process exit
- Fastest possible performance
- Ideal for unit tests and REPL development
- Multiple databases distinguished by `:id`

## Distributed Backends

All distributed backends support **Distributed Index Space (DIS)**: multiple reader processes can directly access shared storage without database connections, enabling massive read scalability.

**Important**: Datahike uses a single-writer model. Multiple readers can access indices concurrently, but only one writer process should transact at a time. This is the same model used by Datomic, Datalevin, and XTDB.
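The writer/reader split shows up directly in the connection API. Below is a minimal sketch, assuming a store that both processes can reach (any of the distributed backends below would do; a file path is used here only to keep the sketch self-contained, and `:schema-flexibility :read` only to keep it schema-free):

```clojure
(require '[datahike.api :as d])

(def cfg {:store {:backend :file :path "/shared/datahike-db"}
          :schema-flexibility :read})

;; Writer process: the single process that transacts.
(d/create-database cfg)
(def writer-conn (d/connect cfg))
(d/transact writer-conn {:tx-data [{:user/name "Alice"}]})

;; Reader processes: connect to the same store and query, but never transact.
(def reader-conn (d/connect cfg))
(d/q '[:find ?n :where [?e :user/name ?n]] @reader-conn)
;; => #{["Alice"]}
```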
### JDBC Backend

**Use when**: You already have PostgreSQL or another JDBC database in your infrastructure.

**Key advantage**: Leverage existing SQL database skills, backup procedures, and monitoring tools.

```clojure
;; Requires: org.replikativ/datahike-jdbc
{:store {:backend :jdbc
         :dbtype "postgresql"
         :host "db.example.com"
         :port 5432
         :dbname "datahike"
         :user "datahike"
         :password "..."}}
```

**Characteristics**:

- Use familiar SQL database operations
- Existing backup/restore procedures work
- Read scaling via DIS (readers don't interfere with the writer, thanks to the database's MVCC)
- Good for teams already operating PostgreSQL
- Available for PostgreSQL, MySQL, H2, and others

**Note**: Available as a separate library: [datahike-jdbc](https://github.com/replikativ/datahike-jdbc)

### Redis Backend

**Use when**: You need high write throughput and can tolerate weaker durability guarantees.

**Key advantage**: Excellent write performance with in-memory speed.

```clojure
;; Requires: org.replikativ/konserve-redis
{:store {:backend :redis
         :host "redis.example.com"
         :port 6379}}
```

**Characteristics**:

- Very high write throughput
- Durability depends on Redis persistence settings (RDB/AOF)
- Can lose recent writes on a Redis crash
- Good for high-traffic applications where some data loss is acceptable
- Distributed reads via DIS

### S3 Backend

**Use when**: You want cost-effective distributed storage that scales to massive datasets.

**Key advantage**: Extremely scalable, pay-per-use pricing, natural fit for cloud-native architectures.

```clojure
;; Requires: org.replikativ/konserve-s3
{:store {:backend :s3
         :bucket "my-datahike-bucket"
         :region "us-east-1"}}
```

**Characteristics**:

- Unlimited scalability
- Very low storage costs compared to databases
- High durability (11 nines)
- Eventually consistent (may have slight read lag)
- Ideal for read-heavy workloads with occasional writes
- Can have high latency
- Works well with AWS Lambda deployments

**Performance note**: Higher latency than local storage, but cost-effective for billions of datoms.

### Google Cloud Storage (GCS) Backend

**Use when**: You're on Google Cloud Platform and want distributed storage.

**Key advantage**: Similar to S3 but optimized for GCP infrastructure.

```clojure
;; Requires: org.replikativ/konserve-gcs
{:store {:backend :gcs
         :bucket "my-datahike-bucket"
         :project-id "my-project"}}
```

**Characteristics**:

- Similar to S3 in characteristics
- Native GCP integration
- Good latency within GCP regions
- Cost-effective for large datasets

### DynamoDB Backend

**Use when**: You need low-latency distributed storage and are willing to pay premium pricing.

**Key advantage**: Single-digit millisecond latency with strong consistency options.

```clojure
;; Requires: org.replikativ/konserve-dynamodb
{:store {:backend :dynamodb
         :table "datahike"
         :region "us-east-1"}}
```

**Characteristics**:

- Very low latency
- Strong consistency available
- Higher costs than S3
- Good for latency-sensitive applications
- On-demand or provisioned capacity modes
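All of the backends above are driven by the same Datahike API; only the `:store` map changes. A minimal sketch of selecting a store per environment (the env var, UUID, and bucket name are illustrative, and the matching backend library is assumed to be on the classpath):

```clojure
(defn store-config
  "Return a :store map for the given environment."
  [env]
  (case env
    :dev  {:backend :memory
           :id #uuid "550e8400-e29b-41d4-a716-446655440040"}
    :prod {:backend :s3
           :bucket "my-datahike-bucket"
           :region "us-east-1"}))

;; The rest of the application only ever sees this config map.
(def config
  {:store (store-config (if (System/getenv "PRODUCTION") :prod :dev))})
```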
## Browser Backend

### IndexedDB Backend

**Use when**: Building offline-capable browser applications with persistent local storage.

**Key advantage**: Durable browser-local storage with ClojureScript support.

```clojure
;; ClojureScript only
{:store {:backend :indexeddb
         :id "my-app-db"}}
```

**Characteristics**:

- Persistent across browser sessions
- Quota from roughly 50 MB up to effectively unlimited (browser-dependent)
- Asynchronous API
- Often paired with TieredStore for performance

## Advanced: TieredStore

**TieredStore** creates memory hierarchies by layering backends, with faster storage in front of slower, more durable storage.

**Use cases**:

- **Browser**: Memory (fast) → IndexedDB (persistent)
- **Server**: Memory → LMDB → S3 (hot → warm → cold)
- **AWS**: LMDB (fast local) → S3 (distributed backup)

```clojure
;; Example: fast memory cache backed by S3
{:store {:backend :tiered
         :id #uuid "550e8400-e29b-41d4-a716-446655440031"
         :frontend-config {:backend :memory
                           :id #uuid "550e8400-e29b-41d4-a716-446655440031"}
         :backend-config {:backend :s3
                          :bucket "persistent-store"
                          :region "us-east-1"
                          :id #uuid "550e8400-e29b-41d4-a716-446655440031"}
         :write-policy :write-through
         :read-policy :frontend-first}}
```

**How it works**:

- Reads check tiers in order (cache-first)
- Writes go to all tiers
- Stacking multiple tiers is supported but rarely needed
- Provided by konserve's tiered store implementation

**Common patterns**:

**Browser with offline support**:

```clojure
{:store {:backend :tiered
         :id #uuid "550e8400-e29b-41d4-a716-446655440032"
         :frontend-config {:backend :memory
                           :id #uuid "550e8400-e29b-41d4-a716-446655440032"}
         :backend-config {:backend :indexeddb
                          :id #uuid "550e8400-e29b-41d4-a716-446655440032"}
         :write-policy :write-through}}
```

**AWS Lambda with S3 backing**:

```clojure
{:store {:backend :tiered
         :id #uuid "550e8400-e29b-41d4-a716-446655440033"
         :frontend-config {:backend :lmdb
                           :path "/tmp/cache"
                           :id #uuid "550e8400-e29b-41d4-a716-446655440033"}
         :backend-config {:backend :s3
                          :bucket "lambda-data"
                          :region "us-east-1"
                          :id #uuid "550e8400-e29b-41d4-a716-446655440033"}}}
```

## Backend-Specific Configuration

Each backend may have additional configuration options. See the konserve backend documentation for details:

- [konserve](https://github.com/replikativ/konserve) - Core abstraction
- [konserve-lmdb](https://github.com/replikativ/lmdb) - LMDB implementation
- [datahike-lmdb](https://github.com/replikativ/datahike-lmdb) - Datahike LMDB integration
- [datahike-jdbc](https://github.com/replikativ/datahike-jdbc) - JDBC backends
- [konserve-s3](https://github.com/replikativ/konserve-s3) - S3 backend
- [konserve-redis](https://github.com/replikativ/konserve-redis) - Redis backend

## Choosing a Backend

### For Development

→ **Memory** or **File** backend for simplicity

### For Single-Machine Production

→ **LMDB** for best performance
→ **File** if you need Unix tool integration

### For Distributed Production (Read Scaling)

→ **S3/GCS** for cost-effective scale
→ **DynamoDB** for low latency (higher cost)
→ **JDBC** if you already operate PostgreSQL

### For High Write Throughput

→ **Redis** if you can tolerate some data loss
→ **LMDB** for durable local writes
→ **DynamoDB** for distributed writes (expensive)

### For Browser Applications

→ **IndexedDB** for persistence
→ **TieredStore** (Memory → IndexedDB) for speed + durability

### For Cost Optimization

→ **File** backend with rsync for cheap backups
→ **S3** for large datasets (pennies per GB)
→ **TieredStore** to minimize expensive tier access

## Migration Between Backends

To migrate from one backend to another:

1. Export from the source database:

   ```clojure
   (require '[datahike.migrate :refer [export-db import-db]])
   (export-db source-conn "/tmp/datoms-export")
   ```

2. Create the destination database with the new backend:
   ```clojure
   (d/create-database new-config)
   (def dest-conn (d/connect new-config))
   ```

3. Import into the destination:

   ```clojure
   (import-db dest-conn "/tmp/datoms-export")
   ```

The export format (CBOR) preserves all data types, including binary data.

## Performance Considerations

### Read Performance

- **Fastest**: Memory, LMDB (memory-mapped)
- **Fast**: File (SSD), Redis
- **Good**: JDBC, S3 (with tiering)
- **Variable**: DynamoDB (provisioned vs. on-demand)

### Write Performance

- **Fastest**: Memory, Redis
- **Fast**: LMDB, DynamoDB (provisioned)
- **Good**: File, JDBC
- **Slower**: S3 (especially for many small writes)

### Distribution

- **No distribution**: Memory, File, LMDB (single filesystem)
- **Distributed reads**: All cloud backends via DIS
- **Single writer**: All backends (architectural constraint)

### Durability

- **None**: Memory (ephemeral)
- **Medium**: Redis (depends on persistence settings), IndexedDB
- **High**: File, LMDB, JDBC
- **Very high**: S3, GCS, DynamoDB (11 nines)

## Custom Backends

Datahike can use any konserve backend. To create a custom backend:

1. Implement the [konserve protocols](https://github.com/replikativ/konserve/blob/main/src/konserve/protocols.cljc)
2. Register your backend with konserve
3. Use it in Datahike configuration

See the [konserve documentation](https://github.com/replikativ/konserve) for details on implementing custom backends.
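For orientation, the sketch below exercises konserve's key-value API against its bundled file store in synchronous mode; a custom backend must support these same operations. The signatures follow the konserve README as of this writing, but treat them as an assumption and check the README for the current API.

```clojure
(require '[konserve.core :as k]
         '[konserve.filestore :refer [connect-fs-store]])

;; Connect konserve's built-in file store in synchronous mode.
(def store (connect-fs-store "/tmp/konserve-demo" :opts {:sync? true}))

;; Write and read a nested value through the store.
(k/assoc-in store ["users" :alice] {:name "Alice"} {:sync? true})
(k/get-in store ["users" :alice] nil {:sync? true})
;; => {:name "Alice"}

;; Check for a top-level key.
(k/exists? store "users" {:sync? true})
;; => true
```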