# Datahike Database Configuration

Datahike is highly configurable to support different deployment models and use cases. Configuration is set at database creation and cannot be changed afterward (though data can be migrated to a new configuration).

## Configuration Methods

Datahike uses the [environ library](https://github.com/weavejester/environ) for configuration, supporting three methods:

1. **Environment variables** (lowest priority)
2. **Java system properties** (middle priority)
3. **Configuration map argument** (highest priority - overwrites others)

This allows flexible deployment: hardcode configs in development, use environment variables in containers, or Java properties in production JVMs.

## Basic Configuration

The minimal configuration map includes:

```clojure
{:store {:backend :memory                                  ;keyword - storage backend
         :id #uuid "550e8400-e29b-41d4-a716-446655440020"} ;UUID - database identifier
 :name nil                                 ;string - optional database name (auto-generated if nil)
 :schema-flexibility :write                ;keyword - :read or :write
 :keep-history? true                       ;boolean - enable time-travel queries
 :attribute-refs? false                    ;boolean - use entity IDs for attributes (Datomic-compatible)
 :index :datahike.index/persistent-set     ;keyword - index implementation
 :store-cache-size 1000                    ;number - store cache entries
 :search-cache-size 10000}                 ;number - search cache entries
```

**Quick start** with defaults (in-memory database):

```clojure
(require '[datahike.api :as d])

(d/create-database) ;; Creates memory DB with sensible defaults
```

## Storage Backends

Datahike supports multiple storage backends via [konserve](https://github.com/replikativ/konserve). The choice of backend determines durability, scalability, and deployment model.

**Built-in backends:**

- `:memory` - In-memory (ephemeral)
- `:file` - File-based persistent storage

**External backend libraries:**

- [LMDB](https://github.com/replikativ/datahike-lmdb) - High-performance local storage
- [JDBC](https://github.com/replikativ/datahike-jdbc) - PostgreSQL, MySQL, H2
- [Redis](https://github.com/replikativ/konserve-redis) - High write throughput
- [S3](https://github.com/replikativ/konserve-s3) - AWS cloud storage
- [GCS](https://github.com/replikativ/konserve-gcs) - Google Cloud storage
- [DynamoDB](https://github.com/replikativ/konserve-dynamodb) - AWS NoSQL
- [IndexedDB](https://github.com/replikativ/konserve-indexeddb) - Browser storage

**For detailed backend selection guidance**, see [Storage Backends Documentation](./storage-backends.md).

### Environment Variable Configuration

When using environment variables or Java system properties, name them like:

| Java system property        | Environment variable        |
|-----------------------------|-----------------------------|
| datahike.store.backend      | DATAHIKE_STORE_BACKEND      |
| datahike.store.username     | DATAHIKE_STORE_USERNAME     |
| datahike.schema.flexibility | DATAHIKE_SCHEMA_FLEXIBILITY |
| datahike.keep.history       | DATAHIKE_KEEP_HISTORY       |
| datahike.attribute.refs     | DATAHIKE_ATTRIBUTE_REFS     |
| datahike.name               | DATAHIKE_NAME               |

etc.

**Note**: Do not use `:` in keyword strings for environment variables—it will be added automatically.
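With such variables exported, the zero-argument calls from the quick start read their configuration from the environment. A minimal sketch, assuming the hypothetical values `DATAHIKE_STORE_BACKEND=file` and `DATAHIKE_STORE_CONFIG='{:path "/var/db/datahike"}'` (matching the file backend example below) are set in the shell:

```clojure
(require '[datahike.api :as d])

;; Assuming DATAHIKE_STORE_BACKEND=file and
;; DATAHIKE_STORE_CONFIG='{:path "/var/db/datahike"}' are set in the environment,
;; the no-argument call picks up the file backend instead of the in-memory default.
(d/create-database)

;; Passing a config map here instead would override the environment,
;; per the priority order described above.
```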
### Backend Configuration Examples

#### Memory (Built-in)

Ephemeral storage for testing and development:

```clojure
{:store {:backend :memory
         :id #uuid "550e8400-e29b-41d4-a716-446655440021"}}
```

Environment variables:

```bash
DATAHIKE_STORE_BACKEND=memory
DATAHIKE_STORE_CONFIG='{:id #uuid "550e8400-e29b-41d4-a716-446655440021"}'
```

#### File (Built-in)

Persistent local file storage:

```clojure
{:store {:backend :file
         :path "/var/db/datahike"}}
```

Environment variables:

```bash
DATAHIKE_STORE_BACKEND=file
DATAHIKE_STORE_CONFIG='{:path "/var/db/datahike"}'
```

#### LMDB (External Library)

High-performance local storage via [datahike-lmdb](https://github.com/replikativ/datahike-lmdb):

```clojure
{:store {:backend :lmdb
         :path "/var/db/datahike-lmdb"}}
```

#### JDBC (External Library)

PostgreSQL or other JDBC databases via [datahike-jdbc](https://github.com/replikativ/datahike-jdbc):

```clojure
{:store {:backend :jdbc
         :dbtype "postgresql"
         :host "db.example.com"
         :port 5432
         :dbname "datahike"
         :user "datahike"
         :password "secret"}}
```

#### S3 (External Library)

AWS S3 storage via [konserve-s3](https://github.com/replikativ/konserve-s3):

```clojure
{:store {:backend :s3
         :bucket "my-datahike-bucket"
         :region "us-east-1"}}
```

#### TieredStore (Composable)

Memory hierarchy (e.g., Memory → IndexedDB for browsers):

```clojure
{:store {:backend :tiered
         :id #uuid "550e8400-e29b-41d4-a716-446655440022"
         :frontend-config {:backend :memory
                           :id #uuid "550e8400-e29b-41d4-a716-446655440022"}
         :backend-config {:backend :indexeddb
                          :name "persistent-db"
                          :id #uuid "550e8400-e29b-41d4-a716-446655440022"}}}
;; All :id values must match for konserve validation
```

For complete backend options and selection guidance, see [Storage Backends](./storage-backends.md).

## Core Configuration Options

### Database Name

Optional identifier for the database. Auto-generated if not specified. Useful when running multiple databases:

```clojure
{:name "production-db"
 :store {:backend :file
         :path "/var/db/prod"}}
```

### Schema Flexibility

Controls when schema validation occurs:

- **`:write`** (default): Strict schema—attributes must be defined before use. Catches errors early.
- **`:read`**: Schema-less—accept any data, validate on read. Flexible for evolving data models.

```clojure
{:schema-flexibility :read} ;; Allow any data structure
```

With `:read` flexibility, you can still define critical schema like `:db/unique`, `:db/cardinality`, or `:db.type/ref` where needed. See [Schema Documentation](./schema.md) for details.

### Time-Travel Queries

Enable historical query capabilities:

```clojure
{:keep-history? true} ;; Default: true
```

When enabled, use `history`, `as-of`, and `since` to query past states:

```clojure
(d/q '[:find ?e :where [?e :name "Alice"]]
     (d/as-of db #inst "2024-01-01"))
```
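The other temporal views follow the same pattern. A short sketch, assuming `db` is the current value (e.g. `@conn`) of a history-enabled database that has seen some `:name` assertions:

```clojure
;; history: query over every value :name has ever had, including retracted ones
(d/q '[:find ?v
       :where [?e :name ?v]]
     (d/history db))

;; since: only facts added after the given point in time are visible
(d/q '[:find ?e
       :where [?e :name "Alice"]]
     (d/since db #inst "2024-01-01"))
```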
**Disable if**: You never need historical queries and want to save storage space.

See [Time Variance Documentation](./time_variance.md) for time-travel query examples.

### Attribute References

Store attributes as entity IDs (integers) instead of keywords in datoms for performance and Datomic compatibility:

```clojure
{:attribute-refs? true} ;; Default: false
```

**How it works:**

Without attribute references (default):

```clojure
;; Datoms store attribute keywords directly
#datahike/Datom [1 :name "Alice" 536870913 true]
```

With attribute references enabled:

```clojure
;; Datoms store attribute entity IDs (integers)
#datahike/Datom [1 73 "Alice" 536870913 true]
;; where 73 is the entity ID for :name
```

**Benefits:**

- **Better performance**: Integer comparisons are significantly faster than keyword comparisons, especially with many attributes
- **Datomic compatibility**: Matches Datomic's internal representation for easier migration
- **Attributes as entities**: Attributes become queryable entities in the database
- **Recommended for production**: Generally beneficial unless you have specific reasons to use keywords

**Considerations:**

- Must use `:schema-flexibility :write` (cannot use with `:read`)
- Requires ID ↔ keyword mapping (maintained automatically)
- System schema is bootstrapped into the index on database creation
- You still use keyword syntax in queries and transactions - translation is automatic

**Example:**

```clojure
;; Create database with attribute references
(def cfg {:store {:backend :memory
                  :id #uuid "550e8400-e29b-41d4-a716-446655440000"}
          :attribute-refs? true
          :schema-flexibility :write})

(d/create-database cfg)
(def conn (d/connect cfg))

;; Use normal keyword syntax in transactions and queries
(d/transact conn [{:db/ident :name
                   :db/valueType :db.type/string
                   :db/cardinality :db.cardinality/one}])

(d/transact conn [{:name "Alice"}])

;; Queries use keywords as usual - translation happens automatically
(d/q '[:find ?n :where [?e :name ?n]] @conn)
;; => #{["Alice"]}

;; But internally, datoms store integer attribute IDs for performance
```

**When to use:**

- **Use `:attribute-refs? true`** for production databases (recommended for performance)
- Use `:attribute-refs? false` only if you need `:schema-flexibility :read` or have specific compatibility requirements

### Index Selection

Choose the underlying index implementation:

```clojure
{:index :datahike.index/persistent-set} ;; Default (recommended)
```

**Available indexes**:

- `:datahike.index/persistent-set` - Default, actively maintained, supports all features
- `:datahike.index/hitchhiker-tree` - Legacy, requires explicit library and namespace loading

Most users should use the default. Hitchhiker-tree is maintained for backward compatibility with existing databases.

## Advanced Configuration

### Single-Writer Model (Distributed Access)

For distributed deployments, configure a writer to handle all transactions while readers access storage directly via Distributed Index Space.

#### HTTP Server Writer

```clojure
{:store {:backend :file
         :path "/shared/db"}
 :writer {:backend :datahike-server
          :url "http://writer.example.com:4444"
          :token "secure-token"}}
```

Clients connect and transact through the HTTP server. Reads happen locally from shared storage.
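A minimal usage sketch for this setup, assuming the writer URL and token from the example above and a `:name` attribute already defined in the schema:

```clojure
(def cfg {:store  {:backend :file :path "/shared/db"}
          :writer {:backend :datahike-server
                   :url "http://writer.example.com:4444"
                   :token "secure-token"}})

(def conn (d/connect cfg))

;; Transactions are forwarded to the datahike-server writer over HTTP ...
(d/transact conn [{:name "Alice"}])

;; ... while queries are answered locally from the shared storage.
(d/q '[:find ?n :where [?e :name ?n]] @conn)
```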
#### Kabel WebSocket Writer (Beta)

Real-time reactive updates via WebSocket:

```clojure
{:store {:backend :indexeddb
         :name "app-db"
         :id store-id}
 :writer {:backend :kabel
          :peer-id server-peer-id
          :local-peer @client-peer}} ;; Set up via kabel/distributed-scope
```

Enables browser clients with live synchronization.

See [Distributed Architecture](./distributed.md) for setup details.

### Branching (Beta)

Access specific database branches (git-like versioning):

```clojure
{:store {:backend :file
         :path "/var/db"}
 :branch :staging} ;; Default branch is :db
```

Create and merge branches for testing, staging, or experiments. See [Versioning](./versioning.md) for the branching API.

### Remote Procedure Calls

Send all operations (reads and writes) to a remote server:

```clojure
{:store {:backend :memory
         :id #uuid "550e8400-e29b-41d4-a716-446655440023"}
 :remote-peer {:backend :datahike-server
               :url "http://server.example.com:4444"
               :token "secure-token"}}
```

Useful for thin clients or when you want centralized query execution. See [Distributed Architecture](./distributed.md) for RPC vs. DIS trade-offs.

### Initial Transaction

Seed the database with schema or data on creation:

```clojure
{:store {:backend :memory
         :id #uuid "550e8400-e29b-41d4-a716-446655440024"}
 :initial-tx [{:db/ident :name
               :db/valueType :db.type/string
               :db/cardinality :db.cardinality/one}
              {:db/ident :email
               :db/valueType :db.type/string
               :db/unique :db.unique/identity
               :db/cardinality :db.cardinality/one}]}
```

Convenient for testing or deploying databases with predefined schema.

### Complete Configuration Example

```clojure
{:store {:backend :file
         :path "/var/datahike/production"
         :id #uuid "550e8400-e29b-41d4-a716-446655440000"}
 :name "production-db"
 :schema-flexibility :write
 :keep-history? true
 :attribute-refs? false
 :index :datahike.index/persistent-set
 :store-cache-size 10000
 :search-cache-size 100000
 :initial-tx [{:db/ident :user/email
               :db/valueType :db.type/string
               :db/unique :db.unique/identity
               :db/cardinality :db.cardinality/one}]
 :writer {:backend :datahike-server
          :url "http://writer.example.com:4444"
          :token "secure-token"}
 :branch :db}
```

## Migration and Compatibility

### URI Scheme (Pre-0.3.0, Deprecated)

Prior to version 0.3.0, Datahike used URI-style configuration. This is **still supported** but deprecated in favor of the more flexible hashmap format.

**Old URI format**:

```clojure
"datahike:memory://my-db?temporal-index=true&schema-on-read=true"
```

**New hashmap format** (equivalent):

```clojure
{:store {:backend :memory
         :id #uuid "550e8400-e29b-41d4-a716-446655440025"}
 :keep-history? true
 :schema-flexibility :read}
```

**Key changes**:

- `:temporal-index` → `:keep-history?`
- `:schema-on-read` → `:schema-flexibility` (`:read` or `:write`)
- Store parameters moved to `:store` map
- Memory backend: `:host`/`:path` → `:id`
- Direct support for advanced features (writer, branches, initial-tx)

Existing URI configurations continue to work—no migration required unless you need new features.

## Further Documentation

- [Storage Backends](./storage-backends.md) - Choosing and configuring storage
- [Schema](./schema.md) - Schema definition and flexibility
- [Time Variance](./time_variance.md) - Historical queries (as-of, history, since)
- [Versioning](./versioning.md) - Git-like branching and merging
- [Distributed Architecture](./distributed.md) - DIS, writers, and RPC
- [JavaScript API](./javascript-api.md) - Node.js and browser usage