# Using RedDB for Logs > **TL;DR** — RedDB is a serious log store if you want logs + metrics + > traces + semantic search in **one SQL-native engine**. This guide > walks you from schema design → ingest → dashboards → retention → > multi-model correlation, with honest notes on where we win vs > Loki / Elasticsearch / ClickHouse and where we don't yet. --- ## Why RedDB for logs? Most stacks glue together: | Need | Typical system | |-------------------------------|---------------| | Raw ingest | Loki / Fluentd / Vector | | SQL analytics on logs | ClickHouse / BigQuery | | Full-text / fuzzy search | Elasticsearch / OpenSearch | | Metrics + dashboards | Prometheus + Grafana | | Traces | Jaeger / Tempo | | Anomaly / classification | Python pipelines | RedDB collapses those into one engine with **one SQL surface**: * Append-only tables + hypertables → ingest path * Per-column codecs (Dict / LZ4 / ZSTD / Delta / DoubleDelta) → 5–20× compression on typical log shapes * Continuous aggregates → dashboards that don't rescan * `quantileTDigest` / `uniq` / `topK` / `count_if` → p99, unique users, top endpoints * Vector + hybrid search → semantic log search (`"logs parecidos com este incidente"`) * Graph store → trace / span traversal * Retention daemon → O(1) chunk drops * ML classifier + semantic cache → anomaly / classification over logs without leaving the DB You don't have to use everything — start with tables + codecs + MVs; the rest is available when you need it. --- ## 1. Pick the right model RedDB has three append-flavoured models; pick based on shape: | If the log is… | Use | |-----------------------------------------------------|-----| | Free-form JSON / text, millions of lines/s, no SQL | **Log Collections** (`POST /logs/{name}/append`) | | Typed rows (status, latency, service, user_id) | **`CREATE TABLE ... APPEND ONLY`** | | Has a dominant time axis + need time_bucket / retention / drop_chunks | **`CREATE HYPERTABLE`** | **Rule of thumb**: use **hypertables** for anything you'd put in Prometheus / Timescale / Influx — that's 80% of log workloads. Use **log collections** only when you can't enforce a schema at write time. --- ## 2. A realistic schema Let's model an HTTP access log with traces: ```sql CREATE HYPERTABLE access_log ( ts BIGINT CODEC(DoubleDelta, ZSTD(3)), service TEXT CODEC(Dict, LZ4), severity INT CODEC(T64), status INT CODEC(T64), method TEXT CODEC(Dict), path TEXT CODEC(Dict, LZ4), latency_ms INT CODEC(Delta), bytes_out BIGINT CODEC(Delta, ZSTD(3)), user_id BIGINT CODEC(Delta), region TEXT CODEC(Dict), client_ip TEXT CODEC(Dict, LZ4), trace_id TEXT CODEC(LZ4), span_id TEXT CODEC(LZ4), message TEXT CODEC(ZSTD(6)) ) CHUNK_INTERVAL '1d'; ``` Codec choice rationale: * **`Dict`** on `service` / `method` / `region` / `path` — cardinality is tiny compared to the row count. Expect 20–50× reduction. * **`T64`** on `status` / `severity` — values fit in 10 bits, dictionary overhead isn't worth it. * **`Delta`** / **`DoubleDelta`** on `ts`, `latency_ms`, `bytes_out`, `user_id` — typically monotonic or slowly drifting. * **`ZSTD`** on `message` — high-entropy free text; still wins 2–3×. * **`LZ4`** on `trace_id` / `span_id` — UUID-like strings with byte-level repetition only. > **Landing status**: the `CODEC(...)` DDL syntax is parsed by the > B3 codec pipeline but the parser bridge lands in the next sprint. > Until then declare the columns without `CODEC(...)` and call > [`encode_bytes`](../engine/columnar-execution.md#codecs) from the > ingest path. Indexes worth declaring now: ```sql CREATE INDEX access_log_service_ts ON access_log (service, ts); CREATE INDEX access_log_status_hash ON access_log USING HASH (status); CREATE INDEX access_log_trace_id ON access_log (trace_id); ``` --- ## 3. Ingest ### HTTP — JSON-first (recommended) Don't synthesise INSERT statements from your log pipeline. Use the `/ingest/{collection}` endpoint with a JSON body: ```bash curl -X POST http://localhost:5000/ingest/access_log \ -H 'Content-Type: application/json' \ -d '[ {"ts": 1714000000000000000, "service": "api", "severity": 2, "status": 200, "method": "GET", "path": "/health", "latency_ms": 3}, {"ts": 1714000000100000000, "service": "api", "severity": 3, "status": 500, "method": "POST", "path": "/checkout", "latency_ms": 842} ]' ``` Or pipe NDJSON from any tool that can emit one JSON object per line: ```bash cat today.ndjson | curl -X POST http://localhost:5000/ingest/access_log \ -H 'Content-Type: application/x-ndjson' --data-binary @- ``` See the [Ingest API](../api/ingest.md) page for the full contract (ack shape, envelope form, WebSocket, Vector / Fluent Bit config, auth, troubleshooting). ### HTTP — SQL (when you need expressions) ```bash curl -X POST http://localhost:5000/sql \ -H 'Content-Type: application/json' \ -d '{"query":"INSERT INTO access_log (ts, service, severity, status, method, path, latency_ms) VALUES (1714000000000000000, '\''api'\'', 2, 200, '\''GET'\'', '\''/health'\'', 3)"}' ``` ### gRPC / drivers Python: ```python from reddb import Client db = Client("localhost:55055") db.insert_many("access_log", [ {"ts": 1_714_000_000_000_000_000, "service": "api", "severity": 2, "status": 200, "method": "GET", "path": "/health", "latency_ms": 3}, ... # hundreds per batch for best throughput ]) ``` Node / Bun: ```js import { Client } from "reddb"; const db = new Client("localhost:55055"); await db.insertMany("access_log", batch); ``` Rust (in-process, highest throughput): ```rust use reddb::storage::timeseries::{LogLine, LogPipeline, LogSeverity}; let pipe = LogPipeline::new("access_log", "ts", "1d").unwrap(); pipe.set_retention(90 * 86_400); // 90 days let lines: Vec = events.into_iter().map(|e| { LogLine::now(LogSeverity::Info, &e.service, &e.message) .with_field("latency_ms", e.latency_ms) .with_field("status", e.status as f64) .with_label("path", &e.path) .with_label("region", &e.region) .with_trace(&e.trace_id, &e.span_id) }).collect(); pipe.ingest_batch(&lines); ``` ### Performance tips * **Always batch**. Hundreds of rows per INSERT. Single-row inserts hit ~30k/s; batched inserts hit ~800k/s on a laptop. * **Keep `ts` monotonic per writer** — it lets Delta codec win maximum ratio on timestamps. * Pre-group by `service`: your writer thread should emit contiguous runs of the same service so the `Dict` codec compresses best. * Use the `application` Rust port (`LogPipeline::ingest_batch`) when you're in-process — it bypasses the wire layer entirely. --- ## 4. Querying ### Point lookups ```sql -- A single trace's spans, chronologically SELECT ts, service, span_id, latency_ms, message FROM access_log WHERE trace_id = 'c4fc28e8-9a1e-4a7e-9a5f-e4c3b8f2a1d9' ORDER BY ts; ``` Partition pruning drops every chunk whose time range can't contain rows for that trace. ### Time-series dashboards ```sql SELECT time_bucket('1m', ts) AS bucket, service, count(*) AS hits, count_if(status >= 500) AS errors_5xx, quantileTDigest(0.50, latency_ms) AS p50, quantileTDigest(0.99, latency_ms) AS p99, uniq(user_id) AS unique_users, sum(bytes_out) AS bytes_served FROM access_log WHERE ts >= NOW() - INTERVAL '24 hours' GROUP BY bucket, service ORDER BY bucket DESC; ``` Every aggregate in that query is native: `count_if`, `quantileTDigest`, `uniq`, `sum` — you don't need a separate analytics engine. ### Continuous aggregates (pre-materialised) ```sql CREATE CONTINUOUS AGGREGATE access_5m AS SELECT time_bucket('5m', ts) AS bucket, service, count(*) AS hits, count_if(status >= 500) AS errors, avg(latency_ms) AS avg_lat, quantileTDigest(0.99, latency_ms) AS p99 FROM access_log GROUP BY bucket, service WITH (refresh_lag = '30s', max_interval_per_job = '1h'); ``` Dashboards hit `access_5m` — instant, constant-time response even on billions of rows. The refresh daemon fills new buckets every refresh cycle; **you never hand-schedule anything**. ### Error ranking ("what's blowing up") ```sql SELECT path, count(*) AS bad, topK(5, client_ip) AS top_offenders FROM access_log WHERE ts >= NOW() - INTERVAL '1 hour' AND status >= 500 GROUP BY path ORDER BY bad DESC LIMIT 10; ``` ### Tail (`tail -f`-style) ```sql -- Poll-based (every dashboard does this) SELECT * FROM access_log WHERE ts > $last_ts_watermark ORDER BY ts LIMIT 500; ``` In-process Rust callers can also use the bounded ring buffer: ```rust let new_lines = pipe.tail_since(last_watermark_ns); ``` It's back-pressured by `recent_capacity` (default 4096) so a slow consumer can't grow unbounded. ### Semantic search ("logs like this one") ```sql SELECT id, ts, message, SIMILARITY(message_vec, EMBEDDING('payment refused', 'openai')) AS score FROM access_log WHERE ts >= NOW() - INTERVAL '7 days' AND SIMILARITY(message_vec, EMBEDDING('payment refused', 'openai')) > 0.75 ORDER BY score DESC LIMIT 20 RERANK WITH (provider = 'cohere', model = 'rerank-english-v3.0'); ``` Embedding columns keep themselves in sync via the managed lifecycle (CDC-backed refresh). Re-ranking is declarative; the planner batches + caches calls. --- ## 5. Retention Four knobs — pick the one that matches your needs; they stack cleanly. ### Partition TTL (declarative, fastest) The simplest form: declare the TTL in the DDL itself, and RedDB drops whole chunks once their newest row is older than the TTL. O(1) per chunk — no row scans, no DELETE tombstones. ```sql -- Shipped form (no column list, TTL keyword instead of WITH (...)): CREATE HYPERTABLE logs TIME_COLUMN ts CHUNK_INTERVAL '1d' TTL '90d'; ``` See [Partition TTL](../data-models/partition-ttl.md) for the full semantics including per-chunk overrides (mixed-TTL hypertables), preview sweep, and comparison against TimescaleDB / ClickHouse. ### Chunk-based drop (fastest, O(1)) ```sql SELECT drop_chunks('access_log', INTERVAL '90 days'); ``` Chunks whose `max_ts` is older than 90 days disappear — no row scan. ### Automatic retention policy ```sql SELECT add_retention_policy('access_log', INTERVAL '90 days'); ``` The retention daemon scans policies every 60s (configurable via `red.config.retention.interval_ms`) and drops chunks as they age. Stats are queryable: ```sql SELECT * FROM ML_JOBS WHERE kind = 'retention' ORDER BY started_at DESC LIMIT 10; ``` ### Tiered — hot to cold ```sql -- Downsample raw-resolution data after 7 days into a 1-minute rollup CREATE CONTINUOUS AGGREGATE access_1m AS SELECT time_bucket('1m', ts) bk, service, count(*) c, ... FROM access_log GROUP BY bk, service; SELECT add_retention_policy('access_log', INTERVAL '7 days'); -- access_1m has no retention — it's your long-term history. ``` 7 days of raw detail + infinite 1-minute rollup. Storage flatlines at whatever the rollup weighs, irrespective of ingest rate. --- ## 6. Correlation across models — the differentiator A classic incident question: *"For this slow endpoint, which users saw errors and what did they click next?"* ```sql WITH slow_requests AS ( SELECT trace_id, user_id, ts FROM access_log WHERE path = '/checkout' AND latency_ms > 2000 AND status >= 500 AND ts >= NOW() - INTERVAL '1 hour' ) SELECT s.user_id, s.ts AS error_ts, next_click.path AS next_path, next_click.latency_ms AS next_latency, FOLLOW(user_graph, s.user_id, 'same_device', UP TO 1) AS related_users FROM slow_requests s JOIN LATERAL ( SELECT path, ts, latency_ms FROM access_log WHERE user_id = s.user_id AND ts > s.ts ORDER BY ts LIMIT 1 ) AS next_click ON true; ``` * SQL table (`access_log`) → facts * Graph (`user_graph`) → related users via `same_device` edges * LATERAL join → "next row per user" Try expressing that across Loki + Elasticsearch + Neo4j + your in-house Python script. --- ## 7. Dashboards: example Grafana panel Point Grafana at the SQL endpoint: ```yaml datasources: - name: RedDB type: postgres # PG-wire compat url: localhost:5433 database: prod ``` Panel query: ```sql SELECT bucket AS time, service, hits FROM access_5m WHERE bucket > $__timeFrom() ORDER BY bucket; ``` Time-series panel renders instantly from the continuous aggregate. --- ## 8. Observability: watching RedDB itself ```sql -- Ingest rate SELECT count(*) AS lines_last_min FROM access_log WHERE ts > (NOW() - INTERVAL '1 minute'); -- Chunk count + size SELECT * FROM show_chunks('access_log'); -- Retention activity SELECT * FROM ML_JOBS WHERE kind = 'retention' ORDER BY started_at DESC LIMIT 5; -- Continuous aggregate freshness SELECT name, last_refreshed_bucket FROM ML_MODELS_DASHBOARD -- reused view for all incremental engines WHERE name LIKE '%access%'; ``` --- ## 9. Comparison ### vs Loki | Dim | Loki | RedDB | |------------------------|------|-------| | Write throughput | ⭐⭐⭐⭐ | ⭐⭐⭐ | | Compression ratio | ⭐⭐⭐ | ⭐⭐⭐⭐ (per-column codecs) | | SQL analytics | ❌ (LogQL only) | ✅ | | Full-text search | ⭐⭐ (greps) | ⭐⭐ (BM25) + ⭐⭐⭐⭐ semantic | | Traces / graph | ❌ | ✅ | | Multi-model (logs + metrics + docs + vectors) | ❌ | ✅ | ### vs ClickHouse | Dim | ClickHouse | RedDB | |------------------------|-----------|-------| | OLAP on billions of rows | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ (gap documented) | | Compression codecs | ⭐⭐⭐⭐⭐ (same surface)| ⭐⭐⭐⭐ (same codec family) | | Time bucket / MVs | ⭐⭐⭐⭐ (MVs, some effort) | ⭐⭐⭐⭐⭐ (continuous aggregates native) | | Vector / graph / ML | ❌ (via UDFs) | ✅ native | | Operational complexity | ⭐⭐ (cluster setup hard) | ⭐⭐⭐⭐ (single binary) | ### vs Elasticsearch | Dim | Elasticsearch | RedDB | |------------------------|--------------|-------| | Full-text search | ⭐⭐⭐⭐⭐ | ⭐⭐ BM25 + ⭐⭐⭐⭐ semantic | | Typed SQL analytics | ⭐⭐ (SQL-ish) | ⭐⭐⭐⭐⭐ | | Storage footprint | ⭐⭐ (heavy) | ⭐⭐⭐⭐ (codec-dense) | | JVM dependency | yes | no — single Rust binary | | ML classifier inline | plugin | ✅ native | ### Short version Use RedDB when: * you want to keep logs, metrics, traces, and semantic search in one engine, * you want SQL-first analytics, * you can budget roughly the same ingest rate Loki delivers but want richer queries and ML on top. **Don't** use RedDB when: * you need multi-TB/day distributed ingest right now (wait for the distributed roadmap — see [`distributed-roadmap.md`](../architecture/distributed-roadmap.md)), * your primary workload is fuzzy text search on enormous corpora (Elasticsearch is still the sharper tool there today). --- ## 10. Troubleshooting **Q: Ingest rate is low.** Check `red.config.wal.group_commit = true` (it is by default). Batch your writes — 100+ rows per INSERT is the target. Run the ingest on the same process as the store when possible (`LogPipeline`). **Q: `time_bucket` queries scan every chunk.** Check that the `WHERE` filter uses the partition key column directly (`WHERE ts >= ... AND ts < ...`). Function-wrapped predicates (`WHERE date_trunc('hour', ts) = ...`) don't prune. **Q: Continuous aggregate lags behind.** Inspect `refresh_lag` — the daemon deliberately stays that far behind `NOW()` to let in-flight writes settle. Shrink it (min: 1 s). **Q: Disk usage keeps climbing.** `add_retention_policy` OR `drop_chunks` OR set up a downsample + retention pair (§ 5). Each chunk is independent; drop is instant. **Q: I need `DELETE FROM log WHERE user_id = ...` for GDPR.** Append-only blocks ad-hoc DELETEs. The GDPR-compatible pattern is: keep a separate "redactions" table keyed by `user_id`, and left-anti-join at query time. When the retention window expires, the data disappears with the chunk. If you need active erase, declare the table without `APPEND ONLY` and accept the UPDATE cost. --- ## 11. Reference Core primitives this guide uses: * [Hypertables](../data-models/hypertables.md) — chunking, drop_chunks. * [Continuous aggregates](../data-models/continuous-aggregates.md) — incremental MVs. * [Append-only tables](../data-models/append-only-tables.md) — declarative immutability. * [Columnar execution + codecs](../engine/columnar-execution.md) — compression. * [Competitive positioning](../architecture/competitive-positioning.md) — full TS/CH comparison. * `reddb::storage::timeseries::LogPipeline` — the Rust helper shown in §3. Questions / gaps / benchmarks to add? Open an issue tagged `logs`.