# Logs Quickstart: 5 minutes to a working pipeline

This is the ultra-condensed version of [Using RedDB for Logs](./using-reddb-for-logs.md). Copy and paste the snippets; read the expanded guide when you want the why.

## 1. Declare the hypertable

> **Status note:** the column-list + `CODEC(...)` form below is **planned**
> (see [hypertables.md](../data-models/hypertables.md#extended-column-syntax-planned)).
> The shipped form takes no column list:
>
> ```sql
> CREATE TABLE logs (ts BIGINT, service TEXT, severity INT, message TEXT, trace_id TEXT);
> CREATE HYPERTABLE logs TIME_COLUMN ts CHUNK_INTERVAL '1d' TTL '30d';
> CREATE INDEX logs_service_ts ON logs (service, ts);
> ```

```sql
-- Planned future form (single DDL with column list + per-column codecs):
CREATE HYPERTABLE logs (
  ts        BIGINT,
  service   TEXT CODEC(Dict, LZ4),
  severity  INT  CODEC(T64),
  message   TEXT CODEC(ZSTD(6)),
  trace_id  TEXT CODEC(LZ4)
) CHUNK_INTERVAL '1d';
```

## 2. Ingest (batched)

```python
import time

now_ns = time.time_ns  # assumption: ts holds nanoseconds since the Unix epoch

# `db` is an already-connected RedDB client.
db.insert_many("logs", [
    {"ts": now_ns(), "service": "api",  "severity": 2, "message": "ok",    "trace_id": "t1"},
    {"ts": now_ns(), "service": "db",   "severity": 4, "message": "slow",  "trace_id": "t2"},
    {"ts": now_ns(), "service": "auth", "severity": 3, "message": "retry", "trace_id": "t3"},
])
```

## 3. Dashboard query

```sql
SELECT
  time_bucket('1m', ts) AS bucket,
  service,
  count(*) AS hits,
  count_if(severity >= 4) AS errors,
  -- latency_ms is not in the step 1 schema; add that column or drop this line.
  quantileTDigest(0.99, latency_ms) AS p99
FROM logs
WHERE ts >= NOW() - INTERVAL '1 hour'
GROUP BY bucket, service
ORDER BY bucket;
```

## 4. Continuous aggregate for fast dashboards

```sql
CREATE CONTINUOUS AGGREGATE logs_1m AS
  SELECT
    time_bucket('1m', ts) bk,
    service,
    count(*) hits,
    count_if(severity >= 4) errors
  FROM logs
  GROUP BY bk, service
WITH (refresh_lag = '30s');
```

Dashboards query `logs_1m` instead of the raw table, which gives sub-second response on billions of rows.

## 5. Retention: pick one

**Declarative (simplest)**: attach a TTL at CREATE time. Chunks disappear once their newest row passes the TTL, and each drop is an O(1) metadata operation.
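The expiry rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not RedDB's implementation: the `Chunk` class, `expired_chunks`, and `sweep` are hypothetical names, timestamps are assumed to be nanoseconds, and treating a chunk as expired only when its age strictly exceeds the TTL is an assumption about the boundary case.

```python
from dataclasses import dataclass

DAY_NS = 24 * 60 * 60 * 1_000_000_000  # one day in nanoseconds


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk metadata; not RedDB's actual structure."""
    name: str
    newest_ts: int  # timestamp of the newest row in the chunk, in ns


def expired_chunks(chunks, now_ns, ttl_ns):
    """Return chunks whose newest row is older than the TTL."""
    return [c for c in chunks if now_ns - c.newest_ts > ttl_ns]


def sweep(chunks, now_ns, ttl_ns):
    """Return the surviving chunks, deciding from metadata alone."""
    doomed = set(expired_chunks(chunks, now_ns, ttl_ns))
    return [c for c in chunks if c not in doomed]
```

Because the decision only reads one timestamp per chunk, no row is scanned or rewritten, which is why a drop costs the same regardless of how many rows the chunk holds.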
```sql
-- At creation (shipped):
CREATE HYPERTABLE logs TIME_COLUMN ts CHUNK_INTERVAL '1d' TTL '30d';

-- Or after the fact via policy daemon (planned):
SELECT add_retention_policy('logs', INTERVAL '30 days');
```

See [Partition TTL](../data-models/partition-ttl.md) for per-chunk overrides, preview sweep, and the cost model.

## 6. Semantic search

```sql
CREATE EMBEDDING COLUMN message_vec ON logs (message)
  USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
  ON CHANGE REFRESH;

SELECT ts, service, message
FROM logs
WHERE SIMILARITY(message_vec, EMBEDDING('database timeout', 'openai')) > 0.80
ORDER BY ts DESC
LIMIT 50;
```

## 7. Error anomaly classification (optional)

```sql
CREATE MODEL log_anomaly
  TYPE CLASSIFIER
  ALGORITHM LOGISTIC_REGRESSION
  FROM (SELECT message, (severity >= 4) AS is_error FROM logs LIMIT 100000)
  FEATURES (TF_IDF(message))
  TARGET is_error
  WITH (async = true);

SELECT ts, message, ML_CLASSIFY_PROBA('log_anomaly', message) AS anomaly_score
FROM logs
WHERE ts > NOW() - INTERVAL '10 minutes'
ORDER BY anomaly_score DESC
LIMIT 20;
```

## That's it

Read the [full guide](./using-reddb-for-logs.md) for schema design best practices, comparison vs Loki / ClickHouse / Elasticsearch, troubleshooting, and multi-model correlation patterns.