--- name: clickhouse-architect description: ClickHouse schema design and optimization. TRIGGERS - ClickHouse schema, compression codecs, MergeTree, ORDER BY tuning, partition key. allowed-tools: Read, Bash, Grep, Skill --- # ClickHouse Architect Prescriptive schema design, compression selection, and performance optimization for ClickHouse (v24.4+). Covers both ClickHouse Cloud (SharedMergeTree) and self-hosted (ReplicatedMergeTree) deployments. > **Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues. ## When to Use This Skill Use this skill when: - Designing ClickHouse table schemas with ORDER BY key selection - Selecting compression codecs for column types - Configuring partition keys for data lifecycle management - Adding performance accelerators (projections, indexes, dictionaries) - Auditing and optimizing existing ClickHouse schemas ## Core Methodology ### Schema Design Workflow Follow this sequence when designing or reviewing ClickHouse schemas: 1. **Define ORDER BY key** (3-5 columns, lowest cardinality first) 2. **Select compression codecs** per column type 3. **Configure PARTITION BY** for data lifecycle management 4. **Add performance accelerators** (projections, indexes) 5. **Validate with audit queries** (see scripts/) 6. **Document with COMMENT statements** — ClickHouse table and column COMMENTs are the **single source of truth (SSoT)** for what each column means, how it was computed, and what constraints apply. No external doc, skill, or wiki supersedes the COMMENT. See [`references/schema-documentation.md`](./references/schema-documentation.md) ### ORDER BY Key Selection The ORDER BY clause is the most critical decision in ClickHouse schema design. **Rules**: - Limit to 3-5 columns maximum (each additional column has diminishing returns) - Place lowest cardinality columns first (e.g., `tenant_id` before `timestamp`) - Include all columns used in WHERE clauses for range queries - PRIMARY KEY must be a prefix of ORDER BY (or omit to use full ORDER BY) **Example**: ```sql -- Correct: Low cardinality first, 4 columns CREATE TABLE trades ( exchange LowCardinality(String), symbol LowCardinality(String), timestamp DateTime64(3), trade_id UInt64, price Float64, quantity Float64 ) ENGINE = MergeTree() ORDER BY (exchange, symbol, timestamp, trade_id); -- Wrong: High cardinality first (10x slower queries) ORDER BY (trade_id, timestamp, symbol, exchange); ``` ### Compression Codec Quick Reference | Column Type | Default Codec | Read-Heavy Alternative | Example | | ------------------------ | -------------------------- | ------------------------- | -------------------------------------------------- | | DateTime/DateTime64 | `CODEC(DoubleDelta, ZSTD)` | `CODEC(DoubleDelta, LZ4)` | `timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD)` | | Float prices/gauges | `CODEC(Gorilla, ZSTD)` | `CODEC(Gorilla, LZ4)` | `price Float64 CODEC(Gorilla, ZSTD)` | | Integer counters | `CODEC(T64, ZSTD)` | — | `count UInt64 CODEC(T64, ZSTD)` | | Slowly changing integers | `CODEC(Delta, ZSTD)` | `CODEC(Delta, LZ4)` | `version UInt32 CODEC(Delta, ZSTD)` | | String (low cardinality) | `LowCardinality(String)` | — | `status LowCardinality(String)` | | General data | `CODEC(ZSTD(3))` | `CODEC(LZ4)` | Default compression level 3 | **When to use LZ4 over ZSTD**: LZ4 provides 1.76x faster decompression. Use LZ4 for read-heavy workloads with monotonic sequences (timestamps, counters). Use ZSTD (default) when compression ratio matters or data patterns are unknown. **Note on codec combinations**: Delta/DoubleDelta + Gorilla combinations are blocked by default (`allow_suspicious_codecs`) because Gorilla already performs implicit delta compression internally—combining them is **redundant**, not dangerous. A historical corruption bug (PR #45615, Jan 2023) was fixed, but the blocking remains as a best practice guardrail. Use each codec family independently for its intended data type: ```sql -- Correct usage price Float64 CODEC(Gorilla, ZSTD) -- Floats: use Gorilla timestamp DateTime64 CODEC(DoubleDelta, ZSTD) -- Timestamps: use DoubleDelta timestamp DateTime64 CODEC(DoubleDelta, LZ4) -- Read-heavy: use LZ4 ``` ### PARTITION BY Guidelines PARTITION BY is for **data lifecycle management**, NOT query optimization. **Rules**: - Partition by time units (month, week) for TTL and data management - Keep partition count under 1000 total across all tables - Each partition should contain 1-300 parts maximum - Never partition by high-cardinality columns **Example**: ```sql -- Correct: Monthly partitions for TTL management PARTITION BY toYYYYMM(timestamp) -- Wrong: Daily partitions (too many parts) PARTITION BY toYYYYMMDD(timestamp) -- Wrong: High-cardinality partition key PARTITION BY user_id ``` ### Anti-Patterns Checklist (v24.4+) | Pattern | Severity | Modern Status | Fix | | ------------------------------- | -------- | ------------------ | ----------------------------------------------------------------- | | Too many parts (>300/partition) | Critical | Still critical | Reduce partition granularity | | Small batch inserts (<1000) | Critical | Still critical | Batch to 10k-100k rows | | High-cardinality first ORDER BY | Critical | Still critical | Reorder: lowest cardinality first | | No memory limits | High | Still critical | Set `max_memory_usage` | | Denormalization overuse | High | Still critical | Use dictionaries + materialized views | | Large JOINs | Medium | **180x improved** | Still avoid for ultra-low-latency | | Mutations (UPDATE/DELETE) | Medium | **1700x improved** | Use lightweight UPDATEs (v24.4+); see DELETE Strategy Guide below | ### DELETE Strategy Guide (v13.49.0+ Best Practices) Choose the right DELETE strategy based on scope. Ranked fastest to slowest: | Strategy | Syntax | Speed | Use When | | --------------------------- | --------------------------------------------------------- | ---------------------------- | --------------------------------------------------------------- | | `DROP PARTITION` | `ALTER TABLE t DROP PARTITION (key1, key2, keyN)` | **Instant** (metadata-only) | Purge entire partition ranges (months, corrupt data, test data) | | `DELETE IN PARTITION` | `ALTER TABLE t DELETE IN PARTITION (...) WHERE condition` | **Fast** (scans 1 partition) | Targeted row removal within a known partition | | `ALTER TABLE DELETE` | `ALTER TABLE t DELETE WHERE condition` | **Slow** (scans all parts) | Fallback when partition is unknown | | `DELETE FROM` (lightweight) | `DELETE FROM t WHERE condition` | Variable | **ANTI-PATTERN for write pipelines** — see warning below | **Anti-pattern: Lightweight `DELETE FROM` before INSERT** `DELETE FROM` sets `_row_exists=0` masks instead of physically removing rows. These ghost rows: - Persist until ClickHouse background merge (unpredictable timing) - Show up in queries without `FINAL` as phantom data - Cause false anomalies in monitoring/integrity checks - Were the root cause of months of phantom Stathera gaps in production (opendeviationbar #269) **Use `DELETE FROM` only for**: ad-hoc data correction where ghost rows don't matter (analytics cleanup, dev/test). **Never use in write pipelines** where INSERT follows DELETE. **All DELETE mutations should use**: `SETTINGS mutations_sync = 1` to block until completion (prevents INSERT-DELETE race conditions). **Partition-aware DELETE tip**: If your partition key includes the columns you're filtering on (e.g., `PARTITION BY (symbol, threshold, toYYYYMM(timestamp))`), use `DELETE IN PARTITION` to scope the scan to a single partition instead of scanning all parts. ### Table Engine Selection | Deployment | Engine | Use Case | | ------------------- | --------------------- | ------------------------------- | | ClickHouse Cloud | `SharedMergeTree` | Default for cloud deployments | | Self-hosted cluster | `ReplicatedMergeTree` | Multi-node with replication | | Self-hosted single | `MergeTree` | Single-node development/testing | **Cloud (SharedMergeTree)**: ```sql CREATE TABLE trades (...) ENGINE = SharedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}') ORDER BY (exchange, symbol, timestamp); ``` **Self-hosted (ReplicatedMergeTree)**: ```sql CREATE TABLE trades (...) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}') ORDER BY (exchange, symbol, timestamp); ``` ## Skill Delegation Guide This skill is the **hub** for ClickHouse-related tasks. When the user's needs extend beyond schema design, invoke the related skills below. ### Delegation Decision Matrix | User Need | Invoke Skill | Trigger Phrases | | ----------------------------------------------- | ------------------------------------------ | ---------------------------------------------------- | | Create database users, manage permissions | `devops-tools:clickhouse-cloud-management` | "create user", "GRANT", "permissions", "credentials" | | Configure DBeaver, generate connection JSON | `devops-tools:clickhouse-pydantic-config` | "DBeaver", "client config", "connection setup" | | Validate schema contracts against live database | `quality-tools:schema-e2e-validation` | "validate schema", "Earthly E2E", "schema contract" | ### Typical Workflow Sequence 1. **Schema Design** (THIS SKILL) → Design ORDER BY, compression, partitioning 2. **User Setup** → `clickhouse-cloud-management` (if cloud credentials needed) 3. **Client Config** → `clickhouse-pydantic-config` (generate DBeaver JSON) 4. **Validation** → `schema-e2e-validation` (CI/CD schema contracts) ### Example: Full Stack Request **User**: "I need to design a trades table for ClickHouse Cloud and set up DBeaver to query it." **Expected behavior**: 1. Use THIS skill for schema design 2. Invoke `clickhouse-cloud-management` for creating database user 3. Invoke `clickhouse-pydantic-config` for DBeaver configuration ## Performance Accelerators ### Projections Create alternative sort orders that ClickHouse automatically selects: ```sql ALTER TABLE trades ADD PROJECTION trades_by_symbol ( SELECT * ORDER BY symbol, timestamp ); ALTER TABLE trades MATERIALIZE PROJECTION trades_by_symbol; ``` ### Materialized Views Pre-compute aggregations for dashboard queries: ```sql CREATE MATERIALIZED VIEW trades_hourly_mv ENGINE = SummingMergeTree() ORDER BY (exchange, symbol, hour) AS SELECT exchange, symbol, toStartOfHour(timestamp) AS hour, sum(quantity) AS total_volume, count() AS trade_count FROM trades GROUP BY exchange, symbol, hour; ``` ### Dictionaries Replace JOINs with O(1) dictionary lookups for **large-scale star schemas**: **When to use dictionaries (v24.4+)**: - Fact tables with 100M+ rows joining dimension tables - Dimension tables 1k-500k rows with monotonic keys - LEFT ANY JOIN semantics required **When JOINs are sufficient (v24.4+)**: - Dimension tables <500 rows (JOIN overhead negligible) - v24.4+ predicate pushdown provides 8-180x improvements - Complex JOIN types (FULL, RIGHT, multi-condition) **Benchmark context**: 6.6x speedup measured on Star Schema Benchmark (1.4B rows). ```sql CREATE DICTIONARY symbol_info ( symbol String, name String, sector String ) PRIMARY KEY symbol SOURCE(CLICKHOUSE(TABLE 'symbols')) LAYOUT(FLAT()) -- Best for <500k entries with monotonic keys LIFETIME(3600); -- Use in queries (O(1) lookup) SELECT symbol, dictGet('symbol_info', 'name', symbol) AS symbol_name FROM trades; ``` ## Scripts Execute comprehensive schema audit: ```bash clickhouse-client --multiquery < scripts/schema-audit.sql ``` The audit script checks: - Part count per partition (threshold: 300) - Compression ratios by column - Query performance patterns - Replication lag (if applicable) - Memory usage patterns ## Additional Resources ### Reference Files | Reference | Content | | ------------------------------------------------------------------------------------------ | ---------------------------------------------- | | [`references/schema-design-workflow.md`](./references/schema-design-workflow.md) | Complete workflow with examples | | [`references/compression-codec-selection.md`](./references/compression-codec-selection.md) | Decision tree + benchmarks | | [`references/anti-patterns-and-fixes.md`](./references/anti-patterns-and-fixes.md) | 13 deadly sins + v24.4+ status | | [`references/audit-and-diagnostics.md`](./references/audit-and-diagnostics.md) | Query interpretation guide | | [`references/idiomatic-architecture.md`](./references/idiomatic-architecture.md) | Parameterized views, dictionaries, dedup | | [`references/schema-documentation.md`](./references/schema-documentation.md) | COMMENT patterns + naming for AI understanding | | [`references/cache-schema-evolution.md`](./references/cache-schema-evolution.md) | Cache invalidation + schema evolution patterns | ### External Documentation - [ClickHouse Best Practices](https://clickhouse.com/docs/best-practices) - [Altinity Knowledge Base](https://kb.altinity.com/) - [ClickHouse Blog](https://clickhouse.com/blog) ## Python Driver Policy **Use `clickhouse-connect` (official) for all Python integrations.** ```python # ✅ RECOMMENDED: clickhouse-connect (official, HTTP) import clickhouse_connect client = clickhouse_connect.get_client( host='localhost', port=8123, # HTTP port username='default', password='' ) result = client.query("SELECT * FROM trades LIMIT 1000") df = client.query_df("SELECT * FROM trades") # Pandas integration ``` ### Why NOT `clickhouse-driver` | Factor | clickhouse-connect | clickhouse-driver | | --------------- | ------------------ | ------------------- | | Maintainer | ClickHouse Inc. | Solo developer | | Weekly commits | Yes (active) | Sparse (months) | | Open issues | 41 (addressed) | 76 (accumulating) | | Downloads/week | 2.7M | 1.5M | | Bus factor risk | Low (company) | **High (1 person)** | **Do NOT use `clickhouse-driver`** despite its ~26% speed advantage for large exports. The maintenance risk outweighs performance gains: - Single maintainer (mymarilyn) with no succession plan - Issues accumulating without response - Risk of abandonment breaks production code **Exception**: Only consider `clickhouse-driver` if you have extreme performance requirements (exporting millions of rows) AND accept the maintenance risk. ## ClickHouse COMMENT = Single Source of Truth **Every ClickHouse table and column MUST have a COMMENT that fully documents its meaning, computation method, and constraints.** The COMMENT is the SSoT — no external document, skill, or wiki supersedes it. ### Why - ClickHouse COMMENTs have **no length limit**, support newlines, URLs, and unicode - **Zero performance impact** on queries (pure metadata, never in data path) - Visible via `DESCRIBE table`, `SHOW CREATE TABLE`, `system.columns` - Survives schema migrations (preserved through ALTER operations) ### What to Include in COMMENTs - **Column purpose** in plain English - **Computation formula** (if derived/computed) - **Unit** (seconds, milliseconds, bps, ratio) - **Valid range** or enum values - **Anti-patterns** (what NOT to do with this column) - **GitHub issue link** for provenance - **Source script** that populates the column ### Example ```sql ALTER TABLE t COMMENT COLUMN session_label 'STRICT session label. 8 values: sydney_only, tokyo_only, ... Only set when ENTIRE bar (open→close) falls within one session. cross_session = bar spans boundary. Use WHERE is_pure_session=1. GitHub: https://github.com/org/repo/issues/54 Source: scripts/populate-sessions/populate_v3.py'; ``` ### Anti-Pattern **NEVER** create a ClickHouse column without a COMMENT. A column without documentation is a column that will be misused. ## Related Skills | Skill | Purpose | | ------------------------------------------ | ----------------------------- | | `devops-tools:clickhouse-cloud-management` | User/permission management | | `devops-tools:clickhouse-pydantic-config` | DBeaver connection generation | | `quality-tools:schema-e2e-validation` | YAML schema contracts | | `quality-tools:multi-agent-e2e-validation` | Database migration validation | --- ## Troubleshooting | Issue | Cause | Solution | | ---------------------------- | ------------------------------ | ------------------------------------------------ | | Too many parts | Over-partitioned | Reduce partition granularity (monthly not daily) | | Slow queries | Wrong ORDER BY order | Put lowest cardinality columns first | | High memory usage | No memory limits set | Configure max_memory_usage setting | | Codec error on Delta+Gorilla | Suspicious codec combination | Use each codec family independently | | Projection not used | Optimizer chose different plan | Check EXPLAIN to verify projection selection | | Dictionary stale | Lifetime expired | Increase LIFETIME or trigger refresh | | Replication lag | Part merges falling behind | Check merge_tree settings, add resources | | INSERT too slow | Small batch sizes | Batch to 10k-100k rows per INSERT | ## Post-Execution Reflection After this skill completes, check before closing: 1. **Did the command succeed?** — If not, fix the instruction or error table that caused the failure. 2. **Did parameters or output change?** — If the underlying tool's interface drifted, update Usage examples and Parameters table to match. 3. **Was a workaround needed?** — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround. Only update if the issue is real and reproducible — not speculative.