--- name: clickhouse-query-optimization description: "20+ optimization rules for ClickHouse queries. Load when writing queries, debugging slow queries, fixing memory errors, or optimizing JOINs. Achieves 10-100x faster queries, 90% less memory usage." --- # ClickHouse Query Optimization Load when writing, debugging, or optimizing ClickHouse queries. ## Goals - 10-100x faster query execution - 90%+ reduction in memory usage - Eliminate full table scans (read <1% of data) - JOINs that complete in milliseconds instead of minutes ## Reference Documentation - [Query Optimization](https://clickhouse.com/docs/optimize/query-optimization) - [PREWHERE](https://clickhouse.com/docs/optimize/prewhere) - [JOINs](https://clickhouse.com/docs/best-practices/minimize-optimize-joins) - [Dictionaries](https://clickhouse.com/docs/sql-reference/dictionaries) - [Query Settings](https://clickhouse.com/docs/operations/settings/query-level) **Search terms:** query optimization, PREWHERE, JOIN, dictionary, EXPLAIN, performance, slow query, memory limit, timeout, full table scan, dictGet, query cache, DELETE, UPDATE, mutation, aggregation, GROUP BY, COUNT DISTINCT, uniq, query_log ## Critical Rules ### [CRITICAL] 1. **Filter on ORDER BY prefix columns.** Queries that skip the prefix scan the entire table. 2. **Avoid SELECT *.** Only select columns you need. 3. **Always check EXPLAIN indexes=1.** Verify index usage and row estimates before optimizing. ### [HIGH] 4. **Use PREWHERE for selective filters.** Reduces I/O before main WHERE clause. 5. **Prefer dictionaries over JOINs.** For key-value lookups on tables <10M rows. 6. **Use IN subqueries for existence checks.** Faster than JOIN when you don't need columns from the other table. ### [MEDIUM] 7. **Limit JOINs when possible.** Each JOIN increases complexity exponentially. ## Querying basics: ORDER BY, indexes and PREWHERE ### ORDER BY The `ORDER BY` clause in table definition determines query performance. ClickHouse stores data sorted by these columns and uses a sparse primary index to skip irrelevant data. **How it works:** - Data is divided into granules (default 8192 rows), can vary - Primary index stores first value of each granule for ORDER BY columns - Queries filtering on ORDER BY prefix in exact order or a prefix of it skip entire granules ```sql -- Table: ORDER BY (tenant_id, event_date, user_id) -- FAST: Filters on ORDER BY prefix (left to right) WHERE tenant_id = 1 -- Uses index WHERE tenant_id = 1 AND event_date = '2024-01-15' -- Uses index WHERE tenant_id = 1 AND event_date >= '2024-01-01' AND user_id = 123 -- Uses index -- SLOW: Skips columns or filters only on later columns WHERE event_date = '2024-01-15' -- Skips tenant_id, full scan WHERE user_id = 123 -- Skips tenant_id and event_date, full scan ``` **Key insight:** Always filter on the leftmost ORDER BY columns first for best performance. ### Indexes Skip indexes (data skipping indexes) help when filtering on columns NOT in ORDER BY: ```sql -- Table has ORDER BY (tenant_id, event_date, user_id) -- But you need to query by session_id -- Add bloom filter index ALTER TABLE events ADD INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 4; -- Now this query can skip granules instead of full scan SELECT * FROM events WHERE session_id = 'abc123'; ``` **Check if index is used:** ```sql EXPLAIN indexes = 1 SELECT * FROM events WHERE session_id = 'abc123'; -- Look for "Skip" section showing granules skipped ``` **Common index types:** - `bloom_filter` - High-cardinality columns (UUIDs, session IDs) - `minmax` - Range queries on numeric/date columns - `set(N)` - Low-cardinality columns ( **Note:** Skip indexes work similarly—they allow ClickHouse to skip granules for columns not in ORDER BY. Both mechanisms achieve the same goal: read less data by eliminating irrelevant granules before touching the actual rows. **Step 2: PREWHERE (column filtering within granules)** ``` For the 3 matching granules (24,576 rows): 1. Read tenant_id column → 98 KB (4 bytes × 24,576) 2. Read event_date column → 98 KB (4 bytes × 24,576) 3. Read status column → 98 KB (4 bytes × 24,576) 4. Filter: 1,247 rows match PREWHERE conditions 5. Read user_id column → 10 KB (8 bytes × 1,247 rows only) 6. Read error_message column → 623 KB (500 bytes × 1,247 rows only) 7. Read payload column → 2.4 MB (2KB × 1,247 rows only) Total read: ~3.3 MB Without PREWHERE: ~62 MB (read all columns for all 24,576 rows) ``` **Step 3: WHERE (final filtering)** ``` Apply: error_message LIKE '%timeout%' Result: 43 rows returned ``` ### Impact Reading fewer columns: - Reduces disk I/O - Improves compression efficiency - Lowers memory usage ## JOIN Optimization ### Decision Tree ``` Need data from another table? │ ├─ Only filtering rows, don't need columns from other table? │ Example: "Get orders, but only for premium users" │ └─ Use IN subquery │ ├─ Need columns from other table? │ Example: "Get orders WITH user name and email" │ │ │ ├─ Small lookup table (<10M rows), staleness OK? │ │ └─ Use Dictionary │ │ │ └─ Need real-time data or large table? │ └─ Use JOIN with filtered subqueries │ └─ Same join needed on every query? └─ Denormalize at insert time ``` ### Full Example: Dictionary vs JOIN ```sql -- Tables -- events: 500M rows, ORDER BY (tenant_id, event_date, user_id) -- users: 2M rows (user_id, name, tier, country) -- Task: Get user name and tier for today's events ``` **Slow: JOIN** ```sql SELECT e.event_time, e.event_type, u.name, u.tier FROM events e JOIN users u ON e.user_id = u.user_id WHERE e.tenant_id = 1 AND e.event_date = today(); -- Execution: -- 1. Scan events: 850K rows -- 2. Build hash table from users: 2M rows loaded into memory -- 3. Probe hash table for each event row -- Time: 3.2s, Memory: 890MB ``` **Fast: Dictionary** ```sql -- Create once CREATE DICTIONARY user_dict ( user_id UInt64, name String, tier String ) PRIMARY KEY user_id SOURCE(CLICKHOUSE(TABLE 'users')) LAYOUT(HASHED()) LIFETIME(300); -- Query SELECT event_time, event_type, dictGet('user_dict', 'name', user_id) AS name, dictGet('user_dict', 'tier', user_id) AS tier FROM events WHERE tenant_id = 1 AND event_date = today(); -- Execution: -- 1. Scan events: 850K rows -- 2. Dictionary lookup: O(1) hash lookup per row, already in memory -- Time: 0.4s, Memory: 120MB ``` **Tradeoff:** Dictionaries are eventually consistent. `LIFETIME(300)` means data can be up to 5 minutes stale. For real-time accuracy, use JOIN. Always ask the user if this delay is acceptable before implementing a dictioary. ```sql -- Check when dictionary was last updated SELECT name, last_successful_update_time, loading_duration FROM system.dictionaries; -- Force immediate reload SYSTEM RELOAD DICTIONARY user_dict; ``` ### IN Subquery for Filtering When you only need to filter (not fetch columns): ```sql -- Get orders from premium users SELECT * FROM orders WHERE user_id IN ( SELECT user_id FROM users WHERE tier = 'premium' ); -- NOT this (builds unnecessary hash table) SELECT o.* FROM orders o JOIN users u ON o.user_id = u.user_id WHERE u.tier = 'premium'; ``` ### When You Must JOIN Filter both sides first, smaller table on the right: ````sql SELECT e.event_time, u.name, u.email FROM ( SELECT event_time, user_id FROM events WHERE tenant_id = 1 AND event_date = today() ) e JOIN ( SELECT user_id, name, email FROM users WHERE active = 1 ) u ON e.user_id = u.user_id; ## Query Analysis ### EXPLAIN Queries ```sql -- Basic execution plan EXPLAIN SELECT ...; -- Show index usage (most useful) EXPLAIN indexes = 1 SELECT ...; -- Show execution pipeline EXPLAIN PIPELINE SELECT ...; -- Show query AST EXPLAIN AST SELECT ...; ```` ### Reading EXPLAIN indexes Output ```sql EXPLAIN indexes = 1 SELECT count() FROM events WHERE tenant_id = 1 AND event_date = '2024-01-15'; -- Output interpretation: -- ReadFromMergeTree -- Indexes: -- PrimaryKey -- Keys: tenant_id, event_date -- Condition: (tenant_id = 1) AND (event_date = '2024-01-15') -- Parts: 5/120 <- 5 parts selected out of 120 -- Granules: 42/15000 <- 42 granules out of 15000 (good!) ``` ### Find Slow Queries ```sql -- Recent slow queries SELECT query, query_duration_ms, read_rows, read_bytes, memory_usage FROM system.query_log WHERE type = 'QueryFinish' AND query_duration_ms > 1000 AND event_date >= today() - 1 ORDER BY query_duration_ms DESC LIMIT 20; -- Queries with full table scans SELECT query, read_rows, total_rows_approx FROM system.query_log WHERE type = 'QueryFinish' AND read_rows > 0.9 * total_rows_approx -- Reading >90% of table AND event_date >= today() - 1 ORDER BY read_rows DESC LIMIT 10; ``` ### Profile Query Execution ```sql -- Enable query profiling SET log_queries = 1; SET log_query_threads = 1; -- Run your query SELECT ...; -- Check thread-level metrics SELECT thread_name, sum(read_rows) AS rows, sum(read_bytes) AS bytes, max(peak_memory_usage) AS peak_memory FROM system.query_thread_log WHERE query_id = '' GROUP BY thread_name; ``` ## Query Cache (22.8+) Cache query results for repeated identical queries: ```sql -- Enable query cache for a session SET use_query_cache = 1; -- Run query (first execution populates cache) SELECT count(*), avg(amount) FROM orders WHERE order_date >= '2024-01-01'; -- Subsequent identical queries return cached result instantly -- Cache invalidated automatically when underlying data changes ``` **Configuration:** ```sql -- Set cache TTL (default: 60 seconds) SET query_cache_ttl = 300; -- 5 minutes -- Minimum rows to cache (skip small results) SET query_cache_min_query_runs = 2; -- Cache after 2nd run -- Check cache status SELECT * FROM system.query_cache; ``` **Best for:** - Dashboard queries hitting same data repeatedly - Reports regenerated frequently - Multi-tenant queries with shared dimensions - Queries that do not need real-time data Update rows without full table rewrite: ```sql -- Update specific columns ALTER TABLE events UPDATE status = 'processed' WHERE event_id = 12345; -- Can update multiple columns ALTER TABLE events UPDATE status = 'cancelled', cancelled_at = now() WHERE order_id IN (SELECT order_id FROM cancelled_orders); ``` **Limitations:** - Cannot update columns in ORDER BY or PARTITION BY - Uses mutations (async, check progress in system.mutations) ```sql -- Check mutation progress SELECT table, command, is_done, latest_fail_reason FROM system.mutations WHERE database = 'default' ORDER BY create_time DESC; ``` ## EXPLAIN ESTIMATE Get row/byte estimates without running query: ```sql -- Estimate data to be read EXPLAIN ESTIMATE SELECT * FROM events WHERE tenant_id = 1 AND event_date = '2024-01-15'; -- Output shows: -- estimated_rows -- estimated_bytes -- number of parts/granules to read ``` **Use for:** - Validating filter effectiveness before running - Cost comparison between query approaches - Capacity planning ## Troubleshooting **Always ask for user confirmation before applying schema changes (adding indexes, creating MVs, altering tables).** ### Memory Limit Exceeded **Problem:** `Memory limit exceeded`, `DB::Exception: Memory limit exceeded`, query killed by OOM **Diagnose:** ```sql EXPLAIN indexes = 1 SELECT ...; -- Check if filtering is effective ``` **Solutions:** | Cause | Fix | | ------------------------- | ------------------------------------------------------------- | | High-cardinality GROUP BY | Use approximate: `uniqHLL12()` instead of `COUNT(DISTINCT)` | | Large aggregation | Pre-aggregate with `AggregatingMergeTree` + Materialized View | | Full table scan | Add filters on ORDER BY columns (see below) | | Too many columns | Select only needed columns | ```sql -- BAD: Exact distinct on high cardinality SELECT COUNT(DISTINCT user_id) FROM events; -- GOOD: Approximate (2% error, 10x less memory) SELECT uniqHLL12(user_id) FROM events; -- BETTER: Pre-aggregate with MV (see clickhouse-materialized-views skill) ``` ### Full Table Scan Despite Filters **Problem:** Query reads all rows, slow despite WHERE clause, `Granules: N/N` in EXPLAIN **Diagnose:** ```sql EXPLAIN indexes = 1 SELECT * FROM events WHERE user_id = 123; -- If "Granules: N/N" → index not used ``` **Solutions:** | Cause | Fix | | ----------------------------- | ------------------------------------------------ | | Filter not on ORDER BY prefix | Add ORDER BY columns to WHERE, or add skip index | | Function on column | Store computed column, filter on that | | OR across ORDER BY columns | Rewrite as UNION ALL | | Type mismatch | Match filter type to column type | ```sql -- BAD: Function prevents index use WHERE toDate(event_time) = '2024-01-15' -- GOOD: Store date column, filter on it WHERE event_date = '2024-01-15' -- BAD: Skips ORDER BY prefix (tenant_id, event_date, user_id) WHERE user_id = 123 -- GOOD: Include prefix columns WHERE tenant_id = 1 AND event_date = today() AND user_id = 123 -- ALT: Add skip index if you can't change query ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4; ``` ### Slow JOINs **Problem:** JOINs take minutes, high memory during JOIN, query hangs on JOIN step **Solutions (in order of preference):** | Approach | When to Use | | ------------------ | --------------------------------------------------- | | Dictionary | Lookup table <10M rows, staleness OK | | IN subquery | Only filtering, don't need columns from other table | | Filter before JOIN | Must JOIN, but can reduce rows first | | Denormalize | Same JOIN on every query | ```sql -- BAD: JOIN builds hash table of entire users table SELECT e.*, u.name FROM events e JOIN users u ON e.user_id = u.user_id; -- GOOD: Dictionary lookup (see "Dictionary vs JOIN" section) SELECT *, dictGet('user_dict', 'name', user_id) AS name FROM events; -- GOOD: IN subquery when only filtering SELECT * FROM events WHERE user_id IN (SELECT user_id FROM users WHERE tier = 'premium'); -- GOOD: Filter both sides before JOIN SELECT e.*, u.name FROM (SELECT * FROM events WHERE tenant_id = 1 AND event_date = today()) e JOIN (SELECT user_id, name FROM users WHERE active = 1) u ON e.user_id = u.user_id; ``` ### High Disk I/O **Problem:** Queries slow with high `read_bytes`, disk wait times high, reading GBs for simple queries **Solutions:** | Cause | Fix | | ------------------- | ------------------------------------------------ | | Reading all columns | Select only needed columns | | No early filtering | Use PREWHERE on small, selective columns | | Missing skip index | Add bloom_filter for high-cardinality lookups | | Poor compression | Use appropriate codecs (see schema-design skill) | ```sql -- BAD: Reads all columns, filters late SELECT * FROM events WHERE status = 'error'; -- GOOD: PREWHERE filters before reading large columns SELECT event_time, error_message FROM events PREWHERE status = 'error' WHERE error_message LIKE '%timeout%'; ```