--- name: oceanbase-sql-optimization description: SQL optimization best practices for OceanBase database (MySQL & Oracle modes). Covers query optimization, index usage, execution plan analysis, slow query tuning, and performance optimization techniques. Activates for SQL optimization, query performance, index design, execution plan, slow query, database performance. license: MIT metadata: audience: developers, database-administrators domain: database-optimization compatibility: mysql-mode, oracle-mode --- # OceanBase SQL Optimization Skill **Expert in SQL query optimization for OceanBase distributed database (MySQL & Oracle modes).** Provides best practices for writing efficient SQL queries, designing effective indexes, analyzing execution plans, and tuning slow queries in OceanBase. This skill covers both MySQL mode and Oracle mode. --- ## Mode-Specific Syntax Guide While OceanBase uses the same underlying optimizer for both MySQL and Oracle modes, there are important syntax differences: | Aspect | MySQL Mode | Oracle Mode | |--------|-----------|-------------| | **EXPLAIN Syntax** | `EXPLAIN [INTO table_name] [SET statement_id = string]` | `EXPLAIN [INTO table_name] [SET STATEMENT_ID = 'string']` | | **System Views** | `oceanbase.GV$OB_SQL_AUDIT` | `GV$OB_SQL_AUDIT` (same name, different schema access) | | **Data Types** | `BIGINT`, `VARCHAR` | `NUMBER`, `VARCHAR2` | | **Time Functions** | `TIME_TO_USEC()`, `usec_to_time()` | `TO_DATE()`, `TO_CHAR()` | | **Schema Access** | `database.table` | `schema.table` | | **Limit Syntax** | `LIMIT n` | `FETCH FIRST n ROWS ONLY` | | **String Concatenation** | `CONCAT()` | `\|\|` operator | **Optimization Principles:** - ✅ **Same**: Execution plan operators, join algorithms, partition pruning logic - ✅ **Same**: Index design principles, query rewriting strategies - ❌ **Different**: Syntax, system view access, data type names, function names Examples in this skill are marked with mode indicators: **MySQL Mode:** or **Oracle Mode:** --- ## Core Optimization Principles ### 1. Understand Execution Plans **Always analyze execution plans before optimization:** **MySQL Mode:** ```sql obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= '2024-01-01'; ``` **Oracle Mode:** ```sql obclient [SALES_DB]> EXPLAIN SELECT * FROM order_table WHERE order_date >= DATE '2024-01-01'; ``` 查询结果如下: ``` +------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------+ | =========================================== | | |ID|OPERATOR |NAME |EST. ROWS|COST | | | ------------------------------------------- | | |0 |TABLE SCAN |order_table|1000 |1000 | | | |1 | TABLE GET |order_table|1000 |1000 | | +=========================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | filter([order_table.order_date >= '2024-01-01']), | | access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | partitions(p0) | +------------------------------------------------------------------------------------------------+ ``` **Key indicators to check:** - **EST. ROWS**: Estimated rows processed (lower is better) - **EST.TIME(us)**: Estimated execution time in microseconds (lower is better) - **OPERATOR**: Look for `TABLE SCAN` (full table scan, bad) vs `TABLE GET` (index lookup, good) - **Partitions**: Check if partition pruning is working (e.g., `partitions(p1)` means only one partition accessed) - **is_index_back**: `false` means no index back, `true` means index back required ### 2. Index Design Best Practices **Create indexes on frequently queried columns (syntax is the same for both modes):** ```sql -- ✅ GOOD: Index on WHERE clause column obclient [SALES_DB]> CREATE INDEX idx_order_date ON order_table(order_date); -- ✅ GOOD: Composite index for multi-column queries (column order matters) obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date); -- ✅ GOOD: Unique index for unique constraints obclient [SALES_DB]> CREATE UNIQUE INDEX idx_order_number ON order_table(order_number); -- ✅ GOOD: Index with STORING clause for covering index obclient [SALES_DB]> CREATE INDEX idx_customer_date ON order_table(customer_id, order_date) STORING (total_amount, status); -- ❌ BAD: Index on rarely queried columns obclient [SALES_DB]> CREATE INDEX idx_notes ON order_table(notes); ``` **OceanBase Index Types:** - **UNIQUE Index**: Ensures uniqueness, fast lookups - **LOCAL Index**: Partition-local index (default for partitioned tables) - **GLOBAL Index**: Cross-partition index - **Function Index**: Index on expressions (e.g., `CREATE INDEX idx_expr ON t1((c1 + c2))`) **Index selection rules:** - Index columns used in WHERE clauses - Index columns used in JOIN conditions - Consider composite indexes for multi-column filters (column order matters - most selective first) - Use STORING clause to create covering indexes (avoids index back) - Avoid over-indexing (each index adds write overhead) - For partitioned tables, prefer LOCAL indexes unless cross-partition queries are common ### 3. Query Writing Best Practices **Use specific column lists instead of SELECT \*:** ```sql -- ✅ GOOD: Select only needed columns obclient [SALES_DB]> SELECT order_id, customer_id, total_amount FROM order_table WHERE order_date >= '2024-01-01'; -- ❌ BAD: Select all columns obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01'; ``` **Use appropriate WHERE conditions:** ```sql -- ✅ GOOD: Use indexed column in WHERE obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345; -- ✅ GOOD: Use range queries on indexed columns obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'; -- ❌ BAD: Function on indexed column prevents index usage obclient [SALES_DB]> SELECT * FROM order_table WHERE YEAR(order_date) = 2024; -- ✅ GOOD: Rewrite to use index obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'; ``` --- ## Partition Pruning Optimization Partition pruning avoids accessing irrelevant partitions, significantly improving SQL execution efficiency. Always check the `partitions` field in the execution plan to verify partition pruning is working. ### Hash Partitioning **Ensure partition key is in WHERE clause:** ```sql -- ✅ GOOD: Partition key in WHERE clause enables partition pruning obclient [SALES_DB]> CREATE TABLE order_table ( order_id INT, customer_id INT, order_date DATE ) PARTITION BY HASH(order_id) PARTITIONS 5; obclient [SALES_DB]> SELECT * FROM order_table WHERE order_id = 12345; ``` 查询结果如下: ``` +------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |TABLE SCAN|order_table|990 |383 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | filter(nil), | | access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | partitions(p1) -- Only one partition accessed! | +------------------------------------------------------------------------------------------------+ ``` ```sql -- ❌ BAD: Missing partition key prevents pruning (scans all partitions) obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id = 1001; -- Check execution plan: partitions(p0, p1, p2, p3, p4) - all partitions scanned ``` ### Range Partitioning **Use range conditions that match partition boundaries:** ```sql -- ✅ GOOD: Range query matches partition boundaries obclient [SALES_DB]> CREATE TABLE order_table ( order_id INT, customer_id INT, order_date DATE ) PARTITION BY RANGE(order_date) ( PARTITION p0 VALUES LESS THAN('2024-01-01'), PARTITION p1 VALUES LESS THAN('2024-02-01'), PARTITION p2 VALUES LESS THAN('2024-03-01') ); obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'; ``` 查询结果如下: ``` +------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------+ | ================================================= | | |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| | | ------------------------------------------------- | | |0 |TABLE SCAN|order_table|1 |46 | | | ================================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | filter([order_table.order_date >= '2024-01-01'], [order_table.order_date < '2024-02-01']), | access([order_table.order_id], [order_table.customer_id], [order_table.order_date]), | | partitions(p1) -- Only partition p1 accessed! | +------------------------------------------------------------------------------------------------+ ``` ```sql -- ❌ BAD: Function prevents partition pruning obclient [SALES_DB]> SELECT * FROM order_table WHERE MONTH(order_date) = 1; -- Check execution plan: partitions(p0, p1, p2) - all partitions scanned -- ✅ GOOD: Rewrite to use partition pruning obclient [SALES_DB]> SELECT * FROM order_table WHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'; ``` **Partition Pruning Rules:** - For Hash/List partitions: Partition key must be in WHERE clause with equality or IN conditions - For Range partitions: Use range conditions that match partition boundaries - Avoid functions on partition keys in WHERE clauses - For expressions as partition keys, the expression must appear as a whole in WHERE clause --- ## Join Optimization OceanBase supports three join algorithms: **Nested Loop Join (NLJ)**, **Hash Join (HJ)**, and **Merge Join (MJ)**. The optimizer automatically selects the best algorithm based on statistics and cost estimation. ### Join Algorithms **1. Nested Loop Join (NLJ)** - Best for: Small outer table, indexed inner table - Use when: Join condition has index on inner table - Hint: `/*+ USE_NL(table1, table2) */` ```sql -- ✅ GOOD: NLJ with indexed join column obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id); obclient [SALES_DB]> EXPLAIN SELECT /*+USE_NL(c, o)*/ o.order_id, c.customer_name FROM customer_table c INNER JOIN order_table o ON c.customer_id = o.customer_id WHERE c.customer_type = 'VIP'; ``` 查询结果如下: ``` +------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------+ | =========================================== | | |ID|OPERATOR |NAME |EST. ROWS|EST.TIME(us)| | | ------------------------------------------- | | |0 |NESTED-LOOP JOIN| |990 |37346 | | | |1 | TABLE SCAN |c |999 |669 | | | |2 | TABLE GET |o |1 |36 | -- Index lookup! | | =========================================== | | Outputs & filters: | | ------------------------------------- | | 0 - output([o.order_id], [c.customer_name]), filter(nil), | | conds(nil), nl_params_([c.customer_id]) | | 1 - output([c.customer_id], [c.customer_name]), filter([c.customer_type = 'VIP']), | | access([c.customer_id], [c.customer_name]), partitions(p0) | | 2 - output([o.order_id]), filter(nil), | | access([o.order_id]), partitions(p0), | | range_key([o.customer_id]), range([? = o.customer_id]) -- Index used! | +------------------------------------------------------------------------------------------------+ ``` **2. Hash Join (HJ)** - Best for: Large datasets, equal join conditions - Use when: Both tables are large and join condition is equality - Hint: `/*+ USE_HASH(table1, table2) */` ```sql -- ✅ GOOD: Hash Join for large table joins obclient [SALES_DB]> EXPLAIN SELECT /*+USE_HASH(o, c)*/ o.order_id, c.customer_name FROM order_table o INNER JOIN customer_table c ON o.customer_id = c.customer_id; ``` 查询结果如下: ``` +------------------------------------------------------------------------------------------------+ | Query Plan | +------------------------------------------------------------------------------------------------+ | ======================================= | | |ID|OPERATOR |NAME|EST. ROWS|EST.TIME(us)| | | --------------------------------------- | | |0 |HASH JOIN | |98010000 |66774608 | | | |1 | TABLE SCAN|o |100000 |68478 | | | |2 | TABLE SCAN|c |100000 |68478 | | | ======================================= | | Outputs & filters: | | ------------------------------------- | | 0 - output([o.order_id], [c.customer_name]), filter(nil), | | equal_conds([o.customer_id = c.customer_id]), other_conds(nil) | +------------------------------------------------------------------------------------------------+ ``` **3. Merge Join (MJ)** - Best for: Pre-sorted data, ordered results needed - Use when: Both inputs are already sorted or can be sorted efficiently - Hint: `/*+ USE_MERGE(table1, table2) */` ```sql -- ✅ GOOD: Merge Join when data is sorted obclient [SALES_DB]> EXPLAIN SELECT /*+USE_MERGE(o, c)*/ o.order_id, c.customer_name FROM order_table o INNER JOIN customer_table c ON o.customer_id = c.customer_id ORDER BY o.customer_id; ``` ### Join Order **Use LEADING hint to control join order:** ```sql -- ✅ GOOD: Control join order explicitly obclient [SALES_DB]> SELECT /*+LEADING(c, o)*/ o.order_id, c.customer_name FROM customer_table c INNER JOIN order_table o ON c.customer_id = o.customer_id WHERE c.customer_type = 'VIP' AND o.order_date >= '2024-01-01'; -- ❌ BAD: Let optimizer choose suboptimal order obclient [SALES_DB]> SELECT o.order_id, c.customer_name FROM order_table o INNER JOIN customer_table c ON o.customer_id = c.customer_id WHERE c.customer_type = 'VIP'; ``` ### Index on Join Columns **Always index foreign key columns:** ```sql -- ✅ GOOD: Index on join column enables efficient joins obclient [SALES_DB]> CREATE INDEX idx_customer_id ON order_table(customer_id); -- Then join queries can use index (NLJ with TABLE GET) obclient [SALES_DB]> SELECT o.*, c.customer_name FROM order_table o INNER JOIN customer_table c ON o.customer_id = c.customer_id; ``` --- ## Aggregation Optimization ### Use Appropriate Aggregation Functions ```sql -- ✅ GOOD: COUNT(*) is optimized obclient [SALES_DB]> SELECT COUNT(*) FROM order_table WHERE order_date >= '2024-01-01'; -- ✅ GOOD: COUNT(column) when you need non-NULL count obclient [SALES_DB]> SELECT COUNT(customer_id) FROM order_table; -- ❌ BAD: COUNT(DISTINCT) on large datasets can be slow obclient [SALES_DB]> SELECT COUNT(DISTINCT customer_id) FROM order_table; -- ✅ GOOD: Use GROUP BY with LIMIT (MySQL) or FETCH FIRST (Oracle) for top N -- MySQL Mode: obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total FROM order_table GROUP BY customer_id ORDER BY total DESC LIMIT 10; -- Oracle Mode: obclient [SALES_DB]> SELECT customer_id, SUM(total_amount) AS total FROM order_table GROUP BY customer_id ORDER BY total DESC FETCH FIRST 10 ROWS ONLY; ``` --- ## Subquery Optimization ### Convert Correlated Subqueries to JOINs ```sql -- ❌ BAD: Correlated subquery (executes for each row) obclient [SALES_DB]> SELECT o.order_id, o.total_amount FROM order_table o WHERE o.total_amount > ( SELECT AVG(total_amount) FROM order_table WHERE customer_id = o.customer_id ); -- ✅ GOOD: Convert to JOIN obclient [SALES_DB]> SELECT o.order_id, o.total_amount FROM order_table o INNER JOIN ( SELECT customer_id, AVG(total_amount) as avg_amount FROM order_table GROUP BY customer_id ) avg_orders ON o.customer_id = avg_orders.customer_id WHERE o.total_amount > avg_orders.avg_amount; ``` ### Use EXISTS instead of IN for large datasets **Syntax is the same for both modes:** ```sql -- ✅ GOOD: EXISTS stops at first match obclient [SALES_DB]> SELECT * FROM order_table o WHERE EXISTS ( SELECT 1 FROM customer_table c WHERE c.customer_id = o.customer_id AND c.customer_type = 'VIP' ); -- ⚠️ OK: IN works but may be slower for large lists obclient [SALES_DB]> SELECT * FROM order_table WHERE customer_id IN (SELECT customer_id FROM customer_table WHERE customer_type = 'VIP'); ``` --- ## Slow Query Tuning ### Enable SQL Audit **Configure SQL Audit settings:** ```sql -- Enable SQL Audit obclient> ALTER SYSTEM SET enable_sql_audit = true; -- Set SQL Audit memory percentage (default: 3%, range: [0,80]) obclient> SET GLOBAL ob_sql_audit_percentage = 3; ``` ### Identify Slow Queries **Query TOP SQL by CPU usage:** **MySQL Mode:** Use `oceanbase.GV$OB_SQL_AUDIT` with `TIME_TO_USEC(NOW())` **Oracle Mode:** Use `GV$OB_SQL_AUDIT` with `SYSDATE - INTERVAL '30' MINUTE` ```sql -- Find TOP SQL by CPU usage in last 30 minutes obclient> SELECT sql_id, COUNT(*) AS executions, SUM(execute_time) AS tot_cpu_time, AVG(execute_time) AS avg_cpu_time, SUM(execute_time)/(30*60*1000*1000) AS cpu_cnt, query_sql FROM oceanbase.GV$OB_SQL_AUDIT -- MySQL: add 'oceanbase.' prefix WHERE tenant_id = 'mysql001' -- MySQL: string, Oracle: number AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) -- MySQL -- Oracle: BETWEEN (SYSDATE - INTERVAL '30' MINUTE) AND SYSDATE AND is_executor_rpc = 0 GROUP BY sql_id HAVING COUNT(*) > 1 ORDER BY cpu_cnt DESC LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY ``` **Query slow queries by execution time:** **MySQL Mode:** Use `usec_to_time()` function **Oracle Mode:** Use `TO_CHAR()` function ```sql -- Find queries with execution time > 100ms obclient> SELECT request_id, usec_to_time(request_time) AS request_time, -- MySQL -- TO_CHAR(request_time, 'YYYY-MM-DD HH24:MI:SS.FF') AS request_time, -- Oracle elapsed_time, queue_time, execute_time, flt_trace_id, query_sql FROM oceanbase.v$ob_sql_audit -- MySQL: add 'oceanbase.' prefix WHERE elapsed_time > 100000 -- > 100ms in microseconds ORDER BY elapsed_time DESC LIMIT 10; -- MySQL: LIMIT, Oracle: FETCH FIRST 10 ROWS ONLY ``` ### Analyze Query Performance **Get detailed execution statistics:** **MySQL Mode:** Use `oceanbase.gv$ob_sql_audit` **Oracle Mode:** Use `GV$OB_SQL_AUDIT` (no schema prefix) ```sql -- Get detailed statistics for a specific SQL obclient> SELECT sql_id, query_sql, executions, elapsed_time, execute_time, queue_time, return_rows, affected_rows, partition_cnt, table_scan, is_hit_plan, plan_id FROM oceanbase.gv$ob_sql_audit -- MySQL: add 'oceanbase.' prefix WHERE sql_id = 'your_sql_id' ORDER BY elapsed_time DESC; ``` **Get execution plan for slow query:** **MySQL Mode:** Use `oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT` **Oracle Mode:** Use `GV$OB_PLAN_CACHE_PLAN_STAT` (no schema prefix) ```sql -- Get execution plan from plan cache obclient> SELECT tenant_id, svr_ip, svr_port, sql_id, plan_id, last_active_time, first_load_time, outline_data FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_STAT -- MySQL: add 'oceanbase.' prefix WHERE tenant_id = 1002 AND sql_id = 'your_sql_id' AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046; -- Get plan details obclient> SELECT operator, name, rows, cost FROM oceanbase.GV$OB_PLAN_CACHE_PLAN_EXPLAIN -- MySQL: add 'oceanbase.' prefix WHERE tenant_id = 1002 AND plan_id = 741 AND svr_ip = 'xxx.xxx.xxx.xxx' AND svr_port = 35046; ``` ### Find Currently Running Slow Queries **MySQL Mode:** ```sql obclient> SELECT user, tenant, sql_id, concat(time, 's') AS time, info, svr_ip, svr_port, trace_id FROM oceanbase.GV$OB_PROCESSLIST WHERE state = 'ACTIVE' ORDER BY time DESC LIMIT 10; ``` **Oracle Mode:** ```sql obclient> SELECT user_name, tenant_name, sql_id, time || 's' AS time, info, svr_ip, svr_port, trace_id FROM GV$OB_PROCESSLIST WHERE state = 'ACTIVE' ORDER BY time DESC FETCH FIRST 10 ROWS ONLY; ``` --- ## Real-World Optimization Case ### Case: Type Conversion Preventing Index Usage **Problem:** SQL execution time was 2 seconds, CPU usage exceeded 70% of server resources. **Root Cause Analysis:** 1. **Identify TOP SQL:** ```sql obclient> SELECT sql_id, COUNT(*) AS executions, AVG(execute_time) AS avg_cpu_time FROM oceanbase.GV$OB_SQL_AUDIT WHERE tenant_id = 'mysql001' AND request_time BETWEEN (TIME_TO_USEC(NOW())-30*60*1000*1000) AND TIME_TO_USEC(NOW()) GROUP BY sql_id ORDER BY SUM(execute_time) DESC LIMIT 1; ``` 2. **Analyze Execution Plan:** ```sql obclient> EXPLAIN SELECT ... FROM v_tt01 WHERE COL001 IN (20017476); ``` Execution plan showed: - High cost on `TBL3(IDX_TBL3_COL170)` index scan - `range(MIN,MIN ; MAX,MAX)always true` - index not used for matching - `filter([20017476 = cast(cast(TBL3.COL170, VARCHAR2(20 BYTE)), NUMBER)])` - type conversion happening 3. **Root Cause:** - View `V_TT01` had `TO_NUMBER` conversion on `COL001` - Table `TBL3.COL170` was `VARCHAR2(20)` but compared with `NUMBER` - Type conversion prevented index usage **Optimization Steps:** 1. **Remove unnecessary TO_NUMBER conversion:** ```sql -- Before: View had TO_NUMBER conversion CREATE VIEW V_TT01_OLD AS SELECT To_number("tt01"."col001") AS "COL001", ... -- After: Remove TO_NUMBER (COL001 is already NUMBER) CREATE VIEW V_TT01 AS SELECT "tt01"."col001" AS "COL001", ... ``` Result: SQL RT reduced from 2s to 150ms 2. **Unify data types:** ```sql -- Before: TBL3.COL170 was VARCHAR2(20) CREATE TABLE TBL3 ( COL170 VARCHAR2(20) NOT NULL, ... ); -- After: Change to NUMBER(20) to match join condition ALTER TABLE TBL3 MODIFY COL170 NUMBER(20) NOT NULL; ``` Result: SQL RT further reduced to 20ms 3. **Partition and table group optimization:** ```sql -- Add hash partitioning on join keys ALTER TABLE TBL1 PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL2 PARTITION BY HASH(COL004) PARTITIONS 8; ALTER TABLE TBL3 PARTITION BY HASH(COL170) PARTITIONS 8; ALTER TABLE TT01 PARTITION BY HASH(COL001) PARTITIONS 8; -- Create table group to avoid cross-partition joins CREATE TABLEGROUP order_tg PARTITION BY HASH(COL001) PARTITIONS 8; ALTER TABLE TBL1 SET TABLEGROUP order_tg; ALTER TABLE TBL2 SET TABLEGROUP order_tg; ALTER TABLE TBL3 SET TABLEGROUP order_tg; ALTER TABLE TT01 SET TABLEGROUP order_tg; ``` Result: SQL RT further reduced to 4ms **Final Execution Plan:** Shows `PX PARTITION ITERATOR` (partition pruning working) and `TABLE SCAN` with index `INDEX_TBL3_COL170` (index used efficiently) **Key Learnings:** - Always check execution plan for type conversions in filters - Ensure join column data types match - Use partition pruning and table groups for distributed queries - Remove unnecessary type conversions in views --- ## Common Anti-patterns ### ❌ Avoid These Patterns **1. N+1 Query Problem:** ```sql -- ❌ BAD: Multiple queries in loop -- Instead, use JOIN or batch queries ``` **2. SELECT * in Production:** ```sql -- ❌ BAD: SELECT * returns unnecessary data SELECT * FROM order_table; -- ✅ GOOD: Select only needed columns SELECT order_id, customer_id, total_amount FROM order_table; ``` **3. Functions on Indexed Columns:** ```sql -- ❌ BAD: Prevents index usage WHERE UPPER(customer_name) = 'JOHN' -- ✅ GOOD: Store normalized data or use function-based index WHERE customer_name = 'JOHN' ``` **4. Implicit Type Conversions:** ```sql -- ❌ BAD: String to number conversion WHERE order_id = '12345' -- ✅ GOOD: Match data types WHERE order_id = 12345 ``` --- ## Performance Monitoring ### Key Metrics to Monitor ```sql -- Check table statistics obclient [SALES_DB]> ANALYZE TABLE order_table; -- View table size and row count obclient [SALES_DB]> SELECT table_name, table_rows, data_length, index_length FROM information_schema.tables WHERE table_schema = 'SALES_DB'; -- Check index usage obclient [SALES_DB]> SHOW INDEX FROM order_table; ``` --- ## Optimization Checklist Before deploying SQL queries to production: - [ ] Execution plan reviewed (EXPLAIN) - [ ] Appropriate indexes created - [ ] Partition pruning enabled (if partitioned) - [ ] SELECT columns limited to needed fields - [ ] WHERE conditions use indexed columns - [ ] JOIN conditions indexed - [ ] Subqueries optimized (JOINs preferred) - [ ] Aggregations use efficient functions - [ ] No functions on indexed columns in WHERE - [ ] Data types match in comparisons - [ ] Query tested with production-like data volume --- ## Quick Reference ### Execution Plan Operators | Operator | Meaning | Optimization | |----------|---------|--------------| | TABLE SCAN / TABLE FULL SCAN | Full table scan | Add index or use partition key | | TABLE RANGE SCAN | Range scan on table | ✅ Good, but consider index | | TABLE GET | Direct row access via primary key | ✅ Best | | HASH JOIN | Hash join algorithm | ✅ Good for large equal joins | | NESTED-LOOP JOIN | Nested loop join | ✅ Good for small outer table with indexed inner | | MERGE JOIN | Merge join algorithm | ✅ Good for pre-sorted data | | SORT | Sorting operation | Add ORDER BY index or use sorted index | | AGGREGATE | Aggregation | Consider materialized views | | PX PARTITION ITERATOR | Parallel partition iterator | ✅ Good, partition pruning working | | EXCHANGE OUT DISTR | Distributed exchange | Indicates distributed execution | ### Index Types - **Primary Key**: Automatically indexed, unique - **Unique Index**: Ensures uniqueness, fast lookups - **Composite Index**: Multiple columns, order matters - **Covering Index**: Contains all query columns