--- name: optimizing-sql description: Optimize SQL query performance through EXPLAIN analysis, indexing strategies, and query rewriting for PostgreSQL, MySQL, and SQL Server. Use when debugging slow queries, analyzing execution plans, or improving database performance. --- # SQL Optimization Provide tactical guidance for optimizing SQL query performance across PostgreSQL, MySQL, and SQL Server through execution plan analysis, strategic indexing, and query rewriting. ## When to Use This Skill Trigger this skill when encountering: - Slow query performance or database timeouts - Analyzing EXPLAIN plans or execution plans - Determining index requirements - Rewriting inefficient queries - Identifying query anti-patterns (N+1, SELECT *, correlated subqueries) - Database-specific optimization needs (PostgreSQL, MySQL, SQL Server) ## Core Optimization Workflow ### Step 1: Analyze Query Performance Run execution plan analysis to identify bottlenecks: **PostgreSQL:** ```sql EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com'; ``` **MySQL:** ```sql EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5; ``` **SQL Server:** Use SQL Server Management Studio: Display Estimated Execution Plan (Ctrl+L) **Key Metrics to Monitor:** - **Cost**: Estimated resource consumption - **Rows**: Number of rows processed (estimated vs actual) - **Scan Type**: Sequential scan vs index scan - **Execution Time**: Actual time spent on operation For detailed execution plan interpretation, see `references/explain-guide.md`. ### Step 2: Identify Optimization Opportunities **Common Red Flags:** | Indicator | Problem | Solution | |-----------|---------|----------| | Seq Scan / Table Scan | Full table scan on large table | Add index on filter columns | | High row count | Processing excessive rows | Add WHERE filter or index | | Nested Loop with large outer table | Inefficient join algorithm | Index join columns | | Correlated subquery | Subquery executes per row | Rewrite as JOIN or EXISTS | | Sort operation on large result set | Expensive sorting | Add index matching ORDER BY | For scan type interpretation, see `references/scan-types.md`. ### Step 3: Apply Indexing Strategies **Index Decision Framework:** ``` Is column used in WHERE, JOIN, ORDER BY, or GROUP BY? ├─ YES → Is column selective (many unique values)? │ ├─ YES → Is table frequently queried? │ │ ├─ YES → ADD INDEX │ │ └─ NO → Consider based on query frequency │ └─ NO (low selectivity) → Skip index └─ NO → Skip index ``` **Index Types by Use Case:** **PostgreSQL:** - **B-tree** (default): General-purpose, supports <, ≤, =, ≥, >, BETWEEN, IN - **Hash**: Equality comparisons only (=) - **GIN**: Full-text search, JSONB, arrays - **GiST**: Spatial data, geometric types - **BRIN**: Very large tables with naturally ordered data **MySQL:** - **B-tree** (default): General-purpose index - **Full-text**: Text search on VARCHAR/TEXT columns - **Spatial**: Spatial data types **SQL Server:** - **Clustered**: Table data sorted by index (one per table) - **Non-clustered**: Separate index structure (multiple allowed) For comprehensive indexing guidance, see `references/indexing-decisions.md` and `references/index-types.md`. ### Step 4: Design Composite Indexes For queries filtering on multiple columns, use composite indexes: **Column Order Matters:** 1. **Equality filters first** (most selective) 2. **Additional equality filters** (by selectivity) 3. **Range filters or ORDER BY** (last) **Example:** ```sql -- Query pattern SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' ORDER BY created_at DESC LIMIT 10; -- Optimal composite index CREATE INDEX idx_orders_customer_status_created ON orders (customer_id, status, created_at DESC); ``` For composite index design patterns, see `references/composite-indexes.md`. ### Step 5: Rewrite Inefficient Queries **Common Anti-Patterns to Avoid:** **1. SELECT * (Over-fetching)** ```sql -- ❌ Bad: Fetches all columns SELECT * FROM users WHERE id = 1; -- ✅ Good: Fetch only needed columns SELECT id, name, email FROM users WHERE id = 1; ``` **2. N+1 Queries** ```sql -- ❌ Bad: 1 + N queries SELECT * FROM users LIMIT 100; -- Then in loop: SELECT * FROM posts WHERE user_id = ?; -- ✅ Good: Single JOIN SELECT users.*, posts.id AS post_id, posts.title FROM users LEFT JOIN posts ON users.id = posts.user_id; ``` **3. Non-Sargable Queries** (functions on indexed columns) ```sql -- ❌ Bad: Function prevents index usage SELECT * FROM orders WHERE YEAR(created_at) = 2025; -- ✅ Good: Sargable range condition SELECT * FROM orders WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'; ``` **4. Correlated Subqueries** ```sql -- ❌ Bad: Subquery executes per row SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id) FROM users; -- ✅ Good: JOIN with GROUP BY SELECT users.name, COUNT(orders.id) AS order_count FROM users LEFT JOIN orders ON users.id = orders.user_id GROUP BY users.id, users.name; ``` For complete anti-pattern reference, see `references/anti-patterns.md`. For efficient query patterns, see `references/efficient-patterns.md`. ## Quick Reference Tables ### Index Selection Guide | Query Pattern | Index Type | Example | |--------------|------------|---------| | `WHERE column = value` | Single-column B-tree | `CREATE INDEX ON table (column)` | | `WHERE col1 = ? AND col2 = ?` | Composite B-tree | `CREATE INDEX ON table (col1, col2)` | | `WHERE text_col LIKE '%word%'` | Full-text (GIN/Full-text) | `CREATE INDEX ON table USING GIN (to_tsvector('english', text_col))` | | `WHERE geom && box` | Spatial (GiST) | `CREATE INDEX ON table USING GIST (geom)` | | `WHERE json_col @> '{"key":"value"}'` | JSONB (GIN) | `CREATE INDEX ON table USING GIN (json_col)` | ### Join Optimization Checklist - [ ] Index foreign key columns on both sides of JOIN - [ ] Order joins starting with table returning fewest rows - [ ] Use INNER JOIN when possible (more efficient than OUTER JOIN) - [ ] Avoid joining more than 5 tables (break into CTEs or subqueries) - [ ] Consider denormalization for frequently joined tables in read-heavy systems ### Execution Plan Performance Targets | Scan Type | Performance | When Acceptable | |-----------|-------------|-----------------| | Index-Only Scan | Best | Always preferred | | Index Scan | Excellent | Small-medium result sets | | Bitmap Heap Scan | Good | Medium result sets (PostgreSQL) | | Sequential Scan | Poor | Only for small tables (<1000 rows) or full table queries | | Table Scan | Poor | Only for small tables or unavoidable full scans | ## Database-Specific Optimizations ### PostgreSQL-Specific Features **Partial Indexes** (index subset of rows): ```sql CREATE INDEX idx_active_users_login ON users (last_login) WHERE status = 'active'; ``` **Expression Indexes** (index computed values): ```sql CREATE INDEX idx_users_email_lower ON users (LOWER(email)); ``` **Covering Indexes** (avoid heap access): ```sql CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (id, name); ``` For comprehensive PostgreSQL optimization, see `references/postgresql.md`. ### MySQL-Specific Features **Index Hints** (override optimizer): ```sql SELECT * FROM orders USE INDEX (idx_orders_customer) WHERE customer_id = 123; ``` **Storage Engine Selection:** - **InnoDB** (default): Transactional, row-level locks, clustered primary key - **MyISAM**: Faster reads, no transactions, table-level locks For comprehensive MySQL optimization, see `references/mysql.md`. ### SQL Server-Specific Features **Query Store** (track query performance over time): ```sql ALTER DATABASE YourDatabase SET QUERY_STORE = ON; ``` **Execution Plan Warnings:** - Look for yellow exclamation marks in graphical execution plans - Thick arrows indicate high row counts For comprehensive SQL Server optimization, see `references/sqlserver.md`. ## Advanced Optimization Techniques ### Common Table Expressions (CTEs) Break complex queries into readable, maintainable parts: ```sql WITH active_customers AS ( SELECT id, name FROM customers WHERE status = 'active' ), recent_orders AS ( SELECT customer_id, COUNT(*) as order_count FROM orders WHERE created_at > NOW() - INTERVAL '30 days' GROUP BY customer_id ) SELECT ac.name, COALESCE(ro.order_count, 0) as orders FROM active_customers ac LEFT JOIN recent_orders ro ON ac.id = ro.customer_id; ``` ### EXISTS vs IN for Subqueries Use EXISTS for better performance with large datasets: ```sql -- ✅ Good: EXISTS stops at first match SELECT * FROM users WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id); -- ❌ Less efficient: IN builds full list SELECT * FROM users WHERE id IN (SELECT user_id FROM orders); ``` ### Denormalization Decision Framework Consider denormalization when: - Query joins 3+ tables frequently - Data is relatively static (infrequent updates) - Read performance is critical - Write overhead is acceptable **Denormalization Strategies:** 1. **Duplicate columns**: Copy foreign key data into main table 2. **Summary tables**: Pre-aggregate data 3. **Materialized views**: Database-maintained denormalized views 4. **Application caching**: Redis/Memcached for frequently accessed data ## Optimization Workflow Example **Scenario:** API endpoint taking 2 seconds to load **Step 1: Identify Slow Query** ``` Use APM/observability tools to identify database query causing delay ``` **Step 2: Run EXPLAIN ANALYZE** ```sql EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at DESC LIMIT 10; ``` **Step 3: Analyze Output** ``` Seq Scan on orders (cost=0.00..2500.00 rows=10) Filter: (customer_id = 123) Rows Removed by Filter: 99990 ``` **Problem**: Sequential scan filtering 99,990 rows **Step 4: Add Composite Index** ```sql CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC); ``` **Step 5: Verify Improvement** ```sql EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at DESC LIMIT 10; ``` ``` Index Scan using idx_orders_customer_created (cost=0.42..12.44 rows=10) Index Cond: (customer_id = 123) ``` **Result**: 200x faster (2000ms → 10ms) ## Monitoring and Maintenance **Regular Optimization Tasks:** - Review slow query logs weekly - Update database statistics regularly (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server) - Monitor index usage (drop unused indexes) - Archive old data to keep tables manageable - Review execution plans for critical queries quarterly **PostgreSQL Statistics Update:** ```sql ANALYZE table_name; ``` **MySQL Statistics Update:** ```sql ANALYZE TABLE table_name; ``` **SQL Server Statistics Update:** ```sql UPDATE STATISTICS table_name; ``` ## Related Skills - **databases-relational**: Schema design and database fundamentals - **observability**: Performance monitoring and slow query detection - **api-patterns**: API-level optimization (pagination, caching) - **performance-engineering**: Application performance profiling ## Additional Resources For comprehensive documentation, reference these files: - `references/explain-guide.md` - Detailed EXPLAIN plan interpretation - `references/scan-types.md` - Scan type meanings and performance implications - `references/indexing-decisions.md` - When and how to add indexes - `references/index-types.md` - Database-specific index types - `references/composite-indexes.md` - Multi-column index design - `references/anti-patterns.md` - Common anti-patterns with solutions - `references/efficient-patterns.md` - Efficient query patterns - `references/postgresql.md` - PostgreSQL-specific optimizations - `references/mysql.md` - MySQL-specific optimizations - `references/sqlserver.md` - SQL Server-specific optimizations For working SQL examples, see `examples/` directory.