---
name: query-optimizer
description: Analyze and optimize SQL queries for better performance and efficiency.
---

# Query Optimizer Skill

Analyze and optimize SQL queries for better performance and efficiency.

## Instructions

You are a database performance optimization expert. When invoked:

1. **Analyze Query Performance**:
   - Use EXPLAIN/EXPLAIN ANALYZE to understand execution plan
   - Identify slow queries from logs
   - Measure query execution time
   - Detect full table scans and missing indexes

2. **Identify Bottlenecks**:
   - Find N+1 query problems
   - Detect inefficient JOINs
   - Identify missing or unused indexes
   - Spot suboptimal WHERE clauses

3. **Optimize Queries**:
   - Add appropriate indexes
   - Rewrite queries for better performance
   - Suggest caching strategies
   - Recommend query restructuring

4. **Provide Recommendations**:
   - Index creation suggestions
   - Query rewriting alternatives
   - Database configuration tuning
   - Monitoring and alerting setup

## Supported Databases

- **SQL**: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite
- **Analysis Tools**: EXPLAIN, EXPLAIN ANALYZE, Query Profiler
- **Monitoring**: pg_stat_statements, slow query log, performance schema

## Usage Examples

```
@query-optimizer
@query-optimizer --analyze-slow-queries
@query-optimizer --suggest-indexes
@query-optimizer --explain SELECT * FROM users WHERE email = 'test@example.com'
@query-optimizer --fix-n-plus-one
```

## Query Analysis Tools

### PostgreSQL - EXPLAIN ANALYZE
```sql
-- Basic EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM orders
WHERE user_id = 123
  AND created_at >= '2024-01-01';
```

**Reading EXPLAIN Output:**
```
Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=32)
  Filter: (active = true)

-- Seq Scan = Sequential Scan (full table scan) - BAD for large tables
-- cost=0.00..1234.56 = startup cost..total cost
-- rows=10000 = estimated rows
-- width=32 = average row size in bytes
```

```
Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=32)
  Index Cond: (email = 'test@example.com'::text)

-- Index Scan = Using index - GOOD
-- Much lower cost than Seq Scan
-- rows=1 = accurate estimate
```

### MySQL - EXPLAIN
```sql
-- MySQL EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with execution stats (MySQL 8.0+)
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123;

-- Show warnings for optimization info
EXPLAIN
SELECT * FROM users WHERE email = 'test@example.com';
SHOW WARNINGS;
```

**MySQL EXPLAIN Output:**
```
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL    | NULL    | NULL  | 1000 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+

-- type=ALL means full table scan - BAD
-- key=NULL means no index used

+----+-------------+-------+------+---------------+----------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key            | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+----------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | idx_users_email| idx_users_email| 767     | const |    1 | NULL  |
+----+-------------+-------+------+---------------+----------------+---------+-------+------+-------+

-- type=ref means index lookup - GOOD
-- key shows index being used
```

## Common Performance Issues

### 1. Missing Indexes

**Problem:**
```sql
-- Slow query - full table scan
SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN shows:
-- Seq Scan on users (cost=0.00..1500.00 rows=1 width=100)
--   Filter: (email = 'john@example.com')
```

**Solution:**
```sql
-- Add index on email column
CREATE INDEX idx_users_email ON users(email);

-- Now EXPLAIN shows:
-- Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=100)
--   Index Cond: (email = 'john@example.com')

-- Query becomes 100x faster
```

### 2. N+1 Query Problem

**Problem:**
```javascript
// ORM code causing N+1 queries
const users = await User.findAll(); // 1 query

for (const user of users) {
  const orders = await Order.findAll({
    where: { userId: user.id }  // N queries (one per user)
  });
  console.log(`${user.name}: ${orders.length} orders`);
}

// Total: 1 + N queries for N users
// For 100 users = 101 queries!
```

**Solution:**
```javascript
// Use eager loading - single query with JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    attributes: ['id', 'total_amount']
  }]
});

for (const user of users) {
  console.log(`${user.name}: ${user.orders.length} orders`);
}

// Total: 1 query regardless of user count
```

**SQL Equivalent:**
```sql
-- Instead of multiple queries:
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... (N more queries)

-- Use single JOIN query:
SELECT
  u.id,
  u.name,
  COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```

### 3. SELECT * Inefficiency

**Problem:**
```sql
-- Fetching all columns when only need few
SELECT * FROM products
WHERE category_id = 5;

-- Fetches: id, name, description (large text), image_url, specs (json),
--         price, stock, created_at, updated_at, etc.
```

**Solution:**
```sql
-- Only select needed columns
SELECT id, name, price, stock
FROM products
WHERE category_id = 5;

-- Benefits:
-- - Less data transferred
-- - Faster query execution
-- - Lower memory usage
-- - Can use covering indexes
```

### 4. Inefficient Pagination

**Problem:**
```sql
-- OFFSET becomes slow with large offsets
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- Database must:
-- 1. Sort all rows
-- 2. Skip 10,000 rows
-- 3. Return next 20
-- Gets slower as offset increases
```

**Solution:**
```sql
-- Use cursor-based (keyset) pagination
SELECT * FROM users
WHERE created_at < '2024-01-01 12:00:00'
  AND (created_at < '2024-01-01 12:00:00' OR id < 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Or with indexed column:
SELECT * FROM users
WHERE id < 10000
ORDER BY id DESC
LIMIT 20;

-- Benefits:
-- - Consistent performance regardless of page
-- - Uses index efficiently
-- - No need to skip rows
```

### 5. Function on Indexed Column

**Problem:**
```sql
-- Function prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';

-- EXPLAIN shows Seq Scan (index not used)
```

**Solution 1 - Store lowercase:**
```sql
-- Add computed column
ALTER TABLE users ADD COLUMN email_lower VARCHAR(255)
  GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Query:
SELECT * FROM users
WHERE email_lower = 'john@example.com';
```

**Solution 2 - Functional index (PostgreSQL):**
```sql
-- Create index on function result
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now original query uses index
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';
```

**Solution 3 - Case-insensitive collation:**
```sql
-- PostgreSQL - use citext type
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Query without LOWER:
SELECT * FROM users WHERE email = 'john@example.com';
-- Automatically case-insensitive
```

### 6. Inefficient JOINs

**Problem:**
```sql
-- Multiple JOINs without proper indexes
SELECT
  u.username,
  o.id as order_id,
  p.name as product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE u.email = 'john@example.com';

-- Slow if missing indexes on:
-- - users.email
-- - orders.user_id
-- - order_items.order_id
-- - order_items.product_id
```

**Solution:**
```sql
-- Add necessary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Now query uses indexes for all JOINs
-- EXPLAIN will show "Index Scan" for each table
```

### 7. OR Conditions

**Problem:**
```sql
-- OR prevents efficient index usage
SELECT * FROM users
WHERE username = 'john' OR email = 'john@example.com';

-- May not use indexes optimally
```

**Solution:**
```sql
-- Use UNION for better index usage
SELECT * FROM users WHERE username = 'john'
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- Each subquery uses its own index
-- Deduplicates results automatically
```

### 8. NOT IN with Subquery

**Problem:**
```sql
-- Slow subquery execution
SELECT * FROM users
WHERE id NOT IN (
  SELECT user_id FROM banned_users
);

-- Can be very slow with large subquery results
```

**Solution:**
```sql
-- Use LEFT JOIN with NULL check
SELECT u.*
FROM users u
LEFT JOIN banned_users bu ON u.id = bu.user_id
WHERE bu.user_id IS NULL;

-- Or use NOT EXISTS (often faster):
SELECT u.*
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM banned_users bu
  WHERE bu.user_id = u.id
);
```

## Index Optimization

### When to Add Indexes

**Add indexes for:**
- Primary keys (automatic in most databases)
- Foreign keys (critical for JOINs)
- Columns in WHERE clauses
- Columns in ORDER BY clauses
- Columns in GROUP BY clauses
- Columns in JOIN conditions
- Columns with high cardinality (many unique values)

### Index Types

**B-Tree Index (Default):**
```sql
-- Best for: equality (=) and range (<, >, BETWEEN) queries
CREATE INDEX idx_users_created_at ON users(created_at);

-- Good for:
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
```

**Composite Index:**
```sql
-- Index on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Used for queries filtering both columns
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Also used for queries on first column only
SELECT * FROM orders WHERE user_id = 123;

-- NOT used for queries on second column only
SELECT * FROM orders WHERE status = 'pending'; -- Won't use this index

-- Column order matters! Most selective first
```

**Partial Index (PostgreSQL):**
```sql
-- Index only subset of rows
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users WHERE email = 'john@example.com' AND active = true;
```

**GIN Index (PostgreSQL - for arrays, JSONB, full-text):**
```sql
-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Query JSONB data
SELECT * FROM products
WHERE metadata @> '{"brand": "Apple"}';

-- For array columns
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- Query arrays
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];
```

**Full-Text Search Index:**
```sql
-- PostgreSQL
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));

-- Full-text search query
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
  @@ to_tsquery('english', 'laptop & gaming');
```

### Covering Index

**Concept:**
```sql
-- Covering index includes all columns needed by query
CREATE INDEX idx_users_email_username ON users(email, username);

-- This query can be answered entirely from index (no table access)
SELECT username FROM users WHERE email = 'john@example.com';

-- PostgreSQL: Index-Only Scan
-- MySQL: Using index
```

**With INCLUDE (PostgreSQL 11+):**
```sql
-- Include non-indexed columns in index leaf nodes
CREATE INDEX idx_users_email ON users(email)
INCLUDE (username, created_at);

-- Query can use index without table access
SELECT username, created_at
FROM users
WHERE email = 'john@example.com';
```

### Index Maintenance

**Find Unused Indexes (PostgreSQL):**
```sql
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes to save space and improve write performance
```

**Find Duplicate Indexes:**
```sql
-- PostgreSQL query to find duplicate indexes
SELECT
  indrelid::regclass AS table_name,
  array_agg(indexrelid::regclass) AS indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
```

**Rebuild Fragmented Indexes:**
```sql
-- PostgreSQL
REINDEX INDEX idx_users_email;
REINDEX TABLE users;

-- MySQL
OPTIMIZE TABLE users;
```

## Query Rewriting Examples

### Example 1: Aggregation Optimization

**Before:**
```sql
SELECT
  u.id,
  u.username,
  (SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count,
  (SELECT SUM(total_amount) FROM orders WHERE user_id = u.id) as total_spent
FROM users u
WHERE u.active = true;

-- N+1 problem: 1 query + 2 subqueries per user
```

**After:**
```sql
SELECT
  u.id,
  u.username,
  COUNT(o.id) as order_count,
  COALESCE(SUM(o.total_amount), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- Single query with JOIN
-- Much faster!
```

### Example 2: EXISTS vs IN

**Before:**
```sql
SELECT * FROM products
WHERE id IN (
  SELECT product_id FROM order_items
  WHERE created_at >= '2024-01-01'
);

-- Subquery returns all product_ids (potentially large result set)
```

**After:**
```sql
SELECT p.* FROM products p
WHERE EXISTS (
  SELECT 1 FROM order_items oi
  WHERE oi.product_id = p.id
    AND oi.created_at >= '2024-01-01'
);

-- EXISTS stops at first match (more efficient)
```

### Example 3: Avoid Cartesian Products

**Before:**
```sql
-- Accidental cartesian product
SELECT *
FROM users u, orders o
WHERE u.active = true
  AND o.status = 'completed';

-- Returns every user combined with every completed order!
-- Missing JOIN condition
```

**After:**
```sql
SELECT u.*, o.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.active = true
  AND o.status = 'completed';

-- Proper JOIN condition
```

### Example 4: Optimize DISTINCT

**Before:**
```sql
SELECT DISTINCT user_id
FROM orders
WHERE status = 'completed';

-- DISTINCT requires sorting/deduplication
```

**After:**
```sql
SELECT user_id
FROM orders
WHERE status = 'completed'
GROUP BY user_id;

-- GROUP BY often faster than DISTINCT
-- Or if unique constraint exists:
SELECT DISTINCT ON (user_id) user_id, created_at
FROM orders
WHERE status = 'completed'
ORDER BY user_id, created_at DESC;
```

## Monitoring Slow Queries

### PostgreSQL - pg_stat_statements

```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slowest queries
SELECT
  substring(query, 1, 50) AS short_query,
  round(total_exec_time::numeric, 2) AS total_time,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find queries with most calls
SELECT
  substring(query, 1, 50) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```

### MySQL - Slow Query Log

```sql
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze slow query log
-- Use mysqldumpslow tool:
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log
```

### Performance Schema (MySQL)

```sql
-- Enable performance schema
SET GLOBAL performance_schema = ON;

-- Find slowest statements
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS executions,
  ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
  ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

## Best Practices

### DO ✓
- **Use EXPLAIN** before and after optimization
- **Add indexes on foreign keys** - Critical for JOINs
- **Use covering indexes** when possible
- **Paginate large result sets** - Avoid loading all data
- **Monitor query performance** - Use pg_stat_statements or slow query log
- **Test on production-like data** - Performance differs with data volume
- **Use connection pooling** - Reduce connection overhead
- **Cache frequently accessed data** - Redis, Memcached
- **Archive old data** - Keep active tables smaller
- **Regular VACUUM/ANALYZE** (PostgreSQL) - Update statistics

### DON'T ✗
- **Don't use SELECT *** - Fetch only needed columns
- **Don't over-index** - Each index slows down writes
- **Don't ignore EXPLAIN warnings** - They indicate problems
- **Don't use functions on indexed columns** - Prevents index usage
- **Don't fetch more data than needed** - Use LIMIT
- **Don't use OFFSET for deep pagination** - Use cursor-based instead
- **Don't ignore database logs** - Monitor for errors
- **Don't optimize prematurely** - Profile first, optimize bottlenecks
- **Don't forget about write performance** - Indexes slow down INSERTs
- **Don't skip testing** - Verify optimizations actually help

## Query Optimization Checklist

```markdown
## Query Optimization Checklist

- [ ] Run EXPLAIN/EXPLAIN ANALYZE on query
- [ ] Check if query uses indexes (no Seq Scan on large tables)
- [ ] Verify indexes exist on:
  - [ ] Foreign key columns
  - [ ] WHERE clause columns
  - [ ] JOIN condition columns
  - [ ] ORDER BY columns
- [ ] SELECT only needed columns (avoid SELECT *)
- [ ] Use appropriate JOIN type (INNER vs LEFT)
- [ ] Avoid N+1 queries (use JOINs or eager loading)
- [ ] Use pagination for large result sets
- [ ] Check for unused indexes (slow down writes)
- [ ] Consider query caching for frequent queries
- [ ] Test with production-like data volumes
- [ ] Monitor query performance over time
```

## Notes

- Always measure before and after optimization
- Index creation can take time on large tables
- Too many indexes slow down INSERT/UPDATE/DELETE
- Keep database statistics up to date (ANALYZE)
- Consider read replicas for read-heavy workloads
- Use database-specific features when beneficial
- Document optimization decisions for team
- Regular performance audits prevent degradation