---
name: postgresql-optimization
description: Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.
version: 1.0.0
category: database
tags: [postgresql, sql, performance, indexing, query-optimization, database]
---

# PostgreSQL Optimization

## Overview

Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.

## When to Use

- Diagnosing slow queries with EXPLAIN ANALYZE
- Creating optimal indexes for query patterns
- Designing database schemas for performance
- Configuring PostgreSQL for production workloads
- Implementing connection pooling (PgBouncer, Supavisor)
- Setting up partitioning for large tables
- Analyzing and reducing lock contention
- Migrating or upgrading PostgreSQL versions

## Capabilities

### Query Optimization
- EXPLAIN / EXPLAIN ANALYZE interpretation
- Query plan analysis and optimization
- Identifying sequential scans vs index scans
- Join optimization and query rewriting
- CTE vs subquery performance trade-offs
- Window function optimization

### Indexing Strategies
- B-tree, GIN, GiST, BRIN index selection
- Partial indexes for filtered queries
- Expression indexes for computed values
- Covering indexes (INCLUDE clause)
- Index-only scans optimization
- Concurrent index creation

### Schema Design
- Normalization vs denormalization trade-offs
- JSONB column design and indexing
- Array columns and operations
- Enum types vs lookup tables
- Foreign key cascade strategies
- Table inheritance and partitioning

### Configuration Tuning
- Memory settings (shared_buffers, work_mem, effective_cache_size)
- Connection limits and pooling
- WAL and checkpoint tuning
- Autovacuum configuration
- Statistics collection settings

### Advanced Features
- Partitioning (range, list, hash)
- Materialized views with refresh strategies
- Full-text search with tsvector/tsquery
- PostGIS geospatial queries
- Logical replication setup
- pg_stat_statements analysis

## Dependencies

Works well with:
- `database-modeler` - Schema design and ERD creation
- `data-pipeline-engineer` - ETL and data processing
- `site-reliability-engineer` - Database monitoring and alerting
- `nextjs-app-router-expert` - Full-stack data fetching

## Examples

### Reading EXPLAIN ANALYZE Output
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Key metrics to look for:
-- - "Seq Scan" on large tables → needs index
-- - "Rows Removed by Filter" high → filter before join
-- - "Sort Method: external merge" → increase work_mem
-- - "Buffers: shared hit" vs "shared read" → cache efficiency
```

### Creating Effective Indexes
```sql
-- Basic B-tree for equality and range queries
CREATE INDEX CONCURRENTLY idx_orders_user_created
ON orders (user_id, created_at DESC);

-- Partial index for common filter
CREATE INDEX CONCURRENTLY idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

-- GIN index for JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_metadata
ON products USING GIN (metadata jsonb_path_ops);

-- Covering index to enable index-only scans
CREATE INDEX CONCURRENTLY idx_users_email_covering
ON users (email) INCLUDE (name, created_at);

-- Expression index for case-insensitive search
CREATE INDEX CONCURRENTLY idx_users_email_lower
ON users (LOWER(email));
```

### Optimizing N+1 Queries
```sql
-- BAD: N+1 pattern (1 + N queries)
SELECT * FROM posts WHERE user_id = $1;
-- Then for each post: SELECT * FROM comments WHERE post_id = $1;

-- GOOD: Single query with lateral join
SELECT p.*, c.comments
FROM posts p
LEFT JOIN LATERAL (
  SELECT json_agg(c.*) as comments
  FROM comments c
  WHERE c.post_id = p.id
) c ON true
WHERE p.user_id = $1;

-- GOOD: Window function for aggregates
SELECT
  p.*,
  COUNT(*) OVER (PARTITION BY p.user_id) as user_post_count
FROM posts p
WHERE p.user_id = $1;
```

### Table Partitioning
```sql
-- Create partitioned table by date range
CREATE TABLE events (
  id BIGSERIAL,
  event_type TEXT NOT NULL,
  payload JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automate partition creation with pg_partman
CREATE EXTENSION pg_partman;
SELECT partman.create_parent('public.events', 'created_at', 'native', 'monthly');
```

### Connection Pooling Config (PgBouncer)
```ini
; pgbouncer.ini

[databases]
myapp = host=localhost dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Pool settings
pool_mode = transaction        ; Recommended for most apps
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5

; Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
```

### Performance Configuration
```sql
-- Check current settings
SHOW shared_buffers;        -- ~25% of RAM
SHOW effective_cache_size;  -- ~75% of RAM
SHOW work_mem;              -- Per-operation, start small (64MB)
SHOW maintenance_work_mem;  -- For VACUUM, CREATE INDEX (512MB-1GB)

-- Recommended production settings (for 32GB RAM server)
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET random_page_cost = 1.1;  -- For SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;  -- For SSD

-- Reload configuration
SELECT pg_reload_conf();
```

### Finding Slow Queries
```sql
-- Enable pg_stat_statements
CREATE EXTENSION pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT
  round(total_exec_time::numeric, 2) as total_ms,
  calls,
  round(mean_exec_time::numeric, 2) as avg_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER())::numeric, 2) as pct,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with most I/O
SELECT
  round(shared_blks_read::numeric, 2) as disk_reads,
  round(shared_blks_hit::numeric, 2) as cache_hits,
  round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_ratio,
  query
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```

### Analyzing Table Bloat
```sql
-- Check table bloat
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size,
  pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size,
  n_dead_tup,
  n_live_tup,
  round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Manual VACUUM for critical tables
VACUUM (VERBOSE, ANALYZE) orders;

-- Reclaim space (requires exclusive lock)
VACUUM FULL orders;  -- Use during maintenance window
```

## Best Practices

1. **Always use EXPLAIN ANALYZE** - Don't guess, measure actual query performance
2. **Create indexes CONCURRENTLY** - Avoid blocking writes during index creation
3. **Partial indexes for hot paths** - Index only the rows you query frequently
4. **Use connection pooling** - PgBouncer or Supavisor for production
5. **Monitor pg_stat_statements** - Track query performance over time
6. **Regular ANALYZE** - Keep statistics current for query planner
7. **Avoid SELECT *** - Only fetch columns you need
8. **Batch large updates** - Process in chunks to avoid lock contention
9. **Use prepared statements** - Reduce parsing overhead for repeated queries

## Common Pitfalls

- **Missing indexes** - Check for sequential scans on large tables
- **Over-indexing** - Too many indexes slow down writes
- **work_mem too low** - Causes disk-based sorts and hash joins
- **Connection exhaustion** - Not using connection pooling
- **Stale statistics** - Autovacuum not running frequently enough
- **Bloated tables** - Not vacuuming after large deletes/updates
- **N+1 queries** - Fetching related data in loops instead of joins
- **SELECT * everywhere** - Fetching unnecessary columns