---
name: postgresql-best-practices
description: PostgreSQL development best practices for schema design, query optimization, and database administration
---

# PostgreSQL Best Practices

## Core Principles

- Leverage PostgreSQL's advanced features for robust data modeling
- Optimize queries using EXPLAIN ANALYZE and proper indexing strategies
- Use native PostgreSQL data types appropriately
- Implement proper connection pooling and resource management
- Follow PostgreSQL-specific security best practices

## Schema Design

### Data Types

- Use appropriate native types: `UUID`, `JSONB`, `ARRAY`, `INET`, `CIDR`
- Prefer `TIMESTAMPTZ` over `TIMESTAMP` for timezone-aware applications
- Use `TEXT` instead of `VARCHAR` when no length limit is needed
- Consider `NUMERIC` for precise decimal calculations (financial data)
- Use `SERIAL` or `BIGSERIAL` for auto-incrementing IDs, or `UUID` for distributed systems

```sql
CREATE TABLE orders (
    order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id UUID NOT NULL REFERENCES customers(customer_id),
    order_data JSONB NOT NULL DEFAULT '{}',
    tags TEXT[] DEFAULT '{}',
    total_amount NUMERIC(12, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

### Table Design

- Always define primary keys
- Use foreign keys with appropriate ON DELETE/UPDATE actions
- Add NOT NULL constraints where appropriate
- Use CHECK constraints for data validation
- Consider partitioning for large tables

```sql
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    sku VARCHAR(50) NOT NULL UNIQUE,
    name TEXT NOT NULL,
    price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
    status VARCHAR(20) NOT NULL DEFAULT 'active'
        CHECK (status IN ('active', 'inactive', 'discontinued')),
    metadata JSONB DEFAULT '{}'
);
```

### Partitioning

- Use declarative partitioning for large tables (millions of rows)
- Choose appropriate partition strategy: RANGE, LIST, or HASH
- Create indexes on partitioned tables after partitioning

```sql
CREATE TABLE events (
    event_id BIGSERIAL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```

## Indexing Strategies

### Index Types

- Use B-tree indexes (default) for equality and range queries
- Use GIN indexes for JSONB, arrays, and full-text search
- Use GiST indexes for geometric data and range types
- Use BRIN indexes for large, naturally ordered data
- Consider partial indexes for filtered queries

```sql
-- B-tree index for common lookups
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- GIN index for JSONB queries
CREATE INDEX idx_orders_data ON orders USING GIN (order_data);

-- Partial index for active records only
CREATE INDEX idx_active_products ON products(name) WHERE status = 'active';

-- Covering index to avoid table lookup
CREATE INDEX idx_orders_covering ON orders(customer_id)
    INCLUDE (order_date, total_amount);
```

### Index Maintenance

- Regularly run ANALYZE to update statistics
- Use REINDEX for bloated indexes
- Monitor index usage with `pg_stat_user_indexes`
- Remove unused indexes to reduce write overhead

```sql
-- Check index usage
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
```

## Query Optimization

### EXPLAIN ANALYZE

- Always analyze query plans for slow queries
- Look for sequential scans on large tables
- Identify missing indexes from query plans
- Watch for high row estimates vs actual rows

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.order_id)
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.created_at > '2024-01-01'
GROUP BY c.customer_id, c.name;
```

### Common Table Expressions (CTEs)

- Use CTEs for complex query organization
- Note: CTEs are optimization fences in older PostgreSQL versions
- Use `MATERIALIZED`/`NOT MATERIALIZED` hints in PostgreSQL 12+

```sql
WITH recent_orders AS MATERIALIZED (
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE order_date > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT c.name, ro.order_count, ro.total_spent
FROM customers c
JOIN recent_orders ro ON c.customer_id = ro.customer_id
WHERE ro.total_spent > 1000;
```

### Window Functions

- Use window functions for analytics queries
- Leverage PARTITION BY and ORDER BY for complex calculations

```sql
SELECT
    order_id,
    customer_id,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS order_rank
FROM orders;
```

## JSONB Best Practices

- Use JSONB over JSON for better performance and indexing
- Create GIN indexes for JSONB columns you query
- Use containment operators (@>, <@) for efficient queries
- Extract frequently queried fields to regular columns

```sql
-- Efficient JSONB query with GIN index
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';

-- Extract specific fields
SELECT
    product_id,
    metadata->>'brand' AS brand,
    (metadata->>'rating')::numeric AS rating
FROM products
WHERE metadata ? 'rating';
```

## Connection Management

### Connection Pooling

- Use PgBouncer or pgpool-II for connection pooling
- Set appropriate pool sizes based on workload
- Use transaction pooling mode for short-lived connections

### Connection Settings

```sql
-- Recommended session settings
SET statement_timeout = '30s';
SET lock_timeout = '10s';
SET idle_in_transaction_session_timeout = '60s';
```

## Transactions and Locking

- Use appropriate transaction isolation levels
- Keep transactions short to reduce lock contention
- Use advisory locks for application-level locking
- Monitor and resolve lock conflicts

```sql
-- Use advisory locks for application coordination
SELECT pg_advisory_lock(hashtext('resource_name'));
-- Do work
SELECT pg_advisory_unlock(hashtext('resource_name'));

-- Check for blocking queries
SELECT blocked_locks.pid AS blocked_pid,
       blocking_locks.pid AS blocking_pid,
       blocked_activity.query AS blocked_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.relation = blocked_locks.relation
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid;
```

## Maintenance

### Vacuum and Analyze

- Enable autovacuum and tune for your workload
- Run manual VACUUM ANALYZE after bulk operations
- Monitor table bloat

```sql
-- Check table bloat
SELECT schemaname, relname,
       n_live_tup, n_dead_tup,
       round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
```

### Backup Strategies

- Use pg_dump for logical backups
- Use pg_basebackup for physical backups
- Implement point-in-time recovery (PITR) with WAL archiving
- Test backup restoration regularly

## Security

- Use SSL/TLS for connections
- Implement row-level security (RLS) for multi-tenant applications
- Use roles and GRANT/REVOKE for access control
- Audit sensitive operations with pgAudit extension

```sql
-- Enable row-level security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_tenant_policy ON documents
    FOR ALL
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- Grant minimal privileges
GRANT SELECT, INSERT, UPDATE ON orders TO app_user;
GRANT USAGE ON SEQUENCE orders_order_id_seq TO app_user;
```

## Monitoring

- Monitor with pg_stat_statements extension
- Track slow queries and optimize regularly
- Set up alerts for replication lag, connection count, and disk usage
- Use pg_stat_activity to monitor active queries

```sql
-- Enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```