--- name: database-advanced-optimization category: database difficulty: advanced estimated_time: 60 minutes prerequisites: - Basic SQL knowledge - Database design fundamentals - Understanding of indexes and query execution tags: - postgresql - mongodb - redis - performance - optimization - scaling - caching - sharding learning_objectives: - Master database-specific optimization techniques - Implement effective caching strategies - Design scalable database architectures - Optimize query performance across multiple database systems - Configure connection pooling and monitoring version: 1.0.0 last_updated: 2025-01-17 description: 'Use SQL (PostgreSQL) when:' --- # Database Advanced Optimization ## Level 1: Quick Reference (5 minutes) ### Database Selection Guide **Use SQL (PostgreSQL) when:** - ACID compliance is critical - Complex joins and transactions required - Data has clear relational structure - Strong consistency needed - Rich query capabilities required **Use NoSQL (MongoDB) when:** - Flexible schema needed - Horizontal scaling is priority - Document-oriented data model fits - High write throughput required - Eventual consistency acceptable **Use In-Memory (Redis) when:** - Sub-millisecond latency required - Caching layer needed - Real-time features (pub/sub, streams) - Session management - Rate limiting or counters ### Common Optimization Patterns ```sql -- PostgreSQL: Create covering index CREATE INDEX idx_orders_user_date ON orders(user_id, created_at) INCLUDE (status, total_amount); -- PostgreSQL: Analyze query plan EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 123; ``` ```javascript // MongoDB: Create compound index db.orders.createIndex({ userId: 1, createdAt: -1 }, { background: true }); // MongoDB: Use aggregation pipeline efficiently db.orders.aggregate([ { $match: { status: "pending" } }, { $sort: { createdAt: -1 } }, { $limit: 100 } ]); ``` ```python # Redis: Implement cache-aside pattern def get_user(user_id): cache_key = f"user:{user_id}" user = redis.get(cache_key) if user is None: user = db.query("SELECT * FROM users WHERE id = %s", user_id) redis.setex(cache_key, 3600, json.dumps(user)) return json.loads(user) ``` ### Essential Optimization Checklist #### PostgreSQL - [ ] Create appropriate indexes (B-tree, GIN, GiST) - [ ] Analyze query plans with EXPLAIN ANALYZE - [ ] Configure autovacuum appropriately - [ ] Set up connection pooling (PgBouncer) - [ ] Monitor with pg_stat_statements - [ ] Optimize shared_buffers and work_mem - [ ] Configure appropriate WAL settings #### MongoDB - [ ] Design effective shard keys - [ ] Create compound indexes for common queries - [ ] Enable profiler for slow queries - [ ] Configure replica sets for read scaling - [ ] Optimize aggregation pipelines - [ ] Set appropriate write concerns - [ ] Monitor with MongoDB Compass/Atlas #### Redis - [ ] Implement appropriate caching strategy - [ ] Configure maxmemory and eviction policies - [ ] Use pipelining for bulk operations - [ ] Set appropriate TTLs - [ ] Monitor memory usage and hit rates - [ ] Configure persistence (RDB/AOF) appropriately - [ ] Use Redis Cluster for scaling ### Quick Wins **Immediate Impact:** 1. Add indexes for frequently queried columns 2. Enable query result caching 3. Implement connection pooling 4. Add Redis cache layer for hot data 5. Optimize N+1 queries with batch loading **Performance Monitoring:** ```bash # PostgreSQL query stats SELECT query, calls, total_time, mean_time FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10; # MongoDB profiler db.setProfilingLevel(1, { slowms: 100 }); db.system.profile.find().sort({ ts: -1 }).limit(10); # Redis stats redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses' ``` --- ## Level 2: > > **📚 Full Examples**: See [REFERENCE.md](./REFERENCE.md) for complete code samples, detailed configurations, and production-ready implementations. Implementation Guide (45 minutes) ### PostgreSQL Advanced Optimization #### Index Strategies PostgreSQL supports multiple index types, each optimized for specific use cases: **B-tree Indexes (Default)** *See [REFERENCE.md](./REFERENCE.md#example-0) for complete implementation.* **GIN Indexes (Full-text and Array Search)** *See [REFERENCE.md](./REFERENCE.md#example-1) for complete implementation.* **GiST Indexes (Geometric and Range Data)** *See [REFERENCE.md](./REFERENCE.md#example-2) for complete implementation.* #### Query Optimization with EXPLAIN ANALYZE *See [REFERENCE.md](./REFERENCE.md#example-3) for complete implementation.* **Understanding EXPLAIN Output:** *See [REFERENCE.md](./REFERENCE.md#example-4) for complete implementation.* **Optimization Decisions:** - Index Scan = Good (using indexes) - Buffers shared hit = Good (data in cache) - Nested Loop = Appropriate for small result sets - Hash Join = Better for large joins #### VACUUM and Maintenance *See [REFERENCE.md](./REFERENCE.md#example-5) for complete implementation.* #### Connection Pooling with PgBouncer *See [REFERENCE.md](./REFERENCE.md#example-6) for complete implementation.* **Connection Pooling Best Practices:** - Use `transaction` mode for most applications - Set `default_pool_size` = (CPU cores × 2) + effective_spindle_count - Monitor pool usage: `SHOW POOLS;` in PgBouncer console - Application connection string: `postgresql://user:pass@pgbouncer:6432/myapp` #### PostgreSQL Performance Tuning *See [REFERENCE.md](./REFERENCE.md#example-7) for complete implementation.* ### MongoDB Advanced Optimization #### Sharding Strategies **Shard Key Selection:** *See [REFERENCE.md](./REFERENCE.md#example-8) for complete implementation.* **Shard Key Patterns:** 1. **Range-based Sharding** (ordered data): ```javascript // Good for time-series data sh.shardCollection("analytics.events", { "date": 1, "userId": 1 }) // Queries benefit from targeted routing db.events.find({ date: ISODate("2025-01-17"), userId: 123 }) ``` 2. **Hash-based Sharding** (random distribution): ```javascript // Good for even distribution, poor for range queries sh.shardCollection("users.profiles", { "_id": "hashed" }) // All shards queried for range db.profiles.find({ _id: { $gt: 1000, $lt: 2000 } }) ``` 3. **Zone Sharding** (geographic distribution): *See [REFERENCE.md](./REFERENCE.md#example-11) for complete implementation.* #### Index Optimization *See [REFERENCE.md](./REFERENCE.md#example-12) for complete implementation.* **Index Analysis:** *See [REFERENCE.md](./REFERENCE.md#example-13) for complete implementation.* #### Aggregation Pipeline Optimization *See [REFERENCE.md](./REFERENCE.md#example-14) for complete implementation.* #### Replica Set Configuration *See [REFERENCE.md](./REFERENCE.md#example-15) for complete implementation.* ### Redis Advanced Patterns #### Caching Strategies **1. Cache-Aside (Lazy Loading)** *See [REFERENCE.md](./REFERENCE.md#example-16) for complete implementation.* **2. Write-Through Caching** *See [REFERENCE.md](./REFERENCE.md#example-17) for complete implementation.* **3. Write-Behind (Write-Back) Caching** *See [REFERENCE.md](./REFERENCE.md#example-18) for complete implementation.* **4. Read-Through Caching** *See [REFERENCE.md](./REFERENCE.md#example-19) for complete implementation.* #### Redis Data Structures and Use Cases *See [REFERENCE.md](./REFERENCE.md#example-20) for complete implementation.* #### Pub/Sub and Streams **Pub/Sub Pattern:** *See [REFERENCE.md](./REFERENCE.md#example-21) for complete implementation.* **Redis Streams (Preferred for Reliable Messaging):** *See [REFERENCE.md](./REFERENCE.md#example-22) for complete implementation.* #### Redis Cluster Configuration *See [REFERENCE.md](./REFERENCE.md#example-23) for complete implementation.* **Redis Configuration (redis.conf):** *See [REFERENCE.md](./REFERENCE.md#example-24) for complete implementation.* ### Query Optimization Techniques #### Solving the N+1 Problem **Bad: N+1 Queries** *See [REFERENCE.md](./REFERENCE.md#example-25) for complete implementation.* **Good: Join or Batch Loading** *See [REFERENCE.md](./REFERENCE.md#example-26) for complete implementation.* #### Database Scaling Strategies **1. Vertical Scaling (Scale Up)** - Increase CPU, RAM, storage - Simple but has limits - Good for initial growth **2. Horizontal Scaling (Scale Out)** **Read Replicas:** *See [REFERENCE.md](./REFERENCE.md#example-27) for complete implementation.* **Sharding:** *See [REFERENCE.md](./REFERENCE.md#example-28) for complete implementation.* #### Connection Pooling Best Practices **HikariCP (Java):** *See [REFERENCE.md](./REFERENCE.md#example-29) for complete implementation.* **Python (SQLAlchemy):** *See [REFERENCE.md](./REFERENCE.md#example-30) for complete implementation.* **Node.js (pg-pool):** *See [REFERENCE.md](./REFERENCE.md#example-31) for complete implementation.* ### Database Monitoring and Profiling #### PostgreSQL Monitoring **pg_stat_statements Extension:** *See [REFERENCE.md](./REFERENCE.md#example-32) for complete implementation.* **Key Metrics to Monitor:** *See [REFERENCE.md](./REFERENCE.md#example-33) for complete implementation.* #### MongoDB Profiling **Enable Profiler:** *See [REFERENCE.md](./REFERENCE.md#example-34) for complete implementation.* **Analyze Slow Queries:** *See [REFERENCE.md](./REFERENCE.md#example-35) for complete implementation.* **Server Status Metrics:** *See [REFERENCE.md](./REFERENCE.md#example-36) for complete implementation.* #### Redis Monitoring **INFO Command:** *See [REFERENCE.md](./REFERENCE.md#example-37) for complete implementation.* **Key Metrics:** *See [REFERENCE.md](./REFERENCE.md#example-38) for complete implementation.* **Slow Log:** *See [REFERENCE.md](./REFERENCE.md#example-39) for complete implementation.* ### Backup and Recovery Strategies #### PostgreSQL Backup **Physical Backup (pg_basebackup):** ```bash # Full backup pg_basebackup -h localhost -U postgres -D /backup/pg_data -Fp -Xs -P # Compressed backup pg_basebackup -h localhost -U postgres -D /backup/pg_data -Ft -z -P ``` **Logical Backup (pg_dump):** *See [REFERENCE.md](./REFERENCE.md#example-41) for complete implementation.* **Restore:** *See [REFERENCE.md](./REFERENCE.md#example-42) for complete implementation.* **Point-in-Time Recovery (PITR):** *See [REFERENCE.md](./REFERENCE.md#example-43) for complete implementation.* #### MongoDB Backup **mongodump:** *See [REFERENCE.md](./REFERENCE.md#example-44) for complete implementation.* **mongorestore:** *See [REFERENCE.md](./REFERENCE.md#example-45) for complete implementation.* **Filesystem Snapshots (Replica Set):** *See [REFERENCE.md](./REFERENCE.md#example-46) for complete implementation.* #### Redis Backup **RDB (Snapshot):** *See [REFERENCE.md](./REFERENCE.md#example-47) for complete implementation.* **AOF (Append-Only File):** ```bash # Enable AOF (redis.conf) appendonly yes appendfsync everysec # Rewrite AOF redis-cli BGREWRITEAOF # Backup AOF file cp /var/lib/redis/appendonly.aof /backup/appendonly-$(date +%Y%m%d).aof ``` **Restore:** ```bash # Stop Redis systemctl stop redis # Restore RDB or AOF cp /backup/dump.rdb /var/lib/redis/ # or cp /backup/appendonly.aof /var/lib/redis/ # Start Redis systemctl start redis ``` --- ## Level 3: Deep Dive Resources (10 minutes) ### Official Documentation **PostgreSQL:** - [PostgreSQL Performance Tips](https://www.postgresql.org/docs/current/performance-tips.html) - [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html) - [EXPLAIN Documentation](https://www.postgresql.org/docs/current/sql-explain.html) - [PgBouncer Documentation](https://www.pgbouncer.org/usage.html) **MongoDB:** - [MongoDB Performance Best Practices](https://www.mongodb.com/docs/manual/administration/analyzing-mongodb-performance/) - [Sharding Guide](https://www.mongodb.com/docs/manual/sharding/) - [Index Strategies](https://www.mongodb.com/docs/manual/applications/indexes/) - [Aggregation Pipeline Optimization](https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/) **Redis:** - [Redis Documentation](https://redis.io/documentation) - [Redis Cluster Tutorial](https://redis.io/docs/manual/scaling/) - [Redis Persistence](https://redis.io/docs/manual/persistence/) - [Redis Best Practices](https://redis.io/docs/manual/patterns/) ### Books and Courses - "High Performance PostgreSQL" by Gregory Smith - "PostgreSQL Query Optimization" by Henrietta Dombrovskaya - "MongoDB: The Definitive Guide" by Shannon Bradshaw - "Redis in Action" by Josiah Carlson - "Designing Data-Intensive Applications" by Martin Kleppmann ### Tools and Utilities **PostgreSQL:** - pgAdmin 4 - Database management - pg_top - Real-time monitoring - pgBadger - Log analyzer - pgtune - Configuration generator **MongoDB:** - MongoDB Compass - GUI client - MongoDB Atlas - Managed service - mongostat/mongotop - Command-line monitoring - Studio 3T - Advanced GUI **Redis:** - RedisInsight - GUI client - redis-cli - Command-line interface - redis-stat - Real-time monitoring - redis-rdb-tools - RDB file analysis ### Practice Exercises 1. **Index Optimization Challenge**: Given a slow query, design optimal indexes 2. **Sharding Strategy**: Design shard key for multi-tenant application 3. **Cache Implementation**: Build cache-aside pattern with TTL and invalidation 4. **Query Analysis**: Use EXPLAIN to optimize complex join queries 5. **Backup Strategy**: Design comprehensive backup/recovery plan ### Related Skills - [Database Design Fundamentals](../design-fundamentals/SKILL.md) - [API Performance Optimization](../../backend/api-performance/SKILL.md) - [System Architecture Patterns](../../architecture/system-patterns/SKILL.md) - [Monitoring and Observability](../../devops/monitoring/SKILL.md) ## Examples ### Basic Usage ```python // TODO: Add basic example for advanced-optimization // This example demonstrates core functionality ``` ### Advanced Usage ```python // TODO: Add advanced example for advanced-optimization // This example shows production-ready patterns ``` ### Integration Example ```python // TODO: Add integration example showing how advanced-optimization // works with other systems and services ``` See `examples/advanced-optimization/` for complete working examples. ## Integration Points This skill integrates with: ### Upstream Dependencies - **Tools**: Common development tools and frameworks - **Prerequisites**: Basic understanding of general concepts ### Downstream Consumers - **Applications**: Production systems requiring advanced-optimization functionality - **CI/CD Pipelines**: Automated testing and deployment workflows - **Monitoring Systems**: Observability and logging platforms ### Related Skills - See other skills in this category ### Common Integration Patterns 1. **Development Workflow**: How this skill fits into daily development 2. **Production Deployment**: Integration with production systems 3. **Monitoring & Alerting**: Observability integration points ## Common Pitfalls ### Pitfall 1: Insufficient Testing **Problem:** Not testing edge cases and error conditions leads to production bugs **Solution:** Implement comprehensive test coverage including: - Happy path scenarios - Error handling and edge cases - Integration points with external systems **Prevention:** Enforce minimum code coverage (80%+) in CI/CD pipeline ### Pitfall 2: Hardcoded Configuration **Problem:** Hardcoding values makes applications inflexible and environment-dependent **Solution:** Use environment variables and configuration management: - Separate config from code - Use environment-specific configuration files - Never commit secrets to version control **Prevention:** Use tools like dotenv, config validators, and secret scanners ### Pitfall 3: Ignoring Security Best Practices **Problem:** Security vulnerabilities from not following established security patterns **Solution:** Follow security guidelines: - Input validation and sanitization - Proper authentication and authorization - Encrypted data transmission (TLS/SSL) - Regular security audits and updates **Prevention:** Use security linters, SAST tools, and regular dependency updates **Best Practices:** - Follow established patterns and conventions for advanced-optimization - Keep dependencies up to date and scan for vulnerabilities - Write comprehensive documentation and inline comments - Use linting and formatting tools consistently - Implement proper error handling and logging - Regular code reviews and pair programming - Monitor production metrics and set up alerts --- ## Bundled Resources This skill includes 6 production-ready templates and scripts: 1. **templates/postgres-optimization.sql** - PostgreSQL optimization queries and index strategies 2. **templates/mongodb-sharding.yaml** - MongoDB sharding configuration 3. **templates/redis-caching.py** - Redis caching pattern implementations 4. **scripts/query-profiling.sh** - Automated database query profiling 5. **config/pgbouncer.ini** - PgBouncer connection pooling configuration 6. **templates/monitoring-dashboard.json** - Grafana dashboard for database metrics All resources are located in the `skills/database/advanced-optimization/` directory.