--- name: pg-index-optimization description: Performs comprehensive PostgreSQL index optimization including unused index cleanup, duplicate detection, missing index recommendations, and hypothetical index testing with HypoPG. Provides actionable CREATE/DROP INDEX statements with estimated impact. --- # PostgreSQL Index Optimization This skill guides comprehensive index analysis and optimization for PostgreSQL databases using pgtuner-mcp tools, including HypoPG hypothetical index testing. ## When to Use This Skill Use this skill when the user: - Asks to optimize or review database indexes - Wants to find and remove unused or duplicate indexes - Reports slow queries that might benefit from better indexing - Asks "what indexes should I create?" or "which indexes can I drop?" - Mentions high storage usage from indexes or slow write performance - Wants to test index ideas without creating them in production ## MCP Resources Available - `pgtuner://docs/tools` -- Full tool reference documentation - `pgtuner://table/{schema}/{table_name}/indexes` -- Quick view of existing indexes on a table (lightweight) - `pgtuner://table/{schema}/{table_name}/stats` -- Table statistics for context - `pgtuner://query/{query_hash}/stats` -- Per-query stats to understand which queries benefit from indexes ## Related MCP Prompt This skill corresponds to the **`index_optimization`** MCP Prompt. If the user triggers that prompt, follow this skill's workflow. ## Prerequisites - The pgtuner-mcp server must be connected with a valid `DATABASE_URI` - Required: `pg_stat_statements` extension for workload analysis - Recommended: `hypopg` extension for hypothetical index testing - Recommended: `pgstattuple` extension for index bloat analysis ## Pre-Flight Checks Before starting, check extension availability: 1. Use `manage_hypothetical_indexes` with `action: "check"` to verify HypoPG. If unavailable, skip Phase 3 (hypothetical testing) and rely on heuristic-based recommendations. 2. If `analyze_index_bloat` returns an error about `pgstattuple`, note that bloat measurements will be estimated rather than exact. ## Agent Decision Tree ``` User asks about indexes | +--> "Which indexes can I drop?" | --> Start with Phase 1 (cleanup), focus on Step 1.1 | +--> "What indexes should I create?" | --> Start with Phase 2 (missing indexes) | +--> "Comprehensive index review" | --> Follow all 4 phases in order | +--> "Test a specific index idea" | --> Skip to Phase 3 (hypothetical testing) | +--> After finding unused indexes, also check: --> Read pgtuner://table/{schema}/{table}/indexes for quick context --> Cross-reference: are the "unused" indexes UNIQUE? (warn user) --> Cross-reference: how long has the DB been up? ``` ## Optimization Workflow ### Phase 1: Index Inventory and Cleanup #### Step 1.1: Find Unused Indexes ``` Tool: find_unused_indexes Parameters: schema_name: "public" min_size_mb: 1 include_duplicates: true ``` **What to analyze:** - **Unused indexes** (idx_scan = 0): Safe to drop if the database has been running long enough for representative workload - **Rarely used indexes** (idx_scan < 10 over weeks): Candidates for removal - **Duplicate indexes**: Indexes that are a prefix of another index (e.g., `idx_a` on `(col1)` is redundant if `idx_b` on `(col1, col2)` exists) - **Overlapping indexes**: Multiple indexes covering similar column combinations **Before recommending DROP:** 1. Check how long the database has been up: `SELECT pg_postmaster_start_time()` 2. Check if `pg_stat_statements` was recently reset 3. Consider if the index supports a rarely-run but critical report query 4. Always warn about unique indexes (they enforce constraints, not just speed) **Generate cleanup SQL:** ```sql -- DROP INDEX CONCURRENTLY avoids blocking writes DROP INDEX CONCURRENTLY IF EXISTS schema.index_name; ``` #### Step 1.2: Analyze Index Bloat ``` Tool: analyze_index_bloat Parameters: schema_name: "public" min_index_size_gb: 0.1 # Lower threshold to catch more issues min_bloat_percent: 20 ``` **What to analyze:** - **avg_leaf_density < 70%**: Fragmented, consider REINDEX - **Estimated bloat > 40%**: Significant wasted space - B-tree indexes degrade over time with UPDATE/DELETE patterns **Remediation:** ```sql -- REINDEX CONCURRENTLY (PostgreSQL 12+) avoids blocking REINDEX INDEX CONCURRENTLY schema.index_name; ``` ### Phase 2: Missing Index Analysis #### Step 2.1: Workload-Based Recommendations ``` Tool: get_index_recommendations Parameters: max_recommendations: 10 min_improvement_percent: 10 include_hypothetical_testing: true ``` If the user has specific queries to optimize: ``` Tool: get_index_recommendations Parameters: workload_queries: ["SELECT ... FROM orders WHERE ...", "SELECT ... FROM users JOIN ..."] max_recommendations: 10 min_improvement_percent: 10 include_hypothetical_testing: true target_tables: ["orders", "users"] # Optional: focus on specific tables ``` **Evaluate each recommendation:** - Improvement > 50%: Strong candidate - Improvement 20-50%: Good candidate, verify with hypothetical testing - Improvement 10-20%: Marginal, weigh against write overhead #### Step 2.2: Check Table Access Patterns For tables flagged in recommendations: ``` Tool: get_table_stats Parameters: table_name: "" include_indexes: true order_by: "seq_scans" ``` **What to analyze:** - **High seq_scan count with low idx_scan**: Table needs better indexing - **seq_tup_read >> idx_tup_fetch**: Most reads are full scans - **Large table with only PK index**: Likely needs additional indexes ### Phase 3: Hypothetical Index Testing #### Step 3.1: Test Recommended Indexes For each promising recommendation, test with HypoPG: ``` Tool: explain_with_indexes Parameters: query: "" hypothetical_indexes: - table: "orders" columns: ["customer_id", "created_at"] index_type: "btree" unique: false - table: "orders" columns: ["status"] index_type: "btree" analyze: false ``` **Evaluate the result:** - Compare before/after costs - Verify the plan actually uses the new index - Check if Index Scan or Index Only Scan is chosen (Index Only Scan is better) #### Step 3.2: Advanced HypoPG Testing For more complex scenarios, start by checking HypoPG availability: ``` Tool: manage_hypothetical_indexes Parameters: action: "check" ``` If available, test advanced index types: ``` Tool: manage_hypothetical_indexes Parameters: action: "create" table: "orders" columns: ["customer_id", "created_at"] index_type: "btree" schema: "public" include: ["order_total"] # Covering index for Index Only Scan ``` Estimate the storage cost of a hypothetical index: ``` Tool: manage_hypothetical_indexes Parameters: action: "estimate_size" table: "orders" columns: ["customer_id", "created_at"] index_type: "btree" ``` Test partial indexes (index only matching rows): ``` Tool: manage_hypothetical_indexes Parameters: action: "create" table: "orders" columns: ["created_at"] index_type: "btree" where: "status = 'pending'" # Only index pending orders ``` Test what happens if you hide an existing index: ``` Tool: manage_hypothetical_indexes Parameters: action: "hide" index_id: ``` Verify which indexes are currently hidden: ``` Tool: manage_hypothetical_indexes Parameters: action: "list_hidden" ``` List all currently created hypothetical indexes: ``` Tool: manage_hypothetical_indexes Parameters: action: "list" ``` Then explain the query to see if performance changes: ``` Tool: manage_hypothetical_indexes Parameters: action: "explain_with_index" query: "SELECT ... FROM orders WHERE ..." table: "orders" columns: ["customer_id", "created_at"] ``` Restore hidden indexes and clean up when done: ``` Tool: manage_hypothetical_indexes Parameters: action: "reset" ``` ### Phase 4: Generate Action Plan Compile all findings into a prioritized action plan. ## Output Format ### Index Cleanup (Safe to Drop) | Index | Table | Size | Reason | DROP Statement | |-------|-------|------|--------|----------------| | idx_old_status | orders | 245 MB | 0 scans in 30 days | `DROP INDEX CONCURRENTLY idx_old_status;` | ### Indexes to REINDEX (Bloated) | Index | Table | Size | Bloat % | REINDEX Statement | |-------|-------|------|---------|-------------------| | idx_orders_date | orders | 1.2 GB | 45% | `REINDEX INDEX CONCURRENTLY idx_orders_date;` | ### New Indexes to Create | Table | Columns | Type | Improvement | CREATE Statement | |-------|---------|------|-------------|------------------| | orders | (customer_id, created_at) | btree | 72% | `CREATE INDEX CONCURRENTLY idx_orders_cust_date ON orders (customer_id, created_at);` | ### Net Impact Summary - Estimated storage savings from drops: X MB - Estimated storage cost of new indexes: Y MB - Net storage change: +/- Z MB - Estimated query improvement: summarize ## Index Design Guidelines Present these guidelines when making recommendations: ### Column Order in Composite Indexes 1. Equality columns first (WHERE col = value) 2. Range columns next (WHERE col BETWEEN / > / <) 3. Sort columns last (ORDER BY col) ### Index Type Selection | Use Case | Index Type | |----------|-----------| | Equality and range queries (default) | btree | | Equality-only lookups | hash | | Full-text search | gin | | Geometric / range data types | gist | | Very large tables, time-series | brin | ### When NOT to Add an Index - Table has < 1,000 rows (seq scan is faster) - Column has very low cardinality (e.g., boolean) unless combined in composite - Table is write-heavy with rare reads - Query already runs under 1ms ## Important Notes - Always use `CONCURRENTLY` for CREATE/DROP/REINDEX in production to avoid blocking - HypoPG indexes exist only in the current session and are not persisted - After creating real indexes, run `ANALYZE table_name` to update statistics - Monitor write performance after adding indexes: check `pg_stat_user_tables.n_tup_ins/upd/del` - Wait for representative workload before dropping "unused" indexes (minimum 1-2 weeks of normal traffic) - Before dropping any index, capture the CREATE INDEX statement for rollback ## Iterative Verification After applying index changes: 1. **After CREATE INDEX**: Re-run `analyze_query` on affected queries to confirm the plan uses the new index. 2. **After DROP INDEX**: Re-run `analyze_query` to ensure no critical query regressed. Monitor application error rates. 3. **After REINDEX**: Re-run `analyze_index_bloat` to confirm bloat was resolved. 4. **After all changes**: Re-run `find_unused_indexes` to confirm the new state is clean. ## When to Stop and Ask the User - **Before dropping a UNIQUE index**: "This index enforces a uniqueness constraint. Dropping it will remove the constraint. Are you sure?" - **Before dropping indexes on a table with unclear workload**: "The database has only been running for X days. Some indexes may serve monthly reporting queries. Should we wait for a full workload cycle?" - **If HypoPG is not available**: "The HypoPG extension is not installed. I can provide heuristic-based recommendations, but cannot verify improvement with hypothetical testing. Would you like to proceed or install HypoPG first?" - **If estimated storage for new indexes is significant**: "The recommended indexes will use approximately X GB of storage. Is this acceptable?" ## Cross-References to Other Skills - **Slow query driving the index need**: Use `pg-slow-query-diagnosis` skill - **Query rewrite might be better than an index**: Use `pg-query-rewrite` skill - **Index bloat requiring deeper investigation**: Use `pg-bloat-analysis` skill - **I/O impact of indexes**: Use `pg-io-deep-dive` skill ## PostgreSQL Version Notes | Feature | Version | |---------|---------| | `INCLUDE` columns in indexes (covering indexes) | PG11+ | | `REINDEX CONCURRENTLY` | PG12+ | | Deduplication for B-tree indexes | PG13+ | | `pg_stat_progress_create_index` | PG12+ | ## Production Safety - All diagnostic tools are read-only. No indexes are created or dropped by the tools. - HypoPG indexes are session-scoped and automatically cleaned up. - `manage_hypothetical_indexes` with `action: "hide"` only hides the index from the planner in the current session. It does not affect other sessions or actual data. - All CREATE/DROP/REINDEX statements are recommendations for the user to execute.