---
name: postgres-query-expert
description: A comprehensive guide for interacting with PostgreSQL 16 databases. Use this skill for constructing standard and advanced SQL queries, optimizing performance, debugging errors, managing schema objects, and introspecting database structure.
allowed-tools: Read, Grep, Glob
---
# PostgreSQL Query Expert

This skill is a definitive reference for PostgreSQL 16, covering query construction, optimization, schema management, and system introspection.

## Instructions

### 1. General Query Standards

- **Syntax**: Adhere to ANSI SQL standards, but prefer PostgreSQL extensions (e.g., `DISTINCT ON`, `RETURNING`, `LATERAL`, `FILTER` clauses) when they provide cleaner logic or better performance.
- **Identifiers**: Use `snake_case` for all identifiers. Only quote identifiers (`"MyTable"`) if absolutely necessary; prefer lowercase unquoted names.
- **Safety**:
  - **Parameterization**: Always use parameters (`$1`, `$2`, …) for literal values. Never inject user input directly.
  - **Timeouts**: For exploratory queries on large databases, prepend `SET LOCAL statement_timeout = '30s';`.
  - **Transactions**: Use explicit `BEGIN` and `COMMIT` blocks for multi-step operations.

### 2. Performance & Optimization

- **Explain plans**: Use `EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)` to diagnose bottlenecks.
- **Red flags**: `Seq Scan` on large tables, high `Buffers: shared hit` (RAM usage), or `Disk: read` (I/O).
- **Indexing**: Recommend specific index types based on usage:
  - **B-tree**: Standard equality/range (`=`, `<`, `>`) queries.
  - **GIN**: For composite types like `JSONB` (`@>`) or arrays (`&&`), and full-text search.
  - **GiST**: For geometric data and ranges.
- **CTEs**: Use Common Table Expressions (`WITH`) for readability. In PG16+, these are optimized (inlined) by default unless `MATERIALIZED` is specified.

## Introspection (Agent Capabilities)

When exploring a new database, use these queries to understand the schema.

### List All Tables

```sql
SELECT n.nspname AS schema,
       c.relname AS table,
       obj_description(c.oid) AS description
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;
```

### Get Table Columns & Types

```sql
SELECT a.attname AS column,
       format_type(a.atttypid, a.atttypmod) AS type,
       a.attnotnull AS not_null,
       col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
WHERE a.attrelid = 'public.target_table_name'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;
```

## Reference: Data Querying (DQL)

### Advanced Aggregations

- **Filter clause**: `count(*) FILTER (WHERE status = 'active')`
- **Grouping sets**: `GROUP BY GROUPING SETS ((brand), (brand, category), ())`
- **Any value**: `any_value(col)` (PG16+) returns an arbitrary value from the group.

### Window Functions

Perform calculations across a set of table rows related to the current row.

```sql
SELECT dept,
       emp_no,
       salary,
       -- Rank employees by salary within department
       dense_rank() OVER (PARTITION BY dept ORDER BY salary DESC) AS rank,
       -- Running total of salaries
       sum(salary) OVER (
         PARTITION BY dept
         ORDER BY salary
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM employees;
```

### Pattern Matching

- **LIKE**: `col LIKE 'foo%'` (simple wildcard).
- **ILIKE**: `col ILIKE 'foo%'` (case-insensitive).
- **SIMILAR TO**: `col SIMILAR TO '[a-c]%'` (SQL-regex style).
- **POSIX regex**:
  - Case-sensitive: `col ~ '^[a-z]+$'`
  - Case-insensitive: `col ~* 'foo'`

## Reference: Data Modification (DML)

### MERGE (Upsert / Conditional Ops)

Standard SQL method for inserting, updating, or deleting based on join conditions (PG15+).

```sql
MERGE INTO wine_stock ws
USING wine_shipments s
  ON s.winery_id = ws.winery_id
 AND s.year = ws.year
WHEN MATCHED THEN
  UPDATE SET stock = ws.stock + s.count
WHEN NOT MATCHED THEN
  INSERT (winery_id, year, stock)
  VALUES (s.winery_id, s.year, s.count);
```

### INSERT ... ON CONFLICT (Legacy Upsert)

Postgres-specific, often more concise for simple unique-key conflicts.

```sql
INSERT INTO kv_store (key, value)
VALUES ('config', '{"a":1}')
ON CONFLICT (key)
DO UPDATE SET value = EXCLUDED.value;
```

### RETURNING Clause

Return data from modified rows immediately.

```sql
DELETE FROM archived_logs
WHERE created_at < NOW() - INTERVAL '1 year'
RETURNING id, created_at;
```

## Reference: Special Data Types

### JSONB (Binary JSON)

Prefer `jsonb` over `json` for storage and indexing.

| Operator | Description | Example |
|---|---|---|
| `->` / `->>` | Get element (JSON / text) | `data->'key'` |
| `@>` | Contains (indexable) | `data @> '{"tag": "urgent"}'` |
| `?` | Key exists | `data ? 'error'` |
| `#-` | Delete path | `data #- '{info, sensitive}'` |

SQL/JSON path (PG12+):

```sql
-- Find all items with price > 10
SELECT jsonb_path_query(data, '$.items[*] ? (@.price > 10)')
FROM orders;
```

### Arrays

```sql
SELECT ARRAY[1,2,3];           -- Creation
SELECT (ARRAY[1,2,3])[1];      -- Access (1-based index)
SELECT 1 = ANY(arr_col);       -- Check if value exists in array
SELECT unnest(arr_col) FROM t; -- Expand array to rows
```

### Range Types

Useful for scheduling and validity periods.

- `tstzrange`: timestamp with time zone range.
- `int4range`, `daterange`: integer and date ranges.
- Overlap operator (`&&`): checks if two ranges overlap.

```sql
SELECT *
FROM reservations
WHERE duration && tstzrange('2023-01-01 10:00', '2023-01-01 12:00');
```

## Reference: System Administration & Stats

### Kill Long-Running Query

```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
  AND query_start < NOW() - INTERVAL '5 minutes';
```

### Check Table Size (Disk Usage)

```sql
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_relation_size(relid)) AS data_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
```

## Examples

### Scenario 1: Recursive CTE for Graph/Tree Data

Navigating an organizational hierarchy.

```sql
WITH RECURSIVE subordinates AS (
    -- Base case: the manager
    SELECT employee_id, manager_id, full_name, 0 AS level
    FROM employees
    WHERE employee_id = $1

    UNION ALL

    -- Recursive step: direct reports
    SELECT e.employee_id, e.manager_id, e.full_name, s.level + 1
    FROM employees e
    INNER JOIN subordinates s ON s.employee_id = e.manager_id
)
SELECT *
FROM subordinates;
```

### Scenario 2: Lateral Join for "Top N per Category"

Efficiently getting the latest 3 posts for each user.

```sql
SELECT u.username, p.title, p.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT title, created_at
    FROM posts
    WHERE user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) p
WHERE u.status = 'active';
```

### Scenario 3: Full Text Search with Ranking

Searching a blog table.

```sql
SELECT id,
       title,
       ts_rank(to_tsvector('english', title || ' ' || content), query) AS rank
FROM articles,
     to_tsquery('english', 'postgres | optimization') query
WHERE to_tsvector('english', title || ' ' || content) @@ query
ORDER BY rank DESC;
```