---
name: lang-sql-dev
description: Foundational SQL patterns for query writing, schema design, and dialect differences. Use when writing SQL queries, designing database schemas, understanding SQL syntax across PostgreSQL/MySQL/SQLite, or preparing SQL for conversion to other query languages. This is a meta-skill for SQL derivatives.
---

# SQL Fundamentals

Foundational SQL patterns covering query writing, schema design, and dialect differences across PostgreSQL, MySQL, and SQLite. This skill serves as a base for specialized SQL skills and SQL-to-X conversions.

## Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        SQL Skill Hierarchy                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                      ┌──────────────┐                           │
│                      │   lang-sql   │ ◄── You are here          │
│                      │ (foundation) │                           │
│                      └──────┬───────┘                           │
│                             │                                   │
│         ┌───────────────────┼───────────────────┐               │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│  │ sql-to-     │    │    sql-     │    │   data-     │         │
│  │  polars     │    │optimization │    │  postgres   │         │
│  └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**This skill covers:**
- Query writing (SELECT, JOIN, CTEs, window functions)
- Schema design (tables, constraints, normalization)
- Dialect differences (PostgreSQL, MySQL, SQLite)
- Performance basics (EXPLAIN, indexing fundamentals)
- SQL syntax patterns useful for conversion

**This skill does NOT cover:**
- Deep optimization strategies - see `sql-optimization-patterns`
- ORM usage (SQLAlchemy, Prisma, etc.)
- Database administration (backups, replication, users)
- Platform-specific features (stored procedures, triggers)

---

## Quick Reference

| Task | PostgreSQL | MySQL | SQLite |
|------|------------|-------|--------|
| Show tables | `\dt` | `SHOW TABLES` | `.tables` |
| Describe table | `\d table` | `DESCRIBE table` | `.schema table` |
| Current database | `SELECT current_database()` | `SELECT DATABASE()` | N/A (file-based) |
| List indexes | `\di` | `SHOW INDEX FROM table` | `.indexes table` |
| Explain query | `EXPLAIN ANALYZE` | `EXPLAIN` | `EXPLAIN QUERY PLAN` |

---

## Query Patterns

### SELECT Fundamentals

```sql
-- Basic SELECT with filtering
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1 DESC
LIMIT 10;

-- Column aliases
SELECT
    first_name AS "First Name",
    last_name AS "Last Name",
    salary * 12 AS annual_salary
FROM employees;
```

### JOIN Types

```sql
-- INNER JOIN (matching rows only)
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;

-- LEFT JOIN (all from left, matching from right)
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;

-- Multiple JOINs
SELECT
    o.id,
    c.name as customer,
    p.name as product
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;
```

### Common Table Expressions (CTEs)

```sql
-- Basic CTE
WITH active_users AS (
    SELECT id, name, email
    FROM users
    WHERE status = 'active'
)
SELECT * FROM active_users WHERE email LIKE '%@company.com';

-- Multiple CTEs
WITH
    recent_orders AS (
        SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'
    ),
    order_totals AS (
        SELECT customer_id, SUM(amount) as total
        FROM recent_orders
        GROUP BY customer_id
    )
SELECT c.name, ot.total
FROM order_totals ot
JOIN customers c ON ot.customer_id = c.id;

-- Recursive CTE (hierarchical data)
WITH RECURSIVE org_chart AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees under managers
    SELECT e.id, e.name, e.manager_id, oc.level + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, name;
```

### Window Functions

```sql
-- ROW_NUMBER: Sequential numbering within partition
SELECT
    department,
    name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees;

-- Running totals
SELECT
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;

-- Compare to previous row
SELECT
    date,
    revenue,
    LAG(revenue) OVER (ORDER BY date) as prev_revenue,
    revenue - LAG(revenue) OVER (ORDER BY date) as change
FROM daily_sales;

-- Percentile ranking
SELECT
    name,
    score,
    PERCENT_RANK() OVER (ORDER BY score) as percentile
FROM test_results;
```

### Subqueries

```sql
-- Scalar subquery (returns single value)
SELECT name, salary,
    (SELECT AVG(salary) FROM employees) as avg_salary
FROM employees;

-- IN subquery
SELECT * FROM products
WHERE category_id IN (
    SELECT id FROM categories WHERE name LIKE '%Electronics%'
);

-- EXISTS subquery (often more efficient)
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
    AND o.created_at > NOW() - INTERVAL '30 days'
);

-- Correlated subquery (references outer query)
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = e.department_id
);
```

---

## Aggregation Patterns

### GROUP BY

```sql
-- Basic aggregation
SELECT
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    MIN(salary) as min_salary,
    MAX(salary) as max_salary
FROM employees
GROUP BY department;

-- HAVING (filter groups)
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

-- Multiple grouping columns
SELECT
    department,
    job_title,
    COUNT(*) as count
FROM employees
GROUP BY department, job_title
ORDER BY department, count DESC;
```

### CASE Expressions

```sql
-- Conditional aggregation
SELECT
    COUNT(CASE WHEN status = 'active' THEN 1 END) as active_count,
    COUNT(CASE WHEN status = 'inactive' THEN 1 END) as inactive_count,
    COUNT(*) as total
FROM users;

-- Bucketing data
SELECT
    CASE
        WHEN age < 18 THEN 'minor'
        WHEN age < 65 THEN 'adult'
        ELSE 'senior'
    END as age_group,
    COUNT(*) as count
FROM users
GROUP BY 1;  -- Group by first select column
```

---

## Schema Design

### Table Creation

```sql
-- Basic table with constraints
CREATE TABLE users (
    id SERIAL PRIMARY KEY,           -- PostgreSQL
    -- id INT AUTO_INCREMENT PRIMARY KEY,  -- MySQL
    -- id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite

    email VARCHAR(255) NOT NULL UNIQUE,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Table with foreign key
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    status VARCHAR(20) DEFAULT 'pending',
    total DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Many-to-many junction table
CREATE TABLE user_roles (
    user_id INT REFERENCES users(id) ON DELETE CASCADE,
    role_id INT REFERENCES roles(id) ON DELETE CASCADE,
    granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, role_id)
);
```

### Normalization Quick Guide

| Form | Rule | Example |
|------|------|---------|
| 1NF | No repeating groups | Split `phone1, phone2` into separate rows |
| 2NF | No partial dependencies | Move `dept_name` to `departments` table if key is `(emp_id, dept_id)` |
| 3NF | No transitive dependencies | Move `city, state` to `addresses` if they depend on `zip_code` |

### Common Constraints

```sql
-- CHECK constraint
ALTER TABLE products
ADD CONSTRAINT positive_price CHECK (price > 0);

-- UNIQUE constraint on multiple columns
ALTER TABLE subscriptions
ADD CONSTRAINT unique_user_plan UNIQUE (user_id, plan_id);

-- NOT NULL with default
ALTER TABLE users
ALTER COLUMN status SET DEFAULT 'active',
ALTER COLUMN status SET NOT NULL;
```

---

## Index Basics

### When to Index

| Index | Use Case |
|-------|----------|
| Primary key | Automatic, unique identifier |
| Foreign key | Speed up JOINs |
| Frequently filtered columns | WHERE clauses |
| Frequently sorted columns | ORDER BY clauses |
| Composite | Multi-column WHERE/ORDER |

### Index Creation

```sql
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
-- Good for: WHERE status = 'active' AND created_at > '2024-01-01'
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
```

### Reading EXPLAIN

```sql
-- PostgreSQL: Full analysis with timing
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Key things to look for:
-- ✓ Index Scan / Index Only Scan - good
-- ✗ Seq Scan on large tables - investigate
-- ✗ High "actual rows" vs "estimated rows" - stale statistics
```

---

## Dialect Differences

### String Concatenation

```sql
-- PostgreSQL
SELECT first_name || ' ' || last_name AS full_name FROM users;

-- MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;

-- SQLite (both work)
SELECT first_name || ' ' || last_name AS full_name FROM users;
```

### Date/Time Operations

```sql
-- Current timestamp
-- PostgreSQL: NOW(), CURRENT_TIMESTAMP
-- MySQL: NOW(), CURRENT_TIMESTAMP
-- SQLite: datetime('now')

-- Date arithmetic
-- PostgreSQL
SELECT created_at + INTERVAL '7 days' FROM orders;

-- MySQL
SELECT DATE_ADD(created_at, INTERVAL 7 DAY) FROM orders;

-- SQLite
SELECT datetime(created_at, '+7 days') FROM orders;

-- Extract parts
-- PostgreSQL
SELECT EXTRACT(YEAR FROM created_at) FROM orders;

-- MySQL
SELECT YEAR(created_at) FROM orders;

-- SQLite
SELECT strftime('%Y', created_at) FROM orders;
```

### UPSERT (Insert or Update)

```sql
-- PostgreSQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- MySQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON DUPLICATE KEY UPDATE name = VALUES(name);

-- SQLite (3.24+)
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = excluded.name;
```

### Pagination

```sql
-- Standard (PostgreSQL, MySQL, SQLite)
SELECT * FROM products ORDER BY id LIMIT 10 OFFSET 20;

-- Keyset pagination (more efficient for large offsets)
SELECT * FROM products
WHERE id > 1000  -- Last seen ID
ORDER BY id
LIMIT 10;
```

### Boolean Handling

```sql
-- PostgreSQL: Native BOOLEAN
SELECT * FROM users WHERE is_active = true;

-- MySQL: TINYINT(1) or BOOLEAN (alias)
SELECT * FROM users WHERE is_active = 1;

-- SQLite: INTEGER (0/1)
SELECT * FROM users WHERE is_active = 1;
```

---

## SQL for Conversion

When converting SQL to DataFrame operations (Pandas, Polars), map these patterns:

| SQL | DataFrame Equivalent |
|-----|---------------------|
| `SELECT col1, col2` | `.select(["col1", "col2"])` |
| `WHERE condition` | `.filter(condition)` |
| `ORDER BY col DESC` | `.sort("col", descending=True)` |
| `LIMIT n` | `.head(n)` or `.limit(n)` |
| `GROUP BY` | `.group_by()` |
| `JOIN` | `.join()` |
| `DISTINCT` | `.unique()` or `.distinct()` |

See derivative skills for specific conversion patterns:
- `sql-to-polars` - SQL to Polars DataFrame
- `sql-to-pandas` - SQL to Pandas DataFrame

---

## Anti-Patterns to Avoid

### 1. SELECT *

```sql
-- Bad: Fetches unnecessary data
SELECT * FROM users;

-- Good: Only fetch needed columns
SELECT id, name, email FROM users;
```

### 2. N+1 Queries

```sql
-- Bad: Query per user (in application loop)
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... repeated N times

-- Good: Single query with JOIN or IN
SELECT * FROM orders WHERE user_id IN (1, 2, 3, ...);
```

### 3. Functions on Indexed Columns

```sql
-- Bad: Prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Good: Store normalized, or use expression index
SELECT * FROM users WHERE email = 'test@example.com';
```

### 4. Implicit Type Conversion

```sql
-- Bad: String compared to integer
SELECT * FROM users WHERE id = '123';

-- Good: Matching types
SELECT * FROM users WHERE id = 123;
```

### 5. Missing WHERE on UPDATE/DELETE

```sql
-- DANGEROUS: Affects all rows!
UPDATE users SET status = 'inactive';

-- Safe: Always include WHERE
UPDATE users SET status = 'inactive' WHERE last_login < '2024-01-01';
```

---

## Troubleshooting

### Query Returns No Results

1. **Check WHERE conditions**: Test each condition separately
2. **Verify JOIN keys**: Ensure matching data types and values
3. **NULL handling**: Use `IS NULL` not `= NULL`
4. **Case sensitivity**: Check collation settings

### Query Too Slow

1. **Run EXPLAIN**: Look for Seq Scans on large tables
2. **Check indexes**: Are filtered/joined columns indexed?
3. **Reduce data early**: Filter before JOINing
4. **Avoid SELECT ***: Fetch only needed columns

### Unexpected Duplicates

1. **Missing DISTINCT**: Add if needed
2. **Many-to-many JOINs**: Each match creates a row
3. **GROUP BY missing columns**: All non-aggregated columns must be grouped

### NULL Surprises

```sql
-- NULL comparisons always return NULL (unknown)
SELECT * FROM users WHERE department = NULL;  -- Returns nothing!
SELECT * FROM users WHERE department IS NULL; -- Correct

-- NULL in aggregations
SELECT AVG(salary) FROM employees;  -- NULLs ignored
SELECT COUNT(*) vs COUNT(column);   -- COUNT(*) includes NULL rows
```

---

## References

- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
- [MySQL Reference Manual](https://dev.mysql.com/doc/refman/8.0/en/)
- [SQLite Documentation](https://www.sqlite.org/docs.html)
- `sql-optimization-patterns` - Deep performance optimization
- `sql-expert` - Advanced query patterns