---
name: building-with-sqlalchemy-orm
description: |
  Build production database layers with SQLAlchemy ORM and PostgreSQL.
  This skill should be used when teaching students to define data models, manage sessions, perform CRUD operations, and connect to PostgreSQL/Neon databases.
allowed-tools: Read, Grep, Glob, Bash
---

# Building with SQLAlchemy ORM

Build production-grade database applications with SQLAlchemy ORM 2.0+, generic PostgreSQL patterns, and Neon-specific serverless considerations.

## Before Implementation

Gather context to ensure successful implementation:

| Source | Gather |
|--------|--------|
| **Codebase** | Existing models, database setup, connection patterns |
| **Conversation** | Student's specific use case (what they're building), constraints |
| **Skill References** | Domain patterns from `references/` (API docs, best practices, architecture) |
| **User Guidelines** | Project conventions, proficiency level |

Only ask student for THEIR requirements (domain expertise is embedded in this skill).

---

## Persona

You are a Python database architect with production experience building applications with SQLAlchemy ORM. You understand both the generic PostgreSQL patterns (applicable everywhere) and Neon-specific serverless considerations (autoscaling, scale-to-zero, branching). You've built multi-table applications with proper transaction handling, relationships, and connection pooling.

---

## When to Use

- **Building database models** from requirements (defining tables as Python classes)
- **Implementing CRUD operations** safely with transactions
- **Managing relationships** between tables (foreign keys, joins)
- **Querying data** with filters, ordering, and complex joins
- **Connecting to PostgreSQL** or Neon with proper configuration
- **Teaching database fundamentals** to beginners learning persistence

---

## Core Concepts

### 1. Models as Classes (ORM Abstraction)

SQLAlchemy maps Python classes to database tables:

```python
from sqlalchemy import Column, Integer, String, Float, ForeignKey
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Expense(Base):
    __tablename__ = 'expenses'

    id = Column(Integer, primary_key=True)
    description = Column(String(200))
    amount = Column(Float)
    category_id = Column(Integer, ForeignKey('categories.id'))
```

**Why this matters**: You write Python. SQLAlchemy generates SQL. You don't write SQL by hand.

### 2. Sessions as Transactions (Unit of Work)

A session groups database operations into an atomic transaction:

```python
with Session(engine) as session:
    new_expense = Expense(description="Groceries", amount=45.50, category_id=1)
    session.add(new_expense)
    session.commit()  # All or nothing
```

**Why this matters**: If anything fails, nothing is committed. Guarantees database consistency.

### 3. Relationships (Foreign Keys as Navigation)

Define relationships in Python instead of manual joins:

```python
class Category(Base):
    __tablename__ = 'categories'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    expenses = relationship("Expense", back_populates="category")

class Expense(Base):
    __tablename__ = 'expenses'
    id = Column(Integer, primary_key=True)
    category = relationship("Category", back_populates="expenses")
```

Usage:
```python
category = session.query(Category).first()
print(category.expenses)  # All expenses in this category
```

### 4. Queries (Filtering, Ordering, Joining)

Construct queries safely without writing raw SQL:

```python
# Filter: expenses > $50
expensive = session.query(Expense).filter(Expense.amount > 50).all()

# Order: sorted by amount descending
sorted_expenses = session.query(Expense).order_by(Expense.amount.desc()).all()

# Join: expenses with their categories
results = session.query(Expense, Category).join(Category).all()
```

### 5. Neon Connection Specifics

Neon is serverless PostgreSQL with auto-scaling and branching. Key differences:

- **Connection string**: `postgresql+psycopg2://user:pass@host/dbname?sslmode=require`
- **Always use SSL**: `?sslmode=require` (Neon enforces this)
- **Environment variables**: Store credentials in `.env` (never hardcode)
- **Auto-pause**: Neon pauses compute when idle—connection pools help with this

---

## Decision Logic

| Scenario | Pattern | Why |
|----------|---------|-----|
| **First database model** | Single table, one Column type | Simplest mental model before relationships |
| **Need to link data** | Use relationship() + ForeignKey | ORM handles complex joins for you |
| **Many concurrent requests** | Connection pooling with pool_size | Neon scales compute; pooling maximizes it |
| **Data consistency critical** | Transactions with try/except | Rollback on error; guarantees atomicity |
| **Want to scale to zero** | Neon serverless + pool with echo_pool | Auto-pause when idle; wake on first request |
| **Debugging queries** | Enable echo=True in engine | See generated SQL |

---

## Workflow: Building Budget Tracker

### Step 1: Define Models

```python
from sqlalchemy import create_engine, Column, Integer, String, Float, DateTime, ForeignKey, func
from sqlalchemy.orm import declarative_base, relationship, Session
from datetime import datetime

Base = declarative_base()

class Category(Base):
    __tablename__ = 'categories'
    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True)
    expenses = relationship("Expense", back_populates="category")

class Expense(Base):
    __tablename__ = 'expenses'
    id = Column(Integer, primary_key=True)
    description = Column(String(200))
    amount = Column(Float)
    date = Column(DateTime, default=datetime.utcnow)
    category_id = Column(Integer, ForeignKey('categories.id'))
    category = relationship("Category", back_populates="expenses")
```

### Step 2: Create Engine and Tables

```python
import os
from dotenv import load_dotenv

load_dotenv()

# Connection string from environment
DATABASE_URL = os.getenv("DATABASE_URL")
# Format: postgresql+psycopg2://user:password@host/dbname?sslmode=require

engine = create_engine(DATABASE_URL)
Base.metadata.create_all(engine)
```

### Step 3: Implement CRUD

```python
def create_expense(session, description, amount, category_id):
    """Create a new expense."""
    try:
        expense = Expense(
            description=description,
            amount=amount,
            category_id=category_id
        )
        session.add(expense)
        session.commit()
        return expense
    except Exception as e:
        session.rollback()
        print(f"Error creating expense: {e}")
        return None

def read_expenses(session, category_id=None):
    """Read expenses, optionally filtered by category."""
    query = session.query(Expense)
    if category_id:
        query = query.filter(Expense.category_id == category_id)
    return query.all()

def update_expense(session, expense_id, amount=None, description=None):
    """Update an expense."""
    expense = session.query(Expense).filter(Expense.id == expense_id).first()
    if expense:
        if amount is not None:
            expense.amount = amount
        if description is not None:
            expense.description = description
        session.commit()
        return expense
    return None

def delete_expense(session, expense_id):
    """Delete an expense."""
    expense = session.query(Expense).filter(Expense.id == expense_id).first()
    if expense:
        session.delete(expense)
        session.commit()
        return True
    return False
```

### Step 4: Query with Relationships

```python
# Get all expenses for a category
category = session.query(Category).filter_by(name="Food").first()
print(category.expenses)  # Uses relationship

# Total spent by category
totals = session.query(
    Category.name,
    func.sum(Expense.amount).label('total')
).join(Expense).group_by(Category.name).all()

for name, total in totals:
    print(f"{name}: ${total:.2f}")
```

### Step 5: Handle Transactions Safely

```python
def transfer_expense(session, expense_id, new_category_id):
    """Move expense to different category (must succeed fully or not at all)."""
    try:
        expense = session.query(Expense).filter(Expense.id == expense_id).first()
        if not expense:
            raise ValueError(f"Expense {expense_id} not found")

        expense.category_id = new_category_id
        session.commit()
        return True
    except Exception as e:
        session.rollback()
        print(f"Transaction failed, rolled back: {e}")
        return False
```

### Step 6: Connect to Neon

**Environment file (.env)**:
```
DATABASE_URL=postgresql+psycopg2://user:password@ep-ABC123.neon.tech/dbname?sslmode=require
```

**Connection with pool configuration**:
```python
from sqlalchemy.pool import QueuePool

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=5,
    max_overflow=10,
    pool_pre_ping=True,  # Verify connections before use
    echo=False  # Set to True for debugging
)
```

---

## MCP Integration

To connect SQLAlchemy database operations to AI agents:

```python
# Define an MCP tool that the agent can use
def query_expenses_by_category(category_name: str) -> list:
    """Agent can ask: 'How much did I spend on groceries?'"""
    with Session(engine) as session:
        return session.query(Expense).join(Category).filter(
            Category.name == category_name
        ).all()

def summarize_spending(start_date, end_date) -> dict:
    """Agent can generate reports."""
    with Session(engine) as session:
        return session.query(
            Category.name,
            func.sum(Expense.amount).label('total'),
            func.count(Expense.id).label('count')
        ).join(Expense).filter(
            Expense.date.between(start_date, end_date)
        ).group_by(Category.name).all()
```

Register these as MCP tools so the agent (Budget Manager) can use them.

---

## Safety & Guardrails

### NEVER
- ❌ Hardcode credentials in Python files
- ❌ Skip error handling around transactions
- ❌ Trust user input without validation
- ❌ Commit secrets to git
- ❌ Skip connection pooling for production

### ALWAYS
- ✅ Use environment variables for connection strings (`.env` file)
- ✅ Wrap transactions in try/except blocks with rollback
- ✅ Validate and sanitize all user input before database operations
- ✅ Use `session.commit()` explicitly (never auto-commit)
- ✅ Use `session.rollback()` on errors
- ✅ Enable `pool_pre_ping=True` to check connection health
- ✅ Use `?sslmode=require` with Neon (enforced anyway)

### Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| Forgetting `session.commit()` | Changes not saved | Always call commit() or use context manager |
| Not rolling back on error | Partial data in database | Wrap in try/except with rollback() |
| Hardcoding credentials | Security breach | Use environment variables |
| No connection pooling | Neon compute scaling inefficient | Set `pool_size` parameter |
| Raw user input in queries | SQL injection | Use parameterized queries (ORM does this) |

---

## Budget Tracker Example (Complete)

See `references/budget-tracker-complete.py` for a fully working Budget Tracker application with:
- Model definitions
- Database setup
- CRUD functions
- Transaction handling
- Neon connection
- Example usage

---

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| `ModuleNotFoundError: No module named 'sqlalchemy'` | Not installed | `pip install sqlalchemy` or `uv add sqlalchemy` |
| `ModuleNotFoundError: No module named 'psycopg2'` | PostgreSQL driver missing | `pip install psycopg2-binary` or `uv add psycopg2-binary` |
| `OperationalError: could not connect to server` | Wrong connection string or Neon offline | Check `DATABASE_URL` format, verify Neon project is running |
| `IntegrityError: duplicate key value` | Inserting duplicate unique field | Check if value already exists, use update instead |
| `ForeignKeyError: could not create foreign key` | Category doesn't exist | Create category first, or use valid category_id |
| Queries are slow | No indexes, missing relationships | Check `references/architecture.md` for indexing patterns |

---

## Resources

- **Official Docs**: https://docs.sqlalchemy.org/en/20/orm/quickstart.html
- **Neon Docs**: https://neon.com/docs/
- **PostgreSQL Types**: https://www.postgresql.org/docs/current/datatype.html