---
name: langgraph-checkpoints
description: LangGraph checkpointing and persistence. Use when implementing fault-tolerant workflows, resuming interrupted executions, debugging with state history, or avoiding re-running expensive operations.
tags: [langgraph, checkpoints, state, persistence]
context: fork
agent: workflow-architect
version: 1.0.0
author: OrchestKit
user-invocable: false
---

# LangGraph Checkpointing

Persist workflow state for recovery and debugging.

## Checkpointer Options

```python
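import sqlite3

from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver      # pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.postgres import PostgresSaver  # pip install langgraph-checkpoint-postgres

# Development: in-memory (state is lost when the process exits)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Local or single-process: SQLite
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
app = workflow.compile(checkpointer=checkpointer)

# Production: PostgreSQL (from_conn_string is a context manager)
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # Creates the checkpoint tables; required on first use
    app = workflow.compile(checkpointer=checkpointer)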
```

## Using Thread IDs

```python
# Start new workflow
config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(initial_state, config=config)

# Resume interrupted workflow
config = {"configurable": {"thread_id": "analysis-123"}}
result = app.invoke(None, config=config)  # Resumes from checkpoint
```
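
Resuming only works if the same `thread_id` can be rebuilt later. The following is a minimal sketch of one way to derive a deterministic ID from business identifiers using only the standard library. The helper name `thread_config`, the `APP_NAMESPACE` value, and the `"kind:entity"` key format are illustrative assumptions, not part of LangGraph.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "workflows.example.com")


def thread_config(workflow_kind: str, entity_id: str) -> dict:
    """Build a config dict whose thread_id is stable for the same inputs."""
    # uuid5 is a hash of (namespace, name), so equal inputs give equal IDs.
    thread_id = str(uuid.uuid5(APP_NAMESPACE, f"{workflow_kind}:{entity_id}"))
    return {"configurable": {"thread_id": thread_id}}


first = thread_config("analysis", "123")
second = thread_config("analysis", "123")  # Same inputs, e.g. after a restart
other = thread_config("analysis", "124")   # Different entity, different thread
```

Because `first` and `second` are equal, a restarted process can pass the rebuilt config to `app.invoke(None, config=...)` and reach the same checkpoints.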

## PostgreSQL Setup

```python
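from psycopg.rows import dict_row
from psycopg_pool import ConnectionPool

def create_checkpointer() -> PostgresSaver:
    """Create PostgreSQL checkpointer for production."""
    # PostgresSaver expects autocommit connections that return dict rows.
    pool = ConnectionPool(
        conninfo=settings.DATABASE_URL,
        kwargs={"autocommit": True, "prepare_threshold": 0, "row_factory": dict_row},
    )
    checkpointer = PostgresSaver(pool)
    checkpointer.setup()  # Creates the checkpoint tables if they do not exist
    return checkpointer

# Compile with checkpointing. A checkpoint is written after every super-step;
# PostgresSaver has no save_every option.
app = workflow.compile(
    checkpointer=create_checkpointer(),
    interrupt_before=["quality_gate"]  # Manual review point
)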
```

## Inspecting Checkpoints

```python
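# Get all checkpoints for a workflow (newest first)
checkpoints = app.get_state_history(config)

for checkpoint in checkpoints:
    print(f"Step: {checkpoint.metadata['step']}")
    # 'source' is the checkpoint origin (e.g. "input", "loop", "update"), not a node name
    print(f"Source: {checkpoint.metadata['source']}")
    print(f"Next nodes: {checkpoint.next}")
    print(f"State: {checkpoint.values}")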

# Get current state
current = app.get_state(config)
print(current.values)
```

## Resuming After Crash

```python
import logging

def run_with_recovery(workflow_id: str, initial_state: dict):
    """Run workflow with automatic recovery."""
    # For async code, use `await app.aget_state` / `await app.ainvoke`
    # together with an async checkpointer.
    config = {"configurable": {"thread_id": workflow_id}}

    # An unknown thread yields an empty snapshot (no values, no next nodes)
    state = app.get_state(config)
    if state.next:
        # Pending nodes mean the previous run stopped part-way
        logging.info(f"Resuming workflow {workflow_id}")
        return app.invoke(None, config=config)
    if state.values:
        logging.info(f"Workflow {workflow_id} already finished")
        return state.values

    # Start fresh
    logging.info(f"Starting new workflow {workflow_id}")
    return app.invoke(initial_state, config=config)
```
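
The start-or-resume choice depends on two fields of the snapshot returned by `app.get_state(config)`: `values` and `next`. As a rough sketch, the decision can be isolated in a pure function so it can be unit-tested without a database. The function name `recovery_action` and its string results are hypothetical, chosen only for illustration.

```python
from typing import Any, Mapping, Sequence


def recovery_action(values: Mapping[str, Any], next_nodes: Sequence[str]) -> str:
    """Decide what to do with a thread, given its latest snapshot fields."""
    if next_nodes:
        # Nodes are still pending: the run was interrupted or crashed.
        return "resume"
    if values:
        # State exists and nothing is pending: the run already completed.
        return "finished"
    # No state and nothing pending: the thread has never run.
    return "start"


unknown_thread = recovery_action({}, ())
interrupted = recovery_action({"draft": "v1"}, ("quality_gate",))
completed = recovery_action({"draft": "v2"}, ())
```

In a real caller, the arguments would be `state.values` and `state.next` from the snapshot.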

## Step-by-Step Debugging

```python
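# Execute one node at a time; each step maps node name -> state update
for step in app.stream(initial_state, config, stream_mode="updates"):
    for node, update in step.items():
        print(f"After {node}: {update}")
    input("Press Enter to continue...")

# Roll back by re-running from an earlier checkpoint (history is newest first)
history = list(app.get_state_history(config))
previous_state = history[1]  # One step back
# previous_state.config carries the checkpoint_id, so this forks from that point
app.invoke(None, config=previous_state.config)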
```
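
Indexing `history[1]` only goes back one step. To rewind to just before a specific node, the history can be scanned for the newest snapshot whose pending nodes include it. The sketch below assumes only that each snapshot exposes `next` and `config`, as the snapshots from `get_state_history` do. The `SimpleNamespace` objects and checkpoint IDs `c1` to `c3` are stand-ins so the example runs without LangGraph, and `latest_snapshot_before` is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def latest_snapshot_before(history: Iterable[Any], node: str) -> Optional[Any]:
    """Return the newest snapshot in which `node` is about to run, if any."""
    for snapshot in history:  # History is ordered newest first
        if node in snapshot.next:
            return snapshot
    return None


def _fake_snapshot(checkpoint_id: str, next_nodes: tuple) -> SimpleNamespace:
    """Stand-in for a real snapshot, with made-up IDs."""
    config = {"configurable": {"thread_id": "analysis-123", "checkpoint_id": checkpoint_id}}
    return SimpleNamespace(next=next_nodes, config=config)


history = [
    _fake_snapshot("c3", ()),                 # Finished run
    _fake_snapshot("c2", ("quality_gate",)),  # About to run quality_gate
    _fake_snapshot("c1", ("analyze",)),       # About to run analyze
]

target = latest_snapshot_before(history, "quality_gate")
missing = latest_snapshot_before(history, "does_not_exist")
```

With real snapshots, passing `target.config` to `app.invoke(None, config=...)` would re-run the workflow from that checkpoint.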

## Store vs Checkpointer (2026 Best Practice)

```python
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.store.postgres import PostgresStore

# Checkpointer = SHORT-TERM memory (thread-scoped)
# - Conversation history within a session
# - Workflow state for resume/recovery
# - Scoped to thread_id

# Store = LONG-TERM memory (cross-thread)
# - User preferences across sessions
# - Learned facts about users
# - Shared across ALL threads for a user

# from_conn_string is a context manager for both classes
with PostgresSaver.from_conn_string(DATABASE_URL) as checkpointer:
    with PostgresStore.from_conn_string(DATABASE_URL) as store:
        checkpointer.setup()  # First use only: create tables
        store.setup()

        # Compile with BOTH for full memory support
        app = workflow.compile(
            checkpointer=checkpointer,  # Thread-scoped state
            store=store                 # Cross-thread memory
        )
```

## Using Store for Cross-Thread Memory

```python
from datetime import datetime

from langgraph.store.base import BaseStore

async def agent_with_memory(state: AgentState, *, store: BaseStore):
    """Agent that remembers across conversations."""
    user_id = state["user_id"]
    system_prompt = state["system_prompt"]

    # Read cross-thread memory (user preferences); None if nothing is stored yet
    memories = await store.aget(namespace=("users", user_id), key="preferences")

    # Use memories in agent logic
    if memories and memories.value.get("prefers_concise"):
        system_prompt += "\nBe concise in responses."

    # Update cross-thread memory (learned facts)
    await store.aput(
        namespace=("users", user_id),
        key="last_topic",
        value={"topic": state["current_topic"], "timestamp": datetime.now().isoformat()}
    )

    # Return only the keys this node changed instead of mutating the input state
    return {"system_prompt": system_prompt}

# Register node with store access
workflow.add_node("agent", agent_with_memory)
```

## Memory Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    User: alice                               │
├─────────────────────────────────────────────────────────────┤
│  Thread 1 (chat-001)    │  Thread 2 (chat-002)              │
│  ┌─────────────────┐    │  ┌─────────────────┐              │
│  │ Checkpointer    │    │  │ Checkpointer    │              │
│  │ - msg history   │    │  │ - msg history   │              │
│  │ - workflow pos  │    │  │ - workflow pos  │              │
│  └─────────────────┘    │  └─────────────────┘              │
├─────────────────────────────────────────────────────────────┤
│                     Store (cross-thread)                     │
│  namespace=("users", "alice")                                │
│  - preferences: {prefers_concise: true}                     │
│  - last_topic: {topic: "langgraph", timestamp: "..."}       │
└─────────────────────────────────────────────────────────────┘
```
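
The two scopes in the diagram can be mimicked with plain dictionaries. This is a toy model for intuition only, not the LangGraph API: the class `ToyMemory` and its method names are invented for this sketch.

```python
from collections import defaultdict
from typing import Any, Optional


class ToyMemory:
    """Toy model: per-thread checkpoints plus a namespaced cross-thread store."""

    def __init__(self) -> None:
        self.checkpoints: dict = defaultdict(list)  # thread_id -> state snapshots
        self.store: dict = {}                       # (namespace, key) -> value

    def save_checkpoint(self, thread_id: str, state: dict) -> None:
        # Copy so later mutations of `state` do not alter saved history.
        self.checkpoints[thread_id].append(dict(state))

    def latest(self, thread_id: str) -> Optional[dict]:
        history = self.checkpoints.get(thread_id)
        return history[-1] if history else None

    def put(self, namespace: tuple, key: str, value: Any) -> None:
        self.store[(namespace, key)] = value

    def get(self, namespace: tuple, key: str) -> Optional[Any]:
        return self.store.get((namespace, key))


memory = ToyMemory()
memory.save_checkpoint("chat-001", {"messages": ["hi"]})
memory.put(("users", "alice"), "preferences", {"prefers_concise": True})

# A new thread starts with no checkpoint but still sees alice's preferences.
new_thread_state = memory.latest("chat-002")
prefs = memory.get(("users", "alice"), "preferences")
```

The tuple namespace keeps `("users", "alice")` and `("users", "bob")` apart even when both use the key `"preferences"`.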

## Key Decisions

| Decision | Recommendation |
|----------|----------------|
| Development | MemorySaver (fast, no setup) |
| Production | PostgresSaver (shared, durable) |
| Checkpoint frequency | Automatic: one checkpoint per super-step (there is no `save_every` setting) |
| Thread ID | Use deterministic ID (workflow_id) |
| **Short-term memory** | **Checkpointer (thread-scoped)** |
| **Long-term memory** | **Store (cross-thread, namespaced)** |

## Common Mistakes

- No checkpointer in production (lose progress)
- Random thread IDs (can't resume)
- Not handling missing checkpoints
- Keeping large objects in state (every super-step is serialized to the checkpointer, which adds overhead)
- **Using only checkpointer for user preferences (lost across threads)**
- **Not using namespaces in Store (data collisions)**

## Related Skills

- `langgraph-state` - State design for checkpointing
- `langgraph-human-in-loop` - Interrupt patterns
- `database-schema-designer` - PostgreSQL setup

## Capability Details

### checkpoint-saving
**Keywords:** save checkpoint, checkpoint, persist state, save state
**Solves:**
- Save workflow state at key points
- Implement checkpoint strategies
- Handle checkpoint serialization

### checkpoint-loading
**Keywords:** load checkpoint, restore, resume, recovery
**Solves:**
- Resume workflows from checkpoints
- Implement state recovery
- Handle checkpoint versioning

### memory-backends
**Keywords:** memory backend, MemorySaver, SqliteSaver, PostgresSaver
**Solves:**
- Configure checkpoint storage backends
- Choose between memory/SQLite/Postgres
- Implement custom checkpoint storage

### async-checkpoints
**Keywords:** async checkpoint, AsyncSqliteSaver, async persistence
**Solves:**
- Implement async checkpoint operations
- Handle concurrent checkpoint access
- Optimize checkpoint performance

### conversation-history
**Keywords:** conversation, history, message history, thread
**Solves:**
- Persist conversation history
- Implement thread-based checkpoints
- Manage conversation state