---
name: layered-recall
description: "Progressive memory recall with 4 scope layers AND 3 depth layers. Scope: identity > project > room > deep. Depth: IDs only > summary > full. 10-50x token savings through fetch-on-confirmation pattern."
---

# Layered Recall

Progressive memory system with **two orthogonal dimensions** of lazy loading:
1. **Scope layers** - What is relevant (identity, project, room, deep)
2. **Depth layers** - How much detail to fetch (IDs, summary, full)

Combined savings: 10-50x fewer tokens than eager loading.

## Depth Pattern (Fetch-on-Confirmation)

Instead of loading full memory entries upfront, agents fetch in 3 depths:

```
Depth 1: IDs only        (~10 tokens per match)
  Agent decides which are worth investigating

Depth 2: Summary         (~50 tokens per match)
  Room, type, preview (first 80 chars)
  Agent confirms relevance

Depth 3: Full content    (~500+ tokens per match)
  Only fetched for confirmed matches
```

**Example flow:**
```
1. Agent searches "auth refresh token"
2. Depth 1 returns 8 IDs: d-abc123, d-def456, ...
3. Agent requests Depth 2 for the first 3 IDs
4. Sees room=authentication, type=decision, preview="Chose JWT..."
5. Agent confirms the 1st and 3rd are relevant
6. Requests Depth 3 only for those 2 entries
7. Spends ~1230 tokens in total (80 + 150 + 1000) instead of the 4000+ needed to load all 8 entries in full
```
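The three depths can be sketched as three methods on a store. This is a minimal illustration, not the skill's actual backend: `MemoryStore`, `Entry`, and the method names are hypothetical, the sample entries are made up, and the Depth 1 search is a naive substring match standing in for whatever search the real system uses.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    id: str
    room: str
    type: str
    content: str


class MemoryStore:
    """Hypothetical in-memory store used only to illustrate the three depths."""

    def __init__(self, entries):
        self._by_id = {e.id: e for e in entries}

    def search_ids(self, query: str) -> list:
        # Depth 1: return only IDs of entries that contain every query term.
        terms = query.lower().split()
        return [e.id for e in self._by_id.values()
                if all(t in e.content.lower() for t in terms)]

    def summarize(self, ids) -> list:
        # Depth 2: room, type, and an 80-character preview per requested ID.
        return [{"id": i,
                 "room": self._by_id[i].room,
                 "type": self._by_id[i].type,
                 "preview": self._by_id[i].content[:80]} for i in ids]

    def fetch_full(self, ids) -> list:
        # Depth 3: full content, only for entries the agent confirmed.
        return [self._by_id[i].content for i in ids]


# Sample data (invented for this sketch).
store = MemoryStore([
    Entry("d-abc123", "authentication", "decision",
          "Chose JWT with a rotating auth refresh token."),
    Entry("d-def456", "database", "error",
          "Migration of the auth refresh token table failed."),
    Entry("d-ghi789", "deploy", "pattern", "Blue-green deploy checklist."),
])

ids = store.search_ids("auth refresh token")          # Depth 1
summaries = store.summarize(ids)                      # Depth 2
confirmed = [s["id"] for s in summaries if s["room"] == "authentication"]
full = store.fetch_full(confirmed)                    # Depth 3
```

Here the agent's "confirmation" is reduced to a room filter. In practice that decision would be made by the agent after reading the summaries.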

## The 4 Scope Layers

```
Layer 1: Identity (always loaded, ~200 tokens)
   Who is the user? What are their preferences?

Layer 2: Critical Facts (per-project, ~500 tokens)
   Hard constraints, active decisions, blockers

Layer 3: Room Recall (on-demand, ~1-2K tokens)
   Relevant memories for current task domain

Layer 4: Deep Search (when needed, ~2-5K tokens)
   Full semantic search across all memories
```

## Layer Details

### Layer 1: Identity (~200 tokens, ALWAYS loaded)

Loaded at every session start. Contains:
- User preferences (language, style, autonomy level)
- Global constraints (no emojis, Turkish responses, etc.)
- Tool preferences (which editors, which terminal)

**Source:** `~/.claude/projects/*/memory/user_*.md`
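As a rough sketch, the Layer 1 load could glob the source files and trim the result to the budget. The function name `load_identity` is hypothetical, and the 4-characters-per-token ratio is an assumption used only to approximate the ~200 token limit.

```python
from pathlib import Path
from typing import Optional

# Assumption: roughly 4 characters per token, so ~200 tokens is about 800 characters.
CHARS_PER_TOKEN = 4


def load_identity(home: Optional[Path] = None, max_tokens: int = 200) -> str:
    """Concatenate user_*.md memory files and trim to the Layer 1 budget."""
    base = (home or Path.home()) / ".claude" / "projects"
    # Mirrors the Source glob: ~/.claude/projects/*/memory/user_*.md
    files = sorted(base.glob("*/memory/user_*.md"))
    text = "\n\n".join(p.read_text(encoding="utf-8").strip() for p in files)
    return text[: max_tokens * CHARS_PER_TOKEN]
```

Truncating by character count is crude. A real implementation would more likely count tokens with the model's tokenizer and prioritize entries rather than cutting mid-sentence.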

### Layer 2: Critical Facts (~500 tokens, per-project)

Loaded when entering a project directory. Contains:
- Active architectural decisions
- Known blockers and constraints
- Current sprint/milestone goals
- Tech stack and versions

**Source:** `~/.claude/projects/*/memory/project_*.md` + `thoughts/CONTEXT.md`

### Layer 3: Room Recall (~1-2K tokens, on-demand)

Loaded when the task domain is detected (auth, database, deploy, etc.). Contains:
- Previous decisions in this domain
- Past errors and fixes
- Patterns that worked
- Patterns that failed

**Source:** Memory palace rooms + `mature-instincts.json` filtered by domain

**Trigger:** Intent classifier detects domain (e.g., "fix the login bug" -> room: authentication)
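To make the trigger concrete, here is a deliberately simple keyword-overlap classifier. It is only a stand-in for the `intent-classifier` hook, whose real logic is not described here. The `ROOM_KEYWORDS` table and the `classify_room` function are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword table; the real classifier may use entirely different signals.
ROOM_KEYWORDS = {
    "authentication": {"login", "auth", "token", "password", "session"},
    "database": {"database", "migration", "query", "schema", "sql"},
    "deploy": {"deploy", "release", "rollback", "pipeline"},
}


def classify_room(prompt: str) -> Optional[str]:
    """Return the room whose keywords overlap the prompt most, or None."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    best_room, best_hits = None, 0
    for room, keywords in ROOM_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_room, best_hits = room, hits
    return best_room


print(classify_room("fix the login bug"))   # authentication
print(classify_room("update the README"))   # None
```

Returning `None` when nothing matches lets the caller skip Layer 3 rather than load an irrelevant room.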

### Layer 4: Deep Search (~2-5K tokens, explicit)

Loaded only when explicitly requested or when Layers 1-3 do not provide enough context. Contains:
- Full semantic search results
- Cross-project pattern matches
- Historical error resolutions
- Archived decisions

**Source:** PostgreSQL vector search + palace cross-wing search

**Trigger:** Agent explicitly queries, or user asks "have we done this before?"

## Recall Flow

```
Session Start
  -> Load Layer 1 (identity)
  -> Detect project -> Load Layer 2 (facts)
  -> User sends prompt
  -> Classify intent/domain -> Load Layer 3 (room)
  -> If insufficient context -> Load Layer 4 (deep)
```
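The flow above can be sketched as a single function that escalates through the layers. All names here (`recall`, the `loaders` dict keys, `classify`, `is_sufficient`) are hypothetical, and the stub loaders return labels rather than real memory content.

```python
def recall(prompt, project, loaders, classify, is_sufficient):
    """Load layers in order, escalating to deep search only when needed."""
    context = [loaders["identity"]()]                # Layer 1: always
    if project is not None:
        context.append(loaders["facts"](project))    # Layer 2: per project
    room = classify(prompt)
    if room is not None:
        context.append(loaders["room"](room))        # Layer 3: per task domain
    if not is_sufficient(context, prompt):
        context.append(loaders["deep"](prompt))      # Layer 4: fallback
    return context


# Stub loaders and policies, for illustration only.
loaders = {
    "identity": lambda: "L1:identity",
    "facts": lambda project: f"L2:{project}",
    "room": lambda room: f"L3:{room}",
    "deep": lambda prompt: f"L4:{prompt}",
}
classify = lambda p: "authentication" if "login" in p else None
# Assumed policy: context is sufficient once a room has been loaded.
enough = lambda ctx, p: any(c.startswith("L3:") for c in ctx)

print(recall("fix the login bug", "webapp", loaders, classify, enough))
# ['L1:identity', 'L2:webapp', 'L3:authentication']
```

How "insufficient context" is judged is the open design question. The stub treats a loaded room as sufficient, but a real check would depend on the agent or a scoring hook.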

## Token Budget

| Layer | Tokens | When |
|-------|--------|------|
| L1 | ~200 | Always |
| L2 | ~500 | Per project |
| L3 | ~1-2K | Per task domain |
| L4 | ~2-5K | On demand |
| **Total max** | **~8K** | Worst case |

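Loading everything eagerly instead costs ~30-50K tokens.

**Savings from scope layering alone: 4-6x token reduction**, even in the worst case and before the depth pattern is applied.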
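The worst-case figure and the reduction factor follow directly from the table. A quick check, using the upper bound of each layer's range:

```python
# Per-layer upper bounds taken from the table (tokens).
LAYER_MAX = {"L1": 200, "L2": 500, "L3": 2_000, "L4": 5_000}
EAGER_RANGE = (30_000, 50_000)  # "loading everything" estimate from the text

worst_case = sum(LAYER_MAX.values())
low, high = (eager / worst_case for eager in EAGER_RANGE)

print(f"worst case: {worst_case} tokens")       # 7700, rounded to ~8K in the table
print(f"reduction: {low:.1f}x to {high:.1f}x")  # 3.9x to 6.5x
```

With the rounded ~8K total the same division gives 3.75x to 6.25x, which is where the 4-6x figure comes from.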

## Integration

### With Existing Hooks
- `instinct-loader` -> feeds Layer 2 and Layer 3
- `smart-memory-recall` -> implements Layer 3 scoring
- `intent-classifier` -> triggers Layer 3 room selection
- `graph-indexer` -> powers Layer 4 deep search

### With Memory Palace
- Layer 2 pulls from palace wing index
- Layer 3 pulls from palace room drawers
- Layer 4 searches across all wings

### With Agents
- Agents inherit their parent's Layer 1-2 context
- Each agent can request Layer 3-4 for its own domain
- Agent memories feed back into palace for future recall