---
name: recursive-knowledge
description: "Process large document corpora (1000+ docs, millions of tokens) through knowledge graph construction and stateful multi-hop reasoning. Use when (1) User provides a large corpus exceeding context limits, (2) Questions require connections across multiple documents, (3) Multi-hop reasoning needed for complex queries, (4) User wants persistent queryable knowledge from documents. Replaces brute-force document stuffing with intelligent graph traversal."
---

# Recursive Knowledge Processing

Process arbitrarily large document sets through knowledge graph construction and stateful multi-hop queries. Based on RLM research but with proper state management and termination logic.

## Core Concept

Instead of stuffing documents into context (which causes degradation), this skill:
1. Indexes documents into a knowledge graph (entities, relationships)
2. Answers queries by traversing the graph
3. Tracks state to avoid redundant exploration
4. Uses confidence thresholds to know when to stop

## Workflow

### Phase 1: Indexing

For a new corpus, run the indexer:

```python
python3 scripts/index_corpus.py --input /path/to/documents --output /path/to/graph.json
```

This extracts:
- **Entities**: People, organizations, concepts, dates, locations
- **Relationships**: References, mentions, contradicts, supports, relates_to
- **Metadata**: Source document, position, extraction confidence

For details on entity/relationship schema, see [references/graph-schema.md](references/graph-schema.md).

### Phase 2: Querying

For user queries against an indexed corpus:

```python
python3 scripts/query.py --graph /path/to/graph.json --query "user question here"
```

The query engine:
1. Parses query into target entities/relationships
2. Finds entry points in graph
3. Traverses with state tracking
4. Stops when confidence threshold met
5. Returns answer with provenance

### Phase 3: Incremental Updates

Add new documents to existing graph:

```python
python3 scripts/index_corpus.py --input /path/to/new_docs --output /path/to/graph.json --append
```

## State Management (Critical)

The key improvement over naive recursive approaches is **stateful traversal**. See [references/state-management.md](references/state-management.md) for full details.

**During query execution, track:**

| State | Purpose |
|-------|---------|
| `visited_nodes` | Prevent re-exploring same entities |
| `visited_edges` | Prevent re-traversing same relationships |
| `findings` | Accumulated evidence with sources |
| `confidence` | Current certainty level (0-1) |
| `depth` | Current traversal depth |

**Termination conditions:**

```python
STOP if:
  - confidence >= 0.85 (high certainty)
  - len(corroborating_sources) >= 3 (multiple agreement)
  - depth > max_depth (prevent infinite exploration)
  - all relevant paths exhausted
```

## Multi-Hop Reasoning

For questions requiring connection across documents:

1. Identify query components (what entities/facts needed)
2. Find entry points for each component
3. Traverse from each entry point
4. Look for path intersections
5. Synthesize findings at intersection points

Example: "Who worked with X on project Y?"
- Entry point 1: Entity "X" → relationships → projects
- Entry point 2: Entity "Project Y" → relationships → people
- Intersection: People connected to both X and Project Y

See [references/traversal-patterns.md](references/traversal-patterns.md) for patterns.

## When NOT to Use This Skill

- Small document sets that fit in context (<50k tokens) - just use direct context
- Simple keyword search - use grep/search tools instead
- No multi-hop reasoning needed - simpler approaches work
- Real-time streaming data - this is for static corpora

## File Reference

- `scripts/index_corpus.py` - Build graph from documents
- `scripts/query.py` - Execute queries with state management
- `scripts/graph_ops.py` - Graph CRUD utilities
- `references/graph-schema.md` - Entity and relationship types
- `references/state-management.md` - Termination and confidence logic
- `references/traversal-patterns.md` - Multi-hop query patterns