---
name: knowledge-intake
description: 'Process external resources into stored knowledge with quality evaluation, curation routing, and application decisions.'
version: 1.9.3
alwaysApply: false
category: governance
tags:
- knowledge-management
- intake
- evaluation
- curation
- external-resources
dependencies:
- memory-palace-architect
- digital-garden-cultivator
- leyline:evaluation-framework
- leyline:storage-templates
- leyline:document-conversion
- scribe:slop-detector
scripts: []
usage_patterns:
- resource-intake
- knowledge-evaluation
- application-routing
complexity: intermediate
model_hint: standard
estimated_tokens: 950
---
## Table of Contents

- [What It Is](#what-it-is)
- [The Intake Signal](#the-intake-signal)
- [Quick Start](#quick-start)
- [Evaluation Framework](#evaluation-framework)
- [Importance Criteria](#importance-criteria)
- [Scoring Guide](#scoring-guide)
- [Application Routing](#application-routing)
- [Local Codebase Application](#local-codebase-application)
- [Meta-Infrastructure Application](#meta-infrastructure-application)
- [Routing Decision Tree](#routing-decision-tree)
- [Storage Locations](#storage-locations)
- [The Tidying Imperative (KonMari-Inspired)](#the-tidying-imperative-konmari-inspired)
- [The Master Curator](#the-master-curator)
- [The Two Questions](#the-two-questions)
- [Tidying Actions](#tidying-actions)
- [Marginal Value Filtering (Anti-Pollution)](#marginal-value-filtering-anti-pollution)
- [The Three-Step Filter](#the-three-step-filter)
- [Using the Filter](#using-the-filter)
- [Filter Output Example](#filter-output-example)
- [Progressive Autonomy Integration](#progressive-autonomy-integration)
- [RL-Based Quality Scoring](#rl-based-quality-scoring)
- [Usage Signals](#usage-signals)
- [Quality Decay Model](#quality-decay-model)
- [Source Lineage Tracking](#source-lineage-tracking)
- [Knowledge Orchestrator](#knowledge-orchestrator)
- [RL Integration with Marginal Value Filter](#rl-integration-with-marginal-value-filter)
- [Workflow Example](#workflow-example)
- [Queue Processing](#queue-processing)
- [Processing Queue Entries](#processing-queue-entries)
- [Queue Integration](#queue-integration)
- [Queue Status Workflow](#queue-status-workflow)
- [Automation](#automation)
- [Detailed Resources](#detailed-resources)
- [Hook Integration](#hook-integration)
- [Automatic Triggers](#automatic-triggers)
- [Hook Signals](#hook-signals)
- [Deduplication](#deduplication)
- [Safety Checks](#safety-checks)
- [Index Schema Alignment](#index-schema-alignment)
- [Integration](#integration)


# Knowledge Intake

Systematically process external resources into actionable knowledge. When a user links an article, blog post, or paper, this skill guides evaluation, storage decisions, and application routing.


## When To Use

- Capturing and organizing knowledge from sessions
- Ingesting information into structured memory palaces

## When NOT To Use

- Temporary notes that do not need long-term storage
- Code-only changes without knowledge capture needs

## What It Is

A knowledge governance framework that answers three questions for every external resource:
1. **Is it worth storing?** - Evaluate signal-to-noise and relevance
2. **Where does it apply?** - Route to local codebase or meta-infrastructure
3. **What does it displace?** - Identify outdated knowledge to prune

## The Intake Signal

> When a user links an external resource, it is a signal of importance.

The act of sharing indicates the resource passed the user's own filter. Our job is to:
- Extract the essential patterns and insights
- Determine appropriate storage location and format
- Connect to existing knowledge structures
- Identify application opportunities

## Quick Start

When a user shares a link:

```
1. FETCH    → Detect format, retrieve and convert content
2. EVALUATE → Apply importance criteria
3. DECIDE   → Storage location and application type
4. STORE    → Create structured knowledge entry
5. VALIDATE → Scribe verification (slop scan + doc verify)
6. CONNECT  → Link to existing palace structures
7. PROMOTE  → Offer Discussion promotion (score 80+)
8. APPLY    → Route to codebase or infrastructure updates
9. PRUNE    → Identify displaced/outdated knowledge
```

### Step 1: FETCH with Format Detection

Before retrieving content, detect the source format from
the URL or file path to choose the right retrieval method.

**Web articles and blog posts** (default path):
Use WebFetch to retrieve HTML content directly.
No conversion needed.

**Document URLs** (PDF, DOCX, PPTX, XLSX):
Apply the `leyline:document-conversion` protocol.
This tries the markitdown MCP tool first for high-quality
markdown, then falls back to native Claude Code tools
(Read for PDFs, etc.), then informs the user if the
format is unsupported without markitdown.

**Local files** (user shares a file path):
Construct a `file://` URI from the absolute path and
apply the `leyline:document-conversion` protocol.

**Format detection heuristics:**

| URL Pattern | Format | Retrieval |
|-------------|--------|-----------|
| `*.pdf`, `arxiv.org/pdf/*` | PDF | document-conversion |
| `*.docx`, `*.doc` | Word | document-conversion |
| `*.pptx`, `*.ppt` | PowerPoint | document-conversion |
| `*.xlsx`, `*.xls` | Excel | document-conversion |
| `*.epub` | E-book | document-conversion |
| `drive.google.com/*` | Various | document-conversion |
| Everything else | HTML/web | WebFetch (existing) |

After retrieval (regardless of method), wrap the content
in external content boundary markers per
`leyline:content-sanitization` before proceeding to
Step 2 (EVALUATE).

### Step 5: Scribe Validation (Required)

**All knowledge corpus entries MUST pass scribe validation before finalizing.**

Run `Skill(scribe:slop-detector)` on the new entry:
- Score must be < 2.5 (Clean to Light)
- No Tier 1 markers (delve, tapestry, comprehensive, leveraging, etc.)
- Hedge word density < 15 per 1000 words

Use `Agent(scribe:doc-verifier)` to validate:
- All file paths and URLs exist
- All cross-references valid
- Source attributions accurate

```bash
# Quick validation for knowledge corpus entry
/slop-scan docs/knowledge-corpus/[entry-name].md
# Doc verification is now agent-only:
Agent(scribe:doc-verifier) "Verify docs/knowledge-corpus/[entry-name].md"
```

**DO NOT finalize entries with slop score > 2.5** - rewrite with concrete specifics.
**Verification:** Run the command with `--help` flag to verify availability.

### Step 7: Discussion Promotion (Score 80+ Only)

When the evaluation score is 80-100 (evergreen), you
MUST execute the Discussion promotion workflow. If the
score is below 80, skip this step entirely.

**Execute these steps in order:**

1. Read `modules/discussion-promotion.md` for the
   full GraphQL workflow
2. Tell the user: "This entry has reached evergreen
   maturity. Publishing to GitHub Discussions. [Y/n]"
3. If the user says "n", skip to Step 8 (APPLY)
4. Run the `gh api graphql` commands from the module
   to create or update a Discussion in the "Knowledge"
   category
5. Update the local corpus entry with `discussion_url`

- If the entry already has a `discussion_url` field,
  update the existing Discussion instead of creating
  a new one
- If `gh` is unavailable or promotion fails, warn
  the user and continue to Step 8 (APPLY)

Publishing is the default for qualifying entries. It
never blocks the intake workflow.

## Evaluation Framework

### Importance Criteria

| Criterion | Weight | Questions |
|-----------|--------|-----------|
| **Novelty** | 25% | Does this introduce new patterns or concepts? |
| **Applicability** | 30% | Can we apply this to current work? |
| **Durability** | 20% | Will this remain relevant in 6+ months? |
| **Connectivity** | 15% | Does it connect to multiple existing concepts? |
| **Authority** | 10% | Is the source credible and well-reasoned? |

### Scoring Guide

- **80-100**: Evergreen knowledge, store prominently, apply immediately
- **60-79**: Valuable insight, store in corpus, schedule application
- **40-59**: Useful reference, store as seedling, revisit later
- **Below 40**: Low priority, capture key quote only or skip

## Application Routing

### Local Codebase Application
Apply when knowledge directly improves current project:
- Bug fix patterns
- Performance optimizations
- Architecture decisions for this codebase
- Tool/library recommendations

**Action**: Update code, add comments, create ADR

### Meta-Infrastructure Application
Apply when knowledge improves our plugin ecosystem:
- Skill design patterns
- Agent behavior improvements
- Workflow optimizations
- Learning/evaluation methods (like Franklin Protocol)

**Action**: Update skills, create modules, enhance agents

### Routing Decision Tree

```
**Verification:** Run the command with `--help` flag to verify availability.
Is the knowledge...
├── About HOW we build things? → Meta-infrastructure
│   ├── Skill patterns → Update abstract/memory-palace skills
│   ├── Learning methods → Add to knowledge-corpus
│   └── Tool techniques → Create new skill module
│
└── About WHAT we're building? → Local codebase
    ├── Domain knowledge → Store in project docs
    ├── Implementation patterns → Update code/architecture
    └── Bug/issue solutions → Apply fix, document
```
**Verification:** Run the command with `--help` flag to verify availability.

## Storage Locations

| Knowledge Type | Location | Format |
|----------------|----------|--------|
| Meta-learning patterns | `docs/knowledge-corpus/` | Full memory palace entry |
| Skill design insights | `skills/*/modules/` | Technique module |
| Tool/library knowledge | `docs/references/` | Quick reference |
| Temporary insights | Digital garden seedling | Lightweight note |

## The Tidying Imperative (KonMari-Inspired)

> "A cluttered palace is a cluttered mind."

New knowledge often displaces old—but **time is not the criterion**. Relevance and aspirational alignment are.

### The Master Curator
The human in the loop defines what stays. Before major tidying:
1. **Who are you becoming?** - Your aspirations as a developer
2. **What excites you now?** - Genuine enthusiasm, not "should"
3. **What have you outgrown?** - Past interests consciously left behind

### The Two Questions
For each piece of knowledge, both must be yes:
- **Does it spark joy?** - Genuine enthusiasm, not obligation
- **Does it serve your aspirations?** - Aligned with who you're becoming

### Tidying Actions

| Finding | Action |
|---------|--------|
| Supersedes | Archive old with gratitude, link as context |
| Contradicts | Evaluate both, keep what sparks joy |
| No longer aligned | Release with gratitude |
| Complements | Create bidirectional links |

**"I might need this someday"** is fear, not joy. Release it.

## Marginal Value Filtering (Anti-Pollution)

> "If it can't teach something the existing corpus can't already teach → skip it."

Before storing ANY knowledge, run the **marginal value filter** to prevent corpus pollution.

### The Three-Step Filter

**1. Redundancy Check**
- Exact match → REJECT immediately
- 80%+ overlap → REJECT as redundant
- 40-80% overlap → Evaluate delta (Step 2)
- <40% overlap → Likely novel, proceed to store

**2. Delta Analysis** (for partial overlap only)
- **Novel insight/pattern** → High value (0.7-0.9)
- **Different framing only** → Low value (0.2-0.4)
- **More examples** → Marginal value (0.4-0.6)
- **Contradicts existing** → Investigate (0.6-0.8)

**3. Integration Decision**
- **Standalone**: Novel content, no significant overlap
- **Merge**: Enhances existing entry with examples/details
- **Replace**: Supersedes outdated knowledge
- **Skip**: Insufficient marginal value

### Using the Filter

```python
from memory_palace.corpus import MarginalValueFilter

# Initialize filter with corpus and index directories
filter = MarginalValueFilter(
    corpus_dir="docs/knowledge-corpus",
    index_dir="docs/knowledge-corpus/indexes"
)

# Evaluate new content
redundancy, delta, integration = filter.evaluate_content(
    content=article_text,
    title="Structured Concurrency in Python",
    tags=["async", "concurrency", "python"]
)

# Get human-readable explanation
explanation = filter.explain_decision(redundancy, delta, integration)
print(explanation)

# Act on decision
if integration.decision == IntegrationDecision.SKIP:
    print(f"Skipping: {integration.rationale}")
elif integration.decision == IntegrationDecision.STANDALONE:
    # Store as new entry
    store_knowledge(content, title)
elif integration.decision == IntegrationDecision.MERGE:
    # Enhance existing entry
    enhance_entry(integration.target_entries[0], content)
elif integration.decision == IntegrationDecision.REPLACE:
    # Replace outdated entry
    replace_entry(integration.target_entries[0], content)
```
**Verification:** Run the command with `--help` flag to verify availability.

### Filter Output Example

```
**Verification:** Run the command with `--help` flag to verify availability.
=== Marginal Value Assessment ===

Redundancy: partial
Overlap: 65%
Matches: async-patterns, python-concurrency
  - Partial overlap (65%) with 2 entries

Delta Type: novel_insight
Value Score: 75%
Teaching Delta: Introduces 8 new concepts
Novel aspects:
  + New concepts: structured, taskgroup, context-manager
  + New topics: Error Propagation, Resource Cleanup

Decision: STANDALONE
Confidence: 80%
Rationale: Novel insights justify standalone: Introduces 8 new concepts
```
**Verification:** Run the command with `--help` flag to verify availability.

### Progressive Autonomy Integration

The marginal value filter respects autonomy levels (see plan Phase 4):
- **Level 0**: ALL decisions require human approval
- **Level 1**: Auto-approve 85+ scores in known domains
- **Level 2**: Auto-approve 70+ scores in known domains
- **Level 3**: Auto-approve 60+, auto-reject obvious noise

Current implementation: Level 0 (all human-in-the-loop).

## RL-Based Quality Scoring

The knowledge corpus uses reinforcement learning signals to dynamically score entry quality based on actual usage patterns.

### Usage Signals

| Signal | Weight | Description |
|--------|--------|-------------|
| `ACCESS` | +0.1 | Entry was accessed/read |
| `CITATION` | +0.3 | Entry was cited in another context |
| `POSITIVE_FEEDBACK` | +0.5 | User marked as helpful |
| `NEGATIVE_FEEDBACK` | -0.3 | User marked as unhelpful |
| `CORRECTION` | +0.2 | Entry was corrected/updated |
| `STALE_FLAG` | -0.4 | Entry marked as potentially outdated |

### Quality Decay Model

Knowledge entries decay over time unless validated:

| Maturity | Half-Life | Decay Curve |
|----------|-----------|-------------|
| Seedling | 14 days | Exponential |
| Growing | 30 days | Exponential |
| Evergreen | 90 days | Logarithmic |

Entries are classified by decay status:
- **Fresh**: >70% quality retained
- **Stale**: 40-70% quality retained
- **Critical**: 20-40% quality retained
- **Archived**: <20% quality retained

### Source Lineage Tracking

Hybrid lineage tracking based on source importance:

**Full Lineage** (for important sources):
- Primary source with complete metadata
- Derivation chain (what entries it was derived from)
- Transformation history (summarization, extraction, etc.)
- Validation chain (who validated and when)

**Simple Lineage** (for standard sources):
- Source type and URL
- Retrieval timestamp

Full lineage is used for:
- Research papers
- Documentation
- Entries with importance score >= 0.7

### Knowledge Orchestrator

The `KnowledgeOrchestrator` coordinates all quality systems:

```python
from memory_palace.corpus import KnowledgeOrchestrator, UsageSignal

# Initialize orchestrator
orchestrator = KnowledgeOrchestrator(
    corpus_dir="docs/knowledge-corpus",
    index_dir="docs/knowledge-corpus/indexes"
)

# Record usage events
orchestrator.record_usage("entry-1", UsageSignal.ACCESS)
orchestrator.record_usage("entry-1", UsageSignal.POSITIVE_FEEDBACK)

# Assess entry quality
entry = {"id": "entry-1", "maturity": "growing"}
assessment = orchestrator.assess_entry(entry)
print(f"Quality: {assessment.overall_score:.0%}")
print(f"Status: {assessment.status}")
print(f"Recommendations: {assessment.recommendations}")

# Get maintenance queue
entries = [...]  # Your entry list
queue = orchestrator.get_maintenance_queue(entries)
for item in queue:
    print(f"{item.entry_id}: {item.status} - {item.recommendations}")

# Ingest new content with lineage
from memory_palace.corpus import SourceReference, SourceType

source = SourceReference(
    source_id="src-1",
    source_type=SourceType.DOCUMENTATION,
    url="https://docs.example.com/api",
    title="API Documentation"
)
entry_id, decision = orchestrator.ingest_with_lineage(
    content="# API Reference\n...",
    title="API Documentation",
    source=source
)
```
**Verification:** Run the command with `--help` flag to verify availability.

### RL Integration with Marginal Value Filter

The marginal value filter emits RL signals on integration decisions:

```python
from memory_palace.corpus import MarginalValueFilter

filter = MarginalValueFilter(corpus_dir, index_dir)

# Evaluate with RL signal emission
redundancy, delta, integration, rl_signal = filter.evaluate_with_rl(
    content=article_text,
    title="New Article",
    tags=["python", "async"]
)

# RL signal contains:
# - signal_type: UsageSignal to emit
# - weight: Signal weight for scoring
# - action: What happened (new_entry_created, entry_enhanced, etc.)
# - decision: Integration decision made
# - confidence: Decision confidence
print(f"RL Signal: {rl_signal['action']} (weight: {rl_signal['weight']})")
```
**Verification:** Run the command with `--help` flag to verify availability.

## Workflow Example

**User shares**: "Check out this article on structured concurrency"

```yaml
intake:
  source: "https://example.com/structured-concurrency"

# PHASE 3: Marginal Value Filter
marginal_value:
  redundancy:
    level: partial_overlap
    overlap_score: 0.65
    matching_entries: [async-patterns, python-concurrency]
  delta:
    type: novel_insight
    value_score: 0.75
    novel_aspects: [structured, taskgroup, context-manager]
    teaching_delta: "Introduces structured concurrency pattern"
  integration:
    decision: standalone
    confidence: 0.80
    rationale: "Novel insights justify standalone entry"

# Continue with evaluation if filter passes
evaluation:
  novelty: 75        # New pattern for error handling
  applicability: 90  # Directly relevant to async code
  durability: 85     # Core concept, won't age quickly
  connectivity: 70   # Links to error handling, async patterns
  authority: 80      # Well-known author, cited sources
  total: 82          # Evergreen, store and apply

routing:
  type: both
  local_application:
    - Refactor async error handling in current project
    - Add structured concurrency pattern to codebase
  meta_application:
    - Create module in relevant skill
    - Add to knowledge-corpus as reference

storage:
  location: docs/knowledge-corpus/structured-concurrency.md
  format: memory_palace_entry
  maturity: growing

pruning:
  displaces:
    - Old async error patterns (mark deprecated)
  complements:
    - Existing error handling module
    - Async patterns documentation
```
**Verification:** Run the command with `--help` flag to verify availability.

## Queue Processing

Research sessions and external content are automatically queued for review in `docs/knowledge-corpus/queue/`.

### Processing Queue Entries

```bash
# List pending queue entries
ls -1t docs/knowledge-corpus/queue/*.yaml

# Review specific entry
cat docs/knowledge-corpus/queue/2025-12-31_topic.yaml

# Process approved entry
# 1. Create memory palace entry in docs/knowledge-corpus/
# 2. Update queue entry status to 'processed'
# 3. Archive or delete queue entry
```
**Verification:** Run the command with `--help` flag to verify availability.

### Queue Integration

The `research-queue-integration` hook automatically queues:
- Brainstorming sessions with 3+ WebSearch calls
- Research-focused sessions with substantial findings
- Manual additions via queue entry creation

**Queue entry format**: See `docs/knowledge-corpus/queue/README.md`

### Queue Status Workflow

```
**Verification:** Run the command with `--help` flag to verify availability.
pending_review → [Review] → approved/rejected
approved → [Create Entry] → processed
processed → [Archive] → queue/archive/
```
**Verification:** Run the command with `--help` flag to verify availability.

## Automation

- Run `uv run python scripts/intake_cli.py --candidate path/to/intake_candidate.json --auto-accept`
- The CLI runs marginal value filter, creates palace entries (`docs/knowledge-corpus/*.md`),
  developer drafts (`docs/developer-drafts/`), and appends audit rows to `docs/curation-log.md`.
- Use `--output-root` in tests or sandboxes to avoid mutating the main corpus.
- **Queue Processing**: Use `--process-queue` flag to review and process queued entries interactively.

## Detailed Resources

- **Evaluation Rubric**: See `modules/evaluation-rubric.md`
- **Storage Patterns**: See `modules/storage-patterns.md`
- **KonMari Tidying Philosophy**: See `modules/konmari-tidying.md`
- **Tidying Workflows**: See `modules/pruning-workflows.md`
- **Discussion Promotion**: Invoked in Step 7 (PROMOTE)
  for evergreen entries (score 80+). Publishing is the
  default action.
  See `modules/discussion-promotion.md` for full workflow.

## Hook Integration

Memory-palace hooks automatically detect content that may need knowledge intake processing:

### Automatic Triggers

| Hook | Event | When Triggered |
|------|-------|----------------|
| `url_detector` | UserPromptSubmit | User message contains URLs |
| `web_content_processor` | PostToolUse (WebFetch/WebSearch) | After fetching web content |
| `local_doc_processor` | PostToolUse (Read) | Reading files in knowledge paths |
| `research_queue_integration` | SessionEnd | Research sessions with 3+ WebSearch calls |

### Hook Signals

When hooks detect potential knowledge content, they add context messages:

```
**Verification:** Run `pytest -v` to verify tests pass.
Memory Palace: New web content fetched from {url}.
Consider running knowledge-intake to evaluate and store if valuable.
```
**Verification:** Run the command with `--help` flag to verify availability.

```
**Verification:** Run the command with `--help` flag to verify availability.
Memory Palace: Reading local knowledge doc '{path}'.
This path is configured for knowledge tracking.
Consider running knowledge-intake if this contains valuable reference material.
```
**Verification:** Run the command with `--help` flag to verify availability.

### Deduplication

Hooks check the `memory-palace-index.yaml` to avoid redundant processing:
- **Known URLs**: "Content already indexed" - skip re-evaluation
- **Changed content**: "Content has changed" - suggest update
- **New content**: Full evaluation recommended

### Safety Checks

Before signaling intake, hooks validate content:
- Size limits (default 500KB)
- Secret detection (API keys, credentials)
- Data bomb prevention (repetition, unicode bombs)
- Prompt injection sanitization

### Index Schema Alignment

The deduplication index stores fields aligned with this skill's evaluation:

```yaml
entries:
  "https://example.com/article":
    content_hash: "xxh:abc123..."
    stored_at: "docs/knowledge-corpus/article.md"
    importance_score: 82           # From evaluation framework
    maturity: "growing"            # seedling, growing, evergreen
    routing_type: "both"           # local, meta, both
    last_updated: "2025-12-06T..."
```
**Verification:** Run the command with `--help` flag to verify availability.

## Integration

- `memory-palace-architect` - Structures stored knowledge spatially
- `digital-garden-cultivator` - Manages knowledge lifecycle
- `knowledge-locator` - Finds and retrieves stored knowledge
- `skills-eval` (abstract) - Evaluates meta-infrastructure updates