---
name: memory-ingest
description: Ingest a source into any consumer's semantic memory by reading the topology contract
namespace: aiwg
category: kernel
platforms: [claude, copilot, cursor, factory, windsurf, warp, codex, opencode, openclaw, hermes]
triggers:
  - "help my AI remember what we decided"
  - "help me preserve project decisions between sessions"
  - "choose memory KB or project artifact flow"
  - "memory ingest"
---

# memory-ingest

Ingest an external source into a consumer framework's semantic memory. Reads the consumer's `memory.topology` contract to know where pages live, then extracts, summarizes, integrates, and cross-references the content. Every step is topology-agnostic.

## When to Use

When new knowledge (a document, paper, URL, config file, or directory of files) needs to enter a consumer's semantic memory. This is the primary write path for external information.

## Parameters

### source (required)

Path to the source material. Supports: markdown (`.md`), PDF (`.pdf`), HTML (`.html`), YAML (`.yaml`/`.yml`), JSON (`.json`), a directory of files, or a URL.

### --consumer (optional)

Consumer ID to ingest into. Resolved via ADR-021 D4 precedence:

1. **Explicit** — `--consumer research-complete`
2. **Wrapper** — set by a calling skill or orchestrator
3. **Auto-detect** — cwd detection or active framework in `.aiwg/frameworks/registry.json`

### --dry-run (optional)

Preview what would be created or modified without writing any files. Outputs the planned page list, cross-references, and contradiction flags.

### --non-interactive (optional)

Skip the discussion step and proceed directly to extraction and page writing. Use for batch ingestion or CI pipelines.

## Operation

### 1. Resolve consumer

Determine which consumer's memory to target using ADR-021 D4 precedence. Fail with a clear error if no consumer can be resolved.

### 2. Load schema

Read `memory.topology` from the consumer's `manifest.json`.
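A minimal Python sketch of this read, assuming `manifest.json` nests the contract as `{"memory": {"topology": {...}}}`. Missing fields are reported rather than defaulted, because this skill leaves the actual defaults to the implementation. `lookup`, `load_topology`, `read_manifest`, and `REQUIRED_FIELDS` are hypothetical names, not part of AIWG. The required names match the fields this step extracts, which are listed next; `ingestRequires` is left out because it is optional.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Topology fields this step reads. Dotted names are nested keys.
# "ingestRequires" is optional, so it is not checked here.
REQUIRED_FIELDS = [
    "rootDir",
    "derivedPages.summary",
    "pageTemplate",
    "crossRefStyle",
    "indexPath",
    "log",
]


def lookup(obj: Any, dotted: str) -> Optional[Any]:
    """Follow a dotted path like 'derivedPages.summary'; return None if any hop is missing."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def load_topology(manifest: dict) -> tuple[dict, list[str]]:
    """Return (topology, missing_fields) from a parsed manifest.

    Assumption: the contract is nested as manifest["memory"]["topology"].
    """
    topology = lookup(manifest, "memory.topology")
    if not isinstance(topology, dict):
        raise ValueError("manifest has no memory.topology contract")
    missing = [name for name in REQUIRED_FIELDS if lookup(topology, name) is None]
    return topology, missing


def read_manifest(path: Path) -> dict:
    """Parse a consumer's manifest.json from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```

A non-empty `missing` list is where the Error Handling rule (warn on missing schema fields and record the gap for `memory-lint`) would apply.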
From the topology, extract:

- `rootDir` — base path for all memory pages
- `derivedPages.summary` — where summary pages are written
- `pageTemplate` — structure the summary must conform to
- `crossRefStyle` — how cross-references are formatted (e.g., wiki-links, markdown links)
- `indexPath` — location of the consumer's memory index
- `log` — path to `.log.jsonl`
- `ingestRequires` — optional list of required post-ingest actions (e.g., `"provenance"`)

### 3. Read source

Parse the source material based on type:

- **Markdown/HTML** — extract text, headings, and structure
- **PDF** — extract text content (use page ranges for large documents)
- **YAML/JSON** — parse structured data, identify key entities
- **Directory** — recursively read all supported files, treating each as a sub-source
- **URL** — fetch content, then parse based on content type

### 4. Discuss (interactive default)

**Default behavior** (no `--non-interactive` flag):

1. Present a concise summary of the source to the user
2. Highlight key takeaways, entities, and concepts found
3. Ask the user what to emphasize, de-prioritize, or reframe
4. Incorporate user guidance into the extraction strategy

This discussion-first pattern ensures the memory reflects human judgment, not just mechanical extraction.

### 5. Extract and summarize

Use the LLM to produce a structured summary conforming to the consumer's `pageTemplate`. The summary captures:

- Key claims and findings
- Named entities (people, systems, concepts)
- Relationships between entities
- Source metadata (title, author, date, URI)

### 6. Integrate

- **Write summary page** to the `derivedPages.summary` path
- **Update entity/concept pages** — for each entity or concept mentioned, update or create the relevant page under the consumer's entity directory, adding the new information with source attribution
- **Insert cross-references** — link the summary page to entity pages and vice versa, using the consumer's `crossRefStyle`

### 7. Contradiction detection

Compare new claims against existing pages. When a contradiction is found:

- **Flag inline** on the affected existing page using a callout:

  ```markdown
  > [!contradiction]
  > Source "paper.pdf" (2026-04-14) claims X, but this page states Y.
  > Ingested via memory-ingest — awaiting human resolution.
  ```

- **Log the contradiction** in `.log.jsonl` with the `"contradictions"` count and details
- **Do not auto-resolve** — surface contradictions for human judgment

### 8. Update index

Add (or regenerate) the entry for the new summary page in the consumer's index at `indexPath`. Include title, source reference, date, and cross-ref targets.

### 9. Append log

Call `memory-log-append` with:

```
--consumer <consumer> --op ingest --data '{"source":"<source>","pages_touched":[...],"contradictions":<n>,"cross_refs_added":<n>}'
```

### 10. Optional provenance

If `ingestRequires` includes `"provenance"`, create a W3C PROV record documenting:

- `prov:Entity` — the new summary page
- `prov:Activity` — the ingest operation
- `prov:wasDerivedFrom` — the source material
- `prov:wasGeneratedBy` — this skill invocation
- `prov:wasAttributedTo` — the actor (model + user)

### 11. Report

Output a summary:

- Pages created or updated (with paths)
- Contradictions flagged (count and locations)
- Cross-references added (count)
- Provenance record path (if created)

## Error Handling

- **Consumer not found** — fail with an actionable message listing available consumers
- **Source unreadable** — fail with format-specific guidance (e.g., "PDF extraction requires the Read tool with page ranges")
- **Schema missing fields** — warn and use sensible defaults; log the gap for `memory-lint` to catch
- **Log write failure** — non-blocking; report the primary operation's result regardless

## Examples

```
# Interactive ingest of a research paper
memory-ingest docs/papers/distributed-consensus.pdf --consumer research-complete

# Batch ingest a directory of meeting notes
memory-ingest .aiwg/working/meeting-notes/ --consumer sdlc-complete --non-interactive

# Dry run to preview what would change
memory-ingest https://example.com/api-spec.html --consumer sdlc-complete --dry-run

# Explicit consumer override
memory-ingest design-doc.md --consumer media-marketing-kit --non-interactive
```

## Related Skills

- `memory-log-append` — log write primitive (called in step 9)
- `memory-lint` — validates memory page structure and cross-ref integrity
- `memory-query-capture` — captures query patterns for memory optimization
- `provenance-create` — W3C PROV record creation (called in step 10 when required)

## Storage Routing (#934, #966)

This skill's persistence flows through `resolveStorage('memory')`. On the default `fs` backend the memory subsystem lives at `.aiwg/memory/`, and behavior is byte-identical to direct file writes. To redirect memory artifacts into Obsidian, Logseq, Fortemi, or another backend without changing this skill, configure `.aiwg/storage.config` (#934).
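To make the `fs` default concrete, here is a hypothetical Python sketch of how a memory key such as `research-complete/index.md` could map onto a file under `.aiwg/memory/`. It is not the real `resolveStorage('memory')` code. `fs_path_for_key` is an illustrative name, and the idea that keys resolve as plain relative paths under the root is an assumption.

```python
from pathlib import PurePosixPath

# Assumed fs-backend root, taken from the default location described above.
DEFAULT_MEMORY_ROOT = PurePosixPath(".aiwg/memory")


def fs_path_for_key(key: str, root: PurePosixPath = DEFAULT_MEMORY_ROOT) -> PurePosixPath:
    """Map a memory key to a path under the fs-backend root (hypothetical helper).

    Absolute keys, empty keys, and keys containing '..' are rejected, so a
    key can never resolve outside the memory root.
    """
    rel = PurePosixPath(key)
    if rel.is_absolute() or not rel.parts or ".." in rel.parts:
        raise ValueError(f"invalid memory key: {key!r}")
    return root / rel
```

For example, `fs_path_for_key("research-complete/.log.jsonl")` yields `.aiwg/memory/research-complete/.log.jsonl`. Because the skill only ever handles keys, a non-`fs` backend can interpret the same key differently without the skill changing.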
When this skill needs to read or write memory artifacts from a Bash step:

```bash
aiwg memory path   # resolved root (fs only)
aiwg memory list --prefix research-complete/
aiwg memory get research-complete/index.md
echo "# index" | aiwg memory put research-complete/index.md
echo '{"op":"ingest","summary":"foo"}' | aiwg memory append-log research-complete/.log.jsonl
```

The `aiwg memory append-log` subcommand uses atomic `O_APPEND` (#976) on the fs backend, so concurrent appenders don't race.
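For anyone writing their own fs-side appender, here is a minimal Python sketch of the same idea. Open the file with `O_APPEND` and emit each JSONL record in a single `write`. `append_log` is a hypothetical helper, not the `aiwg` CLI's code. The no-interleaving guarantee assumes single writes on a typical local filesystem. It does not hold on NFS, where `O_APPEND` is simulated client-side and concurrent appends can corrupt the file.

```python
import json
import os


def append_log(path: str, record: dict) -> None:
    """Append one JSON object as a single line to a .log.jsonl file.

    O_APPEND moves the offset to end-of-file as part of each write, and the
    whole line goes out in one os.write call, so concurrent appenders on a
    local filesystem do not splice partial records into each other.
    """
    line = (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, line)
        if written != len(line):
            # A short write would leave a torn record; surface it instead.
            raise OSError(f"short write to {path}: {written}/{len(line)} bytes")
    finally:
        os.close(fd)
```

Building the full line before opening the file is the key design choice. A record written in several pieces could interleave with another process's record even under `O_APPEND`.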