---
name: llm-wiki
description: "Karpathy's LLM Wiki — build and maintain a persistent, interlinked markdown knowledge base. Ingest sources, query compiled knowledge, and lint for consistency."
version: 2.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [wiki, knowledge-base, research, notes, markdown, rag-alternative]
    category: research
    related_skills: [obsidian, arxiv, agentic-research-ideas]
    config:
      - key: wiki.path
        description: Path to the LLM Wiki knowledge base directory
        default: "~/wiki"
        prompt: Wiki directory path
---

# Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked markdown files. Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki compiles knowledge once and keeps it current. Cross-references are already there. Contradictions have already been flagged. Synthesis reflects everything ingested.

**Division of labor:** The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.

## When This Skill Activates

Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context

## Wiki Location

Configured via `skills.config.wiki.path` in `~/.hermes/config.yaml` (prompted during `hermes config migrate` or `hermes setup`):

```yaml
skills:
  config:
    wiki:
      path: ~/wiki
```

Falls back to `~/wiki` default. The resolved path is injected when this skill loads — check the `[Skill config: ...]` block above for the active value.

The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or any editor.
No database, no special tooling required.

## Architecture: Three Layers

```
wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated at 500 entries)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arxiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping
```

**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.

**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and cross-referenced by the agent.

**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.

## Resuming an Existing Wiki (CRITICAL — do this every session)

When the user has an existing wiki, **always orient yourself before doing anything**:

① **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.

② **Read `index.md`** — learn what pages exist and their summaries.

③ **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.

```bash
WIKI="${wiki_path:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=
```

Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged

For large wikis (100+ pages), also run a quick `search_files` for the topic at hand before creating anything new.

## Initializing a New Wiki

When the user asks to create or start a wiki:

1. Determine the wiki path (from config, env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest

### SCHEMA.md Template

Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:

````markdown
# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
---
```

## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed, add it here first, then use it. This prevents tag sprawl.

## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index

## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references

## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])

## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources

## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
````

### index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities

## Concepts

## Comparisons

## Queries
```

**Scaling rule:** When any section exceeds 50 entries, split it into sub-sections by first letter or sub-domain. When the index exceeds 200 entries total, create a `_meta/topic-map.md` that groups pages by theme for faster navigation.

### log.md Template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```

## Core Operations

### 1. Ingest

When the user provides a source (URL, file, paste), integrate it into the wiki:

① **Capture the raw source:**
- URL → use `web_extract` to get markdown, save to `raw/articles/`
- PDF → use `web_extract` (handles PDFs), save to `raw/papers/`
- Pasted text → save to the appropriate `raw/` subdirectory
- Name the file descriptively: `raw/articles/karpathy-llm-wiki-2026.md`

② **Discuss takeaways** with the user — what's interesting, what matters for the domain. (Skip this in automated/cron contexts — proceed directly.)

③ **Check what already exists** — search index.md and use `search_files` to find existing pages for mentioned entities/concepts. This is the difference between a growing wiki and a pile of duplicates.

④ **Write or update wiki pages:**
- **New entities/concepts:** Create pages only if they meet the Page Thresholds in SCHEMA.md (2+ source mentions, or central to one source)
- **Existing pages:** Add new information, update facts, bump `updated` date. When new info contradicts existing content, follow the Update Policy.
- **Cross-reference:** Every new or updated page must link to at least 2 other pages via `[[wikilinks]]`. Check that existing pages link back.
- **Tags:** Only use tags from the taxonomy in SCHEMA.md

⑤ **Update navigation:**
- Add new pages to `index.md` under the correct section, alphabetically
- Update the "Total pages" count and "Last updated" date in the index header
- Append to `log.md`: `## [YYYY-MM-DD] ingest | Source Title`
- List every file created or updated in the log entry

⑥ **Report what changed** — tell the user every file that was created or updated.
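
The bookkeeping in step ⑤ is mechanical enough to script. Below is a minimal sketch, not part of the skill's tooling: `format_log_entry` and `add_index_entry` are hypothetical helpers. They assume index entries are written as `- [[page-name]]: one-line summary` under the flat `## Section` headings from the index.md template, since the exact entry format is not specified above.

```python
"""Hypothetical helpers for Ingest step 5: index.md and log.md bookkeeping.

Assumptions (not defined by the skill): index entries are written as
"- [[page-name]]: summary" under flat "## Section" headings, and the index
header contains "Last updated: ... | Total pages: ..." as in the template.
"""
import datetime
import re

HEADER = re.compile(r"Last updated: \S+ \| Total pages: \S+")


def format_log_entry(action, subject, files, day=None):
    """Build one log.md entry: '## [YYYY-MM-DD] action | subject' plus one bullet per file."""
    day = day or datetime.date.today().isoformat()
    lines = [f"## [{day}] {action} | {subject}"]
    lines += [f"- {path}" for path in files]  # every file created or updated
    return "\n".join(lines) + "\n"


def add_index_entry(index_text, section, page, summary, day=None):
    """Insert a page alphabetically under '## <section>' and refresh the header counts."""
    day = day or datetime.date.today().isoformat()
    lines = index_text.splitlines()
    start = lines.index(f"## {section}") + 1  # raises ValueError if the section is missing
    end = start
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    entries = {ln for ln in lines[start:end] if ln.startswith("- [[")}
    entries.add(f"- [[{page}]]: {summary}")
    # Rebuild the section as blank line, sorted entries, blank line.
    # Any non-entry lines in the section (e.g. letter sub-headings) are dropped.
    lines[start:end] = [""] + sorted(entries) + [""]
    total = sum(ln.startswith("- [[") for ln in lines)
    text = "\n".join(lines) + "\n"
    return HEADER.sub(f"Last updated: {day} | Total pages: {total}", text)
```

Append the string returned by `format_log_entry` to `log.md` (opened in append mode), and write the result of `add_index_entry` back over `index.md`. Once a section has been split by first letter under the scaling rule, the flat-section assumption no longer holds, and the helper would need to target the sub-heading instead.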

A single source can trigger updates across 5-15 wiki pages. This is normal and desired — it's the compounding effect.

### 2. Query

When the user asks a question about the wiki's domain:

① **Read `index.md`** to identify relevant pages.

② **For wikis with 100+ pages**, also `search_files` across all `.md` files for key terms — the index alone may miss relevant content.

③ **Read the relevant pages** using `read_file`.

④ **Synthesize an answer** from the compiled knowledge. Cite the wiki pages you drew from: "Based on [[page-a]] and [[page-b]]..."

⑤ **File valuable answers back** — if the answer is a substantial comparison, deep dive, or novel synthesis, create a page in `queries/` or `comparisons/`. Don't file trivial lookups — only answers that would be painful to re-derive.

⑥ **Update log.md** with the query and whether it was filed.

### 3. Lint

When the user asks to lint, health-check, or audit the wiki:

① **Orphan pages:** Find pages with no inbound `[[wikilinks]]` from other pages.

```python
# Use execute_code for this — programmatic scan across all wiki pages
import os, re
from collections import defaultdict
wiki = ""  # set to the resolved wiki path before running
# Scan all .md files in entities/, concepts/, comparisons/, queries/
pages = {f[:-3]: os.path.join(root, f) for sub in ("entities", "concepts", "comparisons", "queries")
         for root, _, files in os.walk(os.path.join(wiki, sub)) for f in files if f.endswith(".md")}
# Extract all [[wikilinks]] (dropping |alias and #heading) — build inbound link map
inbound = defaultdict(set)
for name, path in pages.items():
    with open(path, encoding="utf-8") as fh:
        for target in re.findall(r"\[\[([^\]|#]+)", fh.read()):
            inbound[target.strip()].add(name)
orphans = sorted(p for p in pages if not inbound[p] - {p})  # zero inbound links from other pages
```

② **Broken wikilinks:** Find `[[links]]` that point to pages that don't exist.

③ **Index completeness:** Every wiki page should appear in `index.md`. Compare the filesystem against index entries.

④ **Frontmatter validation:** Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.

⑤ **Stale content:** Pages whose `updated` date is more than 90 days older than the most recent source that mentions the same entities.

⑥ **Contradictions:** Pages on the same topic with conflicting claims. Look for pages that share tags/entities but state different facts.

⑦ **Page size:** Flag pages over 200 lines — candidates for splitting.
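
Checks ②, ④ and ⑦ can run in the same pass as the orphan scan. Below is a minimal sketch that makes two assumptions. First, wikilinks use bare page names rather than folder paths. Second, each page's text has already been read into a `{page_name: text}` dict, for example with a loop like the orphan scan above. `lint_pages` and its return shape are hypothetical, and the sketch leaves out the tag-taxonomy part of ④.

```python
import re

REQUIRED = ("title", "created", "updated", "type", "tags", "sources")
# Matches [[target]], [[target|alias]], [[target#heading]]; the lookbehind skips ![[embeds]]
LINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def lint_pages(pages, max_lines=200):
    """Broken links, missing frontmatter fields, and oversized pages for {name: markdown}."""
    broken, missing, oversized = [], [], []
    for name, text in pages.items():
        for target in LINK.findall(text):
            if target.strip() not in pages:
                broken.append((name, target.strip()))
        fm = re.match(r"---\n(.*?)\n---", text, re.S)  # leading YAML frontmatter block
        keys = set(re.findall(r"^(\w+):", fm.group(1), re.M)) if fm else set()
        gaps = [k for k in REQUIRED if k not in keys]
        if gaps:
            missing.append((name, gaps))
        if len(text.splitlines()) > max_lines:
            oversized.append(name)
    return {"broken": broken, "missing_frontmatter": missing, "oversized": oversized}
```

Taking a dict instead of reading files directly keeps the checks independent of the directory walk. The same loaded pages can then feed the orphan, index-completeness, and tag-audit checks.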

⑧ **Tag audit:** List all tags in use, flag any not in the SCHEMA.md taxonomy.

⑨ **Log rotation:** If log.md exceeds 500 entries, rotate it.

⑩ **Report findings** with specific file paths and suggested actions, grouped by severity (broken links > orphans > stale content > style issues).

⑪ **Append to log.md:** `## [YYYY-MM-DD] lint | N issues found`

## Working with the Wiki

### Searching

```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=
```

### Bulk Ingest

When ingesting multiple sources at once, batch the updates:
1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create/update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch

### Archiving

When content is fully superseded or the domain scope changes:
1. Create the `_archive/` directory if it doesn't exist
2. Move the page to `_archive/` with its original path (e.g., `_archive/entities/old-page.md`)
3. Remove from `index.md`
4. Update any pages that linked to it — replace wikilink with plain text + "(archived)"
5. Log the archive action

### Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:
- `[[wikilinks]]` render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The `raw/assets/` folder holds images referenced via `![[image.png]]`

For best results:
- Set Obsidian's attachment folder to `raw/assets/`
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install the Dataview plugin for queries like `TABLE tags FROM "entities" WHERE contains(tags, "company")`

If using the Obsidian skill alongside this one, set `OBSIDIAN_VAULT_PATH` to the same directory as the wiki path.

### Obsidian Headless (servers and headless machines)

On machines without a display, use `obsidian-headless` instead of the desktop app. It syncs vaults via Obsidian Sync without a GUI — perfect for agents running on servers that write to the wiki while Obsidian desktop reads it on another device.

**Setup:**
```bash
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email --password ''

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault ""

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous
```

**Continuous background sync via systemd:**
```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```

This lets the agent write to `~/wiki` on a server while you browse the same vault in Obsidian on your laptop/phone — changes appear within seconds.

## Pitfalls

- **Never modify files in `raw/`** — sources are immutable. Corrections go in wiki pages.
- **Always orient first** — read SCHEMA + index + recent log before any operation in a new session. Skipping this causes duplicates and missed cross-references.
- **Always update index.md and log.md** — skipping this makes the wiki degrade. These are the navigational backbone.
- **Don't create pages for passing mentions** — follow the Page Thresholds in SCHEMA.md. A name appearing once in a footnote doesn't warrant an entity page.
- **Don't create pages without cross-references** — isolated pages are invisible. Every page must link to at least 2 other pages.
- **Frontmatter is required** — it enables search, filtering, and staleness detection.
- **Tags must come from the taxonomy** — freeform tags decay into noise. Add new tags to SCHEMA.md first, then use them.
- **Keep pages scannable** — a wiki page should be readable in 30 seconds. Split pages over 200 lines. Move detailed analysis to dedicated deep-dive pages.
- **Ask before mass-updating** — if an ingest would touch 10+ existing pages, confirm the scope with the user first.
- **Rotate the log** — when log.md exceeds 500 entries, rename it `log-YYYY.md` and start fresh. The agent should check log size during lint.
- **Handle contradictions explicitly** — don't silently overwrite. Note both claims with dates, mark in frontmatter, flag for user review.
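
The log-rotation rule (lint step ⑨ and the "Rotate the log" pitfall above) is easy to get slightly wrong by hand. Below is a minimal sketch, assuming every entry begins with a `## [` heading as in the log.md template. `rotate_log` and `count_entries` are hypothetical helpers, and the one-line header written to the fresh `log.md` is an abbreviated stand-in for the full template.

```python
import datetime
import os

FRESH_HEADER = "# Wiki Log\n\n> Chronological record of all wiki actions. Append-only.\n"


def count_entries(log_text):
    """Count log entries: each one starts with a '## [YYYY-MM-DD] ...' heading line."""
    return sum(1 for line in log_text.splitlines() if line.startswith("## ["))


def rotate_log(wiki, limit=500, year=None):
    """If log.md has more than `limit` entries, rename it to log-YYYY.md and start fresh.

    Returns the archived path, or None when no rotation was needed.
    """
    log_path = os.path.join(wiki, "log.md")
    with open(log_path, encoding="utf-8") as fh:
        if count_entries(fh.read()) <= limit:
            return None
    year = year or datetime.date.today().year
    archived = os.path.join(wiki, f"log-{year}.md")
    if os.path.exists(archived):
        # A second rotation in the same year would overwrite the first one.
        raise FileExistsError(archived)
    os.rename(log_path, archived)
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write(FRESH_HEADER)
    return archived
```

Refusing to overwrite an existing `log-YYYY.md` is a deliberate choice. The `log-YYYY.md` naming only allows one archive per year, so a very busy wiki would need a finer-grained suffix.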