--- name: knowledge-base description: Ingest URLs, documents, and transcripts into a searchable knowledge base. Query past research and curated documentation using full-text search. Trigger words: ingest, knowledge base, look up, search knowledge, what do we know about, research, index this, add to knowledge base. license: MIT metadata: author: sagemindai version: "1.0" requires: instar homepage: https://instar.sh user_invocable: "true" --- # knowledge-base -- Searchable Knowledge Base for Instar Agents Build a searchable knowledge base from external sources -- URLs, documents, transcripts, PDFs. Uses the existing MemoryIndex (FTS5) for search, so no new dependencies. --- ## How It Works The knowledge base is a set of markdown files in `.instar/knowledge/` that MemoryIndex indexes alongside your other memory files. Each file has YAML frontmatter for metadata and is tracked in a catalog for browsing. ``` .instar/knowledge/ catalog.json # Registry of all ingested sources articles/ # Ingested web articles transcripts/ # Video/audio transcripts docs/ # Curated reference documentation ``` --- ## Ingesting Content ### Via CLI ```bash # Ingest text content directly instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents" # Ingest from a URL (fetch first, then ingest) # Step 1: Fetch the content python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md # Step 2: Ingest it instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2" ``` ### Via API ```bash curl -X POST http://localhost:4040/knowledge/ingest \ -H "Content-Type: application/json" \ -d '{ "content": "The article content...", "title": "Article Title", "url": "https://example.com/article", "type": "article", "tags": ["AI", "infrastructure"], "summary": "Brief description" }' ``` ### Via Agent Workflow When the agent wants to ingest content during a session: 1. Fetch the content (WebFetch, smart-fetch, transcript tools, or Read for local files) 2. Clean it (strip navigation, ads, boilerplate) 3. Call the ingest API or write the file manually: ```bash # Write the markdown file with frontmatter cat > .instar/knowledge/articles/2026-02-25-my-article.md << 'EOF' --- title: "My Article" source: "https://example.com/article" ingested: "2026-02-25" tags: ["AI", "infrastructure"] --- # My Article [Cleaned article content here] EOF # Sync the index to pick up the new file instar memory sync ``` --- ## Searching Knowledge ### CLI ```bash # Search within knowledge base only instar knowledge search "notification batching" # Search all memory (including knowledge) instar memory search "notification batching" ``` ### API ```bash # Knowledge-scoped search curl "http://localhost:4040/memory/search?q=notification+batching&source=knowledge/&limit=5" # Browse the catalog curl "http://localhost:4040/knowledge/catalog" curl "http://localhost:4040/knowledge/catalog?tag=AI" ``` --- ## Managing Sources ### List all sources ```bash instar knowledge list instar knowledge list --tag AI ``` ### Remove a source ```bash # Find the source ID from the list instar knowledge list # Remove it instar knowledge remove kb_20260225123456_abc123 # Re-sync the index instar memory sync ``` ### Via API ```bash # Remove curl -X DELETE "http://localhost:4040/knowledge/kb_20260225123456_abc123" ``` --- ## MemoryIndex Configuration To enable knowledge base indexing, add these sources to your `.instar/config.json` memory section: ```json { "memory": { "enabled": true, "sources": [ { "path": "AGENT.md", "type": "markdown", "evergreen": true }, { "path": "USER.md", "type": "markdown", "evergreen": true }, { "path": "knowledge/articles/", "type": "markdown", "evergreen": false }, { "path": "knowledge/transcripts/", "type": "markdown", "evergreen": false }, { "path": "knowledge/docs/", "type": "markdown", "evergreen": true } ] } } ``` **Source behavior:** - `articles/` and `transcripts/` use `evergreen: false` -- recent content ranks higher (30-day temporal decay) - `docs/` uses `evergreen: true` -- reference documentation doesn't decay --- ## Content Types | Type | Directory | Temporal Decay | Best For | |------|-----------|----------------|----------| | `article` | `articles/` | Yes (30-day) | Web articles, blog posts, news | | `transcript` | `transcripts/` | Yes (30-day) | YouTube videos, podcasts, meetings | | `doc` | `docs/` | No (evergreen) | API docs, manuals, reference material | --- ## Tips - **Always sync after ingesting**: `instar memory sync` updates the FTS5 index - **Use tags consistently**: Tags enable filtered browsing via `instar knowledge list --tag X` - **Include source URLs**: Helps trace back to original content - **Clean before ingesting**: Strip navigation, ads, cookie banners for better search results - **Use smart-fetch for URLs**: `python3 .claude/scripts/smart-fetch.py URL --auto` gets clean markdown