# RAGStack MCP Server

MCP (Model Context Protocol) server for RAGStack knowledge bases. It lets AI assistants search and chat with your knowledge base, upload documents and media, and scrape websites into it.

## Installation

```bash
# Using uvx (recommended - no install needed)
uvx ragstack-mcp

# Or install globally
pip install ragstack-mcp
```

## Configuration

Get your GraphQL endpoint and API key from the RAGStack dashboard: **Settings → API Key**

### Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (Mac) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```

### Amazon Q CLI

Edit `~/.aws/amazonq/mcp.json`:

```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```

### Cursor

Open **Settings → MCP Servers → Add Server**, or edit `.cursor/mcp.json`:

```json
{
  "ragstack-kb": {
    "command": "uvx",
    "args": ["ragstack-mcp"],
    "env": {
      "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
      "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
    }
  }
}
```

### VS Code + Cline

Edit `.vscode/cline_mcp_settings.json`:

```json
{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
```

### VS Code + Continue

Edit `~/.continue/config.json`, add to the `mcpServers` array:

```json
{
  "mcpServers": [
    {
      "name": "ragstack-kb",
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY":
          "da2-xxxxxxxxxxxx"
      }
    }
  ]
}
```

## Available Tools

### search_knowledge_base

Search for relevant documents in the knowledge base.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | The search query |
| `max_results` | int | No | 5 | Maximum results to return |

### chat_with_knowledge_base

Ask questions and get AI-generated answers with source citations.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `query` | string | Yes | - | Your question |
| `conversation_id` | string | No | null | ID to maintain conversation context |

### start_scrape_job

Scrape a website into the knowledge base.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | Yes | - | Starting URL to scrape |
| `max_pages` | int | No | 50 | Maximum pages to scrape |
| `max_depth` | int | No | 3 | How deep to follow links (0 = start page only) |
| `scope` | string | No | "HOSTNAME" | `SUBPAGES`, `HOSTNAME`, or `DOMAIN` |
| `include_patterns` | list[str] | No | null | Only scrape URLs matching these glob patterns |
| `exclude_patterns` | list[str] | No | null | Skip URLs matching these glob patterns |
| `scrape_mode` | string | No | "AUTO" | `AUTO`, `FAST` (HTTP only), or `FULL` (browser) |
| `cookies` | string | No | null | Cookie string for authenticated sites |
| `force_rescrape` | bool | No | false | Re-scrape even if content unchanged |

**Scope values:**

- `SUBPAGES` - Only URLs under the starting path
- `HOSTNAME` - All pages on the same subdomain
- `DOMAIN` - All subdomains of the domain

**Scrape mode values:**

- `AUTO` - Try fast mode, fall back to full for SPAs
- `FAST` - HTTP only, faster but may miss JavaScript content
- `FULL` - Uses headless browser, handles all JavaScript

### get_scrape_job_status

Check the status of a scrape job.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `job_id` | string | Yes | The scrape job ID |

### list_scrape_jobs

List recent scrape jobs.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `limit` | int | No | 10 | Maximum jobs to return |

### upload_document_url

Get a presigned URL to upload a document or media file.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `filename` | string | Yes | Name of the file (e.g., 'report.pdf', 'meeting.mp4') |

**Supported formats:**

- Documents: PDF, DOCX, XLSX, HTML, TXT, CSV, JSON, XML, EML, EPUB, Markdown
- Images: JPG, PNG, GIF, WebP, AVIF, BMP, TIFF
- Video: MP4, WebM
- Audio: MP3, WAV, M4A, OGG, FLAC

Video/audio files are transcribed using AWS Transcribe and segmented for search.

### upload_image_url

Get a presigned URL to upload an image (step 1 of the image upload workflow).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `filename` | string | Yes | Name of the image file (e.g., 'photo.jpg') |

Supported formats: JPEG, PNG, GIF, WebP, AVIF, BMP, TIFF

### generate_image_caption

Generate an AI caption for an uploaded image using a vision model (step 2, optional).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `s3_uri` | string | Yes | S3 URI returned by upload_image_url |

### submit_image

Finalize an image upload and trigger indexing (step 3).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `image_id` | string | Yes | - | Image ID from upload_image_url |
| `caption` | string | No | null | Primary caption |
| `user_caption` | string | No | null | User-provided caption |
| `ai_caption` | string | No | null | AI-generated caption |

---

## Configuration Tools (Read-Only)

### get_configuration

Get all current RAGStack configuration settings, organized by category. Returns settings for:

- **Chat:** Models, quotas, system prompt, document access
- **Metadata Extraction:** Enabled, model, mode (auto/manual), max keys
- **Query-Time Filtering:** Filter generation, multi-slice retrieval settings
- **Public Access:** Which endpoints allow unauthenticated access
- **Document Processing:** OCR backend, image caption prompt
- **Media Processing:** Transcribe language, speaker diarization, segment duration
- **Budget:** Alert thresholds

**Note:** Read-only. To modify settings, use the admin dashboard (Cognito auth required).

---

## Metadata Analysis Tools

These tools help understand and optimize metadata extraction and filtering.

### get_metadata_stats

Get statistics about metadata keys extracted from documents. Returns key names, data types, occurrence counts, sample values, and status.

### get_filter_examples

Get AI-generated filter examples for metadata-based search queries. Returns filter patterns with name, description, use case, and JSON filter syntax.

**Filter syntax reference:**

- Basic operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`
- Logical operators: `$and`, `$or`
- Example: `{"topic": {"$eq": "genealogy"}}`

### get_key_library

Get the complete metadata key library with all discovered keys. Returns all keys available for filtering, with data types and sample values.

### check_key_similarity

Check if a proposed metadata key is similar to existing keys.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `key_name` | string | Yes | - | Proposed key name to check |
| `threshold` | float | No | 0.8 | Similarity threshold (0.0-1.0) |

Use this before adding documents with new keys to avoid duplicates.

### analyze_metadata

Trigger metadata analysis to discover keys and generate filter examples.

**Note:** This is a long-running operation (1-2 minutes). It samples up to 1000 vectors and uses LLM analysis. Run this after ingesting new documents or when filter generation isn't working as expected.

---

## Usage Examples

Once configured, just ask your AI assistant naturally:

**Search & Chat:**

- "Search my knowledge base for authentication best practices"
- "What does our documentation say about API rate limits?"
- "What was discussed in the team meeting about deadlines?" (searches video/audio transcripts)

**Web Scraping:**

- "Scrape the React docs at react.dev/reference"
- "Check the status of my scrape job"

**Document, Image & Media Upload:**

- "Upload a new document called quarterly-report.pdf"
- "Upload this image and generate a caption for it"
- "Upload the meeting recording meeting-2024-01.mp4"

**Metadata Analysis:**

- "What metadata keys are available for filtering?"
- "Analyze the metadata in my knowledge base"
- "Show me the filter examples"
- "Check if 'author' is similar to any existing keys"

**Configuration:**

- "What are my current RAGStack settings?"
- "What model is being used for chat?"
- "Is multi-slice retrieval enabled?"
- "What are my quota limits?"
- "What language is configured for transcription?"
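The metadata filters described under `get_filter_examples` are plain JSON objects, so they are easy to compose and sanity-check in client code. A minimal sketch, assuming the MongoDB-style operator semantics that the examples above suggest (the `check_operators` helper is hypothetical, not part of the package):

```python
# Filter: documents where topic == "genealogy" AND year >= 1900,
# OR documents that have no "year" key at all.
filter_expr = {
    "$or": [
        {"$and": [
            {"topic": {"$eq": "genealogy"}},
            {"year": {"$gte": 1900}},
        ]},
        {"year": {"$exists": False}},
    ]
}

# Operators documented in the filter syntax reference above.
KNOWN_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte",
                   "$in", "$nin", "$exists", "$and", "$or"}

def check_operators(node):
    """Recursively collect $-prefixed keys that are not documented operators."""
    unknown = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("$") and key not in KNOWN_OPERATORS:
                unknown.append(key)
            unknown += check_operators(value)
    elif isinstance(node, list):
        for item in node:
            unknown += check_operators(item)
    return unknown

assert check_operators(filter_expr) == []  # only documented operators used
```

Catching a typo like `$equals` client-side is cheaper than debugging a filter that silently matches nothing server-side.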
## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `RAGSTACK_GRAPHQL_ENDPOINT` | Yes | Your RAGStack GraphQL API URL |
| `RAGSTACK_API_KEY` | Yes | Your RAGStack API key |

## Development

```bash
# Clone the repo, then install dependencies
cd src/ragstack-mcp
uv sync

# Run locally
uv run ragstack-mcp

# Build package
uv build

# Publish to PyPI
uv publish
```

## License

MIT
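As a final sanity check, the two required environment variables can be verified before a client launches the server. A minimal sketch; this `missing_env` helper is hypothetical and not part of the `ragstack-mcp` package:

```python
import os

# The two variables the server requires (see Environment Variables above).
REQUIRED = ("RAGSTACK_GRAPHQL_ENDPOINT", "RAGSTACK_API_KEY")

def missing_env(env=os.environ):
    """Return the names of required RAGStack variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Example: fail fast before wiring the server into an MCP client.
problems = missing_env()
if problems:
    print("Missing:", ", ".join(problems))
```

A missing or empty variable is the most common cause of the server failing to start, so surfacing it explicitly saves a round-trip through client logs.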