--- name: lancer description: Use lancer CLI for LanceDB semantic and multi-modal search with document ingestion, vector embeddings, and MCP server integration for knowledge retrieval. --- # Lancer - LanceDB CLI and MCP Server Skill You are a specialist in using `lancer`, a CLI and MCP server for LanceDB that provides semantic and full-text search with multi-modal support (text and images). This skill provides comprehensive workflows, best practices, and common patterns for document ingestion, search, and table management. ## What is Lancer? `lancer` is a powerful tool for: - **Semantic search**: Find documents by meaning, not just keywords - **Multi-modal support**: Index and search both text and images - **LanceDB integration**: Efficient vector database storage and retrieval - **Flexible ingestion**: Support for multiple file formats (txt, md, pdf, sql, images) - **MCP server mode**: Integration with Claude Desktop and other MCP clients ## Core Capabilities 1. **Ingest**: Add documents to LanceDB with automatic chunking and embedding 2. **Search**: Semantic similarity search across documents 3. **Tables**: Manage LanceDB tables (list, info, delete) 4. **Remove**: Remove documents from tables 5. **MCP**: Run as Model Context Protocol server ## Quick Start ### Basic Search ```bash # Search all tables lancer search "how to deploy kubernetes" # Search specific table with more results lancer search -t docs -l 20 "authentication methods" # Search with similarity threshold lancer search --threshold 0.7 "error handling patterns" ``` ### Basic Ingestion ```bash # Ingest a single file lancer ingest document.md # Ingest a directory lancer ingest ./docs/ # Ingest multiple paths lancer ingest file1.md file2.pdf ./images/ ``` ## Document Ingestion ### Ingest Command Options ```bash # Ingest to specific table lancer ingest -t my_docs document.md # Ingest with file extension filter lancer ingest -e md,txt,pdf ./docs/ # Ingest from stdin (pipe file paths) find ./docs -name "*.md" | lancer ingest --stdin # Ingest from file list lancer ingest --files-from paths.txt # Custom chunk size and overlap lancer ingest --chunk-size 2000 --chunk-overlap 400 document.md ``` ### Supported File Types **Text formats:** - `txt` - Plain text files - `md` - Markdown documents - `pdf` - PDF documents - `sql` - SQL scripts **Image formats:** - `jpg`, `jpeg` - JPEG images - `png` - PNG images - `gif` - GIF images - `bmp` - Bitmap images - `webp` - WebP images - `tiff`, `tif` - TIFF images - `svg` - SVG vector graphics - `ico` - Icon files ### Embedding Models **Text models:** ```bash # Default: all-MiniLM-L6-v2 (fast, good quality) lancer ingest document.md # Larger model for better quality lancer ingest --text-model all-MiniLM-L12-v2 document.md # BGE models (better semantic understanding) lancer ingest --text-model bge-small-en-v1.5 document.md lancer ingest --text-model bge-base-en-v1.5 document.md ``` **Image models:** ```bash # Default: clip-vit-b-32 (cross-modal text/image) lancer ingest image.jpg # ResNet50 for image-only search lancer ingest --image-model resnet50 image.jpg ``` **Advanced: Force specific model:** ```bash # Force CLIP for text (enables future image additions) lancer ingest --embedding-model clip-vit-b-32 document.md # Force BGE for performance (text-only) lancer ingest --embedding-model BAAI/bge-small-en-v1.5 document.md ``` ### Ingestion Optimization ```bash # Filter by file size lancer ingest --min-file-size 1000 --max-file-size 10000000 ./docs/ # Skip embedding generation (metadata only) lancer ingest --no-embeddings document.md # Custom batch size for database writes lancer ingest --batch-size 200 ./large-dataset/ # JSON output for scripting lancer ingest --format json document.md ``` ## Search Operations ### Search Command Options ```bash # Basic search lancer search "kubernetes deployment" # Search specific table lancer search -t docs "authentication" # Limit results lancer search -l 5 "error handling" # Set similarity threshold (0.0-1.0) lancer search --threshold 0.6 "database migration" # Include embeddings in results lancer search --include-embeddings "API design" # JSON output lancer search --format json "machine learning" ``` ### Metadata Filters ```bash # Single filter (field:operator:value) lancer search --filter "author:eq:John" "AI research" # Multiple filters lancer search \ --filter "author:eq:John" \ --filter "year:gt:2020" \ "deep learning" # Available operators: # eq (equals), ne (not equals) # gt (greater than), lt (less than) # gte (greater/equal), lte (less/equal) # in (in list), contains (string contains) ``` ### Search Examples ```bash # Find recent documentation lancer search \ -t docs \ --filter "date:gte:2024-01-01" \ -l 10 \ "API endpoints" # Search by category lancer search \ --filter "category:eq:tutorial" \ "getting started" # Multi-criteria search lancer search \ -t technical_docs \ --filter "language:eq:python" \ --filter "level:eq:advanced" \ --threshold 0.7 \ -l 15 \ "async programming patterns" ``` ## Table Management ### List Tables ```bash # List all tables lancer tables list # JSON output lancer tables list --format json ``` ### Table Information ```bash # Get table details lancer tables info my_table # JSON output for scripting lancer tables info my_table --format json ``` ### Delete Table ```bash # Delete a table (be careful!) lancer tables delete old_table ``` ## Remove Documents ```bash # Remove specific documents from a table lancer remove -t docs document_id # Remove multiple documents lancer remove -t docs id1 id2 id3 ``` ## Configuration ### Using Config File ```bash # Specify config file lancer -c ~/.lancer/config.toml search "query" # Set default table in config lancer -c config.toml ingest document.md ``` ### Environment Variables ```bash # Set default table export LANCER_TABLE=my_docs lancer search "query" # Searches my_docs # Set log level export LANCER_LOG_LEVEL=debug lancer ingest document.md ``` ### Log Levels ```bash # Error only lancer --log-level error search "query" # Warning lancer --log-level warn ingest document.md # Info (default) lancer --log-level info search "query" # Debug lancer --log-level debug ingest document.md # Trace (verbose) lancer --log-level trace search "query" ``` ## Common Workflows ### Workflow 1: Index Documentation ```bash # 1. Ingest markdown docs lancer ingest -t docs -e md ./documentation/ # 2. Verify ingestion lancer tables info docs # 3. Test search lancer search -t docs "installation guide" # 4. Refine search with threshold lancer search -t docs --threshold 0.7 -l 5 "configuration" ``` ### Workflow 2: Multi-modal Image Search ```bash # 1. Ingest images with CLIP model lancer ingest -t images -e jpg,png,webp \ --image-model clip-vit-b-32 \ ./photos/ # 2. Search images with text query lancer search -t images "sunset over mountains" # 3. Search with higher threshold for precision lancer search -t images --threshold 0.8 "red car" ``` ### Workflow 3: Mixed Content Corpus ```bash # 1. Ingest with CLIP for cross-modal search lancer ingest -t knowledge_base \ --embedding-model clip-vit-b-32 \ -e md,pdf,jpg,png \ ./content/ # 2. Search text and images together lancer search -t knowledge_base "architecture diagrams" # 3. Filter by file type lancer search -t knowledge_base \ --filter "file_type:eq:png" \ "system design" ``` ### Workflow 4: Batch Ingestion ```bash # 1. Generate file list find ./corpus -type f -name "*.md" > files.txt # 2. Ingest from list with custom settings lancer ingest -t corpus \ --files-from files.txt \ --chunk-size 1500 \ --chunk-overlap 300 \ --batch-size 150 # 3. Verify ingestion lancer tables info corpus # 4. Test search quality lancer search -t corpus -l 10 "sample query" ``` ### Workflow 5: Update Existing Corpus ```bash # 1. Ingest new documents lancer ingest -t docs ./new_docs/ # 2. Search to verify new content lancer search -t docs "recent feature" # 3. Remove outdated documents lancer remove -t docs old_doc_id # 4. Verify final state lancer tables info docs ``` ## Best Practices ### 1. Choose the Right Embedding Model **For text-only corpora:** ```bash # Fast and efficient lancer ingest --text-model all-MiniLM-L6-v2 document.md # Better quality lancer ingest --text-model bge-base-en-v1.5 document.md ``` **For images or mixed content:** ```bash # Cross-modal search (text queries → image results) lancer ingest --embedding-model clip-vit-b-32 content/ ``` ### 2. Optimize Chunk Settings **Short documents (< 500 words):** ```bash lancer ingest --chunk-size 500 --chunk-overlap 100 article.md ``` **Long documents (> 2000 words):** ```bash lancer ingest --chunk-size 2000 --chunk-overlap 400 book.pdf ``` **Code documentation:** ```bash lancer ingest --chunk-size 1000 --chunk-overlap 200 docs/ ``` ### 3. Use Tables to Organize Content ```bash # Separate tables by content type lancer ingest -t api_docs ./api/*.md lancer ingest -t tutorials ./tutorials/*.md lancer ingest -t images ./screenshots/*.png # Search specific context lancer search -t api_docs "authentication endpoints" ``` ### 4. Set Appropriate Thresholds **Broad exploration:** ```bash lancer search --threshold 0.4 "general topic" ``` **Precise matching:** ```bash lancer search --threshold 0.75 "specific concept" ``` **Very high precision:** ```bash lancer search --threshold 0.85 -l 3 "exact information" ``` ### 5. Use Filters for Structured Data ```bash # Combine semantic search with metadata lancer search \ --filter "status:eq:published" \ --filter "category:eq:tutorial" \ --threshold 0.6 \ "getting started guide" ``` ### 6. Format Output for Scripting ```bash # JSON output for automation lancer search --format json "query" | jq '.results[] | .path' # List tables programmatically lancer tables list --format json | jq '.[] | .name' ``` ## MCP Server Mode ### Running as MCP Server ```bash # Start MCP server for Claude Desktop integration lancer mcp # With custom config lancer mcp -c ~/.lancer/config.toml # With specific log level lancer mcp --log-level info ``` ### Integration with Claude Desktop Add to Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`): ```json { "mcpServers": { "lancer": { "command": "lancer", "args": ["mcp"] } } } ``` ## Performance Tips ### 1. Batch Operations ```bash # Ingest multiple files at once lancer ingest file1.md file2.md file3.md # Use --stdin for large batches find ./docs -name "*.md" | lancer ingest --stdin ``` ### 2. Optimize Batch Size ```bash # Larger batches for bulk ingestion lancer ingest --batch-size 500 ./large-corpus/ # Smaller batches for limited memory lancer ingest --batch-size 50 ./documents/ ``` ### 3. Skip Embeddings for Metadata-Only ```bash # Index metadata without generating embeddings lancer ingest --no-embeddings ./archive/ ``` ### 4. Use Appropriate Models ```bash # Faster ingestion with smaller model lancer ingest --text-model all-MiniLM-L6-v2 ./docs/ # Better quality with larger model (slower) lancer ingest --text-model bge-base-en-v1.5 ./docs/ ``` ## Troubleshooting ### Issue: Search returns no results **Solutions:** ```bash # Lower the similarity threshold lancer search --threshold 0.3 "query" # Check table exists and has documents lancer tables list lancer tables info my_table # Try different search terms lancer search "alternative phrasing" ``` ### Issue: Ingestion fails for some files **Solutions:** ```bash # Check supported extensions lancer ingest -e md,txt,pdf ./docs/ # Set file size limits lancer ingest --max-file-size 100000000 ./docs/ # Use debug logging lancer --log-level debug ingest document.pdf ``` ### Issue: Low search quality **Solutions:** ```bash # Use better embedding model lancer ingest --text-model bge-base-en-v1.5 document.md # Adjust chunk size lancer ingest --chunk-size 1500 --chunk-overlap 300 document.md # Adjust search threshold lancer search --threshold 0.6 "query" ``` ### Issue: Slow ingestion **Solutions:** ```bash # Increase batch size lancer ingest --batch-size 300 ./docs/ # Use faster embedding model lancer ingest --text-model all-MiniLM-L6-v2 ./docs/ # Skip embeddings if not needed lancer ingest --no-embeddings ./docs/ ``` ## Quick Reference ```bash # Ingestion lancer ingest document.md # Ingest single file lancer ingest -t docs ./directory/ # Ingest to specific table lancer ingest -e md,pdf ./docs/ # Filter by extensions lancer ingest --chunk-size 2000 document.md # Custom chunk size # Search lancer search "query" # Search all tables lancer search -t docs "query" # Search specific table lancer search -l 20 "query" # Limit results lancer search --threshold 0.7 "query" # Set similarity threshold lancer search --filter "author:eq:John" "query" # Metadata filter # Table management lancer tables list # List all tables lancer tables info my_table # Table information lancer tables delete old_table # Delete table # Configuration lancer -c config.toml search "query" # Use config file lancer --log-level debug ingest doc.md # Set log level export LANCER_TABLE=docs # Set default table # MCP server lancer mcp # Start MCP server ``` ## Common Patterns ### Pattern 1: Quick Documentation Search ```bash lancer search -t docs --threshold 0.7 -l 5 "how to configure authentication" ``` ### Pattern 2: Ingest and Test ```bash lancer ingest -t test_docs document.md && \ lancer search -t test_docs "key concept from document" ``` ### Pattern 3: Find Similar Images ```bash lancer search -t images --threshold 0.8 "sunset landscape photography" ``` ### Pattern 4: Batch Ingest with Verification ```bash find ./docs -name "*.md" | lancer ingest -t docs --stdin && \ lancer tables info docs ``` ### Pattern 5: Precise Technical Search ```bash lancer search -t technical_docs \ --filter "language:eq:rust" \ --threshold 0.75 \ -l 10 \ "async trait implementation patterns" ``` ## Summary **Primary use cases:** - Semantic search across documentation - Multi-modal search (text and images) - Knowledge base indexing and retrieval - Integration with Claude via MCP **Key advantages:** - Semantic similarity (not just keyword matching) - Multi-modal support (text and images) - Flexible metadata filtering - Multiple embedding model options - Fast vector search with LanceDB **Most common commands:** - `lancer ingest document.md` - Index documents - `lancer search "query"` - Search semantically - `lancer tables list` - Manage tables - `lancer search -t docs --threshold 0.7 "query"` - Precise search