---
name: leann
description: Local RAG indexing with 97% storage reduction via anchor-based lazy recomputation. Graph-based selective embedding storage for memory-efficient semantic code search.
version: 1.0.0
triggers:
  - keywords: [leann, semantic search, code index, RAG, vector search, embedding index, anchor graph]
  - context: [indexing large codebases, memory-efficient search, local code search, production RAG]
  - file_patterns: [leann.config.json, leann.index, .leann/]
tags: [rag, vector-search, graph, compression, local-first, semantic-search]
---

# LEANN Skill

**LEANN (Learned Embedding ANchor Navigation)** - A graph-based selective recomputation system achieving up to 97% storage reduction for local RAG indexes while maintaining fast retrieval performance.

---

## Core Principle

**Use leann for persistent local code indexes with minimal storage overhead.**

Instead of storing all embeddings at full precision (3 GB for 1M files), leann stores only strategic anchors and reconstructs the others on demand (128 MB for the same dataset - **95.8% reduction**).

---

## When to Use LEANN

### Primary Use Cases

**1. Large Codebase Indexing (10K+ files)**
- Monorepos with multiple services
- Enterprise codebases that exceed vector database free tiers
- Projects requiring fast local semantic search without cloud dependencies

**2. Memory-Constrained Environments**
- Development machines with limited RAM
- CI/CD pipelines needing index validation
- Edge deployments or air-gapped systems

**3. Cost-Sensitive RAG Applications**
- Avoiding Pinecone/Weaviate monthly costs
- Self-hosted vector search with minimal infrastructure
- Batch processing of large document collections

**4. Real-Time Code Navigation**
- IDE integrations for semantic code search
- Developer tools needing instant relevance feedback
- Documentation search within editors

### When NOT to Use LEANN

**Use traditional vector DBs instead when:**
- Dataset is small (<1K items) - overhead not worth it
- Cloud infrastructure is required (multi-tenant, global CDN)
- Need advanced features (hybrid search, filtering, multi-tenancy)
- Absolute lowest latency required (<5ms) - leann trades latency for storage

---

## Architecture Overview

### Two-Stage Query Process

```
Query → [Stage 1: Anchor Graph Search] → [Stage 2: Lazy Reconstruction] → Results

Stage 1: Fast HNSW/DiskANN traversal (1-5ms)
Stage 2: Reconstruct top candidates (5-25ms)
Total:   10-30ms (HNSW) or 50-200ms (DiskANN)
```

### Storage Model

```
Naive: N × D × 4 bytes (full embeddings)
LEANN: M × D × 4 + N × (D / 8) bytes (anchors + compressed deltas)

Example (100K files, 768 dims):
  Naive: 100,000 × 768 × 4 = 307 MB
  LEANN:   1,000 × 768 × 4 = 3 MB (anchors)
         100,000 × 96      = 9.6 MB (deltas)
         Total ≈ 12.7 MB (95.9% reduction)
```

### Key Components

1. **Anchor Graph:** HNSW (in-memory) or DiskANN (disk-based) graph of anchor embeddings
2. **Product Quantization:** Compress delta vectors (delta = embedding - anchor)
3. **ZMQ Embedding Server:** GPU-accelerated embedding generation
4. **Delta Index:** Incremental updates without full rebuild

---

## Quick Start

### 1. Installation

```bash
# Via pip
pip install leann

# Via conda
conda install -c conda-forge leann

# From source
git clone https://github.com/leann-index/leann.git
cd leann && pip install -e .
```

### 2. Start Embedding Server

```bash
# Docker (recommended - GPU support)
docker run -d \
  --name leann-embeddings \
  --gpus all \
  -p 5555:5555 \
  -e MODEL=sentence-transformers/all-MiniLM-L6-v2 \
  leann/embedding-server:latest

# Python script (alternative)
python -m leann.server \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --port 5555 \
  --device cuda
```

### 3. Create Index

```bash
# Basic indexing
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --config leann.config.json

# With progress tracking
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --progress \
  --verbose
```

### 4. Query Index

```bash
# CLI query
leann query \
  --index ./leann.index \
  --query "JWT authentication middleware" \
  --top-k 10

# Watch mode (live updates)
leann serve \
  --index ./leann.index \
  --watch \
  --port 8080
```

---

## Configuration

### Minimal Configuration

```json
{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 128,
    "timeout": 30000
  }
}
```

### Production Configuration

```json
{
  "backend": "hnsw",
  "complexity": "high",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 1000,
    "samples": 20000
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256,
    "trainingSamples": 50000
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 256,
    "timeout": 60000
  },
  "incremental": {
    "enabled": true,
    "deltaIndexThreshold": 1000,
    "rebuildSchedule": "0 2 * * *"
  },
  "hnsw": {
    "M": 32,
    "efConstruction": 400,
    "efSearch": 200
  }
}
```

### Configuration Parameters

#### Backend Selection

**`backend: "hnsw"` (In-Memory)**
- Use when index fits in RAM
- Query latency: 10-30ms
- Suitable for: <1M files, <50K anchors

**`backend: "diskann"` (Disk-Based)**
- Use when index exceeds RAM
- Query latency: 50-200ms
- Suitable for: 1M+ files, massive monorepos

#### Complexity Levels

| Level | M | efConstruction | efSearch | Build Time | Accuracy |
|-------|---|----------------|----------|------------|----------|
| low | 12 | 100 | 50 | Fast | Good |
| medium | 16 | 200 | 100 | Moderate | Better |
| high | 32 | 400 | 200 | Slow | Best |

#### Anchor Selection Strategies

**1. Random (Fastest)**

```json
{
  "type": "random",
  "count": 300
}
```

- Speed: O(N)
- Quality: Acceptable for homogeneous codebases
- Use when: Prototyping or single-language projects

**2. K-Means (Recommended)**

```json
{
  "type": "kmeans",
  "clusters": 300,
  "samples": 10000
}
```

- Speed: O(N × k × iterations)
- Quality: Excellent coverage
- Use when: Production deployments (default)

**3. Max-Coverage (Best Quality)**

```json
{
  "type": "max-coverage",
  "count": 300,
  "diversityThreshold": 0.2
}
```

- Speed: O(N × M²)
- Quality: Best for sparse distributions
- Use when: High-value outliers must be findable

---

## Common Workflows

### Workflow 1: Initial Indexing

```bash
# 1. Discover codebase
find /path/to/codebase -type f \( -name "*.py" -o -name "*.ts" \) | wc -l
# Output: 15,000 files

# 2. Calculate recommended anchors
# Rule: M = √N → √15000 ≈ 122 → recommend 300 for better coverage

# 3. Create config
cat > leann.config.json <
```

**Problem:** Query latency >50ms

**Diagnosis:**

```bash
leann index stats ./leann.index

# Check:
# - Avg candidates evaluated (should be <1000)
# - efSearch parameter (default: 100)
```

**Solutions:**

1. **Reduce efSearch** (faster, lower recall)

   ```json
   {
     "hnsw": {
       "efSearch": 50
     }
   }
   ```

2. **Increase anchor count** (more memory, better coverage)

   ```bash
   leann index rebuild \
     --index ./leann.index \
     --new-anchor-count 1000 \
     --output ./leann-improved.index
   ```

3. **Use early termination**

   ```bash
   leann query \
     --index ./leann.index \
     --query "authentication" \
     --top-k 10 \
     --early-terminate \
     --confidence-threshold 0.9
   ```

### Storage Optimization

**Problem:** Index larger than expected

**Diagnosis:**

```bash
leann index stats ./leann.index

# Check:
# - Anchor count (should be ~√N)
# - PQ configuration (96 subvectors × 256 codebook is standard)
```

**Solutions:**

1. **Reduce anchor count** (lower quality, less storage), for example from 300 to 100:

   ```json
   {
     "anchorSelection": {
       "clusters": 100
     }
   }
   ```

2. **Increase PQ compression** (lower quality, less storage). Each sub-vector is stored as one code, so fewer sub-vectors means fewer bytes per delta. Halving `subVectors` from 96 to 48 halves the delta storage:

   ```json
   {
     "quantization": {
       "subVectors": 48,
       "codebookSize": 256
     }
   }
   ```

### Accuracy Improvement

**Problem:** Poor retrieval relevance

**Diagnosis:**

```bash
leann index validate ./leann.index --verbose

# Check:
# - Coverage score (target: >0.80)
# - Quantization error (target: <0.05)
```

**Solutions:**

1. **Increase anchor count**

   ```bash
   leann index rebuild \
     --index ./leann.index \
     --new-anchor-count 1000 \
     --output ./leann-improved.index
   ```

2. **Use k-means selection** (better than random): change `type` from `"random"` to `"kmeans"`.

   ```json
   {
     "anchorSelection": {
       "type": "kmeans",
       "clusters": 500
     }
   }
   ```

3. **Enable re-ranking with cross-encoder**

   ```bash
   leann query \
     --index ./leann.index \
     --query "authentication middleware" \
     --mode rerank \
     --top-k 10
   ```

---

## Monitoring & Health

### Key Metrics

**1. Coverage Score** (target: >0.80)

```bash
leann index validate ./leann.index --metric coverage
```

Measures how uniformly anchors cover the embedding space.

**2. Quantization Error** (target: <0.05)

```bash
leann index validate ./leann.index --metric quantization
```

Measures average reconstruction error from PQ compression.

**3. Utilization Balance** (target: >0.70)

```bash
leann index validate ./leann.index --metric balance
```

Checks if anchors are evenly utilized (no overloaded anchors).

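
The balance metric is described here only qualitatively. As a rough illustration of what an "evenly utilized" score could look like, the sketch below computes the normalized entropy of anchor assignments: 1.0 when every anchor serves the same number of items, falling toward 0.0 as a few anchors absorb most of them. The function name `utilization_balance` and the entropy-based definition are assumptions made for illustration; they are not taken from leann, which may compute its score differently.

```python
import math
from collections import Counter


def utilization_balance(assignments, num_anchors):
    """Normalized entropy of anchor usage, in the range [0, 1].

    ASSUMPTION: illustrative definition only, not leann's actual formula.
    `assignments` holds one anchor id per indexed item.
    """
    if num_anchors < 2 or not assignments:
        return 1.0  # nothing to balance
    counts = Counter(assignments)
    total = len(assignments)
    # Shannon entropy of the share of items handled by each used anchor.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Divide by the maximum possible entropy (all anchors used equally).
    return entropy / math.log(num_anchors)


if __name__ == "__main__":
    even = [0, 1, 2, 3] * 25            # 100 items spread over 4 anchors
    skewed = [0] * 97 + [1, 2, 3]       # one anchor holds 97% of the items
    print(utilization_balance(even, 4))
    print(utilization_balance(skewed, 4))
```

Under this definition the skewed example scores far below the 0.70 target, which is the kind of situation the "no overloaded anchors" check is meant to catch.
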
### Rebuild Triggers

A rebuild is triggered automatically when:
- Coverage score drops below 0.70
- Delta index exceeds threshold (default: 10% of main index)
- Query latency degrades >30% from baseline

A rebuild can also be triggered manually with `leann index rebuild`.

### Health Check Script

```bash
#!/bin/bash
# scripts/index-validator.sh

INDEX="./leann.index"

echo "LEANN Index Health Check"
echo "========================"

# Check coverage
COVERAGE=$(leann index validate "$INDEX" --metric coverage --json | jq -r '.score')
echo "Coverage: $COVERAGE (target: >0.80)"

# Check quantization error
QUANT_ERROR=$(leann index validate "$INDEX" --metric quantization --json | jq -r '.error')
echo "Quantization Error: $QUANT_ERROR (target: <0.05)"

# Check balance
BALANCE=$(leann index validate "$INDEX" --metric balance --json | jq -r '.score')
echo "Balance: $BALANCE (target: >0.70)"

# Check delta index size (read stats once, then extract both fields)
STATS=$(leann index stats "$INDEX" --json)
DELTA_SIZE=$(echo "$STATS" | jq -r '.deltaIndexSize')
MAIN_SIZE=$(echo "$STATS" | jq -r '.mainIndexSize')
# Multiply before dividing so bc's scale=2 truncation does not discard precision
DELTA_RATIO=$(echo "scale=2; $DELTA_SIZE * 100 / $MAIN_SIZE" | bc)
echo "Delta Index: $DELTA_RATIO% of main (rebuild at 10%)"

# Recommend actions
if (( $(echo "$COVERAGE < 0.70" | bc -l) )); then
  echo "⚠️ WARNING: Coverage too low - consider rebuilding with more anchors"
fi

if (( $(echo "$QUANT_ERROR > 0.05" | bc -l) )); then
  echo "⚠️ WARNING: High quantization error - consider reducing PQ compression"
fi

if (( $(echo "$DELTA_RATIO > 10" | bc -l) )); then
  echo "⚠️ WARNING: Delta index large - recommend full rebuild"
fi
```

---

## Integration Patterns

### Pattern 1: Python API

```python
from leann import LEANNIndex

# Load index
index = LEANNIndex.load('./leann.index')

# Basic query
results = index.query(
    query="database connection pooling",
    top_k=10,
    mode='retrieve'
)

for item in results.items:
    print(f"{item.item.path} (score: {item.score:.2f})")
    print(f"  {item.preview}\n")

# Query with filters
results = index.query(
    query="authentication middleware",
    top_k=5,
    filters={
        'languages': ['typescript'],
        'paths': ['src/middleware/*.ts']
    }
)

# Batch queries
queries = [
    "JWT authentication",
    "database migration",
    "error handling"
]
results_batch = index.batch_query(queries, top_k=5)
```

### Pattern 2: REST API

```python
from fastapi import FastAPI
from leann import LEANNIndex

app = FastAPI()

# Load the index once at startup so every request reuses it
index = LEANNIndex.load('./leann.index')


@app.post('/search')
async def search(query: str, top_k: int = 10):
    results = index.query(query=query, top_k=top_k)
    return {
        'results': [
            {
                'path': item.item.path,
                'score': item.score,
                'preview': item.preview
            }
            for item in results.items
        ],
        'latency': results.latency
    }


@app.get('/health')
async def health():
    stats = index.get_stats()
    return {
        'status': 'healthy' if stats.healthScore > 70 else 'degraded',
        'healthScore': stats.healthScore,
        'indexSize': stats.storageSize,
        'avgLatency': stats.avgLatency
    }
```

### Pattern 3: CLI Tool

```bash
#!/bin/bash
# Wrapper script for common queries

function search() {
  leann query \
    --index ~/.cache/leann/project.index \
    --query "$1" \
    --top-k "${2:-10}" \
    --format json | jq -r '.items[] | "\(.item.path):\(.item.startLine)"'
}

function update-index() {
  leann index update \
    --index ~/.cache/leann/project.index \
    --watch \
    --auto-rebuild
}

# Usage:
# search "database connection" 5
# update-index
```

---

## Troubleshooting

### Issue: Out of Memory During Indexing

**Error:**

```
Cannot allocate memory for k-means clustering
```

**Solution:** Reduce sampling in anchor selection, for example from 10000 samples to 5000:

```json
{
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300,
    "samples": 5000
  }
}
```

### Issue: Embedding Server Connection Failed

**Error:**

```
Connection refused: tcp://localhost:5555
```

**Solution:**

```bash
# Check if server is running
docker ps | grep leann-embeddings

# If not, start it (named so the check above can find it)
docker run -d --name leann-embeddings -p 5555:5555 leann/embedding-server:latest

# Test that the port accepts connections
# (the endpoint is ZMQ over TCP, so a plain HTTP request may not get a response)
nc -z localhost 5555 && echo "port 5555 is reachable"
```

### Issue: Low Query Accuracy

**Error:**

```
Results not relevant to query
```

**Diagnosis:**

```bash
leann index validate ./leann.index --verbose
# Check coverage score and quantization error
```

**Solution:**

```bash
# Rebuild with more anchors and k-means selection
leann index rebuild \
  --index ./leann.index \
  --anchor-strategy kmeans \
  --anchor-count 1000 \
  --output ./leann-improved.index
```

---

## Resources

**Codebase Documentation:**
- Type definitions: `@architect/leann-codebase/types/core.ts`
- Anchor selection principles: `@architect/leann-codebase/principles/anchor-selection.md`
- Graph navigation: `@architect/leann-codebase/principles/graph-navigation.md`
- Indexing workflows: `@architect/leann-codebase/templates/indexing-workflow.md`

**References:**
- Quick commands: `@architect/leann/assets/cheatsheet.md`
- Indexing patterns: `@architect/leann/references/indexing-patterns.md`
- Configuration guide: `@architect/leann/references/configuration.md`

**External:**
- GitHub: https://github.com/leann-index/leann
- Documentation: https://leann-index.github.io
- Paper: "LEANN: Learned Embedding Anchor Navigation"

---

## Summary

**Core Value Proposition:**
- Up to 97% storage reduction vs storing every embedding at full precision
- 10-30ms query latency with the HNSW backend (local, no network overhead)
- Zero cloud costs (self-hosted)
- Incremental updates (no full rebuild on every change)

**Best For:**
- Large codebases (10K+ files)
- Memory-constrained environments
- Cost-sensitive RAG applications
- Local-first semantic search

**Decision Tree:**

```
Need semantic code search?
├─ Dataset < 1K items → Use simple vector DB (overhead not worth it)
└─ Dataset > 1K items
   ├─ Cloud infrastructure required → Use Pinecone/Weaviate
   └─ Local-first / cost-sensitive → Use LEANN
      ├─ Index < RAM → HNSW backend (10-30ms)
      └─ Index > RAM → DiskANN backend (50-200ms)
```

**Golden Rules:**
1. Start with medium complexity, increase only if needed
2. Use k-means anchor selection for production
3. Monitor coverage score - rebuild if <0.70
4. Enable incremental updates for CI/CD
5. Use re-ranking for user-facing search
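
The storage model from the Architecture Overview and the M = √N rule of thumb from Workflow 1 can be checked with a few lines of Python. This is a minimal sketch that uses only the formulas stated in this document; the function names are hypothetical and are not part of the leann package.

```python
import math

BYTES_PER_FLOAT32 = 4


def naive_bytes(n_items: int, dims: int) -> int:
    """Naive: N × D × 4 bytes (every embedding stored at full precision)."""
    return n_items * dims * BYTES_PER_FLOAT32


def leann_bytes(n_items: int, n_anchors: int, dims: int) -> int:
    """LEANN: M × D × 4 bytes for anchors plus N × (D / 8) bytes for deltas."""
    anchors = n_anchors * dims * BYTES_PER_FLOAT32
    deltas = n_items * (dims // 8)
    return anchors + deltas


def storage_reduction(n_items: int, n_anchors: int, dims: int) -> float:
    """Fraction of the naive storage that is saved (0.0 to 1.0)."""
    return 1.0 - leann_bytes(n_items, n_anchors, dims) / naive_bytes(n_items, dims)


def sqrt_rule(n_items: int) -> int:
    """Baseline anchor count from the M = √N rule of thumb."""
    return round(math.sqrt(n_items))


if __name__ == "__main__":
    print(naive_bytes(100_000, 768))          # 307200000 bytes (about 307 MB)
    print(leann_bytes(100_000, 1_000, 768))   # 12672000 bytes (about 12.7 MB)
    print(storage_reduction(100_000, 1_000, 768))
    print(sqrt_rule(15_000))                  # 122
```

Under this model the deltas alone cost D / 8 bytes per item against 4 × D bytes for a full embedding, which is 1/32 of the naive size. The reduction therefore cannot exceed about 96.9% however few anchors are kept, which is where the "up to 97%" figure comes from.
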