--- name: grepai-embeddings-ollama description: Configure Ollama as embedding provider for GrepAI. Use this skill for local, private embedding generation. --- # GrepAI Embeddings with Ollama This skill covers using Ollama as the embedding provider for GrepAI, enabling 100% private, local code search. ## When to Use This Skill - Setting up private, local embeddings - Choosing the right Ollama model - Optimizing Ollama performance - Troubleshooting Ollama connection issues ## Why Ollama? | Advantage | Description | |-----------|-------------| | 🔒 **Privacy** | Code never leaves your machine | | 💰 **Free** | No API costs or usage limits | | ⚡ **Speed** | No network latency | | 🔌 **Offline** | Works without internet | | 🔧 **Control** | Choose your model | ## Prerequisites 1. Ollama installed and running 2. An embedding model downloaded ```bash # Install Ollama brew install ollama # macOS # or curl -fsSL https://ollama.com/install.sh | sh # Linux # Start Ollama ollama serve # Download model ollama pull nomic-embed-text ``` ## Configuration ### Basic Configuration ```yaml # .grepai/config.yaml embedder: provider: ollama model: nomic-embed-text endpoint: http://localhost:11434 ``` ### With Custom Endpoint ```yaml embedder: provider: ollama model: nomic-embed-text endpoint: http://192.168.1.100:11434 # Remote Ollama server ``` ### With Explicit Dimensions ```yaml embedder: provider: ollama model: nomic-embed-text endpoint: http://localhost:11434 dimensions: 768 # Usually auto-detected ``` ## Available Models ### Recommended: nomic-embed-text ```bash ollama pull nomic-embed-text ``` | Property | Value | |----------|-------| | Dimensions | 768 | | Size | ~274 MB | | Speed | Fast | | Quality | Excellent for code | | Language | English-optimized | **Configuration:** ```yaml embedder: provider: ollama model: nomic-embed-text ``` ### Multilingual: nomic-embed-text-v2-moe ```bash ollama pull nomic-embed-text-v2-moe ``` | Property | Value | |----------|-------| | Dimensions | 768 | | Size | ~500 MB | | Speed | Medium | | Quality | Excellent | | Language | Multilingual | Best for codebases with non-English comments/documentation. **Configuration:** ```yaml embedder: provider: ollama model: nomic-embed-text-v2-moe ``` ### High Quality: bge-m3 ```bash ollama pull bge-m3 ``` | Property | Value | |----------|-------| | Dimensions | 1024 | | Size | ~1.2 GB | | Speed | Slower | | Quality | Very high | | Language | Multilingual | Best for large, complex codebases where accuracy is critical. **Configuration:** ```yaml embedder: provider: ollama model: bge-m3 dimensions: 1024 ``` ### Maximum Quality: mxbai-embed-large ```bash ollama pull mxbai-embed-large ``` | Property | Value | |----------|-------| | Dimensions | 1024 | | Size | ~670 MB | | Speed | Medium | | Quality | Highest | | Language | English | **Configuration:** ```yaml embedder: provider: ollama model: mxbai-embed-large dimensions: 1024 ``` ## Model Comparison | Model | Dims | Size | Speed | Quality | Use Case | |-------|------|------|-------|---------|----------| | `nomic-embed-text` | 768 | 274MB | ⚡⚡⚡ | ⭐⭐⭐ | General use | | `nomic-embed-text-v2-moe` | 768 | 500MB | ⚡⚡ | ⭐⭐⭐⭐ | Multilingual | | `bge-m3` | 1024 | 1.2GB | ⚡ | ⭐⭐⭐⭐⭐ | Large codebases | | `mxbai-embed-large` | 1024 | 670MB | ⚡⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy | ## Performance Optimization ### Memory Management Models load into RAM. Ensure sufficient memory: | Model | RAM Required | |-------|--------------| | `nomic-embed-text` | ~500 MB | | `nomic-embed-text-v2-moe` | ~800 MB | | `bge-m3` | ~1.5 GB | | `mxbai-embed-large` | ~1 GB | ### GPU Acceleration Ollama automatically uses: - **macOS:** Metal (Apple Silicon) - **Linux/Windows:** CUDA (NVIDIA GPUs) Check GPU usage: ```bash ollama ps ``` ### Keeping Model Loaded By default, Ollama unloads models after 5 minutes of inactivity. Keep loaded: ```bash # Keep model loaded indefinitely curl http://localhost:11434/api/generate -d '{ "model": "nomic-embed-text", "keep_alive": -1 }' ``` ## Verifying Connection ### Check Ollama is Running ```bash curl http://localhost:11434/api/tags ``` ### List Available Models ```bash ollama list ``` ### Test Embedding ```bash curl http://localhost:11434/api/embeddings -d '{ "model": "nomic-embed-text", "prompt": "function authenticate(user, password)" }' ``` ## Running Ollama as a Service ### macOS (launchd) Ollama app runs automatically on login. ### Linux (systemd) ```bash # Enable service sudo systemctl enable ollama # Start service sudo systemctl start ollama # Check status sudo systemctl status ollama ``` ### Manual Background ```bash nohup ollama serve > /dev/null 2>&1 & ``` ## Remote Ollama Server Run Ollama on a powerful server and connect remotely: ### On the Server ```bash # Allow remote connections OLLAMA_HOST=0.0.0.0 ollama serve ``` ### On the Client ```yaml # .grepai/config.yaml embedder: provider: ollama model: nomic-embed-text endpoint: http://server-ip:11434 ``` ## Common Issues ❌ **Problem:** Connection refused ✅ **Solution:** ```bash # Start Ollama ollama serve ``` ❌ **Problem:** Model not found ✅ **Solution:** ```bash # Pull the model ollama pull nomic-embed-text ``` ❌ **Problem:** Slow embedding generation ✅ **Solutions:** - Use a smaller model (`nomic-embed-text`) - Ensure GPU is being used (`ollama ps`) - Close memory-intensive applications - Consider a remote server with better hardware ❌ **Problem:** Out of memory ✅ **Solutions:** - Use a smaller model - Close other applications - Upgrade RAM - Use remote Ollama server ❌ **Problem:** Embeddings differ after model update ✅ **Solution:** Re-index after model updates: ```bash rm .grepai/index.gob grepai watch ``` ## Best Practices 1. **Start with `nomic-embed-text`:** Best balance of speed/quality 2. **Keep Ollama running:** Background service recommended 3. **Match dimensions:** Don't mix models with different dimensions 4. **Re-index on model change:** Delete index and re-run watch 5. **Monitor memory:** Embedding models use significant RAM ## Output Format Successful Ollama configuration: ``` ✅ Ollama Embedding Provider Configured Provider: Ollama Model: nomic-embed-text Endpoint: http://localhost:11434 Dimensions: 768 (auto-detected) Status: Connected Model Info: - Size: 274 MB - Loaded: Yes - GPU: Apple Metal ```