--- name: embedding-models description: Embedding model configurations and cost calculators allowed-tools: Bash, Read, Write, Edit, WebFetch --- # Embedding Models Skill Embedding model selection, configuration, and cost optimization for RAG pipelines. ## Use When - Selecting embedding models for vector search - Configuring OpenAI, Cohere, or HuggingFace embeddings - Calculating embedding generation costs - Optimizing embedding performance vs cost tradeoffs - Setting up local vs cloud embedding models - Implementing embedding caching strategies - User mentions: "embeddings", "vector models", "embedding costs", "semantic search models" ## Model Selection Guide ### Commercial Models **OpenAI Embeddings:** - `text-embedding-3-small` - 1536 dims, $0.02/1M tokens, balanced performance - `text-embedding-3-large` - 3072 dims, $0.13/1M tokens, highest quality - `text-embedding-ada-002` - 1536 dims, $0.10/1M tokens, legacy model **Cohere Embeddings:** - `embed-english-v3.0` - 1024 dims, multilingual support - `embed-english-light-v3.0` - 384 dims, faster/cheaper - `embed-multilingual-v3.0` - 1024 dims, 100+ languages ### Open Source Models (HuggingFace) **Sentence Transformers:** - `all-MiniLM-L6-v2` - 384 dims, 80MB, fast and efficient - `all-mpnet-base-v2` - 768 dims, 420MB, high quality - `multi-qa-mpnet-base-dot-v1` - 768 dims, optimized for Q&A - `paraphrase-multilingual-mpnet-base-v2` - 768 dims, 50+ languages **Specialized Models:** - `BAAI/bge-small-en-v1.5` - 384 dims, SOTA small model - `BAAI/bge-base-en-v1.5` - 768 dims, excellent retrieval - `BAAI/bge-large-en-v1.5` - 1024 dims, top performance - `intfloat/e5-base-v2` - 768 dims, strong general purpose ## Cost Calculator Use the cost calculator script to estimate embedding costs: ```bash # Calculate costs for different models and volumes python scripts/calculate-embedding-costs.py \ --documents 100000 \ --avg-tokens 500 \ --model text-embedding-3-small # Compare multiple models python scripts/calculate-embedding-costs.py \ --documents 100000 \ --avg-tokens 500 \ --compare ``` ## Setup Scripts ### OpenAI Embeddings ```bash bash scripts/setup-openai-embeddings.sh ``` Configures OpenAI embedding client with API key management and retry logic. ### HuggingFace Embeddings ```bash bash scripts/setup-huggingface-embeddings.sh ``` Downloads and configures sentence-transformers models locally. ### Cohere Embeddings ```bash bash scripts/setup-cohere-embeddings.sh ``` Sets up Cohere embedding client with API credentials. ## Configuration Templates ### OpenAI Configuration ```python # templates/openai-embedding-config.py from openai import OpenAI client = OpenAI(api_key="your-key") embeddings = client.embeddings.create( model="text-embedding-3-small", input=["Your text here"] ) ``` ### HuggingFace Configuration ```python # templates/huggingface-embedding-config.py from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2') embeddings = model.encode(["Your text here"]) ``` ### Custom Model Template ```python # templates/custom-embedding-model.py # Wrapper for any embedding model with consistent interface ``` ## Optimization Strategies **Cost Optimization:** 1. Use smaller models for high-volume applications 2. Implement embedding caching (see examples/embedding-cache.py) 3. Batch embedding generation (see examples/batch-embedding-generation.py) 4. Consider local models for sensitive data **Performance Optimization:** 1. Use GPU acceleration for local models 2. Batch processing for throughput 3. Dimension reduction for storage/speed 4. Model distillation for faster inference ## Model Comparison Matrix | Model | Dimensions | Size | Speed | Quality | Cost | |-------|-----------|------|-------|---------|------| | text-embedding-3-small | 1536 | API | Fast | Good | $0.02/1M | | text-embedding-3-large | 3072 | API | Medium | Excellent | $0.13/1M | | all-MiniLM-L6-v2 | 384 | 80MB | Very Fast | Good | Free | | all-mpnet-base-v2 | 768 | 420MB | Fast | Excellent | Free | | bge-base-en-v1.5 | 768 | 420MB | Fast | Excellent | Free | | embed-english-v3.0 | 1024 | API | Fast | Excellent | $0.10/1M | ## Examples **Batch Embedding Generation:** ```python # examples/batch-embedding-generation.py # Process large document collections efficiently ``` **Embedding Cache:** ```python # examples/embedding-cache.py # Cache embeddings to avoid redundant API calls ``` ## Decision Framework **Use OpenAI when:** - Need highest quality embeddings - Low to medium volume (<10M tokens/month) - Prefer managed service over self-hosting - Working with latest models **Use Cohere when:** - Need multilingual support - Require production SLA - Want embedding customization - Need both embedding and reranking **Use HuggingFace/Local when:** - High volume (>10M tokens/month) - Data privacy requirements - Have GPU infrastructure - Cost optimization priority - Offline/air-gapped environments ## References - Sentence Transformers: https://www.sbert.net/ - OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings - Cohere Embeddings: https://docs.cohere.com/docs/embeddings - MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard