---
name: embeddings
description: Text embeddings for semantic search and similarity. Use when converting text to vectors, choosing embedding models, implementing chunking strategies, or building document similarity features.
tags: [ai, embeddings, vectors, semantic-search, similarity]
context: fork
agent: data-pipeline-engineer
version: 1.0.0
author: OrchestKit
user-invocable: false
---

# Embeddings

Convert text to dense vector representations for semantic search and similarity.

## Quick Reference

```python
from openai import OpenAI

client = OpenAI()

# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here"
)
vector = response.data[0].embedding  # 1536 dimensions
```

```python
# Batch embedding (efficient)
texts = ["text1", "text2", "text3"]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
vectors = [item.embedding for item in response.data]
```

## Model Selection

| Model | Dims | Cost (per 1M tokens) | Use Case |
|-------|------|----------------------|----------|
| `text-embedding-3-small` | 1536 | $0.02 | General purpose |
| `text-embedding-3-large` | 3072 | $0.13 | High accuracy |
| `nomic-embed-text` (Ollama) | 768 | Free | Local/CI |
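
To compare the models for a concrete corpus, the table can be turned into a rough cost and storage estimate. This is a minimal sketch: the `MODELS` dict and the `estimate` helper are hypothetical names, the prices are copied from the table and may be out of date, and the storage figure assumes float32 vectors with no index overhead.

```python
# Dimensions and prices copied from the table above; verify before relying on them.
MODELS = {
    "text-embedding-3-small": {"dims": 1536, "usd_per_1m_tokens": 0.02},
    "text-embedding-3-large": {"dims": 3072, "usd_per_1m_tokens": 0.13},
    "nomic-embed-text": {"dims": 768, "usd_per_1m_tokens": 0.0},
}


def estimate(model: str, total_tokens: int, num_vectors: int) -> dict[str, float]:
    """Estimate one-off embedding cost (USD) and raw vector storage (MB)."""
    spec = MODELS[model]
    cost_usd = total_tokens / 1_000_000 * spec["usd_per_1m_tokens"]
    # Assumption: float32 storage (4 bytes per dimension), no index overhead.
    storage_mb = num_vectors * spec["dims"] * 4 / 1_000_000
    return {"cost_usd": cost_usd, "storage_mb": storage_mb}


# Example: 100k chunks of about 500 tokens each
print(estimate("text-embedding-3-small", 50_000_000, 100_000))
```

Doubling the dimensions doubles the storage and the per-comparison compute, which is often a bigger factor than the API price.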

## Chunking Strategy

```python
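def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding.

    Sizes are counted in whitespace-separated words, a rough proxy for tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration

    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # this window reached the end; skip a redundant tail chunk

    return chunks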

**Guidelines:**
- Chunk size: 256-1024 tokens (512 typical)
- Overlap: 10-20% for context continuity
- Include metadata (title, source) with chunks
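
One way to apply the metadata guideline is to carry the title and source on each chunk and prepend the title to the text that gets embedded. The sketch below is an assumption about how that could look, not a prescribed layout: `Chunk`, `chunk_document`, and `embedding_input` are hypothetical names, and sizes are counted in words as a stand-in for tokens.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    title: str
    source: str
    index: int  # position of the chunk within its document


def chunk_document(
    text: str, title: str, source: str, chunk_size: int = 512, overlap: int = 64
) -> list[Chunk]:
    """Split a document into overlapping word windows that keep their metadata."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + chunk_size])
        chunks.append(Chunk(text=body, title=title, source=source, index=len(chunks)))
        if start + chunk_size >= len(words):
            break  # the window already covers the end of the document
    return chunks


def embedding_input(chunk: Chunk) -> str:
    """Prepend the title so the vector carries some document-level context."""
    return f"{chunk.title}\n\n{chunk.text}"
```

The `source` and `index` fields are not embedded; they are kept so that a search hit can be traced back to its document and position.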

## Similarity Calculation

```python
import numpy as np

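def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # cosine is undefined for a zero vector; treat as no similarity
    return float(np.dot(a, b) / denom)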

# Usage
similarity = cosine_similarity(vector1, vector2)
# 1.0 = identical, 0.0 = orthogonal, -1.0 = opposite
```
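
Semantic search is the same calculation applied to every stored vector, followed by a sort. The following is a minimal sketch that assumes the document vectors fit in memory as a plain dict; `top_k` and `min_score` are hypothetical names, and the pure-Python cosine is used only to keep the example self-contained. For large collections a vectorized NumPy version or a vector database would be the usual choice.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity without third-party dependencies."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    k: int = 3,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    # Drop weak matches before ranking; the threshold is application-specific.
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], docs, k=2))
```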

## Key Decisions

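- **Dimension reduction**: The `text-embedding-3` models accept a `dimensions` parameter, so `text-embedding-3-large` can be shortened to 1536 dims at request time; if you truncate vectors yourself, re-normalize them afterwards
- **Normalization**: OpenAI embeddings are unit length, so dot product and cosine similarity rank results identically; verify this for other models before relying on it
- **Batch size**: 100-500 texts per API call for efficiency, staying within the provider's per-request input and token limits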
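
Two of these decisions can be sketched as small helpers. Both names (`batched`, `truncate_and_normalize`) are hypothetical, the default batch size is just a value inside the range given above, and client-side truncation is only meaningful for models trained to support shortened vectors.

```python
import math
from collections.abc import Iterator


def batched(texts: list[str], batch_size: int = 256) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def truncate_and_normalize(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values and rescale the result to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]
```

Each batch can be passed as the `input` list of one `client.embeddings.create` call. Re-normalizing after truncation keeps dot-product and cosine rankings interchangeable.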

## Common Mistakes

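- Embedding queries and documents with different models or preprocessing (the resulting vectors are not comparable)
- Embedding long documents whole instead of chunking them (specific passages get diluted in a single vector)
- Using the wrong similarity metric (cosine vs. Euclidean) for the model or vector index
- Re-embedding unchanged content instead of caching embeddings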
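
The last mistake can be avoided by keying stored vectors on a hash of the model name and the text. This is a minimal in-memory sketch, not the Redis-based cache from the reference file: `EmbeddingCache` is a hypothetical class, and `embed_fn` stands for any function that maps a list of texts to a list of vectors, for example a thin wrapper around the batch call in the Quick Reference.

```python
import hashlib
from collections.abc import Callable


class EmbeddingCache:
    """Cache embeddings by content hash so unchanged text is embedded once."""

    def __init__(self, embed_fn: Callable[[list[str]], list[list[float]]], model: str):
        self._embed_fn = embed_fn
        self._model = model
        self._store: dict[str, list[float]] = {}

    def _key(self, text: str) -> str:
        # Include the model name: vectors from different models must not mix.
        return hashlib.sha256(f"{self._model}\n{text}".encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[list[float]]:
        keys = [self._key(text) for text in texts]

        # Collect each uncached text once, preserving first-seen order.
        missing: dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text

        if missing:
            vectors = self._embed_fn(list(missing.values()))
            for key, vector in zip(missing, vectors):
                self._store[key] = vector

        return [self._store[key] for key in keys]
```

Routing both queries and documents through one cache instance also guarantees they are embedded with the same function and model.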

## Advanced Patterns

See `references/advanced-patterns.md` for:
- **Late Chunking**: Embed full document, extract chunk vectors from contextualized tokens
- **Batch API**: Production batching with rate limiting and retry
- **Embedding Cache**: Redis-based caching to avoid re-embedding
- **Matryoshka Embeddings**: Dimension reduction with text-embedding-3

## Related Skills

- `rag-retrieval` - Using embeddings for RAG pipelines
- `hyde-retrieval` - Hypothetical document embeddings for vocabulary mismatch
- `contextual-retrieval` - Anthropic's context-prepending technique
- `reranking-patterns` - Cross-encoder reranking for precision
- `ollama-local` - Local embeddings with nomic-embed-text

## Capability Details

### text-to-vector
**Keywords:** embedding, text to vector, vectorize, embed text
**Solves:**
- Convert text to vector embeddings
- Choose appropriate embedding models
- Handle embedding API integration

### semantic-search
**Keywords:** semantic search, vector search, similarity search, find similar
**Solves:**
- Implement semantic search over documents
- Configure similarity thresholds
- Rank results by relevance

### chunking-strategies
**Keywords:** chunk, chunking, split, text splitting, overlap
**Solves:**
- Split documents into optimal chunks
- Configure chunk size and overlap
- Preserve semantic boundaries

### batch-embedding
**Keywords:** batch, bulk embed, parallel embedding, batch processing
**Solves:**
- Embed large document collections efficiently
- Handle rate limits and retries
- Optimize embedding costs

### local-embeddings
**Keywords:** local, ollama, self-hosted, on-premise, offline
**Solves:**
- Run embeddings locally with Ollama
- Deploy self-hosted embedding models
- Reduce API costs with local models