---
name: outlines
description: Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Prompt Engineering, Outlines, Structured Generation, JSON Schema, Pydantic, Local Models, Grammar-Based Generation, vLLM, Transformers, Type Safety]
dependencies: [outlines, transformers, vllm, pydantic]
---

# Outlines: Structured Text Generation

## When to Use This Skill

Use Outlines when you need to:
- **Guarantee valid JSON/XML/code** structure during generation
- **Use Pydantic models** for type-safe outputs
- **Support local models** (Transformers, llama.cpp, vLLM)
- **Maximize inference speed** with zero-overhead structured generation
- **Generate against JSON schemas** automatically
- **Control token sampling** at the grammar level

**GitHub Stars**: 8,000+ | **From**: dottxt.ai (formerly .txt)

## Installation

```bash
# Base installation
pip install outlines

# With specific backends
pip install outlines transformers      # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm              # vLLM for high-throughput
```

## Quick Start

### Basic Example: Classification

```python
import outlines

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain the output to one of three labels
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # e.g. "positive" (always one of the three choices)
```

### With Pydantic Models

```python
from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"
```

## Core Concepts

### 1. Constrained Token Sampling

Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.

**How it works:**
1. Convert the schema (JSON Schema or Pydantic model) to a regular expression; a regex constraint is used as-is
2. Compile the regex into a Finite State Machine (FSM) and precompute which vocabulary tokens are valid in each state
3. Mask invalid tokens at each step during generation
4. Fast-forward when only one valid token exists

**Benefits:**
- **Zero overhead**: The valid-token set for each state is precomputed, so per-step filtering is a lookup plus a logit mask
- **Speed improvement**: Fast-forward through deterministic paths
- **Guaranteed validity**: Tokens that would break the schema are never sampled (output can still be truncated if a token limit is reached first)

```python
from pydantic import BaseModel
import outlines

# Pydantic model -> JSON schema -> regex -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> regex
# 3. regex -> FSM
# 4. FSM filters tokens during generation
generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
```

### 2. Structured Generators

Outlines provides specialized generators for different output types.
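All of these generators rest on the token-masking loop described under Constrained Token Sampling. The sketch below is a toy illustration of that loop, not Outlines' implementation: the DFA for `[0-9]{2}-[0-9]{2}` is written by hand, and `step`, `build_index`, `constrained_decode`, the vocabulary, and the fixed scores are all hypothetical stand-ins for a real regex compiler and model.

```python
from typing import Dict, Optional

DIGITS = "0123456789"
ACCEPT = 5  # accepting state of the hand-written DFA for [0-9]{2}-[0-9]{2}


def step(state: int, ch: str) -> Optional[int]:
    """Return the next DFA state, or None if `ch` is invalid in `state`."""
    if state in (0, 1, 3, 4) and ch in DIGITS:
        return state + 1
    if state == 2 and ch == "-":
        return 3
    return None


def build_index(vocab) -> Dict[int, Dict[str, int]]:
    """For each state, map every valid token to the state it leads to."""
    index = {}
    for state in range(ACCEPT):
        allowed = {}
        for token in vocab:
            current: Optional[int] = state
            for ch in token:
                current = step(current, ch)
                if current is None:
                    break  # token leaves the DFA, so it is masked out
            else:
                allowed[token] = current
        index[state] = allowed
    return index


def constrained_decode(scores: Dict[str, float]) -> str:
    """Greedy decoding where only tokens valid in the current state compete."""
    index = build_index(scores)
    state, output = 0, ""
    while state != ACCEPT:
        allowed = index[state]
        token = max(allowed, key=scores.__getitem__)
        output += token
        state = allowed[token]
    return output


# Fixed scores stand in for model logits; the top-scoring token is invalid.
scores = {"hello": 5.0, "12": 3.0, "7": 2.5, "-": 2.0, "3-": 1.5, "-4": 1.0}

unconstrained = max(scores, key=scores.__getitem__)
constrained = constrained_decode(scores)
print(unconstrained)  # hello
print(constrained)    # 12-12
```

The index is built once and reused at every step, which is why the per-token cost stays small. Multi-character tokens such as `"-4"` are accepted only if every character keeps the DFA on a valid path.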
#### Choice Generator

```python
# `model` is loaded as shown in Quick Start
# Multiple choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: One of the three choices
```

#### JSON Generator

```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class '__main__.Product'>
```

#### Regex Generator

```python
# Generate text matching regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: e.g. "555-123-4567" (guaranteed to match pattern)
```

#### Integer/Float Generators

```python
# Generate specific numeric types by passing the Python type
int_generator = outlines.generate.format(model, int)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.format(model, float)
price = float_generator("Product price:")  # Guaranteed float
```

### 3. Model Backends

Outlines supports multiple local and API-based backends.
#### Transformers (Hugging Face)

```python
import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)
```

#### llama.cpp

```python
# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)
```

#### vLLM (High Throughput)

```python
# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)
```

#### OpenAI (Limited Support)

```python
# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: Some features limited with API models
generator = outlines.generate.json(model, YourModel)
```

### 4. Pydantic Integration

Outlines has first-class Pydantic support with automatic schema translation.
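To give a feel for what schema translation involves, here is a minimal sketch that turns a flat JSON Schema dict into a regular expression using only the standard library. It is an illustration under simplifying assumptions (fixed key order, no nesting, no string escapes), not Outlines' translator. `schema_to_regex` and `TYPE_PATTERNS` are hypothetical names, and the schema is hand-written to resemble what a Pydantic model's `model_json_schema()` produces.

```python
import json
import re

# Simplified regex fragments for a few JSON scalar types (assumption:
# strings contain no quotes or backslashes).
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(0|[1-9][0-9]*)",
    "number": r"-?(0|[1-9][0-9]*)(\.[0-9]+)?",
    "boolean": r"(true|false)",
}


def schema_to_regex(schema: dict) -> str:
    """Translate a flat JSON Schema object into a regex with fixed key order."""
    parts = []
    for name, spec in schema["properties"].items():
        if "enum" in spec:
            # An enum becomes an alternation of its JSON-encoded values.
            options = "|".join(re.escape(json.dumps(v)) for v in spec["enum"])
            value = "(" + options + ")"
        else:
            value = TYPE_PATTERNS[spec["type"]]
        parts.append(re.escape(json.dumps(name)) + r":\s?" + value)
    return r"\{\s?" + r",\s?".join(parts) + r"\s?\}"


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "status": {"enum": ["pending", "approved"]},
    },
}

pattern = schema_to_regex(schema)
good = '{"name": "iPhone 15", "price": 999.0, "in_stock": true, "status": "approved"}'
bad = '{"name": "iPhone 15", "price": "999", "in_stock": true, "status": "approved"}'

print(re.fullmatch(pattern, good) is not None)  # True
print(re.fullmatch(pattern, bad) is not None)   # False: price is a string
```

Once a schema is expressed as a regex, it can be compiled to an FSM and used to mask tokens, which is why typed fields and enums narrow what the model is able to emit.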
#### Basic Models

```python
from pydantic import BaseModel, Field
import outlines

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Validated against gt=0 by Pydantic
```

#### Nested Models

```python
class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # e.g. "New York"
```

#### Enums and Literals

```python
from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of enum values
    priority: Literal["low", "medium", "high"]  # Must be one of literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
```

## Common Patterns

### Pattern 1: Data Extraction

```python
from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")
```

### Pattern 2: Classification

```python
from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence (self-reported by the model, not a calibrated probability)
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")
```

### Pattern 3: Structured Forms

```python
from pydantic import BaseModel
import outlines

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]
```

### Pattern 4: Multi-Entity Extraction

```python
from typing import Literal
from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")
```

### Pattern 5: Code Generation

```python
from pydantic import BaseModel
import outlines

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
```

### Pattern 6: Batch Processing

```python
from pydantic import BaseModel
import outlines

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")
```

## Backend Configuration

### Transformers

```python
import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
```

### llama.cpp

```python
# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8       # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)
```

### vLLM (Production)

```python
# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)
```

## Best Practices

### 1. Use Specific Types

```python
from pydantic import BaseModel

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float    # Not str
    quantity: int   # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as string
class Product(BaseModel):
    name: str
    price: str     # Should be float
    quantity: str  # Should be int
```

### 2. Add Constraints

```python
from pydantic import BaseModel, Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str
```

### 3. Use Enums for Categories

```python
from enum import Enum
from pydantic import BaseModel

# ✅ Good: Enum for fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything
```

### 4. Provide Context in Prompts

```python
# ✅ Good: Clear context
prompt = """
Extract product information from the following text.

Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
```

### 5. Handle Optional Fields

```python
from typing import Optional
from pydantic import BaseModel

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str                    # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None    # Optional
    tags: list[str] = []          # Default empty list

# Can succeed even if author/date missing
```

## Comparison to Alternatives

| Feature | Outlines | Instructor | Guidance | LMQL |
|---------|----------|------------|----------|------|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |

**When to choose Outlines:**
- Using local models (Transformers, llama.cpp, vLLM)
- Need maximum inference speed
- Want Pydantic model support
- Require zero-overhead structured generation
- Need control over the token sampling process

**When to choose alternatives:**
- Instructor: Need API models with automatic retrying
- Guidance: Need token healing and complex workflows
- LMQL: Prefer declarative query syntax

## Performance Characteristics

**Speed:**
- **Zero overhead**: Structured generation as fast as unconstrained
- **Fast-forward optimization**: Skips deterministic tokens
- **1.2-2x faster** than post-generation validation approaches

**Memory:**
- FSM compiled once per schema (cached)
- Minimal runtime overhead
- Efficient with vLLM for high throughput

**Accuracy:**
- **100% valid outputs** (guaranteed by the FSM, provided generation is not cut off by a token limit)
- No retry loops needed
- Deterministic token filtering

## Resources

- **Documentation**: https://outlines-dev.github.io/outlines
- **GitHub**: https://github.com/outlines-dev/outlines (8k+ stars)
- **Discord**: https://discord.gg/R9DSu34mGd
- **Blog**: https://blog.dottxt.co

## See Also

- `references/json_generation.md` - Comprehensive JSON and Pydantic patterns
- `references/backends.md` - Backend-specific configuration
- `references/examples.md` - Production-ready examples