---
name: llamaindex-development
description: Expert guidance for LlamaIndex development including RAG applications, vector stores, document processing, query engines, and building production AI applications.
---

# LlamaIndex Development

You are an expert in LlamaIndex for building RAG (Retrieval-Augmented Generation) applications, data indexing, and LLM-powered applications with Python.

## Key Principles

- Write concise, technical responses with accurate Python examples
- Use functional, declarative programming; avoid classes where possible
- Prioritize code quality, maintainability, and performance
- Use descriptive variable names that reflect their purpose
- Follow PEP 8 style guidelines

## Code Organization

### Directory Structure

```
project/
├── data/              # Source documents and data
├── indexes/           # Persisted index storage
├── loaders/           # Custom document loaders
├── retrievers/        # Custom retriever implementations
├── query_engines/     # Query engine configurations
├── prompts/           # Custom prompt templates
├── transformations/   # Document transformations
├── callbacks/         # Custom callback handlers
├── utils/             # Utility functions
├── tests/             # Test files
└── config/            # Configuration files
```

### Naming Conventions

- Use snake_case for files, functions, and variables
- Use PascalCase for classes
- Prefix private functions with underscore
- Use descriptive names (e.g., `create_vector_index`, `build_query_engine`)

## Document Loading

### Using Document Loaders

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader

# Load from directory
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"]
).load_data()

# Load specific file types
pdf_reader = PDFReader()
documents = pdf_reader.load_data(file="document.pdf")
```

### Custom Loaders

```python
from llama_index.core.readers.base import BaseReader
from llama_index.core import Document

class CustomLoader(BaseReader):
    def load_data(self, file_path: str) -> list[Document]:
        # Custom loading logic
        with open(file_path, 'r') as f:
            content = f.read()
        return [Document(
            text=content,
            metadata={"source": file_path}
        )]
```

## Text Splitting and Processing

### Node Parsing

```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser
)

# Simple sentence splitting
splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=200
)
nodes = splitter.get_nodes_from_documents(documents)

# Semantic splitting (preserves meaning)
from llama_index.embeddings.openai import OpenAIEmbedding

semantic_splitter = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    breakpoint_percentile_threshold=95
)

# Markdown-aware splitting
markdown_splitter = MarkdownNodeParser()
```

### Best Practices for Chunking

- Choose chunk size based on your embedding model's context window
- Use overlap to maintain context between chunks
- Preserve document structure when possible
- Include metadata for filtering and retrieval
- Use semantic splitting for better coherence

## Vector Stores and Indexing

### Creating Indexes

```python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# In-memory index
index = VectorStoreIndex.from_documents(documents)

# With persistent vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)
```

### Supported Vector Stores

- Chroma (local development)
- Pinecone (production, managed)
- Weaviate (production, self-hosted or managed)
- Qdrant (production, self-hosted or managed)
- PostgreSQL with pgvector
- MongoDB Atlas Vector Search

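
All of these stores answer the same basic question at query time: which stored embeddings are closest to the query embedding? The sketch below shows that lookup as a brute-force scan in plain Python. It is illustrative only: `cosine_similarity`, `top_k`, and `toy_store` are hypothetical names rather than LlamaIndex APIs, the three-dimensional vectors are made up, and production stores typically use approximate nearest-neighbour indexes instead of a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float],
    stored: dict[str, list[float]],
    k: int = 2,
) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [
        (chunk_id, cosine_similarity(query, vector))
        for chunk_id, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical store: chunk id -> embedding (real embeddings have hundreds of dimensions)
toy_store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], toy_store, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

The choice of store mostly changes how this search is indexed, persisted, and scaled, not what it computes.
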
### Index Persistence

```python
from llama_index.core import StorageContext, load_index_from_storage

# Persist index
index.storage_context.persist(persist_dir="./storage")

# Load index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```

## Query Engines

### Basic Query Engine

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"
)

response = query_engine.query("What is the main topic?")
print(response.response)
```

### Response Modes

- `refine`: Iteratively refine the answer, one LLM call per retrieved node
- `compact`: Pack as many chunks as fit into each prompt before refining, reducing LLM calls
- `tree_summarize`: Summarize chunks recursively, bottom-up, into a single answer
- `simple_summarize`: Truncate the chunks to fit one prompt and summarize in a single call
- `accumulate`: Run the query against each node separately and return the collected responses

### Advanced Query Engine

```python
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(similarity_top_k=10),
    node_postprocessors=[
        # Drop retrieved nodes whose similarity score is below the cutoff
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ],
    response_mode="compact"
)
```

## Retrievers

### Custom Retrievers

```python
from llama_index.core.retrievers import VectorIndexRetriever

# Basic retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10
)

# Retrieve nodes
nodes = retriever.retrieve("search query")
```

### Hybrid Search

```python
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # pip install llama-index-retrievers-bm25

# Keyword-based retriever over the parsed nodes (e.g. the output of SentenceSplitter)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=5)

# Combine multiple retrieval strategies
retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),
        bm25_retriever,  # Keyword-based
    ],
    num_queries=4,  # Includes the original query; the rest are LLM-generated variants
    use_async=True
)
```

## Embeddings

### Embedding Models

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # pip install llama-index-embeddings-huggingface
from llama_index.core import Settings

# Pick one of the two options below; the last assignment wins.

# OpenAI embeddings
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=512  # Optional dimension reduction
)

# Local embeddings
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
```

## LLM Configuration

### Setting Up LLMs

```python
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic  # pip install llama-index-llms-anthropic
from llama_index.core import Settings

# Pick one of the two options below; the last assignment wins.

# OpenAI
Settings.llm = OpenAI(
    model="gpt-4o",
    temperature=0.1
)

# Anthropic
Settings.llm = Anthropic(
    model="claude-sonnet-4-20250514",
    temperature=0.1
)
```

## Agents

### Building Agents

```python
from llama_index.core import Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Create tools from query engines
# (documents_query_engine and code_query_engine are assumed to be built beforehand)
tools = [
    QueryEngineTool(
        query_engine=documents_query_engine,
        metadata=ToolMetadata(
            name="documents",
            description="Search through documents"
        )
    ),
    QueryEngineTool(
        query_engine=code_query_engine,
        metadata=ToolMetadata(
            name="codebase",
            description="Search through code"
        )
    )
]

# Create agent
# Note: from_tools() is the older agent API; newer releases move agents to
# llama_index.core.agent.workflow, so check this against your installed version.
agent = ReActAgent.from_tools(
    tools,
    llm=Settings.llm,
    verbose=True
)

response = agent.chat("Find information about X")
```

## Performance Optimization

### Caching

```python
from llama_index.core.ingestion import IngestionCache, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# Cache transformation results so unchanged documents are not re-processed.
# Note: this caches ingestion steps (splitting, embedding), not LLM responses.
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)],
    cache=IngestionCache(),
)
nodes = pipeline.run(documents=documents)
```

### Async Operations

```python
import asyncio

async def answer_all(questions: list[str]) -> list:
    # Batch processing: run the queries concurrently instead of one at a time
    return await asyncio.gather(*[
        query_engine.aquery(q) for q in questions
    ])

# Inside an async function, a single query is: response = await query_engine.aquery("question")
responses = asyncio.run(answer_all(["What is the main topic?", "Who is the audience?"]))
```

### Embedding Optimization

- Batch embeddings when possible
- Use smaller embedding dimensions when accuracy allows
- Cache embeddings for repeated documents
- Use local models for cost-sensitive applications

## Error Handling

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Debug handler for troubleshooting
debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager
```

## Testing

- Unit test document loaders and transformations
- Test retrieval quality with known queries
- Validate index persistence and loading
- Test query engine responses
- Monitor retrieval metrics (precision, recall)

## Dependencies

- llama-index
- llama-index-embeddings-openai
- llama-index-llms-openai
- llama-index-vector-stores-chroma
- chromadb
- python-dotenv
- pydantic
- llama-index-embeddings-huggingface (optional, for the local embeddings example)
- llama-index-llms-anthropic (optional, for the Anthropic example)
- llama-index-retrievers-bm25 (optional, for the hybrid search example)
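
A quick way to confirm that an environment has these packages is to look up their installed versions with the standard library. The following is a minimal sketch, assuming the seven core distribution names from the list above; `REQUIRED_PACKAGES`, `installed_versions`, and `missing_packages` are hypothetical helper names, not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Core distribution names taken from the Dependencies list
REQUIRED_PACKAGES = [
    "llama-index",
    "llama-index-embeddings-openai",
    "llama-index-llms-openai",
    "llama-index-vector-stores-chroma",
    "chromadb",
    "python-dotenv",
    "pydantic",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(packages: list[str]) -> list[str]:
    """Return the distribution names that are not installed, in input order."""
    return [
        name
        for name, found in installed_versions(packages).items()
        if found is None
    ]


if __name__ == "__main__":
    for name, found in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {found or 'NOT INSTALLED'}")
```

Running this in a test setup step or CI job gives an early, readable failure instead of an `ImportError` deep inside an indexing run.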