--- title: RAG Document Processing Guide description: Comprehensive guide for Retrieval-Augmented Generation (RAG) with document chunking, hybrid search, and reranking capabilities keywords: [ rag, chunking, embeddings, vector-search, hybrid-search, reranking, document-processing, ] --- # RAG Document Processing Guide > **Since**: v8.44.0 | **Status**: Stable | **Availability**: SDK + CLI > **Provider Defaults:** When `--provider` (CLI) or `provider` (SDK) is not specified, NeuroLink defaults to **Vertex AI** with **gemini-2.5-flash** (see `src/lib/rag/ragIntegration.ts`). Set the `NEUROLINK_PROVIDER` or `AI_PROVIDER` environment variable to change the default provider, or pass an explicit `embeddingModel` / `generationModel` config. ## Overview NeuroLink provides enterprise-grade RAG (Retrieval-Augmented Generation) capabilities for building production AI applications: - **10 Chunking Strategies**: Character, recursive, sentence, token, markdown, HTML, JSON, LaTeX, semantic, and semantic-markdown chunking for any content type - **Hybrid Search**: Combine BM25 keyword search with vector embeddings using RRF or linear fusion - **Multi-Factor Reranking**: LLM, cross-encoder, Cohere API, and simple position-based reranking options - **Factory + Registry Patterns**: Extensible architecture with lazy loading, aliases, and full TypeScript support - **Resilience Built-In**: Circuit breakers, retry handlers, and comprehensive error handling ## Quick Start ### Basic Document Processing ```typescript import { loadDocument, createChunker, createReranker } from "@juspay/neurolink"; // Load and chunk a document const doc = await loadDocument("/path/to/document.md"); const chunker = await createChunker("markdown", { maxSize: 1000, overlap: 100, }); const chunks = await chunker.chunk(doc.content); // Each chunk includes metadata console.log(chunks[0]); // { // id: "chunk-abc123", // text: "## Introduction\n\nThis document covers...", // metadata: { // documentId: "doc-xyz", // chunkIndex: 0, // startOffset: 0, // endOffset: 847 // } // } ``` ### Full RAG Pipeline ```typescript import { RAGPipeline, createRAGPipeline } from "@juspay/neurolink"; const pipeline = new RAGPipeline({ embeddingModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, generationModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, }); // Ingest documents await pipeline.ingest(["./docs/*.md", "./knowledge/**/*.txt"]); // Query with automatic retrieval and generation const response = await pipeline.query("What are the key features?"); console.log(response.answer); console.log(response.sources); // Retrieved chunks with citations ``` ## Integration with generate() and stream() The RAG system integrates seamlessly with NeuroLink's `generate()` and `stream()` APIs through the `createVectorQueryTool`. This allows AI models to automatically query your knowledge base during generation. ### Using RAG with generate() ```typescript import { NeuroLink, createVectorQueryTool, InMemoryVectorStore, } from "@juspay/neurolink"; // 1. Set up vector store with your data const vectorStore = new InMemoryVectorStore(); await vectorStore.upsert("knowledge-base", [ { id: "doc1", vector: embedding1, metadata: { text: "Your document content..." }, }, // ... more documents ]); // 2. Create the RAG tool const ragTool = createVectorQueryTool( { id: "knowledge-search", description: "Search the knowledge base for relevant information", indexName: "knowledge-base", embeddingModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, topK: 5, reranker: { model: { provider: "vertex", modelName: "gemini-3-flash-preview" }, topK: 3, }, }, vectorStore, ); // 3. Use with generate() const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "What are the key features of our product?" }, tools: { [ragTool.name]: ragTool }, provider: "vertex", model: "gemini-3-flash-preview", }); console.log(result.content); console.log(result.toolExecutions); // See RAG tool results ``` ### Using RAG with stream() ```typescript // Same setup as above, then: const result = await neurolink.stream({ input: { text: "Explain our pricing model in detail" }, tools: { [ragTool.name]: ragTool }, provider: "vertex", model: "gemini-3-flash-preview", }); for await (const chunk of result.stream) { if ("content" in chunk) { process.stdout.write(chunk.content); } } ``` ### Complete RAG Pipeline Example This example demonstrates a full RAG pipeline from document loading to AI-powered retrieval: ```typescript import { NeuroLink, createVectorQueryTool, InMemoryVectorStore, } from "@juspay/neurolink"; import { loadDocument, createChunker, createMetadataExtractor, } from "@juspay/neurolink"; // Step 1: Load and chunk documents const doc = await loadDocument("./docs/product-guide.md"); const chunker = await createChunker("markdown", { maxSize: 1000, overlap: 100, preserveHeaders: true, }); const chunks = await chunker.chunk(doc.content); // Step 2: Extract metadata for better retrieval (optional) const extractor = await createMetadataExtractor("llm", { provider: "vertex", modelName: "gemini-3-flash-preview", }); const enrichedChunks = await extractor.extract(chunks, { summary: true, keywords: true, }); // Step 3: Generate embeddings using the NeuroLink provider const neurolink = new NeuroLink(); // Helper function to generate embeddings async function generateEmbeddings(texts: string[]): Promise { const embeddings: number[][] = []; for (const text of texts) { const result = await neurolink.generate({ input: { text }, provider: "vertex", model: "gemini-3-flash-preview", }); // Extract embedding from result (provider-specific) embeddings.push(result.embedding || []); } return embeddings; } const embeddings = await generateEmbeddings(enrichedChunks.map((c) => c.text)); // Step 4: Store in vector store const vectorStore = new InMemoryVectorStore(); await vectorStore.upsert( "product-docs", enrichedChunks.map((chunk, i) => ({ id: chunk.id, vector: embeddings[i], metadata: { text: chunk.text, summary: chunk.metadata.summary, keywords: chunk.metadata.keywords, source: "product-guide.md", }, })), ); // Step 5: Create RAG tool const ragTool = createVectorQueryTool( { id: "product-search", description: "Search product documentation for answers to user questions", indexName: "product-docs", embeddingModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, topK: 5, includeSources: true, reranker: { model: { provider: "vertex", modelName: "gemini-3-flash-preview" }, topK: 3, weights: { semantic: 0.6, vector: 0.3, position: 0.1 }, }, }, vectorStore, ); // Step 6: Use with generate() const response = await neurolink.generate({ input: { text: "How do I configure the billing settings?" }, tools: { [ragTool.name]: ragTool }, provider: "vertex", model: "gemini-3-flash-preview", systemPrompt: `You are a helpful product assistant. Use the product-search tool to find relevant information before answering questions. Always cite your sources.`, }); console.log("Answer:", response.content); console.log( "Sources used:", response.toolExecutions?.map((t) => t.result?.sources), ); ``` ### Configuration Options for createVectorQueryTool | Option | Type | Default | Description | | ----------------- | ----------------------------------------- | --------------------- | ------------------------------------------------------ | | `id` | `string` | `vector-query-{uuid}` | Unique identifier for the tool | | `description` | `string` | Default description | Description shown to AI for tool selection | | `indexName` | `string` | **Required** | Name of the index in the vector store | | `embeddingModel` | `{ provider: string, modelName: string }` | **Required** | Embedding model configuration | | `enableFilter` | `boolean` | `false` | Enable metadata filtering in queries | | `includeVectors` | `boolean` | `false` | Include raw vectors in results | | `includeSources` | `boolean` | `true` | Include source documents in response | | `topK` | `number` | `10` | Number of results to retrieve | | `reranker` | `RerankerConfig` | `undefined` | Optional reranker configuration | | `providerOptions` | `VectorProviderOptions` | `undefined` | Provider-specific options (Pinecone, pgVector, Chroma) | #### Reranker Configuration | Option | Type | Default | Description | | --------- | ----------------------------------------------------------- | ----------------------------------------------- | --------------------------------- | | `model` | `{ provider: string, modelName: string }` | **Required** | Model for semantic reranking | | `weights` | `{ semantic?: number, vector?: number, position?: number }` | `{ semantic: 0.5, vector: 0.3, position: 0.2 }` | Score weights (must sum to 1.0) | | `topK` | `number` | Same as tool `topK` | Results to return after reranking | ### Event Handling Listen for tool events during RAG operations to monitor and debug: ```typescript const neurolink = new NeuroLink(); // Listen for tool execution events neurolink.on("tool:start", (event) => { console.log(`Tool started: ${event.toolName}`); console.log(`Parameters:`, event.parameters); }); neurolink.on("tool:end", (event) => { console.log(`Tool completed: ${event.toolName}`); console.log(`Success: ${event.success}`); console.log(`Response time: ${event.responseTime}ms`); if (event.result) { console.log(`Results found: ${event.result.totalResults}`); } if (event.error) { console.error(`Error:`, event.error.message); } }); // Listen for generation events neurolink.on("generation:start", (event) => { console.log(`Generation started with provider: ${event.provider}`); }); neurolink.on("generation:end", (event) => { console.log(`Generation completed in ${event.responseTime}ms`); console.log(`Tools used: ${event.toolsUsed?.join(", ") || "none"}`); }); // Execute RAG query with event monitoring const result = await neurolink.generate({ input: { text: "What are the system requirements?" }, tools: { [ragTool.name]: ragTool }, provider: "vertex", model: "gemini-3-flash-preview", }); ``` ### Dynamic Vector Store Resolution For multi-tenant applications, you can provide a resolver function instead of a static vector store: ```typescript const ragTool = createVectorQueryTool( { id: "tenant-search", description: "Search tenant-specific knowledge base", indexName: "documents", embeddingModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, topK: 5, }, (context) => { // Return different vector stores based on request context const tenantId = context.tenantId || "default"; return getVectorStoreForTenant(tenantId); }, ); // The context is passed from generate options const result = await neurolink.generate({ input: { text: "Search query" }, tools: { [ragTool.name]: ragTool }, context: { tenantId: "tenant-123", userId: "user-456" }, }); ``` ### Metadata Filtering Enable metadata filtering for more precise retrieval: ```typescript const ragTool = createVectorQueryTool( { id: "filtered-search", description: "Search with metadata filters", indexName: "knowledge-base", embeddingModel: { provider: "vertex", modelName: "gemini-3-flash-preview" }, enableFilter: true, // Enable filter parameter topK: 10, }, vectorStore, ); // The AI can now use filters in its queries // Example filter syntax supported: // { category: 'billing' } - Exact match // { date: { $gte: '2024-01-01' } } - Comparison operators // { tags: { $in: ['feature', 'guide'] } } - Array membership // { $and: [{ type: 'doc' }, { status: 'published' }] } - Logical operators ``` ## Chunking Strategies NeuroLink provides 10 chunking strategies optimized for different content types. ### Available Strategies | Strategy | Best For | Key Config | | ------------------- | --------------------------- | -------------------------------------- | | `character` | Simple text, logs | `maxSize`, `separator` | | `recursive` | General documents (default) | `maxSize`, `overlap`, `separators` | | `sentence` | Natural language, Q&A | `maxSize`, `minSentences` | | `token` | LLM context optimization | `maxSize` (tokens), `tokenizer` | | `markdown` | Documentation, READMEs | `preserveHeaders`, `codeBlockHandling` | | `html` | Web content | `preserveTags`, `removeTags` | | `json` | API responses, config | `preserveStructure`, `flattenDepth` | | `latex` | Academic papers | `sectionCommands`, `preserveMath` | | `semantic` | Context-aware splitting | `similarityThreshold`, `embedder` | | `semantic-markdown` | Knowledge bases | `semanticThreshold`, `embedder` | ### Strategy Configuration ```typescript import { createChunker, getAvailableStrategies } from "@juspay/neurolink"; // List all available strategies const strategies = getAvailableStrategies(); // ['character', 'recursive', 'sentence', 'token', 'markdown', 'html', 'json', 'latex', 'semantic', 'semantic-markdown'] // Recursive chunker (recommended for general use) const recursiveChunker = await createChunker("recursive", { maxSize: 1000, overlap: 200, separators: ["\n\n", "\n", ". ", " ", ""], keepSeparator: true, }); // Markdown chunker (for documentation) const markdownChunker = await createChunker("markdown", { maxSize: 1000, overlap: 100, preserveHeaders: true, codeBlockHandling: "preserve", // 'preserve' | 'split' | 'remove' }); // Token chunker (for LLM optimization) const tokenChunker = await createChunker("token", { maxSize: 512, // Max tokens per chunk overlap: 50, // Token overlap tokenizer: "cl100k_base", // OpenAI tokenizer }); ``` ### Content-Type Recommendations ```typescript import { getRecommendedStrategy } from "@juspay/neurolink"; // Get strategy based on content type getRecommendedStrategy("text/markdown"); // 'markdown' getRecommendedStrategy("text/html"); // 'html' getRecommendedStrategy("application/json"); // 'json' getRecommendedStrategy("text/x-latex"); // 'latex' getRecommendedStrategy("text/plain"); // 'recursive' ``` ## Hybrid Search Hybrid search combines BM25 keyword matching with vector similarity for improved retrieval quality. ### How It Works 1. **BM25 Search**: Traditional keyword matching using term frequency and document length normalization 2. **Vector Search**: Semantic similarity using embeddings 3. **Score Fusion**: Combine rankings using RRF or linear combination ### Fusion Methods #### Reciprocal Rank Fusion (RRF) RRF is robust to score scale differences and works well in most cases: ```typescript import { reciprocalRankFusion } from "@juspay/neurolink"; // Combine rankings from multiple sources const fusedScores = reciprocalRankFusion( [vectorRankings, bm25Rankings], 60, // k parameter (default: 60) ); // RRF formula: score(d) = sum(1 / (k + rank(d))) ``` #### Linear Combination Linear combination allows fine-tuning the balance between vector and keyword scores: ```typescript import { linearCombination } from "@juspay/neurolink"; const combinedScores = linearCombination( vectorScores, // Map bm25Scores, // Map 0.5, // alpha: weight for vector scores (0-1) ); // Linear formula: score(d) = alpha * vectorScore(d) + (1 - alpha) * bm25Score(d) ``` ### Hybrid Search Pipeline ```typescript import { createHybridSearch, InMemoryBM25Index, InMemoryVectorStore, } from "@juspay/neurolink"; // Create indices const bm25Index = new InMemoryBM25Index({ k1: 1.2, b: 0.75 }); const vectorStore = new InMemoryVectorStore(); // Add documents to both indices const documents = [ { id: "doc1", text: "Machine learning fundamentals...", metadata: { topic: "ml" }, }, { id: "doc2", text: "Deep learning architectures...", metadata: { topic: "dl" }, }, ]; await bm25Index.addDocuments(documents); await vectorStore.addDocuments(documents); // Create hybrid search const hybridSearch = createHybridSearch({ bm25Index, vectorStore, fusionMethod: "rrf", // 'rrf' | 'linear' alpha: 0.5, // Vector weight (for linear fusion) k: 60, // RRF parameter }); // Execute search const results = await hybridSearch.search("neural network training", { topK: 10, filter: { topic: "ml" }, }); ``` ### BM25 Configuration ```typescript type BM25Config = { k1: number; // Term frequency saturation (default: 1.2) b: number; // Document length normalization (default: 0.75) lowercase: boolean; // Normalize to lowercase (default: true) stemming: boolean; // Apply stemming (default: false) stopwords: string[]; // Words to ignore (default: English stopwords) }; ``` ## Reranking Reranking re-scores initial search results for improved relevance. ### Available Reranker Types | Type | Description | Requires Model | Best For | | --------------- | ----------------------------------- | -------------- | ------------------------ | | `simple` | Position + vector score combination | No | Fast, cost-free baseline | | `llm` | LLM semantic relevance scoring | Yes | High-quality semantic | | `cross-encoder` | Cross-encoder model scoring | Yes | Accuracy-focused tasks | | `cohere` | Cohere Rerank API | API Key | Production-grade results | | `batch` | Batch LLM processing | Yes | Large result sets | ### Reranker Configuration ```typescript import { createReranker, getAvailableRerankerTypes } from "@juspay/neurolink"; // List available types const types = getAvailableRerankerTypes(); // ['simple', 'llm', 'cross-encoder', 'cohere', 'batch'] // Simple reranker (no model required) const simpleReranker = await createReranker("simple", { topK: 10, positionWeight: 0.3, scoreWeight: 0.7, }); // LLM reranker (requires model) const llmReranker = await createReranker("llm", { topK: 5, model: "gemini-3-flash-preview", temperature: 0.0, batchSize: 5, }); // Cohere reranker (requires API key) const cohereReranker = await createReranker("cohere", { topK: 10, model: "rerank-v3.5", maxChunksPerDoc: 10, }); // Rerank results const reranked = await simpleReranker.rerank(searchResults, query, { topK: 5 }); ``` ### Batch Reranking for Large Sets ```typescript import { batchRerank } from "@juspay/neurolink"; // Process large result sets efficiently const reranked = await batchRerank(searchResults, query, { batchSize: 10, parallelBatches: 3, model: "gemini-3-flash-preview", topK: 20, }); ``` ## Metadata Extraction Extract structured metadata from chunks using LLMs. ### Extraction Types | Type | Description | Output | | ----------- | ------------------------- | ------------------------- | | `title` | Document/section title | `string` | | `summary` | Brief content summary | `string` | | `keywords` | Relevant keywords | `string[]` | | `questions` | Q&A pairs for the content | `{question, answer}[]` | | `custom` | Custom schema extraction | `Record` | ### Usage ```typescript import { createMetadataExtractor, extractMetadata, LLMMetadataExtractor, } from "@juspay/neurolink"; // Using factory const extractor = await createMetadataExtractor("llm", { provider: "vertex", modelName: "gemini-3-flash-preview", }); // Extract metadata from chunks const results = await extractor.extract(chunks, { title: true, summary: true, keywords: true, questions: { maxQuestions: 3 }, }); // Results include extracted metadata per chunk console.log(results[0]); // { // title: "Introduction to Machine Learning", // summary: "This section covers the fundamentals...", // keywords: ["machine learning", "supervised learning", "classification"], // questions: [ // { question: "What is supervised learning?", answer: "..." } // ] // } ``` ## Configuration Reference ### Chunker Configuration | Option | Type | Default | Description | | ------------ | ------------------------- | --------- | ---------------------------------- | | `maxSize` | `number` | `1000` | Maximum chunk size (chars/tokens) | | `overlap` | `number` | `200` | Overlap between chunks | | `minSize` | `number` | `50` | Minimum chunk size | | `documentId` | `string` | auto-UUID | Document identifier for metadata | | `metadata` | `Record` | `{}` | Additional metadata for all chunks | ### Reranker Configuration | Option | Type | Default | Description | | ----------------------- | --------- | ------- | ------------------------------- | | `topK` | `number` | `10` | Number of top results to return | | `minScore` | `number` | `0.0` | Minimum score threshold | | `includeOriginalScores` | `boolean` | `false` | Include original scores | ### Hybrid Search Configuration | Option | Type | Default | Description | | -------------- | ------------------- | ------- | --------------------------- | | `fusionMethod` | `'rrf' \| 'linear'` | `'rrf'` | Score fusion method | | `alpha` | `number` | `0.5` | Vector weight (linear only) | | `k` | `number` | `60` | RRF k parameter | | `topK` | `number` | `10` | Results to return | ### Environment Variables | Variable | Description | Required | | -------------------------------- | ----------------------------------------- | -------- | | `GOOGLE_APPLICATION_CREDENTIALS` | For Vertex AI (service account JSON path) | Yes | | `OPENAI_API_KEY` | For OpenAI provider | Optional | | `COHERE_API_KEY` | For Cohere reranker | Optional | | `ANTHROPIC_API_KEY` | For Claude-based reranking | Optional | ## Advanced Usage ### Integration with Observability Track RAG operations with Langfuse for debugging and optimization: ```typescript import { setLangfuseContext } from "@juspay/neurolink"; import { RAGPipeline } from "@juspay/neurolink"; const pipeline = new RAGPipeline(config); await setLangfuseContext( { userId: "user-123", sessionId: "session-456", operationName: "rag-query", metadata: { pipeline: "customer-support", chunkingStrategy: "markdown", }, }, async () => { const response = await pipeline.query("How do I reset my password?"); return response; }, ); ``` ### Integration with Guardrails Validate RAG inputs and outputs with guardrails: ```typescript import { createGuardrail, validateInput, validateOutput, } from "@juspay/neurolink"; import { RAGPipeline } from "@juspay/neurolink"; // Create guardrails for RAG const inputGuardrail = createGuardrail({ type: "input", rules: [ { type: "maxLength", value: 1000 }, { type: "noPersonalInfo", enabled: true }, ], }); const outputGuardrail = createGuardrail({ type: "output", rules: [ { type: "factualOnly", enabled: true }, { type: "noPII", enabled: true }, ], }); // Apply guardrails to RAG pipeline const validatedQuery = await validateInput(inputGuardrail, query); const response = await pipeline.query(validatedQuery); const validatedResponse = await validateOutput( outputGuardrail, response.answer, ); ``` ### Custom Chunker Registration Extend the chunker registry with custom implementations: ```typescript import { ChunkerRegistry } from "@juspay/neurolink"; import type { Chunker, ChunkerConfig } from "@juspay/neurolink"; // Define custom chunker class CustomChunker implements Chunker { constructor(private config?: ChunkerConfig) {} async chunk(text: string, options?: ChunkerConfig) { // Custom chunking logic const maxSize = options?.maxSize ?? this.config?.maxSize ?? 500; // ... implementation } } // Register with the registry ChunkerRegistry.register("custom", CustomChunker, { name: "Custom Chunker", description: "My custom chunking strategy", aliases: ["my-chunker"], defaultConfig: { maxSize: 500 }, }); // Now use it const chunker = await createChunker("custom", { maxSize: 800 }); ``` ### Graph RAG Use knowledge graphs for relationship-aware retrieval: ```typescript import { GraphRAG } from "@juspay/neurolink"; // Create graph with similarity threshold for edge creation const graphRag = new GraphRAG({ dimension: 1536, // Embedding dimension threshold: 0.7, // Similarity threshold for creating edges }); // Build graph from chunks and their embeddings const chunks = [ { text: "Machine learning basics", metadata: { topic: "ml" } }, { text: "Neural networks", metadata: { topic: "dl" } }, ]; const embeddings = [ { vector: [0.1, 0.2 /* ... */] }, { vector: [0.15, 0.25 /* ... */] }, ]; graphRag.createGraph(chunks, embeddings); // Or add nodes incrementally const nodeId = graphRag.addNode( { text: "Deep learning", metadata: { topic: "dl" } }, { vector: [0.12, 0.22 /* ... */] }, ); // Query with embedding vector using random walk with restart const results = graphRag.query({ query: queryEmbedding, // Query embedding vector topK: 10, randomWalkSteps: 100, restartProb: 0.15, }); // Get graph statistics const stats = graphRag.getStats(); // { nodeCount: 3, edgeCount: 4, avgDegree: 1.33, threshold: 0.7 } ``` ### Resilience Patterns Use circuit breakers and retry handlers for production reliability: ```typescript import { RAGCircuitBreaker, RAGRetryHandler } from "@juspay/neurolink"; // Circuit breaker for external API calls const breaker = new RAGCircuitBreaker("reranker-api", { failureThreshold: 5, resetTimeout: 60000, halfOpenMaxCalls: 3, operationTimeout: 30000, }); // Wrap reranker calls const result = await breaker.execute(async () => { return await cohereReranker.rerank(results, query); }, "rerank"); // Listen to circuit breaker events breaker.on("stateChange", ({ oldState, newState, reason }) => { console.log(`Circuit breaker: ${oldState} -> ${newState} (${reason})`); }); // Retry handler with exponential backoff const retryHandler = new RAGRetryHandler({ maxRetries: 3, initialDelay: 1000, maxDelay: 30000, backoffMultiplier: 2, jitter: true, }); const chunks = await retryHandler.executeWithRetry(async () => { return await chunker.chunk(largeDocument); }); ``` ## CLI Usage NeuroLink CLI provides commands for RAG operations. ### Document Processing ```bash # Chunk a document neurolink rag chunk ./document.md --strategy markdown --max-size 1000 --overlap 100 # Chunk with output to file neurolink rag chunk ./document.md -s recursive --format json --output chunks.json # Process multiple documents (use shell loop) for file in ./docs/*.md; do neurolink rag chunk "$file" --strategy markdown --format json; done ``` ### Index Management ```bash # Build an index from a document neurolink rag index ./docs/guide.md --indexName my-docs --provider vertex --model gemini-3-flash-preview # Query an existing index neurolink rag query "What are the main features?" --indexName my-docs --topK 5 --provider vertex --model gemini-3-flash-preview # Index with Graph RAG enabled neurolink rag index ./docs/guide.md --indexName my-docs --graph --provider vertex --model gemini-3-flash-preview ``` ## Simplified RAG API (`rag: { files }`) > **Since**: v9.2.0 | **Recommended** for most use cases Instead of manually creating chunkers, vector stores, and tools, pass `rag: { files }` directly to `generate()` or `stream()`. NeuroLink handles the entire pipeline automatically. ### SDK Usage ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); // Generate with RAG - just pass files const result = await neurolink.generate({ input: { text: "What are the key features described in the docs?" }, rag: { files: ["./docs/guide.md", "./docs/api.md"], strategy: "markdown", // Optional: auto-detected from file extension chunkSize: 512, // Optional: default 1000 chunkOverlap: 50, // Optional: default 200 topK: 5, // Optional: default 5 }, }); // Stream with RAG - identical API const streamResult = await neurolink.stream({ input: { text: "Summarize the architecture" }, rag: { files: ["./docs/architecture.md"] }, }); for await (const chunk of streamResult.stream) { if ("content" in chunk) { process.stdout.write(chunk.content); } } ``` ### CLI Usage ```bash # Basic RAG with generate neurolink generate "What is this about?" --rag-files ./docs/guide.md # RAG with custom chunking strategy neurolink generate "Explain the API" --rag-files ./docs/guide.md --rag-strategy markdown --rag-chunk-size 512 # RAG with streaming and multiple files neurolink stream "Summarize everything" --rag-files ./docs/a.md ./docs/b.md --rag-top-k 10 ``` ### CLI Flags Reference | Flag | Type | Default | Description | | --------------------- | ---------- | ------------- | ------------------------------------------------------------------------------------------------------------------- | | `--rag-files` | `string[]` | - | File paths to load for RAG context | | `--rag-strategy` | `string` | auto-detected | Chunking strategy (character, recursive, sentence, token, markdown, html, json, latex, semantic, semantic-markdown) | | `--rag-chunk-size` | `number` | 1000 | Maximum chunk size in characters | | `--rag-chunk-overlap` | `number` | 200 | Overlap between adjacent chunks | | `--rag-top-k` | `number` | 5 | Number of top results to retrieve | ### RAGConfig Type ```typescript type RAGConfig = { files: string[]; // Required: file paths to load strategy?: ChunkingStrategy; // Default: auto-detected from file extension chunkSize?: number; // Default: 1000 chunkOverlap?: number; // Default: 200 topK?: number; // Default: 5 toolName?: string; // Default: "search_knowledge_base" toolDescription?: string; // Custom tool description embeddingProvider?: string; // Defaults to generation provider embeddingModel?: string; // Defaults to provider's default }; ``` ### How It Works 1. Files are loaded from disk and auto-detected for chunking strategy (`.md` -> markdown, `.html` -> html, `.json` -> json, etc.) 2. Content is chunked using the selected strategy with configurable size and overlap 3. Chunks are embedded using a simple character-frequency hash (128 dimensions) and stored in an in-memory vector store 4. A `search_knowledge_base` tool is created and injected into the AI model's available tools 5. A system prompt instructs the AI to use the search tool before answering 6. The AI autonomously decides when to search the knowledge base during generation/streaming ### Auto-Detected Strategies by Extension | Extension | Strategy | | ---------------------------------------------------------------------------------------- | --------- | | `.md`, `.mdx` | markdown | | `.html`, `.htm` | html | | `.json` | json | | `.tex`, `.latex` | latex | | `.txt`, `.csv`, `.xml`, `.yaml`, `.yml` | recursive | | `.ts`, `.js`, `.py`, `.java`, `.go`, `.rs`, `.c`, `.cpp`, `.rb`, `.php`, `.swift`, `.kt` | recursive | ## Best Practices ### Chunking 1. **Match chunk size to model context** - Use token chunker when optimizing for specific LLM context windows 2. **Choose strategy by content type** - Markdown for docs, HTML for web content, JSON for structured data 3. **Use 10-20% overlap** - Prevents context loss at chunk boundaries 4. **Preserve structure when possible** - Format-aware chunkers maintain semantic coherence 5. **Test with your data** - Optimal settings vary by domain and use case ### Reranking 1. **Start with simple reranker** - Fast, free, and often sufficient for basic use cases 2. **Use LLM reranking for quality** - When accuracy matters more than latency 3. **Batch large result sets** - Use batch reranker for 50+ results 4. **Consider cost** - API-based rerankers (Cohere) have per-call costs 5. **Cache reranking results** - Results for the same query/docs can be reused ### Hybrid Search 1. **Start with RRF** - Robust to score scale differences, less tuning needed 2. **Tune alpha for linear fusion** - Start at 0.5, adjust based on evaluation 3. **Keep indices in sync** - Update both BM25 and vector indices together 4. **Filter early** - Apply metadata filters before fusion when possible 5. **Monitor retrieval quality** - Track precision/recall metrics in production ## Troubleshooting | Problem | Solution | | ----------------------------- | ------------------------------------------------------------------------ | | Empty chunks returned | Check if `maxSize` is too small for your content; try increasing to 500+ | | Duplicate content in chunks | Reduce `overlap` parameter or use a structure-aware chunker | | Missing context at boundaries | Increase `overlap` to 15-20% of `maxSize` | | Slow reranking performance | Switch to `simple` reranker or reduce `topK` before reranking | | Poor search quality | Tune BM25 parameters (`k1`, `b`) or adjust fusion `alpha` weight | | Out of memory with large docs | Process documents in batches; use streaming where available | | Reranker API timeouts | Use `CircuitBreaker` wrapper; reduce batch size | | Inconsistent chunk metadata | Ensure `documentId` is set consistently across processing runs | ### Debug Logging ```bash # Enable verbose logging for RAG operations DEBUG=neurolink:rag:* npx tsx your-script.ts # Log specific components DEBUG=neurolink:rag:chunker npx tsx your-script.ts DEBUG=neurolink:rag:reranker npx tsx your-script.ts DEBUG=neurolink:rag:hybrid npx tsx your-script.ts ``` ## API Reference ### Core Exports **Document Processing:** - `loadDocument(path)` - Load a single document - `loadDocuments(paths)` - Load multiple documents - `MDocument` - Fluent document processing class - `processDocument(text, options)` - Process text through chunking and metadata extraction **Chunking:** - `createChunker(strategy, config)` - Create a chunker instance - `ChunkerFactory` - Factory for chunker creation - `ChunkerRegistry` - Registry with all chunker implementations - `getAvailableStrategies()` - List available chunking strategies - `getRecommendedStrategy(contentType)` - Get recommended strategy for content type **Reranking:** - `createReranker(type, config)` - Create a reranker instance - `RerankerFactory` - Factory for reranker creation - `RerankerRegistry` - Registry with all reranker implementations - `getAvailableRerankerTypes()` - List available reranker types - `rerank(results, query, model)` - Direct reranking function - `batchRerank(results, query, options)` - Batch reranking **Retrieval:** - `createHybridSearch(config)` - Create hybrid search instance - `InMemoryBM25Index` - In-memory BM25 index - `InMemoryVectorStore` - In-memory vector store - `reciprocalRankFusion(rankings, k)` - RRF score fusion - `linearCombination(vectorScores, bm25Scores, alpha)` - Linear score fusion - `createVectorQueryTool(vectorStore, options)` - Create vector query tool **Metadata:** - `createMetadataExtractor(type, config)` - Create metadata extractor - `LLMMetadataExtractor` - LLM-powered extractor class - `extractMetadata(chunks, params)` - Extract metadata from chunks **Pipeline:** - `RAGPipeline` - Full RAG pipeline class - `createRAGPipeline(config)` - Create pipeline instance - `assembleContext(chunks, options)` - Assemble context from chunks - `formatContextWithCitations(chunks, format)` - Format with citations **Resilience:** - `RAGCircuitBreaker` - Circuit breaker pattern for RAG operations - `RAGRetryHandler` - Retry with exponential backoff and jitter **Types:** - `Chunk`, `ChunkMetadata`, `ChunkerConfig` - `Reranker`, `RerankerConfig`, `RerankerType` - `HybridSearchOptions`, `BM25Config` - `RAGPipelineConfig`, `RAGResponse` - `MetadataExtractor`, `MetadataExtractorConfig` ## See Also - [RAG Configuration Guide](../rag/CONFIGURATION.md) - Detailed configuration reference - [RAG Testing Guide](../rag/TESTING.md) - Testing RAG pipelines - [Observability Guide](./observability.md) - Tracing and monitoring - [Guardrails Guide](./guardrails.md) - Input/output validation - [Vector Store Integrations](../guides/vector-stores.md) - Production vector stores