# Full-Text Search

Full-text search enables keyword-based text retrieval using BM25 scoring. Milvus supports this through **collection functions** that automatically tokenize text and convert it to sparse vectors at insert and query time.

## How It Works

1. You define a VarChar field for text and a SparseFloatVector field for the generated vectors.
2. You add a BM25 function to the collection that maps the text field to the sparse vector field.
3. When you insert text, Milvus automatically tokenizes it and generates sparse vectors.
4. When you search, you pass text queries and Milvus automatically converts them to sparse vectors for matching.

## Testing Analyzers

Before creating a collection, you can test how an analyzer tokenizes text using `runAnalyzer()`:

```javascript
const result = await client.runAnalyzer({
  text: ['machine learning is great', 'deep learning fundamentals'],
  analyzer_params: {
    type: 'standard',
  },
});

console.log('Results:', result.results);
// Each text input produces an AnalyzerResult with tokens
```

### Analyzer Types

| Type | Description |
|------|-------------|
| `standard` | Standard tokenizer with lowercase filter |
| `english` | English-specific tokenizer with stemming |
| `chinese` | Chinese text tokenizer |

## Creating a Full-Text Search Collection

```javascript
import { MilvusClient, DataType, FunctionType } from '@zilliz/milvus2-sdk-node';

const client = new MilvusClient({ address: 'localhost:19530' });
// Create collection with text and sparse vector fields
await client.createCollection({
  collection_name: 'articles',
  fields: [
    { name: 'id', data_type: DataType.Int64, is_primary_key: true, autoID: true },
    { name: 'title', data_type: DataType.VarChar, max_length: 256, enable_analyzer: true },
    { name: 'body', data_type: DataType.VarChar, max_length: 10000, enable_analyzer: true },
    { name: 'sparse_vector', data_type: DataType.SparseFloatVector },
  ],
  functions: [
    {
      name: 'bm25',
      type: FunctionType.BM25,
      input_field_names: ['body'],
      output_field_names: ['sparse_vector'],
      params: {},
    },
  ],
});
```

## Adding Functions to Existing Collections

You can also add a BM25 function to an existing collection. The example below assumes the collection already has a `title_sparse` SparseFloatVector field to receive the output; the schema above does not define one:

```javascript
await client.addCollectionFunction({
  collection_name: 'articles',
  function: {
    name: 'title_bm25',
    type: FunctionType.BM25,
    input_field_names: ['title'],
    output_field_names: ['title_sparse'],
    params: {},
  },
});
```

## Searching with Full-Text

Pass a text string as the search data. Milvus uses the BM25 function to convert it to a sparse vector automatically:

```javascript
// Create index and load
await client.createIndex({
  collection_name: 'articles',
  field_name: 'sparse_vector',
  index_type: 'SPARSE_INVERTED_INDEX',
  metric_type: 'BM25',
});
await client.loadCollectionSync({ collection_name: 'articles' });

// Insert documents (only the text fields; the sparse vector is generated by the BM25 function)
await client.insert({
  collection_name: 'articles',
  data: [
    { title: 'Introduction to Machine Learning', body: 'Machine learning is a branch of AI...' },
    { title: 'Deep Learning Basics', body: 'Deep learning uses neural networks...' },
  ],
});

// Search by text
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning algorithms'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
});

console.log('Results:', results.results);
```

## Highlighting Search Results

Use the `highlighter` parameter to highlight matched text fragments in search results.
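As a rough illustration of what keyword highlighting produces, the sketch below wraps whole-word matches of the query terms in an opening and closing tag using plain JavaScript. It is not how Milvus implements highlighting and it ignores analyzers entirely (no stemming or language-specific tokenization). The helper name `highlightTerms` and the `<em>` tags are hypothetical choices for this example only.

```javascript
// Hypothetical helper, not part of the Milvus SDK: wraps case-insensitive,
// whole-word matches of each query term in preTag/postTag.
function highlightTerms(text, query, preTag = '<em>', postTag = '</em>') {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    // Escape regex metacharacters so query terms are matched literally.
    .map((term) => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

  // Nothing to highlight for an empty query.
  if (terms.length === 0) return text;

  const pattern = new RegExp(`\\b(${terms.join('|')})\\b`, 'gi');
  return text.replace(pattern, (match) => `${preTag}${match}${postTag}`);
}

console.log(highlightTerms('Machine learning is a branch of AI', 'machine learning'));
// prints "<em>Machine</em> <em>learning</em> is a branch of AI"
```

The opening and closing tags play the same role as the tag options passed to the highlighter: they mark where a match starts and ends so a UI can style it.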
### Lexical Highlighting

Highlights exact keyword matches:

```javascript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 0, // HighlightType.Lexical
    pre_tags: [''],
    post_tags: [''],
    fragment_size: 100,
    num_of_fragments: 3,
  },
});
```

### Semantic Highlighting

Highlights semantically similar text:

```javascript
const results = await client.search({
  collection_name: 'articles',
  data: ['machine learning'],
  anns_field: 'sparse_vector',
  limit: 10,
  output_fields: ['title', 'body'],
  highlighter: {
    type: 1, // HighlightType.Semantic
    queries: ['machine learning'],
    input_fields: ['body'],
    pre_tags: [''],
    post_tags: [''],
    threshold: 0.5,
  },
});
```

## Managing Collection Functions

```javascript
// List functions on a collection
const desc = await client.describeCollection({ collection_name: 'articles' });
console.log('Functions:', desc.schema.functions);

// Alter a function
await client.alterCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
  function: {
    name: 'bm25',
    type: FunctionType.BM25,
    input_field_names: ['body'],
    output_field_names: ['sparse_vector'],
    params: { key: 'new_value' },
  },
});

// Drop a function
await client.dropCollectionFunction({
  collection_name: 'articles',
  function_name: 'bm25',
});
```

## Next Steps

- Learn about [Hybrid Search](/operations/hybrid-search) for combining full-text with vector search
- Explore [Data Types & Schemas](/core-concepts/data-types-schemas) for SparseFloatVector details

## Commit

```bash
git add docs/content/advanced/full-text-search.mdx
git commit --signoff -m "docs: add full-text search documentation page"
```