--- name: elasticsearch-best-practices description: Elasticsearch development best practices for indexing, querying, and search optimization --- # Elasticsearch Best Practices ## Core Principles - Design indices and mappings based on query patterns - Optimize for search performance with proper analysis and indexing - Use appropriate shard sizing and cluster configuration - Implement proper security and access control - Monitor cluster health and optimize queries ## Index Design ### Mapping Best Practices - Define explicit mappings instead of relying on dynamic mapping - Use appropriate data types for each field - Disable indexing for fields you do not search on - Use keyword type for exact matches, text for full-text search ```json { "mappings": { "properties": { "product_id": { "type": "keyword" }, "name": { "type": "text", "analyzer": "standard", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "description": { "type": "text", "analyzer": "english" }, "price": { "type": "scaled_float", "scaling_factor": 100 }, "category": { "type": "keyword" }, "tags": { "type": "keyword" }, "created_at": { "type": "date" }, "metadata": { "type": "object", "enabled": false }, "location": { "type": "geo_point" } } } } ``` ### Field Types - `keyword`: Exact values, filtering, aggregations, sorting - `text`: Full-text search with analysis - `date`: Date/time values with format specification - `numeric types`: long, integer, short, byte, double, float, scaled_float - `boolean`: True/false values - `geo_point`: Latitude/longitude pairs - `nested`: Arrays of objects that need independent querying ### Index Settings ```json { "settings": { "number_of_shards": 3, "number_of_replicas": 1, "refresh_interval": "30s", "analysis": { "analyzer": { "custom_analyzer": { "type": "custom", "tokenizer": "standard", "filter": ["lowercase", "asciifolding", "synonym_filter"] } }, "filter": { "synonym_filter": { "type": "synonym", "synonyms": ["laptop, notebook", "phone, mobile, smartphone"] } } } } } ``` ## Shard Sizing ### Guidelines - Target 20-40GB per shard - Aim for ~20 shards per GB of heap - Avoid oversharding (too many small shards) - Consider time-based indices for time-series data ```json { "settings": { "number_of_shards": 3, "number_of_replicas": 1 } } ``` ### Index Lifecycle Management (ILM) ```json { "policy": { "phases": { "hot": { "min_age": "0ms", "actions": { "rollover": { "max_size": "50gb", "max_age": "7d" } } }, "warm": { "min_age": "30d", "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } }, "delete": { "min_age": "90d", "actions": { "delete": {} } } } } } ``` ## Query Optimization ### Query Types #### Match Query (Full-text search) ```json { "query": { "match": { "description": { "query": "wireless bluetooth headphones", "operator": "and", "fuzziness": "AUTO" } } } } ``` #### Term Query (Exact match) ```json { "query": { "term": { "status": "active" } } } ``` #### Bool Query (Combining queries) ```json { "query": { "bool": { "must": [ { "match": { "name": "laptop" } } ], "filter": [ { "term": { "category": "electronics" } }, { "range": { "price": { "gte": 500, "lte": 2000 } } } ], "should": [ { "term": { "brand": "apple" } } ], "must_not": [ { "term": { "status": "discontinued" } } ] } } } ``` ### Query Best Practices - Use `filter` context for non-scoring queries (cacheable) - Use `must` only when scoring is needed - Avoid wildcards at the beginning of terms - Use `keyword` fields for exact matches - Limit result size with `size` parameter ```json { "query": { "bool": { "must": { "multi_match": { "query": "search terms", "fields": ["name^3", "description", "tags^2"], "type": "best_fields" } }, "filter": [ { "term": { "active": true } }, { "range": { "created_at": { "gte": "now-30d" } } } ] } }, "size": 20, "from": 0, "_source": ["name", "price", "category"] } ``` ## Aggregations ### Common Aggregation Patterns ```json { "size": 0, "aggs": { "categories": { "terms": { "field": "category", "size": 10 }, "aggs": { "avg_price": { "avg": { "field": "price" } } } }, "price_ranges": { "range": { "field": "price", "ranges": [ { "to": 100 }, { "from": 100, "to": 500 }, { "from": 500 } ] } }, "date_histogram": { "date_histogram": { "field": "created_at", "calendar_interval": "month" } } } } ``` ### Aggregation Best Practices - Use `size: 0` when you only need aggregations - Set appropriate `shard_size` for terms aggregations - Use composite aggregations for pagination - Consider using `aggs` filters to narrow scope ## Indexing Best Practices ### Bulk Indexing ```json POST _bulk { "index": { "_index": "products", "_id": "1" } } { "name": "Product 1", "price": 99.99 } { "index": { "_index": "products", "_id": "2" } } { "name": "Product 2", "price": 149.99 } ``` ### Bulk API Guidelines - Use bulk API for batch operations - Optimal bulk size: 5-15MB per request - Monitor for rejected requests (thread pool queue full) - Disable refresh during bulk indexing for better performance ```json PUT /products/_settings { "refresh_interval": "-1" } // After bulk indexing: PUT /products/_settings { "refresh_interval": "1s" } POST /products/_refresh ``` ### Document Updates ```json POST /products/_update/1 { "doc": { "price": 89.99, "updated_at": "2024-01-15T10:30:00Z" } } // Update by query POST /products/_update_by_query { "query": { "term": { "category": "electronics" } }, "script": { "source": "ctx._source.on_sale = true" } } ``` ## Analysis and Tokenization ### Custom Analyzers ```json { "settings": { "analysis": { "analyzer": { "product_analyzer": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "asciifolding", "english_stop", "english_stemmer" ] }, "autocomplete_analyzer": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "edge_ngram_filter" ] } }, "filter": { "english_stop": { "type": "stop", "stopwords": "_english_" }, "english_stemmer": { "type": "stemmer", "language": "english" }, "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 15 } } } } } ``` ### Test Analyzer ```json POST /products/_analyze { "analyzer": "product_analyzer", "text": "Wireless Bluetooth Headphones" } ``` ## Search Features ### Autocomplete/Suggestions ```json { "mappings": { "properties": { "name": { "type": "text", "fields": { "suggest": { "type": "completion" } } } } } } // Query suggestions { "suggest": { "product-suggest": { "prefix": "wire", "completion": { "field": "name.suggest", "size": 5 } } } } ``` ### Highlighting ```json { "query": { "match": { "description": "wireless" } }, "highlight": { "fields": { "description": { "pre_tags": [""], "post_tags": [""], "fragment_size": 150 } } } } ``` ## Performance Optimization ### Query Caching - Filter queries are cached automatically - Use `filter` context for frequently repeated conditions - Monitor cache hit rates ### Search Performance - Avoid deep pagination (use `search_after` instead) - Limit `_source` fields returned - Use `doc_values` for sorting and aggregations - Pre-sort index for common sort orders ```json { "query": { "match_all": {} }, "size": 20, "search_after": [1705329600000, "product_123"], "sort": [ { "created_at": "desc" }, { "_id": "asc" } ] } ``` ## Monitoring and Maintenance ### Cluster Health ``` GET _cluster/health GET _cat/indices?v GET _cat/shards?v GET _nodes/stats ``` ### Index Maintenance ``` POST /products/_forcemerge?max_num_segments=1 POST /products/_cache/clear POST /products/_refresh ``` ### Slow Query Log ```json PUT /products/_settings { "index.search.slowlog.threshold.query.warn": "10s", "index.search.slowlog.threshold.query.info": "5s", "index.search.slowlog.threshold.fetch.warn": "1s" } ``` ## Security ### Index-Level Security ```json PUT _security/role/products_reader { "indices": [ { "names": ["products*"], "privileges": ["read"] } ] } ``` ### Field-Level Security ```json PUT _security/role/limited_access { "indices": [ { "names": ["users"], "privileges": ["read"], "field_security": { "grant": ["name", "email", "created_at"] } } ] } ``` ## Aliases and Reindexing ### Index Aliases ```json POST _aliases { "actions": [ { "add": { "index": "products_v2", "alias": "products" } }, { "remove": { "index": "products_v1", "alias": "products" } } ] } ``` ### Reindex with Transformation ```json POST _reindex { "source": { "index": "products_v1" }, "dest": { "index": "products_v2" }, "script": { "source": "ctx._source.migrated_at = new Date().toString()" } } ```