# App Elasticsearch Cluster by Zabbix agent

Zabbix version: 5.0 · Category: Templates/Applications · Published: 2021-11-21

## Description
This is the "Zabbix agent" version of the Elasticsearch template that ships with Zabbix 5.0. The template monitors Elasticsearch without any external scripts and works with both standalone and cluster instances. Metrics are collected in one pass using the Zabbix agent's `web.page.get` item, reading the REST API endpoints `_cluster/health`, `_cluster/stats`, and `_nodes/stats`. You can set the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for use at the host level. If the ES API lives in an atypical location, remember to adjust the {$ELASTICSEARCH.SCHEME}, {$ELASTICSEARCH.HOST}, and {$ELASTICSEARCH.PORT} macros. You can discuss this template or leave feedback on the Zabbix forum: https://www.zabbix.com/forum/zabbix-suggestions-and-feedback/399473-discussion-thread-for-official-zabbix-template-for-elasticsearch

Template tooling version used: 0.35
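The collection path described above can be sketched directly: the template's `web.page.get` items fetch three REST endpoints, optionally with the username/password macros as HTTP basic auth. This is an illustration only (the helper names are mine, not part of the template), with defaults mirroring the {$ELASTICSEARCH.*} macro defaults:

```python
# Sketch (not the template's actual mechanism): fetch the same REST endpoints
# the template polls. Host/port/scheme defaults mirror the macro defaults.
import base64
import json
import urllib.request


def es_url(endpoint, scheme="http", host="localhost", port=9200):
    """Build the API URL that the template's web.page.get items query."""
    return f"{scheme}://{host}:{port}/{endpoint}"


def fetch(endpoint, username=None, password=None, timeout=5):
    """Fetch one endpoint; username/password play the role of
    {$ELASTICSEARCH.USERNAME}/{$ELASTICSEARCH.PASSWORD}."""
    req = urllib.request.Request(es_url(endpoint))
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# The three endpoints the template reads:
for ep in ("_cluster/health?timeout=5s", "_cluster/stats", "_nodes/stats"):
    print(es_url(ep))
```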
## Overview
This is the "Zabbix agent" version of the HTTP template shipped with Zabbix 5.0 (<https://www.zabbix.com/integrations/elasticsearch>).

This version can connect to Elasticsearch on localhost or over the network using the Zabbix agent.

It adds a check for read-only indices (Elasticsearch makes indices read-only when disk space runs too low) and collects the cluster name (`cluster_name`) as an item.

Please report issues on GitHub (easier to track progress there!):
https://github.com/yurtesen/zabbix_elasticsearch
## Author
Evren Yurtesen
## Items

All items below belong to the "ES cluster" application. Unless noted otherwise they are dependent items (history 0, trends 7d) populated from the master item `web.page.get[{$ELASTICSEARCH.HOST},_cluster/health?timeout=5s,{$ELASTICSEARCH.PORT}]`.

- **ES: Delayed unassigned shards** (`es.cluster.delayed_unassigned_shards`): the number of shards whose allocation has been delayed by the timeout settings. Preprocessing: JSONPath `$.delayed_unassigned_shards`; discard unchanged (heartbeat 1h).
- **ES: Inactive shards percentage** (`es.cluster.inactive_shards_percent_as_number`, float, %): the ratio of inactive shards in the cluster, expressed as a percentage. Preprocessing: JSONPath `$.active_shards_percent_as_number`; discard unchanged (heartbeat 1h); JavaScript `return (100 - value)`.
- **ES: Number of initializing shards** (`es.cluster.initializing_shards`): the number of shards that are being initialized. Preprocessing: JSONPath `$.initializing_shards`. Trigger: `{min(10m)}>0` → "ES: Cluster has the initializing shards" (AVERAGE) — the cluster has had initializing shards for longer than 10 minutes.
- **ES: Number of data nodes** (`es.cluster.number_of_data_nodes`): the number of nodes that are dedicated data nodes. Preprocessing: JSONPath `$.number_of_data_nodes`; discard unchanged (heartbeat 1h).
- **ES: Number of nodes** (`es.cluster.number_of_nodes`): the number of nodes within the cluster. Preprocessing: JSONPath `$.number_of_nodes`; discard unchanged (heartbeat 1h). Triggers: `{change()}<0` → "ES: The number of nodes within the cluster has decreased" (INFO, manual close); `{change()}>0` → "ES: The number of nodes within the cluster has increased" (INFO, manual close).
- **ES: Number of pending tasks** (`es.cluster.number_of_pending_tasks`): the number of cluster-level changes that have not yet been executed. Preprocessing: JSONPath `$.number_of_pending_tasks`.
- **ES: Number of relocating shards** (`es.cluster.relocating_shards`): the number of shards that are being relocated. Preprocessing: JSONPath `$.relocating_shards`.
- **ES: Cluster health status** (`es.cluster.status`): health status of the cluster, based on the state of its primary and replica shards. Statuses are: *green* — all shards are assigned; *yellow* — all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); *red* — one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned). Value map: "ES cluster state". Preprocessing: JSONPath `$.status`, then JavaScript:

  ```javascript
  var state = ['green', 'yellow', 'red'];
  return state.indexOf(value.trim()) === -1 ? 255 : state.indexOf(value.trim());
  ```

  then discard unchanged (heartbeat 1h). Triggers: `{last()}=2` → "ES: Health is RED" (HIGH) — one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned. `{last()}=255` → "ES: Health is UNKNOWN" (HIGH) — the health status of the cluster is unknown or cannot be obtained. `{last()}=1` → "ES: Health is YELLOW" (AVERAGE) — all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data could be unavailable until that node is repaired.
- **ES: Task max waiting in queue** (`es.cluster.task_max_waiting_in_queue`, float, s): the time in seconds that the earliest-initiated task has been waiting to be performed. Preprocessing: JSONPath `$.task_max_waiting_in_queue_millis`; multiply by 0.001; discard unchanged (heartbeat 1h).
- **ES: Number of unassigned shards** (`es.cluster.unassigned_shards`): the number of shards that are not allocated. Preprocessing: JSONPath `$.unassigned_shards`. Trigger: `{min(10m)}>0` → "ES: Cluster has the unassigned shards" (AVERAGE) — the cluster has had unassigned shards for longer than 10 minutes.
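Two of the preprocessing steps above are easy to miss in the export: the inactive-shards percentage is derived as `100 - active_shards_percent_as_number`, and the health status string is mapped to a number (255 for anything unrecognized). A small sketch of both (function names are mine):

```python
# Sketch of two preprocessing steps from the _cluster/health items:
# the inactive-shards percentage and the green/yellow/red -> 0/1/2 mapping.
def inactive_shards_percent(health):
    """Mirror of the JavaScript step `return (100 - value)`."""
    return 100 - health["active_shards_percent_as_number"]


def status_code(value):
    """Mirror of the status-mapping JavaScript; 255 means unknown."""
    states = ["green", "yellow", "red"]
    v = value.strip()
    return states.index(v) if v in states else 255


health = {"status": "yellow", "active_shards_percent_as_number": 87.5}
print(inactive_shards_percent(health))  # 12.5
print(status_code(health["status"]))    # 1
```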
The following items are populated from the master item `web.page.get[{$ELASTICSEARCH.HOST},_cluster/stats,{$ELASTICSEARCH.PORT}]`:

- **ES: Cluster name** (`es.cluster_name[{#ES.NODE}]`, character): name of the cluster this node belongs to. Preprocessing: JSONPath `$.cluster_name`; discard unchanged (heartbeat 1h).
- **ES: Indices with shards assigned to nodes** (`es.indices.count`): the total number of indices with shards assigned to the selected nodes. Preprocessing: JSONPath `$.indices.count`; discard unchanged (heartbeat 1h).
- **ES: Number of non-deleted documents** (`es.indices.docs.count`): the total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on documents in Lucene segments and may include documents from nested fields. Preprocessing: JSONPath `$.indices.docs.count`; discard unchanged (heartbeat 1h).
- **ES: Nodes with the data role** (`es.nodes.count.data`): the number of selected nodes with the data role. Preprocessing: JSONPath `$.nodes.count.data`; discard unchanged (heartbeat 1h).
- **ES: Nodes with the ingest role** (`es.nodes.count.ingest`): the number of selected nodes with the ingest role. Preprocessing: JSONPath `$.nodes.count.ingest`; discard unchanged (heartbeat 1h).
- **ES: Nodes with the master role** (`es.nodes.count.master`): the number of selected nodes with the master role. Preprocessing: JSONPath `$.nodes.count.master`; discard unchanged (heartbeat 1h). Trigger: `{last()}=2` → "ES: Cluster has only two master nodes" (DISASTER) — the cluster has only two nodes with the master role and will be unavailable if one of them breaks.
- **ES: Total available size to JVM in all file stores** (`es.nodes.fs.available_in_bytes`, B): the total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS- or process-level restrictions, this number may be less than `nodes.fs.free_in_bytes`. This is the actual amount of free disk space the selected Elasticsearch nodes can use. Preprocessing: JSONPath `$.nodes.fs.available_in_bytes`; discard unchanged (heartbeat 1h).
- **ES: Total size of all file stores** (`es.nodes.fs.total_in_bytes`, B): the total size in bytes of all file stores across all selected nodes. Preprocessing: JSONPath `$.nodes.fs.total_in_bytes`; discard unchanged (heartbeat 1h).
- **ES: Cluster uptime** (`es.nodes.jvm.max_uptime[{#ES.NODE}]`, float, s): uptime in seconds since the JVM last started. Preprocessing: JSONPath `$.nodes.jvm.max_uptime_in_millis`; multiply by 0.001. Trigger: `{last()}<10m` → "ES: Cluster has been restarted (uptime < 10m)" (INFO, manual close) — uptime is less than 10 minutes.
The service checks are simple (agent) items:

- **ES: Service response time** (`net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"]`, float, s, history 7d): checks the performance of the TCP service. Trigger: `{min(5m)}>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}` → "ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m)" (WARNING, manual close) — the performance of the TCP service is very low. Depends on the trigger "ES: Service is down" (`net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"].last()=0`).
- **ES: Service status** (`net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"]`, history 7d): checks whether the service is running and accepting TCP connections. Value map: "Service state". Preprocessing: discard unchanged (heartbeat 10m). Trigger: `{last()}=0` → "ES: Service is down" (AVERAGE, manual close) — the service is unavailable or does not accept TCP connections.
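A rough analogue of what these two checks measure: time a plain TCP connect to the configured host and port, reporting 0 when the service is down (this is an illustration of the idea, not Zabbix's implementation):

```python
# Rough analogue of net.tcp.service.perf / net.tcp.service: time a TCP
# connect; 0.0 stands in for "service is down", as in the Zabbix item.
import socket
import time


def tcp_service_perf(host, port, timeout=5.0):
    """Return connect time in seconds, or 0.0 when the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return 0.0
```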
The "Zabbix raw items" application holds the master items (text, history 0, trends 0). Each strips the HTTP response headers with the regex preprocessing `\n\s?\n(.*)` → `\1`:

- **ES: Get cluster health** (`web.page.get[{$ELASTICSEARCH.HOST},_cluster/health?timeout=5s,{$ELASTICSEARCH.PORT}]`): returns the health status of the cluster.
- **ES: Get cluster stats** (`web.page.get[{$ELASTICSEARCH.HOST},_cluster/stats,{$ELASTICSEARCH.PORT}]`): returns cluster statistics.
- **ES: Get nodes stats** (`web.page.get[{$ELASTICSEARCH.HOST},_nodes/stats,{$ELASTICSEARCH.PORT}]`): returns cluster nodes statistics.
- **ES: Get index settings** (`web.page.get[{$ELASTICSEARCH.HOST},_settings,{$ELASTICSEARCH.PORT}]`): returns index settings.
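The regex step exists because `web.page.get` returns the whole HTTP response, headers included; the preprocessing keeps only the body (the JSON document) that follows the blank line separating headers from body. A demonstration of the same pattern:

```python
# Why the raw items apply REGEX \n\s?\n(.*) with output \1: web.page.get
# returns headers + body, and the capture group keeps only the JSON body
# after the blank line (\s? tolerates the \r of a \r\n\r\n separator).
import re

raw = (
    "HTTP/1.1 200 OK\r\n"
    "content-type: application/json\r\n"
    "\r\n"
    '{"cluster_name":"es-demo","status":"green"}'
)
body = re.search(r"\n\s?\n(.*)", raw, re.DOTALL).group(1)
print(body)
```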
## Discovery rules

### Index settings discovery

A dependent LLD rule (`es.index.settings`) on the master item `web.page.get[{$ELASTICSEARCH.HOST},_settings,{$ELASTICSEARCH.PORT}]` that discovers ES index settings. LLD macro: `{#ES.INDEX_NAME}` ← `$..provided_name.first()`. Preprocessing: JSONPath `$.[*]`.

Item prototype:

- **ES {#ES.INDEX_NAME}: read_only_allow_delete** (`es.index.read_only_allow_delete[{#ES.INDEX_NAME}]`, dependent, application "ES Indices"): Elasticsearch enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on a node where at least one disk exceeds the flood stage. Preprocessing: JSONPath `$..['{#ES.INDEX_NAME}'].settings.index.blocks.read_only_allow_delete.first()` (discard value on fail); boolean to decimal; discard unchanged (heartbeat 1h). Trigger: `{last()}=1` → "Read-only index "{#ES.INDEX_NAME}"" (HIGH, manual close) — the setting is true while the index and its metadata are read-only and false while ES allows writes and metadata changes; ES still allows deleting the index to free up resources even when the setting is true.

### Cluster nodes discovery

An LLD rule on the item `web.page.get[{$ELASTICSEARCH.HOST},_nodes/_all/nodes,{$ELASTICSEARCH.PORT}]` (update interval 1h) that discovers ES cluster nodes.

Item prototypes are dependent items on the master item `web.page.get[{$ELASTICSEARCH.HOST},_nodes/stats,{$ELASTICSEARCH.PORT}]` (history 0, trends 7d) in the application "ES {#ES.NODE}", unless noted otherwise. Each JSONPath below is shorthand for `$..[?(@.name=='{#ES.NODE}')].<path>.first()`. The calculated latency items all use the formula `last(<time item>) / ( last(<count item>) + (last(<count item>) = 0) )`, where the `= 0` term guards against division by zero.

- **Total available size** (`es.node.fs.total.available_in_bytes[{#ES.NODE}]`, B): the total number of bytes available to this Java virtual machine on all file stores. Depending on OS- or process-level restrictions, this may appear less than `fs.total.free_in_bytes`; it is the actual amount of free disk space the Elasticsearch node can utilize. JSONPath `fs.total.available_in_bytes`; discard unchanged (heartbeat 1h).
- **Total size** (`es.node.fs.total.total_in_bytes[{#ES.NODE}]`, B): total size (in bytes) of all file stores. JSONPath `fs.total.total_in_bytes`; discard unchanged (heartbeat 1d).
- **Number of open HTTP connections** (`es.node.http.current_open[{#ES.NODE}]`): the number of currently open HTTP connections for the node. JSONPath `http.current_open`; discard unchanged (heartbeat 1h).
- **Rate of HTTP connections opened** (`es.node.http.opened.rate[{#ES.NODE}]`, float, rps): the number of HTTP connections opened per second. JSONPath `http.total_opened`; change per second.
- **Flush latency** (`es.node.indices.flush.latency[{#ES.NODE}]`, calculated, float, ms): the average flush latency calculated from the `flush.total` and `flush.total_time_in_millis` metrics. Trigger: `{min(5m)}>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}` → "Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m)" (WARNING) — if this metric increases steadily, it may indicate a problem with slow disks; the problem may escalate and eventually prevent you from adding new information to your index.
- **Total number of index flushes to disk** (`es.node.indices.flush.total[{#ES.NODE}]`, application "Zabbix raw items"): the total number of flush operations. JSONPath `indices.flush.total`; discard unchanged (heartbeat 1h).
- **Total time spent on flushing indices to disk** (`es.node.indices.flush.total_time_in_millis[{#ES.NODE}]`, ms, "Zabbix raw items"): total time in milliseconds spent performing flush operations. JSONPath `indices.flush.total_time_in_millis`; discard unchanged (heartbeat 1h).
- **Current indexing operations** (`es.node.indices.indexing.index_current[{#ES.NODE}]`): the number of indexing operations currently running. JSONPath `indices.indexing.index_current`; discard unchanged (heartbeat 1h).
- **Indexing latency** (`es.node.indices.indexing.index_latency[{#ES.NODE}]`, calculated, float, ms): the average indexing latency calculated from the `index_total` and `index_time_in_millis` metrics. Trigger: `{min(5m)}>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}` → "Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m)" (WARNING) — increasing latency may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).
- **Total time spent performing indexing** (`es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]`, ms, "Zabbix raw items"): total time in milliseconds spent performing indexing operations. JSONPath `indices.indexing.index_time_in_millis`; discard unchanged (heartbeat 1h).
- **Total number of indexing operations** (`es.node.indices.indexing.index_total[{#ES.NODE}]`, "Zabbix raw items"): JSONPath `indices.indexing.index_total`; discard unchanged (heartbeat 1h).
- **Time spent throttling operations** (`es.node.indices.indexing.throttle_time[{#ES.NODE}]`, float, s): time in seconds spent throttling operations over the last measuring span. JSONPath `indices.indexing.throttle_time_in_millis`; multiply by 0.001; simple change; discard unchanged (heartbeat 1h).
- **Time spent throttling merge operations** (`es.node.indices.merges.total_throttled_time[{#ES.NODE}]`, float, s): as above, from `indices.merges.total_throttled_time_in_millis`.
- **Time spent throttling recovery operations** (`es.node.indices.recovery.throttle_time[{#ES.NODE}]`, float, s): as above, from `indices.recovery.throttle_time_in_millis`.
- **Rate of index refreshes** (`es.node.indices.refresh.rate[{#ES.NODE}]`, float, rps): the number of refresh operations per second. JSONPath `indices.refresh.total`; change per second; discard unchanged (heartbeat 1h).
- **Time spent performing refresh** (`es.node.indices.refresh.time[{#ES.NODE}]`, float, s): time in seconds spent performing refresh operations over the last measuring span. JSONPath `indices.refresh.total_time_in_millis`; multiply by 0.001; simple change; discard unchanged (heartbeat 1h).
- **Rate of fetch** (`es.node.indices.search.fetch.rate[{#ES.NODE}]`, float, rps): the number of fetch operations per second. JSONPath `indices.search.fetch_total`; change per second.
- **Current fetch operations** (`es.node.indices.search.fetch_current[{#ES.NODE}]`): the number of fetch operations currently running. JSONPath `indices.search.fetch_current`; discard unchanged (heartbeat 1h).
- **Fetch latency** (`es.node.indices.search.fetch_latency[{#ES.NODE}]`, calculated, float, ms): the average fetch latency calculated from the `fetch_total` and `fetch_time_in_millis` metrics. Trigger: `{min(5m)}>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}` → "Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m)" (WARNING) — the fetch phase should typically take much less time than the query phase; if this metric consistently increases, it could indicate slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.
- **Time spent performing fetch** (`es.node.indices.search.fetch_time[{#ES.NODE}]`, float, s): time in seconds spent performing fetch operations over the last measuring span. JSONPath `indices.search.fetch_time_in_millis`; multiply by 0.001; simple change; discard unchanged (heartbeat 1h).
- **Total time spent performing fetch** (`es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]`, ms, "Zabbix raw items"): JSONPath `indices.search.fetch_time_in_millis`; discard unchanged (heartbeat 1h).
- **Total number of fetch operations** (`es.node.indices.search.fetch_total[{#ES.NODE}]`, "Zabbix raw items"): JSONPath `indices.search.fetch_total`; discard unchanged (heartbeat 1h).
- **Rate of queries** (`es.node.indices.search.query.rate[{#ES.NODE}]`, float, rps): the number of query operations per second. JSONPath `indices.search.query_total`; change per second; discard unchanged (heartbeat 1h).
- **Current query operations** (`es.node.indices.search.query_current[{#ES.NODE}]`): the number of query operations currently running. JSONPath `indices.search.query_current`; discard unchanged (heartbeat 1h).
- **Query latency** (`es.node.indices.search.query_latency[{#ES.NODE}]`, calculated, float, ms): the average query latency calculated from the `query_total` and `query_time_in_millis` metrics. Trigger: `{min(5m)}>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}` → "Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m)" (WARNING) — if latency exceeds the threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.
- **Time spent performing query** (`es.node.indices.search.query_time[{#ES.NODE}]`, float, s): time in seconds spent performing query operations over the last measuring span. JSONPath `indices.search.query_time_in_millis`; multiply by 0.001; simple change; discard unchanged (heartbeat 1h).
- **Total time spent performing query** (`es.node.indices.search.query_time_in_millis[{#ES.NODE}]`, ms, "Zabbix raw items"): JSONPath `indices.search.query_time_in_millis`; discard unchanged (heartbeat 1h).
- **Total number of query operations** (`es.node.indices.search.query_total[{#ES.NODE}]`, "Zabbix raw items"): JSONPath `indices.search.query_total`; discard unchanged (heartbeat 1h).
- **Amount of JVM heap committed** (`es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}]`, B): the amount of memory, in bytes, available for use by the heap. JSONPath `jvm.mem.heap_committed_in_bytes`; discard unchanged (heartbeat 1h).
- **Maximum JVM memory available for use** (`es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}]`, B): the maximum amount of memory, in bytes, available for use by the heap. JSONPath `jvm.mem.heap_max_in_bytes`; discard unchanged (heartbeat 1d).
- **Amount of JVM heap currently in use** (`es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}]`, B): JSONPath `jvm.mem.heap_used_in_bytes`; discard unchanged (heartbeat 1h).
- **Percent of JVM heap currently in use** (`es.node.jvm.mem.heap_used_percent[{#ES.NODE}]`, float, %): JSONPath `jvm.mem.heap_used_percent`; discard unchanged (heartbeat 1h). Triggers: `{min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}` → "Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)" (HIGH); `{min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}` → "Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h)" (WARNING, depends on the critical trigger). Both indicate that the rate of garbage collection isn't keeping up with the rate of garbage creation; to address this, either increase the heap size (as long as it remains below the recommended guidelines) or scale out the cluster by adding more nodes.
- **Node uptime** (`es.node.jvm.uptime[{#ES.NODE}]`, float, s): JVM uptime in seconds. JSONPath `jvm.uptime_in_millis`; multiply by 0.001. Trigger: `{last()}<10m` → "Node {#ES.NODE} has been restarted (uptime < 10m)" (INFO, manual close).
- **Refresh / Search / Write thread pool items** (one set per pool): *active threads* (`es.node.thread_pool.<pool>.active[{#ES.NODE}]`, JSONPath `thread_pool.<pool>.active`, discard unchanged 1h), *executor tasks completed* (`es.node.thread_pool.<pool>.completed.rate[{#ES.NODE}]`, float, rps, JSONPath `thread_pool.<pool>.completed`, change per second, discard unchanged 1h), *tasks in queue* (`es.node.thread_pool.<pool>.queue[{#ES.NODE}]`, JSONPath `thread_pool.<pool>.queue`, discard unchanged 1h), and *executor tasks rejected* (`es.node.thread_pool.<pool>.rejected.rate[{#ES.NODE}]`, float, rps, JSONPath `thread_pool.<pool>.rejected`, change per second). Each rejected-rate item has the trigger `{min(5m)}>0` → "<Pool> thread pool executor has the rejected tasks (for 5m)" (WARNING) — the number of tasks rejected by the executor has been over 0 for 5 minutes.

Graph prototypes (line colors in parentheses):

- **ES {#ES.NODE}: Latency**: query latency (1A7C11), indexing latency (2774A4), fetch latency (F63100), flush latency (A54F10).
- **ES {#ES.NODE}: Query load**: current fetch operations (1A7C11), current query operations (2774A4).
- **ES {#ES.NODE}: Refresh thread pool**: active threads (1A7C11), tasks in queue (2774A4), completed rate (F63100), rejected rate (A54F10).
- **ES {#ES.NODE}: Search thread pool**: the same four search-pool items, same colors.
- **ES {#ES.NODE}: Write thread pool**: the same four write-pool items, same colors.
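The calculated latency items divide cumulative time by cumulative operation count, with the odd-looking `+ (last(...) = 0)` term acting as a divide-by-zero guard: the denominator becomes 1 when no operations have run yet, so the latency reads 0 instead of failing. The same formula in Python:

```python
# The template's latency formula: total_time / (total_ops + (total_ops == 0)).
# In both Zabbix and Python the comparison evaluates to 1 only when
# total_ops is 0, so an idle counter yields 0 latency rather than an error.
def avg_latency_ms(total_time_in_millis, total_ops):
    return total_time_in_millis / (total_ops + (total_ops == 0))


print(avg_latency_ms(1500, 300))  # 5.0
print(avg_latency_ms(0, 0))       # 0.0
```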
Cluster nodes discovery details: LLD macro `{#ES.NODE}` ← `$..name.first()`; discovery-item preprocessing: regex `\n\s?\n(.*)` → `\1` (strips HTTP headers), then JSONPath `$.nodes.[*]`.

## Macros

| Macro | Default | Description |
|---|---|---|
| {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | 100 | Maximum fetch latency in milliseconds for the trigger expression. |
| {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | 100 | Maximum flush latency in milliseconds for the trigger expression. |
| {$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | 95 | Maximum percentage of JVM heap in use for the critical trigger expression. |
| {$ELASTICSEARCH.HEAP_USED.MAX.WARN} | 85 | Maximum percentage of JVM heap in use for the warning trigger expression. |
| {$ELASTICSEARCH.HOST} | localhost | The hostname of the Elasticsearch host. |
| {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | 100 | Maximum indexing latency in milliseconds for the trigger expression. |
| {$ELASTICSEARCH.PASSWORD} | | The Elasticsearch password. |
| {$ELASTICSEARCH.PORT} | 9200 | The port of the Elasticsearch host. |
| {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | 100 | Maximum query latency in milliseconds for the trigger expression. |
| {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | 10s | Maximum ES cluster response time in seconds for the trigger expression. |
| {$ELASTICSEARCH.SCHEME} | http | The Elasticsearch scheme (http/https). |
| {$ELASTICSEARCH.USERNAME} | | The Elasticsearch username. |

## Template-level triggers

- **ES: Cluster does not have enough space for resharding** (HIGH): there is not enough disk space for index resharding. Expression: `({App Elasticsearch Cluster by Zabbix agent:es.nodes.fs.total_in_bytes.last()}-{App Elasticsearch Cluster by Zabbix agent:es.nodes.fs.available_in_bytes.last()})/({App Elasticsearch Cluster by Zabbix agent:es.cluster.number_of_data_nodes.last()}-1)>{App Elasticsearch Cluster by Zabbix agent:es.nodes.fs.available_in_bytes.last()}`

## Graphs

- **ES: Cluster health**: inactive shards percentage (1A7C11), relocating shards (2774A4), initializing shards (F63100), unassigned shards (A54F10), delayed unassigned shards (FC6EA3), number of pending tasks (6C59DC), task max waiting in queue (AC8C14).

## Value maps

- **ES cluster state**: 0 → green, 1 → yellow, 2 → red, 255 → unknown.
- **Service state**: 0 → Down, 1 → Up.
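The resharding trigger expression reads more clearly as a predicate: spread the cluster's used space over one fewer data node, and alarm if it no longer fits into the currently available space. A sketch (the function name is mine; as in the trigger, it assumes more than one data node):

```python
# The "Cluster does not have enough space for resharding" trigger as a
# predicate: (total - available) / (data_nodes - 1) > available.
# Losing one data node must be absorbable by the remaining free space.
def resharding_space_low(total_bytes, available_bytes, data_nodes):
    used = total_bytes - available_bytes
    return used / (data_nodes - 1) > available_bytes


# 3 TB total, 0.2 TB free, 3 data nodes: redistribution cannot fit.
print(resharding_space_low(3 * 10**12, 2 * 10**11, 3))  # True
```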