# Elasticsearch Cluster by HTTP zbx 4.2

Exported 2021-11-21T22:05:11Z (export version 5.0), template group Templates/Applications.

## Description
A template to monitor Elasticsearch with Zabbix that works without any external scripts. It works with both standalone and cluster instances. The metrics are collected remotely in one pass using an HTTP agent, from the REST API `_cluster/health`, `_cluster/stats`, and `_nodes/stats` endpoints. You can set the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for use at the host level. If the ES API lives at a non-default location, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros. Ported manually from https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http (Zabbix 5.0) to Zabbix 4.2, with triggers added.
## Overview
This is a fork of the official [Zabbix template Elasticsearch Cluster by HTTP (Zabbix 5.0)](https://www.zabbix.com/integrations/elasticsearch).
Features:
* Compatible with Zabbix 4.2+
* Added a trigger for the maximum number of allocated shards on the ES cluster. By default, ES allows 1000 shards per node ({$ELASTICSEARCH.MAX_SHARDS_PER_NODE}); once the number of shards reaches this maximum, Elasticsearch will silently reject write requests.

Tested on Zabbix 4.2 and Elasticsearch 7.4.2.
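The added trigger fires when fewer than five indices' worth of shards can still be allocated on the cluster. A minimal sketch of that check in plain JavaScript (`shardsExhaustedSoon` is a hypothetical helper, not part of the template; the inputs come from `_cluster/stats` and the macro default):

```javascript
// Sketch of the "shards left for 5 indices" trigger logic.
// totalShards and indicesCount come from _cluster/stats, dataNodes from
// _cluster/health; maxShardsPerNode mirrors {$ELASTICSEARCH.MAX_SHARDS_PER_NODE}.
function shardsExhaustedSoon(totalShards, dataNodes, indicesCount, maxShardsPerNode) {
  var maxShardsOnCluster = dataNodes * maxShardsPerNode;
  var avgShardsPerIndex = totalShards / indicesCount;
  var neededForFiveIndices = avgShardsPerIndex * 5;
  // TOTAL_SHARDS > (MAX_SHARDS_ON_CLUSTER - SHARDS_NEEDED_FOR_5_INDICES)
  return totalShards > (maxShardsOnCluster - neededForFiveIndices);
}
```

With 3 data nodes at the default limit of 1000, 2950 shards across 59 indices leaves less than five average-sized indices of headroom, while 2000 shards across 50 indices does not.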
## Author
R N
## Items

Applications used: ES cluster, Zabbix raw items.

- **ES: Delayed unassigned shards** — DEPENDENT, key `es.cluster.delayed_unassigned_shards`, history 7d. The number of shards whose allocation has been delayed by the timeout settings. Preprocessing: JSONPath `$.delayed_unassigned_shards`. Master item: `es.cluster.get_health`. (ES cluster)
- **ES: Get cluster health** — HTTP_AGENT, key `es.cluster.get_health`, history 0, value type TEXT, BASIC auth ({$ELASTICSEARCH.USERNAME}/{$ELASTICSEARCH.PASSWORD}). Returns the health status of a cluster. Timeout 15s; URL: `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_cluster/health?timeout=5s`. (Zabbix raw items)
- **ES: Get cluster stats** — HTTP_AGENT, key `es.cluster.get_stats`, history 0, value type TEXT, BASIC auth. Returns cluster statistics. Timeout 15s; URL: `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_cluster/stats`. (Zabbix raw items)
- **ES: Get cluster version** — HTTP_AGENT, key `es.cluster.get_version`, interval 1h, history 0, DISABLED, value type FLOAT, BASIC auth. Returns the cluster version. Preprocessing: JSONPath `$.version.number`, then regex `\d..` → `\0`. Timeout 15s; URL: `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}`. (ES cluster)
- **ES: Inactive shards percentage** — DEPENDENT, key `es.cluster.inactive_shards_percent_as_number`, 7d, units %. The ratio of inactive shards in the cluster expressed as a percentage. Preprocessing: JSONPath `$.active_shards_percent_as_number`, then JavaScript `return (100 - value)`. Master item: `es.cluster.get_health`. (ES cluster)
- **ES: Number of initializing shards** — DEPENDENT, key `es.cluster.initializing_shards`, 7d. The number of shards that are under initialization. JSONPath `$.initializing_shards`. Master item: `es.cluster.get_health`.
  - Trigger `{min(10m)}>0` — "ES: Cluster has the initializing shards" (AVERAGE): the cluster has had initializing shards for longer than 10 minutes.
- **ES: Number of data nodes** — DEPENDENT, key `es.cluster.number_of_data_nodes`, 7d. The number of nodes that are dedicated data nodes. JSONPath `$.number_of_data_nodes`, discard unchanged with heartbeat 1h. Master item: `es.cluster.get_health`.
- **ES: Number of nodes** — DEPENDENT, key `es.cluster.number_of_nodes`, 7d. The number of nodes within the cluster. JSONPath `$.number_of_nodes`, heartbeat 1h. Master item: `es.cluster.get_health`.
  - Trigger `{change()}<0` — "ES: The number of nodes within the cluster has decreased" (INFO, manual close).
  - Trigger `{change()}>0` — "ES: The number of nodes within the cluster has increased" (INFO, manual close).
- **ES: Number of pending tasks** — DEPENDENT, key `es.cluster.number_of_pending_tasks`, 7d. The number of cluster-level changes that have not yet been executed. JSONPath `$.number_of_pending_tasks`. Master item: `es.cluster.get_health`.
- **ES: Number of relocating shards** — DEPENDENT, key `es.cluster.relocating_shards`, 7d. The number of shards that are being relocated. JSONPath `$.relocating_shards`. Master item: `es.cluster.get_health`.
- **ES: Cluster health status** — DEPENDENT, key `es.cluster.status`, history 7d. Health status of the cluster, based on the state of its primary and replica shards:
  - `green` — all shards are assigned;
  - `yellow` — all primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired;
  - `red` — one or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.

  Value map: ES cluster state. Preprocessing: JSONPath `$.status`, then JavaScript:

  ```
  var state = ['green', 'yellow', 'red'];
  return state.indexOf(value.trim()) === -1 ? 255 : state.indexOf(value.trim());
  ```

  Discard unchanged with heartbeat 1h. Master item: `es.cluster.get_health`.
  - Trigger `{last()}=2` — "ES: Health is RED" (HIGH): one or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.
  - Trigger `{last()}=255` — "ES: Health is UNKNOWN" (HIGH): the health status of the cluster is unknown or cannot be obtained.
  - Trigger `{last()}=1` — "ES: Health is YELLOW" (AVERAGE): all primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.
- **ES: Task max waiting in queue** — DEPENDENT, key `es.cluster.task_max_waiting_in_queue`, 7d, units s. The time in seconds that the earliest initiated task has been waiting to be performed. JSONPath `$.task_max_waiting_in_queue_millis`, multiplier 0.001. Master item: `es.cluster.get_health`.
- **ES: Number of unassigned shards** — DEPENDENT, key `es.cluster.unassigned_shards`, 7d. The number of shards that are not allocated. JSONPath `$.unassigned_shards`. Master item: `es.cluster.get_health`.
  - Trigger `{min(10m)}>0` — "ES: Cluster has the unassigned shards" (AVERAGE): the cluster has had unassigned shards for longer than 10 minutes.
- **ES: Indices with shards assigned to nodes** — DEPENDENT, key `es.indices.count`, 7d. The total number of indices with shards assigned to the selected nodes. JSONPath `$.indices.count`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Number of non-deleted documents** — DEPENDENT, key `es.indices.docs.count`, 7d. The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include documents from nested fields. JSONPath `$.indices.docs.count`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Nodes with the data role** — DEPENDENT, key `es.nodes.count.data`, 7d. The number of selected nodes with the data role. JSONPath `$.nodes.count.data`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Nodes with the ingest role** — DEPENDENT, key `es.nodes.count.ingest`, 7d. The number of selected nodes with the ingest role. JSONPath `$.nodes.count.ingest`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Nodes with the master role** — DEPENDENT, key `es.nodes.count.master`, 7d. The number of selected nodes with the master role. JSONPath `$.nodes.count.master`, heartbeat 1h. Master item: `es.cluster.get_stats`.
  - Trigger `{last()}=2` — "ES: Cluster has only two master nodes" (DISASTER): the cluster has only two nodes with a master role and will be unavailable if one of them breaks.
- **ES: Total available size to JVM in all file stores** — DEPENDENT, key `es.nodes.fs.available_in_bytes`, 7d, units B. The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than `nodes.fs.free_in_bytes`; this is the actual amount of free disk space the selected Elasticsearch nodes can use. JSONPath `$.nodes.fs.available_in_bytes`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Total size of all file stores** — DEPENDENT, key `es.nodes.fs.total_in_bytes`, 7d, units B. The total size in bytes of all file stores across all selected nodes. JSONPath `$.nodes.fs.total_in_bytes`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Get nodes stats** — HTTP_AGENT, key `es.nodes.get_stats`, history 0, value type TEXT, BASIC auth ({$ELASTICSEARCH.USERNAME}/{$ELASTICSEARCH.PASSWORD}). Returns cluster nodes statistics. Timeout 30s; URL: `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_nodes/stats`. (Zabbix raw items)
- **ES: Cluster uptime** — DEPENDENT, key `es.nodes.jvm.max_uptime[{#ES.NODE}]`, 7d, FLOAT, units s. Uptime duration in seconds since the JVM last started. JSONPath `$.nodes.jvm.max_uptime_in_millis`, multiplier 0.001. Master item: `es.cluster.get_stats`.
  - Trigger `{last()}<10m` — "ES: Cluster has been restarted (uptime < 10m)" (INFO, manual close): uptime is less than 10 minutes.
- **ES: Total shards** — DEPENDENT, key `es.shards.total`, 7d. Total number of shards. JSONPath `$.indices.shards.total`, heartbeat 1h. Master item: `es.cluster.get_stats`.
- **ES: Service response time** — SIMPLE, key `net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]`, history 7d, units s. Checks performance of the TCP service. (ES cluster)
  - Trigger `{min(5m)}>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}` — "ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m)" (WARNING, manual close): the performance of the TCP service is very low. Depends on "ES: Service is down" (`{Elasticsearch Cluster by HTTP zbx 4.2:net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].last()}=0`).
- **ES: Service status** — SIMPLE, key `net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]`, history 7d. Checks whether the service is running and accepting TCP connections. Discard unchanged with heartbeat 10m. (ES cluster)
  - Trigger `{last()}=0` — "ES: Service is down" (AVERAGE, manual close): the service is unavailable or does not accept TCP connections.
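The per-node item prototypes below each pick a single node's subtree out of the `_nodes/stats` payload by node name, via the JSONPath filter `$..[?(@.name=='{#ES.NODE}')]`. A rough plain-JavaScript equivalent of that selection, using a trimmed, made-up payload shape for illustration:

```javascript
// Trimmed stand-in for a GET /_nodes/stats response. Field names follow the ES
// API (nodes keyed by node id, each with a "name"); the values are invented.
var statsPayload = {
  nodes: {
    'aBc123': { name: 'node-1', jvm: { mem: { heap_used_percent: 42 } } },
    'dEf456': { name: 'node-2', jvm: { mem: { heap_used_percent: 61 } } }
  }
};

// Equivalent of the template's $..[?(@.name=='<node>')] filter: scan the nodes
// map (keyed by opaque node id) and return the entry whose name matches.
function nodeStatsByName(payload, nodeName) {
  var ids = Object.keys(payload.nodes);
  for (var i = 0; i < ids.length; i++) {
    if (payload.nodes[ids[i]].name === nodeName) return payload.nodes[ids[i]];
  }
  return null;
}
```

The `.first()` suffix in the template's JSONPath expressions serves the same purpose: the filter yields a list, and only the first (single) match is used.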
## Discovery

- **Cluster nodes discovery** — HTTP_AGENT, key `es.nodes.discovery`, interval 1h, BASIC auth ({$ELASTICSEARCH.USERNAME}/{$ELASTICSEARCH.PASSWORD}). Discovers ES cluster nodes. Timeout 15s; URL: `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/_nodes`. LLD macro: {#ES.NODE} ← `$..name.first()`. Preprocessing: JSONPath `$.nodes.[*]`, discard unchanged with heartbeat 1d.

### Item prototypes

All dependent prototypes below use `es.nodes.get_stats` as their master item; `$NODE` abbreviates the JSONPath node filter `$..[?(@.name=='{#ES.NODE}')]`. Unless noted, items belong to the "ES {#ES.NODE}" application.

- **ES {#ES.NODE}: Total available size** — DEPENDENT, key `es.node.fs.total.available_in_bytes[{#ES.NODE}]`, history 7d, units B. The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than `fs.total.free_in_bytes`; this is the actual amount of free disk space the Elasticsearch node can utilize. JSONPath `$NODE.fs.total.available_in_bytes.first()`, heartbeat 1h.
- **ES {#ES.NODE}: Total size** — DEPENDENT, `es.node.fs.total.total_in_bytes[{#ES.NODE}]`, 7d, B. Total size (in bytes) of all file stores. JSONPath `$NODE.fs.total.total_in_bytes.first()`, heartbeat 1d.
- **ES {#ES.NODE}: Number of open HTTP connections** — DEPENDENT, `es.node.http.current_open[{#ES.NODE}]`, 7d. The number of currently open HTTP connections for the node. JSONPath `$NODE.http.current_open.first()`, heartbeat 1h.
- **ES {#ES.NODE}: Rate of HTTP connections opened** — DEPENDENT, `es.node.http.opened.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of HTTP connections opened for the node per second. JSONPath `$NODE.http.total_opened.first()`, change per second.
- **ES {#ES.NODE}: Flush latency** — CALCULATED, `es.node.indices.flush.latency[{#ES.NODE}]`, 7d, FLOAT, ms. The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. Formula: `last(es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.flush.total[{#ES.NODE}]) + (last(es.node.indices.flush.total[{#ES.NODE}]) = 0) )`.
  - Trigger `{min(5m)}>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}` — "ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m)" (WARNING): if you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.
- **ES {#ES.NODE}: Total number of index flushes to disk** — DEPENDENT, `es.node.indices.flush.total[{#ES.NODE}]`, 7d. The total number of flush operations. JSONPath `$NODE.indices.flush.total.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Total time spent on flushing indices to disk** — DEPENDENT, `es.node.indices.flush.total_time_in_millis[{#ES.NODE}]`, 7d, ms. Total time in milliseconds spent performing flush operations. JSONPath `$NODE.indices.flush.total_time_in_millis.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Current indexing operations** — DEPENDENT, `es.node.indices.indexing.index_current[{#ES.NODE}]`, 7d. The number of indexing operations currently running. JSONPath `$NODE.indices.indexing.index_current.first()`, heartbeat 1h.
- **ES {#ES.NODE}: Indexing latency** — CALCULATED, `es.node.indices.indexing.index_latency[{#ES.NODE}]`, 7d, FLOAT, ms. The average indexing latency calculated from the available index_total and index_time_in_millis metrics. Formula: `last(es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.indexing.index_total[{#ES.NODE}]) + (last(es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) )`.
  - Trigger `{min(5m)}>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}` — "ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m)" (WARNING): if the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch’s documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).
- **ES {#ES.NODE}: Total time spent performing indexing** — DEPENDENT, `es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]`, 7d, ms. Total time in milliseconds spent performing indexing operations. JSONPath `$NODE.indices.indexing.index_time_in_millis.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Total number of indexing** — DEPENDENT, `es.node.indices.indexing.index_total[{#ES.NODE}]`, 7d. The total number of indexing operations. JSONPath `$NODE.indices.indexing.index_total.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Time spent throttling operations** — DEPENDENT, `es.node.indices.indexing.throttle_time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent throttling operations during the last measuring span. JSONPath `$NODE.indices.indexing.throttle_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Time spent throttling merge operations** — DEPENDENT, `es.node.indices.merges.total_throttled_time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent throttling merge operations during the last measuring span. JSONPath `$NODE.indices.merges.total_throttled_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Time spent throttling recovery operations** — DEPENDENT, `es.node.indices.recovery.throttle_time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent throttling recovery operations during the last measuring span. JSONPath `$NODE.indices.recovery.throttle_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Rate of index refreshes** — DEPENDENT, `es.node.indices.refresh.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of refresh operations per second. JSONPath `$NODE.indices.refresh.total.first()`, change per second.
- **ES {#ES.NODE}: Time spent performing refresh** — DEPENDENT, `es.node.indices.refresh.time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent performing refresh operations during the last measuring span. JSONPath `$NODE.indices.refresh.total_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Rate of fetch** — DEPENDENT, `es.node.indices.search.fetch.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of fetch operations per second. JSONPath `$NODE.indices.search.fetch_total.first()`, change per second.
- **ES {#ES.NODE}: Current fetch operations** — DEPENDENT, `es.node.indices.search.fetch_current[{#ES.NODE}]`, 7d. The number of fetch operations currently running. JSONPath `$NODE.indices.search.fetch_current.first()`.
- **ES {#ES.NODE}: Fetch latency** — CALCULATED, `es.node.indices.search.fetch_latency[{#ES.NODE}]`, 7d, FLOAT, ms. The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. Formula: `last(es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.search.fetch_total[{#ES.NODE}]) + (last(es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) )`.
  - Trigger `{min(5m)}>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}` — "ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m)" (WARNING): the fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, it could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.
- **ES {#ES.NODE}: Time spent performing fetch** — DEPENDENT, `es.node.indices.search.fetch_time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent performing fetch operations during the last measuring span. JSONPath `$NODE.indices.search.fetch_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Total time spent performing fetch** — DEPENDENT, `es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]`, 7d, ms. Time in milliseconds spent performing fetch operations. JSONPath `$NODE.indices.search.fetch_time_in_millis.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Total number of fetch** — DEPENDENT, `es.node.indices.search.fetch_total[{#ES.NODE}]`, 7d. The total number of fetch operations. JSONPath `$NODE.indices.search.fetch_total.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Rate of queries** — DEPENDENT, `es.node.indices.search.query.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of query operations per second. JSONPath `$NODE.indices.search.query_total.first()`, change per second.
- **ES {#ES.NODE}: Current query operations** — DEPENDENT, `es.node.indices.search.query_current[{#ES.NODE}]`, 7d. The number of query operations currently running. JSONPath `$NODE.indices.search.query_current.first()`.
- **ES {#ES.NODE}: Query latency** — CALCULATED, `es.node.indices.search.query_latency[{#ES.NODE}]`, 7d, FLOAT, ms. The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. Formula: `last(es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( last(es.node.indices.search.query_total[{#ES.NODE}]) + (last(es.node.indices.search.query_total[{#ES.NODE}]) = 0) )`.
  - Trigger `{min(5m)}>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}` — "ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m)" (WARNING): if latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.
- **ES {#ES.NODE}: Time spent performing query** — DEPENDENT, `es.node.indices.search.query_time[{#ES.NODE}]`, 7d, FLOAT, s. Time in seconds spent performing query operations during the last measuring span. JSONPath `$NODE.indices.search.query_time_in_millis.first()`, multiplier 0.001, simple change.
- **ES {#ES.NODE}: Total time spent performing query** — DEPENDENT, `es.node.indices.search.query_time_in_millis[{#ES.NODE}]`, 7d, ms. Time in milliseconds spent performing query operations. JSONPath `$NODE.indices.search.query_time_in_millis.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Total number of query** — DEPENDENT, `es.node.indices.search.query_total[{#ES.NODE}]`, 7d. The total number of query operations. JSONPath `$NODE.indices.search.query_total.first()`, heartbeat 1h. (Zabbix raw items)
- **ES {#ES.NODE}: Amount of JVM heap committed** — DEPENDENT, `es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}]`, 7d, B. The amount of memory, in bytes, available for use by the heap. JSONPath `$NODE.jvm.mem.heap_committed_in_bytes.first()`, heartbeat 1h.
- **ES {#ES.NODE}: Maximum JVM memory available for use** — DEPENDENT, `es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}]`, B. The maximum amount of memory, in bytes, available for use by the heap. JSONPath `$NODE.jvm.mem.heap_max_in_bytes.first()`, heartbeat 1d.
- **ES {#ES.NODE}: Amount of JVM heap currently in use** — DEPENDENT, `es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}]`, 7d, B. The memory, in bytes, currently in use by the heap. JSONPath `$NODE.jvm.mem.heap_used_in_bytes.first()`, heartbeat 1h.
- **ES {#ES.NODE}: Percent of JVM heap currently in use** — DEPENDENT, `es.node.jvm.mem.heap_used_percent[{#ES.NODE}]`, 7d, FLOAT, %. The percentage of memory currently in use by the heap. JSONPath `$NODE.jvm.mem.heap_used_percent.first()`, heartbeat 1h.
  - Trigger `{min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}` — "ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)" (HIGH): this indicates that the rate of garbage collection isn’t keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes.
  - Trigger `{min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}` — "ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h)" (WARNING): same cause and remedies as the critical trigger. Depends on "ES {#ES.NODE}: Percent of JVM heap in use is critical" (`{Elasticsearch Cluster by HTTP zbx 4.2:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`).
- **ES {#ES.NODE}: Node uptime** — DEPENDENT, `es.node.jvm.uptime[{#ES.NODE}]`, 7d, FLOAT, s. JVM uptime in seconds. JSONPath `$NODE.jvm.uptime_in_millis.first()`, multiplier 0.001.
  - Trigger `{last()}<10m` — "ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m)" (INFO): uptime is less than 10 minutes.
- **ES {#ES.NODE}: Refresh thread pool active threads** — DEPENDENT, `es.node.thread_pool.refresh.active[{#ES.NODE}]`, 7d. The number of active threads in the refresh thread pool. JSONPath `$NODE.thread_pool.refresh.active.first()`.
- **ES {#ES.NODE}: Refresh thread pool executor tasks completed** — DEPENDENT, `es.node.thread_pool.refresh.completed.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of tasks completed by the refresh thread pool executor. JSONPath `$NODE.thread_pool.refresh.completed.first()`, change per second.
- **ES {#ES.NODE}: Refresh thread pool tasks in queue** — DEPENDENT, `es.node.thread_pool.refresh.queue[{#ES.NODE}]`, 7d. The number of tasks queued for the refresh thread pool. JSONPath `$NODE.thread_pool.refresh.queue.first()`.
- **ES {#ES.NODE}: Refresh thread pool executor tasks rejected** — DEPENDENT, `es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of tasks rejected by the refresh thread pool executor. JSONPath `$NODE.thread_pool.refresh.rejected.first()`, change per second.
  - Trigger `{min(5m)}>0` — "ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m)" (WARNING): the number of tasks rejected by the refresh thread pool executor has been above 0 for 5m.
- **ES {#ES.NODE}: Search thread pool active threads** — DEPENDENT, `es.node.thread_pool.search.active[{#ES.NODE}]`, 7d. The number of active threads in the search thread pool. JSONPath `$NODE.thread_pool.search.active.first()`.
- **ES {#ES.NODE}: Search thread pool executor tasks completed** — DEPENDENT, `es.node.thread_pool.search.completed.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of tasks completed by the search thread pool executor. JSONPath `$NODE.thread_pool.search.completed.first()`, change per second.
- **ES {#ES.NODE}: Search thread pool tasks in queue** — DEPENDENT, `es.node.thread_pool.search.queue[{#ES.NODE}]`, 7d. The number of tasks queued for the search thread pool. JSONPath `$NODE.thread_pool.search.queue.first()`.
- **ES {#ES.NODE}: Search thread pool executor tasks rejected** — DEPENDENT, `es.node.thread_pool.search.rejected.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of tasks rejected by the search thread pool executor. JSONPath `$NODE.thread_pool.search.rejected.first()`, change per second.
  - Trigger `{min(5m)}>0` — "ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m)" (WARNING): the number of tasks rejected by the search thread pool executor has been above 0 for 5m.
- **ES {#ES.NODE}: Write thread pool active threads** — DEPENDENT, `es.node.thread_pool.write.active[{#ES.NODE}]`, 7d. The number of active threads in the write thread pool. JSONPath `$NODE.thread_pool.write.active.first()`.
- **ES {#ES.NODE}: Write thread pool executor tasks completed** — DEPENDENT, `es.node.thread_pool.write.completed.rate[{#ES.NODE}]`, 7d, rps. The number of tasks completed by the write thread pool executor. JSONPath `$NODE.thread_pool.write.completed.first()`, change per second.
- **ES {#ES.NODE}: Write thread pool tasks in queue** — DEPENDENT, `es.node.thread_pool.write.queue[{#ES.NODE}]`, 7d. The number of tasks queued for the write thread pool. JSONPath `$NODE.thread_pool.write.queue.first()`.
- **ES {#ES.NODE}: Write thread pool executor tasks rejected** — DEPENDENT, `es.node.thread_pool.write.rejected.rate[{#ES.NODE}]`, 7d, FLOAT, rps. The number of tasks rejected by the write thread pool executor. JSONPath `$NODE.thread_pool.write.rejected.first()`, change per second.
  - Trigger `{min(5m)}>0` — "ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m)" (WARNING): the number of tasks rejected by the write thread pool executor has been above 0 for 5m.

## Macros

|Macro|Default|
|---|---|
|{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}|100|
|{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}|100|
|{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}|95|
|{$ELASTICSEARCH.HEAP_USED.MAX.WARN}|85|
|{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}|100|
|{$ELASTICSEARCH.MAX_SHARDS_PER_NODE}|1000|
|{$ELASTICSEARCH.PASSWORD}|(empty)|
|{$ELASTICSEARCH.PORT}|9200|
|{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}|100|
|{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}|10s|
|{$ELASTICSEARCH.SCHEME}|http|
|{$ELASTICSEARCH.USERNAME}|(empty)|

## Template-level triggers

- **ES: Cluster does not have enough space for resharding** (HIGH): there is not enough disk space for index resharding. Expression: `({Elasticsearch Cluster by HTTP zbx 4.2:es.nodes.fs.total_in_bytes.last()}-{Elasticsearch Cluster by HTTP zbx 4.2:es.nodes.fs.available_in_bytes.last()})/({Elasticsearch Cluster by HTTP zbx 4.2:es.cluster.number_of_data_nodes.last()}-1)>{Elasticsearch Cluster by HTTP zbx 4.2:es.nodes.fs.available_in_bytes.last()}`
- **ES: Shards left for 5 indices** (HIGH): by default, ES allows 1000 shards per node ({$ELASTICSEARCH.MAX_SHARDS_PER_NODE}); once the number of shards reaches the maximum, Elasticsearch will silently reject write requests. Expression: `{Elasticsearch Cluster by HTTP zbx 4.2:es.shards.total.last()}>(({Elasticsearch Cluster by HTTP zbx 4.2:es.nodes.count.data.last()}*{$ELASTICSEARCH.MAX_SHARDS_PER_NODE})-({Elasticsearch Cluster by HTTP zbx 4.2:es.shards.total.last()}/{Elasticsearch Cluster by HTTP zbx 4.2:es.indices.count.last()}*5))`
Understanding the expression:

```
TOTAL_SHARDS > (MAX_SHARDS_ON_CLUSTER - SHARDS_NEEDED_FOR_5_INDICES)
```

Fix example (raise the per-node shard limit):

```
curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": {"cluster.max_shards_per_node": "2000" } }'
```

## Graphs

**ES: Cluster health** — items and line colors:

| # | Color | Item |
|---|---|---|
|0|1A7C11|es.cluster.inactive_shards_percent_as_number|
|1|2774A4|es.cluster.relocating_shards|
|2|F63100|es.cluster.initializing_shards|
|3|A54F10|es.cluster.unassigned_shards|
|4|FC6EA3|es.cluster.delayed_unassigned_shards|
|5|6C59DC|es.cluster.number_of_pending_tasks|
|6|AC8C14|es.cluster.task_max_waiting_in_queue|

## Value maps

**ES cluster state**: 0 → green, 1 → yellow, 2 → red, 255 → unknown.
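The template's JavaScript preprocessing step for `es.cluster.status` produces these numeric codes from the status string returned by `_cluster/health`. The same mapping, written as a standalone function for reference:

```javascript
// Mirrors the template's preprocessing for es.cluster.status:
// green -> 0, yellow -> 1, red -> 2, anything unrecognized -> 255 (unknown).
function clusterStateCode(value) {
  var state = ['green', 'yellow', 'red'];
  var idx = state.indexOf(value.trim());
  return idx === -1 ? 255 : idx;
}
```

The fallback to 255 is what drives the "ES: Health is UNKNOWN" trigger when the API returns something unexpected.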