Zabbix export version: 5.0
Date: 2021-11-21T21:54:00Z
Group: Templates
Template: App Elasticsearch Cluster new
## Overview
**ElasticSearch Zabbix monitoring**
===================================
#### Script-free Zabbix ES monitoring
This template monitors an entire ES cluster using the Zabbix 4.x HTTP Agent item type.
It can check ES whether it runs on-premises or as PaaS (AWS Elasticsearch, for example), without additional scripts.
### Requirements:
* ES must be reachable from the Zabbix server or a Zabbix proxy. That's all.
* ES endpoints can be adjusted through the template macros.
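
As a rough illustration of how the endpoint macros combine into a request URL, here is a minimal sketch. The `expandMacros` helper is hypothetical (Zabbix expands macros itself); the macro names and default values are the ones defined in this template.

```javascript
// Hypothetical helper, for illustration only: Zabbix performs this
// expansion internally before the HTTP Agent item sends its request.
function expandMacros(template, macros) {
  return template.replace(/\{\$([A-Z0-9_]+)\}/g, function (match, name) {
    // Leave unknown macros untouched so a misspelled name stays visible.
    return Object.prototype.hasOwnProperty.call(macros, name) ? macros[name] : match;
  });
}

// Default macro values shipped with the template.
var macros = {
  ELASTICSEARCH_PROTOCOL: "http",
  ELASTICSEARCH_HOST: "localhost",
  ELASTICSEARCH_PORT: "9200"
};

// URL pattern used by the "Elasticsearch Cluster Health" item.
var healthUrl = expandMacros(
  "{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cluster/health",
  macros
);

console.log(healthUrl);
```

Overriding `{$ELASTICSEARCH_HOST}`, `{$ELASTICSEARCH_PORT}` or `{$ELASTICSEARCH_PROTOCOL}` at host level should be enough to point the items at a different endpoint.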
### **Discovery rules:**
* ES Indexes discovery
* ES Node discovery
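
The discovery rules turn a plain-text `_cat` response into low-level discovery JSON. The sketch below shows the idea for indexes, assuming the response contains one index name per line; `buildIndexDiscovery` and the sample index names are hypothetical. Unlike the template, which relies on an LLD filter to drop empty values, this sketch skips blank lines in code.

```javascript
// Hypothetical standalone version of the index discovery preprocessing.
// Input: text body with one index name per line (assumed format).
// Output: LLD JSON of the form {"data":[{"{#ELASTICSEARCH_INDEX}":"..."}]}.
function buildIndexDiscovery(body) {
  var rows = body
    .split("\n")
    .map(function (line) { return line.trim(); })
    .filter(function (name) { return name.length > 0; }) // skip blank lines
    .map(function (name) { return { "{#ELASTICSEARCH_INDEX}": name }; });
  return JSON.stringify({ data: rows });
}

// Made-up sample response.
var sampleBody = "logs-2021.11.20\nlogs-2021.11.21\n";

console.log(buildIndexDiscovery(sampleBody));
```

Node discovery follows the same pattern with the `{#ELASTICSEARCH_NODE}` macro.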
### **Monitored Items:**
* Shards
* Cluster Rate
* Cluster Latency
* Cluster Health
* JVM Stats
* Disk Status
* Snapshot status
* ES Port
* Memory
* Documents (searchable, deleted)
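
Most of these items are dependent items that pull a single field out of one master HTTP response. The following sketch mimics that for the cluster health response. The sample values and the `extractHealth` helper are made up for illustration; the field names match the JSONPath expressions used by the template's items.

```javascript
// Made-up sample of a /_cluster/health response body.
var healthBody = JSON.stringify({
  cluster_name: "example-cluster",
  status: "yellow",
  number_of_nodes: 3,
  number_of_data_nodes: 3,
  active_primary_shards: 10,
  active_shards: 15,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 5
});

// Hypothetical helper: extracts a few of the values that the template
// obtains through JSONPath preprocessing ($.status, $.number_of_nodes, ...).
function extractHealth(body) {
  var doc = JSON.parse(body);
  return {
    status: doc.status,
    nodes: doc.number_of_nodes,
    unassignedShards: doc.unassigned_shards,
    // The template's triggers fire when the status does not match "green".
    isGreen: /green/i.test(doc.status)
  };
}

console.log(extractHealth(healthBody));
```

Collecting the response once and deriving every metric from it keeps the load on the cluster to one request per master item.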
## Author
Rickk Barbosa (https://github.com/rickkbarbosa)
TemplatesES ClusterES DataES General statusES HealthES JVM StatsES Key performance indicatorsES Shards- Elasticsearch Memory (Average per Node)CALCULATEDelasticsearch.cluster.memory[total,pernode]5m1wFLOATblast("elasticsearch.memory[total,cluster]") / last("elasticsearch.cluster[number_of_nodes]")Total memory (sum of all nodes)ES General statusES Health
- Elasticsearch - Number of active primary shardsDEPENDENTelasticsearch.cluster[active_primary_shards]01wES ClusterES ShardsJSONPATH$.active_primary_shardselasticsearch.cluster[all,health]POST
- Elasticsearch - Number of active shardsDEPENDENTelasticsearch.cluster[active_shards]01wES ClusterES ShardsJSONPATH$.active_shardselasticsearch.cluster[all,health]POST
- Elasticsearch Cluster HealthHTTP_AGENTelasticsearch.cluster[all,health]1d0TEXTES Cluster10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cluster/health
- Elasticsearch Cluster Global StatusHTTP_AGENTelasticsearch.cluster[all,stats]1d0TEXTES Cluster5s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cluster/stats
- Elasticsearch - Number of data nodesDEPENDENTelasticsearch.cluster[cluster,number_of_data_nodes]01wES ClusterJSONPATH$.number_of_data_nodeselasticsearch.cluster[all,health]POST
- Master instance connection statusDEPENDENTelasticsearch.cluster[discovered_master]01wMaster instance connection status.
Indicates whether data nodes can reach the master node. Failures are usually the result of a network connectivity problem.ES HealthBooleanJSONPATH$.discovered_masterREGEXtrue
1CUSTOM_ERROR0elasticsearch.cluster[all,health]
- Elasticsearch - Number of initializing shardsDEPENDENTelasticsearch.cluster[initializing_shards]01wES ClusterES ShardsJSONPATH$.initializing_shardselasticsearch.cluster[all,health]POST
- Elasticsearch - Cluster NameDEPENDENTelasticsearch.cluster[name]01w0TEXTES ClusterJSONPATH$.cluster_nameelasticsearch.cluster[all,health]
- Number of nodesDEPENDENTelasticsearch.cluster[number_of_nodes]01wES ClusterJSONPATH$.number_of_nodeselasticsearch.cluster[all,health]POST
- Elasticsearch - Number of relocating shardsDEPENDENTelasticsearch.cluster[relocating_shards]01wES ClusterES ShardsJSONPATH$.relocating_shardselasticsearch.cluster[all,health]POST
- Elasticsearch - Cluster SizeDEPENDENTelasticsearch.cluster[size]01wbTotal cluster size in bytesES ClusterJSONPATH$.indices.store.size_in_byteselasticsearch.cluster[all,stats]
- Elasticsearch - Cluster StatusDEPENDENTelasticsearch.cluster[status]01w0TEXTES ClusterJSONPATH$.statuselasticsearch.cluster[all,health]POST{iregexp(green,3)}=0RECOVERY_EXPRESSION{iregexp(green,3)}=1[ {HOST.NAME} ] - Elasticsearch Cluster in {ITEM.LASTVALUE} stateWARNINGThe cluster health status is: green, yellow or red. On the shard level, a red status indicates that the specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but replicas are not, and green means that all shards are allocated. The index level status is controlled by the worst shard status. The cluster status is controlled by the worst index status.[ {HOST.NAME} ] - Elasticsearch Cluster in {ITEM.LASTVALUE} state{App Elasticsearch Cluster new:elasticsearch.cluster[status].iregexp(green,5)}=0{App Elasticsearch Cluster new:elasticsearch.cluster[status].iregexp(green,3)}=1{iregexp(green,5)}=0RECOVERY_EXPRESSION{iregexp(green,3)}=1[ {HOST.NAME} ] - Elasticsearch Cluster in {ITEM.LASTVALUE} stateAVERAGEThe cluster health status is: green, yellow or red. On the shard level, a red status indicates that the specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but replicas are not, and green means that all shards are allocated. The index level status is controlled by the worst shard status. The cluster status is controlled by the worst index status.{nodata(5m)}=1RECOVERY_EXPRESSION{nodata(3m)}=0[ {HOST.NAME} ] - Elasticsearch Monitoring is not collecting dataAVERAGE[ {HOST.NAME} ] - Elasticsearch Port is unavailable{App Elasticsearch Cluster new:net.tcp.service[tcp,{$ELASTICSEARCH_HOST},{$ELASTICSEARCH_PORT}].sum(#3)}=0{App Elasticsearch Cluster new:net.tcp.service[tcp,{$ELASTICSEARCH_HOST},{$ELASTICSEARCH_PORT}].avg(#3)}=1
- Elasticsearch - Number of unassigned shardsDEPENDENTelasticsearch.cluster[unassigned_shards]01wES ClusterES ShardsJSONPATH$.unassigned_shardselasticsearch.cluster[all,health]POST
- Elasticsearch Cluster UUIDDEPENDENTelasticsearch.cluster[uuid]01w0TEXTES ClusterES General statusJSONPATH$.cluster_uuidelasticsearch.cluster[all,stats]
- Elasticsearch CPU UsageDEPENDENTelasticsearch.cpu01wFLOAT%CPU usage in percent across the cluster (all nodes).ES ClusterES HealthJSONPATH$.nodes.process.cpu.percentelasticsearch.cluster[all,stats]
- Deleted documentsDEPENDENTelasticsearch.deleted01dTotal Number of Records marked for deletionES DataJSONPATH$.indices.docs.deletedelasticsearch.cluster[all,stats]
- Elasticsearch Disk VolumeHTTP_AGENTelasticsearch.disk[all]10m1d0TEXTES General status10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_nodes/stats/fs
- Elasticsearch Disk Free (%)CALCULATEDelasticsearch.disk[free,percent]5m1wFLOAT%( last("elasticsearch.disk[free]") / last("elasticsearch.disk[total]") ) * 100Free disk volume (in percent)ES General statusES Health
- Elasticsearch Disk Volume FreeDEPENDENTelasticsearch.disk[free]01dbES HealthJSONPATH$.nodes.fs.total.free_in_byteselasticsearch.disk[all]
- Elasticsearch Disk Volume TotalDEPENDENTelasticsearch.disk[total]01dbES HealthJSONPATH$.nodes.fs.total.total_in_byteselasticsearch.disk[all]
- Elasticsearch Indices Global StatusHTTP_AGENTelasticsearch.indices[all,stats]1d0TEXTIndices level stats provide statistics on different operations happening on an index. The API provides statistics on the index level scope (though most stats can also be retrieved using node level scope).
Base for key performance indicator
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/indices-stats.htmlES General statusES Key performance indicators10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_stats
- Elasticsearch JVM Heap (Max)DEPENDENTelasticsearch.jvm[heap,max]01wbhttps://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.htmlES JVM StatsJSONPATH$.nodes.jvm.mem.heap_max_in_byteselasticsearch.cluster[all,stats]
- Elasticsearch JVM Heap (Used, Percent)CALCULATEDelasticsearch.jvm[heap,usedp]5m1wFLOAT%( last("elasticsearch.jvm[heap,used]") / last("elasticsearch.jvm[heap,max]") ) *100https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.htmlES JVM Stats{avg(#3)}>{$ELASTICSEARCH_HEAPMEM_P2}[ {HOST.NAME} ] - Elasticsearch Heap Memory Used is over {$ELASTICSEARCH_HEAPMEM_P2}AVERAGE
- Elasticsearch JVM Heap (Used, bytes)DEPENDENTelasticsearch.jvm[heap,used]01wbhttps://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.htmlES JVM StatsJSONPATH$.nodes.jvm.mem.heap_used_in_byteselasticsearch.cluster[all,stats]
- Elasticsearch JVM VersionDEPENDENTelasticsearch.jvm[version]01w0TEXTES JVM StatsJSONPATH$.nodes.jvm.versionselasticsearch.cluster[all,stats]
- Elasticsearch Memory Free (%)CALCULATEDelasticsearch.memory[free,cluster,percentage]5m1wFLOAT%( last("elasticsearch.memory[free,cluster]") / last("elasticsearch.memory[total,cluster]") ) * 100Free memory in cluster (in percent)ES General statusES Health
- Elasticsearch Memory FreeDEPENDENTelasticsearch.memory[free,cluster]01wFLOATbFree memory on cluster (sum of all nodes)ES General statusES HealthJSONPATH$.nodes.os.mem.free_in_byteselasticsearch.cluster[all,stats]
- Elasticsearch Memory (Cluster)DEPENDENTelasticsearch.memory[total,cluster]01wFLOATbTotal memory (sum of all nodes)ES General statusES HealthJSONPATH$.nodes.os.mem.total_in_byteselasticsearch.cluster[all,stats]
- Elasticsearch - Indexing rateDEPENDENTelasticsearch.performance[index]01wFLOATops/minNumber of index operations per minute.ES HealthES Key performance indicatorsJSONPATH$._all.primaries.indexing.index_totalSIMPLE_CHANGEelasticsearch.indices[all,stats]
- Elasticsearch - Indexing latency (ms)DEPENDENTelasticsearch.performance[latency,index]01wFLOATmsAverage time it takes a shard to complete an indexing operationES HealthES Key performance indicatorsJSONPATH$._all.primaries.indexing.index_time_in_millisCHANGE_PER_SECONDelasticsearch.indices[all,stats]
- Elasticsearch - Search latency (ms)DEPENDENTelasticsearch.performance[latency,search]01wFLOATmsAverage time it takes a shard to complete a search operationES HealthES Key performance indicatorsJSONPATH$._all.primaries.search.query_time_in_millisCHANGE_PER_SECONDelasticsearch.indices[all,stats]
- Elasticsearch - Search rateDEPENDENTelasticsearch.performance[search]01wFLOATops/minSearch operations per minute.ES HealthES Key performance indicatorsJSONPATH$._all.primaries.search.query_totalSIMPLE_CHANGEelasticsearch.indices[all,stats]
- Searchable documentsDEPENDENTelasticsearch.records01dTotal Number of RecordsES DataJSONPATH$.indices.docs.countelasticsearch.cluster[all,stats]
- Elasticsearch Well-done Snapshots in last {$ELASTICSEARCH_SNAPSHOTP_DAYS} daysDEPENDENTelasticsearch.snapshots[ok]01wTotal snapshots in the last {$ELASTICSEARCH_SNAPSHOTP_DAYS} days that succeededES HealthBooleanREGEX,([0-9]+)
\0TRIM,elasticsearch.snapshots[stats]
- ES Snapshot StatusHTTP_AGENTelasticsearch.snapshots[stats]1h1d0TEXTA snapshot is a backup taken from a running Elasticsearch cluster.
This item reports how many snapshots exist in the last {$ELASTICSEARCH_SNAPSHOTP_DAYS} days (3 by default) and how many of them succeeded.
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.htmlES General statusES HealthJAVASCRIPTvar lld = [];
var lines = value.split("\n");
var lines_num = lines.length;
//Date
var days = {$ELASTICSEARCH_SNAPSHOTP_DAYS};
// Look-back window in seconds
var daysAgo = 86400 * days;
// Cutoff as a Unix timestamp in seconds
var date = Math.floor(Date.now() / 1000) - daysAgo;
//Fetch last 3 days
var output = " ";
for (i = 0; i < lines_num; i++)
{
var line = lines[i].split(" ")[2];
if (line > date) {
output = output + "\n" + lines[i];
}
}//Regex
var re = /SUCCESS/g,
success = 0;
while (re.exec(output) !== null) {
++success;
}
var total = output.split("\n");
var total = total.length - 1;
result = total + "," + success;
return result;10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/snapshots/{$ELASTICSEARCH_SNAPSHOT}
- Elasticsearch Snapshots in last {$ELASTICSEARCH_SNAPSHOTP_DAYS} daysDEPENDENTelasticsearch.snapshots[total]01wTotal snapshots in the last 3 daysES HealthBooleanREGEX^([0-9]+),
\0RTRIM,elasticsearch.snapshots[stats]
- Elasticsearch versionDEPENDENTelasticsearch.version01w0TEXTES General statusJSONPATH$.nodes.versions[0]elasticsearch.cluster[all,stats]
- Elasticsearch port listenSIMPLEnet.tcp.service[tcp,{$ELASTICSEARCH_HOST},{$ELASTICSEARCH_PORT}]1wES HealthService statePOST{sum(#3)}=0RECOVERY_EXPRESSION{avg(#3)}=1[ {HOST.NAME} ] - Elasticsearch Port is unavailableAVERAGE
ES Indexes discoveryHTTP_AGENTelasticsearch.discovery.indexes{#ELASTICSEARCH_INDEX}^(?!\s*$).+A7dhttps://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.htmlElasticsearch index full info [ {#ELASTICSEARCH_INDEX} ]HTTP_AGENTelasticsearch.index[all,{#ELASTICSEARCH_INDEX}]1d0TEXTES General statusES IndexesJAVASCRIPTvar lld = [];
var data = value.split(" ");
var row = {};
row["ELASTICSEARCH_INDEX_HEALTH"] = data[0];
row["ELASTICSEARCH_INDEX_STATUS"] = data[1];
row["ELASTICSEARCH_INDEX_NAME"] = data[2];
row["ELASTICSEARCH_INDEX_UUID"] = data[3];
row["ELASTICSEARCH_INDEX_DOCSCOUNT"] = data[6];
row["ELASTICSEARCH_INDEX_DOCSDELETED"] = data[7];
row["ELASTICSEARCH_INDEX_SIZE"] = data[8];
row["ELASTICSEARCH_INDEX_PSIZE"] = data[9];
lld.push(row);
return JSON.stringify(lld);REGEX.*
{"data":\0REGEX.*
\0}10shttp://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/indices/{#ELASTICSEARCH_INDEX}bytesbElasticsearch index documents [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[documents,{#ELASTICSEARCH_INDEX}]07dES IndexesJSONPATH$.data[0].ELASTICSEARCH_INDEX_DOCSCOUNTelasticsearch.index[all,{#ELASTICSEARCH_INDEX}]Elasticsearch index documents deleted [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[documentsdeleted,{#ELASTICSEARCH_INDEX}]07dES IndexesJSONPATH$.data[0].ELASTICSEARCH_INDEX_DOCSDELETEDelasticsearch.index[all,{#ELASTICSEARCH_INDEX}]Elasticsearch index health [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[health,{#ELASTICSEARCH_INDEX}]07d0TEXTES IndexesJSONPATH$.data[0].ELASTICSEARCH_INDEX_HEALTHelasticsearch.index[all,{#ELASTICSEARCH_INDEX}]Elasticsearch index latency [ {#ELASTICSEARCH_INDEX} ] (ms)DEPENDENTelasticsearch.index[latency,{#ELASTICSEARCH_INDEX}]07dFLOATmsAverage time that takes a shard to complete a search operation.
Specific to an index.ES Key performance indicatorsES IndexesJSONPATH$.indices.{#ELASTICSEARCH_INDEX}.indexing.index_time_in_millisCHANGE_PER_SECONDelasticsearch.indices[all,stats]Elasticsearch queries [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[queries,{#ELASTICSEARCH_INDEX}]07dFLOATNumber of queries on this indexES Key performance indicatorsES IndexesJSONPATH$.indices.{#ELASTICSEARCH_INDEX}.search.query_totalSIMPLE_CHANGEelasticsearch.indices[all,stats]Elasticsearch index query latency [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[querylatency,{#ELASTICSEARCH_INDEX}]07dFLOATmsSearch time in this indexES Key performance indicatorsES IndexesJSONPATH$.indices.{#ELASTICSEARCH_INDEX}.search.query_time_in_millisCHANGE_PER_SECONDelasticsearch.indices[all,stats]Elasticsearch index size [ {#ELASTICSEARCH_INDEX} ]DEPENDENTelasticsearch.index[size,{#ELASTICSEARCH_INDEX}]07dbES IndexesJSONPATH$.data[0].ELASTICSEARCH_INDEX_SIZEelasticsearch.index[all,{#ELASTICSEARCH_INDEX}]10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/indiceshindexJAVASCRIPT
var lld = [];
var lines = value.split("\n");
var lines_num = lines.length;
for (var i = 0; i < lines_num; i++)
{
var row = {};
row["{#ELASTICSEARCH_INDEX}"] = lines[i];
lld.push(row);
}
return JSON.stringify(lld);REGEX.*
{"data":\0REGEX.*
\0}ES Node discoveryHTTP_AGENTelasticsearch.discovery.nodes{#ELASTICSEARCH_NODE}^(?!\s*$).+A7dhttps://www.elastic.co/guide/en/elasticsearch/reference/current/cat-nodes.htmlElasticsearch full allocation info [ {#ELASTICSEARCH_NODE} ]HTTP_AGENTelasticsearch.node.disk[all,{#ELASTICSEARCH_NODE}]1d0TEXTProvides a snapshot of the number of shards allocated to each data node and their disk space.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-allocation.htmlES Nodes1mhttp://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/allocation/{#ELASTICSEARCH_NODE}bytesbElasticsearch node [ {#ELASTICSEARCH_NODE} ] is master?HTTP_AGENTelasticsearch.node.master[{#ELASTICSEARCH_NODE}]1d0TEXTGet information about master node.
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-master.htmlES NodesREGEX{#ELASTICSEARCH_NODE}
1CUSTOM_VALUE01mhttp://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/masterhnodeElasticsearch full stats for node [ {#ELASTICSEARCH_NODE} ]HTTP_AGENTelasticsearch.node.query_cache[all,{#ELASTICSEARCH_NODE}]1d0TEXTFull stats for specific node as seen on
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/cluster-nodes-stats.htmlES General statusES Nodes1mhttp://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_nodes/{#ELASTICSEARCH_NODE}/statsElasticsearch CPU Load (1min) [ {#ELASTICSEARCH_NODE} ]DEPENDENTelasticsearch.node[cpu1m,{#ELASTICSEARCH_NODE}]07d0TEXTES HealthES NodesJSONPATH$.nodeselasticsearch.node.query_cache[all,{#ELASTICSEARCH_NODE}]Elasticsearch Storage Total [ {#ELASTICSEARCH_NODE} ]DEPENDENTelasticsearch.node[disk,{#ELASTICSEARCH_NODE},total]07dbhttps://www.elastic.co/guide/en/elasticsearch/reference/current/cat-allocation.htmlES NodesREGEX(?:(\d+)( )(\d+)( )((\d+|x)\.))
\0REGEX^([0-9]+)
\0elasticsearch.node.disk[all,{#ELASTICSEARCH_NODE}]Elasticsearch Storage Used (in %) [ {#ELASTICSEARCH_NODE} ]DEPENDENTelasticsearch.node[disk,{#ELASTICSEARCH_NODE},usedp]07dFLOAT%https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-allocation.htmlES NodesREGEX(?:(\d+)( )(\d+)( )((\d+|x)\.))
\0REGEX(( )[0-9]+)
\0elasticsearch.node.disk[all,{#ELASTICSEARCH_NODE}]Elasticsearch Storage Used [ {#ELASTICSEARCH_NODE} ]DEPENDENTelasticsearch.node[disk,{#ELASTICSEARCH_NODE},used]07dFLOATbhttps://www.elastic.co/guide/en/elasticsearch/reference/current/cat-allocation.htmlES NodesREGEX(?:( )(\d+)( )(\d+))
\0REGEX([0-9]+)$
\0elasticsearch.node.disk[all,{#ELASTICSEARCH_NODE}]10s{$ELASTICSEARCH_PROTOCOL}://{$ELASTICSEARCH_HOST}:{$ELASTICSEARCH_PORT}/_cat/nodeshnameJAVASCRIPT
var lld = [];
var lines = value.split("\n");
var lines_num = lines.length;
for (var i = 0; i < lines_num; i++)
{
var row = {};
row["{#ELASTICSEARCH_NODE}"] = lines[i];
lld.push(row);
}
return JSON.stringify(lld);REGEX.*
{"data":\0REGEX.*
\0}{$ELASTICSEARCH_HEAPMEM_P2}75{$ELASTICSEARCH_HOST}localhost{$ELASTICSEARCH_PORT}9200{$ELASTICSEARCH_PROTOCOL}http{$ELASTICSEARCH_SNAPSHOT}cs-automated-enc{$ELASTICSEARCH_SNAPSHOTP_DAYS}3BooleanFalse0false0True1true1Service state0Down1Up