Zabbix export version: 5.0
Date: 2021-11-21T22:05:45Z
Groups: Kube/Nodes, Kube/Pods, Templates, Templates/Kubernetes, Templates/Operating systems
Template: Kube by Prom API
## Description
This template works out of the box as soon as Prometheus (Prometheus-operator) is available inside your cluster; it does not require any Zabbix agent installation or configuration. It allows external monitoring of the Kubernetes cluster through ingress, without any NodePort declaration. It uses the Prometheus API to create a Zabbix host for each pod available inside the Kubernetes cluster. {$PROM.API.URL} must contain the Prometheus entry point into your Kubernetes cluster. Zabbix pod hosts are created with the "Template Kube Pod by Prom API" template by default.
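
As a rough illustration of how a discovery request is formed, the sketch below joins a Prometheus API base URL with the `query` endpoint and the `kubelet_node_name` expression that node discovery in this template uses. The helper name `buildQueryUrl` and the example host are assumptions made for illustration; Zabbix assembles the real request itself from the item's URL and query fields.

```javascript
// Hypothetical helper: joins a Prometheus API base URL with the "query"
// endpoint and a PromQL expression.
function buildQueryUrl(apiUrl, promql) {
    // Drop trailing slashes so a base such as ".../api/v1/" does not
    // produce ".../api/v1//query".
    var base = apiUrl.replace(/\/+$/, '');
    return base + '/query?query=' + encodeURIComponent(promql);
}

// Example base URL (an assumption; use the value of {$PROM.API.URL}).
var url = buildQueryUrl('http://prometheus.example.lan:8080/api/v1/', 'kubelet_node_name');
console.log(url);
```

Opening the resulting URL from the Zabbix server is a quick way to check that the Prometheus entry point is reachable before importing the template.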
## Overview
### Description
zabbix-kube-prom is a batch of Zabbix LLD templates for Zabbix server.
It is used for external Kubernetes monitoring by Zabbix via Prometheus API.
### Installation
1. Install [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) into the Kubernetes cluster.
2. Import the global Zabbix template (zabbix-kube-prom.xml) into your Zabbix server.
3. Create or import a host identifying the Kubernetes cluster where Prometheus is deployed.
4. Let LLD create the discovered nodes as new "Zabbix hosts".
5. Let LLD create the discovered pods as new "Virtual Zabbix hosts".
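
The node discovery in steps 4 and 5 relies on a small JavaScript preprocessing step that strips the port from each `instance` label, so that only the node IP remains. The sketch below replays that step on made-up data; the sample label values and the function name `stripInstancePort` are assumptions for illustration, not part of the template.

```javascript
// Sample of what the JSONPath step might pass on: one "metric" object per
// node. The label values are made up for illustration.
var value = JSON.stringify([
    { __name__: 'kubelet_node_name', instance: '10.0.0.11:10250', node: 'worker-1' },
    { __name__: 'kubelet_node_name', instance: '10.0.0.12:10250', node: 'worker-2' }
]);

// Same idea as the template's JavaScript step: drop the port from
// "instance" so that only the node IP remains.
function stripInstancePort(value) {
    return JSON.stringify(JSON.parse(value).map(function (metric) {
        metric.instance = metric.instance.split(':')[0];
        return metric;
    }));
}

// LLD then maps {#NODE.IP} to $.instance and {#NODE.NAME} to $.node.
var rows = JSON.parse(stripInstancePort(value));
console.log(rows.map(function (m) { return m.node + ' -> ' + m.instance; }));
```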
### Templates
The global export (zabbix-kube-prom.xml) contains the following templates:
| Templates | Description |
| --- | --- |
| Template Kube by Prom API | Creates a Zabbix host for each pod and node discovered. |
| Template Kube Node by Prom API | Template applied to the created host (node). |
| Template Kube Pod by Prom API | Template applied to the created host (pod). |
### Licenses
| Template | License |
| --- | --- |
| Template OS Linux by Prom | *GNU General Public License v2.0 or later*, [Copyright (C) 2001-2021 Zabbix SIA](https://github.com/zabbix/zabbix/blob/master/README) |
| Template Kube by Prom API, Template Kube Node by Prom API, Template Kube Pod by Prom API | *GNU General Public License v3.0*, [Copyright (C) 2021 Diagnostica Stago](https://www.stago.com/) |
---
## Author
Laurent Marchelli
TemplatesTemplates/KubernetesKube nodeHTTP_AGENTprom.node.discoveryAND{#NODE.IP}{$PROM.NODE.IP.MATCHES}A{#NODE.IP}{$PROM.NODE.IP.NOT_MATCHES}NOT_MATCHES_REGEXB{#NODE.NAME}{$PROM.NODE.NAME.NOT_MATCHES}NOT_MATCHES_REGEXC{#NODE.IP}{#NODE.NAME}Kube/NodesKube Node by Prom API{$PROM.API.URL}/queryquerykubelet_node_name{#NODE.IP}$.instance{#NODE.NAME}$.nodeJSONPATH$.data.result[?(@.metric.node=~'{$PROM.NODE.NAME.MATCHES}')].metricJAVASCRIPTreturn JSON.stringify(JSON.parse(value).map(function(metric){
metric.instance=metric.instance.split(":")[0]; return metric}))Kube podHTTP_AGENTprom.pod.discoveryAND{#NAMESPACE}{$PROM.POD.NAMESPACE.NOT_MATCHES}NOT_MATCHES_REGEXA{#SERVICE}{$PROM.POD.SERVICE.NOT_MATCHES}NOT_MATCHES_REGEXC{#PODNAME}{$PROM.POD.NAME.NOT_MATCHES}NOT_MATCHES_REGEXB{#PODNAME}{#PODNAME}Kube/PodsKube Pod by Prom API{$PROM.API.URL}/queryquerykube_pod_created{#NAMESPACE}$.namespace{#PODNAME}$.pod{#SERVICE}$.serviceJSONPATH$.data.result[?(@.metric.namespace=~'{$PROM.POD.NAMESPACE.MATCHES}' && @.metric.service=~'{$PROM.POD.SERVICE.MATCHES}' && @.metric.pod=~'{$PROM.POD.NAME.MATCHES}')].metric{$CPU.UTIL.CRIT}90{$IF.ERRORS.WARN}2{$IF.UTIL.MAX}90{$IFCONTROL}1{$KERNEL.MAXFILES.MIN}256{$LOAD_AVG_PER_CPU.MAX.WARN}1.5Load per CPU considered sustainable. Tune if needed.{$MEMORY.AVAILABLE.MIN}20M{$MEMORY.UTIL.MAX}90{$NET.IF.IFALIAS.MATCHES}^.*${$NET.IF.IFALIAS.NOT_MATCHES}CHANGE_IF_NEEDED{$NET.IF.IFNAME.MATCHES}^.*${$NET.IF.IFNAME.NOT_MATCHES}(^Software Loopback Interface|^NULL[0-9.]*$|^[Ll]o[0-9.]*$|^[Ss]ystem$|^Nu[0-9.]*$|^veth[0-9a-z]+$|docker[0-9]+|br-[a-z0-9]{12})Filter out loopbacks, nulls, docker veth links and docker0 bridge by default{$NET.IF.IFOPERSTATUS.MATCHES}^.*${$NET.IF.IFOPERSTATUS.NOT_MATCHES}^7$Ignore notPresent(7){$NODE_EXPORTER_PORT}9100TCP Port node_exporter is listening on.{$PROM.API.URL}http://prometheus.k8scluster.nuci7.lan:8080/api/v1/Prometheus API URL. Can be overridden on the host or linked template level.{$PROM.NODE.IP.MATCHES}^.*$This macro is used in node discovery. Can be overridden on the host or linked template level.{$PROM.NODE.IP.NOT_MATCHES}CHANGE_IF_NEEDEDThis macro is used in node discovery. Can be overridden on the host or linked template level.{$PROM.NODE.NAME.MATCHES}^.*$This macro is used in node discovery. Can be overridden on the host or linked template level.{$PROM.NODE.NAME.NOT_MATCHES}CHANGE_IF_NEEDEDThis macro is used in node discovery. 
Can be overridden on the host or linked template level.{$PROM.POD.DEVICE.MATCHES}^.*$Device regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.DEVICE.NOT_MATCHES}CHANGE_IF_NEEDEDDevice interface regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.IFNAME.MATCHES}^.*$Network interface regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.IFNAME.NOT_MATCHES}CHANGE_IF_NEEDEDNetwork interface regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.NAME.MATCHES}^.*$This macro is used in pod discovery. Can be overridden on the host or linked template level.{$PROM.POD.NAME.NOT_MATCHES}CHANGE_IF_NEEDEDThis macro is used in pod discovery. Can be overridden on the host or linked template level.{$PROM.POD.NAMESPACE.MATCHES}^.*$This macro is used in pod discovery. Can be overridden on the host or linked template level.{$PROM.POD.NAMESPACE.NOT_MATCHES}CHANGE_IF_NEEDEDThis macro is used in pod discovery. Can be overridden on the host or linked template level.{$PROM.POD.SERVICE.MATCHES}^.*$This macro is used in pod discovery. Can be overridden on the host or linked template level.{$PROM.POD.SERVICE.NOT_MATCHES}CHANGE_IF_NEEDEDThis macro is used in pod discovery. Can be overridden on the host or linked template level.{$SWAP.PFREE.MIN.WARN}50{$SYSTEM.FUZZYTIME.MAX}60{$VFS.DEV.DEVNAME.MATCHES}.+This macro is used in block devices discovery. Can be overridden on the host or linked template level{$VFS.DEV.DEVNAME.NOT_MATCHES}^(loop[0-9]*|sd[a-z][0-9]+|nbd[0-9]+|sr[0-9]+|fd[0-9]+|dm-[0-9]+|ram[0-9]+|ploop[a-z0-9]+|md[0-9]*|hcp[0-9]*|zram[0-9]*)This macro is used in block devices discovery. 
Can be overridden on the host or linked template level{$VFS.DEV.READ.AWAIT.WARN}20Disk read average response time (in ms) before the trigger would fire{$VFS.DEV.WRITE.AWAIT.WARN}20Disk write average response time (in ms) before the trigger would fire{$VFS.FS.FSDEVICE.MATCHES}^.+$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSDEVICE.NOT_MATCHES}^\s$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSNAME.MATCHES}.+This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSNAME.NOT_MATCHES}^(/dev|/sys|/run|/proc|.+/shm$)This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSTYPE.MATCHES}^(btrfs|ext2|ext3|ext4|reiser|xfs|ffs|ufs|jfs|jfs2|vxfs|hfs|apfs|refs|ntfs|fat32|zfs)$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSTYPE.NOT_MATCHES}^\s$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.INODE.PFREE.MIN.CRIT}10{$VFS.FS.INODE.PFREE.MIN.WARN}20{$VFS.FS.PUSED.MAX.CRIT}90{$VFS.FS.PUSED.MAX.WARN}80Kube Node by Prom APIKube Node by Prom API## Description
This template works out of the box as soon as Prometheus (Prometheus-operator) is available inside your cluster; it does not require any Zabbix agent installation or configuration. It allows external monitoring of the Kubernetes cluster through ingress, without any NodePort declaration. It uses the Prometheus API to create a Zabbix host for each pod available inside the Kubernetes cluster. {$PROM.API.URL} must contain the Prometheus entry point into your Kubernetes cluster. Zabbix pod hosts are created with the "Template Kube Pod by Prom API" template by default.
## Description
Official Linux template using node exporter.

Known issues:

- node_exporter v0.16.0 renamed many metrics. CPU utilization for the 'guest' and 'guest_nice' metrics is not supported in this template with node_exporter < 0.16. Disk IO metrics are not supported. Other metrics are provided on a 'best effort' basis. See https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 for details. Affected versions: below 0.16.0.
- The metric node_network_info with label 'device' cannot be found, so network discovery is not possible. Affected versions: below 0.18.

You can discuss this template or leave feedback on our forum: https://www.zabbix.com/forum/zabbix-suggestions-and-feedback/387225-discussion-thread-for-official-zabbix-template-for-linux

Template tooling version used: 0.34
Templates/Operating systemsCPUGeneralInventoryMemoryMonitoring agentNetwork interfacesStatusStorageZabbix raw items- Version of node_exporter runningDEPENDENTagent.version[node_exporter]07d0CHARMonitoring agentJSONPATH$[?(@.metric['__name__']=='node_uname_info')].metric.versionJAVASCRIPTreturn JSON.parse(value)[0];DISCARD_UNCHANGED_HEARTBEAT1dnode_exporter.get
- Number of open file descriptorsDEPENDENTfd.open[node_exporter]07dFLOATGeneralJSONPATH$[?(@.metric['__name__']=='node_filefd_allocated')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Maximum number of open file descriptorsDEPENDENTkernel.maxfiles[node_exporter]07dFLOATIt could be increased by using the sysctl utility or by modifying the file /etc/sysctl.conf.GeneralJSONPATH$[?(@.metric['__name__']=='node_filefd_maximum')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)DISCARD_UNCHANGED_HEARTBEAT1dnode_exporter.get{last()}<{$KERNEL.MAXFILES.MIN}Configured max number of open filedescriptors is too low (< {$KERNEL.MAXFILES.MIN})INFORunning out of file descriptors (less than < 20% free){Kube Node by Prom API:fd.open[node_exporter].last()}/{Kube Node by Prom API:kernel.maxfiles[node_exporter].last()}*100>80
- Get node_exporter metricsHTTP_AGENTnode_exporter.get1h0TEXTZabbix raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~'^node_.*$',instance=~'^{HOST.HOST}:{$NODE_EXPORTER_PORT}$',container='node-exporter'}) by (__name__,cpu,mode,device,ifalias,operstate,filesystem,mountpoint,fstype,nodename,machine,sysname,release,version){nodata(30m)}=1node_exporter is not available (or no data for 30m)WARNINGFailed to fetch system metrics from node_exporter in time.YES
- System boot timeDEPENDENTsystem.boottime[node_exporter]07dFLOATunixtimeGeneralJSONPATH$[?(@.metric['__name__']=='node_boot_time_seconds')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- CPU guest timeDEPENDENTsystem.cpu.guest[node_exporter]07dFLOAT%Guest time (time spent running a virtual CPU for a guest operating system)CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_guest_seconds_total' && @.metric.mode=='user')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU guest nice timeDEPENDENTsystem.cpu.guest_nice[node_exporter]07dFLOAT%Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel)CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_guest_seconds_total' && @.metric.mode=='nice')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;
CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU idle timeDEPENDENTsystem.cpu.idle[node_exporter]07dFLOAT%The time the CPU has spent doing nothing.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='idle')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU interrupt timeDEPENDENTsystem.cpu.interrupt[node_exporter]07dFLOAT%The amount of time the CPU has been servicing hardware interrupts.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='irq')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- Interrupts per secondDEPENDENTsystem.cpu.intr[node_exporter]07dFLOATCPUJSONPATH$[?(@.metric['__name__']=='node_intr_total')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get
- CPU iowait timeDEPENDENTsystem.cpu.iowait[node_exporter]07dFLOAT%Amount of time the CPU has been waiting for I/O to complete.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='iowait')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- Load average (1m avg)DEPENDENTsystem.cpu.load.avg1[node_exporter]07dFLOATCPUJSONPATH$[?(@.metric['__name__']=='node_load1')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Load average (5m avg)DEPENDENTsystem.cpu.load.avg5[node_exporter]07dFLOATCPUJSONPATH$[?(@.metric['__name__']=='node_load5')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Load average (15m avg)DEPENDENTsystem.cpu.load.avg15[node_exporter]07dFLOATCPUJSONPATH$[?(@.metric['__name__']=='node_load15')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- CPU nice timeDEPENDENTsystem.cpu.nice[node_exporter]07dFLOAT%The time the CPU has spent running users' processes that have been niced.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='nice')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- Number of CPUsDEPENDENTsystem.cpu.num[node_exporter]07dCPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric['mode']=='idle')].value[1]JAVASCRIPT//count the number of cores
return JSON.parse(value).length
node_exporter.get
- CPU softirq timeDEPENDENTsystem.cpu.softirq[node_exporter]07dFLOAT%The amount of time the CPU has been servicing software interrupts.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='softirq')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU steal timeDEPENDENTsystem.cpu.steal[node_exporter]07dFLOAT%The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='steal')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- Context switches per secondDEPENDENTsystem.cpu.switches[node_exporter]07dFLOATCPUJSONPATH$[?(@.metric['__name__']=='node_context_switches_total')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get
- CPU system timeDEPENDENTsystem.cpu.system[node_exporter]07dFLOAT%The time the CPU has spent running the kernel and its processes.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='system')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU user timeDEPENDENTsystem.cpu.user[node_exporter]07dFLOAT%The time the CPU has spent running users' processes that are not niced.CPUJSONPATH$[?(@.metric['__name__']=='node_cpu_seconds_total' && @.metric.mode=='user')].value[1]JAVASCRIPT//calculates average, all cpu utilization
var valueArr = JSON.parse(value);
return valueArr.reduce(function(acc,obj){
return acc + parseFloat(obj)
},0)/valueArr.length;CHANGE_PER_SECONDMULTIPLIER100node_exporter.get
- CPU utilizationDEPENDENTsystem.cpu.util[node_exporter]07dFLOAT%CPU utilization in %CPUJAVASCRIPT//Calculate utilization
return (100 - value)system.cpu.idle[node_exporter]{min(5m)}>{$CPU.UTIL.CRIT}High CPU utilization (over {$CPU.UTIL.CRIT}% for 5m)Current utilization: {ITEM.LASTVALUE1}WARNINGCPU utilization is too high. The system might be slow to respond.Load average is too high (per CPU load over {$LOAD_AVG_PER_CPU.MAX.WARN} for 5m){Kube Node by Prom API:system.cpu.load.avg1[node_exporter].min(5m)}/{Kube Node by Prom API:system.cpu.num[node_exporter].last()}>{$LOAD_AVG_PER_CPU.MAX.WARN}
and {Kube Node by Prom API:system.cpu.load.avg5[node_exporter].last()}>0
and {Kube Node by Prom API:system.cpu.load.avg15[node_exporter].last()}>0
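
The CPU items above share one preprocessing chain: a JSONPath step selects the per-CPU counters, the JavaScript step averages them, "Change per second" turns the counter into a rate, and a multiplier of 100 yields a percentage. CPU utilization is then 100 minus the idle percentage. The sketch below replays that chain on two made-up samples; `percentPerSecond` is a hypothetical stand-in for the two built-in Zabbix steps, not something the template defines.

```javascript
// Average the per-CPU counter values, as the template's JavaScript step does.
function averageCounters(value) {
    var valueArr = JSON.parse(value);
    return valueArr.reduce(function (acc, obj) {
        return acc + parseFloat(obj);
    }, 0) / valueArr.length;
}

// Hypothetical stand-in for "Change per second" followed by a x100 multiplier.
function percentPerSecond(prev, curr, elapsedSeconds) {
    return (curr - prev) / elapsedSeconds * 100;
}

// Two made-up samples of idle seconds for a 2-CPU node, taken 60 s apart.
var before = averageCounters('["1000", "2000"]'); // 1500
var after = averageCounters('["1045", "2045"]');  // 1545
var idle = percentPerSecond(before, after, 60);   // 75
var util = 100 - idle;                            // 25
console.log('idle %:', idle, 'utilization %:', util);
```

Averaging before taking the rate means the result is a per-node figure between 0 and 100, regardless of how many CPUs the node has.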
- System descriptionDEPENDENTsystem.descr[node_exporter]02w0CHARLabeled system information as provided by the uname system call.GeneralJSONPATH$[?(@.metric['__name__']=='node_uname_info')].metricJAVASCRIPTvar info = JSON.parse(value)[0];
return info.sysname + ' version: ' + info.release + ' ' + info.version;DISCARD_UNCHANGED_HEARTBEAT1dnode_exporter.get
- System local timeDEPENDENTsystem.localtime[node_exporter]07dFLOATunixtimeSystem local time of the host.GeneralJSONPATH$[?(@.metric['__name__']=='node_time_seconds')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get{fuzzytime({$SYSTEM.FUZZYTIME.MAX})}=0System time is out of sync (diff with Zabbix server > {$SYSTEM.FUZZYTIME.MAX}s)WARNINGThe host system time is different from the Zabbix server time.YES
- System nameDEPENDENTsystem.name[node_exporter]02w0CHARSystem host name.NAMEGeneralJSONPATH$[?(@.metric['__name__']=='node_uname_info')].metric.nodenameJAVASCRIPTreturn JSON.parse(value)[0];DISCARD_UNCHANGED_HEARTBEAT1dnode_exporter.get{diff()}=1 and {strlen()}>0System name has changed (new name: {ITEM.VALUE})INFOSystem name has changed. Ack to close.YES
- Operating system architectureDEPENDENTsystem.sw.arch[node_exporter]02w0CHAROperating system architecture of the host.InventoryJSONPATH$[?(@.metric['__name__']=='node_uname_info')].metric.machineJAVASCRIPTreturn JSON.parse(value)[0];DISCARD_UNCHANGED_HEARTBEAT1dnode_exporter.get
- Operating systemDEPENDENTsystem.sw.os[node_exporter]02w0CHAROSInventoryDISCARD_UNCHANGED_HEARTBEAT1dsystem.descr[node_exporter]{diff()}=1 and {strlen()}>0NONEOperating system description has changedINFOOperating system description has changed. Possible reasons that system has been updated or replaced. Ack to close.YESSystem name has changed (new name: {ITEM.VALUE}){Kube Node by Prom API:system.name[node_exporter].diff()}=1 and {Kube Node by Prom API:system.name[node_exporter].strlen()}>0
- Free swap spaceDEPENDENTsystem.swap.free[node_exporter]07dFLOATBThe free space of swap volume/file in bytes.MemoryJSONPATH$[?(@.metric['__name__']=='node_memory_SwapFree_bytes')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Free swap space in %CALCULATEDsystem.swap.pfree[node_exporter]7dFLOAT%last("system.swap.free[node_exporter]")/last("system.swap.total[node_exporter]")*100The free space of swap volume/file in percent.Memory
- Total swap spaceDEPENDENTsystem.swap.total[node_exporter]07dFLOATBThe total space of swap volume/file in bytes.MemoryJSONPATH$[?(@.metric['__name__']=='node_memory_SwapTotal_bytes')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- System uptimeDEPENDENTsystem.uptime[node_exporter]02w0uptimeSystem uptime in 'N days, hh:mm:ss' format.StatusJSONPATH$[?(@.metric['__name__']=='node_boot_time_seconds')].value[1]JAVASCRIPT//use boottime to calculate uptime
return (Math.floor(Date.now()/1000)-Number(JSON.parse(value)[0]));node_exporter.get{last()}<10m{HOST.NAME} has been restarted (uptime < 10m)WARNINGThe device uptime is less than 10 minutesYES
- Available memoryDEPENDENTvm.memory.available[node_exporter]07dFLOATBAvailable memory, in Linux, available = free + buffers + cache. On other platforms calculation may vary. See also: https://www.zabbix.com/documentation/current/manual/appendix/items/vm.memory.size_paramsMemoryJSONPATH$[?(@.metric['__name__']=='node_memory_MemAvailable_bytes')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Total memoryDEPENDENTvm.memory.total[node_exporter]07dFLOATBTotal memory in BytesMemoryJSONPATH$[?(@.metric['__name__']=='node_memory_MemTotal_bytes')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get
- Memory utilizationCALCULATEDvm.memory.util[node_exporter]7dFLOAT%(last("vm.memory.total[node_exporter]")-last("vm.memory.available[node_exporter]"))/last("vm.memory.total[node_exporter]")*100Memory used percentage is calculated as (total-available)/total*100Memory{min(5m)}>{$MEMORY.UTIL.MAX}High memory utilization ( >{$MEMORY.UTIL.MAX}% for 5m)AVERAGEThe system is running out of free memory.Lack of available memory ( < {$MEMORY.AVAILABLE.MIN} of {ITEM.VALUE2}){Kube Node by Prom API:vm.memory.available[node_exporter].min(5m)}<{$MEMORY.AVAILABLE.MIN} and {Kube Node by Prom API:vm.memory.total[node_exporter].last()}>0
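
The two calculated items above, `system.swap.pfree` and `vm.memory.util`, are plain arithmetic over the last values of other items. The sketch below works through both formulas with made-up byte counts; the function names are assumptions for illustration.

```javascript
// (total - available) / total * 100, as in the vm.memory.util formula.
function memoryUtil(total, available) {
    return (total - available) / total * 100;
}

// free / total * 100, as in the system.swap.pfree formula.
function swapPfree(free, total) {
    return free / total * 100;
}

// Made-up byte counts: 8 GiB total with 2 GiB available, 1 of 2 GiB swap free.
var GiB = 1024 * 1024 * 1024;
console.log(memoryUtil(8 * GiB, 2 * GiB)); // 75
console.log(swapPfree(1 * GiB, 2 * GiB));  // 50
```

Note that the swap formula has no defined result when the swap total is 0, which is worth keeping in mind on nodes that run with swap disabled.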
Network interface discoveryDEPENDENTnet.if.discovery[node_exporter]0AND{#IFNAME}{$NET.IF.IFNAME.NOT_MATCHES}NOT_MATCHES_REGEXB{#IFALIAS}{$NET.IF.IFALIAS.NOT_MATCHES}NOT_MATCHES_REGEXA{#IFOPERSTATUS}{$NET.IF.IFOPERSTATUS.NOT_MATCHES}NOT_MATCHES_REGEXCDiscovery of network interfaces. Requires node_exporter v0.18 and up.Interface {#IFNAME}({#IFALIAS}): Inbound packets discardedDEPENDENTnet.if.in.discards[node_exporter,"{#IFNAME}"]07dFLOATInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_receive_drop_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.getInterface {#IFNAME}({#IFALIAS}): Inbound packets with errorsDEPENDENTnet.if.in.errors[node_exporter,"{#IFNAME}"]07dFLOATInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_receive_errs_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.getInterface {#IFNAME}({#IFALIAS}): Bits receivedDEPENDENTnet.if.in[node_exporter,"{#IFNAME}"]07dFLOATbpsInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_receive_bytes_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDMULTIPLIER8node_exporter.getInterface {#IFNAME}({#IFALIAS}): Outbound packets discardedDEPENDENTnet.if.out.discards[node_exporter,"{#IFNAME}"]07dFLOATInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_transmit_drop_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.getInterface {#IFNAME}({#IFALIAS}): Outbound packets with errorsDEPENDENTnet.if.out.errors[node_exporter"{#IFNAME}"]07dFLOATInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_transmit_errs_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn 
JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.getInterface {#IFNAME}({#IFALIAS}): Bits sentDEPENDENTnet.if.out[node_exporter,"{#IFNAME}"]07dFLOATbpsInterface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_transmit_bytes_total' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDMULTIPLIER8node_exporter.getInterface {#IFNAME}({#IFALIAS}): SpeedDEPENDENTnet.if.speed[node_exporter,"{#IFNAME}"]07d0bpsSets value to 0 if metric is missing in node_exporter output.Interface {#IFNAME}({#IFALIAS})JSONPATH$[?(@.metric['__name__']=='node_network_speed_bytes' && @.metric.device=='{#IFNAME}')].value[1]CUSTOM_VALUE["0"]JAVASCRIPTreturn JSON.parse(value).map(Number)MULTIPLIER8node_exporter.getInterface {#IFNAME}({#IFALIAS}): Operational statusDEPENDENTnet.if.status[node_exporter,"{#IFNAME}"]07d0Indicates the interface RFC2863 operational state as a string.
Possible values are:"unknown", "notpresent", "down", "lowerlayerdown", "testing","dormant", "up".
Reference: https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-netInterface {#IFNAME}({#IFALIAS})IF-MIB::ifOperStatusJSONPATH$[?(@.metric['__name__']=='node_network_info' && @.metric.device=='{#IFNAME}')].metric.operstateJAVASCRIPTvar newvalue;
switch(JSON.parse(value)[0]) {
case "up":
newvalue = 1;
break;
case "down":
newvalue = 2;
break;
case "testing":
newvalue = 4;
break;
case "unknown":
newvalue = 5;
break;
case "dormant":
newvalue = 6;
break;
case "notpresent":
newvalue = 7;
break;
default:
newvalue = "Problem parsing interface operstate in JS";
}
return newvalue;node_exporter.get{$IFCONTROL:"{#IFNAME}"}=1 and ({last()}=2 and {diff()}=1)RECOVERY_EXPRESSION{last()}<>2Interface {#IFNAME}({#IFALIAS}): Link downCurrent state: {ITEM.LASTVALUE1}AVERAGEThis trigger expression works as follows:
1. Can be triggered if the operational status is down.
2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine the context macro to 0, which marks this interface as not important. No new trigger will be fired if this interface is down.
3. {TEMPLATE_NAME:METRIC.diff()}=1 - the trigger fires only if the operational status was up(1) sometime before (so it does not fire for 'eternal off' interfaces).
WARNING: if closed manually - won't fire again on next poll, because of .diff.YESInterface {#IFNAME}({#IFALIAS}): Interface typeDEPENDENTnet.if.type[node_exporter,"{#IFNAME}"]07d0node_network_protocol_type protocol_type value of /sys/class/net/<iface>.Interface {#IFNAME}({#IFALIAS})Linux::Interface protocol typesJSONPATH$[?(@.metric['__name__']=='node_network_protocol_type' && @.metric.device=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].change()}<0 and {Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()}>0
and (
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=6 or
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=7 or
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=11 or
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=62 or
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=69 or
{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=117
)
and
({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2)RECOVERY_EXPRESSION({Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].change()}>0 and {Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].prev()}>0) or
({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2)Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was beforeCurrent reported speed: {ITEM.LASTVALUE1}INFOThis Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Ack to close.YESInterface {#IFNAME}({#IFALIAS}): Link down{$IFCONTROL:"{#IFNAME}"}=1 and ({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2 and {Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].diff()}=1){Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2{Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].change()}<0 and {Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}>0
and
({Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=6
or {Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].last()}=1)
and
({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2)RECOVERY_EXPRESSION({Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].change()}>0 and {Kube Node by Prom API:net.if.type[node_exporter,"{#IFNAME}"].prev()}>0) or
({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2)Interface {#IFNAME}({#IFALIAS}): Ethernet has changed to lower speed than it was beforeCurrent reported speed: {ITEM.LASTVALUE1}INFOThis Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Ack to close.YESInterface {#IFNAME}({#IFALIAS}): Link down{$IFCONTROL:"{#IFNAME}"}=1 and ({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2 and {Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].diff()}=1){Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2({Kube Node by Prom API:net.if.in[node_exporter,"{#IFNAME}"].avg(15m)}>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()} or
{Kube Node by Prom API:net.if.out[node_exporter,"{#IFNAME}"].avg(15m)}>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()}) and
{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()}>0RECOVERY_EXPRESSION{Kube Node by Prom API:net.if.in[node_exporter,"{#IFNAME}"].avg(15m)}<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()} and
{Kube Node by Prom API:net.if.out[node_exporter,"{#IFNAME}"].avg(15m)}<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*{Kube Node by Prom API:net.if.speed[node_exporter,"{#IFNAME}"].last()}Interface {#IFNAME}({#IFALIAS}): High bandwidth usage ( > {$IF.UTIL.MAX:"{#IFNAME}"}% )In: {ITEM.LASTVALUE1}, out: {ITEM.LASTVALUE3}, speed: {ITEM.LASTVALUE2}WARNINGThe network interface utilization is close to its estimated maximum bandwidth.YESInterface {#IFNAME}({#IFALIAS}): Link down{$IFCONTROL:"{#IFNAME}"}=1 and ({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2 and {Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].diff()}=1){Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2{Kube Node by Prom API:net.if.in.errors[node_exporter,"{#IFNAME}"].min(5m)}>{$IF.ERRORS.WARN:"{#IFNAME}"}
or {Kube Node by Prom API:net.if.out.errors[node_exporter"{#IFNAME}"].min(5m)}>{$IF.ERRORS.WARN:"{#IFNAME}"}RECOVERY_EXPRESSION{Kube Node by Prom API:net.if.in.errors[node_exporter,"{#IFNAME}"].max(5m)}<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8
and {Kube Node by Prom API:net.if.out.errors[node_exporter"{#IFNAME}"].max(5m)}<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8Interface {#IFNAME}({#IFALIAS}): High error rate ( > {$IF.ERRORS.WARN:"{#IFNAME}"} for 5m)errors in: {ITEM.LASTVALUE1}, errors out: {ITEM.LASTVALUE2}WARNINGRecovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} thresholdYESInterface {#IFNAME}({#IFALIAS}): Link down{$IFCONTROL:"{#IFNAME}"}=1 and ({Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}=2 and {Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].diff()}=1){Kube Node by Prom API:net.if.status[node_exporter,"{#IFNAME}"].last()}<>2Interface {#IFNAME}({#IFALIAS}): Network trafficGRADIENT_LINE1A7C11- Kube Node by Prom APInet.if.in[node_exporter,"{#IFNAME}"]
1BOLD_LINE2774A4- Kube Node by Prom APInet.if.out[node_exporter,"{#IFNAME}"]
2F63100RIGHT- Kube Node by Prom APInet.if.out.errors[node_exporter"{#IFNAME}"]
3A54F10RIGHT- Kube Node by Prom APInet.if.in.errors[node_exporter,"{#IFNAME}"]
4FC6EA3RIGHT- Kube Node by Prom APInet.if.out.discards[node_exporter,"{#IFNAME}"]
56C59DCRIGHT- Kube Node by Prom APInet.if.in.discards[node_exporter,"{#IFNAME}"]
node_exporter.get{#IFALIAS}$.ifalias{#IFNAME}$.device{#IFOPERSTATUS}$.operstateJSONPATH$[?(@.metric['__name__']=='node_network_info' && @.metric.device=~'{$NET.IF.IFNAME.MATCHES}' && @.metric.ifalias=~'{$NET.IF.IFALIAS.MATCHES}' && @.metric.operstate=~'{$NET.IF.IFOPERSTATUS.MATCHES}')].metricJAVASCRIPTreturn JSON.stringify(JSON.parse(value).map(
function(metric){if(!("ifalias" in metric)) {metric.ifalias=""} return metric}
))
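To make the effect of the discovery preprocessing concrete, the sketch below applies the same `ifalias` defaulting to a small, made-up list of `node_network_info` label sets. The sample data and the `fillIfAlias` wrapper name are assumptions for illustration; only the mapping logic comes from the template.

```javascript
// Wrapper around the template's JavaScript step so it can run outside Zabbix.
// "value" is the JSON text produced by the preceding JSONPath step.
function fillIfAlias(value) {
  return JSON.stringify(JSON.parse(value).map(function (metric) {
    // The LLD macro {#IFALIAS} reads $.ifalias, so give every entry that key.
    if (!("ifalias" in metric)) { metric.ifalias = ""; }
    return metric;
  }));
}

// Made-up label sets for two interfaces; the second has no ifalias label.
var sample = JSON.stringify([
  { __name__: "node_network_info", device: "eth0", ifalias: "uplink", operstate: "up" },
  { __name__: "node_network_info", device: "eth1", operstate: "down" }
]);

console.log(fillIfAlias(sample));
```

Without this step, an interface that lacks the `ifalias` label would leave `{#IFALIAS}` unresolved in the item and trigger names.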
Block devices discoveryDEPENDENTvfs.dev.discovery[node_exporter]0AND{#DEVNAME}{$VFS.DEV.DEVNAME.NOT_MATCHES}NOT_MATCHES_REGEXA{#DEVNAME}: Disk average queue size (avgqu-sz)DEPENDENTvfs.dev.queue_size[node_exporter,"{#DEVNAME}"]07dFLOATCurrent average disk queue, the number of requests outstanding on the disk at the time the performance data is collected.Disk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_io_time_weighted_seconds_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get{#DEVNAME}: Disk read request avg waiting time (r_await)CALCULATEDvfs.dev.read.await[node_exporter,"{#DEVNAME}"]7dFLOAT!ms(last("vfs.dev.read.time.rate[node_exporter,\"{#DEVNAME}\"]")/(last("vfs.dev.read.rate[node_exporter,\"{#DEVNAME}\"]")+(last("vfs.dev.read.rate[node_exporter,\"{#DEVNAME}\"]")=0)))*1000*(last("vfs.dev.read.rate[node_exporter,\"{#DEVNAME}\"]") > 0)This formula contains two boolean expressions that evaluates to 1 or 0 in order to set calculated metric to zero and to avoid division by zero exception.Disk {#DEVNAME}{#DEVNAME}: Disk read rateDEPENDENTvfs.dev.read.rate[node_exporter,"{#DEVNAME}"]07dFLOAT!r/sr/s. The number (after merges) of read requests completed per second for the device.Disk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_reads_completed_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get{#DEVNAME}: Disk read time (rate)DEPENDENTvfs.dev.read.time.rate[node_exporter,"{#DEVNAME}"]07dFLOATRate of total read time counter. 
Used in r_await calculationDisk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_read_time_seconds_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get{#DEVNAME}: Disk utilizationDEPENDENTvfs.dev.util[node_exporter,"{#DEVNAME}"]07dFLOAT%This item is the percentage of elapsed time that the selected disk drive was busy servicing read or writes requests.Disk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_io_time_seconds_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDMULTIPLIER100node_exporter.get{#DEVNAME}: Disk write request avg waiting time (w_await)CALCULATEDvfs.dev.write.await[node_exporter,"{#DEVNAME}"]7dFLOAT!ms(last("vfs.dev.write.time.rate[node_exporter,\"{#DEVNAME}\"]")/(last("vfs.dev.write.rate[node_exporter,\"{#DEVNAME}\"]")+(last("vfs.dev.write.rate[node_exporter,\"{#DEVNAME}\"]")=0)))*1000*(last("vfs.dev.write.rate[node_exporter,\"{#DEVNAME}\"]") > 0)This formula contains two boolean expressions that evaluates to 1 or 0 in order to set calculated metric to zero and to avoid division by zero exception.Disk {#DEVNAME}{#DEVNAME}: Disk write rateDEPENDENTvfs.dev.write.rate[node_exporter,"{#DEVNAME}"]07dFLOAT!w/sw/s. The number (after merges) of write requests completed per second for the device.Disk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_writes_completed_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get{#DEVNAME}: Disk write time (rate)DEPENDENTvfs.dev.write.time.rate[node_exporter,"{#DEVNAME}"]07dFLOATRate of total write time counter. 
Used in w_await calculationDisk {#DEVNAME}JSONPATH$[?(@.metric['__name__']=='node_disk_write_time_seconds_total' && @.metric.device=='{#DEVNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)CHANGE_PER_SECONDnode_exporter.get{Kube Node by Prom API:vfs.dev.read.await[node_exporter,"{#DEVNAME}"].min(15m)} > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} or {Kube Node by Prom API:vfs.dev.write.await[node_exporter,"{#DEVNAME}"].min(15m)} > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"}{#DEVNAME}: Disk read/write request responses are too high (read > {$VFS.DEV.READ.AWAIT.WARN:"{#DEVNAME}"} ms for 15m or write > {$VFS.DEV.WRITE.AWAIT.WARN:"{#DEVNAME}"} ms for 15m)WARNINGThis trigger might indicate disk {#DEVNAME} saturation.YES{#DEVNAME}: Disk average waiting time1A7C11- Kube Node by Prom APIvfs.dev.read.await[node_exporter,"{#DEVNAME}"]
1GRADIENT_LINE2774A4- Kube Node by Prom APIvfs.dev.write.await[node_exporter,"{#DEVNAME}"]
{#DEVNAME}: Disk read/write rates1A7C11- Kube Node by Prom APIvfs.dev.read.rate[node_exporter,"{#DEVNAME}"]
1GRADIENT_LINE2774A4- Kube Node by Prom APIvfs.dev.write.rate[node_exporter,"{#DEVNAME}"]
{#DEVNAME}: Disk utilization and queue1A7C11RIGHT- Kube Node by Prom APIvfs.dev.queue_size[node_exporter,"{#DEVNAME}"]
1GRADIENT_LINE2774A4- Kube Node by Prom APIvfs.dev.util[node_exporter,"{#DEVNAME}"]
node_exporter.get{#DEVNAME}$.deviceJSONPATH$[?(@.metric['__name__']=='node_disk_io_now' && @.metric.device=~'{$VFS.DEV.DEVNAME.MATCHES}')].metricMounted filesystem discoveryDEPENDENTvfs.fs.discovery[node_exporter]0AND{#FSTYPE}{$VFS.FS.FSTYPE.NOT_MATCHES}NOT_MATCHES_REGEXC{#FSNAME}{$VFS.FS.FSNAME.NOT_MATCHES}NOT_MATCHES_REGEXB{#FSDEVICE}{$VFS.FS.FSDEVICE.NOT_MATCHES}NOT_MATCHES_REGEXADiscovery of file systems of different types.{#FSNAME}: Free spaceDEPENDENTvfs.fs.free[node_exporter,"{#FSNAME}"]07dFLOATBFilesystem {#FSNAME}JSONPATH$[?(@.metric['__name__']=='node_filesystem_avail_bytes' && @.metric.device=='{#FSDEVICE}' && @.metric.fstype=='{#FSTYPE}' && @.metric.mountpoint=='{#FSNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get{#FSNAME}: Free inodes in %DEPENDENTvfs.fs.inode.pfree[node_exporter,"{#FSNAME}"]07dFLOAT%Filesystem {#FSNAME}JSONPATH$[?(@.metric['__name__']=~'node_filesystem_files.*' && @.metric.device=='{#FSDEVICE}' && @.metric.fstype=='{#FSTYPE}' && @.metric.mountpoint=='{#FSNAME}')]JAVASCRIPT//count vfs.fs.inode.pfree
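var inode_free;
var inode_total;
// Each element is one Prometheus sample; value[1] holds the sample value as a string.
JSON.parse(value).forEach(function(sample) {
    if (sample.metric['__name__'] == 'node_filesystem_files'){
        inode_total = Number(sample.value[1]);
    } else if (sample.metric['__name__'] == 'node_filesystem_files_free'){
        inode_free = Number(sample.value[1]);
    }
});
// Percentage of free inodes on the filesystem.
return (inode_free/inode_total)*100;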
node_exporter.get{min(5m)}<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}{#FSNAME}: Running out of free inodes (free < {$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}%)Free inodes: {ITEM.LASTVALUE1}AVERAGEIt may become impossible to write to disk if there are no index nodes left.
As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available.{min(5m)}<{$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"}{#FSNAME}: Running out of free inodes (free < {$VFS.FS.INODE.PFREE.MIN.WARN:"{#FSNAME}"}%)Free inodes: {ITEM.LASTVALUE1}WARNINGIt may become impossible to write to disk if there are no index nodes left.
As symptoms, 'No space left on device' or 'Disk is full' errors may be seen even though free space is available.{#FSNAME}: Running out of free inodes (free < {$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}%){Kube Node by Prom API:vfs.fs.inode.pfree[node_exporter,"{#FSNAME}"].min(5m)}<{$VFS.FS.INODE.PFREE.MIN.CRIT:"{#FSNAME}"}{#FSNAME}: Space utilizationCALCULATEDvfs.fs.pused[node_exporter,"{#FSNAME}"]7dFLOAT%(last("vfs.fs.used[node_exporter,\"{#FSNAME}\"]")/last("vfs.fs.total[node_exporter,\"{#FSNAME}\"]"))*100Space utilization in % for {#FSNAME}Filesystem {#FSNAME}{#FSNAME}: Total spaceDEPENDENTvfs.fs.total[node_exporter,"{#FSNAME}"]07dFLOATBTotal space in BytesFilesystem {#FSNAME}JSONPATH$[?(@.metric['__name__']=='node_filesystem_size_bytes' && @.metric.device=='{#FSDEVICE}' && @.metric.fstype=='{#FSTYPE}' && @.metric.mountpoint=='{#FSNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number)node_exporter.get{#FSNAME}: Used spaceCALCULATEDvfs.fs.used[node_exporter,"{#FSNAME}"]7dFLOATB(last("vfs.fs.total[node_exporter,\"{#FSNAME}\"]")-last("vfs.fs.free[node_exporter,\"{#FSNAME}\"]"))Used storage in BytesFilesystem {#FSNAME}{Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].last()}>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and
(({Kube Node by Prom API:vfs.fs.total[node_exporter,"{#FSNAME}"].last()}-{Kube Node by Prom API:vfs.fs.used[node_exporter,"{#FSNAME}"].last()})<5G or {Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].timeleft(1h,,100)}<1d){#FSNAME}: Disk space is critically low (used > {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}%)Space used: {ITEM.LASTVALUE3} of {ITEM.LASTVALUE2} ({ITEM.LASTVALUE1})AVERAGETwo conditions should match: First, space utilization should be above {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}.
Second condition should be one of the following:
- The disk free space is less than 5G.
- The disk will be full in less than 24 hours.YES{Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].last()}>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and
(({Kube Node by Prom API:vfs.fs.total[node_exporter,"{#FSNAME}"].last()}-{Kube Node by Prom API:vfs.fs.used[node_exporter,"{#FSNAME}"].last()})<10G or {Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].timeleft(1h,,100)}<1d){#FSNAME}: Disk space is low (used > {$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}%)Space used: {ITEM.LASTVALUE3} of {ITEM.LASTVALUE2} ({ITEM.LASTVALUE1})WARNINGTwo conditions should match: First, space utilization should be above {$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"}.
Second condition should be one of the following:
- The disk free space is less than 10G.
- The disk will be full in less than 24 hours.YES{#FSNAME}: Disk space is critically low (used > {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}%){Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].last()}>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and
(({Kube Node by Prom API:vfs.fs.total[node_exporter,"{#FSNAME}"].last()}-{Kube Node by Prom API:vfs.fs.used[node_exporter,"{#FSNAME}"].last()})<5G or {Kube Node by Prom API:vfs.fs.pused[node_exporter,"{#FSNAME}"].timeleft(1h,,100)}<1d){#FSNAME}: Disk space usage600340PIEYES969696LASTGRAPH_SUM- Kube Node by Prom APIvfs.fs.total[node_exporter,"{#FSNAME}"]
1C80000LAST- Kube Node by Prom APIvfs.fs.used[node_exporter,"{#FSNAME}"]
node_exporter.get{#FSDEVICE}$.device{#FSNAME}$.mountpoint{#FSTYPE}$.fstypeJSONPATH$[?(@.metric['__name__']=='node_filesystem_size_bytes' && @.metric.device=~'{$VFS.FS.FSDEVICE.MATCHES}' && @.metric.fstype=~'{$VFS.FS.FSTYPE.MATCHES}' && @.metric.mountpoint=~'{$VFS.FS.FSNAME.MATCHES}')].metric{$CPU.UTIL.CRIT}90{$IF.ERRORS.WARN}2{$IF.UTIL.MAX}90{$IFCONTROL}1{$KERNEL.MAXFILES.MIN}256{$LOAD_AVG_PER_CPU.MAX.WARN}1.5Load per CPU considered sustainable. Tune if needed.{$MEMORY.AVAILABLE.MIN}20M{$MEMORY.UTIL.MAX}90{$NET.IF.IFALIAS.MATCHES}^.*${$NET.IF.IFALIAS.NOT_MATCHES}CHANGE_IF_NEEDED{$NET.IF.IFNAME.MATCHES}^.*${$NET.IF.IFNAME.NOT_MATCHES}(^Software Loopback Interface|^NULL[0-9.]*$|^[Ll]o[0-9.]*$|^[Ss]ystem$|^Nu[0-9.]*$|^veth[0-9a-z]+$|docker[0-9]+|br-[a-z0-9]{12})Filter out loopbacks, nulls, docker veth links and docker0 bridge by default{$NET.IF.IFOPERSTATUS.MATCHES}^.*${$NET.IF.IFOPERSTATUS.NOT_MATCHES}^7$Ignore notPresent(7){$NODE_EXPORTER_PORT}9100TCP Port node_exporter is listening on.{$PROM.API.URL}{$SWAP.PFREE.MIN.WARN}50{$SYSTEM.FUZZYTIME.MAX}60{$VFS.DEV.DEVNAME.MATCHES}.+This macro is used in block devices discovery. Can be overridden on the host or linked template level{$VFS.DEV.DEVNAME.NOT_MATCHES}^(loop[0-9]*|sd[a-z][0-9]+|nbd[0-9]+|sr[0-9]+|fd[0-9]+|dm-[0-9]+|ram[0-9]+|ploop[a-z0-9]+|md[0-9]*|hcp[0-9]*|zram[0-9]*)This macro is used in block devices discovery. Can be overridden on the host or linked template level{$VFS.DEV.READ.AWAIT.WARN}20Disk read average response time (in ms) before the trigger would fire{$VFS.DEV.WRITE.AWAIT.WARN}20Disk write average response time (in ms) before the trigger would fire{$VFS.FS.FSDEVICE.MATCHES}^.+$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSDEVICE.NOT_MATCHES}^\s$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSNAME.MATCHES}.+This macro is used in filesystems discovery. 
Can be overridden on the host or linked template level{$VFS.FS.FSNAME.NOT_MATCHES}^(/dev|/sys|/run|/proc|.+/shm$)This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSTYPE.MATCHES}^(btrfs|ext2|ext3|ext4|reiser|xfs|ffs|ufs|jfs|jfs2|vxfs|hfs|apfs|refs|ntfs|fat32|zfs)$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.FSTYPE.NOT_MATCHES}^\s$This macro is used in filesystems discovery. Can be overridden on the host or linked template level{$VFS.FS.INODE.PFREE.MIN.CRIT}10{$VFS.FS.INODE.PFREE.MIN.WARN}20{$VFS.FS.PUSED.MAX.CRIT}90{$VFS.FS.PUSED.MAX.WARN}80Network interfaces1120Interface {#IFNAME}({#IFALIAS}): Network trafficKube Node by Prom API75010000112500003System performance270System loadKube Node by Prom API750100001125000030CPU usageKube Node by Prom API750100101125000030Memory usageKube Node by Prom API750100011125000030Swap usageKube Node by Prom API7501001111250000320{#FSNAME}: Disk space usageKube Node by Prom API7501000221250000320{#DEVNAME}: Disk read/write ratesKube Node by Prom API7501000321250000320{#DEVNAME}: Disk average waiting timeKube Node by Prom API7501000421250000320{#DEVNAME}: Disk utilization and queueKube Node by Prom API7501000521250000320Interface {#IFNAME}({#IFALIAS}): Network trafficKube Node by Prom API75010006212500003Kube Pod by Prom APIKube Pod by Prom API## Description
This template works out of the box as soon as Prometheus (Prometheus-operator) is available inside your cluster; it does not require any Zabbix agent installation or configuration. It allows external monitoring of the Kubernetes cluster through ingress, without any NodePort declaration. It uses the Prometheus API to create a Zabbix host for each pod available inside the Kubernetes cluster. {$PROM.API.URL} must contain the Prometheus entry point into your Kubernetes cluster. Zabbix pod hosts are created with the "Template Kube Pod by Prom API" template by default.
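As a rough illustration of how `{$PROM.API.URL}` is used, the pod template's raw items append `/query` to it and pass a PromQL expression in the `query` field. The sketch below rebuilds that request URL in plain JavaScript. The `buildPodQueryUrl` helper, the example base URL (assumed here to end in `/api/v1`), and the pod name are hypothetical and not part of the template.

```javascript
// Hypothetical helper: builds the URL requested by the "Metrics cpu" raw item.
// baseUrl stands in for {$PROM.API.URL}; podName stands in for {HOST.NAME}.
function buildPodQueryUrl(baseUrl, podName) {
  var promql =
    'sum({__name__=~"^container_cpu_.*$",pod="' + podName +
    '",container!="POD",container!=""}) by (__name__,container)';
  // Strip trailing slashes so the result never contains "//query".
  return baseUrl.replace(/\/+$/, "") + "/query?query=" + encodeURIComponent(promql);
}

// Assumed example values, for illustration only.
var url = buildPodQueryUrl("https://prometheus.example.com/api/v1", "my-pod-0");
console.log(url);
```

Opening the printed URL in a browser or with an HTTP client is a quick way to check that the macro value points at a reachable Prometheus endpoint before linking the template.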
## Overview
### Description
zabbix-kube-prom is a batch of Zabbix LLD templates for Zabbix server.
It is used for external Kubernetes monitoring by Zabbix via Prometheus API.
### Installation
1. Install [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) into the Kubernetes cluster.
2. Import the global Zabbix template (zabbix-kube-prom.xml) into your Zabbix server.
3. Create or import a host identifying your Kubernetes cluster where Prometheus is deployed.
4. Let LLD create discovered nodes as new "Zabbix hosts".
5. Let LLD create discovered pods as new "Virtual Zabbix hosts".
### Templates
The global export (zabbix-kube-prom.xml) contains the following templates:
| Templates | Description |
| --- | --- |
| Template Kube by Prom API | Creates a Zabbix host for each pod and node discovered. |
| Template Kube Node by Prom API | Template applied to the created host (node). |
| Template Kube Pod by Prom API | Template applied to the created host (pod). |
### Licenses
| Template | License |
| --- | --- |
| Template OS Linux by Prom | *GNU General Public License v2.0 or later*<br>[Copyright (C) 2001-2021 Zabbix SIA](https://github.com/zabbix/zabbix/blob/master/README) |
| Template Kube by Prom API<br>Template Kube Node by Prom API<br>Template Kube Pod by Prom API | *GNU General Public License v3.0*<br>[Copyright (C) 2021 Diagnostica Stago](https://www.stago.com/) |
---
## Author
Laurent Marchelli
## Description
Official Linux template using node exporter.

Known Issues:

- Description: node_exporter v0.16.0 renamed many metrics. CPU utilization for 'guest' and 'guest_nice' metrics are not supported in this template with node_exporter < 0.16. Disk IO metrics are not supported. Other metrics provided as 'best effort'. See https://github.com/prometheus/node_exporter/releases/tag/v0.16.0 for details.
  - Version: below 0.16.0
- Description: metric node_network_info with label 'device' cannot be found, so network discovery is not possible.
  - Version: below 0.18

You can discuss this template or leave feedback on our forum https://www.zabbix.com/forum/zabbix-suggestions-and-feedback/387225-discussion-thread-for-official-zabbix-template-for-linux

Template tooling version used: 0.34
TemplatesTemplates/KubernetesCPUGeneralInventoryMemoryMonitoringNetworkSpecStatusStorage_New Metrics_Raw items- Metrics cpuHTTP_AGENTprom.pod.metrics[cpu]30s00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^container_cpu_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by (__name__,container)
- Metrics cpu_usageHTTP_AGENTprom.pod.metrics[cpu_usage]30s00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__="node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate",pod="{HOST.NAME}",container!="POD"}) by (__name__,container)
- Metrics memoryHTTP_AGENTprom.pod.metrics[memory]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^container_memory_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by (__name__,container)
- Metrics monitoringHTTP_AGENTprom.pod.metrics[monitoring]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^prober_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by(__name__,container,probe_type,result)
- Metrics networkHTTP_AGENTprom.pod.metrics[network]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^container_network_.*$",pod="{HOST.NAME}",container="POD"}) by (__name__,interface)
- _New MetricsHTTP_AGENTprom.pod.metrics[new]5s1h0DISABLEDTEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquery{pod="{HOST.NAME}",container!="POD",container!=""}
- Metrics specHTTP_AGENTprom.pod.metrics[spec]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^container_spec_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by (__name__,container)
- Metrics storage_fsHTTP_AGENTprom.pod.metrics[storage,fs]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^container_fs_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by (__name__,container,device)
- Metrics storageHTTP_AGENTprom.pod.metrics[storage]00TEXT_Raw itemsJSONPATH$.data.result{$PROM.API.URL}/queryquerysum({__name__=~"^.*container_(file|ulimits|log)_.*$",pod="{HOST.NAME}",container!="POD",container!=""}) by (__name__,container)
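Each raw item above keeps the `$.data.result` array of the Prometheus response, and the dependent item prototypes select one sample from it by metric name and container before converting it with `JSON.parse(value).map(Number)`. The following sketch mimics those two preprocessing steps in plain JavaScript on a made-up response; the sample data and the `selectSample` helper are assumptions for illustration, not template code.

```javascript
// Made-up "$.data.result" payload, shaped like a Prometheus instant-query result.
var result = [
  { metric: { __name__: "container_cpu_usage_seconds_total", container: "app" },
    value: [1637532345.0, "12.5"] },
  { metric: { __name__: "container_cpu_usage_seconds_total", container: "sidecar" },
    value: [1637532345.0, "0.75"] }
];

// Hypothetical helper mirroring the JSONPath filter
// $[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]
function selectSample(samples, metricName, container) {
  var matches = samples
    .filter(function (s) {
      return s.metric.__name__ === metricName && s.metric.container === container;
    })
    .map(function (s) { return s.value[1]; });
  // The JSONPath step yields a JSON array of strings;
  // the JavaScript step then turns each string into a number.
  return JSON.parse(JSON.stringify(matches)).map(Number);
}

console.log(selectSample(result, "container_cpu_usage_seconds_total", "app"));
```

Because the sample value arrives as a string inside a one-element array, the numeric conversion is what lets Zabbix store the item as a float.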
Discovery cpuDEPENDENTprom.pod.discovery[cpu]0{#CONTAINER} - {#METRIC}DEPENDENTprom.pod.metrics[cpu,{#CONTAINER},{#METRIC}]0FLOATCPUJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[cpu]{#CONTAINER} - {#METRIC}1A7C11ALL- Kube Pod by Prom APIprom.pod.metrics[cpu,{#CONTAINER},{#METRIC}]
prom.pod.metrics[cpu]{#METRIC}$.metric['__name__']{#CONTAINER}$.metric.containerDiscovery cpu_usageDEPENDENTprom.pod.discovery[cpu_usage]0{#CONTAINER} - container_cpu_usage_seconds_totalDEPENDENTprom.pod.metrics[cpu_usage,{#CONTAINER},container_cpu_usage_seconds_total]0FLOATCPUJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[cpu_usage]{#CONTAINER} - container_cpu_usage_seconds_total1A7C11- Kube Pod by Prom APIprom.pod.metrics[cpu_usage,{#CONTAINER},container_cpu_usage_seconds_total]
prom.pod.metrics[cpu_usage]{#CONTAINER}$.metric.container{#METRIC}$.metric['__name__']Discovery memoryDEPENDENTprom.pod.discovery[memory]0{#CONTAINER} - {#METRIC}DEPENDENTprom.pod.metrics[memory,{#CONTAINER},{#METRIC}]0FLOATMemoryJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[memory]prom.pod.metrics[memory]{#METRIC}$.metric['__name__']{#CONTAINER}$.metric.containerDiscovery monitoringDEPENDENTprom.pod.discovery[monitoring]0{#METRIC} ({#TYPE},{#RESULT})DEPENDENTprom.pod.metrics[monitoring,{#METRIC},{#TYPE},{#RESULT}]0FLOATMonitoringJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.probe_type=='{#TYPE}' && @.metric.result=='{#RESULT}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[monitoring]prom.pod.metrics[monitoring]{#METRIC}$.metric['__name__']{#RESULT}$.metric.result{#TYPE}$.metric.probe_typeDiscovery networkDEPENDENTprom.pod.discovery[network]0Network {#IFNAME}: {#METRIC}DEPENDENTprom.pod.metrics[network,{#METRIC},{#IFNAME}]0FLOATNetwork {#IFNAME}JSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.interface=='{#IFNAME}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[network]prom.pod.metrics[network]{#IFNAME}$.metric.interface{#METRIC}$.metric['__name__']_New DiscoveryDEPENDENTprom.pod.discovery[new]0DISABLEDAND{#METRIC}^container_cpu_.*$NOT_MATCHES_REGEXA{#METRIC}^container_memory_.*$NOT_MATCHES_REGEXB{#METRIC}^prober_.*$NOT_MATCHES_REGEXC{#METRIC}^container_spec_.*$NOT_MATCHES_REGEXD{#METRIC}^.*container_(file|ulimits|log)_.*$NOT_MATCHES_REGEXE{#METRIC}^container_fs_.*$NOT_MATCHES_REGEXF{#METRIC}^kube_pod_.*$NOT_MATCHES_REGEXG{#CONTAINER} - {#METRIC}DEPENDENTprom.pod.metrics[new,{#CONTAINER},{#METRIC}]0FLOAT_New MetricsJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn 
JSON.parse(value).map(Number);prom.pod.metrics[new]prom.pod.metrics[new]{#METRIC}$.metric['__name__']{#CONTAINER}$.metric.containerDiscovery specDEPENDENTprom.pod.discovery[spec]0{#CONTAINER} - {#METRIC}DEPENDENTprom.pod.metrics[spec,{#CONTAINER},{#METRIC}]0FLOATSpecJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[spec]prom.pod.metrics[spec]{#METRIC}$.metric['__name__']{#CONTAINER}$.metric.containerDiscovery storage_fsDEPENDENTprom.pod.discovery[storage,fs]0AND{#CONTAINER} - Storage {#DEVICE}: {#METRIC}DEPENDENTprom.pod.metrics[storage,{#CONTAINER},{#METRIC},{#DEVICE}]0FLOATStorage {#DEVICE}JSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}' && @.metric.device=='{#DEVICE}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[storage,fs]prom.pod.metrics[storage,fs]{#METRIC}$.metric['__name__']{#DEVICE}$.metric.device{#CONTAINER}$.metric.containerDiscovery storageDEPENDENTprom.pod.discovery[storage]0{#CONTAINER} - {#METRIC}DEPENDENTprom.pod.metrics[storage,{#CONTAINER},{#METRIC}]0FLOATStorageJSONPATH$[?(@.metric['__name__']=='{#METRIC}' && @.metric.container=='{#CONTAINER}')].value[1]JAVASCRIPTreturn JSON.parse(value).map(Number);prom.pod.metrics[storage]prom.pod.metrics[storage]{#METRIC}$.metric['__name__']{#CONTAINER}$.metric.container{$PROM.POD.DEVICE.MATCHES}^.*$Device regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.DEVICE.NOT_MATCHES}CHANGE_IF_NEEDEDDevice interface regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.IFNAME.MATCHES}^.*$Network interface regex used in pod's metric discovery. Can be overridden on the host or linked template level.{$PROM.POD.IFNAME.NOT_MATCHES}CHANGE_IF_NEEDEDNetwork interface regex used in pod's metric discovery. 
Can be overridden on the host or linked template level.{Kube Node by Prom API:system.swap.pfree[node_exporter].min(5m)}<{$SWAP.PFREE.MIN.WARN} and {Kube Node by Prom API:system.swap.total[node_exporter].last()}>0High swap space usage ( less than {$SWAP.PFREE.MIN.WARN}% free)Free: {ITEM.LASTVALUE1}, total: {ITEM.LASTVALUE2}WARNINGThis trigger is ignored, if there is no swap configuredHigh memory utilization ( >{$MEMORY.UTIL.MAX}% for 5m){Kube Node by Prom API:vm.memory.util[node_exporter].min(5m)}>{$MEMORY.UTIL.MAX}Lack of available memory ( < {$MEMORY.AVAILABLE.MIN} of {ITEM.VALUE2}){Kube Node by Prom API:vm.memory.available[node_exporter].min(5m)}<{$MEMORY.AVAILABLE.MIN} and {Kube Node by Prom API:vm.memory.total[node_exporter].last()}>0{Kube Node by Prom API:vm.memory.available[node_exporter].min(5m)}<{$MEMORY.AVAILABLE.MIN} and {Kube Node by Prom API:vm.memory.total[node_exporter].last()}>0Lack of available memory ( < {$MEMORY.AVAILABLE.MIN} of {ITEM.VALUE2})Available: {ITEM.LASTVALUE1}, total: {ITEM.LASTVALUE2}AVERAGE{Kube Node by Prom API:system.cpu.load.avg1[node_exporter].min(5m)}/{Kube Node by Prom API:system.cpu.num[node_exporter].last()}>{$LOAD_AVG_PER_CPU.MAX.WARN}
and {Kube Node by Prom API:system.cpu.load.avg5[node_exporter].last()}>0
and {Kube Node by Prom API:system.cpu.load.avg15[node_exporter].last()}>0Load average is too high (per CPU load over {$LOAD_AVG_PER_CPU.MAX.WARN} for 5m)Load averages(1m 5m 15m): ({ITEM.LASTVALUE1} {ITEM.LASTVALUE3} {ITEM.LASTVALUE4}), # of CPUs: {ITEM.LASTVALUE2}AVERAGEPer CPU load average is too high. Your system may be slow to respond.{Kube Node by Prom API:fd.open[node_exporter].last()}/{Kube Node by Prom API:kernel.maxfiles[node_exporter].last()}*100>80Running out of file descriptors (less than 20% free){ITEM.LASTVALUE1} of {ITEM.LASTVALUE2} file descriptors are in use.WARNINGCPU jumps1A7C11- Kube Node by Prom APIsystem.cpu.switches[node_exporter]
12774A4- Kube Node by Prom APIsystem.cpu.intr[node_exporter]
CPU usageSTACKEDFIXEDFIXED1A7C11- Kube Node by Prom APIsystem.cpu.system[node_exporter]
12774A4- Kube Node by Prom APIsystem.cpu.user[node_exporter]
2F63100- Kube Node by Prom APIsystem.cpu.nice[node_exporter]
3A54F10- Kube Node by Prom APIsystem.cpu.iowait[node_exporter]
4FC6EA3- Kube Node by Prom APIsystem.cpu.steal[node_exporter]
56C59DC- Kube Node by Prom APIsystem.cpu.interrupt[node_exporter]
6AC8C14- Kube Node by Prom APIsystem.cpu.softirq[node_exporter]
7611F27- Kube Node by Prom APIsystem.cpu.guest[node_exporter]
8F230E0- Kube Node by Prom APIsystem.cpu.guest_nice[node_exporter]
CPU utilizationFIXEDFIXEDGRADIENT_LINE1A7C11- Kube Node by Prom APIsystem.cpu.util[node_exporter]
Memory usageFIXEDBOLD_LINE1A7C11- Kube Node by Prom APIvm.memory.total[node_exporter]
1GRADIENT_LINE2774A4- Kube Node by Prom APIvm.memory.available[node_exporter]
Memory utilizationFIXEDFIXEDGRADIENT_LINE1A7C11- Kube Node by Prom APIvm.memory.util[node_exporter]
Swap usage1A7C11- Kube Node by Prom APIsystem.swap.free[node_exporter]
12774A4- Kube Node by Prom APIsystem.swap.total[node_exporter]
System loadFIXED1A7C11- Kube Node by Prom APIsystem.cpu.load.avg1[node_exporter]
12774A4- Kube Node by Prom APIsystem.cpu.load.avg5[node_exporter]
2F63100- Kube Node by Prom APIsystem.cpu.load.avg15[node_exporter]
3A54F10RIGHT- Kube Node by Prom APIsystem.cpu.num[node_exporter]
IF-MIB::ifOperStatus1up2down4unknown5dormant6notPresent7lowerLayerDownLinux::Interface protocol types0from KA9Q: NET/ROM pseudo1Ethernet2Experimental Ethernet3AX.25 Level 24PROnet token ring5Chaosnet6IEEE 802.2 Ethernet/TR/TB7ARCnet8APPLEtalk15Frame Relay DLCI19ATM23Metricom STRIP (new IANA id)24IEEE 1394 IPv4 - RFC 273427EUI-6432InfiniBand256ARPHRD_SLIP257ARPHRD_CSLIP258ARPHRD_SLIP6259ARPHRD_CSLIP6260Notional KISS type264ARPHRD_ADAPT270ARPHRD_ROSE271CCITT X.25272Boards with X.25 in firmware280Controller Area Network512ARPHRD_PPP513Cisco HDLC516LAPB517Digital's DDCMP protocol518Raw HDLC519Raw IP768IPIP tunnel769IP6IP6 tunnel770Frame Relay Access Device771SKIP vif772Loopback device773Localtalk device774Fiber Distributed Data Interface775AP1000 BIF776sit0 device - IPv6-in-IPv4777IP over DDP tunneller778GRE over IP779PIMSM register interface780High Performance Parallel Interface781Nexus 64Mbps Ash782Acorn Econet783Linux-IrDA784Point to point fibrechannel785Fibrechannel arbitrated loop786Fibrechannel public loop787Fibrechannel fabric800Magic type ident for TR801IEEE 802.11802IEEE 802.11 + Prism2 header803IEEE 802.11 + radiotap header804ARPHRD_IEEE802154805IEEE 802.15.4 network monitor820PhoNet media type821PhoNet pipe header822CAIF media type823GRE over IPv6824Netlink header825IPv6 over LoWPAN826Vsock monitor header