OS Windows Server Baseline (Zabbix 5.0 template export, 2021-11-21T21:40:57Z, template group: Templates)
## Description
Zabbix template for Microsoft Windows Server. Tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (those that rely on missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example, service.info[service,<param>]) may be unsupported. Author: Mantas Tumenas (mantas.tumenas@gmail.com).
## Overview
Zabbix template for Microsoft Windows Server.
Features:
* Performance counters.
* CPU Low Level Discovery.
* Mounted file system Low Level Discovery.
Difference from default Windows OS template:
* CPU discovery and per-CPU triggers.
* Mounted file system discovery and per-logical-disk triggers.
* More item and trigger prototypes for mounted file system discovery.
* Triggers are tuned for Microsoft Windows Server running Microsoft SQL Server.
Missing:
* Network interface items.
Supported versions: tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (those that rely on missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example, service.info[service,<param>]) may be unsupported.
My other templates are [here](owner/MantasT).
## Author
Mantas Tumenas
## Items and triggers
Applications: CPU, Disk, Disk Performance, Memory, Memory Performance, Operating System, Processor Performance, Process Performance, Services, System Performance.
- **Memory % Committed Bytes in Use**
  - Key: `perf_counter["\Memory\% Committed Bytes in Use",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Memory Performance.
  - Description: This measures the ratio of Committed Bytes to the Commit Limit, in other words, the amount of virtual memory in use. A value greater than 80 percent indicates insufficient memory. The obvious solution is to add more memory.
  - Threshold: > 80%.
  - Trigger (WARNING): `{avg(300,0)}>80`, "{HOST.NAME}: Memory % Committed Bytes in Use avg value > 80 in the last 5 min"; URL: http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx; manual close: YES. The trigger description repeats the item description.
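Every trigger in this template compares an average rather than a single reading: `{avg(300,0)}>80` averages the values collected over the last 300 seconds (time shift 0). The Python sketch below illustrates that evaluation with hypothetical `(timestamp, value)` samples taken every 30 s. It only approximates the trigger function; it is not Zabbix's implementation, and the exact window-boundary handling is an assumption.

```python
from statistics import mean


def avg_window(samples, now, window=300):
    """Mean of the (timestamp, value) samples inside (now - window, now].

    Approximates the trigger function avg(300,0); the half-open window
    boundary is an assumption made for this sketch.
    """
    values = [value for ts, value in samples if now - window < ts <= now]
    return mean(values) if values else None


# Hypothetical % Committed Bytes in Use samples, one every 30 s (the item interval):
# 78% up to t=120 s, then 84% from t=150 s onwards.
samples = [(t, 78.0 if t < 150 else 84.0) for t in range(30, 331, 30)]

current = avg_window(samples, now=330)
problem = current is not None and current > 80  # the WARNING trigger condition
print(current, problem)
```

In this example the last ten samples average 82.2%, so the trigger fires even though three of those samples were below 80%.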
- **Memory Cache Bytes**
  - Key: `perf_counter["\Memory\Cache Bytes",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: B). Application: Memory Performance. No trigger is defined.
  - Description: This indicates the amount of memory being used for the file system cache.
  - Threshold: there may be a disk bottleneck if this value is greater than 300 MB.
- **Memory Free System Page Table Entries**
  - Key: `perf_counter["\Memory\Free System Page Table Entries",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT). Application: Memory Performance.
  - Description: Free System Page Table Entries is the number of page table entries not currently in use by the system. This analysis determines whether the system is running out of free system page table entries (PTEs): it reports a critical condition when fewer than 5,000 PTEs are free and a warning when fewer than 10,000 are free. A lack of PTEs can result in a system-wide hang.
  - Thresholds:
    - Running low on PTEs: fewer than 10,000 (the system is close to a system-wide hang).
    - Critically low on PTEs: fewer than 5,000 (the system is close to a system-wide hang).
  - Trigger (HIGH): `{avg(300,0)}<5000`, "{HOST.NAME}: Memory Free System Page Table Entries avg value < 5000 in the last 5 min"; URL: http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx; manual close: YES.
  - Trigger (AVERAGE): `{avg(300,0)}<10000`, "{HOST.NAME}: Memory Free System Page Table Entries avg value < 10000 in the last 5 min"; URL: http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx; manual close: YES; depends on the < 5000 trigger (`{OS Windows Server Baseline:perf_counter["\Memory\Free System Page Table Entries",1].avg(300,0)}<5000`).
  - Each trigger description repeats the item description followed by the threshold for that level.
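The two PTE triggers work as a pair: the AVERAGE trigger (< 10000) depends on the HIGH trigger (< 5000), so only the more severe problem is reported when both conditions hold. Below is a simplified Python model of that behavior; `pte_problems` is a hypothetical helper that takes the 5-minute average of free PTEs, and it ignores Zabbix details such as trigger state timing.

```python
def pte_problems(avg_free_ptes):
    """Return the severities that would be reported for a 5-minute average.

    Simplified model: a dependent trigger is suppressed while the trigger it
    depends on is in the PROBLEM state.
    """
    high = avg_free_ptes < 5000      # HIGH trigger: {avg(300,0)}<5000
    average = avg_free_ptes < 10000  # AVERAGE trigger: {avg(300,0)}<10000
    if high:
        return ["HIGH"]  # the AVERAGE trigger is hidden by its dependency
    if average:
        return ["AVERAGE"]
    return []


for reading in (20000, 8000, 4000):
    print(reading, pte_problems(reading))
```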
- **Memory Pages/sec**
  - Key: `perf_counter["\Memory\Pages/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: /sec). Application: Memory Performance.
  - Description: A high value means the system is likely running out of memory and is paging memory to disk. Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in pages, so it can be compared to other page counts, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.
  - Thresholds (at every level, consider reviewing which processes use the most memory, or add more memory):
    - High pages/sec: greater than 1000 (the system could be beginning to run out of memory).
    - Very high average pages/sec: greater than 2500 (the system could be experiencing system-wide delays due to insufficient memory).
    - Critically high average pages/sec: greater than 5000 (the system is most likely experiencing delays due to insufficient memory).
  - Trigger (INFO): `{avg(300,0)}>1000`, "{HOST.NAME}: Memory Pages/sec avg value > 1000 in the last 5 min"; depends on the > 2500 trigger (`{OS Windows Server Baseline:perf_counter["\Memory\Pages/sec",1].avg(300,0)}>2500`).
  - Trigger (WARNING): `{avg(300,0)}>2500`, "{HOST.NAME}: Memory Pages/sec avg value > 2500 in the last 5 min"; depends on the > 5000 trigger (`{OS Windows Server Baseline:perf_counter["\Memory\Pages/sec",1].avg(300,0)}>5000`).
  - Trigger (AVERAGE): `{avg(300,0)}>5000`, "{HOST.NAME}: Memory Pages/sec avg value > 5000 in the last 5 min".
  - All three triggers use the URL http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx and manual close: YES; each description repeats the item description followed by the threshold for that level.
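Memory\Pages/sec is the sum of Pages Input/sec and Pages Output/sec, and the three triggers form a dependency chain (1000 depends on 2500, which depends on 5000), so the severity left visible for a given 5-minute average is the highest level exceeded. A Python sketch of both points, with hypothetical function names and input values:

```python
# Trigger levels from the template, most severe first.
PAGES_LEVELS = [(5000, "AVERAGE"), (2500, "WARNING"), (1000, "INFO")]


def pages_per_sec(pages_input_per_sec, pages_output_per_sec):
    """Pages/sec is the sum of Pages Input/sec and Pages Output/sec."""
    return pages_input_per_sec + pages_output_per_sec


def pages_severity(avg_pages_per_sec):
    """Highest level exceeded; lower triggers are suppressed by their dependencies."""
    for limit, severity in PAGES_LEVELS:
        if avg_pages_per_sec > limit:
            return severity
    return None


rate = pages_per_sec(1800, 900)  # hypothetical readings
print(rate, pages_severity(rate))
```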
- **Memory Pages Input/sec**
  - Key: `perf_counter["\Memory\Pages Input/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: /sec). Application: Memory Performance.
  - Description: Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare Memory\Pages Input/sec with Memory\Page Reads/sec to determine the average number of pages read into memory during each read operation.
  - Threshold: more than 10 pages read from disk per second.
  - Trigger (INFO): `{avg(300,0)}>10`, "{HOST.NAME}: Memory Pages Input/sec avg value > 10 in the last 5 min"; URL: http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx; manual close: YES. The trigger description repeats the item description.
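The comparison with Memory\Page Reads/sec is a simple ratio: dividing Pages Input/sec by Page Reads/sec gives the average number of pages read per read operation. A minimal Python sketch with hypothetical counter values; the function name is illustrative only:

```python
def pages_per_read(pages_input_per_sec, page_reads_per_sec):
    """Average number of pages brought into memory per disk read operation."""
    if page_reads_per_sec == 0:
        return 0.0  # no read operations in the interval
    return pages_input_per_sec / page_reads_per_sec


print(pages_per_read(24.0, 6.0))  # hypothetical counter values
```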
- **PhysicalDisk % Disk Time**
  - Key: `perf_counter["\PhysicalDisk(_Total)\% Disk Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Disk Performance. No trigger is defined.
  - Description: Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.
  - Threshold: a value greater than 50% represents an I/O bottleneck.
  - Symptoms: third-party monitoring tools may generate multiple alarm events while the disk is very busy. If you monitor PhysicalDisk\% Disk Time on a Windows-based computer, the value may exceed 100% when the computer is very busy, for example while copying a large number of files or several large files.
  - Cause: some controllers allow the operating system to issue overlapping input/output operations for multiple outstanding requests. The disk performance counters time the responses with a 100-nanosecond precision counter and then report the cumulative statistics for the sample interval. The reported value can therefore exceed 100%: for example, 10 requests that each complete in 2 milliseconds during a 10-millisecond sampling interval add up to 20 ms of busy time, or 200%. With multiple disks in a RAID arrangement, the operating system can read from and write to several disks at once, so the overlapped I/O can also push this counter above 100%.
  - Status: this behavior is by design.
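The arithmetic behind the "Cause" explanation can be checked directly: the counter reports cumulative busy time, so overlapping requests are added together even when they were serviced at the same time. A Python sketch of that calculation using the example from the text (10 requests of 2 ms each in a 10 ms interval); the function name is hypothetical:

```python
def percent_disk_time(request_durations_ms, interval_ms):
    """Cumulative service time of all requests as a percentage of the interval.

    Overlapping requests are summed, not merged, which is why the result
    can exceed 100%.
    """
    return 100.0 * sum(request_durations_ms) / interval_ms


value = percent_disk_time([2.0] * 10, interval_ms=10.0)
print(value)
```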
- **PhysicalDisk % Idle Time**
  - Key: `perf_counter["\PhysicalDisk(_Total)\% Idle Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Disk Performance.
  - Description: This measures the percentage of time the disk was idle during the sample interval.
  - Threshold: if this counter falls below 20%, the disk system is saturated. Consider replacing the current disk system with a faster one.
  - Trigger (INFO): `{avg(300,0)}<20`, "{HOST.NAME}: PhysicalDisk % Idle Time avg value < 20 in the last 5 min"; URL: http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx; manual close: YES. The trigger description repeats the item description and the threshold (< 20%).
- **PhysicalDisk Avg. Disk Queue Length**
  - Key: `perf_counter["\PhysicalDisk(_Total)\Avg. Disk Queue Length",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT). Application: Disk Performance. No trigger is defined.
  - Description: This indicates how many I/O operations are waiting for the hard drive to become available.
  - Threshold: if the value is larger than two times the number of spindles, the disk itself may be the bottleneck.
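The spindle rule of thumb reduces to one comparison. The counter does not report how many spindles sit behind the disk, so in this Python sketch `spindles` is an input you would supply yourself; the function name is hypothetical:

```python
def queue_suggests_disk_bottleneck(avg_queue_length, spindles):
    """True when the average queue exceeds two outstanding I/Os per spindle."""
    if spindles <= 0:
        raise ValueError("spindles must be positive")
    return avg_queue_length > 2 * spindles


# Hypothetical 4-disk array: the limit is 8 queued operations.
print(queue_suggests_disk_bottleneck(10, spindles=4))
print(queue_suggests_disk_bottleneck(6, spindles=4))
```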
- **PhysicalDisk Avg. Disk sec/Read**
  - Key: `perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Read",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: sec). Application: Disk Performance.
  - Description: This measures the average time, in seconds, to read data from the disk. A value larger than 25 milliseconds (ms) means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution is to replace the current disk system with a faster one.
  - Thresholds (the trigger expressions use seconds, matching the item units, so 0.015 = 15 ms):
    - Average disk responsiveness is slow: more than 15 ms.
    - Average disk responsiveness is very slow: more than 25 ms.
    - Disk responsiveness is critical: more than 50 ms.
  - Trigger (INFO): `{avg(300,0)}>0.015`, "{HOST.NAME}: PhysicalDisk Read Latency avg value > 0.015 in the last 5 min"; depends on the > 0.025 trigger (`{OS Windows Server Baseline:perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Read",1].avg(300,0)}>0.025`).
  - Trigger (WARNING): `{avg(300,0)}>0.025`, "{HOST.NAME}: PhysicalDisk Read Latency avg value > 0.025 in the last 5 min"; depends on the > 0.050 trigger (`{OS Windows Server Baseline:perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Read",1].avg(300,0)}>0.050`).
  - Trigger (AVERAGE): `{avg(300,0)}>0.050`, "{HOST.NAME}: PhysicalDisk Read Latency avg value > 0.050 in the last 5 min".
  - The INFO and WARNING triggers use the URL http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx; all three have manual close: YES, and each description repeats the item description followed by the threshold for that level.
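Because the latency counters are reported in seconds, the trigger thresholds read 0.015, 0.025 and 0.050 rather than 15, 25 and 50 ms. The following Python sketch maps a 5-minute average to the severity left visible by the dependency chain; the same levels apply to Avg. Disk sec/Write. The function name is hypothetical:

```python
# Trigger levels in seconds, most severe first (0.050 s = 50 ms, and so on).
LATENCY_LEVELS_S = [(0.050, "AVERAGE"), (0.025, "WARNING"), (0.015, "INFO")]


def latency_severity(avg_seconds):
    """Highest latency level exceeded, or None if the disk is responsive."""
    for limit, severity in LATENCY_LEVELS_S:
        if avg_seconds > limit:
            return severity
    return None


for avg_seconds in (0.012, 0.018, 0.030, 0.075):
    print(f"{avg_seconds * 1000:.0f} ms -> {latency_severity(avg_seconds)}")
```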
- **PhysicalDisk Avg. Disk sec/Write**
  - Key: `perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Write",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: sec). Application: Disk Performance.
  - Description: This measures the average time, in seconds, to write data to the disk. A value larger than 25 milliseconds (ms) means the disk system is experiencing latency when writing to the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution is to replace the current disk system with a faster one.
  - Thresholds (the trigger expressions use seconds, matching the item units, so 0.015 = 15 ms):
    - Average disk responsiveness is slow: more than 15 ms.
    - Average disk responsiveness is very slow: more than 25 ms.
    - Disk responsiveness is critical: more than 50 ms.
  - Trigger (INFO): `{avg(300,0)}>0.015`, "{HOST.NAME}: PhysicalDisk Write Latency avg value > 0.015 in the last 5 min"; depends on the > 0.025 trigger (`{OS Windows Server Baseline:perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Write",1].avg(300,0)}>0.025`).
  - Trigger (WARNING): `{avg(300,0)}>0.025`, "{HOST.NAME}: PhysicalDisk Write Latency avg value > 0.025 in the last 5 min"; depends on the > 0.050 trigger (`{OS Windows Server Baseline:perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Write",1].avg(300,0)}>0.050`).
  - Trigger (AVERAGE): `{avg(300,0)}>0.050`, "{HOST.NAME}: PhysicalDisk Write Latency avg value > 0.050 in the last 5 min".
  - The INFO and WARNING triggers use the URL http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx; all three have manual close: YES, and each description repeats the item description followed by the threshold for that level.
- **IO Data Operations/sec**
  - Key: `perf_counter["\Process(_Total)\IO Data Operations/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: O/sec). Application: Process Performance.
  - Description: This counter counts all I/O activity generated by processes, including file, network and device I/Os. The check flags processes that perform more than 1,000 I/Os per second. It is best used together with other analyses, such as disk analysis, to determine which processes might be involved in the I/O activity.
  - Trigger (INFO): `{avg(300,0)}>1000`, "{HOST.NAME}: Process IO Data Operations/sec avg value > 1000 in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- **IO Other Operations/sec**
  - Key: `perf_counter["\Process(_Total)\IO Other Operations/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: O/sec). Application: Process Performance.
  - Description: The rate of input/output operations generated by processes that are neither reads nor writes, including file, network and device I/Os. A control function is an example of this type of operation. I/O operations directed to CONSOLE (console input object) handles are not counted. The check flags processes that perform more than 1,000 I/Os per second.
  - Trigger (INFO): `{avg(300,0)}>1000`, "{HOST.NAME}: Process IO Other Operations/sec avg value > 1000 in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- **IO Read Operations/sec**
  - Key: `perf_counter["\Process(_Total)\IO Read Operations/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: O/sec). Application: Process Performance.
  - Description: The rate of read input/output operations generated by processes, including file, network and device I/Os. Reads directed to CONSOLE (console input object) handles are not counted.
  - Trigger (INFO): `{avg(300,0)}>1000`, "{HOST.NAME}: Process IO Read Operations/sec avg value > 1000 in the last 5 min"; URL: http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx; manual close: YES. The trigger description repeats the item description.
- **IO Write Operations/sec**
  - Key: `perf_counter["\Process(_Total)\IO Write Operations/sec",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: O/sec). Application: Process Performance.
  - Description: The rate of write input/output operations generated by processes, including file, network and device I/Os. Writes directed to CONSOLE (console input object) handles are not counted.
  - Trigger (INFO): `{avg(300,0)}>1000`, "{HOST.NAME}: Process IO Write Operations/sec avg value > 1000 in the last 5 min"; URL: http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspx; manual close: YES. The trigger description repeats the item description.
- **Processor % DPC Time**
  - Key: `perf_counter["\Processor Information(_Total)\% DPC Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Processor Performance.
  - Description: Determines how much time the processor spends processing deferred procedure calls (DPCs). DPCs originate when the processor performs tasks requiring immediate attention and defers the remainder of the task to be handled at a lower priority. DPCs represent further processing of client requests.
  - Threshold: 40%.
  - Trigger (INFO): `{avg(300,0)}>40`, "{HOST.NAME}: Processor % DPC Time avg value > 40% in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- **Processor % Interrupt Time**
  - Key: `perf_counter["\Processor Information(_Total)\% Interrupt Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Processor Performance.
  - Description: This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. It is an indirect indicator of the activity of devices that generate interrupts, such as network adapters. A dramatic increase in this counter indicates potential hardware problems.
  - Thresholds (a high amount of % Interrupt Time could indicate a hardware or driver problem):
    - High CPU interrupt time: more than 30%.
    - Very high CPU interrupt time: more than 50%.
  - Trigger (no severity listed in the export): `{avg(300,0)}>30`, "{HOST.NAME}: Processor % Interrupt Time avg value > 30% in the last 5 min"; manual close: YES; depends on the > 50% trigger (`{OS Windows Server Baseline:perf_counter["\Processor Information(_Total)\% Interrupt Time",1].avg(300,0)}>50`).
  - Trigger (INFO): `{avg(300,0)}>50`, "{HOST.NAME}: Processor % Interrupt Time avg value > 50% in the last 5 min"; manual close: YES.
  - Each trigger description repeats the item description followed by the threshold for that level.
- **Processor % Privileged Time**
  - Key: `perf_counter["\Processor Information(_Total)\% Privileged Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Processor Performance.
  - Description: This counter indicates the percentage of time threads run in privileged mode. When your application calls operating system functions (for example, to perform file or network I/O or to allocate memory), those functions are executed in privileged mode.
  - Threshold: a value consistently over 75% indicates a bottleneck.
  - Trigger (no severity listed in the export): `{avg(300,0)}>75`, "{HOST.NAME}: Processor % Privileged Time avg value > 75% in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- **Processor % Processor Time**
  - Key: `perf_counter["\Processor Information(_Total)\% Processor Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Processor Performance.
  - Description: This measures the percentage of elapsed time the processor spends executing a non-idle thread. If the percentage is greater than 85 percent, the processor is overwhelmed and the server may require a faster processor. This counter is the primary indicator of processor activity. High values are not necessarily bad; however, if other processor-related counters such as % Privileged Time or Processor Queue Length are increasing linearly, high CPU utilization may be worth investigating.
  - Thresholds: 60% (Warning), 85% (Average), 95% (Critical), each averaged over 15 minutes.
  - Trigger (INFO): `{avg(900,0)}>60`, "{HOST.NAME}: Processor % Processor Time avg value > 60 % in the last 15 min"; depends on the > 85% trigger (`{OS Windows Server Baseline:perf_counter["\Processor Information(_Total)\% Processor Time",1].avg(900,0)}>85`).
  - Trigger (WARNING): `{avg(900,0)}>85`, "{HOST.NAME}: Processor % Processor Time avg value > 85% in the last 15 min"; depends on the > 95% trigger (`{OS Windows Server Baseline:perf_counter["\Processor Information(_Total)\% Processor Time",1].avg(900,0)}>95`).
  - Trigger (AVERAGE): `{avg(900,0)}>95`, "{HOST.NAME}: Processor % Processor Time avg value > 95% in the last 15 min".
  - All three triggers have manual close: YES, and each description repeats the item description followed by the threshold for that level.
- **Processor % User Time**
  - Key: `perf_counter["\Processor Information(_Total)\% User Time",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT, units: %). Application: Processor Performance.
  - Description: This measures the percentage of elapsed time the processor spends in user mode. A high value means the server is busy with the application; one possible solution is to optimize the application that is using the processor resources.
  - Threshold: depends on the scenario. Expect 20–30% of processor time in a user-mode scenario such as Web Proxy. Be suspicious of values above 70% of % Processor Time unless SSL or VPN is in use.
  - Trigger (INFO): `{avg(300,0)}>70`, "{HOST.NAME}: Processor % User Time avg value > 70% in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- **Server Work Queues**
  - Key: `perf_counter["\Server Work Queues(*)\Queue Length",1]` (ZABBIX_ACTIVE, 30s interval, FLOAT). Application: Processor Performance.
  - Description: Shows the current length of the server work queue for this CPU.
  - Threshold: a sustained queue length greater than four might indicate processor congestion. The counter is an instantaneous count, not an average over time.
  - Trigger (INFO): `{avg(300,0)}>4`, "{HOST.NAME}: Server Work Queues avg value > 4 in the last 5 min"; manual close: YES. The trigger description repeats the item description.
- System % Registry Quota In UseZABBIX_ACTIVEperf_counter["\System\% Registry Quota In Use",1]30sFLOAT%% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average.
Threshold: Average - 60%.
High - 85%.System Performance{avg(300,0)}>60{HOST.NAME}: % Registry Quota In Use {ITEM.LASTVALUE}AVERAGE% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average.
Threshold: > 60%.YES{avg(300,0)}>85{HOST.NAME}: % Registry Quota In Use {ITEM.LASTVALUE}HIGH% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average.
Threshold: High - 85%.YES
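The two triggers on this item compare the 5-minute average (`avg(300,0)`) against 60 (AVERAGE) and 85 (HIGH). A minimal Python sketch of that evaluation, assuming the list passed in holds the values collected during the last 300 seconds (about 10 samples at the 30s interval) and reporting only the highest matching severity; `registry_quota_severity` is a hypothetical helper, not part of Zabbix:

```python
from statistics import mean


def registry_quota_severity(samples_pct):
    """Return the highest severity whose threshold the average exceeds.

    samples_pct: % Registry Quota In Use values from the last 300 s.
    Returns "HIGH" (> 85), "AVERAGE" (> 60) or None.
    """
    if not samples_pct:
        return None  # no data in the window: nothing to evaluate
    avg = mean(samples_pct)
    if avg > 85:
        return "HIGH"
    if avg > 60:
        return "AVERAGE"
    return None
```

Because the counter is an instantaneous value, averaging over five minutes keeps a single spike from raising either trigger.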
- System Context Switches/secZABBIX_ACTIVEperf_counter["\System\Context Switches/sec",1]30sFLOATS/secIndicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time.
Threshold: High context switches/sec – more than 5000 context switches per second per online processor.
Very high context switches/sec – more than 10,000 context switches per second per online processor.System Performance
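The host-level triggers for this item divide the 5-minute average of Context Switches/sec by `system.cpu.num[online]`, so the 5000 and 10,000 limits apply per online processor. The sketch below reproduces that arithmetic in Python; `context_switch_severity` is a hypothetical helper, and the input is assumed to be the samples from the last 300 seconds:

```python
from statistics import mean


def context_switch_severity(samples_per_sec, cpus_online):
    """Mirror avg(300,0) / system.cpu.num[online] compared with 10000 and 5000.

    Returns (per_cpu_rate, severity), where severity is "AVERAGE",
    "WARNING" or None.
    """
    if cpus_online <= 0:
        raise ValueError("cpus_online must be positive")
    per_cpu = mean(samples_per_sec) / cpus_online
    if per_cpu > 10000:
        return per_cpu, "AVERAGE"
    if per_cpu > 5000:
        return per_cpu, "WARNING"
    return per_cpu, None
```

On a four-CPU host averaging 32,000 switches per second, the per-processor rate is 8,000, which crosses only the WARNING threshold.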
- Processor Queue LengthZABBIX_ACTIVEperf_counter["\System\Processor Queue Length",1]30sFLOATIf there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor\% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved.
Threshold: Average - each processor has 10 or more threads waiting (determines whether the average processor queue length exceeds ten times the number of processors; if this threshold is broken, the processor(s) may be at capacity).
High - each processor has 20 or more threads waiting (determines whether the average processor queue length exceeds twenty times the number of processors; if this threshold is broken, the processor(s) are beyond capacity).Processor Performance{avg(300,0)}>10{HOST.NAME}: Processor Queue Length avg value > 10 in the last 5 minINFOIf there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor\% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved.
Threshold: Average - each processor has 10 or more threads waiting (determines whether the average processor queue length exceeds ten times the number of processors; if this threshold is broken, the processor(s) may be at capacity).YES{HOST.NAME}: Processor Queue Length avg value > 20 in the last 5 min{OS Windows Server Baseline:perf_counter["\System\Processor Queue Length",1].avg(300,0)}>20{avg(300,0)}>20{HOST.NAME}: Processor Queue Length avg value > 20 in the last 5 minWARNINGIf there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor\% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved.
Threshold: High - each processor has 20 or more threads waiting (determines whether the average processor queue length exceeds twenty times the number of processors; if this threshold is broken, the processor(s) are beyond capacity).YES
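The rule of thumb in the description (normalize PQL by the processor count, then read it together with % Processor Time) can be sketched as below. The template's own triggers compare the raw queue length with 10 and 20 and do not divide by the CPU count. The `low_util_pct` cut-off of 50% for "low CPU utilization" is an assumption, since the text does not define it, and `diagnose_pql` is a hypothetical name:

```python
def diagnose_pql(avg_pql, cpus, cpu_util_pct, low_util_pct=50.0):
    """Interpret an average Processor Queue Length per the description.

    avg_pql      -- average System\\Processor Queue Length
    cpus         -- number of processors servicing the workload
    cpu_util_pct -- average % Processor Time
    low_util_pct -- ASSUMED cut-off for "low CPU utilization"
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    per_cpu = avg_pql / cpus  # single queue shared by all processors
    if cpu_util_pct >= 90 and per_cpu > 2:
        return "possible processor bottleneck"
    if cpu_util_pct < low_util_pct and avg_pql >= 2:
        return "review threading logic"
    return "no clear queueing problem"
```

For example, a queue of 12 on four busy processors (3 per CPU at 95% utilization) points to a processor bottleneck, while a queue of 3 at 20% utilization points to irregular thread demand instead.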
- Service DNS ClientZABBIX_ACTIVEservice_state[Dnscache]30sFLOATThe DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer's name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service DNS Cache {ITEM.LASTVALUE}The DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer's name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.YES
- Service Event LogZABBIX_ACTIVEservice_state[eventlog]30sFLOATThis service manages events and event logs. It supports logging events, querying events, subscribing to events, archiving event logs, and managing event metadata. It can display events in both XML and plain text format. Stopping this service may compromise security and reliability of the system.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Event Log {ITEM.LASTVALUE}This service manages events and event logs. It supports logging events, querying events, subscribing to events, archiving event logs, and managing event metadata. It can display events in both XML and plain text format. Stopping this service may compromise security and reliability of the system.YES
- Service Group Policy ClientZABBIX_ACTIVEservice_state[gpsvc]30sFLOATThe service is responsible for applying settings configured by administrators for the computer and users through the Group Policy component. If the service is stopped or disabled, the settings will not be applied and applications and components will not be manageable through Group Policy. Any components or applications that depend on the Group Policy component might not be functional if the service is stopped or disabled.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Group Policy Client {ITEM.LASTVALUE}The service is responsible for applying settings configured by administrators for the computer and users through the Group Policy component. If the service is stopped or disabled, the settings will not be applied and applications and components will not be manageable through Group Policy. Any components or applications that depend on the Group Policy component might not be functional if the service is stopped or disabled.YES
- Service ServerZABBIX_ACTIVEservice_state[LanmanServer]30sFLOATSupports file, print, and named-pipe sharing over the network for this computer. If this service is stopped, these functions will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Server {ITEM.LASTVALUE}Supports file, print, and named-pipe sharing over the network for this computer. If this service is stopped, these functions will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.YES
- Service WorkstationZABBIX_ACTIVEservice_state[LanManWorkstation]30sFLOATCreates and maintains client network connections to remote servers using the SMB protocol. If this service is stopped, these connections will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Workstation {ITEM.LASTVALUE}Creates and maintains client network connections to remote servers using the SMB protocol. If this service is stopped, these connections will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.YES
- Service Windows FirewallZABBIX_ACTIVEservice_state[MpsSvc]30sFLOATWindows Firewall helps protect your computer by preventing unauthorized users from gaining access to your computer through the Internet or a network.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Windows Firewall {ITEM.LASTVALUE}Windows Firewall helps protect your computer by preventing unauthorized users from gaining access to your computer through the Internet or a network.YES
- Service Network List ServiceZABBIX_ACTIVEservice_state[netprofm]30sIdentifies the networks to which the computer has connected, collects and stores properties for these networks, and notifies applications when these properties change.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Network List {ITEM.LASTVALUE}Identifies the networks to which the computer has connected, collects and stores properties for these networks, and notifies applications when these properties change.YES
- Service Network Location AwarenessZABBIX_ACTIVEservice_state[nlasvc]30sFLOATCollects and stores configuration information for the network and notifies programs when this information is modified. If this service is stopped, configuration information might be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Network Location Awareness {ITEM.LASTVALUE}Collects and stores configuration information for the network and notifies programs when this information is modified. If this service is stopped, configuration information might be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.YES
- Service RPC Endpoint MapperZABBIX_ACTIVEservice_state[RpcEptMapper]30sFLOATResolves RPC interface identifiers to transport endpoints. If this service is stopped or disabled, programs using Remote Procedure Call (RPC) services will not function properly.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service RPC Endpoint Mapper {ITEM.LASTVALUE}Resolves RPC interface identifiers to transport endpoints. If this service is stopped or disabled, programs using Remote Procedure Call (RPC) services will not function properly.YES
- Service Security Account ManagerZABBIX_ACTIVEservice_state[SamSs]30sFLOATThe startup of this service signals other services that the Security Accounts Manager (SAM) is ready to accept requests. Disabling this service will prevent other services in the system from being notified when the SAM is ready, which may in turn cause those services to fail to start correctly. This service should not be disabled.ServicesWindows service state{last(,0)}<>0{HOST.NAME}: Service Security Account Manager {ITEM.LASTVALUE}The startup of this service signals other services that the Security Accounts Manager (SAM) is ready to accept requests. Disabling this service will prevent other services in the system from being notified when the SAM is ready, which may in turn cause those services to fail to start correctly. This service should not be disabled.YES
- Number of CPUs onlineZABBIX_ACTIVEsystem.cpu.num[online]1hNumber of CPUs online.CPU
- System uptimeZABBIX_ACTIVEsystem.uptime30sFLOATsSystem uptime in seconds.Operating System{change(0)}<0{HOST.NAME} has just been restartedINFOServer has just been restarted.YES
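The restart trigger `{change(0)}<0` fires when the newest uptime value is smaller than the previous one, which happens only when the uptime counter starts again from zero after a reboot. A Python sketch that finds those points in a series of collected samples (`restart_indices` is a hypothetical helper):

```python
def restart_indices(uptimes):
    """Return the indices where uptime decreased relative to the previous sample.

    Each such index is a sample on which {change(0)}<0 would be true.
    """
    return [i for i in range(1, len(uptimes)) if uptimes[i] - uptimes[i - 1] < 0]
```

With samples taken every 30 seconds, `[100, 130, 160, 12, 42]` yields `[3]`: the fourth sample is the first one after the reboot.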
- Memory AvailableZABBIX_ACTIVEvm.memory.size[available]30sBInactive + Cached + Free memory.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.Memory{last(,0)}<100{HOST.NAME}: Memory Available {ITEM.LASTVALUE}AVERAGEAvailable MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.YES{last(,0)}<300{HOST.NAME}: Memory Available {ITEM.LASTVALUE}WARNINGAvailable MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.YES{HOST.NAME}: Memory Available {ITEM.LASTVALUE}{OS Windows Server Baseline:vm.memory.size[available].last(,0)}<100
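`vm.memory.size[available]` is stored in bytes (the item's unit is B), while the description above borrows the wording of the Memory\Available MBytes counter, so a trigger constant written without a unit suffix is compared against bytes. A minimal Python sketch that states MB and percentage thresholds explicitly in bytes, assuming binary megabytes (1 MB = 1024² bytes, which is how Zabbix interprets the `M` suffix on trigger constants); the helper names are hypothetical:

```python
# Binary megabyte, assumed to match Zabbix's "M" suffix in trigger expressions.
MIB = 1024 * 1024


def mb_to_bytes(mb):
    """Convert a threshold written in MB into the bytes stored by vm.memory.size."""
    return mb * MIB


def available_memory_status(available_bytes, total_bytes):
    """Classify available memory against the 10% / 5% guidance in the description."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    pct = 100.0 * available_bytes / total_bytes
    if pct < 5:
        return "very low"
    if pct < 10:
        return "low"
    return "ok"
```

On an 8 GB host, 300 MB available is about 3.7% ("very low") and 600 MB is about 7.3% ("low"). In a trigger expression, a 100 MB floor would be written with the suffix, as `<100M`.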
- Memory CachedZABBIX_ACTIVEvm.memory.size[cached]30sBMemory Cached.Memory
- Memory Available %ZABBIX_ACTIVEvm.memory.size[pavailable]30sFLOAT%Available MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.Memory{last(,0)}<3{HOST.NAME}: Memory Available percentage {ITEM.LASTVALUE}AVERAGEAvailable MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.YES{last(,0)}<5{HOST.NAME}: Memory Available percentage {ITEM.LASTVALUE}WARNINGAvailable MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory\Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.
Threshold: Low on available memory – less than 10% available.
Very low on available memory – less than 5% available.
Decreasing trend of 10 MB per hour. This could indicate a memory leak.YES{HOST.NAME}: Memory Available percentage {ITEM.LASTVALUE}{OS Windows Server Baseline:vm.memory.size[pavailable].last(,0)}<3
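The "decreasing trend of 10 MB per hour" guidance can be checked with a least-squares slope over available-memory samples. This is a minimal sketch, assuming samples are `(unix_time_seconds, available_bytes)` pairs and 1 MB = 1024² bytes; the helper names and the default threshold parameter are hypothetical:

```python
def slope_mb_per_hour(samples):
    """Least-squares slope of available memory, in MB per hour.

    samples: list of (timestamp_seconds, available_bytes) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t / 3600.0 for t, _ in samples]          # time in hours
    ys = [b / (1024 * 1024) for _, b in samples]   # memory in MB
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def looks_like_leak(samples, threshold_mb_per_hour=10.0):
    """True when available memory falls by at least the threshold per hour."""
    return slope_mb_per_hour(samples) <= -threshold_mb_per_hour
```

A fitted slope is less sensitive to short-lived allocations than comparing only the first and last samples, which is why a regression is used here rather than a simple difference.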
- Memory TotalZABBIX_ACTIVEvm.memory.size[total]1hBMemory Total.Memory
- Memory Size UsedZABBIX_ACTIVEvm.memory.size[used]30sBMemory Used.Memory
CPUs discoveryZABBIX_ACTIVEsystem.cpu.discovery1h{#CPU.NUMBER}^[0-9]+$A90dDiscovery of CPUs of different types as defined in global regular expression "CPU for discovery".Processor No $1 Utilization % (1 min average)ZABBIX_ACTIVEsystem.cpu.util[{#CPU.NUMBER},system,avg1]30sFLOAT%CPU utilization in percent.Processor Performance{avg(600,0)}>90Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 minINFOCPU utilization in percent.
Threshold: 90 % in the last 15 minutes.YESProcessor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min{OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg5].avg(600,0)}>90Processor No $1 Utilization % (5 min average)ZABBIX_ACTIVEsystem.cpu.util[{#CPU.NUMBER},system,avg5]30sFLOAT%CPU utilization in percent.Processor Performance{avg(600,0)}>90Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 minWARNINGCPU utilization in percent.
Threshold: 90 % in the last 15 minutes.YESProcessor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min{OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg15].avg(600,0)}>90Processor No $1 Utilization % (15 min average)ZABBIX_ACTIVEsystem.cpu.util[{#CPU.NUMBER},system,avg15]30sFLOAT%CPU utilization in percent.Processor Performance{avg(600,0)}>90Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 minAVERAGECPU utilization in percent.
Threshold: 90 % in the last 15 minutes.YESMounted filesystem discoveryZABBIX_ACTIVEvfs.fs.discovery1h{#FSTYPE}@File systems for discoveryA90dDiscovery of file systems of different types as defined in global regular expression "File systems for discovery".$1ZABBIX_ACTIVEperf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1]30sFLOATsecThis measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system.
Threshold: Average disk responsiveness is slow – more than 15 ms.
Average disk responsiveness is very slow – more than 25 ms.
Disk responsiveness is critical - more than 50 ms.Disk Performance{avg(300,0)}>0.015{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 minhttp://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-explained.aspxINFOThis measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system.
Threshold: Average disk responsiveness is slow – more than 15 ms.YES{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025{avg(300,0)}>0.025{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 minWARNINGThis measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system.
Threshold: Average disk responsiveness is very slow – more than 25 ms.YES{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.050{avg(300,0)}>0.050{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 minAVERAGEThis measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system.
Threshold: Disk responsiveness is critical – more than 50 ms.YES$1ZABBIX_ACTIVEperf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Transfer",1]30sFLOATsecAvg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.Disk Performance$1ZABBIX_ACTIVEperf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1]30sFLOATsecThis measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system.
Threshold: Average disk responsiveness is slow – more than 15 ms.
Average disk responsiveness is very slow – more than 25 ms.
Disk responsiveness is critical - more than 50 ms.Disk Performance{avg(300,0)}>0.015{HOST.NAME}: LogicalDisk Write Latency avg value > 0.015 in the last 5 minINFOThis measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system.
Threshold: Average disk responsiveness is slow – more than 15 ms.YES{HOST.NAME}: LogicalDisk Write Latency avg value > 0.025 in the last 5 min{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.025{avg(300,0)}>0.025{HOST.NAME}: LogicalDisk Write Latency avg value > 0.025 in the last 5 minWARNINGThis measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system.
Threshold: Average disk responsiveness is very slow – more than 25 ms.YES{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.050{avg(300,0)}>0.050{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 minAVERAGEThis measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system.
Threshold: Disk responsiveness is critical – more than 50 ms.YES$1ZABBIX_ACTIVEperf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1]30sFLOATT/secDisk Transfers/sec is the rate of read and write operations on the disk.
Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.Disk PerformanceLogicalDisk Disk $1 Space AvailableZABBIX_ACTIVEvfs.fs.size[{#FSNAME},free]30sBThis measures the amount of free space on the selected logical disk drive.DiskLogicalDisk Disk $1 Space Available %ZABBIX_ACTIVEvfs.fs.size[{#FSNAME},pfree]30sFLOAT%This measures the percentage of free space on the selected logical disk drive.
Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.Disk{last(0)}<3{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE}AVERAGEThis measures the percentage of free space on the selected logical disk drive.
Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.YES{last(0)}<5{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE}WARNINGThis measures the percentage of free space on the selected logical disk drive.
Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.YES{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE}{OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<3{last(0)}<10{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE}INFOThis measures the percentage of free space on the selected logical disk drive.
Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.YES{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE}{OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<5LogicalDisk Disk $1 Space Used %ZABBIX_ACTIVEvfs.fs.size[{#FSNAME},pused]30sFLOAT%LogicalDisk Space Used, in percent.DiskLogicalDisk Disk $1 Space TotalZABBIX_ACTIVEvfs.fs.size[{#FSNAME},total]1hBLogicalDisk Space Total.DiskLogicalDisk Disk $1 Space UsedZABBIX_ACTIVEvfs.fs.size[{#FSNAME},used]30sBLogicalDisk Space Used.Disk{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 minINFOIndicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization.
Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.YES{OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.025{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 minINFOIndicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization.
Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.YESLogicalDisk Disk {#FSNAME} Space00C800- OS Windows Server Baselinevfs.fs.size[{#FSNAME},free]
10000C8- OS Windows Server Baselinevfs.fs.size[{#FSNAME},total]
Logical Disk {#FSNAME} Read/Write Latency00C800- OS Windows Server Baselineperf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1]
1C80000- OS Windows Server Baselineperf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1]
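The read and write latency counters are stored in seconds, so the 15, 25 and 50 ms tiers appear as 0.015, 0.025 and 0.050 in the trigger prototypes, and a separate prototype pairs fewer than 80 Disk Transfers/sec with more than 25 ms of latency. A Python sketch of both checks, assuming 5-minute averages as input and reporting only the highest matching tier (the flattened export appears to make each lower tier depend on the next higher one); the function names are hypothetical:

```python
def latency_severity(avg_latency_s):
    """Map an average Avg. Disk sec/Read or sec/Write value (seconds) to a tier."""
    if avg_latency_s > 0.050:
        return "AVERAGE"  # critical: more than 50 ms
    if avg_latency_s > 0.025:
        return "WARNING"  # very slow: more than 25 ms
    if avg_latency_s > 0.015:
        return "INFO"     # slow: more than 15 ms
    return None


def low_iops_with_latency(avg_transfers_per_s, avg_latency_s):
    """Mirror the combined prototype: < 80 transfers/s while latency > 25 ms."""
    return avg_transfers_per_s < 80 and avg_latency_s > 0.025
```

A disk averaging 30 ms at only 50 transfers per second would match both the WARNING tier and the combined check, which the descriptions associate with too many LUNs sharing the same physical disks.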
{OS Windows Server Baseline:perf_counter["\System\Context Switches/sec",1].avg(300,0)}/{OS Windows Server Baseline:system.cpu.num[online].last()}>5000{HOST.NAME}: Context Switches/sec {ITEM.LASTVALUE}WARNINGIndicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time.
Threshold: High context switches/sec – more than 5000 context switches per second per online processor.YES{OS Windows Server Baseline:perf_counter["\System\Context Switches/sec",1].avg(300,0)}/{OS Windows Server Baseline:system.cpu.num[online].last()}>10000{HOST.NAME}: Context Switches/sec {ITEM.LASTVALUE}AVERAGEIndicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time.
Threshold: Very high context switches/sec – more than 10,000 context switches per second per online processor.YESMemory % Committed Bytes in UseFF3333- OS Windows Server Baselineperf_counter["\Memory\% Committed Bytes in Use",1]
Memory Free Page Table EntriesFF3333- OS Windows Server Baselineperf_counter["\Memory\Free System Page Table Entries",1]
Memory Pages and Page InputsFF3333- OS Windows Server Baselineperf_counter["\Memory\Pages/sec",1]
100C800- OS Windows Server Baselineperf_counter["\Memory\Pages Input/sec",1]
Memory Total, Cached, Available and Free0000C8- OS Windows Server Baselinevm.memory.size[total]
1C80000- OS Windows Server Baselinevm.memory.size[cached]
200BB00- OS Windows Server Baselinevm.memory.size[available]
PhysicalDisk Avg. Disk Queue LengthFF3333- OS Windows Server Baselineperf_counter["\PhysicalDisk(_Total)\Avg. Disk Queue Length",1]
Processes IO Operations/secFF3333- OS Windows Server Baselineperf_counter["\Process(_Total)\IO Data Operations/sec",1]
100C800- OS Windows Server Baselineperf_counter["\Process(_Total)\IO Other Operations/sec",1]
20000C8- OS Windows Server Baselineperf_counter["\Process(_Total)\IO Read Operations/sec",1]
3C800C8- OS Windows Server Baselineperf_counter["\Process(_Total)\IO Write Operations/sec",1]
Processor and Server QueuesC80000- OS Windows Server Baselineperf_counter["\System\Processor Queue Length",1]
100C800- OS Windows Server Baselineperf_counter["\Server Work Queues(*)\Queue Length",1]
System % Registry Quota In UseFF3333- OS Windows Server Baselineperf_counter["\System\% Registry Quota In Use",1]
System Context Switches/secFF3333- OS Windows Server Baselineperf_counter["\System\Context Switches/sec",1]
Windows service state0Running1Paused2Start pending3Pause pending4Continue pending5Stop pending6Stopped7Unknown255No such service
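The service items use the "Windows service state" value map listed above, and every service trigger uses `{last(,0)}<>0`, so any state other than Running (including 255, No such service) raises a problem. A small Python lookup mirroring that map; `describe_state` is a hypothetical helper:

```python
# Values and labels copied from the template's "Windows service state" value map.
SERVICE_STATES = {
    0: "Running",
    1: "Paused",
    2: "Start pending",
    3: "Pause pending",
    4: "Continue pending",
    5: "Stop pending",
    6: "Stopped",
    7: "Unknown",
    255: "No such service",
}


def describe_state(value):
    """Return (label, alert) for a service_state value.

    alert mirrors {last(,0)}<>0: true for anything other than Running.
    """
    label = SERVICE_STATES.get(value, f"Unmapped ({value})")
    return label, value != 0
```

Transitional states such as Start pending also raise the trigger on a single sample, since `last(,0)` looks only at the most recent value.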