5.0
2021-11-21T21:48:54Z
Templates
SNMP-HP-iLO5
SNMP-HP-iLO5
## Description
Created by Lucas Afonso Kremer <lucasafonsokremer@gmail.com> https://www.linkedin.com/in/lucasafonsokremer
## Overview
**Template for HP servers with iLO 5 controller.**
* #### Tested on ProLiant DL360 Gen10 with firmware version = 1.40 Feb 05 2019
* #### You must add one macro on the server or on the template with the name:
{$SNMP\_COMMUNITY} with the community to be used
* #### All the items were created with "SNMPv2 agent"
* #### Make sure your iLO is reachable from the Zabbix server/proxy, test with:
snmpstatus -v 2c -c public 192.168.0.1
Based on @Wisenetman iLO 4 template
## Author
Lucas Afonso Kremer
TemplatesDrive ArrayHealthMemory ModulesSystem Info- ASR ConditionSNMP_AGENT.1.3.6.1.4.1.232.6.2.5.17.0cpqHeAsrCondition3007d"This value specifies the overall condition of the ASR feature."
other(1),
ok(2),
degraded(3),
failed(4)HealthcpqCondition{last()}>2{HOSTNAME}: Server was restarted by ASRHIGHHP Automatic Server Recovery (ASR) is reporting a "degraded" or "failed" state. ASR is a watchdog timer which automatically restarts a server if the server hangs or crashes.
When a server is restarted by ASR, a flag is set, and an alert is triggered until the flag is reset.YES
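As a rough illustration of how the `cpqCondition` value map and the `{last()}>2` trigger expression fit together, here is a minimal Python sketch. The names `CPQ_CONDITION`, `describe` and `is_problem` are hypothetical and not part of the template; Zabbix evaluates the real expression itself.

```python
# Value map copied from the cpqCondition mapping defined in this template.
CPQ_CONDITION = {1: "other", 2: "OK", 3: "degraded", 4: "failed"}


def describe(value: int) -> str:
    """Return the mapped label, falling back to the raw number if unmapped."""
    return CPQ_CONDITION.get(value, f"unknown({value})")


def is_problem(last_value: int) -> bool:
    """Mirror the trigger expression {last()}>2 (degraded or failed)."""
    return last_value > 2


if __name__ == "__main__":
    for raw in (1, 2, 3, 4):
        print(raw, describe(raw), "PROBLEM" if is_problem(raw) else "OK")
```

Under this reading, `other(1)` and `ok(2)` do not fire the trigger, while `degraded(3)` and `failed(4)` do.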
- Event Log ConditionSNMP_AGENT.1.3.6.1.4.1.232.6.2.11.2.0cpqHeEventLogCondition6007d30d"This value specifies the overall condition of the Integrated management Log feature."
other(1),
ok(2),
degraded(3),
failed(4)HealthcpqCondition
- Fault Tolerant Power Supply ConditionSNMP_AGENT.1.3.6.1.4.1.232.6.2.9.1.0cpqHeFltTolPwrSupplyCondition3007d30d"This value specifies the overall condition of the fault tolerant power supply sub-system."
other(1),
ok(2),
degraded(3),
failed(4)HealthcpqCondition
- Overall StatusSNMP_AGENT1.3.6.1.4.1.232.6.1.3.0cpqHeMibCondition6007d30dOverall condition of the server health system reported by this MIB (cpqHeMibCondition).
other(1),
ok(2),
degraded(3),
failed(4)HealthcpqCondition{nodata(600)}=1{HOSTNAME}: SNMP Agents are not respondingWARNINGThe HP SNMP Agents for this host have not responded to SNMP queries for more than 10 minutes. If the host is otherwise operating properly, it can mean that the HP SNMP agents are not working properly, or that the SNMP service has stopped. A less likely cause is a network issue - SNMP uses UDP port 161.YES
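The `{nodata(600)}=1` expression above can be read as "no value was received in the last 600 seconds". A minimal Python sketch of that reading follows, assuming Unix timestamps. The function `nodata` here is a hypothetical stand-in, not Zabbix's implementation, and the exact boundary handling in Zabbix may differ.

```python
def nodata(last_received_ts: float, now_ts: float, period_s: int = 600) -> int:
    """Return 1 if no value arrived within the last period_s seconds, else 0."""
    return 1 if (now_ts - last_received_ts) > period_s else 0


if __name__ == "__main__":
    # Last value seen 15 minutes ago: the trigger condition nodata(600)=1 holds.
    print(nodata(last_received_ts=0, now_ts=900))
    # Last value seen 5 minutes ago: the condition does not hold.
    print(nodata(last_received_ts=0, now_ts=300))
```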
- Resilient Memory ConditionSNMP_AGENT.1.3.6.1.4.1.232.6.2.14.4.0cpqHeResilientMemCondition3007dOverall status of memory system. If we monitor the individual modules via discovery rule "Memory Modules", then this item is redundant and we don't need to alert on it.
other(1),
ok(2),
degraded(3)
}
The following states are supported:
other(1)
The system does not support fault tolerant memory or the
state cannot be determined by the Management Agent.
ok(2)
This system is operating normally.
degraded(3)
The system is running in a degraded state because the
Advanced Memory Protection subsystem has been engaged."HealthcpqCondition{last()}=3{HOSTNAME}: iLO Resilient Memory condition is degradedAVERAGEThe Advanced Memory Protection subsystem has been engaged, probably because of correctable memory errors. The DIMM which had the errors should be replaced.YES{last()}=1{HOSTNAME}: Resilient Memory condition is unknownWARNINGThe condition of the resilient memory system cannot be determined (condition reported is "other"). Try, in this order:
1) resetting the iLO from the iLO screen
2) restarting the HP SNMP agents:
/sbin/service hp-snmp-agents restart
3) restarting snmpd:
/sbin/service snmpd restartYES
- Thermal ConditionSNMP_AGENT.1.3.6.1.4.1.232.6.2.6.1.0cpqHeThermalCondition3007d30dThis item is a combined status of all temperature and fan items present. By monitoring this item, it's not necessary to monitor each individual fan and temperature item.
other(1),
ok(2),
degraded(3),
failed(4)
}HealthcpqCondition{last()}>2{HOSTNAME}: Thermal Condition is DegradedHIGHSomething is wrong with this server's cooling system. Either a non-required fan is not operating properly, or a temp sensor is outside of normal operating range.
The server will automatically shutdown if a required fan is not operating properly, or if a temp sensor detects a condition that could permanently damage the system.YES
- Product NameSNMP_AGENT1.3.6.1.4.1.232.2.2.4.2.0cpqSiProductName288001d0TEXTThe machine product name.System Info
- System Product IDSNMP_AGENT.1.3.6.1.4.1.232.2.2.2.6.0cpqSiSysProductId288001d0TEXT"The product id string of the system unit.
The string will be empty if the system does not report the product id."System Info
- Serial NumberSNMP_AGENT.1.3.6.1.4.1.232.2.2.2.1.0cpqSiSysSerialNum288001d0TEXT"The serial number of the physical system unit.
The string will be empty if the system does not report the
serial number function."System Info
Drive Array AcceleratorsSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.3.2.2.2.1.1]snmp.discovery[DaAccelerators]36000Walk the cpqDaAccelCntlrIndex table to get accelerator controller indexes:
CPQIDA-MIB::
cpqDaAccelCntlrIndex OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Accelerator Board Controller Index.
This value is a logical number whose meaning is OS dependent.
The value has a direct mapping to the controller table index
such that controller 'i' has accelerator table entry 'i'."
::= { cpqDaAccelEntry 1 }HP Drive Array Accelerator Backup Power Source {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.2.1.16.{#SNMPINDEX}cpqDaAccelBackupPowerSource[{#SNMPINDEX}]288001d1dCPQIDA-MIB::
cpqDaAccelBackupPowerSource OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
battery(2),
capacitor(3)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Accelerator Board Backup Power Source.
This describes the backup power source being used by the Array
Accelerator board.
The status can be:
Other (1)
Indicates that the instrument agent does not recognize the backup
power source used by the Array Accelerator board. You may need
to upgrade the instrument agent.
Battery (2)
Indicates that a battery is the backup power source for the Array
Accelerator board.
Capacitor (3)
Indicates that a capacitor is the backup power source for the
Array Accelerator board."
::= { cpqDaAccelEntry 16 }Drive ArraycpqDaAccelBackupPowerSourceHP Drive Array Accelerator Battery Status {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.2.1.6.{#SNMPINDEX}cpqDaAccelBattery[{#SNMPINDEX}]6007d90dCPQIDA-MIB::
cpqDaAccelBattery OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
recharging(3),
failed(4),
degraded(5),
notPresent(6),
capacitorFailed(7)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Accelerator Board Backup Power Status.
This monitors the status of each backup power source on the board.
The backup power source can only recharge when the system has
power applied. The type of backup power source used is indicated
by cpqDaAccelBackupPowerSource.
The following values are valid:
Other (1)
Indicates that the instrument agent does not recognize
battery status. You may need to update your software.
Ok (2)
Indicates that a particular battery pack is fully charged.
Charging (3)
The battery power is less than 75%. The Drive Array
Controller is attempting to recharge the battery. A
battery can take as long as 36 hours to fully recharge.
After 36 hours, if the battery has not recharged, it is
considered failed.
Failed (4)
The battery pack is below the sufficient voltage level and
has not recharged in 36 hours. Your Array Accelerator board
needs to be serviced.
Degraded (5)
The battery is still operating, however, one of the batteries
in the pack has failed to recharge properly. Your Array
Accelerator board should be serviced as soon as possible.
NotPresent (6)
There are no batteries associated with this controller.
Capacitor Failed (7)
The capacitor is below the sufficient voltage level and
has not recharged in 10 minutes. Your Array Accelerator board
needs to be serviced."
::= { cpqDaAccelEntry 6 }Drive ArraycpqDaAccelBatteryStatus{max(86400)}=4{HOST.NAME}: Drive Array Accelerator {#SNMPINDEX} battery has failedWARNINGThe battery provides backup power for the cache memory, to prevent data loss in the event of a power failure. This battery has failed, so the Drive Array Controller will disable that cache, and disk performance will be bad until the battery is replaced. The battery needs to be replaced. Suggested actions are to write a ticket to replace the battery, then acknowledge the alert, writing the ticket number in the notes. This will prevent a stream of continuing alerts.
When the battery is on the edge of failing, it can move in and out of the failed state. This can have the effect of restarting an alert which has already been acknowledged. To prevent this, this trigger is held until there has been no failure reported in the last 24 hours.
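The 24-hour hold described above comes from the `{max(86400)}=4` expression: the trigger stays active while the highest value seen in the last 86400 seconds equals `failed(4)`. A minimal Python sketch of that logic, assuming samples are `(timestamp, value)` pairs; the function name and sample layout are hypothetical, not Zabbix internals.

```python
FAILED = 4  # cpqDaAccelBattery value for failed(4)


def battery_trigger_fires(samples, now_ts, window_s=86400):
    """Mirror {max(86400)}=4 over (timestamp, value) samples."""
    recent = [v for ts, v in samples if now_ts - window_s <= ts <= now_ts]
    # With no samples in the window there is nothing to compare against.
    return bool(recent) and max(recent) == FAILED


if __name__ == "__main__":
    history = [(0, 4), (3600, 2)]  # failed at t=0, ok again an hour later
    print(battery_trigger_fires(history, now_ts=7200))   # still held
    print(battery_trigger_fires(history, now_ts=90000))  # failure aged out
```

Read literally, a value above 4 in the window (for example `degraded(5)`) makes the maximum differ from 4, so this expression would not hold in that case.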
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Drive Array Accelerator Condition {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.2.1.9.{#SNMPINDEX}cpqDaAccelCondition[{#SNMPINDEX}]6007d90dCPQIDA-MIB::
cpqDaAccelCondition OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The condition of the device. This value represents the overall
condition of this array accelerator."
::= { cpqDaAccelEntry 9 }Drive ArraycpqConditionHP Drive Array Accelerator Status {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.2.1.2.{#SNMPINDEX}cpqDaAccelStatus[{#SNMPINDEX}]6007d90dCPQIDA-MIB::
cpqDaAccelStatus OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
invalid(2),
enabled(3),
tmpDisabled(4),
permDisabled(5)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Accelerator Board Status.
This describes the status of the accelerator write cache.
The status can be:
Other (1)
Indicates that the instrument agent does not recognize the
status of the Array Accelerator. You may need to upgrade
the instrument agent.
Invalid (2)
Indicates that an Array Accelerator board has not been
installed in this system or is present but not configured.
Enabled (3)
Indicates that write cache operations are currently configured
and enabled for at least one logical drive.
Temporarily Disabled (4)
Indicates that write cache operations have been temporarily
disabled. View the Array Accelerator Board Error Code object
to determine why the write cache operations have been
temporarily disabled.
Permanently Disabled (5)
Indicates that write cache operations have been permanently
disabled. View the Array Accelerator Board Error Code object
to determine why the write cache operations have been disabled."
::= { cpqDaAccelEntry 2 }Drive ArraycpqDaAccelStatus{last()}=2{HOST.NAME}: Drive Array Accelerator {#SNMPINDEX} is missing or not configuredWARNINGThe Drive Array Accelerator is the cache memory board for the Drive Array Controller; if it is missing, disk performance will be bad. A Drive Array Accelerator (cache memory board) should be ordered and installed.
Suggested actions are to write a ticket to order and install the HP Drive Array Accelerator (cache memory board). Then acknowledge the alert, writing the ticket number in the notes.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Drive Array Accelerator Write Cache % {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.2.1.14.{#SNMPINDEX}cpqDaAccelWriteCachePercent[{#SNMPINDEX}]288001d1d%CPQIDA-MIB::
cpqDaAccelWriteCachePercent OBJECT-TYPE
SYNTAX Gauge
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Write Cache Percent.
This shows the percent of cache memory allocated for posted
write caching. If the data cannot be determined or is not
applicable, the value is set to 4,294,967,295."
::= { cpqDaAccelEntry 14 }Drive ArrayDrive Array Controller Performance MonitorsSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.3.2.7.1.1.1]snmp.discovery[DaControllerPerf]288000Walk the cpqDaCntlrPerfCntlrIndex table to find controller performance monitor instances:
CPQIDA-MIB::
cpqDaCntlrPerfCntlrIndex OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Drive Array Controller Performance Monitor Controller Index.
This maps the performance monitor information into their
respective controllers which support performance data."
::= { cpqDaCntlrPerfEntry 1 }HP Drive Array Controller Latency {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.7.1.1.7.{#SNMPINDEX}cpqDaCntlrPerfAvgLatency[{#SNMPINDEX}]3600FLOATmsCPQIDA-MIB::
cpqDaCntlrPerfAvgLatency OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Controller Performance Monitor Average
Command Latency.
This value shows the average command latency for this sample
in 1/100,000 second units."
::= { cpqDaCntlrPerfEntry 7 }Drive ArrayMULTIPLIER.01Drive Array ControllersSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.3.2.2.1.1.20]snmp.discovery[DaControllers]36000Walk the cpqDaCntlrHwLocation table to get controller locations:
CPQIDA-MIB::
cpqDaCntlrHwLocation OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..255))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"A text description of the hardware location of the controller.
A NULL string indicates that the hardware location could not
be determined or is irrelevant."
::= { cpqDaCntlrEntry 20 }HP Drive Array Controller Board Condition {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.1.1.12.{#SNMPINDEX}cpqDaCntlrBoardCondition[{#SNMPINDEX}]6007d30dCPQIDA-MIB::
cpqDaCntlrBoardCondition OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The condition of the device. This value represents the
condition of the controller and any associated array
accelerators."
::= { cpqDaCntlrEntry 12 }Drive ArraycpqCondition{last()}>2{HOST.NAME}: HP Drive Array Controller Board {#SNMPINDEX} is degraded or failedAVERAGEThis HP Drive Array Controller Board is reporting a degraded or failed condition. Check the iLO or preferably the HP System Management Homepage for more information. This may require first starting the Array Configuration Utility from the command line ("/opt/compaq/cpqacuxe/bld/cpqacuxe -R").
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Drive Array Controller Condition {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.1.1.6.{#SNMPINDEX}cpqDaCntlrCondition[{#SNMPINDEX}]6007d30dCPQIDA-MIB::
cpqDaCntlrCondition OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The condition of the device. This value represents the overall
condition of this controller, and any associated logical drives,
physical drives, and array accelerators."
::= { cpqDaCntlrEntry 6 }Drive ArraycpqConditionHP Drive Array Controller Drive Write Cache State {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.1.1.27.{#SNMPINDEX}cpqDaCntlrDriveWriteCacheState[{#SNMPINDEX}]288001d1dCPQIDA-MIB::
cpqDaCntlrDriveWriteCacheState OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
disabled(2),
enabled(3)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Controller Drive Write Cache State.
This is the controller's drive write cache setting. The
following values are valid:
other (1)
Indicates that the instrument agent does not recognize the
value. You may need to upgrade the instrument agent.
disabled (2)
The controller will disable drive write cache for all drives.
enabled (3)
The controller will enable drive write cache for all drives."
::= { cpqDaCntlrEntry 27 }Drive ArraycpqDaCntlrDriveWriteCacheStateHP Drive Array Controller Firmware Revision {#SNMPINDEX}SNMP_AGENT1.3.6.1.4.1.232.3.2.2.1.1.3.{#SNMPINDEX}cpqDaCntlrFWRev[{#SNMPINDEX}]288001d0TEXTCPQIDA-MIB::
cpqDaCntlrFWRev OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..5))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Array Controller Firmware Revision.
The firmware revision of the drive array controller. This
value can be used to help identify a particular revision
of the controller."
::= { cpqDaCntlrEntry 3 }Drive ArrayFirmwareSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.11.2.14.1.1.4]snmp.discovery[firmware]28800{#SNMPVALUE}(Lights.Out|iLO|System.ROM)A0Walk the cpqHoFwVerDisplayName table, filter for "Lights Out", "iLO" and "System ROM"
CPQHOST-MIB::
cpqHoFwVerDisplayName OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..127))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Firmware Version Device Display Name.
This is the display name of the device containing the firmware."
::= { cpqHoFwVerEntry 4 }HP Firmware Version {#SNMPINDEX} - {#SNMPVALUE}SNMP_AGENT.1.3.6.1.4.1.232.11.2.14.1.1.5.{#SNMPINDEX}cpqHoFwVerVersion[{#SNMPINDEX}]288007d0CHARCPQHOST-MIB::
cpqHoFwVerVersion OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..31))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Firmware Version.
This is the version of the device firmware."
::= { cpqHoFwVerEntry 5 }System InfoLogical DrivesSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.3.2.3.1.1.14]snmp.discovery[logicalDrives]36000Walk the cpqDaLogDrvOsName table to get logical drive OS names:
CPQIDA-MIB::
cpqDaLogDrvOsName OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..255))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Logical Drive OS Name.
The OS name for this array logical drive. This field will be
a null (size 0) string if the agent does not support OS name."
::= { cpqDaLogDrvEntry 14 }HP Logical Drive Condition {#SNMPINDEX} [{#SNMPVALUE}]SNMP_AGENT.1.3.6.1.4.1.232.3.2.3.1.1.11.{#SNMPINDEX}cpqDaLogDrvCondition[{#SNMPINDEX}]3007d90dCPQIDA-MIB::
cpqDaLogDrvCondition OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The Logical Drive condition.
This value represents the overall condition of this logical drive and
any associated physical drives."
::= { cpqDaLogDrvEntry 11 }Drive ArraycpqCondition{last()}=4{HOST.NAME}: Logical Drive {#SNMPINDEX} [{#SNMPVALUE}] has failedHIGHThis logical drive (volume) has failed. For more information, go to the system management homepage, then check the array configuration utility (may need to be started from the command line first). This alert is very serious (data has been lost) because failure of a logical drive (in RAID configuration with redundancy) is normally caused by multiple physical drive failures.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381{last()}=3{HOST.NAME}: Logical Drive {#SNMPINDEX} [{#SNMPVALUE}] is degradedWARNINGThis logical drive (volume) is degraded. For more information, go to the system management homepage, then check the array configuration utility (may need to be started from the command line first).
This alert normally means that redundancy has been lost because an underlying physical drive has failed. This means that now a single drive failure could cause data loss. This alert is normally accompanied by an alert from the physical drive which has failed. Suggested actions are to write a ticket to replace the failed physical drive, then acknowledge the physical and logical drive alerts, writing the ticket number in the notes.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Logical Drive Fault Tolerance {#SNMPINDEX} [{#SNMPVALUE}]SNMP_AGENT.1.3.6.1.4.1.232.3.2.3.1.1.3.{#SNMPINDEX}cpqDaLogDrvFaultTol[{#SNMPINDEX}]288001d7dCPQIDA-MIB::
cpqDaLogDrvFaultTol OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
none(2),
mirroring(3),
dataGuard(4),
distribDataGuard(5),
advancedDataGuard(7),
raid50 (8),
raid60 (9)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Logical Drive Fault Tolerance.
This shows the fault tolerance mode of the logical drive.
The following values are valid for the Logical Drive Fault
Tolerance:
None (2)
Fault tolerance is not enabled. If a physical drive reports
an error, the data cannot be recovered by the drive array
controller.
Mirroring - RAID 1/RAID 1+0 (3)
For each physical drive, there is a second physical drive
containing identical data. If a drive fails, the data can be
retrieved from the mirror drive.
Data Guard - RAID 4 (4)
One of the physical drives is used as a data guard drive and
contains the exclusive OR of the data on the remaining drives.
If a failure is detected, the drive array controller rebuilds
the data using the data guard information plus information
from the other drives.
Distributed Data Guard - RAID 5 (5)
Distributed Data Guarding, sometimes referred to as RAID 5,
is similar to Data Guarding, but instead of storing the parity
information on one drive, the information is distributed across
all of the drives. If a failure is detected, the drive array
controller rebuilds the data using the data guard information
from all the drives.
Advanced Data Guarding - RAID 6 (7)
Advanced Data Guarding (RAID ADG) is the fault tolerance method
that provides the highest level of data protection. It
'stripes' data and parity across all the physical drives in the
configuration to ensure the uninterrupted availability of
uncorrupted data. This fault-tolerance method is similar to
distributed data guard (RAID 5) in that parity data is
distributed across all drives in the array, except in RAID ADG
the capacity of multiple drives is used to store parity data.
Assuming the capacity of 2 drives is used for parity data,
this allows continued operation despite simultaneous failure of
any 2 drives in the array, whereas RAID 4 and RAID 5 can only
sustain failure of a single drive.
RAID 50 (8)
Distributed data guarding (RAID 5) with multiple parity groups.
RAID 60 (9)
Advanced data guarding (RAID 6) with multiple parity groups."
::= { cpqDaLogDrvEntry 3 }Drive ArraycpqDaLogDrvFaultTolHP Logical Drive Size {#SNMPINDEX} [{#SNMPVALUE}]SNMP_AGENT.1.3.6.1.4.1.232.3.2.3.1.1.9.{#SNMPINDEX}cpqDaLogDrvSize[{#SNMPINDEX}]288001d7dBCPQIDA-MIB::
cpqDaLogDrvSize OBJECT-TYPE
SYNTAX INTEGER (0..2147483647)
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Logical Drive Size.
This is the size of the logical drive in megabytes. This value
is calculated using the value 1,048,576 (2^20) as a megabyte.
Drive manufacturers sometimes use the number 1,000,000 as a
megabyte when giving drive capacities so this value may
differ from the advertised size of a drive."
::= { cpqDaLogDrvEntry 9 }Drive ArrayMULTIPLIER1048576Memory ModulesSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.6.2.14.13.1.19]snmp.discovery[memoryModules]3600DISABLED{#SNMPVALUE}^[^2]A0Enable this discovery rule to monitor each memory module (i.e. to show which modules are installed in which sockets). If this rule is enabled, disable the top-level item "HP Resilient Memory Condition" so you don't get 2 alerts in the event of a memory failure.
If you don't need monitoring of individual memory modules, leave this disabled, and leave the top-level item "HP Resilient Memory Condition" enabled.
Walk the cpqHeResMem2ModuleStatus, filter out any modules which are "notPresent(2)"
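The discovery filter `{#SNMPVALUE}` matching `^[^2]` keeps every row whose status does not start with the digit 2, which drops `notPresent(2)` while keeping values such as 4, 10 and 11. A minimal Python sketch of that filtering, using Python's `re` module as a stand-in for the regular expression engine Zabbix uses; the SNMP index strings in the example are made up for illustration.

```python
import re

# Same pattern as the discovery filter in the template.
FILTER = re.compile(r"^[^2]")


def discovered(status_by_index):
    """Keep only rows whose status value passes the ^[^2] filter."""
    return {
        idx: status
        for idx, status in status_by_index.items()
        if FILTER.search(str(status))
    }


if __name__ == "__main__":
    # Hypothetical walk result: index -> cpqHeResMem2ModuleStatus value.
    walk = {"0": 4, "1": 2, "2": 11, "3": 2}
    print(discovered(walk))
```

The same pattern is reused by the power supply discovery rule to drop `absent(2)` rows.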
CPQHLTH-MIB::
cpqHeResMem2ModuleStatus OBJECT-TYPE
SYNTAX INTEGER {
other(1),
notPresent(2),
present(3),
good(4),
add(5),
upgrade(6),
missing(7),
doesNotMatch(8),
notSupported(9),
badConfig(10),
degraded(11)HP Memory Module CPU Number [{#SNMPINDEX}]SNMP_AGENT.1.3.6.1.4.1.232.6.2.14.13.1.3.{#SNMPINDEX}cpqHeResMem2CpuNum[{#SNMPINDEX}]288001d1dCPQHLTH-MIB::
cpqHeResMem2CpuNum OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The memory module CPU number. Value 0 means memory is not Processor based."
::= { cpqHeResMem2ModuleEntry 3 }Memory ModulesHP Memory Module Condition [{#SNMPINDEX}]SNMP_AGENT.1.3.6.1.4.1.232.6.2.14.13.1.20.{#SNMPINDEX}cpqHeResMem2ModuleCondition[{#SNMPINDEX}]3007d30dCPQHLTH-MIB::
cpqHeResMem2ModuleCondition OBJECT-TYPE
SYNTAX INTEGER {
other(1),
ok(2),
degraded(3),
degradedModuleIndexUnknown(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"This provides the current status of the correctable memory
errors for this memory module.
The following status values are supported:
other(1):
ECC is not supported on this memory module or the
condition could not be determined.
ok(2):
The memory module is operating normally.
degraded(3):
The memory module's correctable error count has exceeded
threshold or a configuration error has been detected.
degradedModuleIndexUnknown(4):
The correctable error count has exceeded threshold.
The module number not available."
::= { cpqHeResMem2ModuleEntry 20 }Memory ModulescpqConditionHP Memory Module Number [{#SNMPINDEX}]SNMP_AGENT.1.3.6.1.4.1.232.6.2.14.13.1.5.{#SNMPINDEX}cpqHeResMem2ModuleNum[{#SNMPINDEX}]288001d1dCPQHLTH-MIB::
cpqHeResMem2ModuleNum OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The memory module number."
::= { cpqHeResMem2ModuleEntry 5 }Memory ModulesHP Memory Module Size [{#SNMPINDEX}]SNMP_AGENT.1.3.6.1.4.1.232.6.2.14.13.1.6.{#SNMPINDEX}cpqHeResMem2ModuleSize[{#SNMPINDEX}]288001d1dBCPQHLTH-MIB::
cpqHeResMem2ModuleSize OBJECT-TYPE
SYNTAX INTEGER (0..2147483647)
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Module memory size in kilobytes. A kilobyte of memory is
defined as 1024 bytes.
A size of 0 indicates the module is not present."
::= { cpqHeResMem2ModuleEntry 6 }Memory ModulesMULTIPLIER1024{SNMP-HP-iLO5:cpqHeResMem2ModuleCondition[{#SNMPINDEX}].last()}>2 and {SNMP-HP-iLO5:cpqHeResMem2ModuleNum[{#SNMPINDEX}].last()}<>99 and {SNMP-HP-iLO5:cpqHeResMem2CpuNum[{#SNMPINDEX}].last()}<>99{HOST.NAME}: Memory Module {#SNMPINDEX} is degradedAVERAGEThe correctable error count for this memory module has been exceeded. It needs to be replaced. Suggested actions are to write a ticket to replace the module, then acknowledge the alert, writing the ticket number in the notes.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381Physical DrivesSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.3.2.5.1.1.64]snmp.discovery[physicalDrives]36000Walk the cpqDaPhyDrvLocationString table to get drive locations:
CPQIDA-MIB::
cpqDaPhyDrvLocationString OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..255))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Physical Drive Location String.
This string describes the location of the drive in relation to
the controller. If the location string cannot be determined,
the agent will return a NULL string."
::= { cpqDaPhyDrvEntry 64 }HP Physical Drive Condition {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.37.{#SNMPINDEX}cpqDaPhyDrvCondition[{#SNMPINDEX}]3007d90dCPQIDA-MIB::
cpqDaPhyDrvCondition OBJECT-TYPE
SYNTAX INTEGER
{
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The condition of the device.
This value represents the overall condition of this physical
drive."
::= { cpqDaPhyDrvEntry 37 }Drive ArraycpqCondition{last()}=4{HOST.NAME}: Drive {#SNMPINDEX} has failed [{#SNMPVALUE}]AVERAGEThis drive has failed. It needs to be replaced. This alert will normally be accompanied by an alert from a logical drive which has now lost redundancy. This means that now a single drive failure can cause data loss. Suggested actions are to write a ticket to replace the failed physical drive, then acknowledge the physical and logical drive alerts, writing the ticket number in the notes.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381{last()}=3{HOST.NAME}: Drive {#SNMPINDEX} is degraded [{#SNMPVALUE}]WARNINGThis drive is degraded (may mean predictive failure). It should be replaced. Suggested actions are to write a ticket to replace the drive, then acknowledge the alert, writing the ticket number in the notes.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Physical Drive Reads/Sec {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.11.{#SNMPINDEX}cpqDaPhyDrvReadsSec[{#SNMPINDEX}]6030dreads/sNumber of sector reads/sec
CPQIDA-MIB::
cpqDaPhyDrvReads OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Sectors Read (low).
The phyDrvHReads and the phyDrvReads together shows
the total number of sectors read from the physical disk drive
during the reference time (phyDrvRefHours).
The actual number of sectors read equals the phyDrvHReads
times 2^32 plus the phyDrvReads.
This information may be useful for determining rates.
For instance, if you wanted to calculate the average number
of reads per hour of operation, divide this number by the
reference hours."
::= { cpqDaPhyDrvEntry 11 }Drive ArrayCHANGE_PER_SECONDHP Physical Drive Total Bytes Read {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.11.{#SNMPINDEX}cpqDaPhyDrvReads[{#SNMPINDEX}]6007dBRunning total of number of bytes read from this drive. Wraps every 2TB.
CPQIDA-MIB::
cpqDaPhyDrvReads OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Sectors Read (low).
The phyDrvHReads and the phyDrvReads together shows
the total number of sectors read from the physical disk drive
during the reference time (phyDrvRefHours).
The actual number of sectors read equals the phyDrvHReads
times 2^32 plus the phyDrvReads.
This information may be useful for determining rates.
For instance, if you wanted to calculate the average number
of reads per hour of operation, divide this number by the
reference hours."
::= { cpqDaPhyDrvEntry 11 }Drive ArrayMULTIPLIER512HP Physical Drive Serial Number {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.51.{#SNMPINDEX}cpqDaPhyDrvSerialNum[{#SNMPINDEX}]288001d0TEXTCPQIDA-MIB::
cpqDaPhyDrvSerialNum OBJECT-TYPE
SYNTAX DisplayString (SIZE (0..40))
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Physical Drive Serial Number.
This is the serial number assigned to the physical drive.
This value is based upon the serial number as returned by the
SCSI inquiry command but may have been modified due to space
limitations. This can be used for identification purposes."
::= { cpqDaPhyDrvEntry 51 }Drive ArrayHP Physical Drive Size {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.45.{#SNMPINDEX}cpqDaPhyDrvSize[{#SNMPINDEX}]288001d7dBCPQIDA-MIB::
cpqDaPhyDrvSize OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Physical Drive Size in MB.
This is the size of the physical drive in megabytes. This value
is calculated using the value 1,048,576 (2^20) as a megabyte.
Drive manufacturers sometimes use the number 1,000,000 as a
megabyte when giving drive capacities so this value may differ
from the advertised size of a drive. This field is only
applicable for controllers which support SCSI drives, and
therefore is not supported by the IDA or IDA-2 controllers.
The field will contain 0xFFFFFFFF if the drive capacity cannot
be calculated or if the controller does not support SCSI drives."
::= { cpqDaPhyDrvEntry 45 }Drive ArrayMULTIPLIER1048576HP Physical Drive Writes/Sec {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.13.{#SNMPINDEX}cpqDaPhyDrvWritesSec[{#SNMPINDEX}]6030dwrites/sNumber of sector writes/sec
CPQIDA-MIB::
cpqDaPhyDrvWrites OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Sectors Written (low).
The phyDrvHWrites and the phyDrvWrites together
shows the total number of sectors written to the physical
disk drive during the reference hours (phyDrvRefHours).
The actual number of sectors written equals the phyDrvHWrites
times 2^32 plus the phyDrvWrites.
This information may be useful for determining rates.
For instance, if you wanted to calculate the average number of
writes per hour of operation, divide this number by the reference
hours. "
::= { cpqDaPhyDrvEntry 13 }Drive ArrayCHANGE_PER_SECONDHP Physical Drive Total Bytes Written {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.3.2.5.1.1.13.{#SNMPINDEX}cpqDaPhyDrvWrites[{#SNMPINDEX}]6007dBRunning total of number of bytes written to this drive. Wraps every 2TB.
CPQIDA-MIB::
cpqDaPhyDrvWrites OBJECT-TYPE
SYNTAX Counter
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Sectors Written (low).
The phyDrvHWrites and the phyDrvWrites together
shows the total number of sectors written to the physical
disk drive during the reference hours (phyDrvRefHours).
The actual number of sectors written equals the phyDrvHWrites
times 2^32 plus the phyDrvWrites.
This information may be useful for determining rates.
For instance, if you wanted to calculate the average number of
writes per hour of operation, divide this number by the reference
hours. "
::= { cpqDaPhyDrvEntry 13 }Drive ArrayMULTIPLIER512Power SuppliesSNMP_AGENTdiscovery[{#SNMPVALUE},.1.3.6.1.4.1.232.6.2.9.3.1.3]snmp.discovery[powerSupplies]3600{#SNMPVALUE}^[^2]A0Walk the cpqHeFltTolPowerSupplyPresent table, filter out any modules which are "absent(2)"
CPQHLTH-MIB::
cpqHeFltTolPowerSupplyPresent OBJECT-TYPE
SYNTAX INTEGER {
other(1),
absent(2),
present(3)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Indicates whether the power supply is present in the chassis."
::= { cpqHeFltTolPowerSupplyEntry 3 }HP Power Supply Condition {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.6.2.9.3.1.4.{#SNMPINDEX}cpqHeFltTolPowerSupplyCondition[{#SNMPINDEX}]3007d30dCPQHLTH-MIB::
cpqHeFltTolPowerSupplyCondition OBJECT-TYPE
SYNTAX INTEGER {
other(1),
ok(2),
degraded(3),
failed(4)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The condition of the power supply.
This value will be one of the following:
other(1)
The status could not be determined or not present.
ok(2)
The power supply is operating normally.
degraded(3)
A temperature sensor, fan or other power supply component is
outside of normal operating range.
failed(4)
A power supply component detects a condition that could
permanently damage the system."
::= { cpqHeFltTolPowerSupplyEntry 4 }HealthcpqCondition{last()}=4{HOST.NAME}: Power Supply {#SNMPINDEX} has failedAVERAGEThis power supply is reporting that it has failed. This could mean that the supply has failed and needs replacement, or it could also mean that there is no input power due to a cabling problem. Check to make sure that there is power going to this supply; if so, it has failed and needs to be replaced.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381{last()}=3{HOST.NAME}: Power Supply {#SNMPINDEX} is degradedWARNINGThis power supply is reporting a degraded condition. This could mean that a fan has failed, or a temperature reading is too high. Check the iLO or system management homepage for more information. If a fan has failed, it needs to be replaced.
HP System Management Homepage (may require starting service hpsmhd):
https://{HOST.CONN}:2381HP Power Supply Redundant {#SNMPINDEX}SNMP_AGENT.1.3.6.1.4.1.232.6.2.9.3.1.9.{#SNMPINDEX}cpqHeFltTolPowerSupplyRedundant[{#SNMPINDEX}]36007d30dCPQHLTH-MIB::
cpqHeFltTolPowerSupplyRedundant OBJECT-TYPE
SYNTAX INTEGER {
other(1),
notRedundant(2),
redundant(3)
}
ACCESS read-only
STATUS mandatory
DESCRIPTION
"The redundancy state of the power supply.
This value will be one of the following:
other(1)
The redundancy state could not be determined.
notRedundant(2)
The power supply is not operating in a redundant state.
redundant(3)
The power supply is operating in a redundant state."
::= { cpqHeFltTolPowerSupplyEntry 9 }HealthcpqRedundant
Macros:
{$SNMP_COMMUNITY} = public
Value maps:
cpqCondition: 1 = other, 2 = OK, 3 = degraded, 4 = failed
cpqDaAccelBackupPowerSource: 1 = other, 2 = battery, 3 = capacitor, 4 = Flash-Backed Write Cache
cpqDaAccelBatteryStatus: 1 = other, 2 = ok, 3 = recharging, 4 = failed, 5 = degraded, 6 = notPresent, 7 = capacitorFailed
cpqDaAccelStatus: 1 = other, 2 = invalid, 3 = enabled, 4 = tmpDisabled, 5 = permDisabled, 6 = cacheModFlashMemNotAttached, 7 = cacheModDegradedFailsafeSpeed, 8 = cacheModCriticalFailure, 9 = cacheReadCacheNotMapped
cpqDaCntlrDriveWriteCacheState: 1 = other, 2 = disabled, 3 = enabled
cpqDaLogDrvFaultTol: 1 = other, 2 = none, 3 = mirroring, 4 = dataGuard, 5 = distribDataGuard, 7 = advancedDataGuard, 8 = raid50, 9 = raid60
cpqRedundant: 1 = other, 2 = notRedundant, 3 = redundant
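Several items in this template apply a MULTIPLIER preprocessing step to turn raw SNMP values into bytes or milliseconds. The sketch below collects those factors in one place to show the resulting units. It is an illustration only: the dictionary uses the item key names without their `[{#SNMPINDEX}]` suffix, and the helper `preprocess` is hypothetical.

```python
# Multipliers as they appear in the template, keyed by item key prefix.
MULTIPLIERS = {
    "cpqDaLogDrvSize": 1048576,        # megabytes (2^20 bytes) -> bytes
    "cpqDaPhyDrvSize": 1048576,        # megabytes (2^20 bytes) -> bytes
    "cpqHeResMem2ModuleSize": 1024,    # kilobytes -> bytes
    "cpqDaPhyDrvReads": 512,           # sectors -> bytes (512-byte sectors)
    "cpqDaPhyDrvWrites": 512,          # sectors -> bytes (512-byte sectors)
    "cpqDaCntlrPerfAvgLatency": 0.01,  # 1/100,000 s units -> milliseconds
}

# A 32-bit sector counter scaled by 512 wraps at 2^32 * 512 bytes,
# which is the "wraps every 2TB" noted on the byte-total items.
WRAP_BYTES = 2**32 * 512


def preprocess(key: str, raw: float) -> float:
    """Apply the template's multiplier for the given item key."""
    return raw * MULTIPLIERS[key]


if __name__ == "__main__":
    print(preprocess("cpqDaLogDrvSize", 286070))       # logical drive size in bytes
    print(preprocess("cpqDaCntlrPerfAvgLatency", 250)) # latency in ms
    print(WRAP_BYTES)
```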