Create new alert for physical disk fault in HPE servers

Hi,

I’m a long-time user of LibreNMS but somehow never came across this before. We have it configured to poll our physical HPE servers and can see the HDD statuses in the GUI.

I’m trying to configure a generic alert, so that any disks not in status “2” generate an alert. I can see the following output in snmpwalk:

[root@librenms ~]# snmpwalk -v2c -c public 192.168.2.10 .1.3.6.1.4.1.232.3.2.5.1.1.6
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.8 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.9 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.10 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.11 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.12 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.13 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.14 = INTEGER: 3
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.15 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.16 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.17 = INTEGER: 2

As you can see above, one disk returns status “3”. How can I detect and alert on this please?
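For a quick check outside LibreNMS, the snmpwalk output above can be filtered for any row that is not in state 2. This is only a minimal sketch: the sample lines are copied from the walk above, while `find_unhealthy`, `LINE_RE`, and `OK_STATE` are illustrative names, not part of LibreNMS or net-snmp.

```python
import re

# Sample rows copied from the snmpwalk output above.
SNMPWALK_OUTPUT = """\
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.13 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.14 = INTEGER: 3
SNMPv2-SMI::enterprises.232.3.2.5.1.1.6.2.15 = INTEGER: 2
"""

OK_STATE = 2  # the thread treats state 2 as healthy

# Matches rows of the form "<oid> = INTEGER: <n>".
LINE_RE = re.compile(r"^(?P<oid>\S+) = INTEGER: (?P<value>\d+)$")


def find_unhealthy(walk_text, ok_state=OK_STATE):
    """Return (oid, value) pairs whose value differs from ok_state."""
    unhealthy = []
    for line in walk_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and rows that are not INTEGER values
        value = int(match.group("value"))
        if value != ok_state:
            unhealthy.append((match.group("oid"), value))
    return unhealthy


if __name__ == "__main__":
    for oid, value in find_unhealthy(SNMPWALK_OUTPUT):
        print(f"{oid} reports state {value}")
```

The same "anything other than 2" test is what the alert rule needs to express against the sensor value stored by LibreNMS.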

Update: I’ve configured an alert with the following parameters from the built-in collections:

sensors.sensor_current REGEXP "[3-4]" AND sensors.sensor_oid = ".1.3.6.1.4.1.232.3.2.5.1.1.37."

In snmpwalk this returns:

SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.8 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.9 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.10 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.11 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.12 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.13 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.14 = INTEGER: 4
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.15 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.16 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.2.17 = INTEGER: 2

So I can see one disk with a [3-4] status - but still no alert. What am I missing?

Is the sensor showing in the webui for that device? Can you post a screenshot of it?

Yes, the sensors seem to be detected correctly:

Does this look correct and is there anything I can do in your opinion?

Create an alert for it.

There is a generic one for state sensors in the alert rules collection.

This will always fail because your sensor_oid condition checks for an exact match. Use the “starts with” match instead.
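To illustrate why the exact match never fires, here is a small sketch in Python rather than in rule syntax. It assumes the stored sensor_oid is the full numeric OID including the row index (table OID plus `.2.14`), which is how sensor_oid appears in alert output later in this thread. The function names are made up for the example.

```python
import re

# Value used in the rule above: the table OID with a trailing dot.
RULE_OID = ".1.3.6.1.4.1.232.3.2.5.1.1.37."
# Assumed stored form for one disk: table OID plus the row index.
STORED_OID = ".1.3.6.1.4.1.232.3.2.5.1.1.37.2.14"


def exact_match(stored, wanted):
    """Equality test, the behaviour of '=' in the original rule."""
    return stored == wanted


def prefix_match(stored, prefix):
    """'Starts with' test, as suggested in the reply above."""
    return stored.startswith(prefix)


def unanchored_regex(value, pattern="[3-4]"):
    """Approximates an unanchored regex match on the sensor value."""
    return re.search(pattern, str(value)) is not None


if __name__ == "__main__":
    print(exact_match(STORED_OID, RULE_OID))   # the index suffix breaks equality
    print(prefix_match(STORED_OID, RULE_OID))  # the prefix test still matches
    print(unanchored_regex(13))                # matches: the pattern is not anchored
```

One side note from the sketch: an unanchored pattern such as `[3-4]` also matches a value like 13, so a plain comparison against the healthy state is less surprising than a regex.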

Of course, stupid mistake on my part. Thanks very much!

Just wanted to follow up here. The steps above worked for me and I’m now able to correctly monitor the physical disk status of my hosts. Thanks for the help!

Can you post the rule with the “starts with” match, please?

I have the same issue. My rule is below:
sensors.sensor_current > sensors.sensor_limit_warn AND sensors.sensor_class = state AND macros.device_up = 1

I’m not sure about the “starts with” part. Does the OID “.1.3.6.1.4.1.232.3.2.5.1.1.37” mean driver and the OID “.1.3.6.1.4.1.232.3.2.5.1.1.6” mean CPU? Is this correct?

I tried to use “starts with”, but no luck:

[root@librenms ~]# snmpwalk -v2c -c hpeiLOv3 192.168.10.237 .1.3.6.1.4.1.232.3.2.5.1.1.37.*
.1.3.6.1.4.1.232.3.2.5.1.1.37.: Unknown Object Identifier (Sub-id not found: enterprises -> )
[root@librenms ~]# snmpwalk -v2c -c hpeiLOv3 192.168.10.237 .1.3.6.1.4.1.232.3.2.5.1.1.37
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.0.0 = INTEGER: 2
SNMPv2-SMI::enterprises.232.3.2.5.1.1.37.0.1 = INTEGER: 2
[root@librenms ~]# snmpwalk -v2c -c hpeiLOv3 192.168.10.237 .1.3.6.1.4.1.232.3.2.5.1.1.37.
.

.1.3.6.1.4.1.232.3.2.5.1.1.37..: Unknown Object Identifier (Sub-id not found: enterprises -> .)
[root@librenms ~]#

Finally, I got this working on an HPE iLO 460 Gen9:

sensors.sensor_current != "sensors.sensor_limit_warn" AND sensors.sensor_type = "cpqDaPhyDrvStatus" AND macros.device_up = 1

The OID LibreNMS uses is .6. Here is the output from the alert:

Faults: #1: sysObjectID = .1.3.6.1.4.1.232.9.4.10; sysDescr = Integrated Lights-Out 4 2.82 Feb 06 2023; location_id = 16; sensor_id = 2316; sensor_oid = .1.3.6.1.4.1.232.3.2.5.1.1.6.0.0; sensor_descr = Drive 1 Status; #2: sysObjectID = .1.3.6.1.4.1.232.9.4.10; sysDescr = Integrated Lights-Out 4 2.82 Feb 06 2023; location_id = 16; sensor_id = 2317; sensor_oid = .1.3.6.1.4.1.232.3.2.5.1.1.6.0.1; sensor_descr = Drive 2 Status;

Then I changed the rule to compare against the current state value, and I use this for all my settings:

sensors.sensor_current != 2 AND sensors.sensor_type = "cpqDaPhyDrvStatus" AND macros.device_up = 1
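As a rough model of what that last rule selects, the three conditions can be written as a Python predicate over sensor rows. This is a sketch only: the rows in `SENSORS` are hypothetical sample data (the descriptions echo the alert output above, the values are invented), and `rule_matches` is an illustrative name, not a LibreNMS function.

```python
OK_STATE = 2  # healthy state used in the rule above

# Hypothetical sensor rows; the values are invented for illustration.
SENSORS = [
    {"sensor_descr": "Drive 1 Status", "sensor_type": "cpqDaPhyDrvStatus",
     "sensor_current": 2, "device_up": 1},
    {"sensor_descr": "Drive 2 Status", "sensor_type": "cpqDaPhyDrvStatus",
     "sensor_current": 3, "device_up": 1},
    # A different state sensor type (made-up name) must not trigger this rule.
    {"sensor_descr": "Other sensor", "sensor_type": "hypotheticalOtherState",
     "sensor_current": 3, "device_up": 1},
    # A faulty drive on a device that is down is ignored by the rule.
    {"sensor_descr": "Drive 3 Status", "sensor_type": "cpqDaPhyDrvStatus",
     "sensor_current": 3, "device_up": 0},
]


def rule_matches(row, ok_state=OK_STATE):
    """Mirror of: sensor_current != 2 AND sensor_type = cpqDaPhyDrvStatus AND device_up = 1."""
    return (
        row["sensor_current"] != ok_state
        and row["sensor_type"] == "cpqDaPhyDrvStatus"
        and row["device_up"] == 1
    )


if __name__ == "__main__":
    for row in SENSORS:
        if rule_matches(row):
            print(f"ALERT: {row['sensor_descr']} is in state {row['sensor_current']}")
```

Matching on sensor_type instead of sensor_oid sidesteps the exact-match problem discussed earlier, since the type is the same for every drive.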
