IPMI not detecting PSU failure on Supermicro server

Hi,

I have a Supermicro 2U server with a failed PSU, and LibreNMS is not reporting the PSU as failed.

Output of a regular poller shows:

Load poller module ipmi

Fetching IPMI sensor data… done.
Updating IPMI sensor Chassis Intru… 51h ok
Updating IPMI sensor PS1 Status… C8h ok
Updating IPMI sensor PS2 Status… C9h ok
Updating IPMI sensor 12V… 11.978 Volts
Updating IPMI sensor +1.5 V… 1.488 Volts
Updating IPMI sensor 3.3V… 3.312 Volts
Updating IPMI sensor +3.3VSB… 3.312 Volts
Updating IPMI sensor 5V… 4.992 Volts
Updating IPMI sensor +5VSB… 4.928 Volts
Updating IPMI sensor CPU1 VSA… 0.928 Volts
Updating IPMI sensor CPU2 VSA… 0.928 Volts
Updating IPMI sensor FAN4… 3000 RPM
Updating IPMI sensor FAN5… 4425 RPM
Updating IPMI sensor System Temp… 30 degrees C
Updating IPMI sensor VTT… 0.976 Volts
Updating IPMI sensor CPU1 Temp… 34 degrees C
Updating IPMI sensor CPU1 Vcore… 0.800 Volts
Updating IPMI sensor CPU2 Temp… 35 degrees C
Updating IPMI sensor CPU2 Vcore… 0.784 Volts
Updating IPMI sensor FAN1… 4425 RPM
Updating IPMI sensor FAN3… 2925 RPM
Updating IPMI sensor FANA… 2775 RPM
Updating IPMI sensor PCH Temp… 45 degrees C
Updating IPMI sensor Peripheral Temp… 40 degrees C
Updating IPMI sensor VBAT… 3.120 Volts
Updating IPMI sensor VDIMM AB… 1.488 Volts
Updating IPMI sensor VDIMM CD… 1.488 Volts
Updating IPMI sensor VDIMM EFGH… 1.488 Volts

Runtime for poller module ‘ipmi’: 0.2733 seconds with 73912 bytes

Unload poller module ipmi

This reports the PSU as ‘OK’, whereas the verbose poller output shows:

Fetching IPMI sensor data…SNMP[/usr/bin/ipmitool -I lanplus -c -H 192.168.32.152 -U 'ADMIN' -P 'xxxyyy' -L USER sdr 2>/dev/null]
CPU1 Temp,34,degrees C,ok
CPU2 Temp,35,degrees C,ok
System Temp,30,degrees C,ok
Peripheral Temp,40,degrees C,ok
PCH Temp,45,degrees C,ok
FAN1,4425,RPM,ok
FAN2,ns
FAN3,2925,RPM,ok
FAN4,2925,RPM,ok
FAN5,4275,RPM,ok
FAN6,ns
FANA,2850,RPM,ok
FANB,ns
VTT,0.976,Volts,ok
CPU1 Vcore,0.800,Volts,ok
CPU2 Vcore,0.784,Volts,ok
CPU1 VSA,0.928,Volts,ok
CPU2 VSA,0.928,Volts,ok
VDIMM AB,1.488,Volts,ok
VDIMM CD,1.488,Volts,ok
VDIMM EFGH,1.488,Volts,ok
+1.5 V,1.488,Volts,ok
3.3V,3.312,Volts,ok
+3.3VSB,3.312,Volts,ok
5V,4.992,Volts,ok
+5VSB,4.928,Volts,ok
12V,11.978,Volts,ok
VBAT,3.120,Volts,ok
HDD Status,52h,ns,26.1,No Reading
Chassis Intru,51h,ok,23.1,
PS1 Status,C8h,ok,10.1,Presence detected, Failure detected
PS2 Status,C9h,ok,10.2,Presence detected

CPU1 Temp,34,degrees C,ok
CPU2 Temp,35,degrees C,ok
System Temp,30,degrees C,ok
Peripheral Temp,40,degrees C,ok
PCH Temp,45,degrees C,ok
FAN1,4425,RPM,ok
FAN2,ns
FAN3,2925,RPM,ok
FAN4,2925,RPM,ok
FAN5,4275,RPM,ok
FAN6,ns
FANA,2850,RPM,ok
FANB,ns
VTT,0.976,Volts,ok
CPU1 Vcore,0.800,Volts,ok
CPU2 Vcore,0.784,Volts,ok
CPU1 VSA,0.928,Volts,ok
CPU2 VSA,0.928,Volts,ok
VDIMM AB,1.488,Volts,ok
VDIMM CD,1.488,Volts,ok
VDIMM EFGH,1.488,Volts,ok
+1.5 V,1.488,Volts,ok
3.3V,3.312,Volts,ok
+3.3VSB,3.312,Volts,ok
5V,4.992,Volts,ok
+5VSB,4.928,Volts,ok
12V,11.978,Volts,ok
VBAT,3.120,Volts,ok
HDD Status,52h,ns,26.1,No Reading
Chassis Intru,51h,ok,23.1,
PS1 Status,C8h,ok,10.1,Presence detected, Failure detected
PS2 Status,C9h,ok,10.2,Presence detected
done.
Updating IPMI sensor Chassis Intru… 51h ok
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor--ipmi-Chassis_Intru.rrd N:U]
RRDtool Output: OK u:0.00 s:0.00 r:1.54
SQL[UPDATE sensors set sensor_current =‘51h’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘’ AND sensor_id = ‘62’]
Updating IPMI sensor PS1 Status… C8h ok
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor--ipmi-PS1_Status.rrd N:U]
RRDtool Output: OK u:0.00 s:0.00 r:1.80
SQL[UPDATE sensors set sensor_current =‘C8h’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘’ AND sensor_id = ‘76’]
Updating IPMI sensor PS2 Status… C9h ok
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor--ipmi-PS2_Status.rrd N:U]
RRDtool Output: OK u:0.00 s:0.00 r:1.80
SQL[UPDATE sensors set sensor_current =‘C9h’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘’ AND sensor_id = ‘77’]
Updating IPMI sensor 12V… 11.978 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-12V.rrd N:11.978]
RRDtool Output: OK u:0.00 s:0.00 r:1.80
SQL[UPDATE sensors set sensor_current =‘11.978’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘299’]
Updating IPMI sensor +1.5 V… 1.488 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-_1.5_V.rrd N:1.488]
RRDtool Output: OK u:0.00 s:0.00 r:1.80
SQL[UPDATE sensors set sensor_current =‘1.488’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘300’]
Updating IPMI sensor 3.3V… 3.312 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-3.3V.rrd N:3.312]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘3.312’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘301’]
Updating IPMI sensor +3.3VSB… 3.312 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-_3.3VSB.rrd N:3.312]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘3.312’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘302’]
Updating IPMI sensor 5V… 4.992 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-5V.rrd N:4.992]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘4.992’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘303’]
Updating IPMI sensor +5VSB… 4.928 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-_5VSB.rrd N:4.928]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘4.928’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘304’]
Updating IPMI sensor CPU1 VSA… 0.928 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-CPU1_VSA.rrd N:0.928]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘0.928’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘306’]
Updating IPMI sensor CPU2 VSA… 0.928 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-CPU2_VSA.rrd N:0.928]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘0.928’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘309’]
Updating IPMI sensor FAN4… 2925 RPM
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-fanspeed-ipmi-FAN4.rrd N:2925]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘2925’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘fanspeed’ AND sensor_id = ‘312’]
Updating IPMI sensor FAN5… 4275 RPM
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-fanspeed-ipmi-FAN5.rrd N:4275]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘4275’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘fanspeed’ AND sensor_id = ‘313’]
Updating IPMI sensor System Temp… 30 degrees C
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-temperature-ipmi-System_Temp.rrd N:30]
RRDtool Output: OK u:0.00 s:0.00 r:1.81
SQL[UPDATE sensors set sensor_current =‘30’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘temperature’ AND sensor_id = ‘316’]
Updating IPMI sensor VTT… 0.976 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-VTT.rrd N:0.976]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘0.976’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘319’]
Updating IPMI sensor CPU1 Temp… 34 degrees C
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-temperature-ipmi-CPU1_Temp.rrd N:34]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘34’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘temperature’ AND sensor_id = ‘374’]
Updating IPMI sensor CPU1 Vcore… 0.800 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-CPU1_Vcore.rrd N:0.800]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘0.800’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘375’]
Updating IPMI sensor CPU2 Temp… 35 degrees C
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-temperature-ipmi-CPU2_Temp.rrd N:35]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘35’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘temperature’ AND sensor_id = ‘376’]
Updating IPMI sensor CPU2 Vcore… 0.784 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-CPU2_Vcore.rrd N:0.784]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘0.784’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘377’]
Updating IPMI sensor FAN1… 4425 RPM
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-fanspeed-ipmi-FAN1.rrd N:4425]
RRDtool Output: OK u:0.00 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘4425’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘fanspeed’ AND sensor_id = ‘378’]
Updating IPMI sensor FAN3… 2925 RPM
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-fanspeed-ipmi-FAN3.rrd N:2925]
RRDtool Output: OK u:0.01 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘2925’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘fanspeed’ AND sensor_id = ‘379’]
Updating IPMI sensor FANA… 2850 RPM
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-fanspeed-ipmi-FANA.rrd N:2850]
RRDtool Output: OK u:0.01 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘2850’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘fanspeed’ AND sensor_id = ‘380’]
Updating IPMI sensor PCH Temp… 45 degrees C
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-temperature-ipmi-PCH_Temp.rrd N:45]
RRDtool Output: OK u:0.01 s:0.00 r:1.82
SQL[UPDATE sensors set sensor_current =‘45’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘temperature’ AND sensor_id = ‘381’]
Updating IPMI sensor Peripheral Temp… 40 degrees C
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-temperature-ipmi-Peripheral_Temp.rrd N:40]
RRDtool Output: OK u:0.01 s:0.00 r:1.83
SQL[UPDATE sensors set sensor_current =‘40’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘temperature’ AND sensor_id = ‘382’]
Updating IPMI sensor VBAT… 3.120 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-VBAT.rrd N:3.120]
RRDtool Output: OK u:0.01 s:0.00 r:1.83
SQL[UPDATE sensors set sensor_current =‘3.120’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘383’]
Updating IPMI sensor VDIMM AB… 1.488 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-VDIMM_AB.rrd N:1.488]
RRDtool Output: OK u:0.01 s:0.00 r:1.83
SQL[UPDATE sensors set sensor_current =‘1.488’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘384’]
Updating IPMI sensor VDIMM CD… 1.488 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-VDIMM_CD.rrd N:1.488]
RRDtool Output: OK u:0.01 s:0.00 r:1.83
SQL[UPDATE sensors set sensor_current =‘1.488’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘385’]
Updating IPMI sensor VDIMM EFGH… 1.488 Volts
RRD[update /opt/librenms/rrd/db01a.mkdc0.xewave.io/sensor-voltage-ipmi-VDIMM_EFGH.rrd N:1.488]
RRDtool Output: OK u:0.01 s:0.00 r:1.83
SQL[UPDATE sensors set sensor_current =‘1.488’,lastupdate =NOW() WHERE poller_type = ‘ipmi’ AND sensor_class = ‘voltage’ AND sensor_id = ‘386’]

Runtime for poller module ‘ipmi’: 0.2956 seconds with 78008 bytes

Unload poller module ipmi

So the failure is detected but not acknowledged by the poller.

Cheers,

Neil.

What alert do you have set up for PSU failure on that device?

I suggest you post the full output of ./poller.php -h HOSTNAME -d -m sensors -r -f in a nicely formatted reply, or at least use pastebin.

Also, please share the alert rule.

Hi, I don’t have an alert rule currently, and the PSU status doesn’t appear anywhere obvious in LibreNMS. Comparing with other servers, I can see that the output of ipmitool is interpreted the same way for both good and bad PSUs, because includes/polling/ipmi.inc.php takes only the first four comma-separated fields and discards everything else:

Good Server:

PS1 Status,C8h,ok,10.1,Presence detected
PS2 Status,C9h,ok,10.2,Presence detected

Bad Server:

PS1 Status,C8h,ok,10.1,Presence detected, Failure detected
PS2 Status,C9h,ok,10.2,Presence detected

ipmi.inc.php:

    foreach (explode("\n", $results) as $row) {
        list($desc, $value, $type, $status) = explode(',', $row);

So the resulting SQL statement attempts to set sensors.sensor_current to ‘C8h’ or ‘C9h’. Since sensor_current appears to be a numeric column, the stored value always ends up as zero, with a warning returned from MySQL.
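
To make the effect concrete, here is a minimal sketch of the same four-field split applied to the two PS1 rows quoted above. It is written in Python purely for illustration and is not the LibreNMS PHP code; the helper names `first_four_fields` and `is_numeric` are hypothetical.

```python
# Rows copied from the ipmitool CSV output quoted in this thread.
GOOD_ROW = "PS1 Status,C8h,ok,10.1,Presence detected"
BAD_ROW = "PS1 Status,C8h,ok,10.1,Presence detected, Failure detected"


def first_four_fields(row):
    """Mimic list($desc, $value, $type, $status) = explode(',', $row)."""
    fields = row.split(",")
    # Pad so short rows such as "FAN2,ns" still yield four values.
    fields += [None] * (4 - len(fields))
    return tuple(fields[:4])


def is_numeric(value):
    """Return True when the value parses as a plain number."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


good = first_four_fields(GOOD_ROW)
bad = first_four_fields(BAD_ROW)

print(good == bad)          # True: the healthy and failed rows look identical
print(is_numeric(good[1]))  # False: 'C8h' is not a plain number
```

After the split, nothing distinguishes the failed PSU from the healthy one, and the value that is left is not something a numeric column can hold as-is.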

After some googling I found an ipmitool ticket ( https://sourceforge.net/p/ipmitool/bugs/386/ ) that discusses the “ok” status in the PSU output. It seems that “ok” describes the status of the sensor reading itself and not the value it reports.

It seems that the output of the PSx sensors needs to be interpreted differently in order to be parsed correctly. However, I don’t know whether this is specific to Supermicro or to lanplus.
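
As a rough idea of what "interpreted differently" could look like, the sketch below treats everything after the fourth field as a list of state strings and checks for the failure text. It is Python for illustration only, not a patch for ipmi.inc.php. It assumes the CSV layout shown in this thread, that the fourth field is an entity identifier, and that "Failure detected" is the string to look for; other BMCs or ipmitool versions may word this differently. The names `parse_discrete_row` and `psu_has_failed` are hypothetical.

```python
# Assumption: this is the state text that marks a failed PSU, as seen in
# the output posted in this thread. Other hardware may use other wording.
FAILURE_STATE = "Failure detected"


def parse_discrete_row(row):
    """Split a discrete-sensor CSV row, keeping the trailing state strings.

    Returns None for rows with fewer than five fields, such as analog
    readings ("CPU1 Temp,34,degrees C,ok") or "FAN2,ns".
    """
    fields = [field.strip() for field in row.split(",")]
    if len(fields) < 5:
        return None
    name, raw_value, reading_status, entity = fields[:4]
    # Everything after the fourth field is treated as a state string;
    # empty trailing fields (as on the "Chassis Intru" row) are dropped.
    states = [state for state in fields[4:] if state]
    return {
        "name": name,
        "raw_value": raw_value,
        "reading_status": reading_status,
        "entity": entity,  # assumed meaning of the fourth field
        "states": states,
    }


def psu_has_failed(row):
    """Return True when the row lists the failure state."""
    parsed = parse_discrete_row(row)
    return parsed is not None and FAILURE_STATE in parsed["states"]


print(psu_has_failed("PS1 Status,C8h,ok,10.1,Presence detected, Failure detected"))
print(psu_has_failed("PS2 Status,C9h,ok,10.2,Presence detected"))
```

With the rows from the bad server, this flags PS1 and leaves PS2 alone, regardless of the "ok" in the third column.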