[bug] iDRAC sensor discovery

Having left the system alone for a while, I can confirm that the iDRAC devices are showing almost their full complement of sensors again. However, the virtual disks have now disappeared and don’t show up at all. From the event log, it does look as though the iDRACs are being rediscovered every so often. I am wondering whether the discovery is somehow running every 6.5 hours rather than at 6.33 as programmed, so I will check that.

Right, I read your cron entry too fast. The cron file says:
Every 6 hours, starting at xx:33. Of course, because the discovery process takes time and the devices are discovered sequentially, a given device may be discovered around xx:40 or xx:50, etc., depending on the load of the server, the speed of the device …
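
To make the timing concrete, here is a minimal sketch of when such a schedule would fire. It assumes the cron entry is equivalent to `33 */6 * * *` (the actual line is not quoted in this thread), and the function name is purely illustrative. These are only the times the wrapper starts; each device is reached some minutes later.

```python
from datetime import datetime, timedelta


def discovery_start_times(day, minute=33, every_hours=6):
    """Return the times at which cron would start discovery on the given day.

    Assumes a schedule equivalent to '33 */6 * * *' (hypothetical, not
    taken from the actual cron file).
    """
    midnight = datetime(day.year, day.month, day.day)
    return [
        midnight + timedelta(hours=hour, minutes=minute)
        for hour in range(0, 24, every_hours)
    ]


starts = discovery_start_times(datetime(2019, 9, 13))
for start in starts:
    print(start.strftime("%H:%M"))
```

So the interval between runs is still 6 hours; only the minute offset is :33.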

Aside from the cron part, what is the resulting status now? Do you see the expected sensors back or not, for both ‘drac’ and ‘dell’ devices? If not, could you list exactly the missing ones?

You can always revert the patch like this:

git checkout includes/discovery/sensors/state/dell.inc.php
git checkout includes/discovery/sensors/state/drac.inc.php
rm cache/os_defs.cache

And force a discovery again.

(I don’t have the device so you have to be my ears and eyes on this :) )

Hi Pipo

I have changed my cron file as follows:
0 6 * * * librenms /opt/librenms/cronic /opt/librenms/discovery-wrapper.py 1

It seems like most of the sensors are back other than the two virtual data-stores, but perhaps if the discovery is run less frequently, those will come back too? (Spoke too soon: all the iDRAC devices except one are now down to 5 sensors, with the exception showing a normal 75 sensors. It looks as if each discovery alternately deletes and adds sensors.)
I can revert the patch but I would be interested to see if there is a resolution to this as we obviously do want to know if local storage is degraded.

Phil

Please run the git checkout command for drac.inc.php to ensure that you are back to the normal drac.inc.php. That file is not part of the patch anymore, so you must get back to the original version.

I have run that, validate still complains as follows:
Modified Files:
includes/discovery/sensors/state/dell.inc.php

Today, all hosts are showing their full complement of sensors, including the virtual datastores, so fingers crossed that it is fixed, but too early to say.

dell.inc.php is indeed patched, so this is expected for now. Let’s see how it goes with the DRACs.

Concerning Dell devices, do you have any of them? If yes, do they behave correctly?

We do have several Dell devices; however, other than physical servers with iDRAC, our Dell SANs do not allow SNMP interrogation, so they are irrelevant. Dell switches work very well within LibreNMS. We have one server without iDRAC, and that shows up in LibreNMS as an ESXi host only.

After setting the discovery to run at 6 am daily, we are still seeing a flip-flopping deletion/addition routine during each discovery cycle, which seems to affect the iDRAC with local storage more than the others. This is an extract from the log for that server this morning. As you can see, the deletion affects other sensors as well as the virtual datastores, so this morning there are no virtual datastores or, for example, fans showing against that device:

2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state memoryDeviceStatus 1.4 DIMM.Socket.B2, 16384 MB System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state memoryDeviceStatus 1.3 DIMM.Socket.B1, 16384 MB System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state memoryDeviceStatus 1.2 DIMM.Socket.A2, 16384 MB System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state memoryDeviceStatus 1.1 DIMM.Socket.A1, 16384 MB System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state processorDeviceStatus 1.2 Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state processorDeviceStatus 1.1 Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state virtualDiskState 2 Data_Store2 System
2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state virtualDiskState 1 VD1 System
2019-09-13 06:14:28 sensor idrac-esxi09 Sensor Deleted: fanspeed drac 14 System Fan7B System
2019-09-13 06:14:28 sensor idrac-esxi09 Sensor Deleted: fanspeed drac 13 System Fan6B System
2019-09-13 06:14:28 sensor idrac-esxi09 Sensor Deleted: fanspeed drac 12 System Fan5B System
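
To see the flip-flop pattern at a glance, the event log lines can be tallied per device and sensor type. This is a rough sketch only: it assumes the line layout shown in the extract above, and it assumes additions are logged as "Sensor Added:" in the same layout, which is not shown in this thread. The `summarise` helper is hypothetical, not part of LibreNMS.

```python
import re
from collections import Counter

# Layout assumed from the extract above:
# <date> <time> sensor <host> Sensor Deleted: <class> <type> <index> <description>
# "Sensor Added:" is an assumption; only deletions appear in the extract.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) sensor (?P<host>\S+) "
    r"Sensor (?P<action>Deleted|Added): (?P<cls>\S+) (?P<type>\S+) "
)


def summarise(lines):
    """Count sensor events per (host, action, sensor type)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed layout
            counts[(match["host"], match["action"], match["type"])] += 1
    return counts


sample = [
    "2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state memoryDeviceStatus 1.4 DIMM.Socket.B2, 16384 MB System",
    "2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state virtualDiskState 2 Data_Store2 System",
    "2019-09-13 06:15:00 sensor idrac-esxi09 Sensor Deleted: state virtualDiskState 1 VD1 System",
    "2019-09-13 06:14:28 sensor idrac-esxi09 Sensor Deleted: fanspeed drac 14 System Fan7B System",
]

summary = summarise(sample)
for (host, action, sensor_type), count in sorted(summary.items()):
    print(host, action, sensor_type, count)
```

Running this over the log for several consecutive discovery cycles would show whether the same sensor types are deleted on one run and re-added on the next.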

@PhilipHalton
This is now a different issue from the one this thread was opened for. The current patch you are testing makes no change to the iDRAC code compared to vanilla LibreNMS.

I would say that you need to run the discovery manually from the CLI, as the librenms user, with options ‘-v’ and ‘-d’, until you see the sensors removed.
Reading the debug data around the sensor discovery should help to understand what’s going on.

@angryp: Could you test the patch as well (the current one) and let us know if the issue is solved?

Hello @PipoCanaja, sorry for the late reply.

Unfortunately, we have wiped the only DC that had Dell switches and we don’t have a place to test anymore.
This is the main reason I did not address this bug any further after submitting.

Sorry for the inconvenience caused…