I assume some users have added an iDrac device to their instance at some point… how well does it work for you?
I’ve got around 30 iDRAC devices on a variety of systems (C6320s, R730s, R720XDs…) being polled, and every single one is flaky to the point that it’s not really worth polling - sensors constantly and spuriously flap, some of them just fail to report some graphs, and others will stop responding for hours and then come back without intervention…
The polling process itself seems fine; it just looks like they return garbage - is this anyone else’s experience?
Great, that’s useful to know that it’s working for at least one person. Do you mind sharing a little about your environment, so we can see what differences there are?
I’m running my poller on CentOS 7.4 with PHP 7.2, using SNMPv3 exclusively (AES/SHA) with a read-only iDRAC user, against firmware 2.50 and 2.40
iDRAC6 - FW 1.98 / 1.96 / 2.85
only Global Status is working
iDRAC7 - FW 2.50.50.50 / 2.21.21.21 / 1.57.57
52-55 sensors working
iDRAC8 - FW 2.30.30.30 / 2.50.50.50 / 2.41.40.40
56-73 sensors working
Finally managed to catch it in the act of failing - after tuning every tweakable option I could find and reducing how often OpenManage does its own checks, it now happens only a handful of times per day.
From the trace, I think what is happening is that the iDRAC module responds to most requests, but occasionally drops one for some unknown reason - likely load-related.
When this happens, LibreNMS appears to either blank the value (sysLocation/sysName/etc.) or, if it’s a sensor, set it to critical.
So LibreNMS doesn’t appear to be at fault, but perhaps it would be more sensible not to update values when a request times out? Either globally, or via a “this device is flaky” option?
Apologies for the thread necromancy, but after years of trying to solve this problem, I finally found the root cause, and this thread is one of the first ones that pops up if you search for intermittent iDRAC problems with LibreNMS.
iDRAC 6 and up have a rate-limiting firewall rule enabled by default that cannot be configured through the web UI (except on iDRAC 9, and even there it’s buried in the menus and disguised as the automatic lockout feature for failed password retries).
To fix this, SSH into your iDRAC (or use racadm from the host’s OS) and run the following:
racadm set iDRAC.IPBlocking.BlockEnable Disabled
Your SNMP queries will no longer intermittently time out. With BlockEnable on, the iDRAC sometimes (but not always) throttles SNMP queries to around three per second; with it disabled, it never does.
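If you want to confirm whether your iDRAC is rate-limiting before changing anything, one quick check is to fire a burst of queries with the standard Net-SNMP snmpget tool and count the drops. This is a sketch assuming SNMPv2c with a read-only community; the host and community below are placeholders for your environment:

```shell
# Burst 20 sysUpTime queries at the iDRAC with no retries and a 1s timeout,
# then tally successes vs. timeouts. With BlockEnable on you will typically
# see some TIMEOUTs; with it disabled they should all succeed.
HOST=idrac.example.com      # placeholder: your iDRAC's address
COMMUNITY=public            # placeholder: your read-only community string
for i in $(seq 1 20); do
  if snmpget -v2c -c "$COMMUNITY" -t 1 -r 0 "$HOST" SNMPv2-MIB::sysUpTime.0 >/dev/null 2>&1; then
    echo ok
  else
    echo TIMEOUT
  fi
done | sort | uniq -c
```

If you poll with SNMPv3, swap in your `-v3` user and auth/priv options; the timeout behaviour should be the same either way.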
It took an absolutely ludicrous amount of effort to get this answer from Dell, and even after months of back and forth they still won’t directly admit that this switch controls more than just the failed-password lockout.
Not trying to be a pain, but as I also had the issue, I decided to run racadm get iDRAC.IPBlocking to see all of the attributes available and got this:
I think, to be more security conscious, you might want to change the RangeAddr values to whatever IP your logger is at, change the RangeMask if necessary, and then enable the proper range with RangeEnable. You can also tune FailCount, FailWindow, and PenaltyTime. BlockEnable=Disabled is great in a test environment, but probably not the best idea for production.
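A minimal sketch of that scoped approach, using the iDRAC.IPBlocking attributes named above. The addresses and thresholds are placeholders for your environment, not recommended values:

```shell
# Hypothetical values: 192.0.2.0/24 is the subnet your poller lives on.
# Restrict the iDRAC's IP range check to that subnet:
racadm set iDRAC.IPBlocking.RangeAddr 192.0.2.0
racadm set iDRAC.IPBlocking.RangeMask 255.255.255.0
racadm set iDRAC.IPBlocking.RangeEnable Enabled

# Keep the failed-login blocking on, but tune it rather than disabling it:
racadm set iDRAC.IPBlocking.FailCount 5      # failed logins before blocking
racadm set iDRAC.IPBlocking.FailWindow 60    # seconds in which they must occur
racadm set iDRAC.IPBlocking.PenaltyTime 60   # seconds the source stays blocked
racadm set iDRAC.IPBlocking.BlockEnable Enabled

# Verify the resulting configuration:
racadm get iDRAC.IPBlocking
```

Note that RangeEnable restricts which clients can reach the iDRAC at all, so double-check the range covers every host you manage it from before enabling it.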