I assume some users have added an iDrac device to their instance at some point… how well does it work for you?
I’ve got around 30 iDRAC devices on a variety of systems (C6320s, R730s, R720XDs…) being polled, and every single one is flaky to the point that it’s not really worth polling - sensors constantly and spuriously flap, some of them just fail to report some graphs, and others will stop responding for hours and then come back without intervention…
The polling process itself seems fine; it just looks like they return garbage - is this anyone else’s experience?
Great, that’s useful to know that it’s working for at least one person. Do you mind sharing a little about your environment, so we can see what differences there are?
I’m running my poller on CentOS 7.4 with PHP 7.2, using SNMPv3 exclusively (AES/SHA) with a read-only iDRAC user, against firmware 2.50 and 2.40
iDRAC6 - FW 1.98 / 1.96 / 2.85
only Global Status is working
iDRAC7 - FW 2.50.50.50 / 2.21.21.21 / 1.57.57
52-55 sensors working
iDRAC8 - FW 2.30.30.30 / 2.50.50.50 / 2.41.40.40
56-73 sensors working
Finally managed to catch it in the act of failing - after tuning every tweakable option I could find and reducing how often OpenManage does its own checks, it now happens only a handful of times per day.
From the trace, I think what is happening is that the iDRAC module responds to most requests, but occasionally drops one for some unknown reason - likely load-related.
When this happens, LibreNMS appears to either blank the value (sysLocation/sysName/etc.) or, if it’s a sensor, set it to critical.
So LibreNMS doesn’t appear to be at fault, but perhaps it would be more sensible not to update values when a request times out? Either globally, or via a “this device is flaky” option?
Apologies for the thread necromancy, but after years of trying to solve this problem, I finally found the root cause, and this thread is one of the first ones that pops up if you search for intermittent iDRAC problems with LibreNMS.
iDRAC 6 and up have a rate-limiting firewall rule enabled by default that cannot be configured through the web UI (except on iDRAC 9, and even there it’s buried in the menus and disguised as the automatic lockout feature for failed password retries).
To fix this, SSH into your iDRAC (or use racadm from the host’s OS) and run the following:
racadm set iDRAC.IPBlocking.BlockEnable Disabled
Your SNMP queries will no longer intermittently time out. With BlockEnable on, the iDRAC sometimes (but not always) throttles SNMP queries to around three per second; with it disabled, it never does.
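If you want to confirm whether your iDRAC is rate-limiting before changing anything, one quick check is to fire a burst of queries with the standard Net-SNMP snmpget tool and count the drops. This is a sketch assuming SNMPv2c with a read-only community; the host and community below are placeholders for your environment:

```shell
# Burst 20 sysUpTime queries at the iDRAC with no retries and a 1s timeout,
# then tally successes vs. timeouts. With BlockEnable on you will typically
# see some TIMEOUTs; with it disabled they should all succeed.
HOST=idrac.example.com      # placeholder: your iDRAC's address
COMMUNITY=public            # placeholder: your read-only community string
for i in $(seq 1 20); do
  if snmpget -v2c -c "$COMMUNITY" -t 1 -r 0 "$HOST" SNMPv2-MIB::sysUpTime.0 >/dev/null 2>&1; then
    echo ok
  else
    echo TIMEOUT
  fi
done | sort | uniq -c
```

If you poll with SNMPv3, swap in your `-v3` user and auth/priv options; the timeout behaviour should be the same either way.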
It took an absolutely ludicrous amount of effort to get this answer from Dell, and even after months of back and forth they still won’t directly admit that this switch controls more than just the failed-password lockout.
Not trying to be a pain, but as I also had the issue, I decided to run racadm get iDRAC.IPBlocking to see all of the attributes available and got this:
I think, to be more security conscious, you might want to change the RangeAddr values to whatever IP your logger is at, change the RangeMask if necessary, and then enable the proper range with RangeEnable. You can also tune FailCount, FailWindow, and PenaltyTime. BlockEnable=Disabled is great in a test environment, but probably not the best idea for production.
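A minimal sketch of that scoped approach, using the iDRAC.IPBlocking attributes named above. The addresses and thresholds are placeholders for your environment, not recommended values:

```shell
# Hypothetical values: 192.0.2.0/24 is the subnet your poller lives on.
# Restrict the iDRAC's IP range check to that subnet:
racadm set iDRAC.IPBlocking.RangeAddr 192.0.2.0
racadm set iDRAC.IPBlocking.RangeMask 255.255.255.0
racadm set iDRAC.IPBlocking.RangeEnable Enabled

# Keep the failed-login blocking on, but tune it rather than disabling it:
racadm set iDRAC.IPBlocking.FailCount 5      # failed logins before blocking
racadm set iDRAC.IPBlocking.FailWindow 60    # seconds in which they must occur
racadm set iDRAC.IPBlocking.PenaltyTime 60   # seconds the source stays blocked
racadm set iDRAC.IPBlocking.BlockEnable Enabled

# Verify the resulting configuration:
racadm get iDRAC.IPBlocking
```

Note that RangeEnable restricts which clients can reach the iDRAC at all, so double-check the range covers every host you manage it from before enabling it.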