How to handle persistently slow SNMP requests

SNMP requests to devices of the infinera-groove type are slow enough that the standard 1s SNMP timeout consistently fails during device discovery. When a discovery runs, it only picks up the first set of sensors. A debug run shows that all the other sensor requests time out, so the UI shows dBm but none of the other sensors.

It looks like the groove device buffers incoming SNMP requests and then synchronously reads the hardware counters on the physical component being polled. If a request times out, LibreNMS moves on to the next OID before the previous reply is ready. Eventually so many SNMP requests are queued up that the device effectively stops responding. tcpdump confirms that the SNMP replies do eventually go out, but that might be 30 seconds later because the inbound buffer was stuffed full of requests.
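
To illustrate why the backlog keeps growing, here is a rough toy model in Python. It is a sketch under assumptions, not a measurement: the 1.5s per-request service time is a number I made up for illustration, retries are ignored, and `simulate` is a hypothetical helper, not anything in LibreNMS.

```python
def simulate(num_requests, service_time, timeout):
    """Toy model of a device that serves SNMP requests strictly one at a time.

    The poller sends the next request as soon as the previous one is
    answered or times out, whichever comes first. Returns the delay
    between sending each request and its reply leaving the device.
    """
    send_time = 0.0        # when the poller sends the current request
    device_free_at = 0.0   # when the device finishes its current backlog
    delays = []
    for _ in range(num_requests):
        start = max(send_time, device_free_at)  # request waits in the buffer
        done = start + service_time
        device_free_at = done
        delay = done - send_time
        delays.append(delay)
        send_time += min(delay, timeout)        # poller moves on
    return delays


# Assumed 1.5s service time (illustrative only).
short = simulate(20, service_time=1.5, timeout=1.0)
longer = simulate(20, service_time=1.5, timeout=5.0)
print(short[0], short[-1])  # 1.5 11.0 -> every reply is later than the last
print(max(longer))          # 1.5      -> no backlog builds up
```

With the timeout shorter than the service time, each request adds to the queue and the reply delay grows without bound. With the timeout longer than the service time, the delay stays flat.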

Increasing the device SNMP timeout to 5 seconds allows LibreNMS to poll successfully every time.

The problem is that this behaviour is consistent across all recent versions of the infinera-groove software, i.e. by default, with specific hardware combinations, LibreNMS won’t display most of the sensors that are available.

My preferred fix would be to set an OS-level default SNMP timeout of 5s for this OS, but the snmp.timeout variable cannot be configured at the OS level. Of all the snmp.* variables, only snmp.snmp_max_repeaters can be configured at the OS level, and that’s handled in get_device_max_repeaters(), i.e. the handling is specific to just that variable.
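
To make the idea concrete, here is a minimal Python sketch of the lookup order I have in mind: per-device setting first, then an OS default, then the global value. None of this is LibreNMS code. `OS_SNMP_DEFAULTS`, `GLOBAL_SNMP_TIMEOUT` and `resolve_snmp_timeout` are hypothetical names, and the real implementation would live in the PHP codebase.

```python
from typing import Optional

# Hypothetical data: the global default (the standard 1s) and a per-OS override.
GLOBAL_SNMP_TIMEOUT = 1
OS_SNMP_DEFAULTS = {
    "infinera-groove": {"timeout": 5},
}


def resolve_snmp_timeout(os_name: str, device_timeout: Optional[int] = None) -> int:
    """Pick the SNMP timeout in seconds: device override, then OS default, then global."""
    if device_timeout is not None:
        return device_timeout
    os_timeout = OS_SNMP_DEFAULTS.get(os_name, {}).get("timeout")
    if os_timeout is not None:
        return os_timeout
    return GLOBAL_SNMP_TIMEOUT


print(resolve_snmp_timeout("infinera-groove"))                     # 5
print(resolve_snmp_timeout("some-other-os"))                       # 1
print(resolve_snmp_timeout("infinera-groove", device_timeout=10))  # 10
```

An explicit per-device value still wins, so anyone who has already tuned a device would see no change.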

Any developer suggestions on what the best way to handle this would be?
