Hung host discovery caused poller to stop working

Hi guys,

After updating to 1.65, this error occurs:

Error in discovery-protocols module. The process "'/usr/bin/snmpbulkwalk' '-v2c' '-c' 'some_community' '-OQUsetX' '-m' 'LLDP-MIB' '-M' '/opt/librenms/mibs' 'udp:some_hostname:161' 'lldpRemTable'" exceeded the timeout of 1200 seconds.
#0 /opt/librenms/vendor/symfony/process/Process.php(417): Symfony\Component\Process\Process->checkTimeout()
#1 /opt/librenms/vendor/symfony/process/Process.php(238): Symfony\Component\Process\Process->wait()
#2 /opt/librenms/includes/common.php(119): Symfony\Component\Process\Process->run()
#3 /opt/librenms/includes/snmp.inc.php(645): external_exec(Array)
#4 /opt/librenms/includes/discovery/discovery-protocols.inc.php(196): snmpwalk_group(Array, 'lldpRemTable', 'LLDP-MIB', 3)
#5 /opt/librenms/includes/discovery/functions.inc.php(162): include('/opt/librenms/i…')
#6 /opt/librenms/discovery.php(121): discover_device(Array, false)
#7 {main}

After this, the poller on the hosts that run discovery for this device stops working (I’m using the service, not cron).

I admit this host is indeed not responding properly, but that shouldn’t stop the poller from running. Also, it seems that the watchdog isn’t working. If you need any more info, I’ll be happy to provide it.

Are you sure it isn’t working? It looks like it might just be waiting 20 minutes at a time (1200 s), monopolizing the workers.

You could set the timeout lower (don’t set it too low) and see.

Do you mean that a discovery timeout can hang polling for 20 minutes? I didn’t think that could be the case (it shouldn’t, I guess).

"'/usr/bin/snmpbulkwalk' '-v2c' '-c' 'some_community' '-OQUsetX' '-m' 'LLDP-MIB' '-M' '/opt/librenms/mibs' 'udp:some_hostname:161' 'lldpRemTable'" exceeded the timeout of 1200 seconds.

So it seems each SNMP command can block for up to 20 minutes.
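
For anyone following along, here is a minimal sketch of how a per-command timeout bounds a hung call. It is not LibreNMS code (the trace shows LibreNMS doing this in PHP through Symfony's Process); it is the same idea in Python. `run_with_timeout` is a hypothetical helper, and the sleeping child process is only a stand-in for an snmpbulkwalk that never returns.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its stdout, or None if it ran longer than timeout_s seconds."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising the timeout.
        return None
    return done.stdout


# Stand-in for a device that never answers: a child process that sleeps for 30 s.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
# Stand-in for a healthy device: a child process that answers immediately.
quick = [sys.executable, "-c", "print('ok')"]

hung_result = run_with_timeout(hung, timeout_s=1)     # gives up after about 1 s
quick_result = run_with_timeout(quick, timeout_s=10)  # returns well before the limit
```

The caller is blocked for the full timeout whenever the command hangs, so with a 1200 second limit a single unanswered walk holds its caller for 20 minutes.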

Each worker only waits for one discovery.php process at a time. It cannot start any others until that one is complete (successfully or unsuccessfully). How many discovery workers do you have configured?

4 I guess

$config['service_discovery_workers'] = 4; # Processes spawned for discovery
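
To make the "one discovery per worker" point concrete, here is a rough queueing sketch in Python. It is not how the LibreNMS service is implemented; `finish_times` is a hypothetical helper, and the 30 second duration for a healthy discovery is an assumed figure for illustration. Only the 1200 second timeout and the 4 workers come from this thread.

```python
import heapq


def finish_times(durations, workers):
    """Return the finish time in seconds of each job, in queue order.

    Each worker runs one job at a time and takes the next queued job
    only when its current one ends.
    """
    free_at = [0] * workers  # min-heap of the times at which each worker is free
    heapq.heapify(free_at)
    finished = []
    for duration in durations:
        start = heapq.heappop(free_at)  # the earliest available worker takes the job
        end = start + duration
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


HUNG = 1200  # timeout from the error message, in seconds
NORMAL = 30  # assumed duration of a healthy discovery (illustrative only)

# One hung job ahead of seven healthy ones: three workers keep the queue moving.
one_hung = finish_times([HUNG] + [NORMAL] * 7, workers=4)

# Four hung jobs occupy all four workers: the next job waits the full timeout.
all_hung = finish_times([HUNG] * 4 + [NORMAL], workers=4)
```

Under these assumptions, one hung job leaves three workers free and the last healthy job still finishes at 90 seconds, but once four jobs hang at the same time the next one cannot even start until 1200 seconds have passed. This only models the discovery queue described above; it does not by itself show why the poller would stall.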

Which option is the discovery timeout? service_discovery_timeout? :)