Cisco NCS : Global discovery is failing on some modules

Hello.

I see very weird behaviour on all our Cisco NCS devices in LibreNMS.

The “standard” (global) discovery finds nothing for some modules, for example ipv4-addresses, ipv6-addresses and discovery-protocols. Output from a global discovery:

php discovery.php -h 50
....
#### Load disco module ipv4-addresses ####


>> Runtime for discovery module 'ipv4-addresses': 12.0700 seconds with 9496 bytes
>> SNMP: [2/12.05s] MySQL: [1/0.02s] RRD: [0/0.00s]
#### Unload disco module ipv4-addresses ####


#### Load disco module ipv6-addresses ####


>> Runtime for discovery module 'ipv6-addresses': 12.0600 seconds with 2336 bytes
>> SNMP: [2/12.05s] MySQL: [1/0.01s] RRD: [0/0.00s]
#### Unload disco module ipv6-addresses ####
....

Nothing is found. However, when I run the specific module explicitly, it works:

php discovery.php -h 50 -m ipv4-addresses
LibreNMS Discovery
xxx-xx-nc5 50 iosxr

#### Load disco module core ####

>> Runtime for discovery module 'core': 0.0130 seconds with 904 bytes
>> SNMP: [2/0.05s] MySQL: [3/0.02s] RRD: [0/0.00s]
#### Unload disco module core ####


#### Load disco module ipv4-addresses ####
s+S+s+s+S+s+s+S+s+S+s+

>> Runtime for discovery module 'ipv4-addresses': 0.5740 seconds with 8056 bytes
>> SNMP: [12/0.29s] MySQL: [23/0.12s] RRD: [0/0.00s]
#### Unload disco module ipv4-addresses ####

Discovered in 1.653 seconds

The components discovered this way are then removed again at the next global discovery.

Any idea what could cause this? I see it on 100% of our Cisco NCS devices.

Our NCS have a lot of monitored elements (200+ interfaces, approx. 2000 sensors, etc.). Could the sheer number of elements be causing this kind of failure?

Thanks

After doing some tcpdump captures, it looks like the NCS stops responding to SNMP requests right before the ipv4-addresses module during a full discovery, but not when only that module is run. Maybe there is some kind of default QoS or rate limiting? I’m going to check that.
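For anyone who wants to reproduce the capture, something along these lines should work. This is only a rough sketch: the device address, community string, interface name and OID are placeholders, not values from our setup, and `-i any` assumes a Linux host.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with your own values.
DEVICE="${DEVICE:-192.0.2.10}"     # NCS management address (documentation range)
COMMUNITY="${COMMUNITY:-public}"   # SNMP v2c community
IFACE="${IFACE:-any}"              # capture interface on the LibreNMS host

# Capture SNMP traffic to and from the device while a discovery
# runs in another shell. Usually needs root.
capture_snmp() {
    tcpdump -ni "$IFACE" "host $DEVICE and udp port 161"
}

# Time one bulk walk with a long timeout to see how slowly the device answers.
# -t = timeout in seconds, -r = retries, -Cr = max-repetitions.
# The default OID is just an example column from IP-MIB.
time_walk() {
    oid="${1:-IP-MIB::ipAdEntIfIndex}"
    start=$(date +%s)
    snmpbulkwalk -v2c -c "$COMMUNITY" -t 10 -r 1 -Cr20 "$DEVICE" "$oid" > /dev/null
    status=$?
    end=$(date +%s)
    echo "walk of $oid took $((end - start))s (exit status $status)"
}

# Usage (run manually):
#   capture_snmp
#   time_walk
```

Running `time_walk` right after a full discovery and again a few minutes later should show whether the device is only slow while it is being hit with many requests.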

I’m trying the following configuration and it looks better. Apparently there is aggressive throttling on the SNMP process, and it sometimes needs almost 10 s to return an SNMP response:

// Tuning
$config['snmp']['timeout'] = 10;
$config['snmp']['retries'] = 1;
$config['snmp']['max_repeaters'] = 20;
$config['os']['iosxr']['discovery_modules']['cisco-otv'] = false;
$config['os']['iosxr']['discovery_modules']['cisco-sla'] = false;
$config['os']['iosxr']['discovery_modules']['cisco-cef'] = false;
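As a sanity check on these numbers, here is a small sketch of the worst-case wait per SNMP request. It assumes each attempt waits the full timeout with no backoff, and that the failing runs used the Net-SNMP defaults (1 s timeout, 5 retries); I have not confirmed that is what was in effect.

```shell
#!/bin/sh
# Worst-case seconds before one SNMP request is abandoned:
# the initial attempt plus every retry each wait the full timeout.
# Assumption: constant timeout per attempt, no backoff.
worst_case_wait() {
    t=$1   # timeout in seconds
    r=$2   # number of retries
    echo $(( t * (r + 1) ))
}

worst_case_wait 1 5    # assumed Net-SNMP defaults -> 6
worst_case_wait 10 1   # settings from the config above -> 20
```

If that assumption holds, two unanswered requests at 6 s each would account for the "SNMP: [2/12.05s]" lines in the failing global discovery.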

On ASR9k, by default, the CPU is protected from SNMP polling with very restrictive limits, which prevents discovery or polling from completing. So you probably need to disable this protection on these devices as well.

The weird thing is that on ASR9k I don’t have any of those issues, only on the more recent Cisco NCS.

Then it probably depends on the supervisor and the version of the software running on the 9k. That being said, I have no clue how to disable this restriction on the ASR9k or on the NCS. So once you have found it, please post it here for documentation purposes.