IBM Blade Center SNMP lockout

Hey Everyone,

This one is perplexing!

My IBM Blade Center AMM stops answering SNMP after an unpredictable interval, usually 48 to 72 hours but sometimes as long as a week. When it locks out, an SNMP walk of the device shows the port as closed, as if a firewall had shut it. It looks like an IBM device issue. However, before I started using LibreNMS (which, I might add, is quite awesome) I was monitoring with Cacti, and this never happened with Cacti. So I can only guess it has to do with the way LibreNMS polls this device, and that the IBM firmware stops playing along after being poked so hard for a while.
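One way to check whether the port is really "closed" or the requests are just being dropped silently is to send a single SNMP request over a connected UDP socket. This is only a rough sketch using the Python standard library: the hand-built SNMPv1 GetRequest for sysUpTime.0, the community string "public", and the function names are assumptions for illustration, not anything taken from the AMM or LibreNMS. On Linux, "refused" means an ICMP port-unreachable came back (the port is genuinely closed), while "timeout" means nothing came back at all, which is what a rate limiter or anti-flood filter would typically look like.

```python
import socket


def tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER tag-length-value item (short form only, payload < 128 bytes)."""
    if len(payload) >= 128:
        raise ValueError("this sketch only handles short-form BER lengths")
    return bytes([tag, len(payload)]) + payload


def snmp_v1_get_sysuptime(community: str = "public") -> bytes:
    """Build a minimal SNMPv1 GetRequest for sysUpTime.0 (1.3.6.1.2.1.1.3.0)."""
    oid = tlv(0x06, bytes([0x2B, 6, 1, 2, 1, 1, 3, 0]))  # 0x2B encodes the leading "1.3"
    varbind = tlv(0x30, oid + tlv(0x05, b""))            # OID paired with a NULL value
    pdu = tlv(
        0xA0,                                            # GetRequest-PDU
        tlv(0x02, b"\x01")                               # request-id = 1
        + tlv(0x02, b"\x00")                             # error-status = 0
        + tlv(0x02, b"\x00")                             # error-index = 0
        + tlv(0x30, varbind),                            # varbind list with one entry
    )
    return tlv(
        0x30,
        tlv(0x02, b"\x00")                               # version 0 means SNMPv1
        + tlv(0x04, community.encode("ascii"))
        + pdu,
    )


def probe_udp(host: str, port: int, payload: bytes, timeout: float = 4.0) -> str:
    """Send one datagram and report 'answered', 'refused', or 'timeout'."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.connect((host, port))  # connected socket, so ICMP errors are reported back
        try:
            s.send(payload)
            s.recv(4096)
            return "answered"
        except ConnectionRefusedError:
            return "refused"     # ICMP port unreachable: nothing is listening
        except socket.timeout:
            return "timeout"     # no reply and no ICMP error: silently dropped


# Example (hypothetical host name):
#   probe_udp("amm.example.net", 161, snmp_v1_get_sysuptime("public"))
```

Running this once while SNMP works and once during a lockout should show whether the AMM's SNMP agent has stopped listening or is just ignoring the poller. A firewall between the two hosts that drops ICMP would make a closed port look like "timeout" as well, so treat "timeout" as inconclusive.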

The AMM SNMP settings are really straightforward. There are no thresholds for SNMP queries. It offers SNMP v1 and v3. There are no configurable firewall settings to speak of, and resetting the setting (disabling and re-enabling SNMP) makes it work again. Either SNMP version gives the same result.

I suspect a timeout issue or a threshold being exceeded, but I just don’t know: the IBM device logs are not geared to record that kind of information, and the LibreNMS logs only say it is unable to communicate.
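Since neither log records when the lockout actually starts, a small independent watcher could at least timestamp it. The following is a minimal sketch, assuming the Net-SNMP `snmpget` tool is installed on the monitoring host; the host name, community string, and function names are placeholders, and the one-minute interval is an arbitrary choice meant to add very little load of its own.

```python
import subprocess
import time
from datetime import datetime, timezone

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0


def build_snmpget_cmd(host, community="public", timeout_s=4, retries=0):
    """Build a Net-SNMP snmpget command line (-t timeout in seconds, -r retries)."""
    return ["snmpget", "-v", "1", "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, SYS_UPTIME_OID]


def snmp_alive(host, **kwargs):
    """Return True if snmpget exits successfully, False on timeout or error."""
    result = subprocess.run(build_snmpget_cmd(host, **kwargs),
                            capture_output=True, text=True)
    return result.returncode == 0


def log_state_changes(check, interval_s=60, max_checks=None,
                      sleep=time.sleep, out=print):
    """Call check() repeatedly and record a UTC timestamp whenever the result flips."""
    changes = []
    last = None
    done = 0
    while max_checks is None or done < max_checks:
        state = check()
        if state != last:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            changes.append((stamp, state))
            out(f"{stamp} SNMP {'responding' if state else 'NOT responding'}")
            last = state
        done += 1
        sleep(interval_s)
    return changes


# Example (hypothetical host name), runs until interrupted:
#   log_state_changes(lambda: snmp_alive("amm.example.net", community="public"))
```

Comparing the logged lockout times with the poller and discovery schedule might show whether the lockouts follow a particular LibreNMS job or simply a fixed amount of accumulated traffic.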

Has anyone else had this issue or seen this?

IBM Advanced Management Module
IBM Blade Center-S 8886AC1
Both have the latest available published firmware!

Updating to latest codebase OK
Updating Composer packages OK
Updating SQL-Schema OK
Updating submodules OK
Cleaning up DB OK
Fetching notifications OK
Caching PeeringDB data OK

====================================

Component Version
LibreNMS 1.39-100-g02348be
DB Schema 251
PHP 7.0.30-0ubuntu0.16.04.1
MySQL 10.0.34-MariaDB-0ubuntu0.16.04.1
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.6.5
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct

You may need to adjust the SNMP timeout: https://docs.librenms.org/#Support/Configuration/#snmp-settings

I am still having the issue. I have increased the timeout to 4 seconds and removed the device from SmokePing. I believe it is getting hammered, and there is some loose documentation about anti-flooding measures built into this device.

I am not really sure how to exclude a device from SmokePing other than removing it from the config file “/opt/smokeping/etc/librenms.conf”, and that gets undone by the cron script! I disabled the ICMP checks for this device.

I will let you know whether these changes help. Am I the only one with this problem?

It did not make a lick of difference: the IBM AMM keeps locking up after approximately 36 hours.