librenms@librenms-l9cv:~$ ./validate.php
====================================
Component | Version
--------- | -------
LibreNMS | 1.58.1-14-gb1e56c77e
DB Schema | 2019_04_22_220000_update_route_table (147)
PHP | 7.2.24-0ubuntu0.18.04.1
MySQL | 5.7.14-google-log
RRDTool | 1.7.0
SNMP | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 1.9.1
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
We’re experiencing an odd issue with a Cisco switch, specifically a Catalyst 9500 (C9500-40X) running
Cisco IOS-XE CAT9K_IOSXE 16.9.3 (Fuji).
Running
./discovery.php -d -h <host> -m sensors
returns at least 2 OIDs similar to the following:
Discover sensor: .1.3.6.1.4.1.9.9.91.1.1.1.1.4.2114, 2114, cisco-entity-sensor, - Te2/0/39 Receive Power , snmp, 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Each such line continues with approximately 16 gigabytes of 0’s in the log, so the resulting log file contains multiple OIDs with multiple gigabytes of 0’s on a single line and is approximately 160 GB in size. Lines that long completely break most Unix tools (e.g. grep, sed, less) with absolutely fascinating error messages, which makes troubleshooting rather difficult.
Searches for related issues have not found anything, though the issue described in the thread “Problem with last update of Librenms” is similar. With the exception of the specific OIDs with the bad data, the rest of the health sensors are reporting data just fine.
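Since the usual tools choke on lines this long, one workaround is to make a trimmed copy of the log first. Below is a minimal sketch, not something LibreNMS ships: the function name `truncate_long_lines` and its parameters are hypothetical. It reads the file in fixed-size chunks, so a multi-gigabyte line is never held in memory, and keeps only the first `max_len` bytes of each line.

```python
def truncate_long_lines(src, dst, max_len=200, chunk_size=1 << 20):
    """Copy src to dst, keeping at most max_len bytes of every line.

    src and dst are binary file objects. Returns the number of lines
    that had to be cut. (Hypothetical helper, for illustration only.)
    """
    kept = 0           # bytes of the current line already written
    truncated = 0      # how many lines were cut so far
    line_cut = False   # True once the current line has been counted
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        pos = 0
        while pos < len(chunk):
            nl = chunk.find(b"\n", pos)
            end = len(chunk) if nl == -1 else nl
            segment = chunk[pos:end]
            room = max_len - kept
            if room > 0:
                dst.write(segment[:room])
                kept += min(len(segment), room)
            if len(segment) > room and not line_cut:
                line_cut = True
                truncated += 1
            if nl == -1:
                break  # the line continues in the next chunk
            dst.write(b"\n")
            kept = 0
            line_cut = False
            pos = nl + 1
    return truncated
```

To use it, open both files in binary mode, e.g. `open("discovery.log", "rb")` and `open("discovery.trimmed.log", "wb")` (file names made up here), and pass them in. The trimmed copy should then be small enough for grep and less to handle.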
From a purely analytical standpoint, the possible culprits here are:
- bad hardware
- bad firmware
- bad MIB
- SNMP bug
- LibreNMS bug
My suspicion is one of the first 3, but I’m curious whether anyone else has experienced anything similar.
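One way to narrow the list down is to query the raw sensor columns for the affected index directly and see what the device itself returns. The sketch below only builds the net-snmp `snmpget` command and does not run it. The column numbers (type, scale, precision, value, status) are my reading of CISCO-ENTITY-SENSOR-MIB and should be checked against the MIB file; the helper names, SNMP v2c, and the community string are assumptions.

```python
# Base OID of the sensor value entry, taken from the discovery output above
# (.1.3.6.1.4.1.9.9.91.1.1.1.1.4.2114 = column 4, index 2114).
ENT_SENSOR_ENTRY = ".1.3.6.1.4.1.9.9.91.1.1.1.1"

# Column numbers as I read them in CISCO-ENTITY-SENSOR-MIB (assumption:
# verify against the MIB before relying on them).
COLUMNS = {
    "entSensorType": 1,
    "entSensorScale": 2,
    "entSensorPrecision": 3,
    "entSensorValue": 4,
    "entSensorStatus": 5,
}


def sensor_oids(index):
    """Return the numeric OID of each column for one sensor index."""
    return {name: f"{ENT_SENSOR_ENTRY}.{col}.{index}" for name, col in COLUMNS.items()}


def snmpget_command(host, community, index):
    """Build an snmpget argument list (SNMP v2c assumed) for one sensor."""
    return ["snmpget", "-v2c", "-c", community, "-On", host,
            *sensor_oids(index).values()]


if __name__ == "__main__":
    # "<host>" and "<community>" are placeholders, as in the post.
    print(" ".join(snmpget_command("<host>", "<community>", 2114)))
```

If the device returns a scale or precision outside what the MIB allows for index 2114 (as I read it, precision is limited to -8..9), that would point toward hardware or firmware. If the raw values look sane, the suspicion would shift toward the MIB handling or LibreNMS.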