I am currently migrating my SolarWinds environment to LibreNMS. I have imported all of my nodes, and now I am getting false Device Down alerts. I believe this is caused by the ports polling timing out before some threshold is reached.
I have a pair of Cisco 4500-X chassis in VSS, a stack of five 3750s, and a Brocade VDX 8770 chassis, all of which have a lot of ports to poll via SNMP.
SolarWinds can poll these devices just fine, but in LibreNMS these three devices keep triggering the Device Down alert.
The alert is currently configured as follows:
macros.device_down = Yes
devices.status_reason = 'icmp'
I originally investigated the issue without the ICMP condition. The problem appeared to be related to SNMP timeouts, so after reading the forum I restricted the alert to ICMP outages only, as a test.
ICMP connectivity is definitely not down: these devices are reliable at their sites, and SolarWinds, the SNMP poller I am migrating from, has no trouble reaching them.
I have read the performance optimization guide.
- I am using the official Docker image (https://github.com/librenms/docker), which uses RRDtool, MariaDB, and Docker Compose.
- I updated today, and the issue persists.
- I originally thought the limited CPU on these devices couldn’t handle being polled by both SolarWinds and LibreNMS simultaneously, so I removed them from SolarWinds, with the same result.
- I ran poller.php against one of the devices in question. Here are the results:
/opt/librenms/poller.php core.3750
2019-01-02 12:14:13 - 1 devices polled in 596.2 secs
SNMP [52/588.49s]: Get[22/240.93s] Getnext[0/0.00s] Walk[30/347.56s]
MySQL [1164/4.71s]: Cell[27/0.05s] Row[-26/-0.05s] Rows[58/0.24s] Column[2/0.00s] Update[813/2.96s] Insert[284/1.48s] Delete[6/0.02s]
RRD [334/0.27s]: Update[0/0.00s] Create [0/0.00s] Other[334/0.27s]
- My performance polling history shows that the module that takes longest to poll is “ports”, at approximately 1000 ms.
- I need some clarification on what the SNMP Max Repeaters setting actually does. Following the official guide (https://docs.librenms.org/Support/Performance/), I used this syntax:
time snmpbulkwalk -v2c -cpublic HOSTNAME -Cr<REPEATERS> -M /opt/librenms/mibs -m IF-MIB IfEntry
However, when I replaced ‘public’ and HOSTNAME with the appropriate values, I got the following error:
-Cr: Unknown Object Identifier (Sub-id not found: (top) -> -Cr)
- Below is the output of my validate.php command:
bash-4.4# /opt/librenms/validate.php
====================================
Component | Version
--------- | -------
LibreNMS  | 1.47
DB Schema | 275
PHP       | 7.2.13
MySQL     | 10.2.20-MariaDB-1:10.2.20+maria~bionic
RRDTool   | 1.7.0
SNMP      | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 1.8.0
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[WARN] IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN] Your install is over 24 hours out of date, last update: Sun, 30 Dec 2018 14:29:16 +0000
    [FIX]: Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.
[WARN] Your local git branch is not master, this will prevent automatic updates.
    [FIX]: You can switch back to master with git checkout master
[FAIL] We have found some files that are owned by a different user than librenms, this will stop you updating automatically and / or rrd files being updated causing graphs to fail.
    [FIX]:
    sudo chown -R librenms:librenms /opt/librenms
    sudo setfacl -d -m g::rwx /data/rrd /data/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
    sudo chmod -R ug=rwX /data/rrd /data/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
    Files:
    /opt/librenms/cache/os_defs.cache
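To put the poller.php numbers above in proportion, here is a small bash sketch that extracts the total and SNMP times from the poller summary and compares the total against a polling interval. `summarize_poll` is a hypothetical helper, and the 300-second `INTERVAL` is an assumption based on the default 5-minute polling cron; adjust it if the container polls on a different schedule.

```shell
#!/usr/bin/env bash
# Sketch: split a poller.php summary into total time and SNMP time.
# summarize_poll is a hypothetical helper. INTERVAL ASSUMES the default
# 5-minute (300 s) polling cron; override it to match your setup.
INTERVAL="${INTERVAL:-300}"

summarize_poll() {
  local text="$1" total snmp
  # Works on the summary pasted as one line or as multiple lines:
  # sed -n ... p prints only the line that matches each pattern.
  # "... devices polled in 596.2 secs ..." -> 596.2
  total=$(sed -nE 's|.*polled in ([0-9.]+) secs.*|\1|p' <<<"$text")
  # "SNMP [52/588.49s]" -> 588.49 (seconds spent waiting on SNMP)
  snmp=$(sed -nE 's|.*SNMP \[[0-9]+/([0-9.]+)s].*|\1|p' <<<"$text")
  awk -v t="$total" -v s="$snmp" -v i="$INTERVAL" 'BEGIN {
    printf "total=%ss snmp=%ss snmp_share=%.1f%%\n", t, s, 100 * s / t
    # Flag polls that take longer than the polling interval itself.
    if (t + 0 > i + 0) printf "poll exceeds the %ss interval\n", i
  }'
}
```

Fed the core.3750 output, it reports that about 98.7% of the 596.2 seconds went to SNMP requests, roughly twice a 300-second interval.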
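As far as I understand it, SNMP Max Repeaters corresponds to the max-repetitions field of an SNMPv2c GETBULK request, which is what `-Cr<NUM>` sets for snmpbulkwalk: the upper bound on how many variable bindings the agent may return per request. A larger value means fewer round trips for a big table such as ifEntry, at the cost of larger responses the agent has to build at once. The `-Cr: Unknown Object Identifier` error suggests `-Cr` was read as an OID rather than an option; my assumption is that option parsing in the container's net-snmp tools stops at the first non-option argument (the hostname). `<REPEATERS>` is also a placeholder for a number, e.g. `-Cr20`. This hypothetical sketch puts every option before the hostname, uses the IF-MIB name `ifEntry`, estimates round trips as ceil(22 * N / R) for N interfaces and R repetitions, and times walks at a few values of R. `HOST` and `COMMUNITY` are placeholders you supply.

```shell
#!/usr/bin/env bash
# Sketch: compare IF-MIB ifEntry bulk walks at several max-repetitions values.
# HOST and COMMUNITY are placeholders; the walks only run when HOST is set.
# The MIB path matches the LibreNMS container layout quoted above.
HOST="${HOST:-}"
COMMUNITY="${COMMUNITY:-public}"
IFENTRY_COLUMNS=22   # columnar objects in ifEntry (ifIndex .. ifSpecific)

# Rough number of GETBULK round trips to walk ifEntry for N interfaces when
# each response carries at most R varbinds: ceil(22 * N / R).
estimate_requests() {
  local interfaces="$1" repeaters="$2"
  local cells=$(( IFENTRY_COLUMNS * interfaces ))
  echo $(( (cells + repeaters - 1) / repeaters ))
}

# Print snmpbulkwalk arguments one per line, with every option placed
# before the hostname so "-Cr..." cannot be mistaken for an OID.
bulkwalk_args() {
  local repeaters="$1"
  printf '%s\n' -v2c -c "$COMMUNITY" "-Cr${repeaters}" \
    -M /opt/librenms/mibs -m IF-MIB "$HOST" ifEntry
}

compare_repeaters() {
  local r
  local -a args
  for r in 10 20 30 50; do
    echo "== max-repetitions ${r} =="
    mapfile -t args < <(bulkwalk_args "$r")
    # bash's time keyword reports wall-clock time for the whole walk.
    time snmpbulkwalk "${args[@]}" > /dev/null
  done
}

if [[ -n "$HOST" ]]; then
  compare_repeaters
fi
```

For example, `HOST=core.3750 COMMUNITY=yourcommunity bash repeaters.sh` (the file name is hypothetical). With a hypothetical 300 interfaces, `estimate_requests 300 10` gives 660 round trips and `estimate_requests 300 50` gives 132; the timings then show whether the device actually answers faster with fewer, larger responses.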
Thanks in advance. Any help is appreciated.