Hi LibreNMS users.
My Issue
I have an issue with gaps in the data for most of my VMs. I have researched this thoroughly, Googled it, and read through threads here, but with no luck.
I get missing data, and when I use the realtime graph at an interval below 15s I see huge spikes, and negative values too.
One interesting thing is that no port speed is detected for these interfaces, as they are VMs. ethtool shows no port speed, which is expected, so it may be that RRDTune won’t work properly because no port speed is reported.
Example of gaps in data for VMs
There are no gaps whatsoever in a Juniper device, for example. (Can only post 1 image as a new user…)
My Goal
My aim is to have a very simple traffic bill which aggregates the eth0 ports of several VMs.
All I need is to accurately poll eth0 on a set of Linux VMs running Debian.
LibreNMS Debug
====================================
Component | Version
--------- | -------
LibreNMS | 1.60-58-g8a2ce01dc
DB Schema | 2020_02_10_223323_create_alert_location_map_table (159)
PHP | 7.2.19-0ubuntu0.18.04.1
MySQL | 10.1.40-MariaDB-0ubuntu0.18.04.1
RRDTool | 1.7.0
SNMP | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 1.9.3
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[WARN] Your install is over 24 hours out of date, last update: Mon, 24 Feb 2020 08:49:42 +0000
[FIX]:
Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
bootstrap/cache/.gitignore
html/js/lang/de.js
html/js/lang/en.js
html/js/lang/fr.js
html/js/lang/ru.js
html/js/lang/uk.js
html/js/lang/zh-TW.js
includes/definitions/linux.yaml
logs/.gitignore
rrd/.gitignore
storage/app/.gitignore
storage/app/public/.gitignore
storage/debugbar/.gitignore
storage/framework/cache/.gitignore
storage/framework/cache/data/.gitignore
and 4 more...
The modified-files warning above is due to some changes I made to the linux.yaml definition, to remove the graphs at the top of the device and show only the device_bits graph.
What I’ve Done To Try and Fix The Data Issue
- Installed rrdcached (it’s working)
- Changed from 5 to 1 minute polling, used rrdstep
- Tried using tune_port, enabling RRD Tune Globally and on each port
- Checked logs, health, and the resources available to my LibreNMS server
I have also turned off most discovery and polling modules for Linux servers, and tuned the snmpd config on the VMs to expose only specific MIBs, to keep polling time down (~1.5s per host). My total poller time is around 20 seconds.
From snmpd.conf
view libre-mibs included .1.3.6.1.2.1.2
view libre-mibs included .1.3.6.1.2.1.1
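For reference, here is a small Python sketch of which standard interface counters fall inside the two subtrees listed above. It is only a prefix check: it ignores snmpd view masks, any `excluded` lines, and whether the `libre-mibs` view is actually bound to the community being polled, none of which are shown here. The helper names are made up for illustration. The OIDs are the standard IF-MIB ones (ifInOctets/ifOutOctets in ifTable, ifHCInOctets/ifHCOutOctets in ifXTable).

```python
# Subtrees copied from the snmpd.conf excerpt above.
INCLUDED_SUBTREES = [".1.3.6.1.2.1.2", ".1.3.6.1.2.1.1"]

# Standard IF-MIB column OIDs (32-bit and 64-bit octet counters).
COUNTER_OIDS = {
    "ifInOctets (Counter32)": ".1.3.6.1.2.1.2.2.1.10",
    "ifOutOctets (Counter32)": ".1.3.6.1.2.1.2.2.1.16",
    "ifHCInOctets (Counter64)": ".1.3.6.1.2.1.31.1.1.1.6",
    "ifHCOutOctets (Counter64)": ".1.3.6.1.2.1.31.1.1.1.10",
}


def oid_tuple(oid):
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_view(oid, included_subtrees):
    """Return True if the OID lies under any of the included subtrees."""
    target = oid_tuple(oid)
    for subtree in included_subtrees:
        prefix = oid_tuple(subtree)
        if target[:len(prefix)] == prefix:
            return True
    return False


for name, oid in COUNTER_OIDS.items():
    status = "visible" if in_view(oid, INCLUDED_SUBTREES) else "not visible"
    print(f"{name}: {status}")
```

Under these assumptions the 32-bit ifTable counters are inside the view, while the 64-bit ifXTable counters (under .1.3.6.1.2.1.31) are not.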
More Info
I have tried polling manually, and this is where it gets very strange: I see results for eth0bps or eth0negative on different polling runs, and the values fluctuate massively. The traffic on the VM is steady, as shown below.
I gathered this data using:
while true ; do date ; ./poller.php -h 28 |grep eth0 ; sleep 10 ; done
The results:
Wed 26 Feb 16:55:52 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0bps(46.23 Mbps/249.49 Mbps)bytes(242.5 MB/1.28 GB)pkts(78.5 kpps/97.45 kpps)
Wed 26 Feb 16:56:03 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0negative ifOutOctetsbps(85.8 Mbps/0 bps)bytes(122.74 MB/-2.37 MB)pkts(82.6 kpps/102.34 kpps)
Wed 26 Feb 16:56:15 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0bps(35.01 Mbps/2.87 Gbps)bytes(12.52 MB/1 GB)pkts(77.45 kpps/95.71 kpps)
Wed 26 Feb 16:56:26 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0negative ifOutOctetsbps(43.73 Mbps/0 bps)bytes(62.55 MB/-904.03 MB)pkts(62.33 kpps/77.81 kpps)
Wed 26 Feb 16:56:38 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0negative ifOutOctetsbps(41.01 Mbps/0 bps)bytes(53.78 MB/-105.9 MB)pkts(91.58 kpps/114.34 kpps)
Wed 26 Feb 16:56:49 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0negative ifOutOctetsbps(41.42 Mbps/0 bps)bytes(54.32 MB/-238.28 MB)pkts(86.55 kpps/103.01 kpps)
Wed 26 Feb 16:57:00 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0negative ifOutOctetsbps(143.01 Mbps/0 bps)bytes(187.53 MB/-564.53 MB)pkts(84.39 kpps/95.57 kpps)
Wed 26 Feb 16:57:11 UTC 2020
Port eth0: eth0 (2 / #713) VLAN = eth0bps(51.03 Mbps/2.04 Gbps)bytes(66.91 MB/2.61 GB)pkts(57.5 kpps/71.77 kpps)
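Negative byte counts like the ones above are what appears when a counter delta is taken as plain current-minus-previous and the counter has rolled over between two samples. A minimal Python sketch of that arithmetic follows, assuming a 32-bit counter (ifOutOctets is defined as Counter32 in IF-MIB). The sample values and function names are invented for illustration, and this does not describe what poller.php does internally.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 rolls over to 0 after 2**32 - 1


def naive_delta(previous, current):
    """Plain subtraction: goes negative whenever the counter has wrapped."""
    return current - previous


def counter32_delta(previous, current):
    """Wrap-aware delta, valid only if the counter wrapped at most once."""
    return (current - previous) % COUNTER32_MODULUS


# Hypothetical samples: the counter was near its maximum, then rolled over.
previous_sample = 4_200_000_000
current_sample = 100_000_000

print("naive delta:     ", naive_delta(previous_sample, current_sample))
print("wrap-aware delta:", counter32_delta(previous_sample, current_sample))
```

The wrap-aware form only recovers the true delta when there was at most one rollover in the interval. If the counter wraps more than once between polls, the lost multiples of 2**32 cannot be reconstructed from the two samples alone.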
And here are the corresponding results from the same VM, using ifstat:
Time eth0
HH:MM:SS KB/s in KB/s out
16:56:02 11642.53 346103.5
16:56:12 4582.17 341816.0
16:56:22 6712.92 360460.1
16:56:32 4597.14 339000.6
16:56:42 4516.98 332766.2
16:56:52 4904.77 320262.0
16:57:02 17968.14 304737.7
16:57:12 7309.22 308829.2
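To compare the ifstat figures with the poller output, it helps to put them in the same units. The Python sketch below converts the "KB/s out" column to Mbps and estimates how long a 32-bit octet counter would take to roll over at that rate. It assumes ifstat's KB means 1024 bytes (using 1000 changes the result only slightly), and the function names are made up for illustration.

```python
# "KB/s out" column copied from the ifstat output above.
IFSTAT_OUT_KB_PER_S = [
    346103.5, 341816.0, 360460.1, 339000.6,
    332766.2, 320262.0, 304737.7, 308829.2,
]

BYTES_PER_KB = 1024          # assumption about ifstat's unit
COUNTER32_MODULUS = 2 ** 32  # number of distinct values in a Counter32


def to_mbps(kb_per_s, bytes_per_kb=BYTES_PER_KB):
    """Convert kilobytes per second to megabits per second."""
    return kb_per_s * bytes_per_kb * 8 / 1_000_000


def counter32_wrap_seconds(kb_per_s, bytes_per_kb=BYTES_PER_KB):
    """Seconds for a 32-bit byte counter to roll over at a steady rate."""
    return COUNTER32_MODULUS / (kb_per_s * bytes_per_kb)


mean_out = sum(IFSTAT_OUT_KB_PER_S) / len(IFSTAT_OUT_KB_PER_S)
print(f"mean outbound rate: {to_mbps(mean_out):.0f} Mbps")
print(f"Counter32 wrap time: {counter32_wrap_seconds(mean_out):.1f} s")
```

With these figures the outbound rate comes to a little over 2.7 Gbps, and a 32-bit octet counter would roll over in under 13 seconds, which is close to the interval of the manual polling loop. If only 32-bit counters are being read, a rollover between consecutive samples is therefore plausible. That is an inference from the numbers, not a confirmed cause.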
Any advice whatsoever would be appreciated.
Thanks