Hi, I filed a bug on GitHub about an issue I’ve been having with LibreNMS, but it was closed as a support issue, so I’m hoping someone here can point me in the right direction.
I am monitoring Docker containers on TrueNAS SCALE using the official Docker agent.
The problem I am having is that the CPU graphs for all of my CPU-heavy containers render with gaps when CPU usage is high. Here is an example CPU graph:
The graph is visible in places, but in others it is a series of short lines or even completely blank for large stretches of time. The host has 40 cores (80 threads with hyperthreading), so I don’t believe this is a lack of CPU power on the host. I don’t see any signs of a timeout in the logs or the poller timing graphs. Polling always completes before the start of the next poll cycle.
As evidence that polling is working fine, here is a memory graph over the same time period:
There are no issues here. All graphs other than CPU similarly render with no gaps. To me, this is strong evidence that SNMP is not timing out and the docker agent script is running on the host successfully.
I did some troubleshooting on my own and it looks like the docker agent can report CPU usage above 100%. For the docker container above, I allocated 8 CPUs and `docker-stats.py` reports `"cpu": "767.23%"`. That lines up pretty well with how Linux’s `top` utility works; I’m not sure if this is a side effect of the platform (Linux) of the host or if the Docker command that `docker-stats.py` invokes deliberately does this.
My hypothesis here is that LibreNMS expects the CPU graph to always be between 0 and 100%, but doesn’t account for this behavior and so it discards everything out of that range. It matches what I tend to see, where the graph renders fine when I know the container was mostly idle, but has gaps and disappears when I know it was being used heavily. My guess here is that removing the 100% limit should remove the gaps.
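To make the hypothesis concrete, here is a minimal sketch of what I think is happening. RRDtool treats any sample outside a data source's min/max as unknown, which shows up as a gap in the graph. The 0 to 100 limits below are my assumption about how the Docker CPU data source is defined (I have not confirmed it in the LibreNMS code), and `parse_cpu_percent` and `rrd_store` are hypothetical helper names used only for illustration.

```python
import math


def parse_cpu_percent(raw: str) -> float:
    """Turn a docker-stats.py style value such as "767.23%" into a float."""
    return float(raw.strip().rstrip("%"))


def rrd_store(value: float, ds_min: float = 0.0, ds_max: float = 100.0) -> float:
    """Mimic RRDtool's range check: out-of-range samples become unknown (NaN).

    The default limits of 0 and 100 are an assumption, not a confirmed setting.
    """
    if value < ds_min or value > ds_max:
        return math.nan
    return value


sample = parse_cpu_percent("767.23%")  # value reported by the agent
stored = rrd_store(sample)             # what would end up in the RRD
per_core = sample / 8                  # 8 CPUs are allocated to this container

print(sample, stored, round(per_core, 2))
```

If that assumption holds, an idle container (say 42.5%) is stored normally, while a busy multi-core container is recorded as unknown even though its per-core average is still below 100%.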
On the GitHub issue, it was recommended that I modify the RRD file, but the syntax is very complicated. Would someone be able to help me with this issue? I don’t like the idea of having to recreate the RRD file every time I add a new docker container, but it would at least allow me to see the graphs in full.
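From what I can tell, one way to change the limit without recreating the file would be `rrdtool tune` with its `--maximum ds-name:max` option, where a max of `U` disables the limit. Below is an untested sketch that only builds and prints the command rather than running it. The data source name `cpu_usage` and the file path are placeholders I made up; the real values would need to be checked with `rrdtool info` on the actual file. Since my install uses rrdcached, pending updates may also need to be flushed before tuning, but I have not verified that.

```python
import shlex


def build_tune_command(rrd_path: str, ds_name: str = "cpu_usage",
                       new_max: str = "U") -> list[str]:
    """Build an `rrdtool tune` command that changes a data source's maximum.

    ds_name is a guess at the CPU data source name; "U" means no upper limit.
    """
    return ["rrdtool", "tune", rrd_path, "--maximum", f"{ds_name}:{new_max}"]


# Placeholder path: HOSTNAME, APPID and CONTAINER are not real values.
cmd = build_tune_command(
    "/opt/librenms/rrd/HOSTNAME/app-docker-APPID-CONTAINER.rrd"
)

# Print the command for review instead of executing it.
print(shlex.join(cmd))
```

Printing first seems safer than executing directly, since it lets me check the file and data source name before touching any RRD. Passing a number such as `800` for `new_max` instead of `U` would raise the ceiling rather than remove it.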
Steps to reproduce:
- Set up the docker agent on the host and enable Docker polling in LibreNMS
- Install a docker container that uses a lot of CPU, and assign it at least 2 cores. The more CPU cores, the easier it is to see the effect. (whisper-asr is great, and so is YaCy, which I’m using here)
- Run the container, ensure LibreNMS is polling Docker, and ensure the container is using a lot of CPU power
- Go to the Docker app in LibreNMS and observe the CPU graph
I’m using Chrome Version 138.0.7204.97 (Official Build) (64-bit)
`./validate.php` output:
===========================================
Component | Version
--------- | -------
LibreNMS | 25.6.0-142-g4ae643f0c (2025-07-09T14:09:29-04:00)
DB Schema | 2025_07_08_111910_change_stp_bridge_max_age_size (351)
PHP | 8.3.21
Python | 3.12.9
Database | MariaDB 11.8.2-MariaDB-ubu2404
RRDTool | 1.8.0
SNMP | 5.9.4.pre2
===========================================
[OK] Composer Version: 2.8.9
[OK] Dependencies up-to-date.
[OK] Database Connected
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQL and PHP time match
[OK] Active pollers found
[OK] Dispatcher Service not detected
[OK] Locks are functional
[OK] Python poller wrapper is polling
[OK] Redis is unavailable
[OK] rrdtool version ok
[OK] Connected to rrdcached
Thanks in advance!