No graphs or rrd files for Docker containers

Hi

I am having problems getting graphs for Docker resources to display.
When I click Apps -> Docker under the host I get ‘Error drawing graph’ on all graphs on that page.
As per the documentation I have installed the script which polls docker stats via snmp and added the snmp user to the docker group (and verified that it can run under that user). Host OS is Debian.
When I debug graphs I get:
‘Warning: Invalid argument supplied for foreach() in /opt/librenms/includes/html/graphs/generic_multi_line_exact_numbers.inc.php on line 28’
That makes perfect sense because when I look at the rrd directory there are no app-docker*.rrd files.
On the host I want to poll I’ve been doing a little digging too. I tried adding the docker-stats.sh to /etc/sudoers so that the Debian-snmp user could run that (even though running the script as the Debian-snmp user worked fine) just to rule that out. I had also read that rrdcache could influence some graphs being drawn. But rrdcache is not installed (also, with no rrd files that can’t be the issue either).
The eventlog on LibreNMS has no entries for that host that can provide me with any help; neither has the daemon log on the polled server.

So I am fresh out of ideas here.

To reproduce:

  1. Install Debian on random box
  2. Install Docker
  3. Install LibreNMS using docker-compose (as described on https://github.com/librenms/docker/tree/master/examples/compose)
  4. Set up snmpd and install the docker-helper script on a remote Linux server with Docker (Debian as well), as described in the LibreNMS docs. Add the Debian-snmp user to the docker group.
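
Step 4 only helps if the Debian-snmp user really ends up in the docker group. A minimal sketch for checking that, assuming a POSIX shell; `in_group` is a hypothetical helper name used for illustration, not part of LibreNMS or net-snmp:

```shell
# in_group USER GROUP: succeed if USER is a member of GROUP.
# (hypothetical helper for illustration only)
in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# User and group names as used in this thread.
if in_group Debian-snmp docker; then
    echo "Debian-snmp is in the docker group"
else
    echo "Debian-snmp is not in the docker group (or the user does not exist)"
fi
```

The loop compares whole group names, so a group such as docker-users would not be mistaken for docker.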

I have run validate.php as requested. LibreNMS itself runs in Docker too, but I don’t see that having any impact.


/opt/librenms $ ./validate.php 
====================================
Component | Version
--------- | -------
LibreNMS  | 21.3.0
DB Schema | 2021_03_17_160729_service_templates_cleanup (201)
PHP       | 7.4.15
Python    | 3.8.8
MySQL     | 10.4.18-MariaDB-1:10.4.18+maria~focal
RRDTool   | 1.7.2
SNMP      | NET-SNMP 5.9
====================================

[OK]    Installed from the official Docker image; no Composer required
[OK]    Database connection successful
[OK]    Database schema correct
[WARN]  IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN]  Global lnms shortcut not installed. lnms command must be run with full path
        [FIX]: 
        sudo ln -s /opt/librenms/lnms /usr/bin/lnms
[WARN]  Log rotation not enabled, could cause disk space issues
        [FIX]: 
        sudo cp /opt/librenms/misc/librenms.logrotate /etc/logrotate.d/librenms
[WARN]  Updates are managed through the official Docker image

Thanks for any help you can provide

/klaus

You may have ended up here: Docker stats application not drawing graphs

Assuming you get correct results when running:

sudo -u Debian-snmp /etc/snmp/docker-stats.sh

Did you add the /usr/bin/sudo call to the extend line in snmp config and restart it?

extend docker /usr/bin/sudo /etc/snmp/docker-stats.sh

Thanks for your reply.

Yes, I saw that one and tried it before posting. Unfortunately it made no difference compared to my original configuration (putting Debian-snmp in the docker group). Running the script manually seems to work, but that’s about it.

/klaus

I re-read the documentation for docker-stats on https://docs.librenms.org/Extensions/Applications/#docker-stats and realized that jq had to be installed on the docker images. It wasn’t installed in the stock docker images so I edited the Dockerfile and rebuilt the container locally. I checked that jq is in fact available in the containers. It is. Yet there are still no graphs (or rrd files) available. The LibreNMS log still tells me nothing relevant, unfortunately.

I’ll look into how I can make the log more verbose to try and find out where the problem is. In the meantime any suggestions are more than welcome.

Thanks

/klaus

You’ve restarted snmpd also after adding the extend line? I only ask because I forget that myself :slight_smile:

Try an SNMP capture on the device page, 3 dots top right -> Capture:

[screenshot]

Search for docker-stats, then a bit further down, do you see the metrics?

Hi again

Yep, I did restart snmpd :slight_smile:

Thanks for the debugging tips. Not sure whether it makes me less confused. I tried running the SNMP capture a few times. The command seems to run with no errors, but there is no result either.

I have another server, also Debian Linux, with docker-stats etc. configured the same way. Here it also fails to run, but with a different problem:
.1.3.6.1.4.1.8072.1.3.2.3.1.1.6.100.111.99.107.101.114 = STRING: /etc/snmp/docker-stats.sh: No such file or directory
The file /etc/snmp/docker-stats.sh exists and is executable (I can execute it manually using sudo -u Debian-snmp). So I have no idea why this error occurs either :confused:
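
A note on reading that OID: the trailing numbers are not arbitrary. In NET-SNMP-EXTEND-MIB the table index is the extend name, encoded as a length followed by one ASCII code per character, so 6.100.111.99.107.101.114 is the six-character name from the extend line. A small sketch for decoding it, assuming a POSIX shell; `decode_extend_index` is a made-up helper name:

```shell
# decode_extend_index "6.100.111.99.107.101.114" prints the extend name.
# The first number is the string length; the rest are ASCII codes.
decode_extend_index() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split on dots into $1, $2, ...
    IFS=$old_ifs
    shift              # drop the length prefix
    name=
    for code in "$@"; do
        # printf with an octal escape turns a character code into the character
        name="$name$(printf "\\$(printf '%03o' "$code")")"
    done
    printf '%s\n' "$name"
}

decode_extend_index "6.100.111.99.107.101.114"   # prints: docker
```

So the error line above belongs to the extend named docker, which is useful when several extends are configured on the same host.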

Do you have any idea of the root cause of either error?

Thanks!

/k

Can you output what you get from this?

$ ls -l /etc/snmp/
total 28
-rwxr-xr-x 1 root root  795 Apr 28 16:22 docker-stats.sh
-rwxr-xr-x 1 root root  850 Apr 28 16:20 nginx
-rwxr-xr-x 1 root root 3032 Apr 28 16:25 osupdate
-rwxr-xr-x 1 root root 4135 Apr 29 11:33 phpfpmsp
-rw-r--r-- 1 root root  211 Jul 11  2019 snmp.conf
-rw------- 1 root root  926 Apr 29 10:40 snmpd.conf
$ sudo tail /etc/snmp/snmpd.conf
...
extend nginx /etc/snmp/nginx
extend docker /usr/bin/sudo /etc/snmp/docker-stats.sh
extend osupdate /etc/snmp/osupdate
extend phpfpmsp /etc/snmp/phpfpmsp
...
$ sudo -u Debian-snmp /etc/snmp/docker-stats.sh
{"version":"1","data":[{"container":"unimus_unimus_1","pids":93,"memory":{"used":"719.7MiB","limit":"15.45GiB","perc":"4.55%"},"cpu":"0.26%"},{"container":"unimus_db_1","pids":8,"memory":{"used":"60.77MiB","limit":"15.45GiB","perc":"0.38%"},"cpu":"0.01%"},{"container":"portainer","pids":19,"memory":{"used":"9.738MiB","limit":"15.45GiB","perc":"0.06%"},"cpu":"0.00%"},{"container":"root_oxidized_1","pids":20,"memory":{"used":"53MiB","limit":"15.45GiB","perc":"0.33%"},"cpu":"0.01%"}],"error":"0","errorString":""}

Do you get any useful errors if you:

grep snmpd /var/log/syslog

I assume you mean that this should be done on the host that needs to be polled by LibreNMS. So here goes:

➜   ls -l /etc/snmp
total 52
-rw-r--r-- 1 root root   148 Apr 30 00:45 certificate.json
-rwxr-xr-x 1 root root  2725 Apr  7 08:50 certificate.py
-rwxr-xr-x 1 root root   795 Apr  7 08:45 docker-stats.sh
-rwxr-xr-x 1 root root  3521 Apr  7 08:51 mdadm
-rwxr-xr-x 1 root root  1161 Apr  7 08:53 nvidia
-rwxr-xr-x 1 root root  3032 Apr  7 08:54 osupdate
-rwxr-xr-x 1 root root 12260 Apr 30 00:58 smart
-rw-r--r-- 1 root root   299 Apr 30 01:00 smart.config
-rw-r--r-- 1 root root   211 Jan 28 10:49 snmp.conf
-rw------- 1 root root  7324 Apr 30 00:43 snmpd.conf

➜   sudo tail /etc/snmp/snmpd.conf
....
extend docker /usr/bin/sudo /etc/snmp/docker-stats.sh
extend certificate /etc/snmp/certificate.py
extend mdadm /etc/snmp/mdadm
extend nvidia /etc/snmp/nvidia
extend osupdate /etc/snmp/osupdate
extend smart /usr/bin/sudo /etc/snmp/smart
....

➜   sudo -u Debian-snmp /etc/snmp/docker-stats.sh

{"version":"1","data":[{"container":"myadmin","pids":11,"memory":{"used":"27.77MiB","limit":"31.24GiB","perc":"0.09%"},"cpu":"0.00%"},{"container":"netdata","pids":58,"memory":{"used":"605.4MiB","limit":"31.24GiB","perc":"1.89%"},"cpu":"17.88%"},{"container":"socket-proxy","pids":2,"memory":{"used":"4.387MiB","limit":"31.24GiB","perc":"0.01%"},"cpu":"0.01%"},{"container":"tautulli","pids":30,"memory":{"used":"75.8MiB","limit":"31.24GiB","perc":"0.24%"},"cpu":"0.07%"},{"container":"qbittorrent","pids":13,"memory":{"used":"20.54MiB","limit":"31.24GiB","perc":"0.06%"},"cpu":"0.02%"},{"container":"elastic_kibana_1","pids":12,"memory":{"used":"311.4MiB","limit":"31.24GiB","perc":"0.97%"},"cpu":"0.17%"},{"container":"elastic_logstash_1","pids":145,"memory":{"used":"3.109GiB","limit":"31.24GiB","perc":"9.95%"},"cpu":"107.54%"},{"container":"elastic_filebeat_1","pids":16,"memory":{"used":"115.3MiB","limit":"31.24GiB","perc":"0.36%"},"cpu":"0.42%"},{"container":"elastic_journalbeat_1","pids":16,"memory":{"used":"56.35MiB","limit":"31.24GiB","perc":"0.18%"},"cpu":"0.34%"},{"container":"elastic_metricbeat_1","pids":16,"memory":{"used":"163MiB","limit":"31.24GiB","perc":"0.51%"},"cpu":"10.60%"},{"container":"elastic_auditbeat_1","pids":0,"memory":{"used":"0B","limit":"0B","perc":"0.00%"},"cpu":"0.00%"},{"container":"elastic_elastic-agent_1","pids":74,"memory":{"used":"210.8MiB","limit":"31.24GiB","perc":"0.66%"},"cpu":"0.71%"},{"container":"portainer","pids":18,"memory":{"used":"26.36MiB","limit":"31.24GiB","perc":"0.08%"},"cpu":"0.01%"},{"container":"elastic_praeco_1","pids":4,"memory":{"used":"19.09MiB","limit":"31.24GiB","perc":"0.06%"},"cpu":"0.00%"},{"container":"elastic_elastalert_1","pids":35,"memory":{"used":"124.4MiB","limit":"31.24GiB","perc":"0.39%"},"cpu":"0.00%"},{"container":"elastic_cerebro_1","pids":32,"memory":{"used":"482.5MiB","limit":"31.24GiB","perc":"1.51%"},"cpu":"0.41%"},{"container":"elastic_elasticsearch_1","pids":141,"memory":{"used":"2.784GiB","limit":"31.24GiB","perc":"8.91%"},"cpu":"281.59%"},{"container":"geoipupdate","pids":2,"memory":{"used":"3.215MiB","limit":"31.24GiB","perc":"0.01%"},"cpu":"0.00%"},{"container":"nzbhydra","pids":60,"memory":{"used":"330.5MiB","limit":"31.24GiB","perc":"1.03%"},"cpu":"0.15%"},{"container":"sonarr","pids":19,"memory":{"used":"579.4MiB","limit":"31.24GiB","perc":"1.81%"},"cpu":"0.12%"},{"container":"influxdb","pids":14,"memory":{"used":"49.53MiB","limit":"31.24GiB","perc":"0.15%"},"cpu":"0.01%"},{"container":"lidarr","pids":19,"memory":{"used":"226.5MiB","limit":"31.24GiB","perc":"0.71%"},"cpu":"1.24%"},{"container":"grafana","pids":14,"memory":{"used":"47.46MiB","limit":"31.24GiB","perc":"0.15%"},"cpu":"0.03%"},{"container":"bazarr","pids":35,"memory":{"used":"984.2MiB","limit":"31.24GiB","perc":"3.08%"},"cpu":"0.01%"},{"container":"ombi","pids":16,"memory":{"used":"44.15MiB","limit":"31.24GiB","perc":"0.14%"},"cpu":"0.03%"},{"container":"muximux","pids":14,"memory":{"used":"19.22MiB","limit":"31.24GiB","perc":"0.06%"},"cpu":"0.00%"},{"container":"radarr","pids":20,"memory":{"used":"167.2MiB","limit":"31.24GiB","perc":"0.52%"},"cpu":"1.21%"},{"container":"plex","pids":46,"memory":{"used":"4.534GiB","limit":"31.24GiB","perc":"14.51%"},"cpu":"1.12%"},{"container":"tdarr","pids":81,"memory":{"used":"705.2MiB","limit":"31.24GiB","perc":"2.20%"},"cpu":"1.26%"},{"container":"varken","pids":0,"memory":{"used":"0B","limit":"0B","perc":"0.00%"},"cpu":"0.00%"},{"container":"sabnzbdvpn","pids":42,"memory":{"used":"268MiB","limit":"31.24GiB","perc":"0.84%"},"cpu":"1.83%"}],"error":"0","errorString":""}

➜   sudo grep snmp /var/log/syslog
Apr 30 08:42:54 debiantower snmpd[28852]: Cannot statfs /run/docker/netns/fbae9d32f1d0: Permission denied
Apr 30 08:42:54 debiantower snmpd[28852]: Connection from UDP: [librenms]:51892->[debiantower]:161
Apr 30 08:45:01 debiantower CRON[30194]: (root) CMD (/etc/snmp/smart -u)
Apr 30 08:45:04 debiantower snmpd[28852]: error on subcontainer 'ia_addr' insert (-1)

Not really. Basically there are three different messages. I googled the first, and it is literally just spam in the log.
I have no idea what the last message means, but I see it a lot on both hosts that I want LibreNMS to poll. I googled it and it seems to be unrelated to LibreNMS.

Any other suggestions? :slight_smile:

/k

Got me stumped too I’m afraid. I had issues getting it going, but it was sudo and putting the Debian-snmp user in the docker group which got it working for me.

My LibreNMS server is the same host I’m polling for docker stats, but I’m not running LibreNMS in docker.

====================================
Component | Version
--------- | -------
LibreNMS  | 21.4.0-48-g193a102b4
DB Schema | 2021_04_08_151101_add_foreign_keys_to_port_group_port_table (208)
PHP       | 7.3.27-9+ubuntu18.04.1+deb.sury.org+1
Python    | 3.6.9
MySQL     | 10.5.9-MariaDB-1:10.5.9+maria~bionic
RRDTool   | 1.7.0
SNMP      | NET-SNMP 5.7.3
====================================

[OK]    Composer Version: 2.0.13
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database schema correct

Yeah, me too. For the record, I am starting to suspect it has something to do with Docker. I also have problems getting the other apps working on the same host, which fits the theory that SNMP traffic is having trouble getting through to LibreNMS via macvlan networking on the Docker host.

Is it possible to get docker-stats.sh working via check_mk? Or would it make sense to move as much as possible away from SNMP, since there seems to be a limit to how much data can be sent reliably, do you think?

I did notice the absence of a docker-stats.sh script in the directory of scripts that work with check_mk, but that doesn’t mean it can’t be done or that it’s particularly hard. Maybe there’s even a Nagios plugin that can do it. I’ll check it out.

Anyway, thanks for your help :slight_smile:

/k

Aha - now we’re starting to get somewhere. I noticed this error in the snmp output:

.1.3.6.1.4.1.8072.1.3.2.3.1.2.6.100.111.99.107.101.114 = STRING: jq: Bad JSON in --slurpfile stats /dev/fd/63: Invalid numeric literal at line 1, column 8
.1.3.6.1.4.1.8072.1.3.2.3.1.3.6.100.111.99.107.101.114 = INTEGER: 1
.1.3.6.1.4.1.8072.1.3.2.3.1.4.6.100.111.99.107.101.114 = INTEGER: 2
.1.3.6.1.4.1.8072.1.3.2.4.1.2.6.100.111.99.107.101.114.1 = STRING: jq: Bad JSON in --slurpfile stats /dev/fd/63: Invalid numeric literal at line 1, column 8

OIDs are the same as the ones you get the missing Docker stats from in your screenshot. So something goes wrong. I don’t understand what or how to fix it, though. When I run the docker-stats script manually, the output is similar to what you get in the snmp data.
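
For reading the walk above: the part of each OID before 6.100.111... selects a column of NET-SNMP-EXTEND-MIB. A rough lookup sketch, assuming the standard layout of that MIB; `extend_column` is a hypothetical helper and the descriptions in parentheses are informal:

```shell
# extend_column OID: name the NET-SNMP-EXTEND-MIB column an OID falls under.
extend_column() {
    case "$1" in
        .1.3.6.1.4.1.8072.1.3.2.3.1.1.*) echo "nsExtendOutput1Line (first line of output)" ;;
        .1.3.6.1.4.1.8072.1.3.2.3.1.2.*) echo "nsExtendOutputFull (complete output)" ;;
        .1.3.6.1.4.1.8072.1.3.2.3.1.3.*) echo "nsExtendOutNumLines (number of output lines)" ;;
        .1.3.6.1.4.1.8072.1.3.2.3.1.4.*) echo "nsExtendResult (exit status of the command)" ;;
        .1.3.6.1.4.1.8072.1.3.2.4.1.2.*) echo "nsExtendOutLine (one output line per row)" ;;
        *) echo "not an extend output column" ;;
    esac
}

extend_column .1.3.6.1.4.1.8072.1.3.2.3.1.4.6.100.111.99.107.101.114
```

Read this way, the INTEGER: 2 in the walk is the exit status of the extend command, and the jq message is simply whatever the command printed before it exited.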

Any idea?

/k

So to conclude something here: For some reason the docker-stats.sh script runs fine manually but when executed by snmpd (same user - Debian-snmp) it fails with the jq error I pasted above. I assume the error is generated in the output sent to LibreNMS (and not by LibreNMS itself).
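
When a script works from a shell but fails under a daemon with the same user, the environment is one common difference: a service usually starts with far fewer variables than an interactive session. A minimal sketch of how to approximate that with env -i, assuming /bin/sh exists; DEMO_VAR is a made-up variable, and whether the environment is what broke docker-stats.sh under snmpd is not established in this thread:

```shell
# A variable that exists in the interactive session...
DEMO_VAR=from-login-shell
export DEMO_VAR

# ...is visible to a normal child process,
/bin/sh -c 'echo "normal:   DEMO_VAR=${DEMO_VAR:-unset}"'

# ...but not to one started with an emptied environment.
env -i /bin/sh -c 'echo "stripped: DEMO_VAR=${DEMO_VAR:-unset}"'

# The same idea applied to the script from this thread (commented out because
# it needs sudo and Docker; path and user name are the ones used above):
# sudo -u Debian-snmp env -i /etc/snmp/docker-stats.sh
```

If the script fails under env -i in the same way as under snmpd, a missing variable such as PATH would be a reasonable next suspect.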

I have no idea if it makes a difference but this is the version of jq I have installed:
ii jq 1.5+dfsg-2+b1 amd64 lightweight and flexible command-line JSON processor

Can you decipher the error message? I don’t know if /dev/fd/63 refers to a device. If it does, it doesn’t exist on the host. And what line and column is referred to? Of what?
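
On the /dev/fd/63 question: that path is not a device on disk. It is how bash names the pipe it creates for process substitution, <(command), and the line and column in jq's message refer to positions in whatever text came out of that pipe, not in the script. That the helper script feeds Docker's output to jq this way is an assumption based on the --slurpfile in the error. A minimal sketch, assuming bash on Linux:

```shell
# Process substitution hands the command a file name that only exists while
# the command runs; on Linux it typically looks like /dev/fd/63.
bash -c 'echo <(true)'

# Reading from that name yields the inner command's output. If the inner
# command prints an error message instead of JSON, a JSON parser reading it
# would fail at some line and column of that message.
bash -c 'cat <(echo "not JSON: an error message")'
```

So "Invalid numeric literal at line 1, column 8" would point at the first line of whatever the inner command printed when run by snmpd, which is why comparing against a manual run can be misleading.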

/k

AAAlrighty then! Playing back and forth with options I now have something that almost works.
As it turns out I have a number of snmp extensions enabled. And occasionally that would also throw in a timeout. I have no idea what’s going on with the jq error. For some reason that went away by removing sudo from that line. That worked - if I disabled all others. Then I tried enabling them again one by one and ended up with this:

extend docker /etc/snmp/docker-stats.sh
extend certificate /etc/snmp/certificate.py
extend mdadm /etc/snmp/mdadm
#extend nvidia /etc/snmp/nvidia
extend osupdate /etc/snmp/osupdate
extend smart /usr/bin/sudo /etc/snmp/smart

If I enable nvidia then it times out. No idea why. Does anyone have any suggestions as to what else I can do? Can I fiddle with multithreading in net-snmp, for instance? Set up a timeout?
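
One way to narrow down which extend is slow is to time each script outside snmpd. The sketch below assumes a POSIX shell and a date that supports +%s, so it only has whole-second resolution; `elapsed_seconds` is a hypothetical helper, and the script paths are the ones from the extend lines above. Extends that need sudo or the Debian-snmp user would have to be run accordingly.

```shell
# elapsed_seconds CMD [ARGS...]: run a command, discard its output,
# and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Time each helper script that exists and is executable on this machine.
for script in /etc/snmp/docker-stats.sh /etc/snmp/certificate.py \
              /etc/snmp/mdadm /etc/snmp/nvidia /etc/snmp/osupdate /etc/snmp/smart; do
    [ -x "$script" ] || continue
    echo "$script: $(elapsed_seconds "$script")s"
done
```

A script that takes longer than the poller's SNMP timeout would be expected to show up as a timeout on the LibreNMS side.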

Thanks

/k

This is the final update for this. It worked for all extends once I raised the timeout for the SNMP probe to 20s, like this:


I hope this will benefit others at some point. Thanks for an awesome NMS!

/k
