Hi @jihaddaouk, this server has many devices, something like 200. I imagined that after starting the job again it might take a while to catch up and then settle, but that doesn’t seem to be the case.
I can add devices, but would appreciate any thoughts on why this is now choppy. It was not choppy before I resized the HDD.
I also wonder why the rrdcached docker container isn’t working. From what I’m seeing, it should replace the need for a cron job.
Hey @jihaddaouk, thank you again. Here’s what I am looking at for rrdcached:
user@srv1:~$ docker service ls
Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
user@srv1:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cfd55543a8e1 adolfintel/speedtest "docker-php-entrypoi…" 15 months ago Up 2 days 0.0.0.0:80->80/tcp, :::80->80/tcp librespeed
db210a218757 librenms/librenms:latest "/init" 16 months ago Up 21 hours 514/tcp, 514/udp, 0.0.0.0:8000->8000/tcp, :::8000->8000/tcp nms_librenms
0b11c6a03748 balabit/syslog-ng "/usr/sbin/syslog-ng…" 2 years ago Up 4 weeks (healthy) 601/tcp, 6514/tcp, 0.0.0.0:50514->514/udp, :::50514->514/udp nms_syslog
455bff71219e grafana/grafana:7.3.3 "/run.sh" 2 years ago Up 4 weeks 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp nms_grafana
6452321d9a41 telegraf "/entrypoint.sh tele…" 2 years ago Up 4 weeks 8092/udp, 8125/udp, 8094/tcp nms_telegraf
9316ec8448be influxdb "/entrypoint.sh infl…" 2 years ago Up 4 weeks 8086/tcp nms_influxdb
36a8765bd68d oxidized/oxidized:latest "/sbin/my_init" 2 years ago Up 13 days 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp nms_oxidized
59f7d62c9ab9 librenms/librenms:latest "/init" 2 years ago Up 4 weeks 514/tcp, 8000/tcp, 514/udp nms_dispatcher
68b78f335430 andyshinn/dnsmasq:2.75 "dnsmasq -k" 2 years ago Up 7 days 53/tcp, 53/udp nms_dns
b956a4d03546 mariadb:10.2 "docker-entrypoint.s…" 2 years ago Up 4 weeks 3306/tcp nms_db
3e4915e76a01 redis:5.0-alpine "docker-entrypoint.s…" 2 years ago Up 4 weeks 6379/tcp nms_redis
c6cf6f102294 memcached:alpine "docker-entrypoint.s…" 2 years ago Up 4 weeks 11211/tcp nms_memcached
3e1bf3250c15 crazymax/rrdcached "/init" 2 years ago Up 39 hours (healthy) 42217/tcp nms_rrdcached
user@srv1:~$ systemctl status rrdcached.service
Unit rrdcached.service could not be found.
user@srv1:~$ docker exec -it nms_rrdcached bash
bash-5.0# rrdcached start
rrdcached: can't create pid file '/usr/var/run/rrdcached.pid' (File exists)
FATAL: Another rrdcached daemon is running?? (pid 89176)
rrdcached: daemonize failed, exiting.
bash-5.0# systemctl status rrdcached
bash: systemctl: command not found
bash-5.0# exit
service rrdcached status
or
ps -ef | grep rrdcached
And since you don’t have systemd installed in your container image, you should be able to use service rrdcached restart|stop|start.
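For anyone following along, here is a minimal sketch of checking the daemon from the Docker host instead of starting a second copy inside the container. The container name nms_rrdcached comes from the docker ps output above. The function names are made up for illustration, and the sketch assumes ps exists in the image and that the container defines a health check (the "(healthy)" status above suggests it does).

```shell
#!/bin/sh
# Hypothetical helpers; the container name is passed as the first argument.

# Succeeds if a process mentioning rrdcached is listed inside the container.
rrdcached_running() {
    docker exec "$1" ps -ef | grep -q 'rrdcached'
}

# Prints Docker's health status (for example "healthy") for the container.
container_health() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}
```

Something like rrdcached_running nms_rrdcached && echo "rrdcached is up" would then report on the daemon, and docker logs --tail 50 nms_rrdcached shows what the container itself has been logging.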
I’m somewhat there now, and here’s the fix I undertook to recover this. I still cannot understand the relationship between rrdcached and LibreNMS when using containers, why rrdcached didn’t kick in, or why graphs started to look choppy when nothing in the environment changed other than the HDD size. Still, I’ve learnt a lot.
So running the cron job as listed by jihaddaouk is useful, although it should not be necessary with the containers, as observed on my other working servers. Use crontab -l inside the container to check what is installed. Loading the cron file does seem to help get the processes running. The command, once in the container, is crontab librenms.cron (with the root user removed from the file, per jihaddaouk’s advice earlier in the thread). When running librenms.nonroot.cron I did not see the same results, so I have chosen not to use that cron file.
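A quick way to confirm that the crontab actually loaded is to count its active lines. This is only a sketch: count_cron_lines is a made-up helper, and it counts every line that is neither blank nor a comment, so environment lines such as PATH=... are counted too.

```shell
#!/bin/sh
# Hypothetical helper: reads a crontab on stdin and prints the number of
# lines that are neither blank nor comments.
count_cron_lines() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$' || true
}
```

Inside the container, crontab -l | count_cron_lines should print a number greater than 0 once librenms.cron has been loaded. From the host, the same check can be run by piping docker exec with the name of your LibreNMS container and crontab -l into the function.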
Now, something else: while clicking around the GUI I checked the pollers and, compared to my working servers, noticed a lot of 0s in the workers seconds column. To fix this I increased the number of workers. To do this from your LibreNMS GUI:
Hover the cog in the top right
Select Poller > Poller (observe your consumed workers in seconds)
Select Settings in the top headings of the page
Expand to Advanced (top right)
Grow your workers for Pollers, Discovery and Workers.
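The same settings can probably be changed from the command line as well. The sketch below only prints the commands so they can be reviewed before running. It assumes the lnms tool is available inside the LibreNMS container and that the dispatcher keys service_poller_workers, service_discovery_workers and service_services_workers exist in your version; neither was verified on this image. print_worker_cmds is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the lnms commands that set the
# dispatcher worker counts. The config key names are assumptions to verify
# against the LibreNMS dispatcher service documentation for your version.
# Usage: print_worker_cmds CONTAINER POLLERS DISCOVERY SERVICES
print_worker_cmds() {
    echo "docker exec $1 lnms config:set service_poller_workers $2"
    echo "docker exec $1 lnms config:set service_discovery_workers $3"
    echo "docker exec $1 lnms config:set service_services_workers $4"
}
```

For example, print_worker_cmds nms_dispatcher 24 16 8 prints three commands for the dispatcher container listed above. The numbers are placeholders, not recommendations.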
I turned off Billing as it is not required for my deployment (although this still seems to run…)
With the cron job running and increased workers my graphs did seem to populate.
I still seem to have an issue where graphs stop, but mostly it is better. When they stop, I can restart the container and, if needed, start the cron job, although starting the cron job is the last resort.
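To spot when graphs have stopped without watching the GUI, one option is to look for RRD files that have not been written recently. This is a sketch only: stale_rrds is a made-up helper, and the directory holding the .rrd files depends on how the volumes are mounted, so it is passed in rather than assumed. It relies on find supporting -mmin, which GNU and BusyBox find both do.

```shell
#!/bin/sh
# Hypothetical helper: lists .rrd files under DIR whose last modification
# is more than MINUTES minutes ago.
# Usage: stale_rrds DIR MINUTES
stale_rrds() {
    find "$1" -type f -name '*.rrd' -mmin +"$2"
}
```

Because rrdcached batches updates before flushing them to disk, files can legitimately be older than one polling interval, so a generous threshold (several intervals) avoids false alarms. If the list keeps growing, that is the point to restart the container.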
I might have to move away from LibreNMS as this was too tough to troubleshoot, but it was a good exercise nonetheless.
I believe rrdcached is used to improve performance when you have a huge number of devices. I have read somewhere in this community that LibreNMS can monitor 20k devices.