Poller Dies after 24 hours

The5thBeatle · 13 January 2021 09:40

I have just discovered LibreNMS and am in the process of setting it up. I hope someone can help with some questions.

This is set up on Docker with the latest image.
I have 289 devices in the database

Everything runs fine for around 24 hours, but then I get the error message that polling has not completed in the past 5 minutes. It is as if the polling service just stops. If I restart LibreNMS it runs again for around 24 hours and then dies again.

I have been running tail -f /data/logs/librenms.log and can see polling taking place against each device, but when it dies no more polling entries appear in this file.

I have added the below to my config.php

# Polling testing

$config['service_poller_workers'] = 48; # Processes spawned for polling
$config['service_services_workers'] = 16; # Processes spawned for service polling
$config['service_discovery_workers'] = 16; # Processes spawned for discovery

// Optional settings
$config['service_poller_frequency'] = 300; # Seconds between polling attempts
$config['service_services_frequency'] = 300; # Seconds between service polling attempts
$config['service_discovery_frequency'] = 21600; # Seconds between discovery runs
$config['service_billing_frequency'] = 300; # Seconds between billing calculations
$config['service_billing_calculate_frequency'] = 60; # Billing interval
$config['service_poller_down_retry'] = 60; # Seconds between failed polling attempts
$config['service_loglevel'] = 'INFO'; # Must be one of 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'
$config['service_update_frequency'] = 86400; # Seconds between LibreNMS update checks

# Watchdog service. Restarts poller if it dies.

$config['service_watchdog_enabled'] = true;

From Poller Cluster Health (when it is working) I see:
Poller Workers 48
Devices Actioned 297
Worker Seconds 3436/14400

From Poller Cluster Health (when it dies) I see:
Poller Workers 48
Devices Actioned 0
Worker Seconds 0/14400
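
The second Worker Seconds figure looks like the total polling capacity per cycle, i.e. the number of poller workers multiplied by the polling frequency. That reading is an assumption based on the numbers in this post, not something confirmed in the thread. A quick check of the arithmetic with the values from the config above:

```shell
#!/bin/sh
# Values taken from the config.php and health output in this post.
workers=48      # service_poller_workers
frequency=300   # service_poller_frequency, in seconds
used=3436       # first "Worker Seconds" figure while polling works

# Assumed meaning: capacity = workers * seconds per polling cycle.
capacity=$((workers * frequency))
echo "capacity: $capacity worker seconds"

# Integer percentage of the capacity used in one cycle.
echo "utilisation: $((used * 100 / capacity))%"
```

If that reading is right, the poller was using well under a quarter of its capacity while healthy, so the stall does not look like a simple case of running out of workers.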

Am I correct in thinking that LibreNMS no longer uses cron? I have no cron image loaded via my docker-compose.yml.

Could someone please suggest how to go about troubleshooting this? I am struggling at the moment.

I have added the validate.php output below. I can see a couple of errors in the output and am also trying to fix those. I have checked my PATH based on a couple of suggestions in other docs, but that doesn’t seem to sort it.

/opt/librenms # ./validate.php
Do not run validate.php as root
/opt/librenms # su librenms
/opt/librenms $ ./validate.php

Component	Version
LibreNMS	1.70.1
DB Schema	2020_11_02_164331_add_powerstate_enum_to_vminfo (191)
PHP	7.3.25
Python	3.8.5
MySQL	10.4.17-MariaDB-1:10.4.17+maria~focal
RRDTool	1.7.2
SNMP	NET-SNMP 5.8
OpenSSL

====================================

[OK] Installed from the official Docker image; no Composer required
[OK] Database connection successful
[OK] Database schema correct
[WARN] IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN] Global lnms shortcut not installed. lnms command must be run with full path
[FIX]:
sudo ln -s /opt/librenms/lnms /usr/local/bin/lnms
[WARN] Log rotation not enabled, could cause disk space issues
[FIX]:
sudo cp /opt/librenms/misc/librenms.logrotate /etc/logrotate.d/librenms
[WARN] Updates are managed through the official Docker image

penguin02007 · 13 January 2021 19:44

@The5thBeatle We are running 1.50.1 with the Docker images and are having a similar problem. Did you figure out the issue?

The5thBeatle · 13 January 2021 23:32

Nothing yet I’m afraid, @penguin02007. The poller has ‘died’ again today with nothing in the logs (that I can see).

draga79 · 14 January 2021 07:34

Same problem here. I have 3 different LibreNMS Docker installations and they all have the same problem. After hours or days (I still can’t work out what triggers it), the poller dies. As a workaround I’ve set up a cron job on the host that restarts the dispatcher container every two hours, so if it dies it gets restarted.

The5thBeatle · 14 January 2021 14:09

Interesting that a few people are having the same issue. I think my next step will be to try running an older image, going back a few versions, to see if that changes anything. I am currently running ‘latest’.

draga79 · 14 January 2021 16:30

Same here, running latest. I started running LibreNMS Docker containers about two months ago and I’ve always had the same problem.

The5thBeatle · 15 January 2021 08:41

Out of interest, are you running a cron container in the later versions? I notice that the example docker-compose file in the Git repository has no entry for cron. Is this because polling and discovery have been moved to a service? The docs seem to suggest that.

draga79 · 15 January 2021 11:15

No, there’s no cron container running.

The5thBeatle · 15 January 2021 11:17

Understood, I am the same

penguin02007 · 8 February 2021 15:33

Sorry @The5thBeatle, I didn’t get a notification on this thread, and it happened again this weekend. The poller is managed by the dispatcher.

There aren’t any errors generated by librenms-dispatcher. I am going to set up a job to get notified and restart the container automatically.

root@lnms1:~# docker logs -n 10 librenms_dispatcher
Billing(INFO):Calculating billing
Billing(INFO):Completed billing run for calculate in 0.26s
Alerting(INFO):Checking alerts
Alerting(INFO):Completed alerting run for alerts in 0.28s
Billing(INFO):Calculating billing
Billing(INFO):Completed billing run for calculate in 0.26s
Alerting(INFO):Checking alerts
Alerting(INFO):Completed alerting run for alerts in 0.28s
Billing(INFO):Calculating billing
Billing(INFO):Completed billing run for calculate in 0.26s
root@lnms1:~# docker restart librenms_dispatcher
librenms_dispatcher
root@lnms1:~# docker logs -n 10 librenms_dispatcher
Poller_0-10(INFO):Polling device 19
Poller_0-11(INFO):Polling device 18
Poller_0-13(INFO):Polling device 20
Poller_0-12(INFO):Polling device 10
Poller_0-14(INFO):Polling device 8
Poller_0-15(INFO):Polling device 9
Poller_0-16(INFO):Polling device 22
Poller_0-18(INFO):Polling device 5
Poller_0-17(INFO):Polling device 12
Poller_0-19(INFO):Polling device 21

john_b · 23 February 2021 21:52

Hi

I also see this problem. I can’t find any direct error messages, and restarting the containers seems to help.

penguin02007 · 1 March 2021 14:20

Scheduling the following as a job should fix it:

docker logs -t -n 100 librenms_dispatcher 2>&1 | grep Poller || docker restart librenms_dispatcher

john_b · 29 March 2021 13:49

I can’t get that line to work; the -t flag doesn’t exist in my installation. This is what I came up with:
/usr/bin/docker logs --tail 100 librenms_dispatcher 2>&1 | grep Poller || docker restart librenms_dispatcher

But when the container stops working, there are still log entries at the end of the log that contain “Poller”, so it never triggers.
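
One possible way around this is to check only recent log entries instead of the last 100 lines, using the --since option of docker logs, so that old “Poller” lines no longer count. The following is only a sketch. It assumes the container is named librenms_dispatcher (as in the output earlier in this thread), that service_loglevel is INFO so that “Polling device” lines are logged, and that a 15-minute window (three 300-second polling cycles) is a sensible threshold. The function names are made up for illustration.

```shell
#!/bin/sh
# Watchdog sketch: restart the dispatcher if it logged no polling recently.
# CONTAINER and WINDOW are assumptions; adjust them to your installation.
CONTAINER="${CONTAINER:-librenms_dispatcher}"
WINDOW="${WINDOW:-15m}"

# Succeeds if stdin contains at least one poller line such as
# "Poller_0-10(INFO):Polling device 19".
poller_active() {
    grep -q 'Poller_[0-9-]*(INFO):Polling device'
}

# Look only at log entries newer than WINDOW; restart if none are poller lines.
check_dispatcher() {
    if docker logs --since "$WINDOW" "$CONTAINER" 2>&1 | poller_active; then
        echo "poller active"
    else
        echo "no poller activity in the last $WINDOW, restarting $CONTAINER"
        docker restart "$CONTAINER"
    fi
}
```

To use it, add a call to check_dispatcher at the end of the script and run the script from cron on the host every few minutes. Billing and alerting lines keep appearing after the poller stalls, which is why the pattern matches only the poller lines.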

system · 29 March 2023 13:50

This topic was automatically closed 730 days after the last reply. New replies are no longer allowed.
