Poller Dies after 24 hours

I have just discovered LibreNMS and am in the process of setting it up. I hope someone can help with a few questions.

It is set up on Docker with the latest image, and I have 289 devices in the database.

Everything runs fine for around 24 hours, but then I get the error message that polling has not completed in the past 5 minutes. It is as if the polling service just stops. If I restart LibreNMS it runs again for around 24 hours and then dies again.

I have been running tail -f /data/logs/librenms.log and can see polling taking place against each device, but when it dies no more polling entries appear in this file.
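Rather than watching tail -f by hand, the moment polling stops could be caught by checking how long ago the log file was last written. This is only a rough sketch: it assumes the log's modification time is a fair proxy for poller activity, and the 600-second threshold (two missed five-minute polls) is an arbitrary choice, not a LibreNMS setting.

```python
import os
import time

LOG_PATH = "/data/logs/librenms.log"  # path taken from the post above
STALE_AFTER = 600  # assumption: two missed five-minute poll cycles


def seconds_since_modified(path, now=None):
    """Return how many seconds ago the file at `path` was last written."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(age_seconds, threshold=STALE_AFTER):
    """True when the log has been quiet for longer than `threshold` seconds."""
    return age_seconds > threshold


if __name__ == "__main__":
    if os.path.exists(LOG_PATH):
        age = seconds_since_modified(LOG_PATH)
        state = "STALE" if is_stale(age) else "ok"
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} log age {age:.0f}s {state}")
    else:
        print(f"{LOG_PATH} not found")
```

Run every minute or so with its output appended to a file, this would record roughly when the log went quiet, which can then be compared against container and host logs for the same time.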

I have added the following to my config.php:

# Polling testing

$config['service_poller_workers'] = 48; # Processes spawned for polling
$config['service_services_workers'] = 16; # Processes spawned for service polling
$config['service_discovery_workers'] = 16; # Processes spawned for discovery

//Optional Settings
$config['service_poller_frequency'] = 300; # Seconds between polling attempts
$config['service_services_frequency'] = 300; # Seconds between service polling attempts
$config['service_discovery_frequency'] = 21600; # Seconds between discovery runs
$config['service_billing_frequency'] = 300; # Seconds between billing calculations
$config['service_billing_calculate_frequency'] = 60; # Billing interval
$config['service_poller_down_retry'] = 60; # Seconds between failed polling attempts
$config['service_loglevel'] = 'INFO'; # Must be one of 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'
$config['service_update_frequency'] = 86400; # Seconds between LibreNMS update checks

# Watchdog service. Restarts poller if it dies.

$config['service_watchdog_enabled'] = true;

From Poller Cluster Health (when it is working) I see:
Poller Workers 48
Devices Actioned 297
Worker Seconds 3436/14400

From Poller Cluster Health (when it dies) I see:
Poller Workers 48
Devices Actioned 0
Worker Seconds 0/14400
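For anyone reading the Worker Seconds figures: the 14400 ceiling is consistent with the 48 poller workers multiplied by the 300-second polling frequency from the config above. A small sketch of that arithmetic, assuming that is how the ceiling is derived (the function names are only illustrative, not part of LibreNMS):

```python
def worker_seconds_budget(workers, frequency):
    """Worker-seconds available in one polling interval."""
    return workers * frequency


def utilisation(used, budget):
    """Fraction of the available worker-seconds actually consumed."""
    return used / budget


budget = worker_seconds_budget(48, 300)
print(budget)  # 14400
print(f"{utilisation(3436, budget):.1%}")  # 23.9%
print(f"{utilisation(0, budget):.1%}")  # 0.0%
```

On that reading, the healthy state uses under a quarter of the available capacity, so the workers do not look overloaded before the poller stops. The 0/14400 reading would then mean no polling work is being done at all, rather than polls running slowly.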

Am I correct in thinking that LibreNMS no longer uses cron? I have no cron image loaded via my docker-compose.yml.

Could someone please suggest how to go about troubleshooting this? I am struggling at the moment.

I have added the validate.php output below. I can see a couple of errors in the output and am trying to fix those as well. I have checked my PATH based on a couple of suggestions in other docs, but that doesn’t seem to sort it.

/opt/librenms # ./validate.php
Do not run validate.php as root
/opt/librenms # su librenms
/opt/librenms $ ./validate.php

Component Version
LibreNMS 1.70.1
DB Schema 2020_11_02_164331_add_powerstate_enum_to_vminfo (191)
PHP 7.3.25
Python 3.8.5
MySQL 10.4.17-MariaDB-1:10.4.17+maria~focal
RRDTool 1.7.2
SNMP NET-SNMP 5.8
OpenSSL

====================================

[OK] Installed from the official Docker image; no Composer required
[OK] Database connection successful
[OK] Database schema correct
[WARN] IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN] Global lnms shortcut not installed. lnms command must be run with full path
[FIX]:
sudo ln -s /opt/librenms/lnms /usr/local/bin/lnms
[WARN] Log rotation not enabled, could cause disk space issues
[FIX]:
sudo cp /opt/librenms/misc/librenms.logrotate /etc/logrotate.d/librenms
[WARN] Updates are managed through the official Docker image

@The5thBeatle We are running 1.50.1 with the Docker images and having a similar problem. Did you figure out the issue?

Nothing yet, I’m afraid, @penguin02007. The poller has ‘died’ again today with nothing in the logs (that I can see).

Same problem here. I have 3 different LibreNMS Docker installations and they all have the same problem. After hours or days (I still can’t work out what triggers it), the poller dies. As a workaround I’ve set up a cron job on the host that restarts the dispatcher container every two hours, so if it dies it gets restarted.
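For anyone wanting to copy the workaround, here is a minimal sketch of a script that host cron job could call. It simply shells out to `docker restart`. The container name is passed as an argument because it depends on your compose project, so use whatever `docker ps` shows for the dispatcher. The script name and function names are hypothetical.

```python
import subprocess
import sys


def restart_command(container):
    """Build the docker CLI call that restarts one container."""
    return ["docker", "restart", container]


def restart(container, run=subprocess.run):
    """Restart the container; return True if docker exited with status 0."""
    result = run(restart_command(container), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # The container name comes from the command line, nothing is hard-coded.
        print("restarted" if restart(sys.argv[1]) else "restart failed")
    else:
        print("usage: restart_dispatcher.py CONTAINER_NAME")
```

This only hides the symptom. It does not explain why the poller stops, so it is worth noting the times of any restarts that were actually needed.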

Interesting that a few people are having the same issue. I think my next step will be to try an older image, going back a few versions, to see if that makes any difference. I am currently running ‘latest’.

Same here, running latest. I started running LibreNMS Docker containers about two months ago and have always had the same problem.

Out of interest, are you running a cron container in the later versions? I notice that the example docker-compose file on Git has no entry for cron. Is this because polling / discovery has been moved to a service? The docs seem to suggest that.

No, there’s no cron container running.

Understood, I am the same.