After setting up distributed pollers, 16 hosts are marked as down, and 16 are fluctuating between up and down due to ping responses

I encountered issues after configuring distributed pollers. Although all 4 pollers using Redis are operational, I’m not receiving alerts. The event log is functional, but the alerts remain empty.

Furthermore, there are persisting problems with 16 hosts not coming online, while another 16 hosts are experiencing fluctuating connectivity. Ping responses are received from all the different pollers.

Sounds like some type of connectivity issue between some of the pollers and some of the devices?

Perhaps take one that is working intermittently and run an ICMP and SNMP test from each of the pollers.

We had something similar where an ACL did not match a couple of our pollers and when those pollers were used for polling we would get alarms.
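If it helps, here is a rough sketch of how that per-poller test could be scripted so the results line up side by side. It assumes you can ssh from one machine to every poller and that fping and snmpget are installed there. The poller names, the device address, and the SNMP v2c community are placeholders, not values from this thread, and the command is passed to ssh unquoted, so a community string with shell metacharacters would need quoting.

```python
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0, a cheap OID to read


def build_checks(device, community):
    """Return the ICMP and SNMP command lines to run on a poller."""
    return {
        "icmp": ["fping", "-c", "3", "-q", device],
        "snmp": ["snmpget", "-v2c", "-c", community, device, SYS_UPTIME_OID],
    }


def run_on_poller(poller, command, timeout=30):
    """Run a command on a poller over ssh; True if it exited with status 0."""
    try:
        result = subprocess.run(
            ["ssh", poller, *command], capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_device(device, pollers, community, runner=run_on_poller):
    """Run both checks from every poller and collect pass/fail per check."""
    checks = build_checks(device, community)
    return {
        poller: {name: runner(poller, cmd) for name, cmd in checks.items()}
        for poller in pollers
    }


def failing_pollers(results):
    """Return the pollers where at least one check failed."""
    return sorted(p for p, checks in results.items() if not all(checks.values()))


if __name__ == "__main__":
    # Hypothetical names; substitute your own pollers, device and community.
    results = check_device("192.0.2.10", ["poller-01", "poller-02"], "public")
    print(results, failing_pollers(results))
```

If only some pollers show up in the failing list for a flapping device, that points at something path-specific such as an ACL or firewall rule rather than at the device itself.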

Do you see in the event logs that the pollers are attempting to raise Alerts?

I’ve somehow managed to establish contact with all the devices, but now I’m encountering a continuous pattern of connectivity going up and down. Strangely, this issue is consistent across 16 devices that are experiencing this flapping behavior.

I’ve also noticed that when attempting to trigger alerts, errors are being generated. After some time, I located these errors in the log files.

Alerting(ERROR):Alerting poller exception! __init__() missing 2 required positional arguments: >
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: Traceback (most recent call last):
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: File "/opt/librenms/LibreNMS/queuemanager.py", line 85, in _service_worker
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: self.do_work(device_id, queue_id)
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: File "/opt/librenms/LibreNMS/queuemanager.py", line 488, in do_work
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: raise CalledProcessError
Aug 08 16:14:02 onn-librenms-01 librenms-service.py[877977]: TypeError: __init__() missing 2 required positional arguments: 'returncode' and 'cmd'
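As far as I can tell, the TypeError itself is a secondary failure. The traceback shows do_work raising CalledProcessError without arguments, and Python's subprocess.CalledProcessError requires a return code and a command, so the raise itself fails and hides whatever went wrong underneath. The snippet below only reproduces that Python behaviour in isolation; the function names are made up for illustration and are not LibreNMS code.

```python
from subprocess import CalledProcessError


def raise_bare():
    # Raising the class itself makes Python call CalledProcessError()
    # with no arguments, which fails before the exception even exists.
    raise CalledProcessError


def raise_with_details(returncode, cmd):
    # Supplying both required arguments produces the intended exception.
    raise CalledProcessError(returncode, cmd)


def describe(func, *args):
    """Call func and report which exception actually reached the caller."""
    try:
        func(*args)
    except CalledProcessError as exc:
        return f"CalledProcessError: exit {exc.returncode} from {exc.cmd}"
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(describe(raise_bare))
    print(describe(raise_with_details, 1, "alerts.php"))
```

So the log most likely means that the alerting command exited with an error and the real message was lost. Running the alerting script by hand as the librenms user is probably the quickest way to see it.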

Did you come across the same problem while attempting to set up alerts? I’m new to LibreNMS, so I’m having difficulty working out how to resolve the alerting problem.

Perhaps on the worker nodes do a
./validate.php
and also try running
./alerts.php
(as the librenms user)

I assume you are running the dispatcher service method of distributed polling?

Each poller’s .env file should have a unique NODE_ID, and each node’s config.php should have

$config['distributed_poller_name'] = php_uname('n'); # Uniquely identifies the poller instance

It might also pay to check fping is installed and working on each node, although I think the validate.php does that …
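A quick way to confirm the NODE_ID part is to collect the .env contents from every node and look for repeats. This is only a sketch: it assumes plain KEY=value lines, and how you fetch each file (scp, ssh cat, copy and paste) is up to you. The host names in the example are placeholders.

```python
from collections import defaultdict


def parse_node_id(env_text):
    """Return the NODE_ID value from .env text, or None if it is not set."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not KEY=value
        key, _, value = line.partition("=")
        if key.strip() == "NODE_ID":
            return value.strip().strip("\"'")
    return None


def duplicate_node_ids(env_by_host):
    """Map each NODE_ID used by more than one host to those hosts.

    Hosts with no NODE_ID at all are grouped under None, so two or more
    hosts that lack the setting are reported as well.
    """
    hosts_by_id = defaultdict(list)
    for host, text in env_by_host.items():
        hosts_by_id[parse_node_id(text)].append(host)
    return {
        node_id: sorted(hosts)
        for node_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Placeholder hosts and values for illustration only.
    envs = {
        "poller-01": "NODE_ID=aaa111\n",
        "poller-02": "NODE_ID=aaa111\n",
        "poller-03": "NODE_ID=ccc333\n",
    }
    print(duplicate_node_ids(envs))
```

An empty result means every node has its own NODE_ID. Anything else lists the hosts that share one, which is typically what happens when a .env file is copied from one poller to the next.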

The validation returns “OK” for all aspects across all pollers and the server. In the case of alerts.php, there are no errors on three out of four pollers. However, on poller 3, the following output is displayed:

librenms@onn-librepoller-03:~$ ./alerts.php
Start: Fri, 11 Aug 2023 11:09:20 +0200
ClearStaleAlerts():
RunFollowUp():
RunAlerts():
Issuing Alert-UID #29595/4:
No configured transports
Issuing Alert-UID #29572/3:
No configured transports

When I execute the dispatcher service, I observe the same configuration in all the poller environment files. Upon examining the status of the “librenms-server.service” using “systemctl status,” it’s evident that it also utilizes “fping.”
