Distributed Poller Help!

Hi,

We’ve been running LibreNMS for a looooong time now and I’ve finally gotten it together to start looking at distributed polling. I think I’m about 90% of the way there.

I’ve two servers. The first runs the web UI, MySQL, memcached, rrdcached and redis as well as polling; the second is configured to connect to memcached, rrdcached, redis and MySQL on the first.
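Before digging into the poller page, it can help to confirm that the second server can actually reach each of those services over TCP. The sketch below is a minimal connectivity check, not a LibreNMS tool. The host name im-nms-01 is taken from later in this thread, and the port numbers are the usual defaults for each service (MySQL 3306, memcached 11211, rrdcached 42217, redis 6379). Both are assumptions to adjust to your own setup.

```python
from __future__ import annotations

import socket

# Hypothetical: the first server (web UI, MySQL, memcached, rrdcached, redis).
CENTRAL_HOST = "im-nms-01"

# Common default TCP ports (assumed); change them if your services listen elsewhere.
SERVICES = {
    "MySQL": 3306,
    "memcached": 11211,
    "rrdcached": 42217,
    "redis": 6379,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_services(host: str, services: dict[str, int]) -> dict[str, bool]:
    """Map each service name to whether its port is reachable from this machine."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(CENTRAL_HOST, SERVICES).items():
        print(f"{name:10} {'reachable' if ok else 'UNREACHABLE'}")
```

Run it on lnms-poller-01. A reachable port only shows that the TCP connection works; it says nothing about credentials or permissions.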

It seems to be working - devices are being polled, RRDs are updating. What has me a little confused is that my poller page looks like this:

Sometimes lnms-poller-01 is red in this list, sometimes it’s green (it’s approximately 23:22 where I am, by the way). It never appears in the ‘Poller Cluster Health’ list though, and I’m pretty sure it’s supposed to.

Does anyone have any pointers for what I might have missed?

Thanks

Dermot

Do you have distributed polling set to true on both nodes, and a unique node-id in each node’s .env?
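A quick way to rule out a duplicated node ID is to compare the .env files side by side. The sketch below assumes the key is spelled NODE_ID (as in a stock LibreNMS .env, to my knowledge) and uses a deliberately simple KEY=value parser; the host names and ID values in the example are hypothetical. It only checks node IDs, not the distributed polling setting itself.

```python
from __future__ import annotations

from collections import defaultdict


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def duplicate_node_ids(env_files: dict[str, str], key: str = "NODE_ID") -> dict[str, list[str]]:
    """Return {node_id: [hosts]} for every ID that appears on more than one host."""
    hosts_by_id: dict[str, list[str]] = defaultdict(list)
    for host, text in env_files.items():
        node_id = parse_env(text).get(key)
        if node_id:
            hosts_by_id[node_id].append(host)
    return {nid: hosts for nid, hosts in hosts_by_id.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical .env contents copied from each node.
    envs = {
        "im-nms-01": "NODE_ID=aaa111\nDB_HOST=localhost\n",
        "lnms-poller-01": "NODE_ID=aaa111\nDB_HOST=im-nms-01\n",
    }
    print(duplicate_node_ids(envs))  # {'aaa111': ['im-nms-01', 'lnms-poller-01']}
```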

This means you are running the dispatcher service (librenms-service.py) on one node and the Python wrapper (poller-wrapper.py) on the other. You shouldn’t mix them; doing so will certainly cause double polling.

Yep - distributed polling is set to true on both nodes, and both nodes have a unique node-id in the .env file.

@murrant - so I need to run poller-wrapper.py on lnms-poller-01 and librenms-service.py on im-nms-01 (poller and web UI/redis/memcached/etc. host respectively)?

Edit: never mind - I understand what you mean now! Run either librenms-service.py on both pollers or poller-wrapper.py on both pollers, never a mix of the two.
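The rule above (one polling method across all pollers) is simple enough to encode as a sanity check. This is a hypothetical helper, not part of LibreNMS: you fill in which method each node runs yourself, and it flags a mixed setup.

```python
from __future__ import annotations

# The two mutually exclusive ways of driving polling discussed in this thread.
DISPATCHER = "librenms-service.py"
WRAPPER = "poller-wrapper.py"


def mixed_poller_modes(node_modes: dict[str, str]) -> str | None:
    """Return a warning if nodes use different polling methods, else None."""
    unknown = {n: m for n, m in node_modes.items() if m not in (DISPATCHER, WRAPPER)}
    if unknown:
        raise ValueError(f"unrecognised polling method(s): {unknown}")
    if len(set(node_modes.values())) > 1:
        detail = ", ".join(f"{n}={m}" for n, m in sorted(node_modes.items()))
        return f"mixed polling methods ({detail}); devices may be polled twice"
    return None


if __name__ == "__main__":
    # Hypothetical inventory, filled in by hand from what each node actually runs.
    print(mixed_poller_modes({"im-nms-01": DISPATCHER, "lnms-poller-01": WRAPPER}))
```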

1 Like

Hmmm… For some reason neither poller is the master, even though both point at the same redis server. I’ll need to investigate further, although nothing appears to be broken as a result…
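For context on why "neither is master" is surprising: a common way to elect a single master among nodes sharing one store is a lock with an expiry that only one node can hold at a time (in redis terms, SET with the NX and EX options). The sketch below illustrates that general pattern with an in-memory stand-in. It is an assumption that the dispatcher service elects its master in a similar way, and the lock name and node names are hypothetical. With a shared, writable store, exactly one node should hold the lock at any moment.

```python
from __future__ import annotations

import time


class InMemoryLock:
    """Hypothetical stand-in for a shared store offering set-if-absent with expiry."""

    def __init__(self) -> None:
        # lock name -> (owner, expiry time)
        self._owners: dict[str, tuple[str, float]] = {}

    def acquire(self, name: str, owner: str, ttl: float, now: float | None = None) -> bool:
        """Take the lock if it is free or expired, or renew it if we already hold it."""
        if now is None:
            now = time.monotonic()
        current = self._owners.get(name)
        if current is None or current[1] <= now or current[0] == owner:
            self._owners[name] = (owner, now + ttl)
            return True
        return False


if __name__ == "__main__":
    lock = InMemoryLock()
    nodes = ["im-nms-01", "lnms-poller-01"]
    # Both nodes try at the same moment; only the first to write wins.
    print([n for n in nodes if lock.acquire("master", n, ttl=30, now=0.0)])  # ['im-nms-01']
    # If the master stops renewing, the other node can take over once the TTL passes.
    print(lock.acquire("master", "lnms-poller-01", ttl=30, now=31.0))  # True
```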

OK, I am seeing some issues. I have both of my pollers registered and they appear to be working. 1144 of my devices are polling correctly. 5 aren’t.

The 5 that aren’t are a specific type of device that only allows a limited number of remote hosts to poll it via SNMP. I’ve added these devices to a poller group and am trying to pin that group to the poller that is on the devices’ list of allowed remote hosts.
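Pinning works by matching each device’s poller group against the list of groups a poller is assigned (a comma-separated list such as 0,1, as used later in this thread). The sketch below models that matching only; the Device class, the helper names and the example host names are hypothetical, and it assumes group 0 is the default group. The intended layout is that the allowed poller gets 0,1 while the other poller keeps only 0, so the restricted devices are never handed to it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    hostname: str
    poller_group: int  # assumed: 0 is the default group


def parse_groups(spec: str) -> set[int]:
    """Turn a comma-separated group list such as '0,1' into {0, 1}."""
    return {int(part) for part in spec.split(",") if part.strip()}


def devices_for_poller(devices: list[Device], group_spec: str) -> list[Device]:
    """Keep only devices whose poller group is assigned to this poller."""
    groups = parse_groups(group_spec)
    return [d for d in devices if d.poller_group in groups]


if __name__ == "__main__":
    devices = [Device("switch-a", 0), Device("restricted-1", 1)]
    print([d.hostname for d in devices_for_poller(devices, "0")])    # ['switch-a']
    print([d.hostname for d in devices_for_poller(devices, "0,1")])  # ['switch-a', 'restricted-1']
```

If the allowed poller’s group list silently reverts to 0, the group 1 devices simply stop being assigned to it, which matches the symptom described next.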

If I do that and then go to the Poller list, I can see that my poller is set to poll both groups:

When I check again a while later, it’s back to this:

Is there any way to troubleshoot what’s going on here?

librenms@lnms-poller-01:~$ ./validate.php -g distributedpoller
====================================
Component | Version
--------- | -------
LibreNMS  | 1.65-19-ge5bb6d80b
DB Schema | 2020_06_23_00522_alter_availability_perc_column (170)
PHP       | 7.4.3
Python    | 3.8.2
MySQL     | 10.3.22-MariaDB-1ubuntu1
RRDTool   | 1.7.2
SNMP      | NET-SNMP 5.8
====================================

[OK]    Composer Version: 1.10.8
[OK]    Dependencies up-to-date.
Checking distributedpoller: OK
[OK]    Connection to memcached is ok

librenms@im-nms:~$ ./validate.php -g distributedpoller
====================================
Component | Version
--------- | -------
LibreNMS  | 1.65-22-g42971a55d
DB Schema | 2020_06_23_00522_alter_availability_perc_column (170)
PHP       | 7.4.3
Python    | 3.8.2
MySQL     | 10.3.22-MariaDB-1ubuntu1
RRDTool   | 1.7.2
SNMP      | NET-SNMP 5.8
====================================

[OK]    Composer Version: 1.10.8
[OK]    Dependencies up-to-date.
Checking distributedpoller: OK
[OK]    Connection to memcached is ok

Thanks again!

Dermot

Hmm, I wonder if the maintenance restart isn’t applying the settings correctly or something.
Try setting poller groups in config.php on the specific poller for now.

Setting the groups to 0,1 through the web UI and then immediately restarting librenms-service.py seems to do the trick.