Hi All,
I have recently set up distributed polling. I have one host which runs LibreNMS, the web server and MySQL:
root@cti02:/opt/librenms# ./validate.php -g distributedpoller
Component | Version
--------- | -------
LibreNMS  | 1.54-16-g01a519ef2
DB Schema | 2019_07_03_132417_create_mpls_saps_table (139)
PHP       | 7.2.19-0ubuntu0.18.04.1
MySQL     | 10.1.40-MariaDB-0ubuntu0.18.04.1
RRDTool   | 1.7.0
SNMP      | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 1.9.0
[OK] Dependencies up-to-date.
Checking distributedpoller: OK
[OK] Connection to memcached is ok
The new host runs only LibreNMS, as a distributed poller:
root@cti03:/opt/librenms# ./validate.php -g distributedpoller
Component | Version
--------- | -------
LibreNMS  | 1.54-17-gc18ba96f0
DB Schema | 2019_07_03_132417_create_mpls_saps_table (139)
PHP       | 7.2.19-0ubuntu0.18.04.1
MySQL     | 10.1.40-MariaDB-0ubuntu0.18.04.1
RRDTool   | 1.7.0
SNMP      | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 1.9.0
[OK] Dependencies up-to-date.
Checking distributedpoller: OK
[OK] Connection to memcached is ok
I have approximately 600 devices. I expected the two pollers to split them and poll about 300 each, but each poller seems to be polling all of the devices.
Is this correct?
Do I have to split the devices into defined Groups to get each poller to poll half?
I was hoping this was taken care of by the default poller group 0?
Dunc
The documentation seems to suggest that the devices should be split amongst the pollers.
Another benefit to this is that you can provide N+x pollers, i.e. if you know that you require three pollers to process all devices within 300 seconds, then adding a 4th poller will mean that should any one single poller fail, the remaining three will complete polling in time. You could also use this to take a poller out of service for maintenance, e.g. OS updates and software updates.
https://docs.librenms.org/Extensions/Distributed-Poller/#pollers
Does this have to be set somehow?
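For what it's worth, my understanding of how the split is meant to work: every poller in a group walks the same device list, and they coordinate through the memcached server set in `distributed_poller_memcached_host`. A poller claims a device before polling it, and the others skip devices that are already claimed. The toy model below only illustrates that idea. It is not LibreNMS code, `SharedLocks` is an in-memory stand-in for memcached's add-if-absent behaviour, and the lock key format is a made-up placeholder.

```python
# Toy model: several pollers in the SAME poller group sharing one device list.
# SharedLocks imitates memcached's add(), which only succeeds if the key is new.
# The key format "poller.device.<id>" is a hypothetical placeholder.


class SharedLocks:
    """In-memory stand-in for a memcached server (add-if-absent semantics)."""

    def __init__(self):
        self._store = {}

    def add(self, key, value):
        if key in self._store:
            return False  # another poller already claimed this device
        self._store[key] = value
        return True


def run_cycle(device_ids, poller_names, locks):
    """Let every poller walk the full device list in turn; return who polled what."""
    polled = {name: [] for name in poller_names}
    cursor = {name: 0 for name in poller_names}
    active = list(poller_names)
    while active:
        for name in list(active):
            # Advance until this poller claims one device or runs out of devices.
            while cursor[name] < len(device_ids):
                dev = device_ids[cursor[name]]
                cursor[name] += 1
                if locks.add(f"poller.device.{dev}", name):
                    polled[name].append(dev)
                    break
            else:
                active.remove(name)  # nothing left for this poller this cycle
    return polled


if __name__ == "__main__":
    result = run_cycle(list(range(1, 601)), ["cti02", "cti03"], SharedLocks())
    for poller, devices in result.items():
        print(poller, len(devices))
```

Because the two simulated pollers take strictly alternating turns, they end up with 300 devices each. On real hosts the split depends on timing, so it will not be exactly even. The property that matters is that no device is polled twice in one cycle.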
I'm having the same issue. Was there ever a resolution?
Hi Johnathan,
What we ended up doing was creating a Fast and Slow poller.
We were getting a lot of “Device status changed to Down from icmp check.” alerts, so we analysed the average response times of our devices to work out what our fping timeouts should be.
Poller 2 - Fast Devices
//### Distributed Poller Config
$config['distributed_poller_name'] = trim(file_get_contents('/etc/hostname')); // trim() drops the trailing newline in /etc/hostname
$config['distributed_poller_group'] = '0,2,3';
$config['distributed_poller_memcached_host'] = "x.x.x.x";
$config['distributed_poller_memcached_port'] = 11211;
$config['distributed_poller'] = true;
$config['fping_options']['timeout'] = 10000;
$config['fping_options']['count'] = 2;
$config['fping_options']['interval'] = 10000;
Poller 3 - Slow Devices
//### Distributed Poller Config
$config['distributed_poller_name'] = trim(file_get_contents('/etc/hostname')); // trim() drops the trailing newline in /etc/hostname
$config['distributed_poller_group'] = '1,4';
$config['distributed_poller_memcached_host'] = "x.x.x.x";
$config['distributed_poller_memcached_port'] = 11211;
$config['distributed_poller'] = true;
$config['fping_options']['timeout'] = 12000;
$config['fping_options']['count'] = 2;
$config['fping_options']['interval'] = 12000;
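A rough way to compare the two sets of fping values is to estimate how long a single unresponsive device can hold up the ICMP check. The sketch below assumes LibreNMS passes these options to fping as `-c <count> -p <interval> -t <timeout>`, all in milliseconds. It is an estimate, not fping's exact timing.

```python
# Rough worst-case time fping spends on ONE unresponsive target, assuming
# fping_options map to fping -c <count> -p <interval> -t <timeout> (all in ms).


def worst_case_ms(count, interval, timeout):
    # The last probe goes out (count - 1) intervals after the first one,
    # then fping waits up to `timeout` ms for its reply.
    return (count - 1) * interval + timeout


fast = worst_case_ms(count=2, interval=10000, timeout=10000)  # Poller 2 values
slow = worst_case_ms(count=2, interval=12000, timeout=12000)  # Poller 3 values
print(f"fast poller: {fast / 1000:.0f} s, slow poller: {slow / 1000:.0f} s")
```

With these values that comes to about 20 s per dead device on the fast poller and 24 s on the slow one.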
We then set up new groups for slow-responding devices. As you can see, group 1 is handled by Poller 3.
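To spell out the mapping: a poller picks up the devices whose `poller_group` appears in its `distributed_poller_group` list. The sketch below works out which poller serves a given device group from the two config strings above. The names `poller2` and `poller3` are just labels for this example.

```python
# Which poller picks up a device, given each poller's
# $config['distributed_poller_group'] string from the configs above.
# "poller2" / "poller3" are illustrative labels only.

POLLER_GROUPS = {
    "poller2": "0,2,3",  # fast devices, plus the default group 0
    "poller3": "1,4",    # slow devices
}


def parse_groups(spec):
    """Turn a comma-separated group list such as '0,2,3' into a set of ints."""
    return {int(part) for part in spec.split(",") if part.strip()}


def pollers_for(device_group, poller_groups=POLLER_GROUPS):
    """Return the pollers whose group list contains the device's poller_group."""
    return sorted(name for name, spec in poller_groups.items()
                  if device_group in parse_groups(spec))


for group in (0, 1, 4, 5):
    print(group, pollers_for(group) or "no poller assigned")
```

As I understand it, a group listed on more than one poller is shared between those pollers, which is what gives the N+x redundancy from the docs. A group listed on no poller would not be polled at all.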
We also keep track of device response times in a number of files outside LibreNMS, by pulling them from the database with some scripts run from cron. Based on these, we move devices between the pollers.
Descr: Create an event in the LibreNMS database to assign or re-assign a device's poller group based on its maximum response time.
Where a device's response time is equal to or greater than 6 seconds (6000 ms), assign the device to the slow poller, i.e. poller group 1.
Overview of the event:
Create an event called device_poller_assignment, starting at the time of creation (now), with an interval of 4 minutes.
Make the event permanent.
*************************************************************************
-- Events only run if the scheduler is enabled: SET GLOBAL event_scheduler = ON;
CREATE EVENT device_poller_assignment
ON SCHEDULE EVERY 4 MINUTE
STARTS CURRENT_TIMESTAMP
ON COMPLETION PRESERVE
DO
  -- Move any device with a ping sample of 6000 ms or more to poller group 1
  UPDATE devices
  JOIN devices_perf ON devices_perf.device_id = devices.device_id
  SET devices.poller_group = 1
  WHERE devices_perf.`max` >= 6000;
*************************************************************************
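Before scheduling something that rewrites `devices`, it may be worth dry-running the rule. The sketch below is a hypothetical check using Python's built-in sqlite3 against a tiny in-memory copy of the two tables. It models only `device_id`, `poller_group` and `max`, and the sample rows are invented. It uses a subquery because SQLite does not accept MySQL's `UPDATE ... JOIN` syntax.

```python
# Dry run of the "max response >= 6000 ms -> poller group 1" rule against a
# tiny in-memory SQLite copy of the tables. Only the needed columns are
# modelled; the sample rows are invented for illustration.
import sqlite3

SLOW_GROUP = 1
THRESHOLD_MS = 6000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (device_id INTEGER PRIMARY KEY, poller_group INTEGER);
    CREATE TABLE devices_perf (device_id INTEGER, "max" REAL);
""")
conn.executemany("INSERT INTO devices VALUES (?, 0)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO devices_perf VALUES (?, ?)",
                 [(1, 120.5), (1, 300.0), (2, 6500.0), (3, 5999.9)])

# Subquery form of the UPDATE, since SQLite has no UPDATE ... JOIN.
conn.execute(
    """UPDATE devices SET poller_group = ?
       WHERE device_id IN (SELECT device_id FROM devices_perf WHERE "max" >= ?)""",
    (SLOW_GROUP, THRESHOLD_MS),
)
groups = dict(conn.execute("SELECT device_id, poller_group FROM devices"))
print(groups)
```

If I read the schema right, `devices_perf` keeps one row per ping check, so the rule as written moves a device after a single slow sample anywhere in its history. Filtering on recent rows via `devices_perf.timestamp` is one option if that proves too aggressive.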
To stop the event, use the following SQL command.
DROP EVENT IF EXISTS <enter the event name here>;
To find an event in the database, use the following command.
SHOW EVENTS FROM <enter database name here>;
Hope this helps.
Duncan