Problems with Distributed Poller

Hello,
Our organization runs a LibreNMS instance that polls 2,500 devices, and we expect it to reach 4,000. Although the server has plenty of resources, polling already takes more than half an hour, so we have added a second machine to help the main server.
First, I have followed all of the LibreNMS documentation for Install LibreNMS, Dispatcher Service, and Distributed Poller.
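To put the half-hour figure in perspective, here is a rough sketch of the arithmetic. It assumes the default 5-minute (300 s) polling interval, that "more than half an hour" means the summed poll time of all devices in one cycle, and that poll time grows linearly with device count. All three are assumptions, and the function names are hypothetical.

```python
import math

# Assumption: LibreNMS default polling interval of 5 minutes.
POLL_INTERVAL_S = 300


def min_concurrent_workers(total_poll_seconds: float, interval_s: float = POLL_INTERVAL_S) -> int:
    """Lower bound on parallel workers needed so one full cycle fits in the interval."""
    return math.ceil(total_poll_seconds / interval_s)


def scale_poll_time(total_poll_seconds: float, devices_now: int, devices_target: int) -> float:
    """Naive linear extrapolation of total poll time to a larger device count."""
    return total_poll_seconds * devices_target / devices_now


if __name__ == "__main__":
    now_s = 30 * 60  # "more than half an hour" for 2500 devices
    target_s = scale_poll_time(now_s, 2500, 4000)
    print(min_concurrent_workers(now_s))     # 6
    print(target_s)                          # 2880.0
    print(min_concurrent_workers(target_s))  # 10
```

This is only a lower bound: it ignores uneven per-device poll times, database and rrdcached contention, and worker overhead, so a real deployment needs headroom above it.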

WebServer (IP: X.X.X.13):
librenms@librenms-com:~$ ./validate.php

Component Version
LibreNMS 25.3.0-19-g3e4b8c5a2 (2025-03-20T02:02:41+01:00)
DB Schema 2025_03_11_031114_drop_ospfv3ifinstid (321)
PHP 8.3.19
Python 3.11.2
Database MariaDB 10.11.11-MariaDB-0+deb12u1
RRDTool 1.7.2
SNMP 5.9.3
===========================================

[OK] Composer Version: 2.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database connection successful
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQL and PHP time match
[OK] Distributed Polling setting is enabled globally
[OK] Connected to rrdcached
[OK] Active pollers found
[FAIL] Some dispatcher nodes have not checked in recently
Inactive Nodes:
librenms-poller
[OK] Locks are functional
[OK] Python wrapper cron entry is not present
[OK] Redis is functional
[FAIL] Python3 module issue found: 'Required packages: ['PyMySQL!=1.0.0', 'python-dotenv', 'redis>=4.0', 'setuptools', 'psutil>=5.6.0', 'command_runner>=1.3.0']
Package not found: The 'command_runner>=1.3.0' distribution was not found and is required by the application

[FIX]:
pip3 install -r /opt/librenms/requirements.txt
[OK] rrdtool version ok
[OK] Connected to rrdcached
librenms@librenms-com:~$
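Because the suggested fix is a pip3 install, one possible pitfall (an assumption, not confirmed in this thread) is that the packages get installed for a different user or interpreter than the one the librenms user runs. A minimal sketch to run as the librenms user with python3 on each node: it reports which of the distributions named in the error that interpreter can see. It only checks presence, not the version bounds such as >=1.3.0, and the function name is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the validate.php output above.
REQUIRED = ["PyMySQL", "python-dotenv", "redis", "setuptools", "psutil", "command_runner"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver if ver else 'NOT FOUND'}")
```

If command_runner shows as NOT FOUND here but pip3 reports it as installed, the install most likely went to a different user's environment.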

And Server2 (librenms service stopped; IP: X.X.X.19):
librenms@librenms-poller:~$ ./validate.php

Component Version
LibreNMS 25.3.0-19-g3e4b8c5a2 (2025-03-20T02:02:41+01:00)
DB Schema 2025_03_11_031114_drop_ospfv3ifinstid (321)
PHP 8.2.28
Python 3.11.2
Database MariaDB 10.11.11-MariaDB-0+deb12u1
RRDTool 1.7.2
SNMP 5.9.3
===========================================

[OK] Composer Version: 2.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database connection successful
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQL and PHP time match
[OK] Distributed Polling setting is enabled globally
[OK] Connected to rrdcached
[OK] Active pollers found
[FAIL] Some dispatcher nodes have not checked in recently
Inactive Nodes:
librenms-poller (this node)
[OK] Locks are functional
[OK] Python wrapper cron entry is not present
[OK] Redis is functional
[FAIL] Python3 module issue found: 'Required packages: ['PyMySQL!=1.0.0', 'python-dotenv', 'redis>=4.0', 'setuptools', 'psutil>=5.6.0', 'command_runner>=1.3.0']
Package not found: The 'command_runner>=1.3.0' distribution was not found and is required by the application

[FIX]:
pip3 install -r /opt/librenms/requirements.txt
[OK] rrdtool version ok
[OK] Connected to rrdcached
librenms@librenms-poller:~$

Output of .env on the webserver:
librenms@librenms-com:~$ cat .env
APP_KEY=base64:bC568SVDdMnq5I4tpxNIaLjIo6LT5SKQEvw6JhzrNPQ=

DB_HOST=localhost
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password

APP_URL=https://librenms-com.domain.es
NODE_ID=6399c4b270877
VAPID_PUBLIC_KEY=BFPSn6DUdTgBqRHNDVRbJlFCHNv01jKOjhDB67oEkCIkzi-_zu7P5jAYNTtaUZwOkoAy_y1nbMgseDi3UKkYXBU
VAPID_PRIVATE_KEY=n2Z20TgJOtyrysB6eTmifV4lpvtLZlXCEq9fVgg1T5E

SESSION_SECURE_COOKIE=true

REDIS_HOST=X.X.X.13
REDIS_PORT=6379
REDIS_DB=0
REDIS_TIMEOUT=60
CACHE_DRIVER=redis

Output of .env on server2:
librenms@librenms-poller:~$ cat .env

APP_KEY=base64:bC568SVDdMnq5I4tpxNIaLjIo6LT5SKQEvw6JhzrNPQ=

DB_HOST=X.X.X.13
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password

REDIS_HOST=X.X.X.13
REDIS_PORT=6379
REDIS_DB=0
REDIS_TIMEOUT=60
CACHE_DRIVER=redis

#APP_URL=
INSTALL=true
NODE_ID=67dbdfc8cb0ce
VAPID_PUBLIC_KEY=BIiIwwfqghtBVPTtM1J4hZkmSspe4SMcjj2L7rPeX3DwWhNhSKtQwjoc-8E7q1r7ZciznUO2NR5ilT4d_-gGPTk
VAPID_PRIVATE_KEY=4NlEF2rSLdBPLnAWdt__Gc1imYuEmHznt9T5PECPhoQ
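Based on my reading of the Dispatcher Service documentation, APP_KEY should be identical on every node and NODE_ID should be unique per node. A minimal sketch that compares two .env files on those points follows. The other keys in MUST_MATCH are an assumption for this two-node setup. DB_HOST is deliberately left out because localhost on the webserver and X.X.X.13 on server2 point to the same database. The script name and helpers are hypothetical.

```python
import sys
from pathlib import Path

# Assumption: keys that should be identical on every node in this setup.
MUST_MATCH = ["APP_KEY", "DB_DATABASE", "DB_USERNAME", "REDIS_HOST", "REDIS_PORT", "REDIS_DB"]
# NODE_ID identifies each dispatcher node, so it should differ between nodes.
MUST_DIFFER = ["NODE_ID"]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def compare_env(a: dict, b: dict) -> list:
    """Return a list of human-readable problems found between two parsed .env files."""
    problems = []
    for key in MUST_MATCH:
        if a.get(key) is None or a.get(key) != b.get(key):
            problems.append(f"{key} differs or is missing")
    for key in MUST_DIFFER:
        if not a.get(key) or not b.get(key) or a[key] == b[key]:
            problems.append(f"{key} must be set and unique per node")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 compare_env.py webserver.env poller.env
    web, poller = (parse_env(Path(p).read_text()) for p in sys.argv[1:3])
    for problem in compare_env(web, poller) or ["no problems found"]:
        print(problem)
```

With the two files shown in this post, the sketch reports no problems, which suggests the .env files are not the cause here.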

Output of /etc/default/rrdcached on the webserver:
BASE_OPTIONS="-B -F -R -l 0:42217"
BASE_PATH=/opt/librenms/rrd/
DAEMON_GROUP=librenms
DAEMON_USER=librenms
DAEMON=/usr/bin/rrdcached
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/var/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms
WRITE_JITTER=1800
WRITE_THREADS=4
WRITE_TIMEOUT=1800
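The poller's RRD updates should reach the rrdcached daemon on the webserver, which listens on 0:42217 per BASE_OPTIONS. A quick way to confirm that rrdcached is reachable from server2 and actually receiving updates is its STATS command. The sketch below is minimal. It assumes the TCP listener above with no command restrictions, since -P is not set. Run it from server2 against X.X.X.13 a few minutes apart and check that UpdatesReceived grows. The function name is hypothetical.

```python
import socket


def rrdcached_stats(host: str, port: int = 42217, timeout: float = 5.0) -> dict:
    """Send STATS to rrdcached over TCP and return the metrics as a dict."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"STATS\n")
        with sock.makefile("r", encoding="ascii", newline="\n") as reader:
            # Status line is "<N> <message>": N > 0 means N metric lines follow,
            # a negative N signals an error.
            status_line = reader.readline().strip()
            count = int(status_line.split(" ", 1)[0])
            if count < 0:
                raise RuntimeError(f"rrdcached error: {status_line}")
            stats = {}
            for _ in range(count):
                # Metric lines look like "QueueLength: 0".
                key, _sep, value = reader.readline().strip().partition(": ")
                stats[key] = int(value) if value.isdigit() else value
            return stats
```

For example, calling rrdcached_stats("X.X.X.13") with the real address returns a dict with keys such as QueueLength and UpdatesReceived.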

Output of /etc/default/rrdcached on server2:
(file does not exist)

Output of /etc/redis/redis.conf on the webserver:

bind X.X.X.13

Output of /etc/redis/redis.conf on server2:
(file does not exist)

Output of config.php on the webserver:

$config['distributed_poller'] = true;
$config['distributed_poller_name'] = php_uname('n');
$config['distributed_poller_group'] = 0;

Output of config.php on server2:
$config['distributed_poller'] = true;
$config['distributed_poller_name'] = php_uname('n');
$config['distributed_poller_group'] = 0;

Output of /etc/mysql/mariadb.conf.d/50-server.cnf on the webserver:

bind-address = 0.0.0.0

I have also run the following lnms command on both machines:
lnms config:set rrdcached "X.X.X.13:42217"
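The dispatcher on server2 depends on MariaDB, Redis, and rrdcached on the webserver. A quick reachability test from server2 can therefore rule out firewall or bind-address problems behind the "not checked in recently" warning. The sketch below is minimal. Ports 6379 and 42217 come from the configs above. Port 3306 is an assumption, since it is MariaDB's default and the post does not show it. The sketch only proves that a TCP connection can be opened. It does not prove that credentials or protocols work, and the names are hypothetical.

```python
import socket

# Services the poller needs on the webserver. 6379 and 42217 are from the
# configs above; 3306 is assumed to be MariaDB's default port.
SERVICES = {"mariadb": 3306, "redis": 6379, "rrdcached": 42217}


def check_ports(host: str, services: dict, timeout: float = 3.0) -> dict:
    """Return {service: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for name, port in services.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, unreachable, or unresolvable
            results[name] = False
    return results
```

Run check_ports("X.X.X.13", SERVICES) on server2, substituting the real address. Any False result points to a firewall rule or bind setting worth checking first.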

When I start the service on server2, everything works for a while: I can see both servers on the /poller page, active and polling. However, the second server writes to the rrd folder on server2 itself. Shouldn't it write only to the rrd folder on the webserver? Server2 has less disk space than the webserver, so its disk fills up and then LibreNMS crashes.
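One way to answer this empirically is to check, on each host, which .rrd files under /opt/librenms/rrd (the BASE_PATH above) were modified recently. The sketch below is minimal and its function name is hypothetical. With WRITE_TIMEOUT=1800 and WRITE_JITTER=1800, rrdcached may hold updates in memory for up to about an hour before flushing them to disk, so the window defaults to 3600 s. Files written directly by a poller, bypassing rrdcached, would typically show up sooner.

```python
import os
import time
from pathlib import Path


def recent_rrd_files(base: str, max_age_s: float = 3600) -> list:
    """List .rrd files under base modified within the last max_age_s seconds."""
    cutoff = time.time() - max_age_s
    hits = []
    for root, _dirs, files in os.walk(base):
        for name in files:
            if not name.endswith(".rrd"):
                continue
            path = Path(root) / name
            try:
                if path.stat().st_mtime >= cutoff:
                    hits.append(str(path))
            except FileNotFoundError:
                continue  # file removed while walking
    return sorted(hits)


if __name__ == "__main__":
    recent = recent_rrd_files("/opt/librenms/rrd")
    print(f"{len(recent)} .rrd files modified in the last hour")
    for p in recent[:20]:
        print(p)
```

Running it on both machines shows which host's disk is actually receiving RRD writes.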

Thanks in advance to everyone, and any help is welcome.

Scaling out can be a bit of a challenge. I’ve been working on that as well recently.

Understanding how to deploy RRDCached was a bit of a mystery for me. I concluded that it belonged on a dedicated box, where it becomes centralized RRD storage for my scaled-out LibreNMS deployment. I don't like relying on a single instance for all my RRD needs, but I can back it up regularly to S3, and it seems to work really well because I've provisioned a disk with a lot of throughput. I tried NFS/EFS before, but it was just too slow and bogged down polling operations.

Hello,

I have confirmed that the data really is written to the RRD folder on the WebServer (X.X.X.13), although a folder is also created on the poller for each device it polls, so I don't actually need extra disk space on the poller. However, poller group 2 (the group polled by the poller machine, X.X.X.19) keeps increasing CPU usage until, at some point, the LibreNMS service crashes. If I move 300 devices to the poller machine, within 24 hours its CPU is maxed out and the service goes down because of database connection problems. (The WebServer machine polls approximately 2,700 devices and its service does not go down.)
I will continue testing.
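To narrow down when the poller's CPU saturates relative to the database connection errors, it may help to log load over time and compare the timestamps with the LibreNMS logs. The sketch below is minimal and uses only the standard library. It runs on Unix only, because it uses os.getloadavg. The 1.0 load-per-CPU threshold is a rule-of-thumb assumption, not a LibreNMS setting, and the function names are hypothetical.

```python
import os
import time


def load_per_cpu() -> float:
    """1-minute load average divided by the number of CPUs (Unix only)."""
    one_min, _five, _fifteen = os.getloadavg()
    return one_min / (os.cpu_count() or 1)


def watch(samples: int = 5, every_s: float = 60.0, threshold: float = 1.0) -> None:
    """Print a timestamped load-per-CPU line, flagging values above threshold."""
    for i in range(samples):
        ratio = load_per_cpu()
        flag = "  <-- possibly saturated" if ratio > threshold else ""
        print(f"{time.strftime('%H:%M:%S')} load/cpu={ratio:.2f}{flag}")
        if i < samples - 1:
            time.sleep(every_s)
```

For example, watch(samples=1440) samples once a minute for a day on the poller machine.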