Is there a way to recover a poller after deleting it?

I am setting up Distributed Polling because my current machine cannot finish polling within 300 seconds (I have 7000+ devices).

After setting everything up, I created a separate group for the pollers.

The problem I have is:
Machine 1 does not poll poller group 1.
Machine 2 polls group 1 instead of going to group 2.

I also accidentally deleted machine 1's poller, because it has a trash-can icon.
I assumed it could be added again, but I can't find a way to add it back.

Is there a way to recover machine 1's poller and have each machine handle its own poller group?

This is a picture of my poller page.

This is the information of the first machine.

[librenms@zabbix-mon ~]$ ./validate.php
===========================================
Component | Version
--------- | -------
LibreNMS  | 24.8.0-75-g2fc59a470 (2024-09-18T02:54:34+07:00)
DB Schema | 2024_08_27_182000_ports_statistics_table_rev_length (299)
PHP       | 8.2.13
Python    | 3.9.18
Database  | MariaDB 10.5.22-MariaDB
RRDTool   | 1.7.2
SNMP      | 5.9.1
===========================================

[OK]    Composer Version: 2.7.9
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQL and PHP time match
[OK]    Distributed Polling setting is enabled globally
[OK]    Connected to rrdcached
[OK]    Active pollers found
[OK]    Dispatcher Service not detected
[OK]    Locks are functional
[OK]    Python poller wrapper is polling
[OK]    Redis is functional
[OK]    rrdtool version ok
[OK]    Connected to rrdcached
[FAIL]  We have found some files that are owned by a different user than 'librenms', this will stop you updating automatically and / or rrd files being updated causing graphs to fail.
        [FIX]:
        sudo chown -R librenms:librenms /opt/librenms
        sudo setfacl -d -m g::rwx /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
        sudo chmod -R ug=rwX /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
        Files:
         /opt/librenms/cache/os_defs.cache

This is the information of the second machine.

[librenms@Libra3-mon ~]$ ./validate.php
===========================================
Component | Version
--------- | -------
LibreNMS  | 24.8.0-75-g2fc59a470 (2024-09-18T02:54:34+07:00)
DB Schema | 2024_08_27_182000_ports_statistics_table_rev_length (299)
PHP       | 8.1.27
Python    | 3.9.19
Database  | MariaDB 10.5.22-MariaDB
RRDTool   | 1.7.2
SNMP      | 5.9.1
===========================================

[OK]    Composer Version: 2.7.9
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQL and PHP time match
[OK]    Distributed Polling setting is enabled globally
[OK]    Connected to rrdcached
[OK]    Active pollers found
[OK]    Dispatcher Service not detected
[OK]    Locks are functional
[OK]    Python poller wrapper is polling
[OK]    Redis is functional
[OK]    rrdtool version ok
[OK]    Connected to rrdcached

How are you starting the poller on each machine?

If you change which poller groups a poller handles, or other options in the UI, you need to shut down that poller first, change the value in the UI, and then start the poller again.
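As a sketch, assuming a systemd unit named librenms.service (adjust the unit name to however your poller is started), the sequence would be:

```shell
# 1. Stop the poller/dispatcher on the machine you want to reconfigure
sudo systemctl stop librenms.service

# 2. Change the poller group (or other options) in the web UI

# 3. Start the poller again so it re-reads the new settings
sudo systemctl start librenms.service
```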

I tried following the instructions, but it didn't work.

It seems that machine 1's poller-wrapper.py isn't working.
Is it because machine 1 doesn't show up on the pollers summary page at http://localhost/poller?

There is only one machine Libra3-mon.1-to-all.com (as shown in the picture).

Before that, machine 1 was showing up.
Now only machine 2 is left.

Is there a way to get the missing poller back?

Currently, machine 1 is running Redis for Distributed Polling.
If I roll back to the original standalone mode, will it show up again at http://localhost/poller?

Or is it by design that when Distributed Polling runs on 2 machines, the poller and the database are separated?

I don’t want to have to reinstall the program.
Thank you for your suggestion.

Check NODE_ID in .env, it should be unique on both pollers. Also check $config['distributed_poller_name'] in config.php as that should also be unique.
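A quick way to compare those two values (assuming the default /opt/librenms install path) is to run this on each machine and check that the output differs between them:

```shell
# Run on each poller; the values must be unique per machine
grep '^NODE_ID' /opt/librenms/.env
grep 'distributed_poller_name' /opt/librenms/config.php
```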

I checked as suggested:

  • NODE_ID in .env is not duplicated
###### .env HOST ######
DB_HOST=localhost
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password

REDIS_HOST=10.221.1.6
REDIS_PORT=6379
REDIS_DB=0
CACHE_DRIVER=redis

#APP_URL=
#NODE_ID=6616699b31ccb
NODE_ID=HOST_10.221.1.6
###### .env POLLER ######
DB_HOST=10.221.1.6
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password

REDIS_HOST=10.221.1.6
REDIS_PORT=6379
REDIS_DB=0
CACHE_DRIVER=redis


#APP_URL=
#NODE_ID=66ea610209d82
NODE_ID=Poller_10.221.1.14
  • In config.php the setting was the same on both machines,
  • because both used $config['distributed_poller_name'] = php_uname('n');

I changed the values for each machine as follows:

###### config.php  Host ######
$config['distributed_poller_name'] = 'HOST_10.221.1.6';

###### config.php  POLLER ######
$config['distributed_poller_name'] = 'Poller_10.221.1.14';

After that, I restarted the service on both machines:

  • systemctl restart librenms.service

After doing everything above, the poller page now looks like this.
I am so happy. :partying_face: Finally, the poller page shows Poller Cluster Health like everyone else's.

Before this, I had been hunting for Poller Cluster Health, trying to find where to turn it on. :woozy_face:

Now the problem is, can I get back the "HOST_10.221.1.6" that I deleted earlier?

I designed it like this. It is divided into 2 groups as follows:

 Poller_10.221.1.14 ---> Poller Group ID 0
 HOST_10.221.1.6    ---> Poller Group ID 3

But HOST_10.221.1.6 has disappeared from the poller page. :thinking:

Thank you for your help.

Check logs/librenms.log on the one not showing up (or stop the service and run the Python script manually to test).

Thanks for the advice

I checked librenms.log and didn't find anything unusual.

But when I ran poller-wrapper.py manually, I got the following error:

[librenms@zabbix-mon ~]$ /opt/librenms/poller-wrapper.py 16
2024-09-21 15:20:17,159 :: WARNING :: Could not connect to memcached, disabling distributed service checks.
2024-09-21 15:20:17,163 :: CRITICAL :: ERROR: Could not connect to MySQL database! (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 569, in connect
    sock = socket.create_connection(
  File "/usr/lib64/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/usr/lib64/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/librenms/poller-wrapper.py", line 64, in <module>
    wrapper.wrapper(
  File "/opt/librenms/LibreNMS/wrapper.py", line 468, in wrapper
    db_connection = LibreNMS.DB(sconfig)
  File "/opt/librenms/LibreNMS/__init__.py", line 220, in __init__
    self.connect()
  File "/opt/librenms/LibreNMS/__init__.py", line 269, in connect
    conn = MySQLdb.connect(**args)
  File "/usr/lib/python3.9/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 327, in __init__
    self.connect()
  File "/usr/lib/python3.9/site-packages/pymysql/connections.py", line 619, in connect
    raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")

After seeing that error, I checked the database connection settings on HOST_10.221.1.6.

I found that the .env and config.php files were set to localhost.

[librenms@zabbix-mon ~]$ cat .env

DB_HOST=localhost
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password
[librenms@zabbix-mon ~]$ cat /opt/librenms/config.php

$config['db_host'] = 'localhost';
$config['db_name'] = 'librenms';
$config['db_user'] = 'librenms';
$config['db_pass'] = 'password';

So I changed localhost to 10.221.1.6, the address of the machine running the SQL server.
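For reference, the corrected settings on this machine look like this (10.221.1.6 is the database host in my setup):

```
###### /opt/librenms/.env ######
DB_HOST=10.221.1.6
DB_DATABASE=librenms
DB_USERNAME=librenms
DB_PASSWORD=password

###### /opt/librenms/config.php ######
$config['db_host'] = '10.221.1.6';
$config['db_name'] = 'librenms';
$config['db_user'] = 'librenms';
$config['db_pass'] = 'password';
```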

Then I ran poller-wrapper.py again:

[librenms@zabbix-mon ~]$ ./poller-wrapper.py 16
2024-09-21 15:35:59,444 :: WARNING :: Could not connect to memcached, disabling distributed service checks.
2024-09-21 15:35:59,453 :: INFO :: starting the poller check at 2024-09-21 15:35:59 with 1 threads for 1 devices
2024-09-21 15:36:01,055 :: INFO :: worker Thread-2 finished device 5629 in 1 seconds
2024-09-21 15:36:01,055 :: INFO :: poller-wrapper checked 1 devices in 1 seconds with 1 workers with 0 errors

No more errors. :kissing:
Now to check the poller page.

It finally shows up! :sweat_smile:

I was able to assign poller groups for all devices, and everything works smoothly.

In conclusion, it all came down to incorrect database settings in .env and config.php, which prevented the Python poller from connecting to the database.

Thank you very much. :100: :slightly_smiling_face:
I hope my topics will be informative for everyone in the future. :kissing_heart:

