A little help with distributed polling

Hello and good morning,
I was wondering if someone could sanity-check an issue I am having with distributed polling, please.

I have an environment with one web server running RRDCached, Memcached, and the base LibreNMS poller; one SQL server; and one additional poller. The issue I am having is that the second poller never comes up in the poller list.

My base LibreNMS .env file is:

APP_KEY=base64:a_key

DB_HOST=a_db_server
DB_DATABASE="a_db"
DB_USERNAME="a_user"
DB_PASSWORD="a_password"

APP_URL=/
NODE_ID=5d22fc06e1fa6
LIBRENMS_USER=librenms

REDIS_HOST=localhost
REDIS_DB=0
REDIS_PASSWORD="a_password"
REDIS_PORT=6379

My poller .env file is:

APP_KEY=base64:a_key

DB_HOST=a_db_server
DB_DATABASE="a_db"
DB_USERNAME="a_user"
DB_PASSWORD="a_password"

APP_URL=/
NODE_ID=5d22fc06e1fa6
LIBRENMS_USER=librenms

REDIS_HOST=localhost
REDIS_DB=0
REDIS_PASSWORD="a_password"
REDIS_PORT=6379

My base config.php is:

// Database config
$config['db_host'] = 'x.x.x.x';
$config['db_user'] = 'librenms';
$config['db_pass'] = 'a_password';
$config['db_name'] = 'librenms';

$config['distributed_poller'] = true;
$config['distributed_poller_group'] = '0';

$config['memcached']['enable'] = true;
$config['memcached']['host'] = 'x.x.x.x';
$config['memcached']['port'] = 11211;

$config['service_poller_workers'] = 50; # Processes spawned for polling
$config['service_services_workers'] = 3; # Processes spawned for service polling
$config['service_discovery_workers'] = 5; # Processes spawned for discovery

$config['os']['junos']['nobulk'] = false;
$config['snmp']['max_repeaters'] = 50;

// Optional settings
$config['service_poller_frequency'] = 400; # Seconds between polling attempts
$config['service_discovery_frequency'] = 21600; # Seconds between discovery runs
$config['service_poller_down_retry'] = 60; # Seconds between failed polling attempts
$config['service_loglevel'] = 'CRITICAL'; # Must be one of 'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'
$config['service_ping_enabled'] = true;

My poller config.php is:

$config['distributed_poller_name'] = php_uname('n');
$config['distributed_poller_group'] = '0';
$config['distributed_poller_memcached_host'] = "x.x.x.x";
$config['distributed_poller_memcached_port'] = 11211;
$config['distributed_poller'] = true;
$config['rrdcached'] = "x.x.x.x:42217";
$config['update'] = 0;

I do not see both pollers showing in the list. What am I doing wrong?

Depends, but first let's see ./validate.php output from both servers; paste it here.

Are both servers in the same network, or is the poller behind NAT? If so, you need to open some ports to make a connection.
Also, are you using the new poller (installed as a service)?
Try using this command on your master server: redis-cli -a YOURPASSWORD INFO Replication
See if your slave is connected to the master server.
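
For reference, if replication is healthy the master side should look roughly like this (values are illustrative, not from your setup):

role:master
connected_slaves:1
slave0:ip=x.x.x.x,port=6379,state=online,offset=0,lag=0

and the poller side should report role:slave with master_link_status:up.

One more thing worth double-checking: both of your .env files show the same NODE_ID. As far as I understand, that ID is meant to be unique per node (it looks like PHP uniqid() output), so a duplicate could make both machines register as the same poller, which would match what you're seeing. Regenerating it on the second box, e.g. with php -r "echo uniqid();", is worth a try.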

My main host:

:/opt/librenms# php ./validate.php

Component Version
LibreNMS 1.54
DB Schema 2019_07_03_132417_create_mpls_saps_table (139)
PHP 7.3.7-2+ubuntu18.04.1+deb.sury.org+1
MySQL 5.7.27-0ubuntu0.18.04.1
RRDTool 1.7.0
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[FAIL] The poller cluster member (localhost-poller) has not checked in within the last 5 minutes, check that it is running and healthy.
[WARN] Some devices have not been polled in the last 5 minutes. You may have performance issues.
[FIX]:
Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
xxx
[FAIL] Some devices have not completed their polling run in 5 minutes, this will create gaps in data.
[FIX]:
Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
xxxx
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
bootstrap/cache/.gitignore
logs/.gitignore
rrd/.gitignore
storage/app/.gitignore
storage/app/public/.gitignore
storage/debugbar/.gitignore
storage/framework/cache/.gitignore
storage/framework/cache/data/.gitignore
storage/framework/sessions/.gitignore
storage/framework/testing/.gitignore
storage/framework/views/.gitignore
storage/logs/.gitignore
[FAIL] Some folders have incorrect file permissions, this may cause issues.
[FIX]:
sudo chown -R librenms:librenms /opt/librenms
sudo setfacl -d -m g::rwx /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
sudo chmod -R ug=rwX /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
Files:
/opt/librenms/bootstrap/cache/packages.php

Remote Poller

php ./validate.php

Component Version
LibreNMS 1.53.1-52-gb09963782
DB Schema 2019_07_03_132417_create_mpls_saps_table (139)
PHP 7.3.7-2+ubuntu18.04.1+deb.sury.org+1
MySQL 5.7.27-0ubuntu0.18.04.1
RRDTool 1.7.0
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[WARN] Your database schema has extra migrations (2019_06_30_190400_create_mpls_sdps_table, 2019_06_30_190401_create_mpls_sdp_binds_table, 2019_06_30_190402_create_mpls_services_table, 2019_07_03_132417_create_mpls_saps_table). If you just switched to the stable release from the daily release, your database is in between releases and this will be resolved with the next release.
[FAIL] Database: extra table (mpls_saps)
[FAIL] Database: extra table (mpls_sdps)
[FAIL] Database: extra table (mpls_sdp_binds)
[FAIL] Database: extra table (mpls_services)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (https://t.libren.ms/discord) or the community site (https://t.libren.ms/5gscd):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
DROP TABLE mpls_saps;
DROP TABLE mpls_sdps;
DROP TABLE mpls_sdp_binds;
DROP TABLE mpls_services;
[FAIL] Disk space where /opt/librenms/rrd is located is empty!!!
[FAIL] Some devices have not completed their polling run in 5 minutes, this will create gaps in data.
[FIX]:
Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
xxx
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your install is over 24 hours out of date, last update: Thu, 25 Jul 2019 13:10:47 +0000
[FIX]:
Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
bootstrap/cache/.gitignore
logs/.gitignore
rrd/.gitignore
storage/app/.gitignore
storage/app/public/.gitignore
storage/debugbar/.gitignore
storage/framework/cache/.gitignore
storage/framework/cache/data/.gitignore
storage/framework/sessions/.gitignore
storage/framework/testing/.gitignore
storage/framework/views/.gitignore
storage/logs/.gitignore
[FAIL] Some folders have incorrect file permissions, this may cause issues.
[FIX]:
sudo chown -R librenms:librenms /opt/librenms
sudo setfacl -d -m g::rwx /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
sudo chmod -R ug=rwX /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/

With the Redis replication check I get:

Main box

Replication

role:master
connected_slaves:0
master_replid:dee141b5befaa73ee31abe879cb31c6fc8810c6e
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Remote

Replication

role:master
connected_slaves:0
master_replid:a06dbb33ef705869e24a9d09e7ed9b70d992de6f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Can you run the following command and paste the output: sudo apt list | grep installed
I want to know which version of Redis you're running, because Ubuntu 18.04 doesn't distribute Redis 5+, while LibreNMS recommends it for certain commands. Try upgrading to Ubuntu 19+ (Disco); it provides Redis 5+ and some other updates here and there. Be aware, though: the docs don't recommend installing/upgrading to Ubuntu 19+, but I haven't had any problems with it so far.

Also, the main question still remains: is the poller behind NAT?
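
A quick way to test that from the poller, assuming the master's Redis listens on x.x.x.x:6379 and is password-protected:

redis-cli -h x.x.x.x -p 6379 -a YOURPASSWORD ping

A PONG back means the port is reachable; a timeout points at a firewall or NAT in between.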

Main server

php-redis/bionic,now 4.3.0-1+ubuntu18.04.1+deb.sury.org+1 amd64 [installed]
redis-server/bionic,now 5:5.0.5-1chl1~bionic1 amd64 [installed]
redis-tools/bionic,now 5:5.0.5-1chl1~bionic1 amd64 [installed,automatic]

Poller

php-redis/bionic,now 4.3.0-1+ubuntu18.04.1+deb.sury.org+1 amd64 [installed]
redis-server/bionic,now 5:5.0.5-1chl1~bionic1 amd64 [installed]
redis-tools/bionic,now 5:5.0.5-1chl1~bionic1 amd64 [installed,automatic]

Both devices are in the same subnet with no NAT.

Regarding the .env file, can I confirm what this line should be? Should they both be localhost, or should one point at the other device?

REDIS_HOST=


Okay, here is a sample config I use to easily configure a server as a slave.

REDIS

sudo apt install python3 python-mysqldb python3-pip
cd /tmp
wget http://ftp.fr.debian.org/debian/pool/main/p/python-dotenv/python3-dotenv_0.9.1-1_all.deb
sudo dpkg -i python3-dotenv_0.9.1-1_all.deb
pip3 install redis
echo "vm.overcommit_memory = 1" | sudo tee -a /etc/sysctl.conf
sudo sed -i "s/supervised no/supervised systemd/g" /etc/redis/redis.conf
KEY=$(openssl rand 60 | openssl base64 -A)
sudo sed -i "s/# requirepass foobared/requirepass ${KEY}/g" /etc/redis/redis.conf
sudo sed -i "s/# slaveof <masterip> <masterport>/slaveof #IP ADDRES OF MASTER# 6379/g" /etc/redis/redis.conf
sudo sed -i "s,# masterauth <master-password>,masterauth #SOME RANDOM GENERATED LONG KEY#**" /etc/redis/redis.conf
sudo systemctl restart redis.service

Only change: #IP ADDRESS OF MASTER# and #SOME RANDOM GENERATED LONG KEY#
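
After the restart, a quick sanity check on the slave (using whatever password you set) should show the link as up:

redis-cli -a YOURPASSWORD INFO replication

Look for role:slave, master_host pointing at the master's IP, and master_link_status:up in the output.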

I've noticed that both of your Redis instances are masters, which is wrong.
One question: Is your poller running as a service instead of the old cron?

This is the first part of the dispatcher documentation:

The new LibreNMS dispatcher service ( librenms-service.py ) replaces the old poller service ( poller-service.py ), improving its reliability. It's mostly a drop-in replacement for the old service, but testing is recommended before switching over.
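
If you haven't switched yet, getting the dispatcher running as a systemd service is roughly the following (the unit file ships in the LibreNMS repo; double-check the path against your checkout, and remove the old poller cron entries afterwards so both don't run at once):

sudo cp /opt/librenms/misc/librenms.service /etc/systemd/system/librenms.service
sudo systemctl daemon-reload
sudo systemctl enable librenms
sudo systemctl start librenms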


Also, maybe this tutorial can help you understand how Redis replication is set up and how it works.

https://community.pivotal.io/s/article/How-to-setup-Redis-master-and-slave-replication

Hi,

I have successfully finished installing LibreNMS as a master with two LibreNMS instances as remote pollers.
All the servers are based on Ubuntu 18.04, with PHP 7.3 and MariaDB 10.1.40.
I did not use Redis (and I don't know what that is, in all honesty),
but here's what I needed to do to make the solution work:

1. The master LibreNMS server MUST have MySQL, rrdcached, and memcached listening. Make sure firewalls allow TCP ports 11211, 42217, and 3306 between the master and the pollers.
2. Make sure the pollers' configs point to the master's IP address for MySQL, rrdcached, and memcached.
3. Make sure the master server's MySQL service is listening on a public interface rather than localhost only (disable bind-address=127.0.0.1).
4. Make sure the remote database user is granted privileges on the master's librenms database; see the sketch after this list. (This was the last trap I fell into; once that was granted, the pollers immediately popped up in the master's list.)
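
A minimal sketch of steps 1, 3, and 4 on the master, assuming ufw and MariaDB on Ubuntu 18.04 (the poller subnet 10.0.0.0/24, the password, and the config file path are placeholders/assumptions; adjust to your environment):

# open the required ports to the poller subnet
sudo ufw allow from 10.0.0.0/24 to any port 3306,11211,42217 proto tcp
# listen on all interfaces instead of localhost only
sudo sed -i "s/^bind-address.*/bind-address = 0.0.0.0/" /etc/mysql/mariadb.conf.d/50-server.cnf
sudo systemctl restart mariadb
# grant the poller's DB user access to the librenms database
mysql -u root -p <<'SQL'
CREATE USER 'librenms'@'10.0.0.%' IDENTIFIED BY 'a_password';
GRANT ALL PRIVILEGES ON librenms.* TO 'librenms'@'10.0.0.%';
FLUSH PRIVILEGES;
SQL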

This is only a brief summary, but it might help :slight_smile:

regards

So progress has been made - thank you.
I now see the two pollers, and they seem to be handling the polls suitably. I have some slow Junos devices, but I see they suffer from an issue in their SNMP stack.

One thing I am getting a lot of since having two pollers is devices flapping: the SNMP status of up to 60 devices keeps changing, with devices repeatedly marked down and then up again.

2019-07-30 10:33:09 Device status changed to Up from snmp check.
2019-07-30 10:26:59 Device status changed to Down from snmp check.
2019-07-30 10:26:29 Device status changed to Up from snmp check.
2019-07-30 10:20:43 Device status changed to Down from snmp check.
2019-07-30 10:13:09 Device status changed to Up from snmp check.
2019-07-30 10:07:00 Device status changed to Down from snmp check.
2019-07-30 10:06:29 Device status changed to Up from snmp check.
2019-07-30 10:00:22 Device status changed to Down from snmp check.
2019-07-30 09:59:49 Device status changed to Up from snmp check.
2019-07-30 09:53:53 Device status changed to Down from snmp check.

Yet if I manually carry out a walk from either poller, it completes successfully. Has anyone seen this at all?

Hi Kalamchi75,

I have a central LibreNMS (web + MySQL) server at my office and a distributed poller at a remote site. The distributed poller links to the central server via a FortiGate SSL VPN (static IP). The central server has full VPN access to the distributed poller, and the distributed poller has full access to the remote-site switches (but the central server does not have access to the remote-site switches).

The distributed poller is showing in the central server's poller list.

The issue is: on the central server, I tried to add a switch from the remote site, but it failed because the central server is not able to ping/SNMP the device being added. I tried adding the device with the "force add" option, but the added device keeps its "unpolled" status.

Can you please give me some advice on how to add a remote device to the central LibreNMS server?

Thanks

Hi Jeff,

Are you using poller groups?
Here is how mine is set up:

Notice the group ID. This is assigned by the system once created.

  • In the config file of the remote poller, assign the poller to the group ID:

         $config['distributed_poller_group'] = '2';
    
  • Now, when you add a new device, set it to the remote poller group:

If I'm not mistaken, the machine would actually be polled by the remote poller with which it shares a VPN connection, and the polled data is then piped to the database on the central server.
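
If the switch was already force-added before the group was set, it can also be moved afterwards. The device's settings page in the web UI has a poller group option; alternatively, straight in the database (the devices table has a poller_group column; the hostname below is a placeholder):

mysql -u librenms -p librenms -e "UPDATE devices SET poller_group = 2 WHERE hostname = 'remote-switch-1';"

After the next poll cycle the device should be handled by the remote poller.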

Try it.

Best Regards