Distributed Poller - Unable to connect to memcached

I’m currently trying to set up a distributed poller, but when I run ./daily.sh on my poller, I see the error Could not connect to memcached (librenms.andrena.com:11211). In addition, I don’t see any devices populated on the main server, even though the logs on the poller indicate that devices are being polled (/opt/librenms/poller.php 9 2020-08-07 05:00:46 - 1 devices polled in 44.27 secs).

I have one server running LibreNMS with rrdcached and memcached, and I have confirmed both are up and running with sudo service rrdcached status and sudo service memcached status. The validate.php output reports no errors, and the distributed polling settings in config.php are:

$config['distributed_poller'] = true;
$config['rrdcached']    = "localhost:42217"; 
$config['distributed_poller_memcached_host'] = 'localhost';
$config['distributed_poller_memcached_port'] = '11211';

I have one remote poller running LibreNMS and configured to point to the main server’s memcached/rrdcached instance. The config.php related to distributed polling is as follows:

# Distributed polling
$config['distributed_poller_memcached_host'] = "librenms.andrena.com";
$config['distributed_poller_memcached_port'] = 11211;
$config['rrdcached']                         = "librenms.andrena.com:42217";
$config['distributed_poller']                = true;
$config['distributed_poller_group']          = '0';
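Because the remote poller must name an address the main server is actually reachable on, one quick sanity check is to read the simple $config[...] lines back out of config.php and flag loopback hosts. A minimal sketch in Python, assuming one-line assignments like the ones above; parse_config and remote_poller_problems are hypothetical helper names, not part of LibreNMS.

```python
import re

# Matches simple one-line assignments such as:
#   $config['distributed_poller_memcached_port'] = 11211;
# Only quoted strings, integers and true/false are handled in this sketch.
_ASSIGN = re.compile(r"""^\s*\$config\[['"]([^'"]+)['"]\]\s*=\s*(.+?)\s*;\s*$""")

LOOPBACK_NAMES = {"localhost", "127.0.0.1", "::1"}


def parse_config(php_text):
    """Return a dict of the simple $config[...] assignments in php_text."""
    values = {}
    for line in php_text.splitlines():
        match = _ASSIGN.match(line)
        if not match:
            continue
        key, raw = match.groups()
        if raw[0] in "'\"" and raw[-1] == raw[0]:
            values[key] = raw[1:-1]                  # quoted string
        elif raw.lower() in ("true", "false"):
            values[key] = raw.lower() == "true"      # PHP boolean
        elif raw.isdigit():
            values[key] = int(raw)                   # plain integer
        # arrays, expressions, etc. are ignored here
    return values


def remote_poller_problems(cfg):
    """List obvious mistakes in a *remote* poller's distributed-polling settings."""
    problems = []
    if cfg.get("distributed_poller") is not True:
        problems.append("distributed_poller is not true")
    if str(cfg.get("distributed_poller_memcached_host", "")) in LOOPBACK_NAMES:
        problems.append("memcached host points at the poller itself")
    rrd_host = str(cfg.get("rrdcached", "")).rsplit(":", 1)[0]
    if rrd_host in LOOPBACK_NAMES:
        problems.append("rrdcached points at the poller itself")
    return problems
```

On the main server, localhost is legitimate for both settings; the loopback check only makes sense for the remote poller's config.php.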

validate.php reports no errors, but when I run ./daily.sh on the poller, I get the error Could not connect to memcached (librenms.andrena.com:11211).
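To reproduce that error outside daily.sh, you can talk to memcached directly from the poller. A minimal sketch in Python using memcached's text protocol, where the client sends version and the server replies VERSION x.y.z; the function names are hypothetical, and a None result means the connection was refused, timed out, or the name did not resolve.

```python
import socket


def parse_version_reply(reply):
    """Return the version from a memcached 'VERSION x.y.z' reply, else None."""
    reply = reply.strip()
    if reply.startswith("VERSION "):
        return reply.split(" ", 1)[1]
    return None


def memcached_version(host, port=11211, timeout=3.0):
    """Send the text-protocol 'version' command; None means unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"version\r\n")
            reply = sock.recv(128).decode("ascii", "replace")
    except OSError:
        # Connection refused, timeout and DNS failures all land here.
        return None
    return parse_version_reply(reply)
```

Run on the poller as memcached_version("librenms.andrena.com"): with memcached bound only to 127.0.0.1 on the main server, this should return None, matching what daily.sh reports.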

Both netstat -anp | grep 11211 and netstat -anp | grep 42217 on the main server output

tcp        0      0 127.0.0.1:11211         0.0.0.0:*               LISTEN      - 
tcp        0      0 0.0.0.0:42217           0.0.0.0:*               LISTEN      -                   
tcp        0      0 172.31.62.96:42217      144.121.76.2:48110      ESTABLISHED -  

respectively. This tells me that the poller connected to the rrdcached instance, but for some reason couldn’t establish a connection with the memcached instance even though it’s listening on port 11211 (/usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1 -P /var/run/memcached/me is the process).
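The key detail in that netstat output is the local address of the LISTEN line: 127.0.0.1:11211 only accepts connections from the same machine, while 0.0.0.0:42217 accepts them on every interface. A minimal sketch in Python of that check, assuming the default netstat -anp column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program); the helper names are hypothetical.

```python
def listeners(netstat_text, port):
    """Return the local addresses in LISTEN state on the given TCP port."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            addr, _, local_port = fields[3].rpartition(":")
            if local_port == str(port):
                found.append(addr)
    return found


def loopback_only(netstat_text, port):
    """True if the port is listening, but only on loopback addresses."""
    addrs = listeners(netstat_text, port)
    return bool(addrs) and all(a in ("127.0.0.1", "::1") for a in addrs)
```

Fed the three lines above, loopback_only(..., 11211) is True and loopback_only(..., 42217) is False, which is exactly why rrdcached accepts the poller and memcached does not.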

I see in the logs that the poller is currently polling data for a device (/opt/librenms/poller.php 9 2020-08-07 17:10:53 - 1 devices polled in 51.63 secs), but I don’t see any data populated on the main server. Did I miss any config/is there anything else I need to set up?

Thanks!

memcached is only listening on localhost (127.0.0.1), so you need to change its bind address.
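The bind address comes from memcached's -l option (the process line in the question shows -l 127.0.0.1). On Debian/Ubuntu that option normally lives in /etc/memcached.conf, one option per line. A minimal sketch in Python of the edit, assuming that layout; set_memcached_listen is a hypothetical helper. Because memcached's text protocol has no authentication, it is worth restricting port 11211 to the pollers' addresses (firewall or AWS security group) before opening it up, and memcached needs a restart afterwards.

```python
def set_memcached_listen(conf_text, address):
    """Return memcached.conf text whose listen option is '-l <address>'.

    Assumes the Debian/Ubuntu /etc/memcached.conf layout (one option per
    line). The first '-l' line is replaced, later '-l' lines are dropped,
    and one is appended if none existed. Commented lines are left alone.
    """
    out, replaced = [], False
    for line in conf_text.splitlines():
        if line.strip().startswith("-l "):
            if not replaced:
                out.append(f"-l {address}")
                replaced = True
            continue
        out.append(line)
    if not replaced:
        out.append(f"-l {address}")
    return "\n".join(out) + "\n"
```

For example, set_memcached_listen(text, "0.0.0.0") produces the all-interfaces binding that later shows up as 0.0.0.0:11211 in netstat.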


Thanks for pointing that out! daily.sh runs now. The only remaining issue is that I still can’t see any devices polled by the remote poller on the main server. Other than pointing the poller to the main server’s memcached instance and setting distributed_poller to true, is there anything else I need to do to get this set up?

Here’s the netstat output on the main install showing that the connection is established:

root@ip-172-31-62-96:/opt/librenms# netstat  -anp | grep 42217
tcp        0      0 0.0.0.0:42217           0.0.0.0:*               LISTEN      995/rrdcached       
tcp        0      0 172.31.62.96:42217      144.121.76.2:35180      ESTABLISHED 995/rrdcached       

and when I run daily.sh I get

root@ip-172-31-62-96:/opt/librenms# netstat  -anp | grep 11211
tcp        0      0 0.0.0.0:11211           0.0.0.0:*               LISTEN      884/memcached       
tcp        0     74 172.31.62.96:11211      144.121.76.2:57994      ESTABLISHED 884/memcached 

What poller_group did you set up on each poller?

On my remote poller I set $config['distributed_poller_group'] = '0';. On my main server I don’t have a setting for distributed_poller_group. Do I need the main server to have all poller groups set in config.php?

Not all, just the one you want. If you want the pollers to share the load, set it to group 0 like the other one.
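Conceptually, a poller only picks up devices whose poller group is in its own distributed_poller_group setting, and pollers in the same group share those devices. A minimal sketch in Python of that selection, assuming the setting may be a comma-separated list such as '0,2' and that a device with no explicit group counts as group 0; the device dicts and function names are hypothetical, not LibreNMS internals.

```python
def parse_groups(setting):
    """Turn a distributed_poller_group value such as '0' or '0,2' into a set."""
    return {int(part) for part in str(setting).split(",") if part.strip()}


def devices_for_poller(devices, group_setting):
    """Return hostnames a poller with this group setting would be eligible to poll."""
    groups = parse_groups(group_setting)
    # Assumption: devices without an explicit group default to group 0.
    return [d["hostname"] for d in devices if d.get("poller_group", 0) in groups]
```

With every device left in the default group 0, a poller set to '0' is eligible for all of them, which is why setting both pollers to group 0 lets them share the load.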

On the second poller, have you set up the cron jobs or the service?

I think so. I ran cp /opt/librenms/librenms.nonroot.cron /etc/cron.d/librenms on both installs. Is there a command I can run to confirm that the cron jobs are set up properly? validate.php still passes on both.
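One way to confirm the cron file is at least well-formed is to parse it the way cron does: in /etc/cron.d, each job line has five schedule fields, a user name, then the command. A minimal sketch in Python under that assumption; the EXPECTED script names are an assumption about what a polling node should run, so adjust them to the contents of your own librenms.nonroot.cron.

```python
def cron_commands(cron_text):
    """Return (user, command) pairs from an /etc/cron.d-style file.

    Only five-field schedules are handled; comments, blank lines and
    VAR=value lines (or @reboot-style shortcuts) are skipped.
    """
    entries = []
    for line in cron_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # e.g. SHELL=/bin/bash
        entries.append((fields[5], fields[6]))
    return entries


# Assumption: scripts a polling node is expected to run from cron.
EXPECTED = ("poller-wrapper.py", "discovery-wrapper.py", "daily.sh")


def missing_jobs(cron_text, user="librenms"):
    """Return expected script names that no job for `user` mentions."""
    commands = [cmd for who, cmd in cron_commands(cron_text) if who == user]
    return [name for name in EXPECTED if not any(name in cmd for cmd in commands)]
```

An empty result from missing_jobs(open("/etc/cron.d/librenms").read()) only shows the jobs are declared; whether cron actually ran them still has to be checked in the system log.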

For context, I was hoping to set up a central LibreNMS install on AWS and then have a remote distributed poller monitor each site where we have equipment. Should I set $config['distributed_poller_group'] = '0'; on the initial poller (AWS) as well?

Well, it depends on what you are going to do with it.

If you are not going to poll any devices, localhost included, you can disable polling/discovery on the main install.

By default, all devices are assigned to poller group 0. That way, only the remote poller will poll the devices assigned to group 0.

Sorry for the confusion: the main install isn’t going to poll any devices. Only the remote pollers are responsible for polling data. The main install’s purpose is to serve as a central place for me to monitor the status of the different sites.

According to the remote poller’s UI, it is polling 13 devices assigned to group 0, but none of this data shows up on the main install. If remote polling is set up correctly (i.e. the remote poller’s rrdcached and memcached settings point to the main install’s server), should I expect this data to populate the main install’s UI? If so, is there anything else I need to set up?

Yes, you should see the data if memcached and rrdcached are setup correctly.

Try manually running ./poller.php -d -h anydevice on the remote poller and check the output for errors or anything that looks wrong.
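The debug output is long, so it can help to filter it. A minimal sketch in Python, assuming the output was saved first (for example ./poller.php -d -h anydevice > poller-debug.txt 2>&1); it sets the repetitive "RRD warning: unused data sent" lines aside so they don't drown out anything else. The keyword list is an assumption, not an official list of LibreNMS error strings.

```python
import re

# Assumed keywords worth a closer look; whole words only, so names like
# ifInErrors_rate do not trigger a match.
SUSPICIOUS = re.compile(r"\b(error|exception|failed|could not|timed out)\b", re.IGNORECASE)
UNUSED_DATA = "RRD warning: unused data sent"


def triage_debug_output(text):
    """Split debug output into (suspicious lines, 'unused data sent' warnings)."""
    suspicious, unused = [], []
    for line in text.splitlines():
        if UNUSED_DATA in line:
            unused.append(line.strip())
        elif SUSPICIOUS.search(line):
            suspicious.append(line.strip())
    return suspicious, unused
```

Usage: suspicious, unused = triage_debug_output(open("poller-debug.txt").read()), then read through suspicious first.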

The output seems healthy. It’s quite long, but the only lines that seemed suspicious were these warnings:

bps(347.75 kbps/347.75 kbps)bytes(8.58 MB/8.58 MB)pkts(39.22 pps/39.22 pps)RRD warning: unused data sent ifInUcastPkts_rate  
RRD warning: unused data sent ifOutUcastPkts_rate  
RRD warning: unused data sent ifInErrors_rate  
RRD warning: unused data sent ifOutErrors_rate  
RRD warning: unused data sent ifInOctets_rate  
RRD warning: unused data sent ifOutOctets_rate  
RRD warning: unused data sent ifInBits_rate  
RRD warning: unused data sent ifOutBits_rate 

Otherwise, I see blocks that seem to indicate that it’s writing to RRD on the main server properly:

RRD[create /opt/librenms/rrd/100.64.0.11/poller-perf-mpls.rrd --step 300 DS:poller:GAUGE:600:0:U   RRA:AVERAGE:0.5:1:2016 RRA:AVERAGE:0.5:6:1440 RRA:AVERAGE:0.5:24:1440 RRA:AVERAGE:0.5:288:1440 RRA:MIN:0.5:1:2016 RRA:MIN:0.5:6:1440 RRA:MIN:0.5:24:1440 RRA:MIN:0.5:288:1440 RRA:MAX:0.5:1:2016 RRA:MAX:0.5:6:1440 RRA:MAX:0.5:24:1440 RRA:MAX:0.5:288:1440 RRA:LAST:0.5:1:2016 ]  
RRD[update 100.64.0.11/poller-perf-mpls.rrd N:0.00036501884460449 --daemon librenms.andrena.com:42217]  
RRD[create /opt/librenms/rrd/100.64.0.11/ping-perf.rrd --step 300 DS:ping:GAUGE:600:0:65535   RRA:AVERAGE:0.5:1:2016 RRA:AVERAGE:0.5:6:1440 RRA:AVERAGE:0.5:24:1440 RRA:AVERAGE:0.5:288:1440 RRA:MIN:0.5:1:2016 RRA:MIN:0.5:6:1440 RRA:MIN:0.5:24:1440 RRA:MIN:0.5:288:1440 RRA:MAX:0.5:1:2016 RRA:MAX:0.5:6:1440 RRA:MAX:0.5:24:1440 RRA:MAX:0.5:288:1440 RRA:LAST:0.5:1:2016 ]  
RRD[update 100.64.0.11/ping-perf.rrd N:0.04 --daemon librenms.andrena.com:42217]  
RRD[create /opt/librenms/rrd/100.64.0.11/poller-perf.rrd --step 300 DS:poller:GAUGE:600:0:U   RRA:AVERAGE:0.5:1:2016 RRA:AVERAGE:0.5:6:1440 RRA:AVERAGE:0.5:24:1440 RRA:AVERAGE:0.5:288:1440 RRA:MIN:0.5:1:2016 RRA:MIN:0.5:6:1440 RRA:MIN:0.5:24:1440 RRA:MIN:0.5:288:1440 RRA:MAX:0.5:1:2016 RRA:MAX:0.5:6:1440 RRA:MAX:0.5:24:1440 RRA:MAX:0.5:288:1440 RRA:LAST:0.5:1:2016 ]  
RRD[update 100.64.0.11/poller-perf.rrd N:1.775 --daemon librenms.andrena.com:42217] 

Are there any specific errors that I should be looking out for?

So far, I’ve confirmed that rrdcached and memcached are listening on 42217 and 11211 on the main server, and that the poller is able to connect. Is it possible it’s writing to the wrong location? My /opt/librenms/rrd folder on the main server is empty, even though the logs indicate that the poller should have written to it. Instead, all the data is going into the /opt/librenms/rrd directory on the remote poller.
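The RRD lines in the debug output already hint at this: each create line carries an absolute local path and no --daemon option, while the matching update line is sent with --daemon librenms.andrena.com:42217. A minimal sketch in Python that flags such creates, assuming the RRD[create ...] / RRD[update ...] debug format shown above; local_creates is a hypothetical helper.

```python
import re

# One RRD[...] debug entry per line: action, file name, then the rest.
RRD_CALL = re.compile(r"RRD\[(create|update) (\S+)(.*)\]")


def local_creates(debug_text):
    """Return RRD files the poller created without going through rrdcached.

    A 'create' with no --daemon option runs against the poller's own disk,
    even when the matching 'update' is sent to a remote rrdcached.
    """
    created_locally = []
    for match in RRD_CALL.finditer(debug_text):
        action, path, rest = match.groups()
        if action == "create" and "--daemon" not in rest:
            created_locally.append(path)
    return created_locally
```

Run against the output above, this lists /opt/librenms/rrd/100.64.0.11/poller-perf-mpls.rrd and the other files, which matches the files ending up on the remote poller instead of the main server.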

Another observation: if I shut down rrdcached on the main server and run ./poller.php -d -h on the remote poller, I don’t see any obvious errors, but the logs still show it trying to write to the main server.
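Since the poller output doesn't complain when rrdcached is down, it can help to query rrdcached directly from the poller. Per the rrdcached man page, the STATS command replies with a status line ("<n> Statistics follow") followed by n "Name: value" lines, and a negative status signals an error. A minimal sketch in Python, assuming the main server's rrdcached accepts STATS over TCP on 42217; the function names are hypothetical.

```python
import socket


def parse_stats(reply):
    """Parse an rrdcached STATS reply into {counter: int}."""
    lines = reply.splitlines()
    count = int(lines[0].split(" ", 1)[0])  # "<n> Statistics follow"
    if count < 0:
        raise RuntimeError(f"rrdcached error: {lines[0]}")
    stats = {}
    for line in lines[1:1 + count]:
        name, _, value = line.partition(":")
        stats[name.strip()] = int(value)
    return stats


def rrdcached_stats(host, port=42217, timeout=3.0):
    """Query a remote rrdcached over TCP; returns None if it cannot be reached."""
    buf = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"STATS\n")
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                buf += chunk
                lines = buf.decode("ascii", "replace").split("\n")
                if len(lines) > 1:
                    # Status line complete: stop once all n lines have arrived.
                    expected = max(int(lines[0].split(" ", 1)[0]), 0)
                    if len(lines) - 1 > expected:
                        break
    except OSError:
        return None
    if not buf:
        return None
    return parse_stats(buf.decode("ascii", "replace"))
```

Calling rrdcached_stats("librenms.andrena.com") before and after a poll and comparing UpdatesReceived shows whether updates are really arriving; None means the poller cannot reach it at all.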

My /etc/default/rrdcached on the main server is pasted below if it helps:

# /etc/default file for RRD cache daemon

# Full path to daemon
DAEMON=/usr/bin/rrdcached

# Optional override maximum write delay, in seconds.
WRITE_JITTER=1800

# Optional override number of write_threads
WRITE_THREADS=4
WRITE_TIMEOUT=1800
# Where database files are placed.  If left unset, the default /tmp will
# be used.  NB: The daemon will reject a directory that has symlinks as
# components.  NB: You may want to have -B in BASE_OPTS.
BASE_PATH=/opt/librenms/rrd/

# Where journal files are placed.  If left unset, journaling will
# be disabled.
JOURNAL_PATH=/var/lib/rrdcached/journal/

# FHS standard placement for process ID file.
PIDFILE=/run/rrdcached.pid

# FHS standard placement for local control socket.
SOCKFILE=/run/rrdcached.sock

# Optional override group that should own/access the local control
# socket
SOCKGROUP=librenms

# Optional override access mode of local control socket.
#SOCKMODE=0660

# Optional unprivileged group to run under when daemon.  If unset
# retains invocation group privileges.
DAEMON_GROUP=librenms

# Optional unprivileged user to run under when daemon.  If unset
# retains invocation user privileges.
DAEMON_USER=librenms

# Network socket address requests.  Use in conjunction with SOCKFILE to
# also listen on INET domain sockets.  The option is a lower-case ell
# ASCII 108 = 0x6c, and should be repeated for each address.  The
# parameter is an optional IP address, followed by an optional port with
# a colon separating it from the address.  The empty string is
# interpreted as "open sockets on the default port on all available
# interfaces", but generally does not pass through init script functions
# so use -L with no parameters for that configuration.
NETWORK_OPTIONS="-L"

# Any other options not specifically supported by the script (-P, -f,
# -F, -B).
OPTS="-R -j /var/lib/rrdcached/journal/ -F"
OPTS="$OPTS -b /opt/librenms/rrd -B"
OPTS="$OPTS -w 1800 -z 900"
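The -b /opt/librenms/rrd and -B options explain why the update lines use relative names such as 100.64.0.11/ping-perf.rrd: per the rrdcached man page, relative file names are resolved against the -b base directory, and -B only permits writes inside it. A minimal sketch in Python that mimics that mapping for illustration only, not rrdcached's actual code (for instance, symlink handling is ignored); resolve_rrd_path is a hypothetical helper.

```python
import posixpath


def resolve_rrd_path(base_dir, requested, restrict=True):
    """Mimic how rrdcached maps a client-supplied file name to a path.

    Relative names are joined to the -b base directory (absolute names are
    kept as-is); with restrict=True, like -B, anything outside base_dir is
    rejected.
    """
    base = posixpath.normpath(base_dir)
    path = posixpath.normpath(posixpath.join(base, requested))
    if restrict and not (path == base or path.startswith(base + "/")):
        raise PermissionError(f"{requested} resolves outside {base}")
    return path
```

So an update for 100.64.0.11/ping-perf.rrd lands in /opt/librenms/rrd/100.64.0.11/ping-perf.rrd on the main server, but only if the file was also created there.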

Also, unrelated, but I saw somewhere in the forums that the remote pollers need to connect to the same MySQL instance as the main server. Is this the case? I’ve now updated the remote poller to point to the main install’s MySQL. I think I might need to delete and re-add the devices so they get added to the central DB.

Oh. I think I found the issue.

It looks like it is creating RRDs locally and trying to update them remotely.

You have to set $config['rrdtool_version'] = '1.5.5';

Of course, replace 1.5.5 with your current version.

Check https://docs.librenms.org/Extensions/RRDCached/
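The version matters because, according to the RRDCached docs linked above, creating RRD files over rrdcached requires rrdtool/rrdcached 1.5.5 or newer; below that, creates happen on the local disk. A minimal sketch in Python of that comparison, assuming the version string looks like the first line rrdtool prints ("RRDtool 1.7.0  Copyright ..."); both helper names are hypothetical.

```python
import re


def parse_rrdtool_version(version_output):
    """Extract (major, minor, patch) from text like 'RRDtool 1.7.0  Copyright ...'."""
    match = re.search(r"RRDtool (\d+)\.(\d+)\.(\d+)", version_output)
    if not match:
        raise ValueError("no RRDtool version found")
    return tuple(int(part) for part in match.groups())


def creates_via_rrdcached(version):
    """Remote creates need 1.5.5+ (per the LibreNMS RRDCached docs)."""
    return tuple(version) >= (1, 5, 5)
```

Tuples compare element by element, so (1, 7, 0) is above (1, 5, 5) while (1, 4, 8) is not; a plain string comparison would get cases like "1.10.0" vs "1.5.5" wrong.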

I’ve updated config.php to include $config['rrdtool_version'] = '1.7.0'; and pointed the distributed poller to the same MySQL instance hosted by the main install, and it seems to be working. Thanks for all the help setting this up!

I’ve just got one last question: is it necessary to set all distributed pollers to point to the same db instance? I didn’t see any example config for that anywhere in the scaling librenms docs.

Yes. And it is documented (more or less). From the docs:

Database Server
MySQL / MariaDB - At the moment these are the only database servers that are supported.

The pollers, web and API layers should all be able to access the database server directly.

If you feel like it, improve the documentation by clicking the pencil icon in the upper right :wink:


Got it! I see that now, and it makes sense. I’ll find some time this week to add a short code snippet to the docs for configuring the DB on the pollers as well. Is it sufficient to set the DB settings in config.php, or do we also need to modify .env? I’m a bit new to LibreNMS and want to make sure I get the instructions right.

Yes, every poller must have a .env and a config.php with the database configuration.
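A common slip is leaving a poller's .env at DB_HOST=localhost, which silently points it at its own local database. A minimal sketch in Python that compares a poller's .env with the main install's, assuming the usual DB_HOST / DB_DATABASE keys and that a missing DB_HOST means localhost; the helper names are hypothetical. The main server's MySQL/MariaDB also has to listen on a non-loopback address and grant the LibreNMS user access from the pollers' IPs, which this check cannot see.

```python
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def parse_env(text):
    """Parse KEY=value lines from a .env file; comments and blanks are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop surrounding quotes
        values[key.strip()] = value
    return values


def poller_db_problems(main_env_text, poller_env_text):
    """Flag poller .env DB settings that would not reach the main install's DB."""
    main, poller = parse_env(main_env_text), parse_env(poller_env_text)
    problems = []
    host = poller.get("DB_HOST", "localhost")  # assumption: default is localhost
    if host in LOOPBACK:
        problems.append(f"DB_HOST={host} points at the poller, not the main server")
    if poller.get("DB_DATABASE") != main.get("DB_DATABASE"):
        problems.append("DB_DATABASE differs from the main install")
    return problems
```

DB_HOST is deliberately not compared directly: the main install can legitimately use localhost while each poller needs the main server's address.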