Gaps in Graphs on Server Only

Hello,

I have run LibreNMS on a test server for about a year without trouble. It has worked perfectly. We recently purchased a brand new server (32 cores, 128 GB of RAM) and installed a fresh copy of LibreNMS. It works well for all of the devices except for the LibreNMS server itself. The highest poll time for any of our devices is 175 seconds or so.

I am getting gaps in the graphs for the server itself. Every graph (processor, memory, storage, users logged in, context switches, etc.) gets 10-minute gaps at the same point. These gaps occur about every 2 hours on average. The poll time for the server itself is around 1.9 seconds. Max processor utilization is around 35%, with the average being 20%. I am monitoring 200 devices with a total of 15,000 ports. I am not running rrdcached: I tried, but the service won't start with the config recommended for LibreNMS, so I have disabled the service for now.

Any ideas what could be the problem? Are there any debug scripts I can run to diagnose the issue?
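One way to confirm exactly where the gaps fall is to dump the raw data from one of the server's RRD files and look for rows with no value. The sketch below is not part of LibreNMS: `find_gaps` is a hypothetical helper that reads `rrdtool fetch` output on stdin and prints the timestamp of every row containing `nan`. The file path in the usage comment is a placeholder, not a real file name.

```shell
# find_gaps: read `rrdtool fetch` output on stdin and print the epoch
# timestamp of every row in which at least one data source is unknown.
find_gaps() {
    awk '
        $1 ~ /^[0-9]+:$/ {                      # data rows start with "<epoch>:"
            for (i = 2; i <= NF; i++) {
                if (tolower($i) ~ /^-?nan$/) {  # unknown values print as nan or -nan
                    sub(/:$/, "", $1)           # strip the trailing colon
                    print $1
                    break
                }
            }
        }
    '
}

# Example usage (placeholder path; substitute a real RRD file):
#   rrdtool fetch /opt/librenms/rrd/HOSTNAME/FILE.rrd AVERAGE -s -6h | find_gaps
```

If the timestamps line up across several RRD files for the same device, that would suggest whole polls are being missed, not a problem with a single sensor.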

Here is the output of ./validate.php:

root@LibreNMS:/opt/librenms# ./validate.php

Component Version
LibreNMS 1.37-21-g6c3473a
DB Schema 239
PHP 7.0.25-0ubuntu0.16.04.1
MySQL 10.0.33-MariaDB-0ubuntu0.16.04.1
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.6.3
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
root@LibreNMS:/opt/librenms#

You really should get rrdcached running; it is going to make a big difference here. Also, when you tried to configure rrdcached, which docs did you use, and what error are you getting?

I followed the instructions from the LibreNMS Docs Page:

https://docs.librenms.org/#Extensions/RRDCached/

When I get to the point of restarting rrdcached.service, it gives:
Job for rrdcached.service failed because the control process exited with error code. See "systemctl status rrdcached.service" and "journalctl -xe" for details.

I then run systemctl status rrdcached.service, which gives:
root@LibreNMS:~# systemctl status rrdcached.service
● rrdcached.service - LSB: start or stop rrdcached
Loaded: loaded (/etc/init.d/rrdcached; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2018-03-07 15:09:45 PST; 4min 57s ago
Docs: man:systemd-sysv-generator(8)

Mar 07 15:09:45 LibreNMS rrdcached[1698]: [63B blob data]
Mar 07 15:09:45 LibreNMS rrdcached[1698]: [63B blob data]
Mar 07 15:09:45 LibreNMS rrdcached[1698]: [63B blob data]
Mar 07 15:09:45 LibreNMS rrdcached[1698]: [63B blob data]
Mar 07 15:09:45 LibreNMS rrdcached[1698]: [87B blob data]
Mar 07 15:09:45 LibreNMS rrdcached[1698]: * rrdcached FAILED
Mar 07 15:09:45 LibreNMS systemd[1]: rrdcached.service: Control process exited, code=exited status=2
Mar 07 15:09:45 LibreNMS systemd[1]: Failed to start LSB: start or stop rrdcached.
Mar 07 15:09:45 LibreNMS systemd[1]: rrdcached.service: Unit entered failed state.
Mar 07 15:09:45 LibreNMS systemd[1]: rrdcached.service: Failed with result 'exit-code'.
root@LibreNMS:~#
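The `[63B blob data]` lines are journald abbreviating messages that contain unprintable characters, so the actual error text from the init script is hidden in the status output above. A minimal sketch for reading it in full uses `journalctl --all`. The helper name `show_unit_log` and the `JOURNALCTL` override are illustrative only and not part of any tool.

```shell
# show_unit_log UNIT [LINES]: print recent journal entries for a unit,
# including messages journald would otherwise abbreviate as "blob data".
# JOURNALCTL may be overridden for a dry run; it defaults to journalctl.
show_unit_log() {
    unit=$1
    lines=${2:-50}
    "${JOURNALCTL:-journalctl}" -u "$unit" --all --no-pager -n "$lines"
}

# Example usage:
#   show_unit_log rrdcached.service
```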

As a test, I commented out all of the config file entries, and the service started successfully. So it would seem to be a problem in the config file, though I'm not sure what. I copied and pasted from the web page, so I wouldn't expect a typo.
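Since the service starts once the entries are commented out, one thing worth ruling out is the directories the config points at. The stock config file notes that the daemon rejects a base directory with symlinks among its components. The sketch below is only a guess at a useful check, not a confirmed cause: `check_rrd_path` is a hypothetical helper, it assumes GNU `readlink -f` is available, and it expects an absolute path without `..` or doubled slashes.

```shell
# check_rrd_path DIR: report problems that could stop rrdcached from using DIR.
# Prints one line describing the finding and returns non-zero on a problem.
check_rrd_path() {
    dir=${1%/}                       # drop one trailing slash for comparison
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    real=$(readlink -f "$dir")       # canonical path with symlinks resolved
    if [ "$real" != "$dir" ]; then
        echo "symlink in path: $dir -> $real"
        return 1
    fi
    echo "ok: $dir"
}

# Example usage with the paths from the recommended config:
#   check_rrd_path /opt/librenms/rrd/
#   check_rrd_path /var/lib/rrdcached/journal/
```

Ownership of the journal directory is a separate check; `ls -ld` on each path shows whether it belongs to the `librenms` user.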

These are the instructions I followed (seems pretty straightforward):

RRDCached installation Ubuntu 16

Install rrdcached
sudo apt-get install rrdcached

Edit /etc/default/rrdcached to include:

DAEMON=/usr/bin/rrdcached
DAEMON_USER=librenms
DAEMON_GROUP=librenms
WRITE_THREADS=4
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
BASE_PATH=/opt/librenms/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms
BASE_OPTIONS="-B -F -R"

Fix permissions
chown librenms:librenms /var/lib/rrdcached/journal/

Restart the rrdcached service
systemctl restart rrdcached.service

Edit /opt/librenms/config.php to include:
$config['rrdcached'] = "unix:/var/run/rrdcached.sock";

These are my config file contents. (I'm not sure how to get the page to display a # instead of large bold print.)

# /etc/default file for RRD cache daemon

# Full path to daemon
DAEMON=/usr/bin/rrdcached
DAEMON_USER=librenms
DAEMON_GROUP=librenms
WRITE_THREADS=4
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
BASE_PATH=/opt/librenms/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms

BASE_OPTIONS="-B -F -R"

# Optional override flush interval, in seconds.
#WRITE_TIMEOUT=300

# Optional override maximum write delay, in seconds.
#WRITE_JITTER=0

# Optional override number of write_threads
#WRITE_THREADS=4

# Where database files are placed. If left unset, the default /tmp will
# be used. NB: The daemon will reject a directory that has symlinks as
# components. NB: You may want to have -B in BASE_OPTS.
#BASE_PATH=/var/lib/rrdcached/db/

# Where journal files are placed. If left unset, journaling will
# be disabled.
#JOURNAL_PATH=/var/lib/rrdcached/journal/

# FHS standard placement for process ID file.
#PIDFILE=/var/run/rrdcached.pid

# FHS standard placement for local control socket.
#SOCKFILE=/var/run/rrdcached.sock

# Optional override group that should own/access the local control
# socket
#SOCKGROUP=root

# Optional override access mode of local control socket.
SOCKMODE=0660

# Optional unprivileged group to run under when daemon. If unset
# retains invocation group privileges.
#DAEMON_GROUP=_rrdcached

# Optional unprivileged user to run under when daemon. If unset
# retains invocation user privileges.
#DAEMON_USER=_rrdcached

# Network socket address requests. Use in conjunction with SOCKFILE to
# also listen on INET domain sockets. The option is a lower-case ell
# ASCII 108 = 0x6c, and should be repeated for each address. The
# parameter is an optional IP address, followed by an optional port with
# a colon separating it from the address. The empty string is
# interpreted as "open sockets on the default port on all available
# interfaces", but generally does not pass through init script functions
# so use -L with no parameters for that configuration.
#NETWORK_OPTIONS="-L"

# Any other options not specifically supported by the script (-P, -f,
# -F, -B).
#BASE_OPTIONS="-B"

Please format code / config between ``` ```

It looks like you’ve got a mix of init.d and systemd config going on there. Maybe check for an rrdcached file in /etc/init.d/
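The `Loaded:` line in the status output earlier in the thread supports this: the path in parentheses is `/etc/init.d/rrdcached`, which means systemd generated the unit from a SysV init script and is not using a native unit file. That alone does not prove it is the cause of the failure. A rough sketch of the same check, where `unit_source` is a hypothetical helper that reads `systemctl status` output on stdin:

```shell
# unit_source: read `systemctl status` output on stdin and report whether
# the unit was loaded from a SysV init script or from another file.
unit_source() {
    # Extract the first field inside the parentheses on the "Loaded:" line.
    sed -n 's/^[[:space:]]*Loaded: [a-z-]* (\([^;)]*\).*/\1/p' |
    while read -r path; do
        case $path in
            /etc/init.d/*) echo "sysv init script: $path" ;;
            /*)            echo "native unit file: $path" ;;
            *)             echo "unknown: $path" ;;
        esac
    done
}

# Example usage:
#   systemctl status rrdcached.service | unit_source
```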

I ended up rebuilding the server. It was easier and faster to spin up a new VM than to troubleshoot it further.

The new server runs rrdcached perfectly. I followed the same https://docs.librenms.org/#Extensions/RRDCached/ page and had no problems at all. I have found that Linux does this sometimes: sometimes following the directions works, sometimes not. YMMV.

Thanks for your help and consideration.

Peter