Multiple errors in ./validate.php

Hi @kalamchi75

I tried to check for the php-memcached equivalent for CentOS and tried all the methods. No matter what module I specify, I am not able to get rid of the message stating that memcached needs to be installed. Also, @murrant mentioned in a chat that the memcached error is related to a Python module and not the service. I tried to install the package with “yum install php73-php-pecl-memcached”, in accordance with my PHP version 7.3.xxx.

Below is the output of the nmap command that you asked me to execute:

[root@dc4up-vlibrenms02 ~]# nmap -p 42217 10.69.176.104
Starting Nmap 7.70 ( https://nmap.org ) at 2021-09-16 13:53 UTC
Nmap scan report for DC5UP-vLibreNMS01.example.com (10.69.176.104)
Host is up (0.17s latency).

PORT STATE SERVICE
42217/tcp closed unknown

Nmap done: 1 IP address (1 host up) scanned in 0.68 seconds
[root@dc4up-vlibrenms02 ~]#

I have disabled the firewalld service on my CentOS instance and there are no other physical FWs between the central server and the poller as this is all on the internal network.

I have opened another thread seeking help with the database errors on the poller.

Thanks,

Santosh Kotla

Hi Santosh,

Please see below the nmap scan of port 42217 from my remote poller to the master (where the rrdcached service is running).

As you can see, the port is open.
You need to find out why the port is closed in your case. This means your poller will not be able to connect to your master rrdcached instance since it does not see an open/listening port.

Did you re-enable rrdcached in your master’s config file?
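
A note on reading the nmap result: "closed" normally means the host answered the probe but nothing is listening on that port, whereas "filtered" usually points at a firewall. Here is a minimal sketch for pulling the state out of saved nmap output. The function names `nmap_port_state` and `explain_port_state` are made up for illustration and are not part of nmap or LibreNMS.

```shell
# Print the state nmap reported for a TCP port.
# Reads nmap output on stdin; prints nothing if the port is not listed.
nmap_port_state() {
    awk -v p="$1/tcp" '$1 == p { print $2 }'
}

# Turn the state into a hint about where to look next.
explain_port_state() {
    case "$1" in
        open)     echo "a service is listening" ;;
        closed)   echo "host reachable, but nothing is listening on the port" ;;
        filtered) echo "no reply, a firewall is probably dropping the probe" ;;
        *)        echo "unknown state" ;;
    esac
}

# Example with the two relevant lines from the scan posted above.
state=$(printf 'PORT STATE SERVICE\n42217/tcp closed unknown\n' | nmap_port_state 42217)
echo "42217/tcp is $state: $(explain_port_state "$state")"
```

If the state is "closed", running something like `ss -ltn` on the master and looking for 42217 should show whether rrdcached has a TCP listener at all.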

Hi @kalamchi75

I am using this in the config file at my master:
$config["rrdcached"] = "unix:/var/run/rrdcached/rrdcached.sock";

When I specify either the hostname or the IP address, it breaks the graphs with a message stating that it is not able to connect to rrdcached.

Also, I have double-checked that there is no firewall on the central server side.

Thanks,
Santosh Kotla

OK, back to rrdcached.

I noticed that the output of the rrdcached service status (which you sent a few days ago) points to /var/tmp:

root@dc5up-vlibrenms01:~# systemctl status rrdcached
● rrdcached.service - Data caching daemon for rrdtool
Loaded: loaded (/etc/systemd/system/rrdcached.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-09-09 09:49:24 UTC; 33s ago
Process: 29833 ExecStart=/usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G  librenms -B -R -j /
Main PID: 29836 (rrdcached)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/rrdcached.service
└─29836 /usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j  /var/tmp

Sep 09 09:49:24 dc5up-vlibrenms01 systemd[1]: Starting Data caching daemon for rrdtool…
Sep 09 09:49:24 dc5up-vlibrenms01 systemd[1]: Started Data caching daemon for rrdtool.

Nowhere does it point to the /opt/librenms/rrd directory, and I think that’s why it keeps saying “Could not read RRD file”.

See my service status below:

Now, make a copy of your current rrdcached config file (as .BAK, so you can revert if things go wrong),
then replace the options part at the end of the configuration with the lines below:

# Any other options not specifically supported by the script (-P, -f,
# -F, -B).
BASE_OPTIONS="-B -F -R"
OPTS="-l 0:42217"
OPTS="$OPTS -R -j /var/lib/rrdcached/journal/ -F"
OPTS="$OPTS -b /opt/librenms/rrd -B"
OPTS="$OPTS -w 1800 -z 900"

Restart rrdcached and see if the status output points to the correct directory. If it does, enable it again in your LibreNMS configuration.

Hi @kalamchi75

I see that the number of errors seems to have reduced now. But I still get the error, and /var/tmp has not disappeared, although /opt/librenms/rrd has been added to the output.

I did a restart before taking this snippet.

Thanks,
Santosh Kotla

Hi @kalamchi75
:frowning: It’s back.

root@dc5up-vlibrenms01:~# systemctl status rrdcached
● rrdcached.service - Data caching daemon for rrdtool
Loaded: loaded (/etc/systemd/system/rrdcached.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-09-17 10:25:36 UTC; 5min ago
Process: 3197 ExecStart=/usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/ (code=exited, status=0/SUCCESS)
Main PID: 3198 (rrdcached)
Tasks: 614 (limit: 4915)
CGroup: /system.slice/rrdcached.service
└─3198 /usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/

Sep 17 10:29:58 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:29:58 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:00 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:00 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:03 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:03 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:08 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:08 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:10 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.
Sep 17 10:30:10 dc5up-vlibrenms01 rrdcached[3198]: handle_request_update: Could not read RRD file.

Allow permissions for ALL on your /var/tmp and see what happens:

chmod 777 /var/tmp

Then restart rrdcached again

I have no idea why or from where it points to /var/tmp; I didn’t find it in your shared configuration.
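
One possible lead: the "Loaded:" line in the status output names /etc/systemd/system/rrdcached.service, which suggests the flags come from that unit file's ExecStart line rather than from the shared config file. `systemctl cat rrdcached` prints the unit so this can be checked. Below is a small sketch for picking individual flag values out of such a command line. The helper name `rrdcached_opt` is hypothetical, and it assumes no argument contains spaces.

```shell
# Print every value that follows FLAG in a command line.
# Usage: rrdcached_opt FLAG "full command line"
rrdcached_opt() {
    printf '%s\n' "$2" | awk -v flag="$1" '
        { for (i = 1; i < NF; i++) if ($i == flag) print $(i + 1) }'
}

# Command line copied from the status output earlier in this thread.
cmd='/usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/'

echo "journal dir: $(rrdcached_opt -j "$cmd")"
echo "base dir:    $(rrdcached_opt -b "$cmd")"
echo "listen on:   $(rrdcached_opt -l "$cmd")"
```

If the journal directory printed here is /var/tmp, then the place to change it would be wherever that ExecStart line is defined.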

Hi @kalamchi75

I had to revert the changes because it broke the GUI and I got the “Whoops, something went wrong” page…

I will try this change that you suggested as well.

Thanks,
Santosh Kotla

Hi @kalamchi75

No luck sir. Still the same issue. Even I don’t understand how it is getting loaded from init.d for you and from a different directory for me.

Thanks,
Santosh Kotla

Do you use VMware or any other virtual machine host?
If yes,
create a new LibreNMS server on Ubuntu, with a clean install of everything. Install rrdcached and memcached, add a couple of hosts, and check the graphs. Then integrate your remote poller with it.
If that proves to be working, add all your machines and keep it running in tandem with the old server until you have enough graph history; then you can decommission the old one.

Hi @kalamchi75

The existing LibreNMS I have is on a virtual instance. We have a lot of things integrated with this server, and that’s why it is difficult to get rid of this one. I have another instance where I have deployed the poller, and I am trying to get it to talk to this one. But that is on CentOS, as the server team within the company can help us with any issues that we run into with CentOS.

Even I am trying to dig into why this is going to /var/tmp. Even when I installed it on my poller, it shows the same. Not sure what changed. This wasn’t there before the reboot. :frowning:

I have a raspberrypi at home. Let me try to do an install on that and see how it works.

Unfortunately, I only use CentOS when I’m forced to.
Otherwise I usually install on Ubuntu (including all our LNMS servers).
There might be differences in how/where rrdcached runs between CentOS and Ubuntu; I’m not sure.

Try to search for issues of rrdcached and CentOS. You might find some solutions and explanations.

Hi @kalamchi75

All of a sudden my rrdcached seems to have completely broken. I tried to uncomment and recomment and then restart the services but that didn’t seem to help. I just hit the system for a reboot.

Will keep you posted.

Thanks,
Santosh Kotla

Hi @kalamchi75

I just went thoroughly through the rrdcached config file and I saw that there were two command lines missing:

DAEMON=/usr/bin/rrdcached
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
WRITE_THREADS=4
BASE_PATH=/opt/librenms/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/var/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms
DAEMON_GROUP=librenms
DAEMON_USER=librenms
BASE_OPTIONS="-B -F -R"

I put that in and now it is breaking my GUI… It’s super slow again. CPUs all below threshold and nothing choking anywhere.

Any suggestions from your side Ali?

Thanks,
Santosh Kotla

Hi Santosh,

Is that the complete rrdcached config?
And what does your rrdcached status show now?

@kalamchi75

No Sir, this is what is there right now:

DAEMON=/usr/bin/rrdcached
DAEMON_USER=librenms
DAEMON_GROUP=librenms
WRITE_THREADS=4
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
BASE_PATH=/opt/librenms/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms
DAEMON_GROUP=librenms
DAEMON_USER=librenms
BASE_OPTIONS="-B -F -R"

BASE_OPTIONS="-l 0:42217"
BASE_OPTIONS="$BASE_OPTIONS -R -j /var/lib/rrdcached/journal/ -F"
BASE_OPTIONS="$BASE_OPTIONS -b /opt/librenms/rrd -B"
BASE_OPTIONS="$BASE_OPTIONS -w 1800 -z 900"
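
One thing worth noting about the file above, assuming it is sourced by a shell script as files of this kind usually are: a plain assignment replaces the previous value, so the earlier `BASE_OPTIONS="-B -F -R"` line is discarded by the `-l 0:42217` line. A minimal sketch using only the assignments from the posted config:

```shell
# Assignments copied from the config above.
BASE_OPTIONS="-B -F -R"

# A plain assignment replaces the old value, so "-B -F -R" is dropped here.
BASE_OPTIONS="-l 0:42217"

# These lines append, because they expand the current value first.
BASE_OPTIONS="$BASE_OPTIONS -R -j /var/lib/rrdcached/journal/ -F"
BASE_OPTIONS="$BASE_OPTIONS -b /opt/librenms/rrd -B"
BASE_OPTIONS="$BASE_OPTIONS -w 1800 -z 900"

echo "$BASE_OPTIONS"
# prints: -l 0:42217 -R -j /var/lib/rrdcached/journal/ -F -b /opt/librenms/rrd -B -w 1800 -z 900
```

The flags end up the same here only because -B, -F and -R are appended again later. Whether rrdcached sees any of this depends on how the service is started; if the systemd unit hard-codes its own flags in ExecStart, this file may have no effect on the running daemon.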

Thanks,
Santosh Kotla

Hi @kalamchi75

I have one observation however:

If you take a look at my rrdcached status:

root@dc5up-vlibrenms01:~# systemctl status rrdcached
● rrdcached.service - Data caching daemon for rrdtool
Loaded: loaded (/etc/systemd/system/rrdcached.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-09-17 16:01:45 UTC; 2min 26s ago
Process: 1635 ExecStart=/usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/r
Main PID: 1636 (rrdcached)
Tasks: 7 (limit: 4915)
CGroup: /system.slice/rrdcached.service
└─1636 /usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/

You see that it is pointing to /var/run/rrdcached/rrdcached.sock.

But in my rrdcached config file, all of them are pointing to /var/run/rrdcached.sock. This is the same statement as per the document that needs to be put in the config file.

$config['rrdcached'] = "unix:/var/run/rrdcached.sock";

I have tried to change the rrdcached statement in the config file to the hostname:42217 and even IP:42217 but none of that works. It just breaks the rrdcached.

As of now, rrdcached is still broken, ever since I put DAEMON_GROUP and DAEMON_USER in the rrdcached config file.

What are your thoughts on this one sir?

Thanks,
Santosh Kotla

Hi @kalamchi75

I have changed the /etc/systemd/system/rrdcached.service file to point it to the right destination for the sock file. That solved the problem and my server is able to communicate to rrdcached and plot graphs.

But putting in localhost:42217 or the hostname/IP:42217 is still a problem. Are you sure we need to do that on the central server? Or is it something that needs to be configured on the poller?
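
A possible reading of the outputs in this thread: the running rrdcached command line only has `-l unix:/var/run/rrdcached/rrdcached.sock`, so the daemon has no TCP listener, which would match both the "closed" nmap result and the failure when the config points at hostname:42217. The sketch below checks a command line for a TCP listener. The function name `has_tcp_listener` is made up for illustration, and adding a second `-l 0:42217` is an untested assumption, not a confirmed fix.

```shell
# Succeed if the command line has at least one -l address that is not a
# UNIX socket (addresses starting with "unix:" or "/" are socket paths).
has_tcp_listener() {
    printf '%s\n' "$1" | awk '
        { for (i = 1; i < NF; i++)
            if ($i == "-l" && $(i + 1) !~ /^(unix:|\/)/) found = 1 }
        END { exit !found }'
}

# Command line copied from the latest status output in this thread.
cmd='/usr/bin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /opt/librenms/rrd/'

if has_tcp_listener "$cmd"; then
    echo "TCP listener configured"
else
    echo "UNIX socket only, remote pollers cannot connect"
fi
```

If that reading is right, the central server could keep using the socket path in its own config, while the daemon would additionally need a TCP `-l` address for the poller to reach it.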

Thanks,
Santosh Kotla

Hi @kalamchi75

Any suggestions from your side, sir?

Thanks,
Santosh Kotla

Hi Santosh,

What is the problem you see now?