Sure Thing @kalamchi75
The GUI seems to be working just fine now. No issues or sluggishness seen so far. The CPU is still above 85%, though.
I will send you the recent graphs tomorrow morning. Thanks for all your time and help today Sir.
Thanks,
Santosh Kotla
You are most welcome, sir,
and I apologize for my limited knowledge.
The more we troubleshoot, the more we learn.
Hi @kalamchi75
So far, the graphs are looking smooth. However, the memory seems to be rising, and I am not sure whether the lighter part of the graph is actually going to affect system performance.
CPU:
Memory:
HTOP:
So, from your observation, if you can confirm that the parameters are below threshold (except for the CPU, which I think will come down after I offload some of the devices to the poller), how do you suggest we start troubleshooting rrdcached?
Thanks,
Santosh Kotla
Hi @kalamchi75
For a start, I have ensured that the rrdcached files are owned by librenms, using the command from the install guide, and then enabled the rrdcached setting in the config, pointing it to the /var/run/rrdcached directory. I ran validate and see no issues.
This is what the status of rrdcached looks like:
Will observe the system for a few minutes and keep an eye on the graphs and the htop as well.
Thanks,
Santosh Kotla
Hi @kalamchi75
Bad luck sir
Looks like rrdcached is still not happy:
Any suggestions from your side?
Thanks,
Santosh Kotla
Hi Santosh,
I think the memory is fine.
The CPU still looks loaded, though, but perhaps, as you said, offloading some machines to the remote pollers might bring it down.
Now, rrdcached: I am not sure why it is complaining that it can't read RRD files.
Can you confirm which user owns the directory /opt/librenms/rrd and all its files and sub directories?
Hi @kalamchi75
I have run the command from the install documentation again to make sure that the files and subfolders are owned by librenms. Is there a way I can check their current ownership?
On a second note, I see that rrdcached is not happy, but it looks like my graphs are still plotting! I don't know what that means…
Thanks,
Santosh Kotla
You can simply do
cd /opt/librenms
ls -lah
This will show you the directories/files and their ownership in the librenms directory
you can then
cd rrd
ls -lah
This will show you the files inside the rrd directory and their ownership.
Then send me the output of the following again:
systemctl status rrdcached
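If the full listing is too long to read or share, one option is to print only the entries that are not owned by the expected user. This is a minimal sketch, assuming a find that supports `-user` and the /opt/librenms/rrd path and librenms user mentioned above; the function name `list_not_owned` is just for illustration.

```shell
#!/bin/sh
# Print every file or directory under DIR that is NOT owned by OWNER.
# OWNER may be a user name or a numeric UID.
# Empty output means everything under DIR belongs to OWNER.
list_not_owned() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Only run against the real tree when it exists on this machine.
if [ -d /opt/librenms/rrd ]; then
    list_not_owned /opt/librenms/rrd librenms
fi
```

If this prints nothing, ownership is probably not the cause of the "Could not read RRD file" messages, and the short (empty) result is easy to share.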
Hi @kalamchi75
I double-checked, and ALL the files are owned by librenms. The output is quite big, so I don't know how to send it to you. If you can point me to a site where I can create a temporary file and share the link, I will send it across.
This is the status of rrdcached:
Thanks,
Santosh Kotla
That's fine. If you can see that the files there are owned by librenms:librenms, that's enough; you don't have to send it.
Please resend the status showing the full lines (after -B -R …)
and send me your rrdcached configuration. I will compare it against mine. You can mask any passwords or IP addresses.
Hi @kalamchi75
This is the rrdcached config file:
DAEMON=/usr/bin/rrdcached
DAEMON_USER=librenms
DAEMON_GROUP=librenms
WRITE_THREADS=4
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
BASE_PATH=/opt/librenms/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/run/rrdcached.pid
SOCKFILE=/run/rrdcached.sock
SOCKGROUP=librenms
BASE_OPTIONS="-B -F -R"
BASE_OPTIONS="-l 0:42217"
BASE_OPTIONS="$BASE_OPTIONS -R -j /var/lib/rrdcached/journal/ -F"
BASE_OPTIONS="$BASE_OPTIONS -b /opt/librenms/rrd -B"
BASE_OPTIONS="$BASE_OPTIONS -w 1800 -z 900"
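One thing worth checking in a file like this: if it is read as shell (an assumption; some setups load it differently), each plain `BASE_OPTIONS="..."` assignment replaces the previous value, and only lines that reference `$BASE_OPTIONS` append to it. The sketch below replays the five assignments from the config above and prints the option string that would result.

```shell
#!/bin/sh
# Replay the BASE_OPTIONS lines exactly as they appear in the posted config.
BASE_OPTIONS="-B -F -R"            # overwritten by the next line
BASE_OPTIONS="-l 0:42217"          # plain assignment: starts over
BASE_OPTIONS="$BASE_OPTIONS -R -j /var/lib/rrdcached/journal/ -F"
BASE_OPTIONS="$BASE_OPTIONS -b /opt/librenms/rrd -B"
BASE_OPTIONS="$BASE_OPTIONS -w 1800 -z 900"

# Show the final, effective option string.
echo "$BASE_OPTIONS"
```

Here the result still contains -B, -F and -R because later lines add them back, so the overwrite looks harmless in this case, but the first line has no effect and could be dropped.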
Output of the requested rrdcached status lines:
Sep 15 10:45:53 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:45:53 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:03 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:03 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:03 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:03 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:04 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:04 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:04 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Sep 15 10:46:04 dc5up-vlibrenms01 rrdcached[12342]: handle_request_update: Could not read RRD file.
Thanks,
Santosh Kotla
Hi @kalamchi75
Going by the graphs, which still look good even with rrdcached in effect, I went ahead and did the basic configuration on the poller. It comes up as a poller, but I don't see it as part of the cluster yet.
I have installed rrdcached and memcached on the remote poller. It will be polling for group 1 and the central server is polling for group 0.
Thanks,
Santosh Kotla
I don't think you need rrdcached or memcached on the remote poller. It should connect to the rrdcached and memcached of the master server.
I'm not sure what you mean by cluster?
Hi @kalamchi75
Alrighty. I have installed them, but if you want me to remove them, that's possible, or I can just stop the processes.
By cluster I mean that it doesn't show up in the poller cluster.
The main central server doesn't show many issues with the database, but the poller complains about it and asks me to run commands. I just found out that rrdcached and memcached are not able to talk to the main cluster.
[librenms@dc4up-vlibrenms02 ~]$ ./validate.php
Component | Version
LibreNMS | 21.8.0-55-gf3fa2ce1e
DB Schema | 2021_25_01_0129_isis_adjacencies_nullable (217)
PHP | 7.3.20
Python | 3.6.8
MySQL | 10.5.12-MariaDB-1:10.5.12+maria~bionic
RRDTool | 1.7.0
SNMP | NET-SNMP 5.8
====================================
[OK] Composer Version: 2.1.7
[OK] Dependencies up-to-date.
[OK] Database connection successful
[FAIL] Time between this server and the mysql database is off
Mysql time 2021-09-15 12:09:23
PHP time 2021-09-15 12:06:37
[FAIL] Database: incorrect column (config/config_value)
[FAIL] Database: incorrect column (isis_adjacencies/port_id)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighSysType)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighSysID)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighPriority)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjLastUpTime)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjAreaAddress)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjIPAddrType)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjIPAddrAddress)
[FAIL] Database: missing column (isis_adjacencies/isisCircAdminState)
[FAIL] Database: extra column (ports/ifHighSpeed)
[FAIL] Database: extra column (ports/ifHighSpeed_prev)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (LibreNMS) or the community site (Report database schema issues here - LibreNMS Community):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
ALTER TABLE config CHANGE config_value config_value mediumtext NOT NULL;
ALTER TABLE isis_adjacencies CHANGE port_id port_id int NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighSysType isisISAdjNeighSysType varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighSysID isisISAdjNeighSysID varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighPriority isisISAdjNeighPriority varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjLastUpTime isisISAdjLastUpTime bigint unsigned NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjAreaAddress isisISAdjAreaAddress varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjIPAddrType isisISAdjIPAddrType varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjIPAddrAddress isisISAdjIPAddrAddress varchar(128) NULL;
ALTER TABLE isis_adjacencies ADD isisCircAdminState varchar(16) NOT NULL DEFAULT 'off' AFTER isisISAdjIPAddrAddress;
ALTER TABLE ports DROP ifHighSpeed;
ALTER TABLE ports DROP ifHighSpeed_prev;
[INFO] Detected Dispatcher Service
[FAIL] Dispatcher service is enabled on your cluster, but not in use on this node
[FAIL] Missing PHP extension: memcached
[FIX]:
Please install memcached
[FAIL] Cannot connect to rrdcached instance
[FAIL] Cannot connect to rrdcached instance
Thanks,
Santosh Kotla
Run:
su librenms
git pull
./daily.sh
Once done, run
./validate.php
again. Then we need to fix the following:
[FAIL] Missing PHP extension: memcached
[FIX]:
Please install memcached
[FAIL] Cannot connect to rrdcached instance
[FAIL] Cannot connect to rrdcached instance
Install the PHP extension; then, in your LibreNMS config file, you need to point your poller at memcached and rrdcached on the main server.
Please show me how you enable rrdcached and memcached in the configuration files on both the master and the poller.
Hi @kalamchi75
Sure. I will run all of that and send you the config snippets of both the master and the poller.
./daily.sh seems to be taking a while at the DB cleanup step on the poller.
Thanks,
Santosh Kotla
Yep, it takes its sweet time. But that's normal; give it the time it needs.
Hi @kalamchi75
This is what I see after running the daily.sh and the validate.
[root@dc4up-vlibrenms02 ~]# su - librenms
Last login: Wed Sep 15 12:06:22 UTC 2021 on pts/0
[librenms@dc4up-vlibrenms02 ~]$ ./daily.sh
Updating to latest codebase OK
Updating Composer packages OK
Updating SQL-Schema OK
Updating submodules OK
Cleaning up DB OK
Fetching notifications OK
Caching PeeringDB data OK
Caching Mac OUI data OK
[librenms@dc4up-vlibrenms02 ~]$ ./validate.php
Component | Version
LibreNMS | 21.8.0-55-gf3fa2ce1e
DB Schema | 2021_25_01_0129_isis_adjacencies_nullable (217)
PHP | 7.3.20
Python | 3.6.8
MySQL | 10.5.12-MariaDB-1:10.5.12+maria~bionic
RRDTool | 1.7.0
SNMP | NET-SNMP 5.8
====================================
[OK] Composer Version: 2.1.7
[OK] Dependencies up-to-date.
[OK] Database connection successful
[FAIL] Time between this server and the mysql database is off
Mysql time 2021-09-15 15:54:22
PHP time 2021-09-15 15:51:36
[FAIL] Database: incorrect column (config/config_value)
[FAIL] Database: incorrect column (isis_adjacencies/port_id)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighSysType)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighSysID)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjNeighPriority)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjLastUpTime)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjAreaAddress)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjIPAddrType)
[FAIL] Database: incorrect column (isis_adjacencies/isisISAdjIPAddrAddress)
[FAIL] Database: missing column (isis_adjacencies/isisCircAdminState)
[FAIL] Database: extra column (ports/ifHighSpeed)
[FAIL] Database: extra column (ports/ifHighSpeed_prev)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (https://t.libren.ms/discord) or the community site (https://t.libren.ms/5gscd):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
ALTER TABLE config CHANGE config_value config_value mediumtext NOT NULL;
ALTER TABLE isis_adjacencies CHANGE port_id port_id int NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighSysType isisISAdjNeighSysType varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighSysID isisISAdjNeighSysID varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjNeighPriority isisISAdjNeighPriority varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjLastUpTime isisISAdjLastUpTime bigint unsigned NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjAreaAddress isisISAdjAreaAddress varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjIPAddrType isisISAdjIPAddrType varchar(128) NULL;
ALTER TABLE isis_adjacencies CHANGE isisISAdjIPAddrAddress isisISAdjIPAddrAddress varchar(128) NULL;
ALTER TABLE isis_adjacencies ADD isisCircAdminState varchar(16) NOT NULL DEFAULT 'off' AFTER isisISAdjIPAddrAddress;
ALTER TABLE ports DROP ifHighSpeed;
ALTER TABLE ports DROP ifHighSpeed_prev;
[INFO] Detected Dispatcher Service
[FAIL] Dispatcher service is enabled on your cluster, but not in use on this node
[FAIL] Missing PHP extension: memcached
[FIX]:
Please install memcached
[FAIL] Cannot connect to rrdcached instance
[FAIL] Cannot connect to rrdcached instance
Following are the snippets from my config file for rrdcached and distributed polling. Please let me know if you want to take a look at the other parts of the config as well.
Poller:
// Distributed Poller-Settings
$config['distributed_poller'] = true;
// optional: defaults to hostname
$config['distributed_poller_name'] = php_uname('n');
$config['distributed_poller_group'] = '1';
$config['rrdcached'] = 'dc5up-vlibrenms01.example.com:42217';
#$config['rrdtool_version'] = '1.7.0';
$config['distributed_poller_memcached_host'] = 'dc5up-vlibrenms01.example.com';
$config['distributed_poller_memcached_port'] = 11211;
Central Server:
// Distributed Poller-Settings
$config['distributed_poller'] = true;
// optional: defaults to hostname
$config['distributed_poller_name'] = php_uname('n');
$config['distributed_poller_group'] = '0';
#$config['rrdcached'] = 'dc5up-vlibrenms01.example.com:42217';
$config['distributed_poller_memcached_host'] = 'dc5up-vlibrenms01.example.com';
$config['distributed_poller_memcached_port'] = 11211;
Thanks,
Santosh Kotla
Hi @kalamchi75
The memcached service is running as per the service status:
[root@dc4up-vlibrenms02 ~]# service memcached status
Redirecting to /bin/systemctl status memcached.service
● memcached.service - memcached daemon
Loaded: loaded (/usr/lib/systemd/system/memcached.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2021-09-15 12:06:17 UTC; 16h ago
Main PID: 1269403 (memcached)
Tasks: 10 (limit: 76498)
Memory: 3.6M
CGroup: /system.slice/memcached.service
└─1269403 /usr/bin/memcached -p 11211 -u memcached -m 64 -c 1024 -l 127.0.0.1,::1
Sep 15 12:06:17 dc4up-vlibrenms02 systemd[1]: Started memcached daemon.
But it still asks me to install memcached.
Any thoughts on this one Sir?
Thanks,
Santosh Kotla
Good morning Santosh,
You can call me Ali by the way
Here is what I think:
Your remote poller is unable to connect to your master's rrdcached service. Check for any firewall that might be blocking access. On your remote poller, run:
nmap -p 42217 <YOUR MASTER SERVER IP>
and see what the status of the port is.
Run the same command against port 11211 as well.
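If nmap is not installed on the poller, a rough equivalent can be done with bash alone. This is only a sketch: it assumes bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available, and `MASTER_HOST` is a placeholder variable you would set to your master server's address. The function name `check_port` is made up for illustration.

```shell
#!/bin/sh
# Report whether a TCP port accepts connections, using bash's /dev/tcp feature.
# Prints "open" or "closed"; "closed" also covers filtered ports, timeouts,
# and the case where bash or timeout is not installed.
check_port() {
    host=$1
    port=$2
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# MASTER_HOST is a placeholder; nothing runs unless it is set.
if [ -n "${MASTER_HOST:-}" ]; then
    for port in 42217 11211; do
        echo "$port: $(check_port "$MASTER_HOST" "$port")"
    done
fi
```

For example, `MASTER_HOST=<YOUR MASTER SERVER IP> sh check_ports.sh` (the script name is arbitrary) would print one line per port. Unlike nmap, this cannot tell a firewall drop apart from a service that is simply not listening.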
As for memcached, I think it asks you to install the PHP extension. Try
apt install php-memcached
or php7-memcached (you might need to check the exact name of the extension).
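Before and after installing the package, you can confirm whether PHP actually sees the extension with `php -m`, which lists the loaded modules. Note also that the poller's prompt and service output look like an RPM-based system, where dnf or yum would be used instead of apt; that is a guess from the output above, not something confirmed in this thread. The helper name `ext_in_list` is hypothetical.

```shell
#!/bin/sh
# Read a module list on stdin (one name per line, as printed by `php -m`)
# and report whether the named extension is present.
# The match is case-insensitive and must cover the whole line,
# so "memcache" does not count as "memcached".
ext_in_list() {
    if grep -qix "$1"; then
        echo "$1: loaded"
    else
        echo "$1: missing"
    fi
}

# Check the local PHP CLI if it is installed.
if command -v php >/dev/null 2>&1; then
    php -m | ext_in_list memcached
fi
```

The memcached daemon running on the poller does not satisfy this check; validate.php is asking for the PHP client extension, which is a separate package.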
As for the database failures, perhaps you should open a separate thread just for that and ask the LibreNMS team for advice on what to do. I am not sure why the database is issuing those failure warnings.
You definitely also need to find out why your PHP and MySQL times differ.
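To put a number on the clock difference, the two timestamps from the validate output can be subtracted. This is a small sketch, assuming a date command that accepts `-d` with this timestamp format (GNU date does); the function name `clock_offset` is made up for illustration.

```shell
#!/bin/sh
# Print the difference in seconds between two "YYYY-MM-DD HH:MM:SS" timestamps
# (first minus second), treating both as UTC.
clock_offset() {
    a=$(date -u -d "$1" +%s)
    b=$(date -u -d "$2" +%s)
    echo $((a - b))
}

# Values copied from the first validate.php run: MySQL time, then PHP time.
clock_offset "2021-09-15 12:09:23" "2021-09-15 12:06:37"
```

Both validate runs in this thread give the same gap of 166 seconds, with MySQL ahead. That suggests a steady offset between the two hosts rather than a one-off glitch, so checking time synchronisation (NTP or chrony) on the poller and the database server would be a reasonable first step.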