High slab memory

sjohnson · 2 January 2018 16:34

Hi,

We have an issue where the server’s cache memory keeps steadily increasing, up to a point where the system has no more free memory.

The memory had been cleaned with echo 3 > /proc/sys/vm/drop_caches, but started to increase steadily again ever since. We’re currently running the following version on CentOS 7:

Version	1.34 - Sat Nov 25 2017 06:48:49 GMT-0500
DB Schema	#217
Web Server	nginx/1.10.2
PHP	7.0.22
MySQL	5.5.52-MariaDB
RRDtool	1.6.0

There was one element from the configuration validation that I noticed hadn’t been caught, which was that php-fpm was running under user apache, which hadn’t been added to the librenms group. I’ve just corrected that part, but is there any chance it was related?

Also, we’re still having issues with RRD defunct processes that we have to clean up on a regular basis, as per this post:

The last time we cleaned them up we had 17 of them (Dec. 21st) and we’re up to 16 again. Any other clue as to what could be causing this? Unfortunately, we hadn’t been able to find anything in the previous thread.
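For tracking the defunct count between cleanups, a minimal sketch could parse `ps` output and list zombie processes together with their parent PID, since a zombie stays around until its parent reaps it. The process name `rrdtool` and the helper names below are assumptions for illustration and are not taken from the thread.

```python
import subprocess


def find_defunct(ps_output, name_filter="rrdtool"):
    """Return (pid, ppid) pairs for zombie processes whose command contains name_filter.

    Expects lines in the form produced by `ps -eo pid=,ppid=,stat=,comm=`.
    The default name_filter is an assumption about the process name.
    """
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, stat, command = parts
        # A state starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z") and name_filter in command:
            zombies.append((int(pid), int(ppid)))
    return zombies


def current_defunct(name_filter="rrdtool"):
    """Run ps on the local machine and return the matching zombies."""
    output = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,stat=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_defunct(output, name_filter)
```

If every zombie shares the same parent PID, that parent is the process that is not collecting its children, which narrows down where to look.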

Thanks again!

murrant · 2 January 2018 19:36

https://www.linuxatemyram.com/

It also looks like you are quite a bit out of date. We just released 1.35.

sjohnson · 2 January 2018 20:46

Thanks for the reply. We’ll upgrade to 1.35, but the memory usage is all slabdata used by dentry. Based on what we’ve read in various articles, this behaviour seems to appear especially when there are processes that deal with a huge number of files, such as in the millions. Is this the case with librenms? I’ve read that it could be normal, but our system lead is especially worried, as we had started to see performance issues once the server was “stuck” in that high range (including a few BFD hiccups), and even a RHEL article confirms that it can reach a point where the performance of the system is impacted:

We can see high dentry_cache usage on the systems those who are running some programs which are opening and closing huge number of files. Sometimes high dentry_cache leads the system to run out of memory in such situations performance gets severly impacted as the system will start using swap space.

Ref: The dentry_cache / dentry slab cache size continually grows on Red Hat Enterprise Linux - Red Hat Customer Portal

We haven’t seen that behaviour on other systems, which raises the curiosity even more.
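To put a number on the dentry slab usage described above, a minimal sketch could read the `dentry` line from `/proc/slabinfo` and multiply the object count by the object size. The function names are hypothetical, and the result ignores per-slab overhead, so treat it as an approximation.

```python
def slab_cache_bytes(slabinfo_text, cache_name="dentry"):
    """Return the approximate bytes held by one slab cache, or None if it is absent.

    Data lines in /proc/slabinfo start with:
    name active_objs num_objs objsize objperslab pagesperslab : ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == cache_name:
            num_objs = int(fields[2])
            obj_size = int(fields[3])
            return num_objs * obj_size
    return None


def read_slab_cache_bytes(cache_name="dentry", path="/proc/slabinfo"):
    """Read the live file; this normally requires root privileges."""
    with open(path) as handle:
        return slab_cache_bytes(handle.read(), cache_name)
```

Calling `read_slab_cache_bytes()` as root at intervals gives a size that can be compared against the total memory of the server.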

Thanks again

Kevin_Krumm · 2 January 2018 23:43

Have you gone through the performance doc? https://docs.librenms.org/#Support/Performance/

sjohnson · 2 January 2018 23:57

I don’t remember reading all of those in the past. Don’t know how long they’ve been there or if I had just missed them, but I’ll definitely be going through all of them tomorrow, thanks.

sjohnson · 3 January 2018 22:27

OK, here’s a current recap. We’ve switched to RRDCached, did some MySQL optimizations and dug a bit more. It might indeed be nothing to worry about, but we would really like to make sense of all of it and understand what’s happening.

We’ve noticed that we have around 82K new dentry objects after each 5 min cron iteration. Is that considered normal from what you’ve experienced? We’ve never had a system that had that many entries (up to 110G the last time), so we’re really trying to figure out what’s causing this. And our installation is still pretty small (see below). If you have a clue, that would be great; otherwise, we’ll keep digging.
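A growth figure like the 82K per polling cycle can be reproduced by sampling `/proc/sys/fs/dentry-state` before and after one cron iteration. The sketch below is one way to do that, assuming the first two fields of that file are the total and unused dentry counts; the function names are hypothetical.

```python
import time


def parse_dentry_state(text):
    """Return (nr_dentry, nr_unused) from the contents of /proc/sys/fs/dentry-state."""
    fields = text.split()
    return int(fields[0]), int(fields[1])


def dentry_growth(before_text, after_text):
    """Return the change in total and unused dentries between two samples."""
    total_before, unused_before = parse_dentry_state(before_text)
    total_after, unused_after = parse_dentry_state(after_text)
    return {
        "total": total_after - total_before,
        "unused": unused_after - unused_before,
    }


def sample_growth(interval_seconds=300, path="/proc/sys/fs/dentry-state"):
    """Take two live samples interval_seconds apart and return the growth."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = handle.read()
    return dentry_growth(before, after)
```

If most of the growth shows up in the unused count, the new entries are ones the kernel can reclaim under memory pressure.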

Thanks!

Kevin_Krumm · 3 January 2018 22:31

RRDCaching should help a lot. How many devices, ports and sensors are you polling? Also, how’s your poller time? Is it staying below 300 seconds?

sjohnson · 3 January 2018 22:32

Sorry, I just added up the inventory as indeed, I noticed it was a key element missing right after I posted

As for the time it takes, it’s just a little over a minute at the most.

Kevin_Krumm · 3 January 2018 22:34

Looks good then. I would watch it and see if the RRDCaching makes a difference.

sjohnson · 3 January 2018 22:35

Is it supposed to improve over time? Because I enabled it around noon, and the 82K dentry figure is from about 20 minutes ago.

Kevin_Krumm · 3 January 2018 22:35

Yes. https://blog.librenms.org/2017/10/09/performance-when-rrdcached-can-reduce-your-write-iops-by-3000/

sjohnson · 3 January 2018 22:43

OK, thanks for the quick replies. Will read it and keep an eye on it then.

sjohnson · 5 January 2018 13:25

Close to 48h after the changes, there really hasn’t been any improvement whatsoever, neither on the IOPS nor on the dentry object count… Any idea if that can be normal and/or what I can check to validate/improve?

Thanks again!

Kevin_Krumm · 5 January 2018 14:46

Are you sure RRDcaching is running?

sjohnson · 5 January 2018 14:51

Yes it is, with all the settings given in the CentOS 7 part of the RRDCached installation guide.

And we have this in the config, as per the guide as well:
$config['rrdcached'] = "unix:/run/rrdcached.sock";

Kevin_Krumm · 5 January 2018 14:53

Can you check status of the service?

sjohnson · 5 January 2018 17:30

Yes, it was running/enabled and I restarted it to be sure, still no change:

$ systemctl status rrdcached
● rrdcached.service - Data caching daemon for rrdtool
   Loaded: loaded (/etc/systemd/system/rrdcached.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-01-05 09:56:54 EST; 2h 30min ago
  Process: 29232 ExecStart=/sbin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/run/rrdcached.sock -t 4 -F -b [path]/librenms/rrd/ (code=exited, status=0/SUCCESS)
 Main PID: 29233 (rrdcached)
   CGroup: /system.slice/rrdcached.service
           └─29233 /sbin/rrdcached -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/run/rrdcached.sock -t 4 -F -b [path]/librenms/rrd/
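A running service does not by itself show that updates are passing through the daemon. One way to check is to send the `STATS` command of the rrdcached text protocol over the socket and look at counters such as `UpdatesReceived`. The sketch below assumes the socket path from the config above and that the current user may connect to it; the function names are hypothetical.

```python
import socket


def parse_stats(response):
    """Parse a rrdcached STATS reply into a dict of counter name to integer.

    The first line is "<N> <message>"; a negative N signals an error,
    otherwise N lines of "Name: value" follow.
    """
    lines = response.splitlines()
    if not lines:
        raise ValueError("empty response")
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        raise RuntimeError(message)
    stats = {}
    for line in lines[1:1 + count]:
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value)
    return stats


def query_stats(socket_path="/run/rrdcached.sock"):
    """Connect to the daemon's UNIX socket, send STATS and return the counters."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(b"STATS\n")
        reader = sock.makefile("r")
        header = reader.readline()
        count = int(header.split(" ", 1)[0])
        body = [reader.readline() for _ in range(max(count, 0))]
    return parse_stats(header + "".join(body))
```

Calling `query_stats()` twice, one polling cycle apart, and comparing `UpdatesReceived` would show whether the poller is actually writing through the daemon.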

Kevin_Krumm · 5 January 2018 17:41

I’m sorry, I really don’t know what else to suggest. I have never seen this on any of my servers or with other users, with LibreNMS or network monitoring in general. I’m not even sure if it’s a real issue.