Performance & RRDcached

Hello,

I have set up LibreNMS with the recommended rrdcached because the instance was slow; the alert list in particular is very slow.
But with rrdcached activated it was a lot slower than before.
Maybe there is less I/O, but the CPU seems to be used massively more.
It was unusable with rrdcached activated, so I had to deactivate it again.

Has anyone else noticed similar behavior?

(I’ve asked in IRC but lost the log on restart, so I’m sorry for asking again.)

It would be nice to get some help, or hints on where to start searching.

kind regards,
sebastian

The alert list doesn’t have anything to do with rrdcached really.

What is your resource utilization like on your server?
How many devices are you monitoring and what are the specs of your server?

The server is a virtual machine with the following specs:

  • 16 vCores
  • 16 GB RAM

Anything else relevant?

I’m monitoring 75 devices.

The overall utilization is not really high.
I’m using the dashboard with some widgets:

  • Availability-map
  • Device summary vert
  • Top 5 devices with traffic
  • top 5 devices with load
  • top 5 interface
  • unacknowledged alerts
  • external image with link

On dashboard reload I can see two rrdcached processes in htop, each at about 75% CPU utilization, for about 20 seconds. When they are done, the dashboard loads the widgets and the graphs and shows “loading” on the alert widget, and a few seconds later I get the alerts list.
I think this is not really normal for so few devices.

If I disable rrdcached, the dashboard loading time is about 10 seconds instead of 20 to 30.

I think both are too slow, but I don’t know where to start. And it is very strange that it gets slower with rrdcached.

Can you post the result of ./validate.php and your rrdcached settings in config.php?

==========================================================

Component Version
LibreNMS c5a3d82040083e2f366da29958ec8a6b62ba7bbb
DB Schema 153
PHP 7.0.8-0ubuntu0.16.04.3
MySQL 5.6.35
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

==========================================================

Settings in config.php:
$config['rrdtool_version'] = "1.5.5";
$config['rrdcached'] = "unix:/var/run/rrdcached/rrdcached.sock";

Are you actually running rrdcached 1.5.5?

Yes, it is rrdcached 1.5.5.

Have you gone through the performance guide and tried to optimise things?

This one: http://docs.librenms.org/Support/Performance/ ?
I started there; the first thing it covers is rrdcached.

Yes, but the rest is all relevant: as you reduce load in polling / discovery, you allow headroom elsewhere.

I have checked the other things, and done the matching ones.
There is no load on the machine.
So there should be room for more performance.
While I’m waiting on graphs I can see two rrdcached processes at about 75% CPU usage each,
but not more. Is there a limit?
The machine has 16 cores available.

You need to give some more info back. What have you changed, if you’ve followed the performance doc?

What’s slow?

Have you checked the logs on your server, especially for rrdcached?

It is stuck while rrdcached is processing data.
iotop shows me that rrdcached uses nearly 100% of IOPS.

Another thing I found, from mysqltuner:
[!!] Joins performed without indexes: 173486
[!!] Temporary tables created on disk: 54% (21K on disk / 38K total)

My setup is:

  • LibreNMS
  • memcached
  • rrdcached
  • mysql with innodb_flush_log_at_trx_commit = 0

Any hint on where to go next to get more performance?

I’d say post your rrdcached startup config.

Post more info on what storage this vm is on.

RRDcached Startup:
rrdcached -g -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /data/rrd/

The storage system is a NetApp FAS8020 in cluster mode. This shouldn’t be a problem (hopefully).
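
To check whether the -w/-z/-f settings in the startup line above are actually batching writes, one option is to ask the daemon for its counters. The following is a minimal sketch, assuming the socket path from config.php and the STATS command of rrdcached's text protocol as described in its man page (a status line whose leading number gives the count of "Name: value" lines that follow). The function names and the ratio helper are illustrative, not part of LibreNMS or RRDtool.

```python
import socket


def parse_stats(response):
    """Parse a STATS reply into a dict of counter name -> int."""
    lines = response.strip().splitlines()
    status, _, message = lines[0].partition(" ")
    count = int(status)
    if count < 0:
        # A negative status means the daemon reported an error.
        raise RuntimeError("rrdcached error: " + message)
    stats = {}
    for line in lines[1:1 + count]:
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value)
    return stats


def query_stats(sock_path="/var/run/rrdcached/rrdcached.sock"):
    """Send STATS over the UNIX socket and return the parsed counters."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(b"STATS\n")
        with sock.makefile("r") as reader:
            first = reader.readline()
            count = int(first.split(" ", 1)[0])
            body = [reader.readline() for _ in range(max(count, 0))]
    return parse_stats(first + "".join(body))


def updates_per_write(stats):
    """Average number of received updates merged into one disk write."""
    written = stats.get("UpdatesWritten", 0)
    if written == 0:
        return None
    return stats.get("UpdatesReceived", 0) / written
```

Calling updates_per_write(query_stats()) as a user that may read the socket gives a rough idea of how well the cache is coalescing: a ratio close to 1 would suggest that almost every update is being written straight through, which would fit the heavy I/O seen here. The counter names are taken from the man page as I recall them, so treat them as assumptions and compare against the raw STATS output.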

Can you inspect which files are being accessed?
There’s a general idea (not of LibreNMS origin) of running the rrdcached journal directory on a ramdisk and flushing it to disk once an hour.
If your process takes too long on files in /var/tmp, then move that to a ramdisk.
However, if your process takes too long on the real RRD files, then adjust the buffers.
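
One rough way to tell journal files from real RRD files is to look at what the daemon has open at a given moment. The sketch below assumes a Linux host, where /proc/<pid>/fd holds one symlink per open descriptor. The function name and the proc_root parameter are hypothetical, and since rrdcached opens and closes RRD files quickly, a single snapshot can miss activity, so it would need to be sampled repeatedly.

```python
import os


def open_files(pid, prefix="", proc_root="/proc"):
    """Return sorted targets of a process's open descriptors under `prefix`."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    paths = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith(prefix):
            paths.append(target)
    return sorted(paths)
```

With the paths from this thread, open_files(pid, "/var/tmp/") would list journal files and open_files(pid, "/data/rrd/") the RRD files, where pid is the rrdcached process ID. Reading another user's /proc entries normally requires root or the same user.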

Being a SAN, is it FC, FCoE, or iSCSI?
What about the filesystem?

See my response to this thread: LibreNMS is loading very slow despite optimizations

Hello

I also see behavior similar to @seti’s, running this on a VM platform connected to a SAN. RRDcached creates such severe I/O that the box is nearly unusable at times. System load averages were high as a result of the system waiting. In fact, I think this caused the gaps in the graphs I’ve been experiencing previously.

After RRDcached is turned off, atop and iotop show my disk going down from 110% to 2%, and everything is responsive again.

I think that to run RRDcached, it really needs to be on a standalone server with a local disk. Though the RRDcached website does have a strange mention of I/O:

The daemon was written with big setups in mind. Those setups usually run into IO related problems sooner or later for reasons that are beyond the scope of this document.

I can provide some screenshots if anyone’s interested.

laf, I will look into your suggestion, and thanks for fixing my last problem!