Performance & RRDcached

seti · 11 January 2017 16:21

Hello,

I have set up LibreNMS with the recommended rrdcached because the instance was slow, especially the alert list.
But with rrdcached activated it was a lot slower than before.
Maybe there is less IO, but CPU usage seems to be massively higher.
It was unusable with rrdcached activated, so I had to deactivate it again.

Has anyone else noticed similar behavior?

(I asked in IRC, but lost the log on restart, so I’m sorry for asking again.)

It would be nice to get some help, or hints on where to start looking.

kind regards,
sebastian

murrant · 11 January 2017 16:26

The alert list doesn’t have anything to do with rrdcached really.

What is your resource utilization like on your server?
How many devices are you monitoring and what are the specs of your server?

seti · 11 January 2017 16:37

The server is a virtual machine with the following specs:

16 vCores
16 GB RAM
Anything else relevant?

I’m monitoring 75 devices.

The overall utilization is not really high.
I’m using the dashboard with these widgets:

Availability map
Device summary vert
Top 5 devices by traffic
Top 5 devices by load
Top 5 interfaces
Unacknowledged alerts
External image with link

On dashboard reload, htop shows two rrdcached processes, each at about 75% CPU, for about 20 seconds. Once they are done, the dashboard loads the widgets and graphs, the alert widget starts showing “loading”, and a few seconds later I get the alert list.
I don’t think this is normal for so few devices.

If I disable rrdcached, the dashboard takes about 10 seconds to load instead of 20-30.

I think both are too slow, but I don’t know where to start. And it is very strange that it gets slower with rrdcached.

murrant · 11 January 2017 22:39

Can you post the result of ./validate.php and your rrdcached settings in config.php?

seti · 12 January 2017 10:17

==========================================================

Component	Version
LibreNMS	c5a3d82040083e2f366da29958ec8a6b62ba7bbb
DB Schema	153
PHP	7.0.8-0ubuntu0.16.04.3
MySQL	5.6.35
RRDTool	1.5.5
SNMP	NET-SNMP 5.7.3

==========================================================

Settings in config.php:
$config['rrdtool_version'] = "1.5.5";
$config['rrdcached'] = "unix:/var/run/rrdcached/rrdcached.sock";

laf · 12 January 2017 22:16

Are you actually running rrdcached 1.5.5?

seti · 14 January 2017 12:47

Yes, it is rrdcached 1.5.5.

laf · 15 January 2017 19:29

Have you gone through the performance guide and tried to optimise things?

seti · 15 January 2017 19:53

This one: http://docs.librenms.org/Support/Performance/ ?
I started there; the first thing it covers is rrdcached.

laf · 15 January 2017 20:07

Yes, but the rest is all relevant: as you reduce load in polling and discovery, you free up headroom elsewhere.

seti · 16 January 2017 08:29

I have checked the other items and applied the ones that match my setup.
There is no load on the machine,
so there should be room for more performance.
While I’m waiting on graphs, I can see two rrdcached processes at about 75% CPU each,
but no more. Is there a limit?
The machine has 16 cores available.

laf · 18 January 2017 14:17

You need to give us more info: what have you changed after following the performance doc?

What’s slow?

Have you checked the logs on your server, especially for rrdcached?

seti · 22 January 2017 14:00

It gets stuck while rrdcached is processing data.
iotop shows that rrdcached is using nearly 100% of the IOPS.
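iotop’s IO column is a per-process figure (the share of time a process spent waiting on IO), not a measure of how busy the disk itself is. For device-level utilization, comparable to atop’s “busy” or iostat’s %util, one option is to sample the “milliseconds spent doing I/Os” counter in /proc/diskstats twice and divide by the interval. A minimal sketch, assuming a Linux guest; the device name `sda` in the usage note is hypothetical, substitute the disk that holds /data/rrd.

```python
import time


def parse_io_ticks(text, device):
    """Return the 'milliseconds spent doing I/Os' counter for `device`
    from the contents of /proc/diskstats."""
    for line in text.splitlines():
        fields = line.split()
        # fields[0:3] are major, minor, device name; fields[12] is the
        # 10th stats field: total ms during which the device had I/O in flight.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise ValueError(f"device {device!r} not found in diskstats")


def util_percent(ticks_before, ticks_after, interval_s):
    """Percentage of the interval during which the device was busy."""
    return 100.0 * (ticks_after - ticks_before) / (interval_s * 1000.0)


def measure_util(device, interval_s=5.0, path="/proc/diskstats"):
    """Sample the counter twice, interval_s apart, and return % busy."""
    with open(path) as f:
        before = parse_io_ticks(f.read(), device)
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_io_ticks(f.read(), device)
    return util_percent(before, after, interval_s)
```

Usage: `print(f"{measure_util('sda'):.1f}% busy")`, once with rrdcached running and once without, gives a like-for-like comparison of how saturated the disk is.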

Other things I found, from mysqltuner:
[!!] Joins performed without indexes: 173486
[!!] Temporary tables created on disk: 54% (21K on disk / 38K total)

My setup is:

LibreNMS
memcached
rrdcached
MySQL with innodb_flush_log_at_trx_commit = 0

Any hints on where to go next to get more performance?

laf · 22 January 2017 17:09

I’d say post your rrdcached startup config.

Post more info on what storage this VM is on.

seti · 22 January 2017 17:16

rrdcached startup:
rrdcached -g -w 1800 -z 1800 -f 3600 -s librenms -U librenms -G librenms -B -R -j /var/tmp -l unix:/var/run/rrdcached/rrdcached.sock -t 4 -F -b /data/rrd/

The storage system is a NetApp FAS8020 in cluster mode. This shouldn’t be a problem (hopefully).
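The startup line above sets rrdcached’s timing knobs. Per the rrdcached man page, `-w 1800` writes a file’s cached updates after 1800 seconds, `-z 1800` adds a random per-file delay in [0, 1800) seconds to spread writes out, `-f 3600` scans the whole cache every hour for old values, and `-t 4` uses four write threads. The sketch below turns those numbers into rough write rates. It is an approximation, assuming every RRD file receives one update per 5-minute poll and that the jitter is uniformly distributed; the file counts used below are hypothetical, not figures from this install.

```python
from dataclasses import dataclass


@dataclass
class CacheTiming:
    write_timeout_s: float          # rrdcached -w
    write_jitter_s: float           # rrdcached -z: extra random delay in [0, jitter)
    poll_interval_s: float = 300.0  # assumed: one update per RRD per 5-minute poll

    def mean_flush_interval(self) -> float:
        """Expected seconds between writes of one file: timeout + mean jitter."""
        return self.write_timeout_s + self.write_jitter_s / 2.0

    def max_flush_interval(self) -> float:
        """Approximate upper bound on how long one file's updates stay cached."""
        return self.write_timeout_s + self.write_jitter_s

    def updates_per_write(self) -> float:
        """Average number of queued updates coalesced into one file write."""
        return self.mean_flush_interval() / self.poll_interval_s

    def file_writes_per_second(self, rrd_files: int) -> float:
        """Average file writes per second across all RRD files."""
        return rrd_files / self.mean_flush_interval()


def uncached_writes_per_second(rrd_files: int, poll_interval_s: float = 300.0) -> float:
    """Without rrdcached, every poll writes every RRD file once."""
    return rrd_files / poll_interval_s
```

For example, `CacheTiming(1800, 1800).updates_per_write()` gives 9.0, so each file is written roughly nine times less often than without the cache; for a hypothetical 20,000 files that is about 7.4 file writes per second instead of about 67. One caveat: the rrdgraph man page says `rrdtool graph --daemon` sends a flush for each RRD before reading it, so if LibreNMS passes `--daemon` when `$config['rrdcached']` is set (assumed here), every graph on a dashboard reload forces that file’s queued updates to disk at render time.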

f0o · 5 February 2017 15:33

Can you inspect which files are being accessed?
There’s a general idea (not of LibreNMS origin) of putting the rrdcached journal directory on a ramdisk and flushing it to disk once an hour.
If your process takes too long on files in /var/tmp, move that directory to a ramdisk.
However, if it takes too long on the actual RRD files, adjust the buffers.
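One rough way to see which files are being written, without extra tooling, is to sort them by modification time: files rrdcached is actively writing will have the newest mtimes. A minimal sketch, assuming the paths from the startup line above (`-j /var/tmp` for the journal, `-b /data/rrd` for the RRD files). It only catches writes; reads done while rendering graphs do not change mtime, so a tool such as `strace` or `lsof` is still needed to observe reads.

```python
import heapq
import os


def recently_modified(root, limit=20, suffix=""):
    """Return (mtime, path) for the `limit` most recently modified files
    under `root` whose names end with `suffix`, newest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # file was removed between listing and stat
            entries.append((mtime, path))
    return heapq.nlargest(limit, entries)
```

Running `recently_modified("/data/rrd", suffix=".rrd")` and `recently_modified("/var/tmp")` a few times while the dashboard loads shows whether the churn is in the journal or in the RRD files themselves, which is the distinction the advice above turns on.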

Being a SAN, is it FC, FCoE or iSCSI?
What about the filesystem?
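To answer the filesystem question, and to check whether the journal directory is already on a ramdisk, you can look up which mount a path lives on in /proc/mounts: `tmpfs` means RAM-backed, `nfs`/`nfs4` means the storage is mounted over NFS, and a local type such as `ext4` or `xfs` means a block device (for example an iSCSI or FC LUN, or a virtual disk presented by the hypervisor). A minimal sketch, assuming Linux; it does not resolve symlinks, so pass real paths (for example through `os.path.realpath`).

```python
import os


def mount_for(path, mounts_text=None):
    """Return (mount_point, fstype, source) for the mount containing `path`,
    using the longest matching mount point from /proc/mounts."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    target = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mount_point, fstype = fields[0], fields[1], fields[2]
        mount_point = mount_point.replace("\\040", " ")  # /proc/mounts escapes spaces
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # ">=" so a later mount on the same point (a stacked mount) wins
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype, source)
    return best
```

Usage: `mount_for(os.path.realpath("/var/tmp"))` and `mount_for(os.path.realpath("/data/rrd"))`.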

laf · 8 February 2017 23:51

See my response to this thread: LibreNMS is loading very slow despite optimizations

Chas · 11 December 2017 14:23

Hello

I see behavior similar to @seti’s, running this on a VM platform connected to a SAN: rrdcached creates so much IO that the box is nearly unusable at times. System load averages were high because the system was waiting on IO. In fact, I think this caused the gaps in the graphs I’ve been experiencing previously.

After turning rrdcached off, atop and iotop show my disk utilization dropping from 110% to 2%, and everything is responsive again.

I think rrdcached really needs to run on a standalone server with a local disk. That said, the rrdcached website does make a curious mention of IO:

The daemon was written with big setups in mind. Those setups usually run into IO related problems sooner or later for reasons that are beyond the scope of this document.

I can provide some screenshots if anyone’s interested.

laf, I will look into your suggestion, and thanks for fixing my last problem!