rrdcached slow start with a large set of RRD files

I also opened an issue on GitHub for rrdtool: https://github.com/oetiker/rrdtool-1.x/issues/1277

I am posting here as well in case someone has experience at this scale.
I have two LibreNMS instances; each one creates and updates 500-800k RRD files. When I restart either of them, rrdcached starts very slowly: startup takes ~2 hours.
Command-line options:
/usr/bin/rrdcached -B -F -R -w 1800 -z 1800 -l 0:42217 -b /opt/librenms/rrd/ -j /var/lib/rrdcached/journal/ -G librenms -U librenms -p /run/rrdcached.pid -l unix:/run/rrdcached.sock
The RRD files are on a 1 TB XFS volume; blkid output:
/dev/vdb: LABEL="RRD_XFS" UUID="78752a7b-bd48-4aa3-bbc2-01928c1695e8" BLOCK_SIZE="4096" TYPE="xfs"
While rrdcached is starting up, I see 50-70 read IOPS at 300-400 kB/s of I/O bandwidth. Once rrdcached has started, everything works fine.
First of all, I need to find out what causes this very slow startup. Any opinions would be appreciated. Should I turn off the journal in rrdcached? Would it help to move from XFS to ZFS and use a persistent L2ARC?
Thanks!
Best Regards,
Tibor
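
One way to check whether per-file random reads are the bottleneck is to time how quickly the first 4 KiB of each RRD file can be read. The sketch below is only a hypothetical diagnostic helper, not part of rrdtool or LibreNMS; the function name, the 4 KiB read size, and the `.rrd` suffix filter are assumptions, and the number is only meaningful when the page cache is cold (for example, right after a reboot).

```python
import os
import time


def measure_header_read_rate(root, read_size=4096, limit=10000, suffix=".rrd"):
    """Read the first `read_size` bytes of up to `limit` files under `root`.

    Returns (files_read, files_per_second). Hypothetical helper: it only
    approximates the small random reads seen during rrdcached startup.
    """
    count = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue  # skip anything that is not an RRD file
            with open(os.path.join(dirpath, name), "rb") as fh:
                fh.read(read_size)  # one small read per file
            count += 1
            if count >= limit:
                break
        if count >= limit:
            break
    elapsed = time.monotonic() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Calling `measure_header_read_rate("/opt/librenms/rrd/")` on a sample of files gives a files-per-second figure that can be compared with the 50-70 read IOPS seen during startup.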

I tried ZFS and Btrfs, and my conclusions are:

  1. CoW filesystems are not suitable for RRDs. ZFS fragmentation hit 70% after 10 days, and Btrfs also fragmented: 300 GB of RRDs occupied 450 GB after a week. I/O and CPU load were at least 2 times higher than on either EXT4 or XFS.
  2. If you have hundreds of thousands of RRDs and use the journal in rrdcached, you need SSDs, especially on network storage; otherwise rrdcached startup might take hours because of the large number of synchronous random 4k reads.
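
As a rough sanity check, the figures reported in this thread (500-800k files, 50-70 read IOPS) can be turned into a startup-time estimate. This is a back-of-the-envelope sketch: the function name is made up for illustration, and the default of one synchronous random read per RRD file is an assumption, not something confirmed for rrdcached.

```python
def estimate_startup_seconds(num_files, read_iops, reads_per_file=1):
    """Estimate startup time when it is bound by synchronous random reads.

    reads_per_file=1 is an assumption for illustration only.
    """
    return num_files * reads_per_file / read_iops


# Best and worst cases from the numbers reported in this thread.
for num_files, iops in [(500_000, 70), (800_000, 50)]:
    hours = estimate_startup_seconds(num_files, iops) / 3600
    print(f"{num_files} files at {iops} IOPS: {hours:.1f} h")
```

Under that assumption the estimate comes out at roughly 2 to 4.4 hours, which is the same order of magnitude as the ~2 hour startup observed on the original storage.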