Rrd_fetch_r failed

I am having an issue with some of my graphs not being created and was hoping to get some insight into what might be going on. I am running LibreNMS on an Ubuntu 20.04 LTS server with rrdcached. I’ve followed all of the documentation to set it up, and for the most part it all seems to be working well, with the exception of the following errors that I’m seeing under Poller > Performance. See the attached screenshots.

I’ve run the ./daily.sh and ./validate.php scripts and receive no errors. The poller-perf.rrd file does exist and permissions seem to be correct, but it appears no data is being written. Could someone help me figure out what might be going on and point me in the right direction as to where to start looking? I’ve spent a few days trying to figure this out and I’m not having much success.

[email protected]:~$ ./validate.php

Component Version
LibreNMS 21.8.0-50-g055895e4a
DB Schema 2021_25_01_0129_isis_adjacencies_nullable (217)
PHP 7.4.3
Python 3.8.10
MySQL 10.3.31-MariaDB-0ubuntu0.20.04.1
RRDTool 1.7.2
SNMP NET-SNMP 5.8

====================================

[OK] Composer Version: 2.1.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[email protected]:~$

[email protected]:/opt/librenms/rrd/216.21.15.135# ls -al
total 20484
drwxrwxr-x+ 2 librenms librenms 4096 Sep 8 20:02 .
drwxrwxr-x+ 926 librenms librenms 36864 Sep 10 17:47 ..
-rw-r--r-- 1 librenms librenms 171272 Sep 11 13:14 availability-2592000.rrd
-rw-r--r-- 1 librenms librenms 171272 Sep 11 13:30 availability-31536000.rrd
-rw-r--r-- 1 librenms librenms 171272 Sep 11 13:59 availability-604800.rrd
-rw-r--r-- 1 librenms librenms 171272 Sep 11 13:59 availability-86400.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 netstats-snmp.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 ping-perf.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-applications.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-availability.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-bgp-peers.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-core.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-customoid.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-entity-physical.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-hr-mib.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-ipmi.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-ipSystemStats.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-mempools.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-mpls.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-netstats.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-ntp.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-ospf.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-os.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf-ports.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:30 poller-perf-processors.rrd
-rw-r--r-- 1 librenms librenms 0 Sep 8 19:31 poller-perf.rrd

When I look at the RRD command, I see the following error:

ERROR: [email protected]:/run/rrdcached.sock: rrd_fetch_r failed: mmaping file '/opt/librenms/rrd/216.21.15.135/poller-perf.rrd': Invalid argument

Thanks in advance for the help.

Many of those files have 0 size.

Are you perhaps out of disk space? (or were you recently?)

If not, delete all the 0 sized files and let LibreNMS recreate them.
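
A rough way to check the disk space question is sketched below. It is only a sketch: the function name is made up for illustration, and it looks at free inodes as well as free bytes, since a filesystem that has run out of inodes can also refuse to store new data while still showing free space.

```python
import os
import shutil


def filesystem_headroom(path):
    """Return (free_bytes_fraction, free_inode_fraction) for the filesystem holding path.

    free_inode_fraction is None when the filesystem does not report inode counts.
    Note: os.statvfs is only available on Unix-like systems.
    """
    usage = shutil.disk_usage(path)
    free_bytes_fraction = usage.free / usage.total if usage.total else 0.0

    vfs = os.statvfs(path)
    # f_files is the total inode count, f_favail the inodes available to unprivileged users
    free_inode_fraction = vfs.f_favail / vfs.f_files if vfs.f_files else None

    return free_bytes_fraction, free_inode_fraction
```

Calling `filesystem_headroom("/opt/librenms/rrd")` on the server from this thread would show whether either value is close to zero. It cannot tell you whether the disk was full at some point in the past.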

Hello!

Thanks for your response. Disk space doesn’t appear to be an issue. I’m not seeing any errors indicating that I am out of space. I’m curious what the invalid argument error at the end of the rrd command is all about.
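
On the "Invalid argument" part: the error text says the failure happened while memory-mapping the file, and on Linux the mmap system call rejects a zero-length mapping with EINVAL, which is displayed as "Invalid argument". That would fit the 0 sized files, although I have not traced it through rrdtool itself, so treat it as a likely explanation rather than a confirmed one. The same limitation can be seen from Python, whose mmap module refuses an empty file before it even reaches the system call. The helper name below is just for illustration.

```python
import mmap


def try_mmap(path):
    """Try to memory-map path read-only.

    Returns None if the mapping succeeds, otherwise the error text.
    """
    with open(path, "rb") as fh:
        try:
            # Length 0 means "map the whole file", which cannot work for an empty file
            with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ):
                return None
        except (ValueError, OSError) as exc:
            return str(exc)
```

Running `try_mmap` against one of the 0 sized .rrd files should return an error string, while a healthy .rrd file should return None.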

I went ahead and removed all the 0 sized files per your suggestion. Watching to see if they are recreated.

Thank You!

After I deleted the 0 sized files, LibreNMS did recreate them and it is now writing to them. However, I still have the same error, now on another device. I suspect I have 0 sized files on all of my devices, and that removing them and letting them be recreated will resolve the problem. It’s looking like I have some work to do to clear them out. I have over 1000 devices. This is going to be fun. :joy:

I was able to use a find command to search for all the 0 sized files in my rrd directory and remove them quickly. The graphs are now showing under Total Poller Time, but now I get broken images under Total Poller Time Per Module.
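
The exact find command is not shown here, so the following is a stand-in sketch of the same cleanup rather than what was actually run. The function names are made up for illustration, and it defaults to a dry run so the list can be reviewed before anything is deleted.

```python
import os


def find_empty_rrds(root):
    """Yield the paths of zero-byte *.rrd files anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) == 0:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it
                continue


def remove_empty_rrds(root, dry_run=True):
    """Return the zero-byte *.rrd files under root, deleting them unless dry_run is True."""
    matched = []
    for path in find_empty_rrds(root):
        if not dry_run:
            os.remove(path)
        matched.append(path)
    return matched
```

For the layout in this thread, `remove_empty_rrds("/opt/librenms/rrd")` would only list the candidates, and passing `dry_run=False` would delete them. It should be run as a user that has write access to the rrd directory.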

If I click on one of the broken images to see if it gives me an error or a clue as to what’s going on, it spins its wheels for about a minute and then I get a gateway timeout error. I’m not sure what is causing the gateway timeout now. It wasn’t doing that before I deleted the 0 sized files.

I am running rrdcached, and when I ran systemctl status on it, it initially gave me an error about not being able to read an RRD file. I restarted rrdcached and that error is now gone; it seems to restart normally. I also restarted the nginx and php7.4-fpm services, and they restart normally with no errors.

I ran ./daily.sh and ./validate.php and get no errors.

Could there be something in my cache that is causing it to hang up and give me gateway timeouts? Any thoughts?

Probably not all the rrd files exist for the total poller time graph. Give it time to get data back and it should be working again.

The total poller time graphs are showing up. It’s the Total Poller Time Per Module graphs below those graphs that are broken as shown in my screenshot. The other day I tweaked some php-fpm settings to increase the number of servers and I did get one of those graphs to show up but now it’s back to the broken images again. It’s been days and still nothing appears. I’m not sure how much time needs to pass before they will show up but it doesn’t seem normal to have broken images.

Thanks Again