Good afternoon,
We currently have some services configured via a Nagios plugin that graphs span loss on each ‘side’ (span) connected to a ROADM node. This has been working great, but I noticed that one particular device's graphs stopped updating after we added a new side to its ROADM node a few months ago. All graphing has now stopped for the service configured on that device; it appears the RRD file is not actually being updated. To start, here is the output of validate.php:
./validate.php
====================================
Component | Version
--------- | -------
LibreNMS | 22.3.0-10-gcc7345d54
DB Schema | 2022_02_03_164059_increase_auth_id_length (235)
PHP | 7.4.28
Python | 3.8.13
MySQL | 10.5.15-MariaDB-1:10.5.15+maria~bionic
RRDTool | 1.7.0
SNMP | 5.7.3
====================================
[OK] Composer Version: 2.2.9
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
I am able to re-create the service config for that device, and the replacement service runs and updates its new RRD file as expected. The following is the debug output of ./check-services.php for the node in question (note: Service - 54 is the ‘old’ service, while Service - 74 is the ‘new’ service):
Nagios Service - 54
Request: '/usr/lib/nagios/plugins/check_ons_span_loss' '-H' '10.67.11.11'
Perf Data - DS: A, Value: 13.2, UOM:
Perf Data - DS: B, Value: 14.9, UOM:
Perf Data - DS: C, Value: 15.9, UOM:
Perf Data - DS: D, Value: 12.5, UOM:
Response: OK
Service DS: {
"A": "",
"B": "",
"C": "",
"D": ""
}
RRD[last 10.67.11.11/services-54.rrd --daemon unix:/var/run/rrdcached.sock]
RRD[update 10.67.11.11/services-54.rrd N:13.2:14.9:15.9:12.5 --daemon unix:/var/run/rrdcached.sock]
Nagios Service - 74
Request: '/usr/lib/nagios/plugins/check_ons_span_loss' '-H' '10.67.11.11'
Perf Data - DS: A, Value: 13.2, UOM:
Perf Data - DS: B, Value: 14.9, UOM:
Perf Data - DS: C, Value: 15.9, UOM:
Perf Data - DS: D, Value: 12.5, UOM:
Response: OK
Service DS: {
"A": "",
"B": "",
"C": "",
"D": ""
}
RRD[last 10.67.11.11/services-74.rrd --daemon unix:/var/run/rrdcached.sock]
RRD[update 10.67.11.11/services-74.rrd N:13.2:14.9:15.9:12.5 --daemon unix:/var/run/rrdcached.sock]
As you can see, both services run and issue an update against their associated RRD files, but the file modification times show that only the new file is actually being written:
-rw-rw-r-- 1 librenms librenms 510200 Mar 3 13:46 services-54.rrd
-rw-r--r-- 1 librenms librenms 679664 Mar 25 13:17 services-74.rrd
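For reference, one thing I can check on the server (paths assumed relative to my RRD base directory) is whether the old file even contains a data source for the new side:

```shell
# Flush any pending updates held by rrdcached before reading the file
rrdtool flushcached --daemon unix:/var/run/rrdcached.sock 10.67.11.11/services-54.rrd

# List the data sources defined in each RRD file; if the old file only
# shows ds[A], ds[B], ds[C], a 4-value update (A:B:C:D) would be rejected
rrdtool info 10.67.11.11/services-54.rrd | grep -E '^ds\['
rrdtool info 10.67.11.11/services-74.rrd | grep -E '^ds\['
```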
We had previously been graphing sides A, B, and C before adding side D, which is when the graphing stopped. While re-creating the service does fix the issue, the whole point of this setup is to maintain historical graphing for each side, and re-creation throws that history away. I'll admit I'm fairly ignorant when it comes to RRDtool, so I may just be missing something. Is it possible the graphing stopped because a new field (side ‘D’) was added to an existing service?

I did also notice that we're having issues with rrdtool tune updating a few interfaces that are over 100G, so is it possible I have something misconfigured? If there is other command output that would be helpful, I would be happy to provide it. Any help would be appreciated!
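In case it's relevant to an answer: my (possibly wrong) understanding is that RRDtool 1.5+ can add a data source to an existing file in place via tune, so if the problem is a missing DS I was wondering whether something like the following would preserve the history for sides A–C. The DS name, type, and heartbeat here are guesses on my part, not confirmed against the existing file:

```shell
# Flush the file out of rrdcached first so we modify a consistent copy
rrdtool flushcached --daemon unix:/var/run/rrdcached.sock 10.67.11.11/services-54.rrd

# Add a GAUGE data source named D with a 600s heartbeat and no min/max,
# matching what I'd expect the existing A/B/C sources to look like
rrdtool tune 10.67.11.11/services-54.rrd DS:D:GAUGE:600:U:U
```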