I recently enabled Services (using Nagios plugins) so that I can monitor Cisco Smart Licensing, since it’s not natively supported in LibreNMS. I’m currently running LibreNMS 21.12.1 with patches from PRs 13663 and 14532 applied (I realize it’s old), in a distributed setup with rrdcached and redis, using the dispatcher service. I’m using the check_nwc_health plugin, which is installed on the master server and on all pollers currently being used for testing.
If I run the check manually (as the librenms user), everything works as expected (from both pollers):
[librenms@distributedpoller01 ~]$ /usr/lib64/nagios/plugins/check_nwc_health -H <removed> --mode check-licenses --community <removed> --warning 30 --critical 15; echo $?
OK - smart licensing is not enabled
0
[librenms@distributedpoller02 ~]$ /usr/lib64/nagios/plugins/check_nwc_health -H <removed> --mode check-licenses --community <removed> --warning 30 --critical 15; echo $?
OK - smart licensing is not enabled
0
However, when I configure the service on a host, I’m constantly seeing the service state flapping, and multiple events per polling cycle:
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:02 service <removed> Service 'nwc_health' changed status from Critical to OK - Check the installed licences/keys - OK - smart licensing is not enabled
Every polling cycle seems to start with one successful poll, followed by one or more failed “polls” in the event log.
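To put a number on the duplicates, the event log lines can be grouped by timestamp. This is only a sketch: it assumes the events have been copied out of the web UI as text in the format shown above, and `count_transitions` is a made-up helper name, not part of LibreNMS.

```shell
#!/bin/bash
# Hypothetical helper: count "changed status" events per timestamp.
# Reads event log lines on stdin, in the format pasted above
# ("YYYY-MM-DD HH:MM:SS service <host> Service '...' changed status ...").
count_transitions() {
    awk '/changed status from/ {
             key = $1 " " $2      # date + time form the grouping key
             n[key]++
         }
         END { for (k in n) print k, n[k] }' | sort
}

# Usage with two of the lines from the event log above:
count_transitions <<'EOF'
2024-07-22 02:20:03 service <removed> Service 'nwc_health' changed status from OK to Critical - Check the installed licences/keys -
2024-07-22 02:20:02 service <removed> Service 'nwc_health' changed status from Critical to OK - Check the installed licences/keys - OK - smart licensing is not enabled
EOF
```

Anything above one event per service per polling interval would confirm that something other than a single plugin run is writing state changes.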
If I run ./check-services.php -d, it appears to work correctly (despite a couple of warnings about undefined array keys, which seem to be related to the plugin returning no performance data):
[librenms@distributedpoller0 ~]$ ./check-services.php -d
DEBUG!
Starting service polling run:
SQL[SELECT D.*,S.*,attrib_value FROM `devices` AS D INNER JOIN `services` AS S ON S.device_id = D.device_id AND D.disabled = 0 LEFT JOIN `devices_attribs` as A ON D.device_id = A.device_id AND A.attrib_type = "override_icmp_disable" ORDER by D.device_id DESC; [] 8.76ms]
Nagios Service - 3
Request: '/usr/lib64/nagios/plugins/check_nwc_health' '-H' '<removed>' '--mode' 'check-licenses' '--community' '<removed>' '--warning' '30' '--critical' '15' '-t' '60'
Warning: Undefined array key 1 in /opt/librenms/includes/services.inc.php on line 242
Warning: Undefined array key 1 in /opt/librenms/includes/services.inc.php on line 253
Perf Data - None.
Response: OK - smart licensing is not enabled
./check-services.php 2024-07-23 05:14:07 - 1 services polled in 1.094 secs
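For context on the two warnings: Nagios plugins conventionally print their status text, optionally followed by `|` and performance data. When there is no `|`, splitting on it yields only one element, which would explain an "undefined array key 1". The sketch below illustrates that convention in shell; it is not the code in services.inc.php, and `split_plugin_output` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: split a Nagios plugin output line into status text
# and performance data, tolerating output that has no "|" separator.
split_plugin_output() {
    local line text perf
    line=$1
    perf=""
    text=${line%%|*}                 # everything before the first "|"
    case $line in
        *"|"*) perf=${line#*|} ;;    # everything after the first "|"
    esac
    text=${text%"${text##*[! ]}"}    # trim trailing spaces from the text
    perf=${perf#"${perf%%[! ]*}"}    # trim leading spaces from the perf data
    printf 'TEXT: %s\nPERF: %s\n' "$text" "${perf:-None}"
}

# The plugin output from the debug run above has no perf data section:
split_plugin_output 'OK - smart licensing is not enabled'
```

If that reading is right, the warnings are cosmetic and unrelated to the flapping.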
In an attempt to debug this, I wrote a quick wrapper around the Nagios plugin to get a tiny bit of extra logging.
check_licenses_wrapper.sh:
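#!/bin/bash
# Wrapper around check_nwc_health that logs every invocation to its own debug file.
# A millisecond timestamp keeps the file name unique per run.
TIMESTAMP=$(date +%s%N | cut -b1-13)
FILENAME="/tmp/check_licenses-$TIMESTAMP.debug"
# Pass the arguments through unchanged; "$@" preserves quoting, unlike re-splitting "$*".
OUTPUT=$(/usr/lib64/nagios/plugins/check_nwc_health "$@" 2>&1)
RETURN=$?
{
    echo "ARGS: $*"
    echo "OUTPUT: $OUTPUT"
    echo "RETURN: $RETURN"
} >> "$FILENAME"
echo "$OUTPUT"
exit "$RETURN"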
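The wrapper records what was run, but not who ran it. A possible extension, sketched here under the assumption that `ps` from procps is available on the pollers, is to also log the host, the wrapper's PID, and the parent process's command line. `describe_caller` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: report which host and parent process invoked this script.
describe_caller() {
    local parent_cmd
    # "ps -o args=" prints the parent's full command line without a header;
    # fall back to "unknown" if ps is missing or the parent has already exited.
    parent_cmd=$(ps -o args= -p "$PPID" 2>/dev/null || echo "unknown")
    printf 'HOST: %s\nPID: %s\nPPID: %s\nPARENT: %s\n' \
        "$(uname -n)" "$$" "$PPID" "$parent_cmd"
}

# In the wrapper, this could be appended to the debug file, e.g.:
#   describe_caller >> "$FILENAME"
describe_caller
```

With the parent command in the debug file, it becomes easier to tell a dispatcher worker apart from any other process that might be launching the check.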
I then removed the old service, added a new one using the wrapper, and waited to see which files were created (it creates a new “debug” file each time it’s executed):
[librenms@distributedpoller01 ~]$ ls /tmp/check_licenses-*
ls: cannot access '/tmp/check_licenses-*': No such file or directory
[librenms@distributedpoller02 ~]$ ls -alh /tmp/check_licenses-*
-rw-r--r--. 1 librenms librenms 153 Jul 22 07:35 /tmp/check_licenses-1721633701255.debug
Checking the contents of the single debug file:
[librenms@distributedpoller02 ~]$ cat /tmp/check_licenses-1721633701255.debug
ARGS: -H <removed> --mode check-licenses --community <removed> --warning 30 --critical 15 -t 60
OUTPUT: OK - smart licensing is not enabled
RETURN: 0
So the check is running only once per polling cycle (and it’s successful), but I’m getting multiple events per polling cycle.
Does anyone know how to go about troubleshooting this? I’m nearly at my wits’ end trying to figure this out.
My ./validate.php output:
./validate.php
====================================
Component | Version
--------- | -------
LibreNMS | 21.12.1
DB Schema | 2021_02_09_122930_migrate_to_utf8mb4 (233)
PHP | 8.0.30
Python | 3.9.18
MySQL | 10.11.5-MariaDB
RRDTool | 1.7.2
SNMP | 5.9.1
====================================
[OK] Composer Version: 2.7.7
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[INFO] Detected Dispatcher Service