Nagios Plugin OK: checking vs OK (no colon after)

Hey all.

I have a question about how the Nagios service checks trigger a down. From what I’ve been able to tell, it triggers on “OK:” to show green/good, but I am testing the “check_rbl” plugin which is coming back with an OK (notice there’s no colon after OK) and LibreNMS is tagging it as a down.

I ran check-services.php with the -d (debug) flag and it comes back with an OK; the only difference on the results line is that this OK doesn’t have a colon immediately following it. I was wondering if that’s what is triggering the down?

The plugin I’m testing is https://github.com/matteocorti/check_rbl

====================================

Component Version
LibreNMS 1.31.03-60-g838f91a
DB Schema 207
PHP 7.0.22
MySQL 5.5.56-MariaDB
RRDTool 1.4.8
SNMP NET-SNMP 5.7.2

====================================

I’m just trying to see if it’s something in LibreNMS or if it’s something I need to take up with the guy who wrote the check_rbl plugin. I’ve been using this on Nagios for quite a while without any issue, but as I’ve been porting my Nagios checks over to Libre, this one popped up as a problem child.

Thanks much!

P

I think what we check is the actual exit status of the script, i.e. run check_rbl with the params used, then run echo $? and see what it outputs.
You should pastebin the output of ./check-services.php -d
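To illustrate the exit-status point, here is a minimal sketch. The standard Nagios plugin convention is exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. The helper names nagios_status and run_check are hypothetical and only meant for testing by hand; this is not how LibreNMS itself is implemented.

```shell
# Map a Nagios plugin exit code to its conventional status name.
nagios_status() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and anything out of range
    esac
}

# Run a check command, capture its output and exit code,
# and print both so the text and the code can be compared.
run_check() {
    output="$("$@" 2>&1)"
    code=$?
    printf '%s (exit %s): %s\n' "$(nagios_status "$code")" "$code" "$output"
}
```

Calling run_check with the check_rbl command line used by the service shows whether the printed text and the exit code agree. A plugin can print "OK" and still exit non-zero, in which case the colon after OK would not be the deciding factor.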

The funny thing is, the results are coming the same either way that I run the plugin - it comes back OK. Due to the sensitivity of the IP addresses and our company information, I’ve blanked out that data in this screenshot:

Here’s what the services screen looks like:

It’s not showing a message as to why it’s failing in the Services view, but clearly, the check result comes back with an OK.

That’s why I was wondering if the check for an OK is looking for OK: with the colon and this plugin isn’t throwing a colon at the end of the result…

Copy that sql query and run it in mysql.

That update is showing that the status is 0, so it should be green. I’m betting the query is failing.

Here’s what I get…

It looks like it’s sending the data okay then… but LibreNMS isn’t picking up on it…

That indicates it updated OK. Is it not showing green in the web UI? (Green comes from the status being 0, which this is.)

Show the output of:

SELECT * FROM services WHERE service_id=5;

Here you go:

That shows a status of 2.

Run the update query then the select straight after, see if it’s 0 or 2.

Here you go. This is back-to-back query running:

Ok, if that resets after then something is changing it. Did you run the check-services.php as librenms user?

Here’s a rerun of it as the “librenms” user and then a manual run of the update/select statements back to back:

err… now #5 shows fine under the UI / services with the CHECK_RBL:

I checked the cron setting:
*/5 * * * * librenms /opt/librenms/check-services.php >> /dev/null 2>&1

It should be running under the correct user, but I can’t confirm that it is.
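One way to confirm the user is a small guard that a wrapper script could call before running the poller. This is only a sketch: require_user is a hypothetical helper, not part of LibreNMS, and it relies only on the standard id -un command.

```shell
# Succeed only if the current effective user matches the expected name.
require_user() {
    expected="$1"
    actual="$(id -un)"
    if [ "$actual" != "$expected" ]; then
        echo "running as $actual, expected $expected" >&2
        return 1
    fi
}
```

For example, a wrapper invoked from the cron entry could run require_user librenms first and send the error message to a log file, which would show which user cron is really using.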

It went green when I manually ran the update, but went back to red when the poller ran it.

Not sure what else to suggest. You could try to disable that cron entry, run the update and see if it stays green. If it doesn’t, something else is updating the db.

Well, I gave it another try:

  1. I disabled the entry in the cron for the services poller.
  2. I ran the check-services.php manually as root
  3. The web UI shows green for the 2 RBL checks.
  4. I wait 7 minutes.
  5. su to librenms
  6. I run the check-services.php manually
  7. The web UI shows green on the 2 RBL checks
  8. After re-enabling the cron job in /etc/cron.d/librenms, it runs and the web UI goes red and I get an alert.

I checked permissions and ownership and ran validate.php and all looks good.
Does /etc/cron.d/librenms have to be under the librenms group?

Thanks,
P

[root@web01 ~]# ll /etc/cron.d/librenms
-rw-r--r-- 1 root root 827 Jul 24 10:39 /etc/cron.d/librenms

You could update cron to run check-services.php -d >> /opt/librenms/logs/services.log, then check what’s going on from that log?

This is interesting… So I did what you mentioned and this is the result:

I also checked to see if the librenms user can access the .ini config file for the rbl checker and it should be able to see it:

SELinux maybe?

Well, I know that I’ve run the commands specified as part of the installation document because I missed them initially and posted about it a week or two ago.

I just turned off SELinux and rebooted to see if that would make any difference, but so far, nothing.
I also changed the location of the .ini file to live in the /opt/librenms folder in case there’s a permissions issue somewhere, but again, nothing. It is still erroring out, but when I run ./check-services.php -d manually, it comes back okay.

I also tried chmod 777 on the ini file to make sure the world had access to it in case that was the problem, but it’s still throwing the same error.
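Since the check passes in an interactive shell but fails from cron, one thing worth ruling out is the environment: cron jobs typically run with a very small environment and a short PATH. The sketch below approximates that by hand using env -i. The helper name run_minimal_env and the PATH value /usr/bin:/bin are assumptions; the real cron environment on a given system may differ.

```shell
# Run a command with an almost empty environment, roughly like cron would.
# PATH and SHELL values here are assumed defaults, not read from the system.
run_minimal_env() {
    env -i PATH=/usr/bin:/bin SHELL=/bin/sh "$@"
}
```

Running run_minimal_env /opt/librenms/check-services.php -d as the librenms user should then behave closer to the cron run. If the plugin fails here but works in a normal shell, something it needs (a variable, or a program found only through a longer PATH) is missing under cron.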

Surprise… I got it working.
It seems it’s something to do with the check_rbl plugin. The latest version seems to have an issue with LibreNMS’s way of polling the service, because as soon as I put in a much older version of the plugin, it worked fine.
