Group of False Positives, Reachable but won't Transport Recovery

It is clear that the service scripts are being called from the wrong directory. The correct output is:

Request:  '/usr/lib/nagios/plugins/check_dns' '-H' 'www.google.com' '-s' '192.168.168.234' 

This variable is defined in config.php.

So:

  • either the definition is not done correctly (you checked, and it seems OK with quotes etc.),
  • or the definition is overridden later in the file.
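One way to check the second possibility is to list every line that assigns the setting. The following is only a rough sketch: the helper name find_config_assignments() and the sample file contents are made up for illustration, and it assumes the setting is written in the usual $config['nagios_plugins'] = ...; form. It also counts commented-out lines, so treat the result as a list of places to look at.

```php
<?php
// Hypothetical helper: return the 1-based line numbers on which
// $config['<key>'] is assigned in the given PHP source text.
function find_config_assignments(string $source, string $key): array
{
    // Matches $config['key'] = or $config["key"] = (commented-out lines match too).
    $pattern = '/\$config\[[\'"]' . preg_quote($key, '/') . '[\'"]\]\s*=/';
    $hits = [];
    foreach (explode("\n", $source) as $i => $line) {
        if (preg_match($pattern, $line)) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}

// Made-up config.php contents, for illustration only.
$sample = <<<'PHP'
$config['nagios_plugins'] = "/usr/lib/nagios/plugins";
$config['show_services'] = 1;
$config['nagios_plugins'] = "/opt/other/plugins";
PHP;

$lines = find_config_assignments($sample, 'nagios_plugins');
echo 'nagios_plugins assigned on line(s): ' . implode(', ', $lines) . "\n";
```

To run it against a real file, pass file_get_contents() of your config.php as the first argument. More than one line number means a later assignment wins.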

You can edit the file includes/services.inc.php at line 124 and below to add some debug data; this is where the command string is built:
$check_script = Config::get('install_dir') . '/includes/services/check_' . strtolower($service['service_type']) . '.inc.php';
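As a rough idea of what such debug data could look like, here is a standalone sketch. The $install_dir and $service values are stand-ins I made up so the snippet runs outside LibreNMS; in the real file you would only add the two echo lines after the existing assignment.

```php
<?php
// Stand-ins so the sketch runs on its own (assumed values, not read from LibreNMS).
$install_dir = '/opt/librenms';                        // what Config::get('install_dir') is assumed to return
$service = ['service_type' => 'Timing_Status_365'];   // hypothetical service row

// Same construction as the line quoted above, using the stand-in directory.
$check_script = $install_dir . '/includes/services/check_' . strtolower($service['service_type']) . '.inc.php';

// Debug output: the computed path, and whether that include file is really there.
echo "DEBUG check_script: $check_script\n";
echo 'DEBUG exists: ' . (is_file($check_script) ? 'yes' : 'no') . "\n";
```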

That should put you (and us) on track to understand what’s going on.

Bye

I’m not a PHP expert, so how would you recommend I add that debug info? That code is exactly what is already in the file. While looking around the services directory, /opt/librenms/includes/services, I did find some curious items.

When I look at one of the ‘bundled’ services, you are grabbing the check command directory like this:

$check_cmd = \LibreNMS\Config::get('nagios_plugins') . "/check_graylog " . $service['service_param'];

Our custom include scripts are like this:
$check_cmd = $config['nagios_plugins'] . "/check_timing_status_365 ".$service['hostname'];

Since the check-services script output just shows /check_timing_status_365 as not found, I bet it’s not picking up the directory in the first part of the string. Let me test this and see what it does.

Sure as snot that was it!

I ran check-services.php -d, and the check_timing_status_365 services now return data:

Nagios Service - 31
Request: '/usr/lib/nagios/plugins/check_timing_status_365' '10.10.170.132'
Perf Data - DS: Timing, Value: 0, UOM:
Response: OK
Service DS: {
"Timing": ""
}
RRD[last 10.10.170.132/services-31.rrd --daemon libresql.cerento.com:42217]
RRD[update 10.10.170.132/services-31.rrd N:0 --daemon libresql.cerento.com:42217]
SQL[SELECT devices.*, location, lat, lng FROM devices LEFT JOIN locations ON devices.location_id=locations.id WHERE device_id = ? [487] 1.61ms]

SQL[SELECT * FROM devices_attribs WHERE device_id = ? [487] 1.28ms]

SQL[SELECT * FROM vrf_lite_cisco WHERE device_id = ? [487] 1.09ms]

SQL[INSERT IGNORE INTO eventlog (device_id,reference,type,datetime,severity,message,username) VALUES (:device_id,:reference,:type,:datetime,:severity,:message,:username) {"device_id":487,"reference":31,"type":"service","datetime":"2019-07-10 07:08:43","severity":4,"message":"Service 'timing_status_365' changed status from Critical to OK - GPS Sync - OK","username":""} 2.7ms]

SQL[UPDATE services set service_changed=?,service_status=?,service_message=? WHERE service_id=? [1562764123,0,"OK",31] 1.71ms]

The other services still return not found.

I looked back at some old plugins on one of our test boxes that haven’t been updated in about a year or so, and sure enough, that’s how the directory used to be called out. So I’m guessing that changed on or about 6-23-19, or there’s some sort of bug that’s not honoring that anymore?

The code below is the correct form to call config parameters. That’s the one you get in check_services.php if you have an up-to-date server.

$check_cmd = \LibreNMS\Config::get('nagios_plugins') . "/check_graylog " . $service['service_param'];

And this is the old way of doing it, which is not supported anymore. You should not have it unless you skipped updates or manually reverted a file to an older version.

$check_cmd = $config['nagios_plugins'] . "/check_timing_status_365 ".$service['hostname'];

You never mentioned that you were running non-standard code here, which explains why we were just repeating “your issue cannot happen”. If you write your own code or change LibreNMS code, then it either breaks the auto-updates (at least you get notified) or breaks a feature (that’s what happened to you).

I would suggest, for next time, either sending your custom code via a pull request so it gets merged into LibreNMS (which means it gets tested and updated by the community) or avoiding custom code.


So now I’m confused. I’ve been out while this was happening, and I set these up originally. We were told by you folks in Custom Nagios Plugin to write these custom modules to allow the custom nagios modules to work, and now you are telling us that we shouldn’t be doing this? Am I understanding this correctly? I want to ensure that if we deploy more of these custom nagios plugins, we are doing it correctly to avoid breaking something again in the future.

I’d also like to point out, after reading over this thread, that it was mentioned several times that we were using custom nagios plugins.

No, he’s just saying that accessing $config is no longer supported; you have to use \LibreNMS\Config::get(). You need to update your custom code.
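To illustrate the difference, here is a standalone sketch. The Config class in it is a made-up stand-in so the snippet runs without LibreNMS (in a real include you would call \LibreNMS\Config::get() instead), and it assumes the $config array is simply empty by the time the custom include runs, which would match the "/check_timing_status_365 not found" output reported earlier.

```php
<?php
// Made-up stand-in for LibreNMS's Config class, only so this sketch runs by itself.
final class Config
{
    private static array $settings = ['nagios_plugins' => '/usr/lib/nagios/plugins'];

    public static function get(string $key, $default = null)
    {
        return self::$settings[$key] ?? $default;
    }
}

$service = ['hostname' => '10.10.170.132']; // hypothetical service row

// Old style: $config is assumed to be empty here, so the directory prefix is lost.
$config = [];
$old_cmd = ($config['nagios_plugins'] ?? '') . '/check_timing_status_365 ' . $service['hostname'];

// Updated style: read the plugin directory through the config accessor.
$new_cmd = Config::get('nagios_plugins') . '/check_timing_status_365 ' . $service['hostname'];

echo "old: $old_cmd\n";
echo "new: $new_cmd\n";
```

The old form ends up with a command that starts at "/", which is why the plugin could not be found; the new form carries the full plugin path.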

You can use custom nagios plugins without adding a custom php handler, fyi.

Yep,
The nagios plugin is fine by itself, but the PHP code that calls the plugin should not be customized; that part belongs to LibreNMS. Leaving it untouched ensures compatibility and auto-updates.

It is fine to write custom ones, but you need to remember it is your responsibility to fix them :wink: