Group of False Positives, Reachable but won't Transport Recovery

And I am having a hard time explaining to you that you need to help us help you :smiley:

This is a community-driven project. I am trying to help you in my free time, and I expect a little cooperation on your side as well.

You could have checked the commits that took place on the 25th and 26th of June and pinpointed which one broke your setup. But I already did, before telling you that we did not find any issue. You can check for yourself:

Clearly, something has to be fixed to get your services up and running. It does not seem to be on the LibreNMS side, so it has to be on your server, and the only way to find out is for you to troubleshoot it on your side; we can’t do it remotely.

If, in the end, something must be changed on the LibreNMS side to keep the issue from reappearing, we will of course do it. But first we need to find the issue.

Bye

A bit more on our setup. We have 1 box doing the web interface, 1 box doing the SQL database and 2 boxes doing polling.

My config.php file in relation to services:

Show Nagios Plugins

$config[‘show_services’] = 1;
$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

On my pollers the cron job:

*/5 * * * * librenms /opt/librenms/services-wrapper.py >> /dev/null 2>&1

When I switch over to the librenms user, here’s what I get for the path, and I can manually run curl:

$ whoami
librenms
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/puppetlabs/bin
$ curl
curl: try 'curl --help' or 'curl --manual' for more information

Here is the code behind that particular nagios plugin that is saying it can’t find curl:

$ more check_drobo
#!/usr/bin/perl

use strict;
use warnings;
use WWW::Curl::Easy;

my $ip = shift;
my $url = "http://$ip:5000/";
my $curl = "curl -s -m 2 $url";

# Run curl and capture its output
my $content = `$curl`;

my %values = ();
my @keys = qw(mTotalCapacityProtected mUsedCapacityProtected mFreeCapacityProtected mTotalCapacityUnprotected mUsedCapacityOS mSlotCountExp );
my @content = split(m/\n/, $content);
my $status;
my $baddrive = 0;
foreach my $line (@content){
    # Collect the capacity values listed in @keys
    foreach my $key (@keys){
        if ($line =~ m/<$key>(.*)<\/$key>/){
            $values{$key} = $1;
        }
    }

    # see http://code.google.com/p/drobowebdashboard/wiki/ESATMUpdate for a half-way decent explanation
    # 98304 seems to be OK. I think only the lower bits mean anything.
    if (!defined $status){
            if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
                    $status = $1 & 0xff;
            }
    }
    # drive-specific status
    else{
            if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
                    # 3 is the magic number for OK. 128 is empty. 98304 is also normal?
                    if ($1 ne 3 && $1 ne 128 && $1 ne 98304){
                            print "bad status $1\n";
                            $baddrive++;
                    }
            }
    }

}

print '|';
foreach my $key (keys %values){
    print " $key=$values{$key}";
}

print " status=$status bad=$baddrive\n";

I can provide the code behind one of the other plugins if you’d like, but they are generally just snmpgets via a bash script that might use sed or awk to parse data out of the responses.

Could you try putting the full path to curl in the script and see if that improves things? If it does, then you’ll know that for some reason the PATH variable is not making it to the scripts.
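One way to test the PATH theory is to repeat the lookup in a stripped-down environment, similar to what cron hands to its jobs. The sketch below is only an illustration: `lookup_with_path` is a hypothetical helper name, and the PATH value to pass is an assumption, since this thread does not say what PATH the cron job actually gets.

```shell
#!/bin/sh
# Hypothetical helper (not part of LibreNMS): look up a command using only
# the given PATH, with every other environment variable cleared.
# Prints the resolved location and returns 0 if found, non-zero otherwise.
lookup_with_path() {
    search_path=$1
    cmd=$2
    env -i PATH="$search_path" /bin/sh -c 'command -v "$1"' sh "$cmd"
}

# Example (commented out; the PATH shown here is an assumption):
# lookup_with_path "/usr/bin:/bin" curl || echo "curl not found on that PATH"
```

If the lookup succeeds with your interactive PATH but fails with the reduced one, hard-coding the full path in the plugin is a reasonable workaround.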

For the other scripts, it seems that they don’t even start. Please post one of them. I suspect that the first line (“#!/xxxxxx”) does not point to an existing binary, which would explain why the script never starts.

If so, then clearly something changed in your binaries or file structure somewhere.
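The shebang theory can be checked without running the plugin at all. The following is a minimal sketch under stated assumptions: `shebang_interpreter` is a hypothetical helper name, and the plugin path in the usage comment is simply the directory from the config.php quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the interpreter named on the first line of a
# script. Returns non-zero if the file has no "#!" line.
shebang_interpreter() {
    first_line=$(head -n 1 "$1") || return 1
    case "$first_line" in
        '#!'*) ;;
        *) return 1 ;;
    esac
    # Strip the "#!" prefix and any spaces after it, then keep the first word.
    interp=${first_line#'#!'}
    interp=${interp#"${interp%%[! ]*}"}
    printf '%s\n' "${interp%% *}"
}

# Example (commented out; the plugin path is an assumption):
# interp=$(shebang_interpreter /usr/lib/nagios/plugins/check_drobo) &&
#     { [ -x "$interp" ] && echo "ok: $interp" || echo "missing: $interp"; }
```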

Adding the full path got rid of the error about curl not being found; however, it’s still telling me that my service files are not found:

Nagios Service - 96
Request: '/check_drobo' '172.16.4.55'
sh: 1: /check_drobo: not found
Perf Data - None.
Response:

Nagios Service - 92
Request: '/check_timing_status_365' '10.10.190.226'
sh: 1: /check_timing_status_365: not found
Perf Data - None.
Response:

Nagios Service - 93
Request: '/check_pmp450_cpu' '10.10.190.226'
sh: 1: /check_pmp450_cpu: not found
Perf Data - None.
Response:

Nagios Service - 80
Request: '/check_timing_status' '10.10.170.6'
sh: 1: /check_timing_status: not found
Perf Data - None.
Response:

Here is one of the other scripts (check_timing_status_365):
#!/bin/bash

# Query the timing status over SNMP and keep the fourth field of the reply
gettimingstatus=$(snmpget -v2c -cCanopy "$1" 1.3.6.1.4.1.161.19.3.1.3.3.0 | awk '{print $4}')
# Strip double quotes, then map the status words to numeric codes
finalvalue=$(echo "$gettimingstatus" | sed 's/"//g' | sed 's/Recieving/1/g' | sed 's/Timing/2/g' | sed 's/No/3/g')
#echo "OK|Timing=$finalvalue"
#echo $finalvalue
if [[ $finalvalue == "3" ]]; then
    echo 'CRITICAL|Timing=2'
    exit 2
elif [[ $finalvalue == "2" ]]; then
    echo 'WARNING|Timing=1'
    exit 1
else
    echo 'OK|Timing=0'
    exit 0
fi

It seems as though it expects the scripts to exist in the root directory. Is this true?

Can you see what nagios_plugins is set to on the global config page?

/usr/lib/monitoring-plugins is what shows up. That’s curious. My PHP configs don’t show that at all!

And the directory doesn’t exist either… That probably explains it. How do I figure out how that directory was changed? When I look at my config.php it shows the standard directory per documentation:
./config.php:$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”
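Since the web interface reports one directory and config.php names another, it may help to check both on each poller. This is a rough sketch, not a LibreNMS tool: `check_plugin_dirs` is a hypothetical helper name, and the two directories in the usage comment are just the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: for each candidate directory, report whether it exists
# and whether it holds an executable file with the given plugin name.
check_plugin_dirs() {
    plugin=$1
    shift
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "$dir: directory missing"
        elif [ -f "$dir/$plugin" ] && [ -x "$dir/$plugin" ]; then
            echo "$dir: $plugin found"
        else
            echo "$dir: $plugin not found"
        fi
    done
}

# Example with the two paths from this thread (commented out):
# check_plugin_dirs check_drobo /usr/lib/nagios/plugins /usr/lib/monitoring-plugins
```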

Is that exactly what is in your config? The quotes are wrong and it doesn’t end with a semicolon.

No sir, sorry about that… Here’s exactly what is set up:

$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

It pasted goofy between my terminal and browser the first time. That’s how it’s been ever since we set up Libre.

Hi,
The quotes are still wrong, I would say, both the single and the double quotes. You should write:

$config['nagios_plugins'] = "/usr/lib/nagios/plugins";
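Typographic (curly) quotes are not string delimiters in PHP, and they can be hard to tell apart from straight quotes in some fonts, so it may be easier to search for them by byte sequence than by eye. A minimal sketch, assuming a UTF-8 encoded file and a grep that supports `-F` with multiple `-e` patterns; `find_curly_quotes` is a hypothetical helper name and the config path in the usage comment is an assumption based on the cron line earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the numbered lines of a file that contain
# typographic single or double quotes. LC_ALL=C and -F make grep match the
# raw UTF-8 byte sequences literally. Returns non-zero if none are found.
find_curly_quotes() {
    LC_ALL=C grep -nF -e '‘' -e '’' -e '“' -e '”' -- "$1"
}

# Example (commented out; the path is an assumption):
# find_curly_quotes /opt/librenms/config.php || echo "no curly quotes found"
```

If this prints nothing for config.php, the curly quotes seen in the forum posts were most likely introduced by the paste or the forum rendering rather than being present in the file.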

Here is info on my config.php showing it hasn’t been changed since May 30th of this year.

-rw-r--r-- 1 librenms librenms 18K May 30 15:23 config.php

I have copied and pasted this into my config.php:
$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

I’m waiting 10 minutes to see if this fixes the issue. I’m still concerned about why the nagios_plugins directory shown in the web interface, per Murrant’s post, is incorrect. I have grepped my config files and I can’t find mention of that directory ANYWHERE.

That doesn’t seem to have fixed anything. The broken services are still not working and check-services.php -d is returning the same output as it was yesterday, stating that the service was not found.

That is the default setting. So for some reason your setting is not applying. Have you run ./validate.php?

This validate was run about 2 weeks ago when we initially opened this thread:

====================================

Component | Version
--------- | -------
LibreNMS  | 1.52-70-gf3ba8947f
DB Schema | 2019_05_30_225937_device_groups_rewrite (135)
PHP       | 7.2.14-1+0~20190205200805.15+stretch~1.gbpd83c69
MySQL     | 10.1.26-MariaDB-0+deb9u1
RRDTool   | 1.6.0
SNMP      | NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[FAIL] Database: extra table (vw_alertlog_updown)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (https://t.libren.ms/discord) or the community site (https://t.libren.ms/5gscd):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
DROP TABLE vw_alertlog_updown ;
[FAIL] The poller (cerento012) has not completed within the last 5 minutes, check the cron job.
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
includes/definitions/discovery/pmp.yaml

The down poller we know about.

If you would like a more recent validate, let me know. We have a distributed architecture (1 box for web, 1 for SQL/RRD and 2 pollers), so I’ll need to know which boxes you’d like it run on, if only specific ones.

Quotes are still wrong. And of course the path has to be the one where you installed the plugins.

This has to be some issue between the terminal I’m using and the forum then, as I copied and pasted exactly what you provided. We have ALWAYS had the plugins installed in the directory listed above. What do we need to do to continue troubleshooting this? I believe I’ve now provided all the required information and we are just spinning our wheels.

Maybe it’s the font my terminal emulator is using. I re-ran validate.php and I’m not getting any warnings regarding errors in my file. Here’s a screencap of the section in my config.php

We did find that 2 of the poller boxes hadn’t updated since 6-23-19 at about 5:29am. daily.sh was failing. We had to run the advanced upgrade, and those boxes are now updated to the newest version. So we have that cleared up, but the services still appear to be broken. Same errors as before when running check-services.php -d.