Group of False Positives, Reachable but won’t Transport Recovery

PipoCanaja · 8 July 2019 13:34

And I am having a hard time explaining to you that you need to help us to help you.

This is a community-driven project. I am trying to help you in my free time, and I expect a little bit of cooperation on your side as well.

You could have checked the commits that took place on the 25th and 26th of June and pinpointed which one broke your setup. But I already did that before telling you that we did not find any issue. You can check for yourself:

documentation update (https://github.com/librenms/librenms/pull/10377)
broken auto-update (https://github.com/librenms/librenms/pull/10380)
field sorting for UPS (https://github.com/librenms/librenms/pull/10375)
sysName support for CLI added devices (https://github.com/librenms/librenms/pull/10381)
smart application fix (https://github.com/librenms/librenms/pull/10378)
fix dummy alert transport (https://github.com/librenms/librenms/pull/10379)

Clearly, something has to be fixed to get your services up and running. It does not seem to be on the LibreNMS side, so it has to be on your server, and the only way to find out is for you to troubleshoot it on your side; we can’t do it remotely.

If, in the end, something must be changed on the LibreNMS side to prevent the issue from reappearing, we will of course do it. But first we need to find the issue.

Bye

Brandon_Shiers · 8 July 2019 14:08

A bit more on our setup. We have 1 box doing the web interface, 1 box doing the sql database and 2 boxes doing polling.

My config.php file in relation to services:

Show Nagios Plugins

$config[‘show_services’] = 1;
$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

On my pollers the cron job:

*/5 * * * * librenms /opt/librenms/services-wrapper.py >> /dev/null 2>&1

When I switch over to the librenms user here’s what I get for the path, and I can manually run CURL:

$ whoami
librenms
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/puppetlabs/bin
$ curl
curl: try 'curl --help' or 'curl --manual' for more information

Here is the code behind that particular nagios plugin that is saying it can’t find curl:

$ more check_drobo
#!/usr/bin/perl

use strict;
use warnings;
use WWW::Curl::Easy;

my $ip = shift;
my $url = "http://$ip:5000/";
my $curl = "curl -s -m 2 $url";

# Run curl and capture its output
my $content = `$curl`;

my %values = ();
my @keys = qw(mTotalCapacityProtected mUsedCapacityProtected mFreeCapacityProtected mTotalCapacityUnprotected mUsedCapacityOS mSlotCountExp );
my @content = split(m/\n/, $content);
my $status;
my $baddrive = 0;
foreach my $line (@content){
    foreach my $key (@keys){
        if ($line =~ m/<$key>(.*)<\/$key>/){
            $values{$key} = $1;
        }
    }

    # see http://code.google.com/p/drobowebdashboard/wiki/ESATMUpdate for a half-way decent explanation
    # 98304 seems to be OK. I think only the lower bits mean anything.
    if (!defined $status){
        if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
            $status = $1 & 0xff;
        }
    }
    # drive-specific status
    else{
        if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
            # 3 is the magic number for OK. 128 is empty. 98304 is also normal?
            if ($1 ne 3 && $1 ne 128 && $1 ne 98304){
                print "bad status $1\n";
                $baddrive++;
            }
        }
    }

}

print '|';
foreach my $key (keys %values){
    print " $key=$values{$key}";
}

print " status=$status bad=$baddrive\n";

I can provide the code behind one of the other plugins if you’d like, but they are generally just snmpgets via a bash script that might use sed or awk to parse data out of the responses.

PipoCanaja · 8 July 2019 14:36

Could you try putting the full path to curl in the script and see if that helps? If it does, then you’ll know that for some reason the PATH variable is not making it to the scripts.
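
One way to test the PATH theory without editing the plugin is to look the command up from an empty environment, which is roughly what a cron job or a web-spawned process gets. This is only a minimal sketch, assuming a POSIX shell at /bin/sh; `resolve_in_path` is a hypothetical helper name, not part of LibreNMS:

```shell
#!/bin/sh
# Hypothetical helper: report where a command resolves when only the given
# PATH is available, imitating the sparse environment of a cron job.
resolve_in_path() {
    cmd_name=$1
    search_path=$2
    # env -i starts from an empty environment; PATH is then set explicitly.
    # The inner shell receives the command name as its first argument.
    env -i PATH="$search_path" /bin/sh -c 'command -v "$1"' sh "$cmd_name"
}
```

For example, `resolve_in_path curl /usr/bin:/bin` prints the resolved path, or prints nothing and returns non-zero if curl is not reachable through that PATH. The PATH value to try is an assumption; use whatever your cron or web server environment actually provides.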

For the other scripts, it seems that they don’t even start. Please attach one of them, but I suppose that the interpreter named on the first line (“#!/xxxxxx”) does not point to an existing binary, which would explain why the script does not even start.
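
The first-line check can also be done by hand. A rough sketch, assuming POSIX sh with head and sed available; `check_shebang` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a script's first line and verify that the
# interpreter it names exists and is executable.
check_shebang() {
    script=$1
    first_line=$(head -n 1 "$script") || return 2
    case $first_line in
        '#!'*) ;;
        *) echo "$script: no shebang line"; return 1 ;;
    esac
    # Drop the leading "#!" plus any spaces, then keep only the first word,
    # which is the interpreter path (arguments after it are ignored).
    interpreter=$(printf '%s\n' "$first_line" | sed 's/^#![[:space:]]*//; s/[[:space:]].*$//')
    if [ -x "$interpreter" ]; then
        echo "$script: interpreter $interpreter OK"
    else
        echo "$script: interpreter $interpreter missing or not executable"
        return 1
    fi
}
```

Running `check_shebang /usr/lib/nagios/plugins/check_drobo` (the path is taken from the config shown earlier in this thread) would report whether /usr/bin/perl is reachable.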

If so, then clearly something changed in your binaries or file structure somewhere.

Brandon_Shiers · 8 July 2019 15:38

So adding the full path got rid of the error relating to curl not being found, however it’s still telling me that my service files are not found:

Nagios Service - 96
Request: '/check_drobo' '172.16.4.55'
sh: 1: /check_drobo: not found
Perf Data - None.
Response:

Nagios Service - 92
Request: '/check_timing_status_365' '10.10.190.226'
sh: 1: /check_timing_status_365: not found
Perf Data - None.
Response:

Nagios Service - 93
Request: '/check_pmp450_cpu' '10.10.190.226'
sh: 1: /check_pmp450_cpu: not found
Perf Data - None.
Response:

Nagios Service - 80
Request: '/check_timing_status' '10.10.170.6'
sh: 1: /check_timing_status: not found
Perf Data - None.
Response:

Here is one of the other scripts (check_timing_status_365):
#!/bin/bash

gettimingstatus=$(snmpget -v2c -cCanopy $1 1.3.6.1.4.1.161.19.3.1.3.3.0 | awk '{print $4}')
finalvalue=$(echo $gettimingstatus | sed 's/"//g' | sed 's/Recieving/1/g' | sed 's/Timing/2/g' | sed 's/No/3/g')
#echo "OK|Timing=$finalvalue"
#echo $finalvalue
if [[ $finalvalue == "3" ]]; then
    echo 'CRITICAL|Timing=2'
    exit 2
elif [[ $finalvalue == "2" ]]; then
    echo 'WARNING|Timing=1'
    exit 1
else
    echo 'OK|Timing=0'
    exit 0
fi

murrant · 9 July 2019 00:02

seems as though it expects the scripts to exist in the root directory, is this true?
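
The reasoning: the request shows '/check_drobo' with nothing in front of the slash, which is what you would see if the configured plugin directory came through empty when the command line was assembled. A minimal sketch of that effect in shell, assuming the command is simply the directory joined to the check name (`build_check_cmd` is a hypothetical helper, not LibreNMS code):

```shell
#!/bin/sh
# Hypothetical helper: join a plugin directory and a check name the way a
# simple "dir/check_name" concatenation would.
build_check_cmd() {
    plugin_dir=$1
    check_name=$2
    printf '%s/check_%s\n' "$plugin_dir" "$check_name"
}
```

With the expected directory, `build_check_cmd /usr/lib/nagios/plugins drobo` gives the full plugin path; with an empty directory, `build_check_cmd "" drobo` gives `/check_drobo`, matching the failing requests above.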

murrant · 9 July 2019 00:06

can you see what nagios_plugins is set to on the global config page?

Brandon_Shiers · 9 July 2019 00:33

/usr/lib/monitoring-plugins is what shows up. That’s curious. My php configs don’t show that at all!

Brandon_Shiers · 9 July 2019 00:40

and the directory doesn’t exist either… That probably explains it. How do I figure out how that directory was changed? When I look at my config.php it shows the standard directory per documentation:
./config.php:$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”

murrant · 9 July 2019 00:48

Is that exactly what is in your config? The quotes are wrong and it doesn’t end with a semicolon.

Brandon_Shiers · 9 July 2019 01:00

No sir, sorry about that… Here’s exactly what is set up:

$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

It pasted goofy between my terminal and browser the first time. That’s how it’s been ever since we set up Libre.

PipoCanaja · 9 July 2019 08:11

Hi,
The quotes are still wrong, I would say, both the single and the double quotes. You should write:

$config['nagios_plugins'] = "/usr/lib/nagios/plugins";
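
To rule out the forum's rendering and check the file itself, you can search it for typographic quotes. A small sketch, assuming grep is available; `find_curly_quotes` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list the lines of a file that contain typographic
# (curly) quotes, which PHP does not treat as string delimiters.
find_curly_quotes() {
    # -F matches each quote character literally; -n prints line numbers.
    grep -nF -e '‘' -e '’' -e '“' -e '”' "$1"
}
```

Running `find_curly_quotes config.php` prints any offending lines with their line numbers. No output and a non-zero exit status mean the file contains only plain ASCII quotes.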

Brandon_Shiers · 9 July 2019 12:53

Here is info on my config.php showing it hasn’t been changed since May 30th of this year.

-rw-r--r-- 1 librenms librenms 18K May 30 15:23 config.php

I have copied and pasted this into my config.php:
$config[‘nagios_plugins’] = “/usr/lib/nagios/plugins”;

I’m waiting 10 minutes to see if this fixes the issue. I’m still concerned why the nagios_plugin directory in the web interface per Murrant’s post is incorrect. I have grepped my config files and I can’t find mention of that directory ANYWHERE.

Brandon_Shiers · 9 July 2019 13:09

That doesn’t seem to have fixed anything. The broken services are not working and check-services.php -d is returning the same output as it was yesterday stating the service was not found.

murrant · 9 July 2019 13:11

That is the default setting. So for some reason your setting is not applying. Have you run ./validate.php?

Brandon_Shiers · 9 July 2019 13:15

This validate was run about 2 weeks ago when we initially opened this thread:

====================================

Component	Version
LibreNMS	1.52-70-gf3ba8947f
DB Schema	2019_05_30_225937_device_groups_rewrite (135)
PHP	7.2.14-1+0~20190205200805.15+stretch~1.gbpd83c69
MySQL	10.1.26-MariaDB-0+deb9u1
RRDTool	1.6.0
SNMP	NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[FAIL] Database: extra table (vw_alertlog_updown)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (https://t.libren.ms/discord) or the community site (https://t.libren.ms/5gscd):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
DROP TABLE vw_alertlog_updown ;
[FAIL] The poller (cerento012) has not completed within the last 5 minutes, check the cron job.
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
includes/definitions/discovery/pmp.yaml

The down poller we know about.

Brandon_Shiers · 9 July 2019 14:13

If you would like a more recent validate, let me know. We do have a distributed architecture, 1 box for web, 1 for SQL/RRD and 2 pollers, so I’ll need to know which boxes you’d like them off of if only specific ones.

PipoCanaja · 9 July 2019 15:24

Quotes are still wrong. And of course the path has to be the one where you installed the plugins.

Brandon_Shiers · 9 July 2019 15:46

This has to be some issue between the terminal I’m using and the forum then, as I copied and pasted exactly what you provided. We have ALWAYS had the plugins installed in the directory listed above. What do we need to do to continue troubleshooting this? I believe I’ve now provided all required information and we are just spinning our wheels.

Brandon_Shiers · 9 July 2019 15:49

Maybe it’s the font my terminal emulator is using. I re-ran validate.php and I’m not getting any warnings regarding errors in my file. Here’s a screencap of the section in my config.php

Brandon_Shiers · 9 July 2019 16:37

We did find that 2 of the poller boxes hadn’t updated since 6-23-19 at about 5:29am; daily.sh was failing. We had to run the advanced upgrade, and those boxes are now updated to the newest version. So we have that cleared up, but the services still appear to be broken. Same errors as before when running check-services.php -d.
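
For anyone comparing boxes in a distributed setup like this, a quick way to spot one that has stopped updating is to print the last commit of each install. This is a sketch only, assuming the install is a git checkout (the validate output above mentions the local git) and lives in /opt/librenms as in the cron line; `last_update` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the abbreviated hash and date of the most
# recent commit in a git checkout, e.g. a LibreNMS install directory.
last_update() {
    # %h = abbreviated commit hash, %cd = committer date (short format).
    git -C "$1" log -1 --format='%h %cd' --date=short
}
```

Running `last_update /opt/librenms` on the web box and on each poller and comparing the lines should show any box that is lagging behind the others.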