Group of False Positives, Reachable but Won't Transport Recovery

I originally posted the alert configs etc., but found out they are unrelated to my issue, so I removed them from the thread. I had a more experienced tech look at this, and it appears that ever since an automatic update, LibreNMS is not seeing some of our modules, specifically the ones for GPS timing and AP traffic monitoring. We confirmed that the modules are still in their directories.

The ./validate.php output should still help:

====================================

Component Version
LibreNMS 1.52-70-gf3ba8947f
DB Schema 2019_05_30_225937_device_groups_rewrite (135)
PHP 7.2.14-1+0~20190205200805.15+stretch~1.gbpd83c69
MySQL 10.1.26-MariaDB-0+deb9u1
RRDTool 1.6.0
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.6
[OK] Dependencies up-to-date.
[OK] Database connection successful
[FAIL] Database: extra table (vw_alertlog_updown)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on Discord (https://t.libren.ms/discord) or the community site (https://t.libren.ms/5gscd):
[FIX]:
Run the following SQL statements to fix.
SQL Statements:
DROP TABLE vw_alertlog_updown;
[FAIL] The poller (cerento012) has not completed within the last 5 minutes, check the cron job.
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
includes/definitions/discovery/pmp.yaml

It looks like LibreNMS is unable to find the files placed in the custom plugin directory. The permissions on the files are set to read/execute and the files are intact. This started happening after an update on 6/23/2019.

librenms@cerento010:~$ ./check-services.php -d
DEBUG!
Starting service polling run:

SQL[SELECT D.*,S.*,attrib_value FROM devices AS D INNER JOIN services AS S ON S.device_id = D.device_id AND D.disabled = 0 LEFT JOIN devices_attribs as A ON D.device_id = A.device_id AND A.attrib_type = "override_icmp_disable" ORDER by D.device_id DESC; [] 5.68ms]

Nagios Service - 96
Request: '/usr/lib/nagios/plugins/check_drobo' '-H' '172.16.4.55'
Can't exec "curl": No such file or directory at /usr/lib/nagios/plugins/check_drobo line 11.
Use of uninitialized value $content in split at /usr/lib/nagios/plugins/check_drobo line 15.
Use of uninitialized value $status in concatenation (.) or string at /usr/lib/nagios/plugins/check_drobo line 49.
Perf Data - None.
Perf Data - DS: status, Value: , UOM:
Perf Data - DS: bad, Value: 0, UOM:
Response:
Service DS: {
"status": "",
"bad": ""
}
RRD[last 172.16.4.55/services-96.rrd --daemon libresql.cerento.com:42217]
RRD[update 172.16.4.55/services-96.rrd N:U:0 --daemon libresql.cerento.com:42217]
SQL[SELECT devices.*, location, lat, lng FROM devices LEFT JOIN locations ON devices.location_id=locations.id WHERE device_id = ? [3906] 1.56ms]

SQL[SELECT * FROM devices_attribs WHERE device_id = ? [3906] 0.86ms]

SQL[SELECT * FROM vrf_lite_cisco WHERE device_id = ? [3906] 1.21ms]

SQL[INSERT IGNORE INTO eventlog (device_id,reference,type,datetime,severity,message,username) VALUES (:device_id,:reference,:type,:datetime,:severity,:message,:username) {"device_id":3906,"reference":96,"type":"service","datetime":"2019-06-24 12:44:34","severity":4,"message":"Service 'drobo' changed status from Critical to OK - - ","username":""} 2.63ms]

SQL[UPDATE services set service_changed=?,service_status=?,service_message=? WHERE service_id=? [1561401874,0,"",96] 1.99ms]

Nagios Service - 92
Request: '/check_timing_status_365' '10.10.190.226'
sh: 1: /check_timing_status_365: not found
Perf Data - None.
Response:

Nagios Service - 93
Request: '/check_pmp450_cpu' '10.10.190.226'
sh: 1: /check_pmp450_cpu: not found
Perf Data - None.
Response:
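In the log above, check_drobo is requested by its full path under /usr/lib/nagios/plugins, while check_timing_status_365 and check_pmp450_cpu are requested as bare root-level paths (/check_...). That is the result you would expect if the directory part of the command were empty when the command was built. The Python sketch below shows that effect and checks whether each plugin resolves to an executable file under the configured directory. The helper names are hypothetical and this is not LibreNMS's own code; run it as the librenms user on each poller, because the permission check is evaluated for whoever runs it.

```python
import os


def build_plugin_path(plugin_dir: str, check_name: str) -> str:
    """Join directory and plugin name the way a "<dir>/<name>" string
    concatenation would (hypothetical helper for illustration)."""
    return f"{plugin_dir}/{check_name}"


def find_unusable_plugins(plugin_dir: str, check_names: list[str]) -> dict[str, str]:
    """Return {plugin_name: reason} for every plugin that is missing or
    not executable by the current user; an empty dict means all are usable."""
    problems = {}
    for name in check_names:
        path = build_plugin_path(plugin_dir, name)
        if not os.path.isfile(path):
            problems[name] = f"not found: {path}"
        elif not os.access(path, os.X_OK):
            problems[name] = f"not executable: {path}"
    return problems


if __name__ == "__main__":
    # An empty directory prefix reproduces the root-level path from the debug log.
    print(build_plugin_path("", "check_pmp450_cpu"))  # /check_pmp450_cpu
    print(find_unusable_plugins(
        "/usr/lib/nagios/plugins",
        ["check_drobo", "check_timing_status", "check_timing_status_365", "check_pmp450_cpu"],
    ))
```

If every plugin comes back usable from the real directory, the files themselves are fine and the question becomes where the empty prefix in the command comes from.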

Should I start a new topic titled "Custom services broke after latest update"?

I followed the steps in this post: Broken auto-updater (Manual intervention required)

I was hoping the issue was related, but I still encounter it.

Is there anything else you need from me? I'm still experiencing this issue.

I've rebooted the server and the polling servers again, and it made no difference. When I run the following command, the output says the plugins cannot be found, when they are in fact there. I confirmed that they have execute permissions. Here are just a few lines, but basically every Nagios plugin is listed.

su - librenms
librenms@cerento010:~$ ./check-services.php -D
Starting service polling run:

Can't exec "curl": No such file or directory at /usr/lib/nagios/plugins/check_drobo line 11.
Use of uninitialized value $content in split at /usr/lib/nagios/plugins/check_drobo line 15.
Use of uninitialized value $status in concatenation (.) or string at /usr/lib/nagios/plugins/check_drobo line 49.
sh: 1: /check_timing_status_365: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status_365: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_pmp450_cpu: not found

I would wager it's something to do with that Nagios plugin.

But these plugins have all been working for months. They only decided to break after a LibreNMS auto-update.

How do you know for sure it was an update that broke it? And if so what update?

All the custom services we had set up with the Nagios-style scripts had been working for close to a year. Just after midnight on 6-23-19 they paged out, and we started seeing these errors. Our daily.sh runs at midnight, and the pages all came in at the same time shortly thereafter. We confirmed GPS sync was good on the devices themselves, which is how we know the update is what broke it.

Yes I get that but what update?

Past that, I couldn't tell you. Whatever daily.sh pulled in at midnight on 6-23-19 would be the culprit.
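One way to answer "what update?" is to list the commits in the local LibreNMS checkout around that date. A minimal Python sketch, assuming the install lives in /opt/librenms (the path used by the services cron job quoted later in this thread) and that git is on PATH; the date window is only an example:

```python
import os
import subprocess


def git_log_args(since: str, until: str) -> list[str]:
    """Build a `git log` command listing commits whose commit date falls in the window."""
    return ["git", "log", "--oneline", f"--since={since}", f"--until={until}"]


def commits_between(repo: str, since: str, until: str) -> list[str]:
    """Return one 'abbrev-hash subject' line per commit in the window."""
    result = subprocess.run(
        git_log_args(since, until),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails (e.g. not a repository)
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    repo = "/opt/librenms"  # assumed install path
    if os.path.isdir(os.path.join(repo, ".git")):
        # Example window around the midnight 6-23-19 daily.sh run; adjust as needed.
        for line in commits_between(repo, "2019-06-21", "2019-06-24"):
            print(line)
```

Upstream commit dates can predate the moment daily.sh actually pulled them, so `git reflog --date=iso` in the same checkout is a useful cross-check: it shows when the local HEAD moved and to which commit.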

You should probably start by investigating the errors themselves... This looks more like a PATH or shell issue.
For instance, 'curl' is not found. So either curl is missing and you should reinstall it, or the PATH variable is broken (probably a shell issue in the LibreNMS user's home directory).
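A quick way to test the PATH theory is to check where a command resolves under different PATH values, for example the interactive PATH the librenms user reports later in this thread versus a minimal one such as /usr/bin:/bin (treat that minimal value as an assumption; check what your scheduler actually sets). A small Python sketch using the standard library's shutil.which:

```python
import shutil
from typing import Optional


def resolve(cmd: str, path_value: str) -> Optional[str]:
    """Return the full path `cmd` resolves to under `path_value`, or None.

    shutil.which searches the given PATH string for an executable file
    with that name, like a shell lookup of a bare command name.
    """
    return shutil.which(cmd, path=path_value)


if __name__ == "__main__":
    candidates = {
        "interactive": "/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/puppetlabs/bin",
        "minimal": "/usr/bin:/bin",
        "empty": "",
    }
    for label, path_value in candidates.items():
        print(f"{label:12} curl -> {resolve('curl', path_value)}")
```

If curl resolves under every non-empty PATH, a missing binary is unlikely and the environment the plugin actually runs in deserves a closer look.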

I will certainly track that one down for that plugin; my concern is with the other plugins:

sh: 1: /check_timing_status_365: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_timing_status_365: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_pmp450_cpu: not found
sh: 1: /check_pmp450_cpu: not found

While the check status says these files are not found, I assure you they are there and have proper permissions. We made NO changes to the OS on the server when this occurred, so again, I firmly believe it was the updates done on 6-23-19 by daily.sh.

As it seems that it is not a general issue, you’ll have to dig a little bit more to understand what’s going on.

I will check that, but given that we made NO other changes to the servers, I have a hard time believing it.

And I have a hard time trying to explain you that you need to help us to help you :smiley:

This is a community driven project, I am trying to help you on my free time, and I expect a little bit of cooperation on your side as well.

You could have checked the commits that took place on the 25th and 26th of June and pinpointed which one broke your setup. But I already did that, before telling you that we did not find any issue. You can check for yourself:

Clearly, something has to be fixed to get your services up and running. It does not seem to be on the LibreNMS side, so it has to be on your server, and the only way to find out is for you to troubleshoot it on your side; we can't do it remotely.

If, in the end, something must be changed on the LibreNMS side to prevent the issue from reappearing, we will of course do it. But first we need to find the issue.

Bye


A bit more on our setup: we have one box for the web interface, one box for the SQL database, and two boxes doing polling.

The services-related settings in my config.php:

# Show Nagios Plugins
$config['show_services'] = 1;
$config['nagios_plugins'] = "/usr/lib/nagios/plugins";

On my pollers the cron job:

*/5 * * * * librenms /opt/librenms/services-wrapper.py >> /dev/null 2>&1
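Because the cron job runs services-wrapper.py non-interactively, its environment can differ from a `su - librenms` login shell. One way to reproduce that is to run a single plugin with a stripped-down environment. This Python sketch passes only PATH, defaulting to /usr/bin:/bin (an assumption modelled on common cron defaults, not necessarily what your cron sets), and reports the exit code and output. It is a diagnostic aid, not how services-wrapper.py itself invokes plugins.

```python
import os
import subprocess


def run_like_cron(argv: list[str], path_value: str = "/usr/bin:/bin",
                  timeout: int = 30) -> tuple[int, str, str]:
    """Run a command with only PATH set and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        argv,
        env={"PATH": path_value},  # HOME, extra PATH entries etc. are dropped
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    plugin = "/usr/lib/nagios/plugins/check_drobo"
    if os.path.isfile(plugin):
        code, out, err = run_like_cron([plugin, "-H", "172.16.4.55"])
        print("exit:", code)
        print("stdout:", out.strip())
        print("stderr:", err.strip())
```

Run it as the librenms user (for example with `sudo -u librenms python3 script.py`) so file permissions match the poller's.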

When I switch to the librenms user, here's what I get for PATH, and I can run curl manually:

$ whoami
librenms
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/opt/puppetlabs/bin
$ curl
curl: try 'curl --help' or 'curl --manual' for more information

Here is the code behind the Nagios plugin that says it can't find curl:

$ more check_drobo
#!/usr/bin/perl

use strict;
use warnings;
# (WWW::Curl::Easy was imported but never used; the script shells out to curl.)

# LibreNMS calls the plugin as: check_drobo -H <host>, so skip the -H flag.
shift if @ARGV && $ARGV[0] eq '-H';
my $ip = shift;
my $url = "http://$ip:5000/";
my $curl = "curl -s -m 2 $url";

# Backticks run curl and capture its output; curl must be on PATH.
my $content = `$curl`;

my %values = ();
my @keys = qw(mTotalCapacityProtected mUsedCapacityProtected mFreeCapacityProtected mTotalCapacityUnprotected mUsedCapacityOS mSlotCountExp);
my @content = split(m/\n/, $content);
my $status;
my $baddrive = 0;
foreach my $line (@content){
    foreach my $key (@keys){
        if ($line =~ m/<$key>(.*)<\/$key>/){
            $values{$key} = $1;
        }
    }

    # see http://code.google.com/p/drobowebdashboard/wiki/ESATMUpdate for a half-way decent explanation
    # 98304 seems to be OK. I think only the lower bits mean anything.
    if (!defined $status){
        if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
            $status = $1 & 0xff;
        }
    }
    # drive-specific status
    else{
        if ($line =~ m/<mStatus>(.*)<\/mStatus>/){
            # 3 is the magic number for OK. 128 is empty. 98304 is also normal?
            if ($1 != 3 && $1 != 128 && $1 != 98304){
                print "bad status $1\n";
                $baddrive++;
            }
        }
    }
}

print '|';
foreach my $key (keys %values){
    print " $key=$values{$key}";
}

print " status=$status bad=$baddrive\n";
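For comparison, here is the same parsing logic as a Python sketch, with one deliberate addition the Perl script lacks: it exits with a Nagios-style status code (0 OK, 2 CRITICAL, 3 UNKNOWN) instead of always exiting 0, and it reports UNKNOWN when no unit status could be read, which is the empty `status=` case seen in the debug output. The function names, the use of urllib instead of curl, and the http://<host>:5000/ URL layout are assumptions.

```python
import re
import sys
import urllib.request
from typing import Optional

# Fields copied from the @keys list in the Perl script.
KEYS = [
    "mTotalCapacityProtected", "mUsedCapacityProtected", "mFreeCapacityProtected",
    "mTotalCapacityUnprotected", "mUsedCapacityOS", "mSlotCountExp",
]
# Per-drive mStatus values the Perl comments treat as normal.
NORMAL_DRIVE_STATUSES = {3, 128, 98304}

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

STATUS_RE = re.compile(r"<mStatus>(.*)</mStatus>")


def parse_drobo(content: str) -> tuple[dict[str, str], Optional[int], int]:
    """Mirror the Perl loop: the first <mStatus> is the unit status (low 8 bits
    only); every later <mStatus> is a drive slot checked against the normal set."""
    values: dict[str, str] = {}
    status: Optional[int] = None
    bad_drives = 0
    for line in content.splitlines():
        for key in KEYS:
            m = re.search(rf"<{key}>(.*)</{key}>", line)
            if m:
                values[key] = m.group(1)
        m = STATUS_RE.search(line)
        if not m:
            continue
        if status is None:
            status = int(m.group(1)) & 0xFF
        elif int(m.group(1)) not in NORMAL_DRIVE_STATUSES:
            bad_drives += 1
    return values, status, bad_drives


def format_result(values: dict[str, str], status: Optional[int],
                  bad_drives: int) -> tuple[int, str]:
    """Pick an exit code and build the 'TEXT | perfdata' output line."""
    perf = " ".join([f"{k}={v}" for k, v in values.items()] + [f"bad={bad_drives}"])
    if status is None:
        # Nothing parsed (e.g. the fetch returned nothing): do not report OK.
        return UNKNOWN, f"UNKNOWN - no mStatus in response | {perf}"
    perf += f" status={status}"
    if bad_drives:
        return CRITICAL, f"CRITICAL - {bad_drives} drive(s) with abnormal status | {perf}"
    return OK, f"OK - unit status {status} | {perf}"


if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] == "-H":
    url = f"http://{sys.argv[2]}:5000/"  # assumed URL layout (host, port 5000)
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # covers URLError, timeouts and connection errors
        print(f"UNKNOWN - cannot fetch {url}: {exc}")
        sys.exit(UNKNOWN)
    code, message = format_result(*parse_drobo(body))
    print(message)
    sys.exit(code)
```

Returning UNKNOWN on an empty response means a broken fetch shows up as a problem instead of silently flipping the service back to OK.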

I can provide the code behind one of the other plugins if you'd like, but they are generally just snmpget calls in a bash script that may use sed or awk to parse data out of the responses.
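Whatever language a plugin is written in, what the poller consumes is the text it prints plus its exit code. Per the Nagios plugin guidelines, performance data follows a `|` as space-separated `label=value[UOM]` items, optionally followed by `;warn;crit;min;max`. The simplified Python parser below is an illustration, not LibreNMS's own parser, and it does not handle quoted labels containing spaces. It shows why check_drobo's output with an unset `$status` yields a `status` data source with an empty value, which the debug log then wrote to the RRD as `U`.

```python
import re

# label=value[UOM][;warn;crit;min;max]; value may be empty when a plugin
# prints a variable that was never set (e.g. "status=").
PERF_RE = re.compile(r"^(?P<label>[^=]+)=(?P<value>-?[0-9.]*)(?P<uom>[^;]*)")


def parse_perfdata(output: str) -> list[tuple[str, str, str]]:
    """Return (label, value, uom) for each perfdata item after the first '|'."""
    if "|" not in output:
        return []
    perf = output.split("|", 1)[1]
    items = []
    for token in perf.split():
        m = PERF_RE.match(token)
        if m:
            items.append((m.group("label").strip("'"), m.group("value"), m.group("uom")))
    return items


if __name__ == "__main__":
    # What check_drobo printed when curl failed: an empty status value.
    print(parse_perfdata("| status= bad=0"))
```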