LibreNMS issues after update, false warnings

Gun_Runner · 16 May 2018 14:42

I seem to be having major issues with LibreNMS all of a sudden. I get constant alert notices for warnings that are not true. For example, LibreNMS alerts that the RAID status on my file server is bad, but the server says it is fine.

Here is the ./validate.php output; you can see some issues. This seemed to start after I updated the Ubuntu OS.

librenms@librenms:~$ ./validate.php
====================================
Component | Version
--------- | -------
LibreNMS  | 1.39-66-gb881cbe
DB Schema | 250
PHP       | 7.0.30-0ubuntu0.16.04.1
MySQL     | 10.0.34-MariaDB-0ubuntu0.16.04.1
RRDTool   | 1.5.5
SNMP      | NET-SNMP 5.7.3
====================================

[OK]    Composer Version: 1.6.5
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[FAIL]  Database: incorrect column (users/updated_at) 
[FAIL]  Database: missing column (users/remember_token)
[FAIL]  We have detected that your database schema may be wrong, please report the following to us on IRC or the community site (https://t.libren.ms/5gscd):
[FIX] Run the following SQL statements to fix.
SQL Statements:
 ALTER TABLE `users` CHANGE `updated_at` `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ;
 ALTER TABLE `users` ADD `remember_token` varchar(100) NULL  AFTER `updated_at`;
[FAIL]  You have no timezone set for php.
[FIX] http://php.net/manual/en/datetime.configuration.php#ini.date.timezone
[FAIL]  The poller (librenms) has not completed within the last 5 minutes, check the cron job.
[WARN]  Some devices have not been polled in the last 5 minutes. You may have performance issues.
[FIX] Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
 192.168.1.2
librenms@librenms:~$

Any help is appreciated…

murrant · 16 May 2018 16:13

Fix the fails in validate.

You haven’t really provided any information about the problem you are having.

Gun_Runner · 16 May 2018 17:54

The problem I’m having is constant, multiple alerts from LibreNMS that are false. This morning alone I have had several dozen, e.g. device up/down, bad RAID status, etc. I get a lot of the “Device Up/Down” alerts almost every day. If I reboot my Cisco SG300-52 switch they seem to stop for a while, but they eventually reappear. (I have no network-related problems with the switch.)

Take the “Bad RAID Status” alert, for example: I created this rule from the LibreNMS default collection, using the rule “Synology NAS has a bad RAID status”. I am getting this alert, yet the actual server shows no problems. I have not modified the rule from the default at all.

Gun_Runner · 16 May 2018 18:21

OK, I’ve got the errors fixed, FYI. I’ll see if it affects the alerts issue…

librenms@librenms:~$ ./validate.php
====================================
Component | Version
--------- | -------
LibreNMS  | 1.39-66-gb881cbe
DB Schema | 250
PHP       | 7.0.30-0ubuntu0.16.04.1
MySQL     | 10.0.34-MariaDB-0ubuntu0.16.04.1
RRDTool   | 1.5.5
SNMP      | NET-SNMP 5.7.3
====================================

[OK]    Composer Version: 1.6.5
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database schema correct

murrant · 16 May 2018 20:39

What is your alert rule? Show your logs, show the output you are seeing with the alerts…

Gun_Runner · 17 May 2018 18:32

The two sample alert rules. I’ve had to turn off the alerting because it’s getting out of control. Rebooting the switch does not help. Which specific logs are you referring to, and where exactly are they located?

Here is a sample of each of the two alert texts:

critical

Title: Alert for device atlas - Synology NAS has a bad RAID status
Timestamp: 2018-05-16 16:33:49
Severity: critical
Host: atlas
Duration: 2m 21s

Unique-ID: 7424
Rule: Synology NAS has a bad RAID status 
Faults: 
#1: sysObjectID => .1.3.6.1.4.1.8072.3.2.10; sysDescr => Linux Atlas 3.10.102 #15266 SMP Mon Mar 26 15:08:28 CST 2018 x86_64; override_sysLocation => 1; sensor_id => 657; sensor_oid => .1.3.6.1.4.1.6574.3.1.1.3.0; sensor_descr => RAID Status; 
Alert sent to:

warning

Title: Alert for device epson-wf7610.lan - Devices up/down
Timestamp: 2018-05-16 16:25:11
Severity: warning
Host: epson-wf7610.lan
Duration: 11m 50s

Unique-ID: 7412
Rule: Devices up/down 
Faults: 
#1: sysObjectID => .1.3.6.1.4.1.1248.1.1.2.1.3.5.69.69.80.83.50; sysDescr => EPSON Built-in 11b/g/n & 10/100 Print Server; 
Alert sent to:

murrant · 18 May 2018 12:32

The first rule is bad because you are comparing a number with a string.

The second looks fine. Maybe you need to tweak your timeouts.

https://docs.librenms.org/Support/Configuration/#fping

Gun_Runner · 18 May 2018 14:21

The first rule is taken exactly from the rules included in LibreNMS, with no changes made to it. What about it needs to be fixed?

murrant · 19 May 2018 05:06

[11-12] is not a number. 11 == ‘[11-12]’ is false.
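
To illustrate the comparison point, here is a rough Python sketch. It is not LibreNMS code and not the rule engine's actual logic. The function names are made up for illustration, and it assumes the rule is meant to match RAID state values 11 and 12, which is only a guess based on the `[11-12]` text.

```python
# Hypothetical illustration only; this is not LibreNMS code.
# Assumption: the rule is meant to match RAID state values 11 and 12.

BAD_STATES = (11, 12)  # assumed "bad" RAID state values


def literal_compare(sensor_value):
    """Compare the numeric reading with the literal string '[11-12]'."""
    return sensor_value == "[11-12]"


def membership_compare(sensor_value):
    """Check the numeric reading against each assumed bad state."""
    return sensor_value in BAD_STATES


# Print both comparison results for a few sample readings.
for reading in (1, 11, 12):
    print(reading, literal_compare(reading), membership_compare(reading))
```

The literal comparison is false for every reading, including 11, because a number is never equal to that string. Comparing against each numeric value separately is one way to express what the rule name suggests, if the assumption about 11 and 12 holds for your device.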