Some devices with false "Device Rebooted" alert

Alan · 20 October 2021 03:31

I am getting false “Device Rebooted” alerts on some devices, please help!

librenms@librenms:~$ ./validate.php

Component	Version
LibreNMS	21.10.0-24-gf94f7f23b
DB Schema	2021_07_28_102443_plugins_add_version_and_settings (222)
PHP	7.4.3
Python	3.8.10
MySQL	10.3.31-MariaDB-0ubuntu0.20.04.1
RRDTool	1.7.2
SNMP	NET-SNMP 5.8
====================================

[OK] Composer Version: 2.1.9
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct

DBMandrake · 20 October 2021 05:44

Same here. Since 12:30 am, all our Windows servers and most of our switches have reported bogus uptimes of a few seconds after every poll, which has triggered the reboot alert and held it active for dozens of devices for the last 6 hours and counting. Uptimes for nearly all our monitored devices are bogus.

I’m aware of the gpsd bug but from what I’ve read it shouldn’t bite until this weekend so I think that’s unrelated.

Will have to wait until I’m in work to analyse what the heck is going on. At first I thought the entire site had a power cut, but the absence of UPS alerts made me realise that wasn’t the case.

Voltage and temperature sensor readings from the server LibreNMS runs on are also reporting zero (via SNMP), which is now setting off sensor limit alerts, so my guess is there is a major issue with SNMP polling returning incorrect data.

Kbz4 · 20 October 2021 05:53

I think it’s from last night’s update.

I updated to the latest version this morning (21.10.0-26-gfdfea6e93 - Wed Oct 20 2021 02:32:28 GMT+0200) and everything is correct.

TheGreatDoc · 20 October 2021 05:54

Run ./daily.sh to update

DBMandrake · 20 October 2021 05:54

Has there been another update since midnight UTC? Our system only auto-updated 7 hours ago, shortly before the problem began.

DBMandrake · 20 October 2021 05:58

./daily.sh
Updating to latest codebase                        OK
Updating Composer packages                         OK
Updated from f94f7f23b to fdfea6e93                OK
Updating SQL-Schema                                OK
Updating submodules                                OK
Cleaning up DB                                     OK
Fetching notifications                             OK
Caching PeeringDB data                             OK
Caching Mac OUI data                               OK

Fingers crossed…

Dejan · 20 October 2021 06:09

I can confirm the same error: fake device rebooted alarms.
After running daily.sh everything is fine again.

Version before daily.sh

Version	21.10.0-24-gf94f7f23b - Tue Oct 19 2021 22:43:43 GMT+0200
Database Schema	2021_07_28_102443_plugins_add_version_and_settings (222)
Web Server	Apache/2.4.29 (Ubuntu)
PHP	7.4.12
Python	3.6.9
MySQL	10.2.37-MariaDB-1:10.2.37+maria~bionic
Laravel	8.62.0
RRDtool	1.7.0

Version after daily.sh

Version	21.10.0-26-gfdfea6e93 - Wed Oct 20 2021 02:32:28 GMT+0200
Database Schema	2021_07_28_102443_plugins_add_version_and_settings (222)
Web Server	Apache/2.4.29 (Ubuntu)
PHP	7.4.12
Python	3.6.9
MySQL	10.2.37-MariaDB-1:10.2.37+maria~bionic
Laravel	8.62.0
RRDtool	1.7.0

winglessza · 20 October 2021 06:47

Had the same issue this morning.

Can confirm that running daily.sh manually fixed the issue.

ebaena · 20 October 2021 08:06

Same experience as everyone else after auto update to 21.10.0-24-gf94f7f23b.
Running ./daily.sh and waiting about 5-10 minutes fixed the issue. Current version is 21.10.0-27-g6bf706eaa.

DBMandrake · 20 October 2021 08:17

Fixed here after manually updating again.

Alan · 20 October 2021 08:32

Just fixed with manual update!!!
Thanks All!!!

Tim_E · 20 October 2021 16:34

Daily.sh today didn’t fix ours.

Our DB was out of date yesterday, so I ran migrate.php, which fixed the symptoms I had noticed. We’re on:

Version	21.10.1 - Mon Oct 18 2021 16:31:16 GMT-0600
Database Schema	2021_25_01_0129_isis_adjacencies_nullable (221)
Web Server	Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 PHP/7.2.34
PHP	7.3.27
Python	3.6.8
MySQL	10.5.9-MariaDB
Laravel	8.62.0
RRDtool	1.7.1

Tim_E · 20 October 2021 18:09

Further detailing the false reboot event problem: the plain number of seconds shown does accurately reflect the device uptime. However, the friendlier “years, days, hours, minutes, seconds” string is not calculated from the plain number of seconds. These two events happened within 30 minutes of each other, but the friendly value changed by over one day.

2021-10-20 11:43:18	reboot	10.x.x.x	Device rebooted after 2 years 52 days 9 hours 46 minutes 15 seconds -> 676000s	System
2021-10-20 11:18:52	reboot	10.x.x.x	Device rebooted after 2 years 50 days 17 hours 3 minutes 26 seconds -> 674534s	System

Tim_E · 20 October 2021 21:51

It appears the incorrect friendly reboot time is being calculated from the reference field in eventlog, where the reference field is (now) the number of seconds * 100. Looking back at reboot events on October 10th, the reference field was unrelated to the number of seconds.
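The observation above can be checked with a little arithmetic. The following is a minimal sketch, not LibreNMS code: `friendly_uptime` is a hypothetical helper, and it assumes a 365-day year. The `reference` value is a reconstruction chosen to match the 11:43:18 event (roughly 676000 s * 100), not a value copied from the database. SNMP reports uptime in hundredths of a second, so a tick count read as seconds would be inflated by exactly this factor.

```python
def friendly_uptime(seconds: int) -> str:
    """Format a duration as 'Y years D days H hours M minutes S seconds'.

    Assumes a 365-day year. Zero-valued units are omitted, except that a
    zero duration renders as '0 seconds'.
    """
    units = (
        ("year", 365 * 86400),
        ("day", 86400),
        ("hour", 3600),
        ("minute", 60),
        ("second", 1),
    )
    parts = []
    remaining = int(seconds)
    for name, size in units:
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count} {name}{'' if count == 1 else 's'}")
    return " ".join(parts) or "0 seconds"


# Hypothetical reconstruction of the eventlog reference value for the
# 11:43:18 event: about 676000 s expressed in hundredths of a second.
reference = 67_599_975

print(friendly_uptime(reference))         # ticks misread as seconds
print(friendly_uptime(reference // 100))  # ticks converted to seconds first
```

Under those assumptions, the first line reproduces the "2 years 52 days 9 hours 46 minutes 15 seconds" text from the event log, while the second gives an uptime of just under 8 days, in line with the 676000s figure.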

DBMandrake · 21 October 2021 05:11

The problem has been fixed in yesterday’s update. (If your current device uptimes look correct, it is fixed.)

You will still see these bogus reboot entries in the event log - they were calculated and logged while the uptime calculation was incorrect and they will remain in the event log.

If it’s just these historical event log entries you’re worrying about then just ignore them. The fix for the bug that was applied yesterday will not retrospectively remove incorrect event log entries. There’s nothing you can do about them short of editing them out of the database manually.

laureninc · 21 October 2021 08:24

Yesterday I had the same problem. Many devices running Windows (7, Server 2012 R2, maybe 10) and network devices (Mikrotik SwOS v2.12, SVOS) showed uptimes from 2 seconds to a few minutes, but the real uptime is a few days!
For now I have the problem only with Hitachi Storage Virtualization Operating System (SVOS). It is showing an uptime of 2 seconds.

Tim_E · 21 October 2021 14:50

Viewing an individual device in the webui, the (friendly) uptime is correct. But the Message in Eventlog is still showing the wrong friendly time…

2021-10-21 08:45:03 reboot 10.46.4.189 Device rebooted after 10 hours 3 minutes 18 seconds -> 261s System

But, the eventlog is not flooded with false reboot events. This is progress.

DBMandrake · 21 October 2021 15:24

But are there any new incorrect reboot events logged after you updated?

As I explained, any incorrect reboot events that were logged before the bug was fixed will remain in the event log. The update only fixes the bug, it doesn’t clear out old reboot event log entries that were incorrect.

Tim_E · 21 October 2021 15:30

The most recent reboot eventlog entry was just seconds ago…

2021-10-21 09:28:01 reboot 10.35.6.125 Device rebooted after 5 hours 53 minutes 38 seconds -> 177s System

DBMandrake · 21 October 2021 16:04

My apologies, I didn’t realise you’re still seeing new events generated. Now that I check, I am also seeing one example of this:

2021-10-21 06:35:31	reboot	clavius 48 port poe switch 1	Device rebooted after 68 years 35 days 3 hours 13 minutes 47 seconds -> 4295318s

The switch in question is a D-Link DGS-3120-48PC Rev. B1 and has a current listed uptime of 50 days 3 hours 33 minutes 36 seconds, which seems correct, so I don’t know where 68 years 35 days comes from, why this triggered a reboot event, and why the event was logged at 6:35 am in particular…
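A quick arithmetic check on the 68-year figure, offered as a sketch rather than a diagnosis. Assuming the friendly string uses a 365-day year (an assumption, not confirmed in this thread), the logged duration converts to a number of seconds that sits just below 2**31 - 1, the largest signed 32-bit integer. `to_seconds` and `YEAR` are illustrative names, not LibreNMS functions.

```python
YEAR = 365 * 86400  # assumption: the friendly string counts 365-day years


def to_seconds(years=0, days=0, hours=0, minutes=0, seconds=0) -> int:
    """Convert a 'years, days, hours, minutes, seconds' duration to seconds."""
    return years * YEAR + days * 86400 + hours * 3600 + minutes * 60 + seconds


# Duration from the event log line for the D-Link switch.
logged = to_seconds(years=68, days=35, hours=3, minutes=13, seconds=47)
int32_max = 2**31 - 1

print(logged)              # logged "previous uptime" in seconds
print(int32_max - logged)  # distance from the signed 32-bit maximum
```

If the 365-day assumption holds, the gap is only 20 seconds, which hints at an overflowed or clamped 32-bit value somewhere in the calculation. It does not by itself identify where that value comes from.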

(However it did not trigger a reboot alert, as my reboot alerts trigger for an uptime of <15 minutes)

So there is still something wrong with some of the uptime calculations even after yesterday’s fixes. I will keep an eye on the event log to see if I see any more instances of this happening.

I would suggest you post a bit more information about the device in your log so the developers have a bit more info to go on…