False reboots

Tomas_Andersson1 · 12 January 2023 07:44

Hi,

We have been seeing these false reboots for a long time, and it's becoming a problem because the event logs get huge. We monitor roughly 4000 devices and get over 15000 reboot messages per day, the vast majority of them false. It's really time to fix this.

Example:

Device rebooted after 2 years 6 months 3 weeks 3 days 13 hours 18 minutes 38 seconds → 31748461s

sysUptime shows that it indeed has been up for over 2 years.

The main reason the event logs were growing was that LibreNMS was set to run a discovery on reboot for all devices. I have disabled that function for now, although I think it should be enabled, but only triggered by true reboots.

Thoughts on how to solve the problem?

===========================================

Component	Version
LibreNMS	22.12.0-3-g6e42eaf1a (2022-12-31T16:33:36+01:00)
DB Schema	2022_08_15_084507_add_rrd_type_to_wireless_sensors_table (248)
PHP	8.1.13
Python	3.8.10
Database	MariaDB 10.4.8-MariaDB-1:10.4.8+maria~xenial
RRDTool	1.7.2
SNMP	5.8

===========================================

slashdoom · 12 January 2023 22:20

What type of device is it? Is it all one type of device or various types?

Tomas_Andersson · 12 January 2023 23:13

The device type doesn’t matter. We get these event log entries for all our devices, from HPE 5130/5710 and Huawei switches to Coromatic UPS devices.

slashdoom · 13 January 2023 01:48

Right, do you have the “Device rebooted” alert set up? If so, does it fire when these events occur?

AFAIK that particular event could only occur if the device uptime value in the database (which should be from the previous poll) is greater than the uptime value received from SNMP in the current poll.

So if the database says your previous uptime was 100000, it wouldn’t matter if the polled uptime was 99999 or 0; both would generate that event. The “Device rebooted” alert, on the other hand, fires only if the uptime is less than 300 (5 minutes). So whether the alert fires too might be a clue to what’s going on.
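
To make the difference between the two checks concrete, here is a minimal sketch in Python. It is not LibreNMS code: the function names and the constant are made up for illustration, and the logic is only what is described above (event on any decrease, alert only below 300).

```python
# Illustrative only: names are hypothetical, not taken from LibreNMS.
REBOOT_ALERT_THRESHOLD = 300  # the "less than 300 (5 minutes)" figure above


def logs_reboot_event(previous_uptime: int, polled_uptime: int) -> bool:
    """Event-log check as described above: any drop in uptime counts."""
    return polled_uptime < previous_uptime


def fires_reboot_alert(polled_uptime: int) -> bool:
    """Alert check as described above: only a very low uptime counts."""
    return polled_uptime < REBOOT_ALERT_THRESHOLD


if __name__ == "__main__":
    previous = 100000
    for polled in (99999, 0):
        print(polled, logs_reboot_event(previous, polled), fires_reboot_alert(polled))
```

With a previous value of 100000, both 99999 and 0 produce the event, but only 0 would also trip the alert. An event without a matching alert would therefore point to an uptime that went backwards without really restarting from zero.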

Silky_Sandpape · 16 January 2023 08:51

I almost thought it was the 497-day reboot problem (where the 32-bit uptime counter wraps). But if you say it happens every day…

https://kb.paessler.com/en/topic/61249-why-does-the-snmp-system-uptime-sensor-report-wrong-values
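
For reference, the 497-day figure follows from the counter size. A quick sketch of the arithmetic in Python, assuming the standard sysUpTime encoding of hundredths of a second in an unsigned 32-bit value (the function name is made up for illustration):

```python
# sysUpTime is counted in hundredths of a second (TimeTicks) in a 32-bit field.
TICKS_PER_SECOND = 100
COUNTER_MODULUS = 2 ** 32
SECONDS_PER_DAY = 86400


def reported_uptime_seconds(true_uptime_seconds: float) -> float:
    """Uptime a wrapping 32-bit tick counter would report (hypothetical helper)."""
    ticks = int(true_uptime_seconds * TICKS_PER_SECOND) % COUNTER_MODULUS
    return ticks / TICKS_PER_SECOND


if __name__ == "__main__":
    wrap_days = COUNTER_MODULUS / TICKS_PER_SECOND / SECONDS_PER_DAY
    print(f"counter wraps after about {wrap_days:.1f} days")
    # A device that has really been up for 500 days reports only a few days.
    print(reported_uptime_seconds(500 * SECONDS_PER_DAY) / SECONDS_PER_DAY)
```

A wrap like this happens once per roughly 497 days per device, so on its own it would not explain reboot events that show up every day.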

If you are running SNMP v1/2c and a reboot is reported at the same time every day, perhaps run Wireshark/tcpdump to capture the SNMP poll that goes out (to see which community gets polled) and what comes back?

You could also check your NTP services. Make sure they are consistent and use the same time zone across all your devices, to rule out time sync issues.
