False reboots

Hi,

We have been seeing these false reboots for a long time, and they are becoming an issue because the eventlogs grow huge. We monitor roughly 4000 devices and get over 15000 reboot messages per day, the vast majority of them false. We've had this problem for a long time now, and I thought it was really time to fix it.

Example:

Device rebooted after 2 years 6 months 3 weeks 3 days 13 hours 18 minutes 38 seconds → 31748461s
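As a sanity check on the example, the figure after the arrow can be converted back into days. A minimal Python sketch, assuming that figure is a plain count of seconds; `split_seconds` is just an illustrative helper:

```python
def split_seconds(total: int) -> tuple[int, int, int, int]:
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


print(split_seconds(31_748_461))  # (367, 11, 1, 1)
```

That works out to about 367 days, roughly one year, so it is clearly not the same quantity as the interval of more than two years spelled out at the start of the message.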

sysUpTime shows that it has indeed been up for over 2 years.

The main reason the eventlogs were growing was that LibreNMS was set to run a discovery on reboot for all devices. I have now disabled that, even though I do think it should be enabled, but only for true reboots.

Thoughts on how to solve the problem?

===========================================

Component  Version
LibreNMS   22.12.0-3-g6e42eaf1a (2022-12-31T16:33:36+01:00)
DB Schema  2022_08_15_084507_add_rrd_type_to_wireless_sensors_table (248)
PHP        8.1.13
Python     3.8.10
Database   MariaDB 10.4.8-MariaDB-1:10.4.8+maria~xenial
RRDTool    1.7.2
SNMP       5.8

===========================================

What type of device is it? Is it all one type of device or various types?

The device type doesn't matter; we get these eventlog entries for all our devices, from HPE 5130/5710 to Huawei switches and Coromatic UPS devices.

Right, do you have the “Device rebooted” alert set up? If so, does it fire when these events occur?

AFAIK that particular event can only occur if the device uptime value in the database (which should be from the previous poll) is greater than the uptime value received from SNMP in the current poll.

So if the database says your previous uptime was 100000, it wouldn't matter whether the polled uptime was 99999 or 0; both would generate that event. The “Device rebooted” alert, on the other hand, fires if the uptime is less than 300 seconds (5 minutes). So whether the alert fires alongside the event might be a clue to what's going on.
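The difference between the eventlog entry and the alert can be sketched as two separate checks. This is a minimal illustration of the behaviour described above, not LibreNMS source code; `reboot_event`, `reboot_alert` and `ALERT_THRESHOLD_S` are hypothetical names:

```python
ALERT_THRESHOLD_S = 300  # hypothetical constant: "Device rebooted" alert fires below 5 minutes


def reboot_event(previous_uptime: int, polled_uptime: int) -> bool:
    """Eventlog entry: any drop relative to the uptime stored from the previous poll."""
    return previous_uptime > polled_uptime


def reboot_alert(polled_uptime: int) -> bool:
    """Alert rule: the polled uptime itself is under the threshold."""
    return polled_uptime < ALERT_THRESHOLD_S


for polled in (99999, 0, 100001):
    print(polled, reboot_event(100000, polled), reboot_alert(polled))
# 99999 True False
# 0 True True
# 100001 False False
```

A genuine reboot trips both checks, while a small backwards step such as 99999 only produces the event. An eventlog full of reboot entries with no matching alerts would therefore suggest uptime values going backwards rather than devices actually restarting.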

I almost thought it was the 497-day reboot problem (where the 32-bit uptime counter wraps). But if you say it happens every day…
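The 497-day figure comes from sysUpTime being a TimeTicks value, an unsigned 32-bit counter of hundredths of a second. A minimal Python sketch of the wrap; `reported_ticks` is a hypothetical helper that models the counter, not anything in LibreNMS:

```python
WRAP_TICKS = 2 ** 32  # TimeTicks wraps modulo 2^32


def reported_ticks(true_uptime_s: int) -> int:
    """Value a wrapping TimeTicks counter reports after true_uptime_s seconds."""
    return (true_uptime_s * 100) % WRAP_TICKS


# Days until the counter wraps: 2^32 hundredths of a second.
print(round(WRAP_TICKS / 100 / 86_400, 1))  # 497.1

# Two polls 300 s apart that straddle the wrap point (~42,949,673 s).
before = reported_ticks(42_949_600) // 100
after = reported_ticks(42_949_900) // 100
print(before, after)  # 42949600 227
```

The polled value drops from about 42.9 million seconds to a few minutes, which looks exactly like a reboot to a "previous > current" comparison. Since that happens only once every ~497 days per device, a wrap on its own would not explain daily events.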

https://kb.paessler.com/en/topic/61249-why-does-the-snmp-system-uptime-sensor-report-wrong-values

If you are running SNMP v1/v2c and a reboot is reported at the same time every day, perhaps run Wireshark or tcpdump to capture the SNMP poll that goes out (to see which community is polled) and what comes back?
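To find out whether the reported reboots really recur at the same time every day, and therefore when to run a capture, the eventlog timestamps can be bucketed per device by hour of day. A minimal sketch; the `(device, timestamp)` input list is hypothetical (for example exported from the eventlog), and the "same hour on at least `min_days` days" heuristic is an assumption, not a LibreNMS feature:

```python
from collections import defaultdict
from datetime import datetime


def recurring_hours(events, min_days=3):
    """Map device -> hours of day in which its reboot events fell on >= min_days distinct dates."""
    dates_by_slot = defaultdict(set)  # (device, hour) -> set of calendar dates
    for device, ts in events:
        dates_by_slot[(device, ts.hour)].add(ts.date())
    result = defaultdict(list)
    for (device, hour), dates in dates_by_slot.items():
        if len(dates) >= min_days:
            result[device].append(hour)
    return {device: sorted(hours) for device, hours in result.items()}


# Hypothetical sample data, not taken from the thread.
events = [
    ("sw1", datetime(2023, 1, 1, 3, 5)),
    ("sw1", datetime(2023, 1, 2, 3, 10)),
    ("sw1", datetime(2023, 1, 3, 3, 2)),
    ("ups1", datetime(2023, 1, 1, 14, 0)),
    ("ups1", datetime(2023, 1, 2, 9, 0)),
]
print(recurring_hours(events))  # {'sw1': [3]}
```

The timestamps all need to be in the same time zone for the comparison to mean anything, which ties in with checking NTP and time-zone settings.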

You could also check your NTP services: make sure they are consistent and use the same time zone across all your devices, to rule out time-sync issues.
