Sensor Limits Changing "Automatically"

arrmo · 6 April 2022 23:28

Hmm … OK, that makes sense. But - why my storage limits as well? It’s not tied to lm-sensors, right?

Also, is there not a mapping from sensor name (vs. only number) to the limits?

Thanks!

flove · 18 October 2022 18:46

Same issue here, for many years now.
This doesn’t only happen with Linux VMs or LXD (Ubuntu) / LXC (Proxmox) containers.
Network devices like MikroTik are affected too…

This is so annoying and I totally understand that LibreNMS can’t do much about it.
We need at least some workarounds like:

Tell LibreNMS to:
- ignore all health sensors for a device
- not calculate weird values for the CPU low temperature warning (I don’t think 20 degrees Celsius is bad)

I can’t reproduce the behaviour, not even by rebooting the LXC host and containers multiple times and running poller.php / discovery.php manually…

This has been driving me nuts for many years.

arrmo · 18 October 2022 23:26

Sort of glad it’s not just me. I’m running Ubuntu on hardware (no VM involved), and it still happens. It re-updates on every reboot, which makes it a real pain. It breaks storage mappings and thresholds as well.

rhinoau · 19 October 2022 12:39

Confirmed I see it with Mikrotik devices seemingly more in the normal course of operation, and things like Palo Alto session counters when they are offline for a long time or get interrupted (loss/latency) during regular discovery.

flove · 22 May 2023 05:48

Does anyone have a workaround, or any new insight into why this happens on Linux hosts?

In one example I replaced an Intel NUC8 with a NUC13, and the sensors get deleted / added on every reboot.

danielfranc · 19 February 2024 14:55

Hi.

This is still happening when LibreNMS discovers (or rediscovers) the device.
For some reason, silenced alerts become active again.

Tozz · 19 February 2024 16:42

danielfranc wrote:
“This is still happening when LibreNMS discovers (or rediscovers) the device. For some reason, silenced alerts become active again.”

That’s not the real reason. During discovery (which is usually run once a day for each device) LibreNMS tries to find new, deleted and updated sensors. For example, let’s say we have a switch that has 2 slots for power supplies, but only one was populated. The engineer added a second power supply. LibreNMS will discover the new SNMP sensors for the added power supply during the (re)discovery.

Same goes for deleting sensors… If an engineer removes the additional power supply, those sensors would be removed.

What is happening here is that once LibreNMS tries to discover new, deleted or updated sensors it’s getting a whole list of new sensor IDs and a whole list of deleted sensor IDs.

E.g. it might previously have had a sensor with ID 1 that was ‘CPU Usage’. Then for some reason, the monitored host (= the Linux box) suddenly reindexes the sensors, and the ‘CPU Usage’ sensor becomes ID 2. During (re)discovery LibreNMS will then remove sensor ID 1 and add sensor ID 2.

This newly added sensor with ID 2 is a completely new sensor from the perspective of LibreNMS and thus any alerting rules will be applied to this new sensor as well. Any silenced alerts for the now removed sensor ID 1 are ignored (and deleted), because ID 2 is a completely new sensor.

The root of the issue is that on Linux machines, the sensor indexes change. LibreNMS’ discovery is not causing it, but you won’t notice the change in LibreNMS until the (re)discovery process has run.

The solution to this problem is that Linux’s SNMP daemon (net-snmp probably) should retain its indexes. If ‘CPU Usage’ was ID 1, it should stay at ID 1.

To go back to my example of a power supply in a switch… What Linux is doing is saying ‘Hey I got a power supply sensor for you at index 123’, then after some trigger it says ‘Hey I got you a new power supply sensor at index 345’, ‘Oh and the power supply at index 123 is now removed’. LibreNMS can’t know if 345 is a third and new power supply in the system, or if it’s perhaps the second power supply that had gotten a new index ID.
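
To make the explanation above concrete, here is a minimal Python sketch of a comparison keyed purely on the sensor index. It is an illustration only, not LibreNMS's actual discovery code; the function name `diff_by_index` and the sample data are hypothetical.

```python
def diff_by_index(previous, current):
    """Compare two {sensor_index: description} snapshots by index only."""
    # Indexes that were present before but are gone now.
    removed = {i: d for i, d in previous.items() if i not in current}
    # Indexes that are present now but were not seen before.
    added = {i: d for i, d in current.items() if i not in previous}
    return removed, added


# Before the reindex the host reports 'CPU Usage' at index 1 ...
before = {1: "CPU Usage"}
# ... and afterwards the same physical sensor shows up at index 2.
after = {2: "CPU Usage"}

removed, added = diff_by_index(before, after)
print("removed:", removed)
print("added:  ", added)
```

Keyed on the index, the reindex looks like one removal plus one addition, so anything attached to index 1 (custom limits, silenced alerts) has nothing to carry over to. The power supply case (123 becoming 345) produces the same shape of result whether the unit was renumbered or physically swapped, which is why the two cases cannot be told apart from the index alone.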

arrmo · 19 February 2024 22:38

This makes sense, thanks! But … could the names not be remembered, and the limits automatically “re-aligned” when a name matches? It’s a real pain that almost every day the custom alarm limits are tossed out.

Tozz · 20 February 2024 07:50

You are barking up the wrong tree. This needs to be fixed in net-snmp (or whatever Linux package is acting as the SNMP server).

While there are device specific workarounds in LibreNMS… this isn’t something that we could reliably get right.

What you could do is write a little SQL script that re-configures the limits daily based on device_id and sensor_descr. It’s not ideal, but it works and is an easy workaround.

e.g.:
UPDATE sensors SET sensor_limit_warn = 5 WHERE device_id = X AND sensor_descr = 'CPU Usage';
UPDATE sensors SET sensor_limit_warn = 123 WHERE device_id = X AND sensor_descr = 'Fan 1';

Obviously, replace the X with the device_id of your Linux machine, and set sensor_descr to whichever sensor you want to configure.

You could also look at the sensor_type column, but I’m not sure what is set for that column on Linux boxes.

arrmo · 20 February 2024 15:42

Agreed, may need a workaround here. Wondering if it’s possible to trigger this, when LibreNMS rediscovers the device? To be able to avoid the alarm … “fix” it as soon as the change happens.

Thoughts?

Thanks!

arrmo · 21 February 2024 02:19

Tried your command, it works! Slightly modified for my case (as it’s storage):

UPDATE storage SET storage_perc_warn = 95 WHERE device_id = 35 AND storage_descr = '/mnt/ix2';

Perfect! So now two thoughts:

1. Set up a text file and loop over its rows, where I set the variable names and values; with simple substitution I can run through a bunch of updates.
2. But how to trigger this when a device is rediscovered? And really, I just want to trigger it for that device, agreed?

Thanks!
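
A minimal Python sketch of the text-file idea above, under a few loud assumptions: the CSV layout and the helper name `apply_limits` are hypothetical, device_id 35 is simply borrowed from the storage example, and an in-memory SQLite table stands in for the real LibreNMS `sensors` table so the sketch runs anywhere. Against the actual MySQL/MariaDB database you would pass a connection from a MySQL driver instead, and the placeholder would typically be `%s` rather than `?`.

```python
import csv
import io
import sqlite3

# Hypothetical "text file": one row per limit to re-apply.
LIMITS_CSV = """device_id,sensor_descr,sensor_limit_warn
35,CPU Usage,5
35,Fan 1,123
"""


def apply_limits(conn, rows, placeholder="?"):
    """Re-apply warn limits, matching on device_id and sensor_descr."""
    sql = (
        "UPDATE sensors SET sensor_limit_warn = {p} "
        "WHERE device_id = {p} AND sensor_descr = {p}"
    ).format(p=placeholder)
    cur = conn.cursor()
    updated = 0
    for row in rows:
        # Values are passed as parameters, never pasted into the SQL text.
        cur.execute(
            sql,
            (float(row["sensor_limit_warn"]), int(row["device_id"]), row["sensor_descr"]),
        )
        updated += cur.rowcount
    conn.commit()
    return updated


# Stand-in for the LibreNMS database (assumption: only the three columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensors (device_id INTEGER, sensor_descr TEXT, sensor_limit_warn REAL)"
)
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [(35, "CPU Usage", 90.0), (35, "Fan 1", 10.0), (36, "Fan 1", 10.0)],
)

rows = list(csv.DictReader(io.StringIO(LIMITS_CSV)))
updated = apply_limits(conn, rows)
print("rows updated:", updated)
```

Matching on the description rather than the index is what lets the limits survive a reindex. The same loop could be pointed at `storage` / `storage_perc_warn` / `storage_descr` by changing the statement.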

Tozz · 21 February 2024 22:24

LibreNMS has a ‘last_discovered’ column in the devices table in MySQL.
You could write a script that periodically (eg. every hour or perhaps even every 5 minutes) checks if this value has changed since the last run. If it has changed, run your update queries.
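
A minimal Python sketch of that check, again with assumptions stated up front: the helper name `rediscovered_devices` and the `state` dictionary are hypothetical, the `devices` table is assumed to be keyed by `device_id`, and an in-memory SQLite table stands in for the real MySQL database. In a real cron job the `state` dictionary would have to be saved between runs, for example as a small JSON file.

```python
import sqlite3


def rediscovered_devices(conn, state):
    """Return device_ids whose last_discovered differs from the remembered value.

    `state` maps device_id (as a string) to the last value seen and is
    updated in place, so the next call only reports newer discoveries.
    """
    changed = []
    query = "SELECT device_id, last_discovered FROM devices"
    for device_id, last_discovered in conn.execute(query):
        key = str(device_id)
        seen = str(last_discovered)  # a MySQL driver may hand back a datetime
        if state.get(key) != seen:
            changed.append(device_id)
            state[key] = seen
    return changed


# Stand-in for the LibreNMS database (assumption: only these two columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (device_id INTEGER, last_discovered TEXT)")
conn.execute("INSERT INTO devices VALUES (35, '2024-02-21 06:00:00')")

state = {}
first = rediscovered_devices(conn, state)   # nothing remembered yet
second = rediscovered_devices(conn, state)  # no discovery since the first run

# Simulate a rediscovery of device 35.
conn.execute(
    "UPDATE devices SET last_discovered = '2024-02-22 06:00:00' WHERE device_id = 35"
)
third = rediscovered_devices(conn, state)

print(first, second, third)
```

Note that the very first run reports every device, because nothing has been remembered yet; re-applying the limits once in that case is harmless. The returned ids can be used to run the update queries only for the devices that were actually rediscovered.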

But again… this is a hack for an issue that should be solved in Net-SNMP.

arrmo · 21 February 2024 23:40

No argument, but I don’t see Net-SNMP changing anytime soon on this … agreed?

Thanks!

Tozz · 22 February 2024 08:51

Depends. If nobody has filed a bug report, probably not.

flove · 16 May 2025 05:44

Do you guys know if this has been addressed for the linux snmp daemon lately?
It’s driving me crazy and makes sensor alerting almost unusable for me. Every few days I get hundreds of sensor / health alerts for sensors where I had already disabled alerting, changed thresholds, etc.
This happens for me on MikroTik devices too.

PipoCanaja · 16 May 2025 19:35

@flove This is an issue with the Linux snmp daemon, there is nothing we can do here. A workaround could be to start a discovery after each reboot.
The lnms config option is discovery_on_reboot. I haven’t tested it myself. That would not avoid the alerts, but at least they would be “auto solved” after discovery.

flove · 17 May 2025 05:55

Sure - it’s not librenms’s fault
I can’t find any documentation for “discovery_on_reboot”… Do you know what it does in theory?

Many thanks.

PipoCanaja · 18 May 2025 09:15

Hi @flove
Basically, it triggers a new discovery of the device whenever that device reboots… So the sensors will always be re-discovered with the latest IDs as fast as possible. Still, you’ll probably get alerts between the reboot and the end of the re-discovery.

gitterdoneplease · 21 August 2025 16:31

I’m having a very similar issue, where LibreNMS made up all-new sensor limits for all of the BMCs in my fleet. Volt_P3V3 was set to 1.5V and Volt_P5V was set to 3.4V, for instance. This triggered thousands of warnings overnight. Where are these bad values coming from, and how do I stop them?

gitterdoneplease · 21 August 2025 16:33

Volt_VR_CPU0 was set to max at 5.251 and Volt_VR_CPU1 was set to max at .96! Gah!