Sensor Limits Changing "Automatically"

Hmm … OK, that makes sense. But why are my storage limits reset as well? Storage isn’t tied to lm-sensors, right?

Also, isn’t there a mapping from the sensor name (rather than only its index number) to the limits?

Thanks!

Same issue here for many years.
This doesn’t only happen with Linux VMs or LXD (Ubuntu) / LXC (Proxmox) containers.
Network devices like MikroTik are affected too…

This is so annoying and I totally understand that LibreNMS can’t do much about it.
We need at least some workarounds like:

  • Tell LibreNMS to:
    • ignore all health sensors for a device
    • not calculate weird values such as a CPU low-temperature warning threshold (I don’t think 20 °C is bad :slight_smile:)

I can’t reproduce the behaviour, not even by rebooting the LXC host and containers multiple times and running poller.php / discovery.php manually…

This has been driving me nuts for many years :wink:

Sort of glad it’s not just me :laughing:. I’m running Ubuntu on bare hardware (no VM involved), and it still happens. The sensors change again on every reboot, which makes it a real pain. It breaks storage mappings and thresholds too.


Confirmed. I see it with MikroTik devices, seemingly more often during normal operation, and with things like Palo Alto session counters when the device is offline for a long time or gets interrupted (packet loss/latency) during regular discovery.

Does anyone have a workaround or new insight into why this happens on Linux hosts?

In one case I replaced an Intel NUC8 with a NUC13, and the sensors get deleted and re-added on every reboot.

Hi.

This is still happening when LibreNMS discovers (or rediscovers) the device.
For some reason, silenced alerts become active again.


That’s not the real reason. During discovery (which usually runs once a day for each device), LibreNMS tries to find new, deleted and updated sensors. For example, let’s say we have a switch with 2 slots for power supplies, but only one is populated. An engineer then adds a second power supply. LibreNMS will discover the new SNMP sensors for the added power supply during the next (re)discovery.

The same goes for deleted sensors: if an engineer removes the additional power supply, those sensors are removed.

What is happening here is that when LibreNMS tries to discover new, deleted or updated sensors, it gets a whole list of new sensor IDs and a whole list of deleted sensor IDs.

For example, it might previously have had a sensor with ID 1 that was ‘CPU Usage’. Then, for some reason, the monitored host (the Linux box) reindexes its sensors, and ‘CPU Usage’ suddenly becomes ID 2. During (re)discovery LibreNMS will then remove sensor ID 1 and add sensor ID 2.
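To make the mechanism concrete, here is a minimal Python sketch of a discovery pass that keys sensors only by their SNMP index. It is a hypothetical illustration, not LibreNMS’ actual discovery code (which is PHP), and `diff_by_index` is a made-up name.

```python
# Hypothetical illustration: compare two discovery runs keyed only by index.
def diff_by_index(previous, current):
    """Return (removed, added) index lists; each argument maps index -> description."""
    removed = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    return removed, added


# Before the host reindexes: 'CPU Usage' lives at index 1.
before = {1: "CPU Usage"}
# After the reindex: the same sensor now shows up at index 2.
after = {2: "CPU Usage"}

removed, added = diff_by_index(before, after)
print("removed:", removed, "added:", added)  # → removed: [1] added: [2]
```

The description never changed, but because the key did, the sensor is treated as one deletion plus one addition, and whatever was attached to index 1 (custom limits, silenced alerts) goes with it.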

This newly added sensor with ID 2 is a completely new sensor from the perspective of LibreNMS and thus any alerting rules will be applied to this new sensor as well. Any silenced alerts for the now removed sensor ID 1 are ignored (and deleted), because ID 2 is a completely new sensor.

The root of the issue is that the sensor indexes change on Linux machines. LibreNMS’ discovery is not causing it, but you won’t notice the change in LibreNMS until the (re)discovery process has run.

The solution to this problem is for Linux’s SNMP daemon (probably net-snmp) to retain its indexes: if ‘CPU Usage’ was ID 1, it should stay at ID 1.

To go back to my example of a power supply in a switch: what Linux is doing is saying ‘Hey, I’ve got a power supply sensor for you at index 123’, then after some trigger it says ‘Hey, I’ve got a new power supply sensor for you at index 345’ and ‘Oh, and the power supply at index 123 has been removed’. LibreNMS can’t know whether 345 is a third, new power supply in the system, or the second power supply that has been given a new index.

This makes sense, thanks! But :laughing: … couldn’t the sensor names be remembered, so that sensors are automatically “re-aligned” when the names match? It’s a real pain that the custom alarm limits get tossed out almost every day.

You are barking up the wrong tree. This needs to be fixed in net-snmp (or whatever Linux package is acting as the SNMP server).

While there are device-specific workarounds in LibreNMS, this isn’t something that we could reliably get right.
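The ‘re-align by name’ idea can be sketched in a few lines of Python, which also shows where it becomes unreliable. This is a hypothetical illustration (`realign_by_descr` is a made-up name), not something LibreNMS does: it only pairs a removed sensor with an added one when the description is unique on both sides, and gives up otherwise.

```python
# Hypothetical sketch of "re-align by name": pair removed and added sensors
# whose descriptions match, so custom limits could be carried over.
from collections import Counter


def realign_by_descr(removed, added):
    """removed/added map index -> description.

    Returns (matches, ambiguous): matches maps old index -> new index for
    descriptions that occur exactly once on each side; ambiguous lists the
    descriptions that cannot be paired safely.
    """
    old_counts = Counter(removed.values())
    new_counts = Counter(added.values())
    matches = {}
    ambiguous = []
    for descr in sorted(set(old_counts) & set(new_counts)):
        if old_counts[descr] == 1 and new_counts[descr] == 1:
            old_idx = next(i for i, d in removed.items() if d == descr)
            new_idx = next(i for i, d in added.items() if d == descr)
            matches[old_idx] = new_idx
        else:
            ambiguous.append(descr)
    return matches, ambiguous


print(realign_by_descr({1: "CPU Usage"}, {2: "CPU Usage"}))  # → ({1: 2}, [])
print(realign_by_descr({123: "Power Supply"},
                       {345: "Power Supply", 346: "Power Supply"}))  # → ({}, ['Power Supply'])
```

Two identical ‘Power Supply’ descriptions already defeat it, and even a unique match is only a guess: if one power supply was pulled and a different one inserted, the description would match just the same.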

What you could do is write a little SQL script that re-configures the limits daily based on device_id and sensor_descr. It’s not ideal, but it works and is an easy workaround.

For example:
UPDATE sensors SET sensor_limit_warn = 5 WHERE device_id = X AND sensor_descr = 'CPU Usage';
UPDATE sensors SET sensor_limit_warn = 123 WHERE device_id = X AND sensor_descr = 'Fan 1';

Obviously, replace X with the device_id of your Linux machine, and set sensor_descr to the description of whichever sensor you want to change.

You could also look at the sensor_type column, but I’m not sure what is set for that column on Linux boxes.
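If you’d rather script the same UPDATE, here is a minimal Python sketch. It assumes a DB-API connection; sqlite3 is used only so the example runs standalone, with a cut-down `sensors` table holding just the columns from the queries above. Against LibreNMS’ MySQL database you would use a MySQL driver instead, whose placeholder style is `%s` rather than `?`. The helper name `set_warn_limit` and device_id 42 are hypothetical.

```python
# Sketch: set a sensor's warning limit by device_id + sensor_descr.
# sqlite3 stands in for LibreNMS' MySQL database; the table is a simplified
# stand-in containing only the columns used here.
import sqlite3


def set_warn_limit(conn, device_id, sensor_descr, limit):
    """Set sensor_limit_warn for matching sensors; return how many rows matched."""
    cur = conn.execute(
        "UPDATE sensors SET sensor_limit_warn = ? "
        "WHERE device_id = ? AND sensor_descr = ?",
        (limit, device_id, sensor_descr),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensors (sensor_id INTEGER PRIMARY KEY, device_id INTEGER, "
    "sensor_descr TEXT, sensor_limit_warn REAL)"
)
conn.executemany(
    "INSERT INTO sensors (device_id, sensor_descr) VALUES (?, ?)",
    [(42, "CPU Usage"), (42, "Fan 1")],
)

print(set_warn_limit(conn, 42, "CPU Usage", 5))  # → 1
print(set_warn_limit(conn, 42, "Fan 9", 123))    # → 0 (no such description)
```

Returning the row count is a cheap sanity check: 0 usually means the sensor_descr didn’t match exactly. On MySQL the reported count may cover only rows whose value actually changed, depending on driver settings.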


Agreed, I may need a workaround here. I’m wondering whether it’s possible to trigger this when LibreNMS rediscovers the device, so I can avoid the alarm and “fix” the limits as soon as the change happens.

Thoughts?

Thanks!

Tried your command and it works! I modified it slightly for my case (as it’s storage):

UPDATE storage SET storage_perc_warn = 95 WHERE device_id = 35 AND storage_descr = '/mnt/ix2';

Perfect! So now two thoughts :wink:

  1. Set up a text file with the variable names and values, loop over its rows, substitute the values and run through a bunch of updates
  2. But how do I trigger this when a device is rediscovered? And really, I only want to trigger it for that device, agreed?
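Idea 1 could look roughly like this minimal Python sketch. The comma-separated file format (table, device_id, description, value) and the `TARGETS` whitelist are assumptions made up for the example; sqlite3 again stands in for the MySQL database, with cut-down `sensors` and `storage` tables. Table and column names can’t be passed as query parameters, so they are checked against the whitelist before being placed into the SQL.

```python
# Sketch of idea 1: read "table, device_id, description, value" rows from a
# text file and apply each one. The file format is made up for this example.
import csv
import io
import sqlite3

# Identifiers cannot be bound as parameters, so only these are allowed:
# table -> (column to set, description column).
TARGETS = {
    "sensors": ("sensor_limit_warn", "sensor_descr"),
    "storage": ("storage_perc_warn", "storage_descr"),
}


def apply_limits(conn, lines):
    """Apply each CSV row; return a list of (row, rows_matched)."""
    results = []
    for row in csv.reader(lines):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        table, device_id, descr, value = (field.strip() for field in row)
        limit_col, descr_col = TARGETS[table]  # KeyError for unknown tables
        cur = conn.execute(
            f"UPDATE {table} SET {limit_col} = ? "
            f"WHERE device_id = ? AND {descr_col} = ?",
            (float(value), int(device_id), descr),
        )
        results.append((row, cur.rowcount))
    conn.commit()
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (device_id INTEGER, sensor_descr TEXT, sensor_limit_warn REAL)")
conn.execute("CREATE TABLE storage (device_id INTEGER, storage_descr TEXT, storage_perc_warn REAL)")
conn.execute("INSERT INTO storage VALUES (35, '/mnt/ix2', 60)")
conn.execute("INSERT INTO sensors VALUES (35, 'CPU Usage', NULL)")

limits_file = io.StringIO(
    "# table, device_id, description, value\n"
    "storage, 35, /mnt/ix2, 95\n"
    "sensors, 35, CPU Usage, 5\n"
)
for row, matched in apply_limits(conn, limits_file):
    print(row, matched)
```

For a real file, pass `open("limits.csv", newline="")` (hypothetical file name) instead of the `io.StringIO` used here.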

Thanks!

LibreNMS has a ‘last_discovered’ column in the devices table in MySQL.
You could write a script that periodically (eg. every hour or perhaps even every 5 minutes) checks if this value has changed since the last run. If it has changed, run your update queries.
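A minimal Python sketch of that polling approach. Only the devices table and its device_id / last_discovered columns come from this thread; `check_devices`, `watch` and the `reapply` callback (where your own UPDATE statements for that one device would go) are hypothetical names, and sqlite3 stands in for MySQL so the example runs on its own.

```python
# Sketch: re-apply custom limits only for devices whose last_discovered changed.
# sqlite3 stands in for LibreNMS' MySQL database; `devices` is reduced to the
# two columns used here. `reapply` is your own per-device update function.
import sqlite3
import time


def check_devices(conn, seen, reapply):
    """Call reapply(device_id) for each device whose last_discovered changed.

    `seen` maps device_id -> last_discovered from the previous run and is
    updated in place. Returns the list of device_ids that were re-applied.
    """
    rows = conn.execute("SELECT device_id, last_discovered FROM devices").fetchall()
    changed = []
    for device_id, last_discovered in rows:
        if device_id in seen and seen[device_id] != last_discovered:
            reapply(device_id)
            changed.append(device_id)
        seen[device_id] = last_discovered
    return changed


def watch(conn, reapply, interval_seconds=300):
    """Poll forever, e.g. every 5 minutes."""
    seen = {}
    while True:
        check_devices(conn, seen, reapply)
        time.sleep(interval_seconds)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (device_id INTEGER PRIMARY KEY, last_discovered TEXT)")
conn.execute("INSERT INTO devices VALUES (35, '2024-01-01 00:00:00')")
seen = {}
print(check_devices(conn, seen, print))  # first run only records a baseline: []
conn.execute("UPDATE devices SET last_discovered = '2024-01-02 00:00:00' WHERE device_id = 35")
print(check_devices(conn, seen, print))  # reapply prints 35, then [35] is printed
```

Because changes are tracked per device_id, only the rediscovered device gets its limits re-applied. The first pass only records a baseline, so restarting the script won’t re-apply anything by itself; if that matters, run your updates once at startup as well.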

But again… this is a hack for an issue that should be solved in Net-SNMP.

No argument, but I don’t see Net-SNMP changing anytime soon on this … agreed?

Thanks!

Depends. If nobody has filed a bug report, probably not.