Sensor Limits Changing "Automatically"

arrmo · 17 December 2020 01:16

OK, it seems like I’m going crazy, but this has happened too many times for that. I manually change some sensor / health limits as I want (e.g. Vcore for my CPU) … and then a day or two later they are reset (automatically, it seems). So I change them again, and the cycle repeats.

Is there some setting that is enabling this override of my manual settings?

Thanks!

arrmo · 29 March 2021 11:43

Just curious - anyone else seeing this? For my storage devices in particular, it’s happening almost daily.

Thanks!

PipoCanaja · 29 March 2021 11:59

It does not happen in general, but sometimes, with certain devices/OS, the sensors are deleted and created again. In that case you will lose any manual setting.
Could you check whether your sensors are being deleted/recreated?
If they are, the next step is to understand why and fix it.
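
One rough way to check for delete/recreate is to note the sensor indexes and descriptions of the device at two points in time and compare them. The sketch below is only an illustration in Python: the `{index: description}` snapshots are hypothetical hand-collected data, and `diff_sensor_snapshots` is a made-up helper, not part of LibreNMS.

```python
def diff_sensor_snapshots(before, after):
    """Compare two {index: description} snapshots of one device's sensors."""
    # Indexes that existed before but are gone now (deleted sensors).
    removed = {i: d for i, d in before.items() if i not in after}
    # Indexes that did not exist before (newly created sensors).
    added = {i: d for i, d in after.items() if i not in before}
    # A description that vanished under one index and reappeared under
    # another suggests the same physical sensor was renumbered.
    renumbered = [
        (old, new, desc)
        for old, desc in removed.items()
        for new, new_desc in added.items()
        if desc == new_desc
    ]
    return removed, added, renumbered


# Hypothetical snapshots taken before and after a threshold reset.
before = {"1": "temp1", "2": "fan1", "3": "Vcore"}
after = {"1": "temp1", "7": "fan1", "8": "Vcore"}

removed, added, renumbered = diff_sensor_snapshots(before, after)
print(renumbered)
```

If `renumbered` is non-empty, the sensors kept their descriptions but changed index, which would match the delete/recreate case described above.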

arrmo · 29 March 2021 14:23

Yes, I do think that’s it - I say that because the (Ubuntu Linux) server that this is running on … well, the other day I happened to notice that it had “lost” all its drives (in LibreNMS), except for two NFS mounts. For example, even the root ("/") partition was “gone”. Not sure why, but I do see devices and sensors dropping on this machine - and it’s the LibreNMS server itself, so it is clearly up.

Suggestions of things to check?

Thanks!

arrmo · 31 March 2021 17:47

You’re right! Got the warning again today (custom values lost), and I checked that device. A whole bunch of these (all sensors I think),

Open to any suggestions. Thanks!

PipoCanaja · 31 March 2021 18:00

Do you have any CPU overloading on your LibreNMS server? Or on the device being monitored? Any high latency?
I have this kind of behaviour with an old Mac Mini running Debian, where the sensors change their ID randomly during reboot. So LibreNMS discovers new sensors after about 50% of reboots.

arrmo · 31 March 2021 18:53

I don’t think so (could be wrong of course). The server and machine being monitored are the same (though I have seen this with other “clients” as well). As for overload, here is top output,

top - 13:51:36 up 3 days, 17:36,  3 users,  load average: 1.18, 1.08, 1.00
Tasks: 509 total,   1 running, 508 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  1.4 sy,  0.0 ni, 98.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  32102.4 total,    870.8 free,  19688.7 used,  11542.9 buff/cache
MiB Swap:  15320.0 total,   7670.0 free,   7650.0 used.  12310.6 avail Mem

So I don’t think it’s overloaded - it’s a Ryzen 7 2700X, and as you see it’s ~98% idle.

Thanks!

PipoCanaja · 31 March 2021 18:56

Yep, looks good. I don’t see why these sensors are removed/rediscovered. I guess you’ll have to dig into the code to find out …

arrmo · 1 April 2021 00:26

Yep, agreed. As this happens very infrequently (i.e. can go a few weeks between occurrences) - I need to figure out how to get logging beefed up, to help debug.

Thanks!

arrmo · 9 April 2021 00:08

Arrgh - happened again today … and no OS updates, not even a reboot. Only “change” is the daily update to LibreNMS - I don’t see sensors going away / being re-added, but custom settings are reset again.

Is there a way to have debug output captured for all polls (for a single device)? Just asking because it’s very random, really need to capture it all to debug.

Thanks!

arrmo · 4 May 2021 01:16

FYI, happened again today - and no sensors added / removed it seems. Really not seeing what the trigger is. Open to suggestions!

Thanks!

arrmo · 8 February 2022 01:37

OK, I may have found a trigger here! It seems like rediscover (and reset of some sensors) happens on reboot. Make sense? Could that be the case?

Thanks!

PipoCanaja · 10 February 2022 22:43

I have a Linux MacMini which shows a similar pattern. After a reboot, thermal sensors and fans will change OID, flapping between 2 different values. So the sensor gets recreated with new default min/max values, and even more fun, the RRDs are kept, so I have one RRD for OID1 and one RRD for OID2, and the two together cover 100% of the time …

Don’t know how and why the SNMP agent behaves like this.

arrmo · 18 February 2022 22:50

FYI, just found the same thing with my Raspberry Pi - so the same seems to happen on multiple Linux devices / OS’s?

More than willing to try to debug / capture logs - just not quite sure where to dig on this one.

Thanks!

PipoCanaja · 20 February 2022 22:37

Unfortunately, it seems to be an issue in the device itself, not in LNMS. LNMS only receives the SNMP data and cannot change it.
I did not find much description on how lm-sensors defines the IDs…

arrmo · 20 February 2022 22:55

Hi,

But it’s not just lm-sensors … my (custom) threshold for SSD (storage) usage is also being reset. Or is it that when lm-sensors changes, all limits are being changed / reset?

Thanks!

arrmo · 5 April 2022 00:59

OK, found the trigger I think! Not good detective work on my part, I sort of tripped over it.

I happened to run some OS upgrades, and found that there was a misalignment between reported storage usage and reality. So I triggered a rediscover from the UI. It worked, but it also reset all of my thresholds! Or at least for storage, the one I was looking at.

So it seems - on reboot (and perhaps kernel update, or some other trigger?), rediscover is being run - and in the process, changing the thresholds.

Sound about right?

Thanks!

PipoCanaja · 5 April 2022 07:36

Thresholds are only set when the sensor/storage/entity is created. But if for some reason the sensor/storage/entity changes ID, meaning it is deleted and created again right after, then you end up with reset thresholds.

arrmo · 6 April 2022 00:28

Hmmm … but would that happen on every reboot? That’s when I notice that the thresholds are being reset / changed.

Thanks!

PipoCanaja · 6 April 2022 07:48

Yep, that is a behaviour I see with at least 1 MacMini running Debian, net-snmp with lm-sensors. Don’t know why NetSNMP keeps renumbering the sensors, but each time they change ID, they get deleted/rediscovered and all thresholds are reset. There is nothing LibreNMS can really do about it, because the ID is the only way to identify a sensor from one poll/discovery to another.
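
To make the mechanism concrete, here is a minimal Python sketch of ID-based matching. It is not LibreNMS code: `rediscover`, `DEFAULT_LIMIT` and the data layout are all hypothetical, and only illustrate why a custom threshold survives when the ID stays the same but is lost when the ID changes.

```python
DEFAULT_LIMIT = 1.60  # hypothetical default threshold given to a new sensor


def rediscover(stored, discovered):
    """Match sensors by ID only.

    stored:     {sensor_id: {"descr": str, "limit": float}}  (hypothetical DB rows)
    discovered: {sensor_id: str}                             (what the agent reports now)
    """
    result = {}
    for sensor_id, descr in discovered.items():
        if sensor_id in stored:
            # Known ID: keep the existing row, including any custom limit.
            result[sensor_id] = {"descr": descr, "limit": stored[sensor_id]["limit"]}
        else:
            # Unknown ID: treated as a brand-new sensor with default limits,
            # even if the description matches a sensor that just disappeared.
            result[sensor_id] = {"descr": descr, "limit": DEFAULT_LIMIT}
    # IDs in `stored` that are no longer reported are dropped (deleted).
    return result


# A custom Vcore limit stored under ID "3".
stored = {"3": {"descr": "Vcore", "limit": 1.45}}

same_id = rediscover(stored, {"3": "Vcore"})   # agent kept the ID
new_id = rediscover(stored, {"8": "Vcore"})    # agent renumbered the sensor

print(same_id["3"]["limit"])  # custom limit kept
print(new_id["8"]["limit"])   # back to the default
```

Matching on description instead would keep the limit in this toy case, but descriptions are not guaranteed to be unique or stable, which is presumably why the ID is used.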