Docker 21.11 new install and new librenms user

bcookatpcsd · 23 November 2021 17:23

librenms:/opt/librenms$ ./validate.php

Component	Version
LibreNMS	21.11.0
DB Schema	2021_25_01_0129_isis_adjacencies_nullable (224)
PHP	7.4.25
Python	3.9.5
MySQL	10.5.13-MariaDB-1:10.5.13+maria~focal
RRDTool	1.7.2
SNMP	NET-SNMP 5.9

====================================

[OK] Installed from package; no Composer required
[OK] Database connection successful
[OK] Database schema correct
[WARN] IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN] Non-git install, updates are manual or from package

I’m a long-time observium user and a new librenms docker user…

I have a bare-metal Fedora 35 host (not that it matters much, but in case it might)

I didn’t migrate anything from observium, I manually added hosts…

First I did the docker clone…
git clone https://github.com/librenms/docker.git --depth=1 docker-librenms

Then the usual
docker-compose -f /root/docker-librenms/examples/compose/docker-compose.yml [pull, create, up]

After a few days I noticed that I didn’t have rrdcached… figured out I needed a new docker-compose for that…

docker-compose -f /root/docker-librenms/examples/rrdcached-server/docker-compose.yml [pull, create, up, etc…]
(which looked like it gave me rrdcached… please let me know if it didn’t…)

Then I figured out that I had lost all my nodes and data… so I manually re-added the big important things. I could not figure out how msmtpd.env works… (rrdcached was more important…) but I got the default alerts and notifications going… and now I’m getting emails…

But these emails…

For example:

Alert for device 10.20.0.82 - State Sensor Critical
Severity: critical
Timestamp: 2021-11-22 15:12:16
Unique-ID: 588
Rule: State Sensor Critical Faults:
#1: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 808; sensor_oid = .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870; sensor_descr = Health; state_descr = critical;
Alert sent to:

10.20.0.82 is an eql4210. There is a sensor that librenms finds called ‘Health’ (observium doesn’t have this), and it is reporting a 3 (the rrdgraph is a flat 3 for the past two weeks, etc…)

Not sure what that is…

snmpget -v 2c -c xyz1232^2=4 10.20.0.82 .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870
SNMPv2-SMI::enterprises.12740.2.1.5.1.1.1.1084917870 = INTEGER: 3

the snmp poll does return a 3…
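
A minimal Python sketch for scripting this kind of check: it splits an snmpget output line like the one above into its OID and integer value. The function name `parse_snmpget_line` is made up for this example. It only handles the plain `OID = INTEGER: n` form shown here, not other output formats.

```python
import re

# Matches lines of the form "<oid> = INTEGER: <number>", as in the output above.
_LINE_RE = re.compile(r"^(?P<oid>\S+)\s+=\s+INTEGER:\s+(?P<value>-?\d+)\s*$")


def parse_snmpget_line(line):
    """Return (oid, value) for a plain INTEGER reply, or raise ValueError."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised snmpget output: {line!r}")
    return match.group("oid"), int(match.group("value"))


if __name__ == "__main__":
    sample = "SNMPv2-SMI::enterprises.12740.2.1.5.1.1.1.1084917870 = INTEGER: 3"
    print(parse_snmpget_line(sample))
```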

and in /opt/librenms/mibs/dell … there is no answer for what it is… hence the numeric oid…

I can’t even find on dell’s site the MIBs listed in librenms…

There is a second email that arrives every five minutes, at the same time as the first one…

Alert for device 10.20.0.82 - Sensor over limit - Check Device Health Settings
Severity: critical
Timestamp: 2021-11-22 15:12:16
Unique-ID: 583
Rule: Sensor over limit - Check Device Health Settings Faults:
#1: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 808; sensor_oid = .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870; sensor_descr = Health;
#2: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 834; sensor_oid = .1.3.6.1.4.1.12740.3.1.1.1.8.1.1084917870.24; sensor_descr = Disk 24 - S3L1HDDY;
Alert sent to:

(great, that gives me a disk to look at…)

Disk 24 is a spare, and its state returns a value of 2.

[10.20.0.82] :: Sensor :: Disk 24 - S3L1HDDY :: State
:8000/graphs/to=1637688300/id=834/type=sensor_state/from=1637601900/

Oddly enough… this url
:8000/graphs/device=78/type=device_storage/from=1637601948/legend=yes/popup_title=Storage+Usage/

shows ‘No Storage’ (in blue)

I would like to confirm whether this is an actual error, since observium doesn’t register a problem. If it is a problem, I’d like to get more information about it so that I can put it into an issue report… and if it isn’t, how do I stop the messages from coming, or at least ack them in the meantime…

Thanks in advance for taking the time to read all of this…

murrant · 25 November 2021 13:41

Probably the spare state is set to the wrong generic status…

murrant · 25 November 2021 13:52

eqlMemberHealthStatus
INTEGER {
unknown (0),
normal (1),
warning (2),
critical (3)
}

So, 3 = critical.
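
As a rough illustration of that lookup, here is a small Python sketch that turns the polled integer into its label. The values are copied from the enumeration quoted above; the names `EQL_MEMBER_HEALTH_STATUS` and `describe_member_health` are invented for this example and are not part of LibreNMS.

```python
# Values copied from the eqlMemberHealthStatus enumeration quoted above.
EQL_MEMBER_HEALTH_STATUS = {
    0: "unknown",
    1: "normal",
    2: "warning",
    3: "critical",
}


def describe_member_health(value):
    """Translate a polled integer into its enumeration label."""
    return EQL_MEMBER_HEALTH_STATUS.get(value, f"undefined ({value})")


if __name__ == "__main__":
    # 3 is the value the snmpget in the first post returned.
    print(describe_member_health(3))
```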

eqlDiskStatus OBJECT-TYPE
SYNTAX INTEGER {
on-line (1),
spare (2),
failed (3),
off-line (4),
alt-sig (5),
too-small(6),
history-of-failures(7),
unsupported-version(8),
unhealthy(9),
replacement(10),
encrypted(11),
notApproved(12),
preempt-failed(13)
}

2 is spare, which is marked as 0 in LibreNMS (ok).
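
The same kind of lookup can be sketched for the disk status. The labels below are copied from the eqlDiskStatus enumeration above. Treating only on-line and spare as healthy is an assumption made for this example, not LibreNMS's actual state mapping: the only mapping confirmed here is that spare counts as ok. The names are hypothetical.

```python
# Values copied from the eqlDiskStatus enumeration quoted above.
EQL_DISK_STATUS = {
    1: "on-line",
    2: "spare",
    3: "failed",
    4: "off-line",
    5: "alt-sig",
    6: "too-small",
    7: "history-of-failures",
    8: "unsupported-version",
    9: "unhealthy",
    10: "replacement",
    11: "encrypted",
    12: "notApproved",
    13: "preempt-failed",
}

# Assumption for this sketch only: on-line and spare count as healthy.
HEALTHY_DISK_STATUSES = {1, 2}


def describe_disk_status(value):
    """Return (label, healthy) for a polled eqlDiskStatus integer."""
    label = EQL_DISK_STATUS.get(value, f"undefined ({value})")
    return label, value in HEALTHY_DISK_STATUSES


if __name__ == "__main__":
    # 2 is the value reported for Disk 24 in the first post.
    print(describe_disk_status(2))
```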

Are you sure the device didn’t have an issue?
Possible explanations:

The device reported incorrect data. (not likely)
The device had a transient error. (or persistent if still alarming)
Your alert rules are not quite right and giving false alarms.
