Docker 21.11 new install and new librenms user


librenms:/opt/librenms$ ./validate.php

Component Version
LibreNMS 21.11.0
DB Schema 2021_25_01_0129_isis_adjacencies_nullable (224)
PHP 7.4.25
Python 3.9.5
MySQL 10.5.13-MariaDB-1:10.5.13+maria~focal
RRDTool 1.7.2
SNMP NET-SNMP 5.9

====================================

[OK] Installed from package; no Composer required
[OK] Database connection successful
[OK] Database schema correct
[WARN] IPv6 is disabled on your server, you will not be able to add IPv6 devices.
[WARN] Non-git install, updates are manual or from package

I’m a long-time Observium user and a new LibreNMS Docker user.

The host is bare-metal Fedora 35 (not that it matters much, but in case it does).

I didn’t migrate anything from Observium; I added hosts manually.

First I did the docker clone…
git clone https://github.com/librenms/docker.git --depth=1 docker-librenms

Then the usual
docker-compose -f /root/docker-librenms/examples/compose/docker-compose.yml [pull, create, up]

After a few days I noticed that I didn’t have rrdcached, and figured out I needed a different docker-compose file for that:

docker-compose -f /root/docker-librenms/examples/rrdcached-server/docker-compose.yml [pull, create, up, etc…]
(That appeared to give me rrdcached; please let me know if it didn’t.)

I then found that I had lost all my nodes and data, so I manually re-added the big, important devices. I also haven’t been able to work out how msmtpd.env works (rrdcached was more important). I did get the default alerts and notifications going, and now I’m getting emails.

But these emails…

For example:

Alert for device 10.20.0.82 - State Sensor Critical
Severity: critical
Timestamp: 2021-11-22 15:12:16
Unique-ID: 588
Rule: State Sensor Critical Faults:
#1: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 808; sensor_oid = .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870; sensor_descr = Health; state_descr = critical;
Alert sent to:

10.20.0.82 is an eql4210. LibreNMS finds a sensor on it called ‘Health’ (Observium doesn’t have this) that is reporting a 3 (the RRD graph has been a flat 3 for the past two weeks).

Not sure what that is…

snmpget -v 2c -c xyz1232^2=4 10.20.0.82 .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870
SNMPv2-SMI::enterprises.12740.2.1.5.1.1.1.1084917870 = INTEGER: 3

the snmp poll does return a 3…
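For anyone scripting this check, here is a minimal Python sketch that pulls the OID and integer out of a net-snmp output line like the one above. The function name `parse_snmp_integer` is made up for illustration; it is not part of LibreNMS or net-snmp. It also accepts the `name(3)` form that net-snmp prints when a MIB is loaded.

```python
import re

# Matches "<oid> = INTEGER: 3" as well as "<oid> = INTEGER: critical(3)".
SNMP_INTEGER_LINE = re.compile(
    r"^(?P<oid>\S+)\s*=\s*INTEGER:\s*(?:[\w-]+\()?(?P<value>-?\d+)"
)


def parse_snmp_integer(line):
    """Return (oid, value) from one net-snmp output line holding an INTEGER."""
    match = SNMP_INTEGER_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an INTEGER varbind: {line!r}")
    return match.group("oid"), int(match.group("value"))


# The line returned by the snmpget above.
line = "SNMPv2-SMI::enterprises.12740.2.1.5.1.1.1.1084917870 = INTEGER: 3"
oid, value = parse_snmp_integer(line)
print(oid, value)
```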

and nothing under /opt/librenms/mibs/dell explains what it is, hence the numeric OID.

I can’t even find the MIBs listed in LibreNMS on Dell’s site.

A second email arrives every five minutes, alongside the first:

Alert for device 10.20.0.82 - Sensor over limit - Check Device Health Settings
Severity: critical
Timestamp: 2021-11-22 15:12:16
Unique-ID: 583
Rule: Sensor over limit - Check Device Health Settings Faults:
#1: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 808; sensor_oid = .1.3.6.1.4.1.12740.2.1.5.1.1.1.1084917870; sensor_descr = Health;
#2: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; sensor_id = 834; sensor_oid = .1.3.6.1.4.1.12740.3.1.1.1.8.1.1084917870.24; sensor_descr = Disk 24 - S3L1HDDY;
Alert sent to:

(great that gives me a disk to look at… )
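When there are many of these emails, it can help to split each fault line into fields so the sensor IDs and OIDs can be compared. The following is only a sketch, assuming the `#N: key = value; key = value;` layout seen in the emails above; `parse_fault` is a hypothetical helper, not a LibreNMS function.

```python
def parse_fault(line):
    """Split '#N: key = value; key = value;' into (N, {key: value})."""
    number, _, body = line.partition(":")
    fields = {}
    for part in body.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip the empty piece after the trailing ';'
            fields[key.strip()] = value.strip()
    return int(number.strip().lstrip("#")), fields


# Fault #2 from the alert email above.
fault = (
    "#2: sysObjectID = .1.3.6.1.4.1.12740.17.1; location_id = 12; "
    "sensor_id = 834; sensor_oid = .1.3.6.1.4.1.12740.3.1.1.1.8.1.1084917870.24; "
    "sensor_descr = Disk 24 - S3L1HDDY;"
)
index, fields = parse_fault(fault)
print(index, fields["sensor_id"], fields["sensor_descr"])
```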

Disk 24 is a spare, and its state has a return value of 2.

[10.20.0.82] :: Sensor :: Disk 24 - S3L1HDDY :: State
:8000/graphs/to=1637688300/id=834/type=sensor_state/from=1637601900/

Oddly enough… this url
:8000/graphs/device=78/type=device_storage/from=1637601948/legend=yes/popup_title=Storage+Usage/

shows ‘No Storage’ (in blue)

I would like to confirm whether this is an actual error, since Observium doesn’t register a problem. If it is a problem, I’d like more information about it so that I can put it into an issue report. If it isn’t, how do I stop the messages from coming, or at least ack them in the meantime?

Thanks in advance for taking the time to read all of this…

Probably the spare state is set to the wrong generic status…

eqlMemberHealthStatus
INTEGER {
    unknown (0),
    normal (1),
    warning (2),
    critical (3)
}

So, 3 = critical.

eqlDiskStatus OBJECT-TYPE
SYNTAX INTEGER {
    on-line (1),
    spare (2),
    failed (3),
    off-line (4),
    alt-sig (5),
    too-small (6),
    history-of-failures (7),
    unsupported-version (8),
    unhealthy (9),
    replacement (10),
    encrypted (11),
    notApproved (12),
    preempt-failed (13)
}

2 is spare, which is marked as 0 in LibreNMS (ok).
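To decode the raw values by hand, the two enumerations quoted above can be written out as lookup tables. This is a small Python sketch with the values copied from those MIB excerpts; the dictionary and function names are made up for illustration and are not LibreNMS identifiers.

```python
# Values copied from the eqlMemberHealthStatus excerpt above.
EQL_MEMBER_HEALTH_STATUS = {
    0: "unknown",
    1: "normal",
    2: "warning",
    3: "critical",
}

# Values copied from the eqlDiskStatus excerpt above.
EQL_DISK_STATUS = {
    1: "on-line",
    2: "spare",
    3: "failed",
    4: "off-line",
    5: "alt-sig",
    6: "too-small",
    7: "history-of-failures",
    8: "unsupported-version",
    9: "unhealthy",
    10: "replacement",
    11: "encrypted",
    12: "notApproved",
    13: "preempt-failed",
}


def describe(enum, value):
    """Return the MIB label for a raw value, or a marker if it is undefined."""
    return enum.get(value, f"undefined({value})")


print(describe(EQL_MEMBER_HEALTH_STATUS, 3))  # the Health sensor in this thread
print(describe(EQL_DISK_STATUS, 2))           # Disk 24 in this thread
```

With the values from this thread, the Health sensor decodes to "critical" and Disk 24 decodes to "spare".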

Are you sure the device didn’t have an issue?
Possible explanations:

  1. The device reported incorrect data. (not likely)
  2. The device had a transient error. (or persistent if still alarming)
  3. Your alert rules are not quite right and giving false alarms.