Multiple devices down/unreachable: Fast Ping only updates a single device as down/unreachable

Hello,

Fast Ping does not work correctly when multiple devices are unreachable.
Ping is set to run every minute via the /etc/cron.d/librenms cron file.

The poller, which runs every 5 minutes, detects all unreachable devices, updates MySQL, and runs the proper alert rules as it should.

I think this is a bug:
Fast Ping works great for a single device that is unreachable. It detects that the device is unreachable, sets devices.status to 0 and devices.status_reason to icmp in MySQL, and runs the proper alert rule: devices.status = 0 AND devices.status_reason = "icmp".

However, when multiple devices are unreachable, only a single device is updated in the MySQL database, and the alert rule runs only for that device. The ping.php script recognizes that the remaining devices are unreachable but does not update MySQL or run the rule for them.

I am running the latest LibreNMS, 1.48.1.
I have set up Fast Ping per the docs.

  • The output of ./validate.php:
    ====================================
    Component | Version
    --------- | -------
    LibreNMS | 1.48.1
    DB Schema | 2019_01_16_195644_add_vrf_id_and_bgpLocalAs (131)
    PHP | 7.2.10-0ubuntu0.18.04.1
    MySQL | 10.1.34-MariaDB-0ubuntu0.18.04.1
    RRDTool | 1.7.0
    SNMP | NET-SNMP 5.7.3
    ====================================
    [OK] Composer Version: 1.8.4
    [OK] Dependencies up-to-date.
    [OK] Database connection successful
    [OK] Database schema correct

Devices 192.168.50.59 and 192.168.50.60 are currently powered off, but there is only an alert for 192.168.50.59, not for 192.168.50.60.

MySQL hostname, status, status_reason output:
MariaDB [librenms]> select hostname, status, status_reason from devices librenms;

+----------------+--------+---------------+
| hostname       | status | status_reason |
+----------------+--------+---------------+
| localhost      |      1 |               |
| 192.168.50.248 |      1 |               |
| 192.168.50.59  |      0 | icmp          |
| 192.168.50.60  |      1 |               |
+----------------+--------+---------------+
4 rows in set (0.00 sec)

Output after Poller runs:
+----------------+--------+---------------+
| hostname       | status | status_reason |
+----------------+--------+---------------+
| localhost      |      1 |               |
| 192.168.50.248 |      1 |               |
| 192.168.50.59  |      0 | icmp          |
| 192.168.50.60  |      0 | icmp          |
+----------------+--------+---------------+

Output of ping.php -v

librenms@librenms:~$ ./ping.php -v
Tier 0 (4): localhost, 192.168.50.248, 192.168.50.59, 192.168.50.60
Attempting to record data for localhost… Success
RRDtool Output: Attempting to record data for 192.168.50.248… Success
RRDtool Output: OK u:0.00 s:0.00 r:0.02
Attempting to record data for 192.168.50.60… Success
RRDtool Output: OK u:0.00 s:0.00 r:0.03
Attempting to record data for 192.168.50.59… Success
RRDtool Output: OK u:0.00 s:0.00 r:0.05
Pinged 4 devices in 2.45s

Any help will be appreciated.
Thanks
srg

Sounds like you set the delay to 5 minutes on your alert rules.

The Fast Ping rule I set up is as follows:
devices.status = 0 AND devices.status_reason = "icmp", severity = critical, Max alerts: -1, Delay: 30s, Interval: 60s.
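
To make the mismatch concrete, here is a minimal sketch (not LibreNMS code) that applies the rule condition to the two table snapshots from my first post. It assumes an in-memory SQLite table as a stand-in for the MariaDB devices table, and matching_hosts is a hypothetical helper name used only for illustration.

```python
import sqlite3

# The rule condition from this thread, written as a SQL WHERE clause.
RULE = "status = 0 AND status_reason = 'icmp'"


def matching_hosts(rows):
    """Return the hostnames in `rows` that satisfy RULE.

    Uses an in-memory SQLite table as a stand-in for the real
    MariaDB `devices` table (an assumption for this sketch).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices (hostname TEXT, status INTEGER, status_reason TEXT)"
    )
    conn.executemany("INSERT INTO devices VALUES (?, ?, ?)", rows)
    query = f"SELECT hostname FROM devices WHERE {RULE} ORDER BY hostname"
    hosts = [row[0] for row in conn.execute(query)]
    conn.close()
    return hosts


# Snapshot after Fast Ping ran (copied from the first table in the post).
after_fast_ping = [
    ("localhost", 1, ""),
    ("192.168.50.248", 1, ""),
    ("192.168.50.59", 0, "icmp"),
    ("192.168.50.60", 1, ""),
]

# Snapshot after the 5-minute poller ran (second table in the post).
after_poller = [
    ("localhost", 1, ""),
    ("192.168.50.248", 1, ""),
    ("192.168.50.59", 0, "icmp"),
    ("192.168.50.60", 0, "icmp"),
]

print(matching_hosts(after_fast_ping))  # ['192.168.50.59']
print(matching_hosts(after_poller))     # ['192.168.50.59', '192.168.50.60']
```

With the Fast Ping snapshot, only 192.168.50.59 can match the rule because 192.168.50.60 still has status 1; both match only after the poller has updated the table.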

The problem is that once a single device is flagged as unreachable, additional devices that become unreachable are ignored.

The rule groups them together as multiple items. You should get a "got worse" alert… I think.