Sub-Minute ping check not alerting correctly

Heath · 5 June 2019 18:14

I’m trying to get notifications sorted out in LibreNMS as I transition away from an old monitoring solution. I’m still getting a few false positives, but my biggest issue is that the email alerts are slow to go out.

I’ve configured the Fast Ping Checking for “sub minute ping check” according to the documentation.

(/etc/cron.d/librenms)

(/opt/librenms/config.php)

I’ve configured the alert rule to trigger when the device is down AND the reason is icmp.

(GUI: Alerts -> Alert Rules -> Devices up/down)

I’m still a little unsure what exact behavior this results in. My current monitoring solution polls (sends 1 ping) every 30 seconds. After three successive failed polls, it generates an alert, so I get an alert within 2 minutes of the device going down. Within 1 minute of the device coming back online, the alert is cleared.
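To make the old system's timing concrete, here is a rough Python sketch of the arithmetic. It only models the behavior described above (one ping every 30 seconds, alert on the third consecutive failure); the function name `alert_delay_bounds` is made up for illustration, and ping timeouts and mail delivery time are ignored.

```python
def alert_delay_bounds(poll_interval_s, failures_needed):
    """Return (best, worst) seconds from outage start to alert.

    Best case: the device dies just before a poll, so the first failed
    poll happens almost immediately and the remaining failures follow
    one interval apart.
    Worst case: the device dies just after a successful poll, so a full
    interval passes before the first failed poll.
    """
    best = (failures_needed - 1) * poll_interval_s
    worst = failures_needed * poll_interval_s
    return best, worst


# Old system as described: 30-second polls, alert after 3 failed polls.
best_s, worst_s = alert_delay_bounds(30, 3)
print(best_s, worst_s)  # 60 90
```

So under these assumptions the alert should fire somewhere between 60 and 90 seconds after the device drops, which matches the "within 2 minutes" I see in practice.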

Using the above pictured settings, I get the “down” notification from LibreNMS about 3 or 4 minutes after I get the notification from the old system. The “up” notification comes about 5 minutes after the notification from the old system.

Just this morning I had a device reboot after a power blip. It went down and came back up, and I received both alerts from the old system before I got the “Down” email from LibreNMS.
9:24am - “Down” notification from old system
9:26am - “Up” notification from old system
9:27am - “Down” notification from LibreNMS
9:32am - “Up” notification from LibreNMS
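For reference, the gaps in that timeline can be worked out with a few lines of Python. This is only arithmetic on the four timestamps above; the helper name `minutes_between` is made up for illustration.

```python
from datetime import datetime


def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Timestamps from this morning's outage.
down_lag = minutes_between("09:24", "09:27")      # LibreNMS "Down" behind old system
up_lag = minutes_between("09:26", "09:32")        # LibreNMS "Up" behind old system
librenms_gap = minutes_between("09:27", "09:32")  # LibreNMS "Down" to "Up"

print(down_lag, up_lag, librenms_gap)  # 3 6 5
```

The LibreNMS “Down” and “Up” emails are exactly 5 minutes apart, even though the old system saw the device recover within 2 minutes.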

So it seems that one of my issues is that LibreNMS generates notifications only once every 5 minutes, despite my cron file being configured otherwise AND despite the documentation saying alerts run every minute.

Why are notifications being sent out at 5 minute intervals despite the configuration? What do I have wrong?

What is the actual behavior - considering retries, counts, intervals, etc - that triggers the alert? Can someone walk me through the step-by-step of what is happening with this process?

Heath · 7 June 2019 13:06

So nobody has any ideas? Is this a bug? Do I have something wrong?

murrant · 7 June 2019 15:15

No idea without doing some major investigation, which I won’t have time to do any time soon.

If you try the dispatcher service described in the docs, does the same thing happen?

Heath · 7 June 2019 15:34

I have not tried the dispatcher service. I just looked at the documentation and will have to study it some more when I get time. I’m not really a Linux guy and didn’t quite understand it all with the first read-through. But I’ll be checking it out. Thanks for the pointer!

murrant · 7 June 2019 19:49

It basically replaces the cron jobs.