Issues with 1 minute alerting

Hi,

We have been trying a few different network monitoring systems looking for a replacement for ZenOSS. We have been testing LibreNMS for a few weeks now and it seems to check all the boxes. One serious problem with ZenOSS was the disk I/O wearing out the SSDs and we were pleased to see LibreNMS is much better at minimising I/O to the disks.

Before we decide to spin up some production servers there is one issue to resolve: fast ping checking. We tried following the instructions, changing ping_rrd_step and then running the rrdstep.php script (we also have the ping.php line in crontab as instructed), but this just broke the Ping Perf graphs. If you change ping_rrd_step back to 300 and run the script again, the graphs fix themselves with a little lost data. We are using the latest stable rrdtool, 1.7.1.

Our requirement is 1 minute alerting (calling an email transport) when devices go down or come back up. We are not too bothered about the RRD ping graph; it can stay at 5 minutes, so long as we can get accurate times from the alert logs. We are still only getting alert updates every 5 minutes.

We did some testing by shutting down an interface to a test switch to trigger alerts, then bringing it back up. Running the ping.php script in debug mode does seem to trigger a call to RunRules(), but not 100% of the time. I added the following to RunRules($device_id) in the includes/alerts.inc.php file:

// Debug only: append a timestamped line each time RunRules() is called.
$f = fopen("/opt/librenms/xxx.log", "a");
if ($f) {
    fprintf($f, "RunRules %s - Device %s\n", date('l jS \of F Y h:i:s A'), $device_id);
    fclose($f);
}

This just logs when RunRules is called. Despite the ping.php and alerts.php crontab entries running every minute, RunRules is still only called at 5 minute intervals. Does anyone know why this is the case, or is there anything we should check?
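For anyone wanting to check the same thing on their own install, here is a rough Python sketch for measuring the gaps between logged calls. It assumes the exact line format written by the fprintf() call above (for example "RunRules Saturday 20th of April 2019 05:32:17 PM - Device 7"). The names parse_line and call_gaps are made up for this example and are not part of LibreNMS.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line format, as written by the debug fprintf() in RunRules():
#   RunRules Saturday 20th of April 2019 05:32:17 PM - Device 7
LINE_RE = re.compile(r"^RunRules (?P<stamp>.+?) - Device (?P<device>\S+)\s*$")
# Matches the ordinal suffix that PHP's "jS" format adds after the day number.
ORDINAL_RE = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_line(line):
    """Return (device_id, datetime) for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if not match:
        return None
    # "20th" -> "20" so that strptime's %d can read the day.
    stamp = ORDINAL_RE.sub("", match.group("stamp"))
    when = datetime.strptime(stamp, "%A %d of %B %Y %I:%M:%S %p")
    return match.group("device"), when


def call_gaps(lines):
    """Map each device id to the gaps, in seconds, between successive RunRules calls."""
    calls = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            device, when = parsed
            calls[device].append(when)
    gaps = {}
    for device, times in calls.items():
        times.sort()
        gaps[device] = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return gaps
```

Passing the open log file to call_gaps, e.g. call_gaps(open("/opt/librenms/xxx.log")), returns a dict keyed by device id. Gaps close to 300 rather than 60 would match what we are seeing.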

Output of ./validate.php:

-bash-4.2$ ./validate.php

Component Version
LibreNMS 1.50-51-g39ff4c7
DB Schema 2019_02_10_220000_add_dates_to_fdb (132)
PHP 7.2.14
MySQL 5.5.60-MariaDB
RRDTool 1.7.1
SNMP NET-SNMP 5.7.2

====================================

[OK] Composer Version: 1.8.5
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py
[WARN] Your install is over 24 hours out of date, last update: Sat, 20 Apr 2019 17:32:17 +0000
[FIX]:
Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
app/Jobs/PingCheck.php

I have a PR open to improve alerting, if you would like to test it: https://github.com/librenms/librenms/pull/9765

I've taken a look at your patch, or at least browsed through the code on GitHub, and I'm not sure it will do what we want.

For us, the recordData function in PingCheck is not reaching the RunRules call every minute during my shut/no shut testing.
Something earlier in the code is stopping this. Your patch just replaces RunRules, so the replacement would likely be skipped in the same way.

I will keep investigating when I get time. The more I keep looking the more I understand.

I have a similar issue: 1 minute alerting was working earlier, but now it no longer seems to work.

-bash-4.2$ ./validate.php

Component Version
LibreNMS 1.51-21-ga46fc9d
DB Schema 2019_02_10_220000_add_dates_to_fdb (132)
PHP 7.2.13
MySQL 5.5.60-MariaDB
RRDTool 1.4.8
SNMP NET-SNMP 5.7.2

====================================

[OK] Composer Version: 1.8.5
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
-bash-4.2$