Hello,
I was excited to find the Fast Ping Checking extension. However, after following the instructions, fast ping doesn't work as I expected. These are the steps I followed:
1. Set ping_rrd_step:
   $config['ping_rrd_step'] = 60;
2. Update the RRD files:
   ./scripts/rrdstep.php -h all
3. Update cron (removing any other ping.php or alert.php entries):
   * * * * * librenms /opt/librenms/ping.php >> /dev/null 2>&1
I also added the following to config.php:
$config['fping'] = "/usr/sbin/fping";
$config['fping_options']['timeout'] = 500;
$config['fping_options']['count'] = 10;
$config['fping_options']['interval'] = 500;
$config['fping_options']['retries'] = 2;
I also set the alert delay to 3m, so I expected to receive an alert about 3 minutes after the host was shut down.
Instead, the device-down event is only detected during normal polling.
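For reference, the timing I was expecting works out roughly as follows. This is a rough sketch. It assumes the alert delay is counted from the ping.php run that first records the device as down, and that the rules are re-checked on every one-minute cron run:

```latex
% Cron runs ping.php every minute (ping_rrd_step = 60 s), so the outage is
% first recorded at most one interval after the host goes down:
0 < t_{\text{detect}} \le 60\,\text{s}

% With an alert delay d = 3\,\text{min} = 180\,\text{s}:
t_{\text{alert}} \approx t_{\text{detect}} + d
\quad\Longrightarrow\quad
180\,\text{s} < t_{\text{alert}} \lesssim 240\,\text{s}
```

If the delay is only evaluated on the following cron run, the alert could arrive up to one minute later than that. Even so, it should arrive well before the next normal polling cycle.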
When I ran ping.php in debug mode, this was the output:
-bash-4.2$ ./ping.php -d -v
SQL[select `devices`.`device_id`, `hostname`, `status`, `status_reason`, `last_ping`, `last_ping_timetaken`, `max_depth` from `devices` left join `devices_attribs` on `devices`.`device_id` = `devices_attribs`.`device_id` and `devices_attribs`.`attrib_type` = ? where `disabled` = ? and (`devices_attribs`.`attrib_value` is null or `devices_attribs`.`attrib_value` != ?) order by `max_depth` asc ["override_icmp_disable",0,"true"] 0.68ms]
Tier 0 (3): 10.202.70.148, cloned-librenms01, 10.202.70.134
'fping' '-f' '-' '-e' '-t' '500' '-r' '2'
cloned-librenms01 is alive (0.13 ms)
Attempting to record data for cloned-librenms01... Deferred
10.202.70.134 is alive (1.10 ms)
Attempting to record data for 10.202.70.134... Deferred
10.202.70.148 is unreachable
Attempting to record data for 10.202.70.148... Deferred
Leftover devices, this shouldn't happen: cloned-librenms01, 10.202.70.134, 10.202.70.148
Devices left in tier:
Pinged 3 devices in 2.41s
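The 2.41 s total roughly matches the time fping spends waiting on the unreachable host. The arithmetic below assumes fping's default exponential backoff factor of 1.5 (its -B option, which is not set on the command line above). With -t 500, the first wait is t_0 = 500 ms. Each of the r = 2 retries (-r 2) multiplies the timeout by B = 1.5:

```latex
T_{\text{unreachable}} = \sum_{k=0}^{r} t_0 \, B^{k}
  = 500\,\text{ms} \times \left(1 + 1.5 + 1.5^{2}\right)
  = 500\,\text{ms} \times 4.75
  = 2375\,\text{ms} \approx 2.4\,\text{s}
```

So a full run takes a few seconds, far below the 60 s step. The count (10) and interval (500) values from config.php do not appear on this fping command line.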
EDIT 1:
The "Deferred" problem is resolved. I found some outdated PHP dependencies, and once I updated them, ping.php records data successfully.
However, the alert still doesn't work correctly. For reference, here is the LibreNMS log:
EDIT 2:
I had a quick look at alerts.php and found that it calculates the duration of an incident from the time_logged column in the alert_log table. ping.php doesn't update that table, which is why the alert isn't fired based on the real event time. Could you update ping.php so that it also updates time_logged in the alert_log table? Should I raise this as a bug?
EDIT 3:
My previous assumption may not be correct, since I don't know much about PHP. But ping.php's behavior really doesn't make sense to me. The problem also works in reverse. In the scenario above, the alert delay countdown isn't triggered. Likewise, ping.php doesn't cancel the alert when the device comes back up within the delay. See the picture below for an example:
Strangely, I was monitoring the database and running the rule query manually from the moment the device's status changed to down until it changed back to up, and the query returned the correct result:
So why is the alert still triggered? This bug makes the fast ping check completely unusable.
Please help me troubleshoot this, or at least explain what could be wrong.
Thank you very much.