Alert when device is down longer than x minutes

I have created an alert rule, taken from a post back in 2017, that should email me if a device is down for longer than 10 minutes. But it isn't working as expected, and I wonder if I have set the rule up incorrectly.

Here is what I have for the rule.

(screenshot of the alert rule)

Any help would be appreciated!

The past_10m macro simply evaluates to a datetime: the current time minus 10 minutes. So if the current time minus 10 minutes is '2019-05-21 10:32:50', the rule effectively says…

if '2019-05-21 10:32:50' == 1 AND device.status != 1 then ALERT

As you can see, that doesn't make logical sense: a timestamp is being compared to 1, and there's no relationship between the two conditions.
Those macros are more useful as a value compared against a datetime field, like…

syslog.timestamp >= macros.past_10m
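To make the difference concrete, here is a small Python sketch of what the macro represents. It is not LibreNMS code: `past_minutes` is a hypothetical stand-in for a macro like past_10m, and the log entries are made-up sample data. It shows that the macro is just a cutoff datetime, that it is useful as a filter against a timestamp field, and that comparing it to 1 is meaningless (in Python it is simply always False).

```python
from datetime import datetime, timedelta

def past_minutes(now: datetime, minutes: int) -> datetime:
    """Hypothetical stand-in for a macro like past_10m: current time minus N minutes."""
    return now - timedelta(minutes=minutes)

def recent_entries(entries, now: datetime, minutes: int = 10):
    """Keep only entries whose timestamp is within the last N minutes,
    i.e. the equivalent of `syslog.timestamp >= macros.past_10m`."""
    cutoff = past_minutes(now, minutes)
    return [e for e in entries if e["timestamp"] >= cutoff]

# Made-up sample data for illustration only.
NOW = datetime(2019, 5, 21, 10, 42, 50)
ENTRIES = [
    {"msg": "link down", "timestamp": datetime(2019, 5, 21, 10, 40, 0)},
    {"msg": "old event", "timestamp": datetime(2019, 5, 21, 10, 20, 0)},
]

print(past_minutes(NOW, 10))                                # 2019-05-21 10:32:50
print([e["msg"] for e in recent_entries(ENTRIES, NOW)])     # ['link down']

# The original rule's first condition: a datetime compared to 1.
# It never relates to the device's state, so it can't express "down for 10 minutes".
print(past_minutes(NOW, 10) == 1)                           # False
```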

For this, though, you could remove that line and instead tune the delay around your polling interval. For example, if you poll at the default 5m rate, you might set the delay to 6m. Since you're at the mercy of the polling interval, that should alert approximately 10m to 15m after the device goes down (provided it stayed down during the intermediate poll(s)).
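The 10m to 15m estimate can be reproduced with a rough model. This is a sketch under an assumption, not a description of how LibreNMS schedules alerts internally: it assumes the rule is only re-evaluated on each poll, so the outage is first seen at the next poll after it starts, and the alert fires at the first later poll where the time since that first match is at least the delay. `alert_latency_range` is a hypothetical helper.

```python
import math

def alert_latency_range(poll_interval: float, delay: float) -> tuple[float, float]:
    """Approximate (best, worst) minutes from outage start to alert.

    Assumed model: the rule is evaluated only at polls, the outage is first
    matched at the next poll (up to one poll_interval after it starts), and the
    alert fires at the first poll where time since first match >= delay.
    """
    # Whole poll intervals needed after the first match to cover the delay.
    polls_needed = math.ceil(delay / poll_interval)
    time_to_confirm = polls_needed * poll_interval
    # Best case: outage starts just before a poll. Worst case: just after one.
    return (time_to_confirm, time_to_confirm + poll_interval)

print(alert_latency_range(5, 6))   # (10, 15): 5m polling with a 6m delay
```

Under this model, any delay from just over 5m up to 10m lands in the same 10m to 15m window with 5m polling, which is why the delay mainly needs to be tuned relative to the polling interval rather than set exactly.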

If you need more precision than that you’ll probably have to look into Fast Ping Checks.

Thanks! Think I have it working correctly with the delay.
