Power Alert

Hi All,

I’m struggling to get a decent power alert going. Everything I’ve tried just generates spam alerts that aren’t 100% accurate, and I’m not sure how to move forward.

Does anyone have a decent rule for when a device is actually down?

My rule at the moment:

How does your rule behave?
Does it notify you when a device goes down?

It’s fake news: it gives me an alert that the device is down, and then 1 minute later it says the power is restored.

This is 5 minutes apart.

power down

power up

Are you assuming that power is lost if the device isn’t reachable? That doesn’t seem to be the case in your example. It looks like LibreNMS couldn’t poll the device for a couple minutes but once it came back up the uptime was still over 100 days? So power remained on?

Yeah, so my alert is generating fake news…

Semi-fake news, maybe? It does seem like the device was unreachable to LibreNMS, but it doesn’t seem like the cause was power related.

It’s hard to do a proper power alert without a managed UPS or an unmanaged UPS + environmental monitor (like a RoomAlert or something).

Without that, I think the best you can do is add one rule like the “Device Up/Down” alert rule from the collections. If missed pings are likely, you could raise the delay to 5m or 6m so it takes more than 1 missed poll to trigger. When that rule does trigger, you know that the device is unreachable, but the cause could be power, a device fault, the network, etc. Then add a second rule like the “Device Rebooted” alert rule from the collections. This rule won’t trigger until the device is back up, but if it does trigger you know that the uptime counter was reset, so either power was lost (most likely) or the device crashed and rebooted.
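To show why a delay filters out a single missed poll, here is a rough toy model in Python. It is only a sketch, not how LibreNMS schedules alerts internally: the function name `first_alert_time`, the 300-second poll interval, and the "continuously down for at least the delay" logic are all assumptions made for illustration.

```python
def first_alert_time(poll_results, poll_interval_s=300, delay_s=300):
    """Return the time in seconds at which a down alert would fire, or None.

    poll_results holds one boolean per poll, True meaning the device answered.
    Toy model (assumption): the alert fires once the device has been seen
    down continuously for at least delay_s seconds.
    """
    down_since = None
    for i, reachable in enumerate(poll_results):
        now = i * poll_interval_s
        if reachable:
            down_since = None  # any successful poll resets the timer
            continue
        if down_since is None:
            down_since = now   # first failed poll of this outage
        if now - down_since >= delay_s:
            return now
    return None


single_miss = [True, False, True, True]    # one dropped poll, then fine
real_outage = [True, False, False, False]  # device stays unreachable

print(first_alert_time(single_miss, delay_s=0))    # 300: alerts on one miss
print(first_alert_time(single_miss, delay_s=300))  # None: miss is ignored
print(first_alert_time(real_outage, delay_s=300))  # 600: second failed poll
```

In this model, no delay turns every dropped poll into an alert, while a delay of one poll interval only alerts once a second consecutive poll has failed.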


Edit: slashdoom beat me to it… :grinning:

I cover this with a combination of two alerts: one checks if the device is reachable via ICMP, and the second looks at the uptime of the device; anything less than 7 minutes uptime triggers a “Device Reloaded” alert.

So the first covers any reachability issue, and the second confirms a power event for whatever reason (we have a lot of those round here…). A Host Down recovery alert followed by a Device Reloaded alert tells us what is going on.

Host Down alert rule
macros.device_down = 1 AND devices.status_reason = "icmp"

Device Reloaded alert rule
macros.device = 1 AND devices.uptime < 420 AND macros.device_up = 1
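
For anyone who wants to reason about how these two rules combine, here is a small Python sketch that mirrors them against a plain dictionary. It is an illustration only, not LibreNMS code. The dictionary keys and the helper names (`is_monitored`, `host_down`, `device_reloaded`, `diagnose_after_recovery`) are made up for the example. It also assumes that `macros.device` means the device is neither disabled nor ignored, and that `macros.device_up` and `macros.device_down` correspond to `devices.status` being 1 and 0.

```python
RELOAD_WINDOW_S = 420  # 7 minutes, the threshold used in the rule above


def is_monitored(device):
    # Assumption: stands in for macros.device = 1.
    return device["disabled"] == 0 and device["ignore"] == 0


def host_down(device):
    # Mirrors: macros.device_down = 1 AND devices.status_reason = "icmp"
    return (is_monitored(device)
            and device["status"] == 0
            and device["status_reason"] == "icmp")


def device_reloaded(device):
    # Mirrors: macros.device = 1 AND devices.uptime < 420 AND macros.device_up = 1
    return (is_monitored(device)
            and device["status"] == 1
            and device["uptime"] < RELOAD_WINDOW_S)


def diagnose_after_recovery(device):
    """Interpret the first poll after a Host Down alert has recovered."""
    if device_reloaded(device):
        return "rebooted: power loss or crash"
    return "unreachable only: uptime was not reset"


# Example from this thread: back up with uptime still over 100 days.
still_running = {"disabled": 0, "ignore": 0, "status": 1,
                 "status_reason": "", "uptime": 100 * 86400}
# Hypothetical device that came back 3 minutes after booting.
just_booted = dict(still_running, uptime=180)

print(diagnose_after_recovery(still_running))  # unreachable only: ...
print(diagnose_after_recovery(just_booted))    # rebooted: ...
```

One design point to keep in mind: the uptime window has to be longer than the polling interval, otherwise a reboot could fall entirely between two polls and never match the second rule.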
