Alert when 2 out of 3 hosts in a cluster fail

Hey guys,
I just wanted to share a way to alert only when, let’s say, 2 out of 3 nodes in a cluster fail.

Usually nobody needs (or wants) a call in the middle of the night just because one node in an HA cluster has failed.

Let’s say you have 3 devices: device-1-berlin, device-2-berlin and device-3-berlin.

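The following query counts the cluster members that are down (and not disabled or ignored). Because COUNT() without GROUP BY collapses all matching rows into one, the HAVING clause returns a row only when more than one device has failed; an empty result means no alert:
SELECT 'BERLIN Cluster' AS hostname, COUNT(device_id) AS failed
FROM devices
WHERE devices.hostname REGEXP '^device-[1-3]-berlin$'
  AND devices.status = 0
  AND devices.disabled = 0
  AND devices.ignore = 0
HAVING failed > 1;
The character class is [1-3] (a class like [1,2,3] would also match a literal comma), and the ^ and $ anchors keep the pattern from matching other hosts whose names merely contain device-1-berlin.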
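If you want to sanity-check the counting logic before turning it into an alert rule, here is a minimal sketch that runs the same filter against an in-memory SQLite database from Python. Assumptions: the devices table below is a hypothetical stand-in holding only the columns the query reads (it is not the real LibreNMS schema), make_db and cluster_alert are made-up helper names, and because SQLite releases before 3.39 reject HAVING without GROUP BY, the failed > 1 check is moved into an outer WHERE, which gives the same result.

```python
import re
import sqlite3

# Hypothetical stand-in for the `devices` table: only the columns the
# alert query reads. This is NOT the real LibreNMS schema.
SCHEMA = """
CREATE TABLE devices (
    device_id INTEGER PRIMARY KEY,
    hostname  TEXT,
    status    INTEGER,  -- 1 = up, 0 = down
    disabled  INTEGER,
    `ignore`  INTEGER
)
"""

# Same filter as the forum query; HAVING is replaced by an outer WHERE
# because older SQLite versions reject HAVING without GROUP BY.
CLUSTER_QUERY = """
SELECT hostname, failed FROM (
    SELECT 'BERLIN Cluster' AS hostname, COUNT(device_id) AS failed
    FROM devices
    WHERE devices.hostname REGEXP '^device-[1-3]-berlin$'
      AND devices.status = 0
      AND devices.disabled = 0
      AND devices.`ignore` = 0
) AS cluster
WHERE failed > 1
"""


def make_db(statuses):
    """Create hosts device-1-berlin, device-2-berlin, ... with the given status values."""
    conn = sqlite3.connect(":memory:")
    # SQLite evaluates `X REGEXP Y` as regexp(Y, X), so the pattern comes first.
    conn.create_function(
        "regexp", 2,
        lambda pattern, value: value is not None and re.search(pattern, value) is not None,
    )
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO devices (hostname, status, disabled, `ignore`) VALUES (?, ?, 0, 0)",
        [(f"device-{i}-berlin", s) for i, s in enumerate(statuses, start=1)],
    )
    return conn


def cluster_alert(statuses):
    """Return the alert rows: empty unless more than one cluster node is down."""
    conn = make_db(statuses)
    try:
        return conn.execute(CLUSTER_QUERY).fetchall()
    finally:
        conn.close()
```

For example, cluster_alert([1, 1, 0]) returns an empty list (only one node down, so no alert), while cluster_alert([0, 1, 0]) returns [('BERLIN Cluster', 2)]. A fourth host such as device-4-berlin is not counted, because the anchored pattern only matches the three cluster members.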

Of course, you can also base the condition on metrics instead of the device macros, for example:
application_metrics.metric = 'TCd' AND application_metrics.value < 2

Be creative, and share how you deal with this :grinning: I’d appreciate any new ideas.