Just posting a way to alert only if, let's say, 2 out of 3 nodes in a cluster fail.
Usually people don't need, or want, a call in the middle of the night when just one node in an HA cluster has failed.
Let's say you have 3 devices like this: device-1-berlin, device-2-berlin, device-3-berlin.
This query returns a row only when more than one of them is down (disabled and ignored devices are not counted):
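SELECT 'BERLIN Cluster' AS hostname, COUNT(device_id) AS failed
FROM devices
WHERE devices.hostname REGEXP '^device-[1-3]-berlin$'
  AND devices.status = 0
  AND devices.disabled = 0
  AND devices.ignore = 0
HAVING failed > 1;
Mind the pattern: [1-3] matches exactly one digit (a class written as [1,2,3] would also match a comma), and the ^ and $ anchors keep hostnames such as device-1-berlin-old out of the count.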
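If you want to check the counting logic before turning it into an alert rule, here is a minimal sketch that replays the query against an in-memory SQLite table from Python. It is only a stand-in, not LibreNMS itself: the table keeps just the columns the query uses, the regexp() and cluster_alert() helpers are hypothetical names for this test, and the COUNT sits in a sub-select because older SQLite versions reject HAVING without GROUP BY.

```python
import re
import sqlite3

# Assumption: a minimal in-memory stand-in for the LibreNMS devices table,
# holding only the columns the alert query touches.
conn = sqlite3.connect(":memory:")

# SQLite has no built-in REGEXP: "X REGEXP Y" calls the user function regexp(Y, X),
# so the pattern arrives first and the column value second.
def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value) else 0

conn.create_function("REGEXP", 2, regexp)

# "ignore" is quoted because IGNORE is an SQL keyword.
conn.execute(
    'CREATE TABLE devices (device_id INTEGER PRIMARY KEY, hostname TEXT, '
    'status INTEGER, disabled INTEGER, "ignore" INTEGER)'
)

# Same filter as the MySQL rule; the outer WHERE plays the role of HAVING failed > 1.
CLUSTER_QUERY = """
SELECT 'BERLIN Cluster' AS hostname, failed
FROM (
    SELECT COUNT(device_id) AS failed
    FROM devices
    WHERE devices.hostname REGEXP '^device-[1-3]-berlin$'
      AND devices.status = 0
      AND devices.disabled = 0
      AND devices."ignore" = 0
)
WHERE failed > 1
"""

def cluster_alert(statuses):
    """Load hostname -> status (1 = up, 0 = down) and return the alert rows."""
    conn.execute("DELETE FROM devices")
    conn.executemany(
        'INSERT INTO devices (hostname, status, disabled, "ignore") VALUES (?, ?, 0, 0)',
        statuses.items(),
    )
    return conn.execute(CLUSTER_QUERY).fetchall()

# One node down: no row, so no alert.
print(cluster_alert({"device-1-berlin": 0, "device-2-berlin": 1, "device-3-berlin": 1}))  # -> []
# Two nodes down: one row, so the alert fires.
print(cluster_alert({"device-1-berlin": 0, "device-2-berlin": 0, "device-3-berlin": 1}))  # -> [('BERLIN Cluster', 2)]
```

An empty result means the rule stays quiet, which is exactly what keeps the single-node failure from paging anyone at night.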
Of course, you can also base the condition on application metrics instead of the device status fields, for example:
application_metrics.metric = "TCd" AND application_metrics.value < 2
Be creative and share how you deal with this; I would appreciate any new ideas.