Service-Check Alert

amaizenshtein · 4 January 2018 13:41

Hi Guys,

I have a lot of service checks that use Nagios plugins. As of today I need to set an alert per service check with a statement like %services.service_ip = "x.x.x.x", which means I have to prepare one alert per service with the relevant IP.
Is it possible to use one service check alert for all service checks?

Kevin_Krumm · 4 January 2018 14:55

Not sure what you mean by that. ^^^

But you could use this alert rule to check if a service is not working.

amaizenshtein · 4 January 2018 15:13

So if I use this service check status, one alert rule can cover all service check failures? For example, if I have fping service checks to 100 different destinations?

Kevin_Krumm · 4 January 2018 15:16

Yes, it would.

https://docs.librenms.org/#Extensions/Services/#alerting

amaizenshtein · 4 January 2018 15:37

I’ve set the following, but it doesn’t trigger the alert:

Kevin_Krumm · 4 January 2018 15:38

You didn’t type in the rule correctly. Try it again:

%services.service_status != "0"
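Why this one rule can cover every service check: a rough sketch in Python, not LibreNMS code. It assumes that service_status holds the Nagios plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The rows are hypothetical stand-ins for the services table, and the IPs and status values are made up for illustration.

```python
# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Hypothetical rows standing in for the services table (values are illustrative).
services = [
    {"service_id": 1, "service_ip": "8.8.8.8", "service_status": 0},
    {"service_id": 2, "service_ip": "8.8.4.4", "service_status": 2},
    {"service_id": 3, "service_ip": "10.106.10.1", "service_status": 0},
]


def failing_services(rows):
    """Return every row whose status is not OK, regardless of its IP."""
    return [row for row in rows if row["service_status"] != 0]


for row in failing_services(services):
    state = NAGIOS_STATES[row["service_status"]]
    print(row["service_id"], row["service_ip"], state)
```

Because the condition looks only at the status and never at service_ip, the same rule matches any failing check, whether there are 3 destinations or 100.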

amaizenshtein · 4 January 2018 15:51

I tried to set "0".
It gives me the following: %services.service_status != ""0""
with the " doubled.

Kevin_Krumm · 4 January 2018 15:59

Okay, odd. Try this: go into the alerts collection and use the alert rule that is in there. I know for sure it works.

Select the rule named “service up/down”; you can rename it to whatever you like.

amaizenshtein · 4 January 2018 16:08

I set this one, and it still doesn’t work.

Kevin_Krumm · 4 January 2018 16:18

I’m not sure then, because it works just fine for me with service checks. Also, did you not try the alert that was in the alert collection?

amaizenshtein · 4 January 2018 16:32

I used the alert from the collection. It had an additional macro condition, which I removed, but I also tried it with the macro and it didn’t work. My checks are based on fping; should the service check status rule pick those up?

amaizenshtein · 4 January 2018 16:33

Maybe I can use this trigger instead: services.service_message?
But in that case, what value should I match against?

Kevin_Krumm · 4 January 2018 16:33

I’m using fping service checks as well, and it alerts on them.

amaizenshtein · 4 January 2018 16:34

I see that the service shows red, but it doesn’t trigger the alert.

Kevin_Krumm · 4 January 2018 16:36

Run service debug on the service that has failed.

Post the output.

https://docs.librenms.org/#Extensions/Services/#debug

amaizenshtein · 4 January 2018 16:44

DEBUG!
SQL[SELECT * FROM devices AS D, services AS S WHERE S.device_id = D.device_id ORDER by D.device_id DESC]

Nagios Service - 1
Request: /usr/lib64/nagios/plugins/check_fping -H 8.8.8.8 -T 1000 -i 1000 -n 5
Perf Data - DS: loss, Value: 0, UOM: %
Perf Data - DS: rta, Value: 0.001950, UOM: s
Response: FPING OK - 8.8.8.8 (loss=0%, rta=1.950000 ms)
Service DS: {
"loss": "%",
"rta": "s"
}
RRD[update /opt/librenms/rrd/sv3-librenms01.pan.local/services-1.rrd N:0:0.001950]
Sending sv3-librenms01_pan_local.services.services.1.loss 0 1515083968
Sending sv3-librenms01_pan_local.services.services.1.rta 0.001950 1515083968
SQL[UPDATE services set service_message ='FPING OK - 8.8.8.8 (loss=0%, rta=1.950000 ms)' WHERE service_id='1']

Nagios Service - 2
Request: /usr/lib64/nagios/plugins/check_fping -H 8.8.4.4 -T 1000 -i 1000 -n 5
Perf Data - DS: loss, Value: 100, UOM: %
Response: FPING CRITICAL - 8.8.4.4 (loss=100% )
Service DS: {
"loss": "%"
}
RRD[update /opt/librenms/rrd/sv3-librenms01.pan.local/services-2.rrd N:100]
Sending sv3-librenms01_pan_local.services.services.2.loss 100 1515083974

Nagios Service - 3
Request: /usr/lib64/nagios/plugins/check_fping -H 10.106.10.1 -T 1000 -i 1000 -n 5
Perf Data - DS: loss, Value: 0, UOM: %
Perf Data - DS: rta, Value: 0.073200, UOM: s
Response: FPING OK - 10.106.10.1 (loss=0%, rta=73.200000 ms)
Service DS: {
"loss": "%",
"rta": "s"
}
RRD[update /opt/librenms/rrd/sv3-librenms01.pan.local/services-3.rrd N:0:0.073200]
Sending sv3-librenms01_pan_local.services.services.3.loss 0 1515083978
Sending sv3-librenms01_pan_local.services.services.3.rta 0.073200 1515083978
SQL[UPDATE services set service_message ='FPING OK - 10.106.10.1 (loss=0%, rta=73.200000 ms)' WHERE service_id='3']
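For longer debug runs, the failing checks can be picked out of the output with a few lines of Python. This is only a sketch: the regular expression is an assumption based on the three check_fping Response lines above and would need adjusting for other plugins.

```python
import re

# Response lines copied from the debug output above.
DEBUG_OUTPUT = """\
Response: FPING OK - 8.8.8.8 (loss=0%, rta=1.950000 ms)
Response: FPING CRITICAL - 8.8.4.4 (loss=100% )
Response: FPING OK - 10.106.10.1 (loss=0%, rta=73.200000 ms)
"""

# Pattern inferred from the lines above; not an official check_fping format spec.
RESPONSE_RE = re.compile(
    r"^Response: FPING (?P<state>OK|WARNING|CRITICAL|UNKNOWN) - (?P<host>\S+) "
    r"\(loss=(?P<loss>\d+)%",
    re.MULTILINE,
)


def parse_responses(text):
    """Extract host, state and packet loss from each Response line."""
    return [
        {"host": m["host"], "state": m["state"], "loss": int(m["loss"])}
        for m in RESPONSE_RE.finditer(text)
    ]


results = parse_responses(DEBUG_OUTPUT)
not_ok = [r["host"] for r in results if r["state"] != "OK"]
print(not_ok)
```

Here only 8.8.4.4 is reported as not OK, which matches the CRITICAL response for Nagios Service - 2.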

Kevin_Krumm · 4 January 2018 16:45

Status = Critical so it should alert on the rule.

Kevin_Krumm · 4 January 2018 16:52

I just tested mine with this alert rule and it works.

Alert Rule

The Alert

The Service Check

amaizenshtein · 4 January 2018 17:20

It works on my other LibreNMS servers. Let me restart the server.

Kevin_Krumm · 4 January 2018 17:23

Lol, okay… https://media.giphy.com/media/F7yLXA5fJ5sLC/giphy.gif

Any failures in validate.php?