Default values to mark a device as down?

Hi,

I wanted to confirm the default criteria that cause a device to be marked as down (from the macro %macros.device_down = “1”). Is it only SNMP, or fping as well? If one fails, does the other “double check”? Is it a single timeout, or are there retries?

One of the reasons I am asking is that we are in the process of implementing LibreNMS and are getting some host down alerts for which another system hasn’t picked up a single ICMP drop. It’s also a new DC environment with a lot of new technologies, so it’s a bit harder to determine what might be at fault, especially as we are not seeing any interface flaps or log errors. So we are trying to see whether they are false positives, just slower response times, or errors that the other systems might not have picked up for some reason. I also don’t have a gap in any of the RRD graphs for that host during that time period.

The only post I found that had a similar question seemed to hint that hosts were marked down for some SNMP failures, but there hadn’t been an answer to it.

Thanks!

A device only has to fail either the ICMP check or the SNMP check to be marked down.

At present, we only mark a device down for ICMP if it has 100% packet loss. Are you sure it’s saying it’s down for ICMP and not SNMP? It will say in the eventlog. Either way, tweaking the settings may help (slightly higher timeout, more retries, etc.).

http://docs.librenms.org/Support/Configuration/#fping
http://docs.librenms.org/Support/Configuration/#snmp-settings
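To make the rule above concrete, here is a minimal Python sketch of the decision as described in this reply: a device counts as down if every ICMP probe is lost, or if the SNMP check fails. This is not LibreNMS code. The function names, the list-of-booleans input, and the order in which the two checks are evaluated are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def packet_loss_percent(replies: Sequence[bool]) -> float:
    """Percentage of probes with no reply (True means a reply was received)."""
    if not replies:
        # Assumption: no recorded probes is treated as total loss.
        return 100.0
    lost = sum(1 for ok in replies if not ok)
    return 100.0 * lost / len(replies)


def down_reason(icmp_replies: Sequence[bool], snmp_ok: bool) -> Optional[str]:
    """Return "icmp" or "snmp" if the device counts as down, else None.

    Hypothetical helper: checking ICMP before SNMP is an assumption.
    """
    # ICMP only counts as failed when every probe was lost (100% loss).
    if packet_loss_percent(icmp_replies) >= 100.0:
        return "icmp"
    # Otherwise a failed SNMP check is enough on its own.
    if not snmp_ok:
        return "snmp"
    return None
```

Under this model, a poll where one of three pings gets through (`down_reason([False, True, False], True)`) is not down, while three lost pings in a row is. That is why sending more probes per poll or allowing a longer timeout can reduce false positives on a network that occasionally drops or delays a packet.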

Thanks, somehow I missed the down justification in the eventlog. The alert and the up event hadn’t mentioned why, but the down event confirmed it was indeed because of ICMP.

I’ll try to tweak settings to validate then, thanks

I think the default behavior is that if the ICMP ping passes but SNMP times out, or the device is unreachable over SNMP for some reason, the device is marked as down.
I sometimes get false alerts this way: the device still responds to ping, but SNMP is unreadable, so it alerts.
