Hi,
I wanted to confirm the default criteria that cause a device to be marked as down (per the macro %macros.device_down = "1"). Is it based only on SNMP, or on fping as well? If one check fails, does the other act as a “double check”? And is a single timeout enough, or are there retries?
One of the reasons I am asking is that we are in the process of implementing LibreNMS and are getting some host down alerts for which another system hasn’t picked up a single ICMP drop. It’s also a new DC environment with a lot of new technologies, so it’s a bit harder to determine what might be at fault, especially as we are not seeing any interface flaps or log errors. So we are trying to work out whether these are false positives, just slower response times, or errors that the other systems might not have picked up for some reason. I also don’t see a gap in any of the RRD graphs for that host during that time period.
The only post I found with a similar question seemed to hint that hosts were marked down after some SNMP failures, but it never received an answer.
Thanks!