BGP false positives when monitoring Cisco routers/switches

Hi all,

We’re seeing an increasing number of false positives relating to BGP sessions across our Cisco router and L3 switch estate. At first I thought they were genuine, but I’ve searched the logs and the sessions are solid: all have been up for at least a few weeks, and most for more than a year. Other than these errors, the devices appear stable in LibreNMS.

I’ve optimised the poller and I can see all devices are successfully polled within a window of about 200s. I’m not sure where to look first, so some guidance would be really appreciated.

The issue has occurred across multiple device models, including ASR1002 and ME3600X.

Thanks in advance

George

Anyone have any thoughts on where I could start with investigating this?

Can you elaborate a bit on what kind of false positives you are seeing? What exactly is wrong? Any technical details? Is this SNMP polling or traps?

I recently had a case on a third-party NMS where IP addresses got mixed up on received SNMP traps. I hope this is not the case here…

Sure, thanks.

These issues are all on SNMP-polled devices, not traps. We get monitoring alerts that look exactly as if the BGP session has died. This happens across multiple monitored devices and seems to relate to the same peers on the respective devices. On checking the logs of each monitored device, there is no corresponding logged drop of the BGP session. The phantom outage usually lasts for a single polling cycle, so after 5 minutes the session appears to come back up.

We never used to get false positives of this type, and we have been running the server for a long while, so I feel like something has changed. I’m just not sure where to start looking to understand what or why.
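
In case it helps anyone reproduce this, here is a minimal sketch of how a single-cycle phantom outage could be told apart from a real drop. It assumes the raw polled values of bgpPeerState and bgpPeerFsmEstablishedTime (both from the standard BGP4-MIB) are available for the poll before, during and after the alert. The Sample type, classify_flap function and the 30-second slack are hypothetical names and values for illustration, not part of LibreNMS.

```python
from dataclasses import dataclass

ESTABLISHED = 6  # bgpPeerState value for "established" in BGP4-MIB


@dataclass
class Sample:
    polled_at: int         # poll time, seconds since epoch
    state: int             # polled bgpPeerState
    established_secs: int  # polled bgpPeerFsmEstablishedTime


def classify_flap(before: Sample, down: Sample, after: Sample, slack: int = 30) -> str:
    """Classify a one-poll outage using the polls on either side of it."""
    if down.state == ESTABLISHED:
        return "no-flap"
    if before.state != ESTABLISHED or after.state != ESTABLISHED:
        return "unknown"
    elapsed = after.polled_at - before.polled_at
    grown = after.established_secs - before.established_secs
    # A real reset restarts the established timer, so it would not have
    # grown in step with wall-clock time across the "down" poll.
    if abs(grown - elapsed) <= slack:
        return "phantom"
    return "real"
```

If the established timer keeps counting straight through the "down" poll, the session most likely never dropped and the bad value came from the poll itself (for example a timeout or a partial SNMP response), which would match the device logs showing no drop.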