BGP alerts

Hello,

Was there an update last night or recently that had to do with BGP alerts?

I have an ASR9006. LibreNMS says that all the IPv4 sessions are down, but on the router the sessions are up.

LibreNMS says the IPv6 sessions are fine.

I deleted the alert rule and then re-added it. Same results.

What info do you need?

-Mike

After I posted this message I ran ./daily.sh.

I went from:

[root@librenms librenms]# ./validate.php

Component | Version
--------- | -------
LibreNMS | 1.33-363-g2b1d63c
DB Schema | 230
PHP | 7.1.11
MySQL | 5.1.73
RRDTool | 1.3.8
SNMP | NET-SNMP 5.5

====================================

to:

[root@librenms librenms]# ./validate.php

Component | Version
--------- | -------
LibreNMS | 1.33-364-g47cb014
DB Schema | 230
PHP | 7.1.11
MySQL | 5.1.73
RRDTool | 1.3.8
SNMP | NET-SNMP 5.5

====================================

My issue still exists…

-Mike

Hi Mike,

It might help to see your BGP alert rule, unless it is the default one with no changes made at all?

-Jeff

It is the default rule.

I also have two ASR9001s, and this issue is not happening on them.

I deleted the rule and then re-added, still no change.

Still having the issue.

Attached are screenshots of all the alerts that LibreNMS says are down but are really up, the alerts I have configured, and the config of the BGP down alert.

Something changed with the LibreNMS update that happened Thursday night (1/18/18) into Friday morning (1/19/18), USA East Coast time.

Ideas?

Thanks,

-Mike

Hello @mvoity,
It is probably connected to the bgp-peers refactor by @murrant.
I tested it, but probably missed something or some incompatibility
with your device.
Try to collect:
./discovery.php -h affected-device -m bgp-peers -d
./poller.php -h affected-device -m bgp-peers -d
and check output for problems.

@zombah Thanks for your reply.

I ran discovery.php and poller.php.

The outputs are a mile long each. On a first pass I didn’t see any errors.

What info, if any, should I post?

Thanks!

-Mike

@mvoity Hm, if there are no visible signs, try saving the output to a file, then revert the bgp-peers update commit and collect again. After that, diff the two outputs to see the differences. The last part of the discovery output, with the SQL inserts, is usually the most readable for comparison.
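
The save-then-diff approach could look roughly like the sketch below. The helper names `save_debug` and `compare_runs` and any file paths are made up for illustration; only the poller and discovery commands come from this thread.

```shell
# Hypothetical helper: run a command and save stdout and stderr to a file.
# Usage: save_debug OUTFILE COMMAND [ARGS...]
save_debug() {
    out="$1"
    shift
    "$@" > "$out" 2>&1
}

# Hypothetical helper: unified diff of two saved runs.
# diff exits 0 when the files match and 1 when they differ.
# Usage: compare_runs OLDFILE NEWFILE
compare_runs() {
    diff -u "$1" "$2"
}
```

From the LibreNMS directory, something like `save_debug /tmp/poller-new.txt ./poller.php -h HOSTNAME -m bgp-peers -d` would capture the current behaviour; after reverting the commit, the same call with `/tmp/poller-old.txt` captures the old one, and `compare_runs /tmp/poller-old.txt /tmp/poller-new.txt | less` shows what changed. Expect some noise from values that differ between any two runs, such as timings and counters.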

Here is the output.

Nothing is jumping out at me:

https://pastebin.com/9kxwRAjx

@mvoity Ah, I forgot that for Cisco, discovery.php only shows the peer list. Status changes are tracked by poller.php, which shows all peer stats and their SAFI stats. The output seems fine. Try grepping it for a neighbor whose status you expect was parsed wrongly, and track all its occurrences.
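
A minimal sketch of that grep step, assuming the poller debug output has already been saved to a file. The function name `trace_neighbor` is hypothetical, and the address you pass is whichever neighbor LibreNMS reports as down.

```shell
# Hypothetical helper: print every line of a saved debug file that mentions
# one neighbor address, prefixed with its line number.
# -F treats the address as a literal string (dots are not wildcards);
# -w requires word boundaries, so 10.0.0.1 does not also match 10.0.0.10.
# Usage: trace_neighbor FILE ADDRESS
trace_neighbor() {
    grep -n -w -F -e "$2" -- "$1"
}
```

Reading the matches top to bottom should show where the neighbor's state is fetched over SNMP and where it is written to the database. IPv6 peers may appear in a different notation in SNMP output, so this sketch is aimed at the IPv4 neighbors.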

@zombah, again, thanks for your response. It’s all the IPv4 neighbors on this router that LibreNMS sees as down; in reality they are up. The IPv6 neighbors are not reporting down.

Something clearly changed in the LibreNMS code Thursday night (1/18/18) into Friday morning (1/19/18), because prior to Friday morning LibreNMS showed everything on this device as A-OK with BGP.

-Mike

Can you post the poller debug output? ./poller.php -m bgp-peers -d -h HOSTNAME

Also, if you are willing to donate some test data, we can try to make sure this doesn’t happen again. ./scripts/collect-snmp-data.php -m bgp-peers -h HOSTNAME

@mvoity I see a lot of “No Such Instance currently exists at this OID” in your log, and the resulting arrays with the peers’ cbgp data are empty, so the problem is somehow connected to polling stats from the device.
@murrant It seems @mvoity posted the discovery and poller data in one paste; just scroll to the bottom.
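
To get a feel for how widespread those responses are, a saved debug file can be searched for that exact message. This is only a sketch; the function names `count_missing_oids` and `list_missing_oids` are hypothetical.

```shell
# Hypothetical helper: count lines containing the "No Such Instance"
# response in a saved debug file.
# grep -c prints the count; it exits 1 when the count is 0.
# Usage: count_missing_oids FILE
count_missing_oids() {
    grep -c -F 'No Such Instance currently exists at this OID' "$1"
}

# Hypothetical helper: list the affected lines with line numbers,
# so the OIDs that returned nothing can be read off.
# Usage: list_missing_oids FILE
list_missing_oids() {
    grep -n -F 'No Such Instance currently exists at this OID' "$1"
}
```

Comparing the count between a device that works and the one that does not may help narrow down which OIDs the affected router is not answering.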

Thanks, I’ll check it in a bit.

I checked one of my IOS XR boxes with both IPv4 and IPv6 peers, and they seem fine. But I have far fewer peers on it and am polling with SNMP v2.

Try upgrading your net-snmp, 5.5 is pretty old.

Thank you @zombah and @murrant for your replies.

I use v3 on this router.

What version of NET-SNMP do you recommend for a RHEL 6.9 server? 5.7?

Thanks again,

-Mike

Yes, the latest release of net-snmp 5.7 should do.

@murrant Thank you again for your reply, I will give that a shot. Do you know which repo I might easily find it in, or is this something I need to build from scratch?

Thanks,

-Mike

@murrant Thanks again for your reply.

Moving to NET-SNMP 5.7 is going to be more difficult than expected.

Like I said in my previous post, something clearly changed in the LibreNMS code Thursday night (1/18/18) into Friday morning (1/19/18), because prior to Friday morning LibreNMS showed everything on this device as A-OK with BGP.

How can I get someone to look into this deeper and what kind of debugs or data do you need to fix this?

Thanks,

-Mike