Help with BGP Session Flap Error

Hello,

I am seeing "BGP Session Flap" events logged continuously in the event log. Every time the system polls, a new event is logged, even though the devices themselves show no flap when I check them. I am not sure how to address this. I do not have any alert rules set up for this, so it must be default behavior. It is almost as if one flap is being reported again on every poll. This affects every Nokia router I have in the system. Any help is appreciated.

./validate.php

Component Version
LibreNMS 25.5.0-60-g105923b65 (2025-05-29T07:33:49-07:00)
DB Schema 2025_05_25_183627_drop_view_port_mac_link (341)
PHP 8.3.15
Python 3.12.3
Database MariaDB 10.11.11-MariaDB-0ubuntu0.24.04.2
RRDTool 1.7.2
SNMP 5.9.4.pre2
===========================================

[OK] Composer Version: 2.8.9
[OK] Dependencies up-to-date.
[OK] Database Connected
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQL and PHP time match
[OK] Active pollers found
[FAIL] Some dispatcher nodes have not checked in recently
Inactive Nodes:
icnmstemp (this node)
[OK] Locks are functional
[OK] Python wrapper cron entry is not present
[OK] Redis is functional
[OK] rrdtool version ok
[OK] Connected to rrdcached

The dispatcher node is intentionally excluded from polling. Two other nodes are polling, and data is populating just fine.

We’d need discovery and poller debug output, ideally from two runs of each if it’s flapping.

Thanks for the advice on the debug. It is not letting me upload the debug files from LibreNMS. They are pretty long, so I also cannot post them. Is there a recommended way to post debugs? Thanks in advance.

Just use any pastebin service. Make sure the pastes don’t contain any sensitive data (maybe also restrict the debug commands to -m bgp-peers to reduce the noise).
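For anyone who wants to scrub debug output before pasting it, here is a minimal sketch, not an official LibreNMS tool. The function name `redact` and both patterns are assumptions made for illustration: one replaces IPv4 addresses, the other replaces the argument that follows `-c` in net-snmp commands. It does not catch hostnames, IPv6 addresses, or addresses embedded in OID indexes, so the result still needs a read-through.

```python
import re

# IPv4 addresses, skipping dotted runs that belong to a longer OID
# such as 1.3.6.1.2.1.15.3 (illustrative pattern, not exhaustive).
IPV4 = re.compile(r"(?<!\d)(?<!\d\.)(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")
# The argument following -c in net-snmp commands, bare or single-quoted.
COMMUNITY = re.compile(r"(?<!\S)('?-c'?\s+)('[^']*'|\S+)")


def redact(text: str) -> str:
    """Replace SNMP community strings and IPv4 addresses with placeholders."""
    text = COMMUNITY.sub(r"\1<REDACTED_COMMUNITY>", text)
    return IPV4.sub("<REDACTED_IP>", text)


# Example with made-up values: the community and address are replaced,
# the OID at the end is left alone.
print(redact("snmpbulkwalk -v2c -c public 10.0.0.1 1.3.6.1.2.1.15.3"))
```

To use it on a real run, the poller output could be saved to a file, read in, and passed through `redact` before uploading.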

Thanks for the advice, here are some of those. I am not sure whether they are what you are looking for.

Thanks in advance.

lnms device:poll ... -m bgp-peers -v
LibreNMS config cache cleared and config reloaded.
Starting polling run:

Hostname: ... (nokia)
ID: 768
OS: timos
IP: ...

Load poller module bgp-peers

Checking BGP peer ...
Checking BGP peer ...
Checking BGP peer ...

SNMP: [3/1.04s] MySQL: [10/0.19s] RRD: [8/0.00s]
Runtime for poller module 'bgp-peers': 1.0479 seconds with 618272 bytes

Unload poller module bgp-peers

Enabled graphs (5): uptime poller_perf availability ping_perf poller_modules_perf

Polled ... (768) in 2.609 seconds <<<

Start Device Groups

End Device Groups (0.0408s)

Start Alerts

Rule #14 (State Sensor Critical):
Status: NOCHG
Rule #50 (Interface Errors):
Status: NOCHG
Rule #68 (Core Link Down):
Status: NOCHG
Rule #70 (LTE Link down):
Status: NOCHG
Rule #59 (Ping Latency):
Status: NOCHG
Rule #3 (Device rebooted):
Status: NOCHG
Rule #51 (Wireless Sensor under limit):
Status: NOCHG
Rule #55 (Devices up/down):
Status: NOCHG

End Alerts (0.0144s)

SNMP [3/1.04s]: Snmpget[1/0.03s] Snmpwalk[2/1.01s]
SQL [39/0.54s]: Select[29/0.17s] Update[5/0.10s] Insert[3/0.11s] Delete[2/0.16s]
RRD [12/0.01s]: Other[6/0.00s] Update[6/0.00s]

snmpbulkwalk -v2c -c <REDACTED_COMMUNITY> <REDACTED_IP> 1.3.6.1.2.1.15.3
iso.3.6.1.2.1.15.3.1.1.<IP_A> = IpAddress: <IP_A>
iso.3.6.1.2.1.15.3.1.1.<IP_B> = IpAddress: <IP_B>
iso.3.6.1.2.1.15.3.1.1.<IP_C> = IpAddress: <IP_C>
iso.3.6.1.2.1.15.3.1.2.<IP_A> = INTEGER: 6
iso.3.6.1.2.1.15.3.1.2.<IP_B> = INTEGER: 6
iso.3.6.1.2.1.15.3.1.2.<IP_C> = INTEGER: 6
iso.3.6.1.2.1.15.3.1.3.<IP_A> = INTEGER: 2
iso.3.6.1.2.1.15.3.1.3.<IP_B> = INTEGER: 2
iso.3.6.1.2.1.15.3.1.3.<IP_C> = INTEGER: 2
iso.3.6.1.2.1.15.3.1.4.<IP_A> = INTEGER: 4
iso.3.6.1.2.1.15.3.1.4.<IP_B> = INTEGER: 4
iso.3.6.1.2.1.15.3.1.4.<IP_C> = INTEGER: 4
iso.3.6.1.2.1.15.3.1.5.<IP_A> = IpAddress: <REDACTED_IP>
iso.3.6.1.2.1.15.3.1.5.<IP_B> = IpAddress: <REDACTED_IP>
iso.3.6.1.2.1.15.3.1.5.<IP_C> = IpAddress: <REDACTED_IP>
iso.3.6.1.2.1.15.3.1.6.<IP_A> = INTEGER: 49555
iso.3.6.1.2.1.15.3.1.6.<IP_B> = INTEGER: 179
iso.3.6.1.2.1.15.3.1.6.<IP_C> = INTEGER: 49542
iso.3.6.1.2.1.15.3.1.7.<IP_A> = IpAddress: <IP_A>
iso.3.6.1.2.1.15.3.1.7.<IP_B> = IpAddress: <IP_B>
iso.3.6.1.2.1.15.3.1.7.<IP_C> = IpAddress: <IP_C>
iso.3.6.1.2.1.15.3.1.8.<IP_A> = INTEGER: 179
iso.3.6.1.2.1.15.3.1.8.<IP_B> = INTEGER: 58324
iso.3.6.1.2.1.15.3.1.8.<IP_C> = INTEGER: 179
iso.3.6.1.2.1.15.3.1.9.<IP_A> = INTEGER: 65535
iso.3.6.1.2.1.15.3.1.9.<IP_B> = INTEGER: 65535
iso.3.6.1.2.1.15.3.1.9.<IP_C> = INTEGER: 65535
iso.3.6.1.2.1.15.3.1.10.<IP_A> = Counter32: 11
iso.3.6.1.2.1.15.3.1.10.<IP_B> = Counter32: 4
iso.3.6.1.2.1.15.3.1.10.<IP_C> = Counter32: 16
iso.3.6.1.2.1.15.3.1.11.<IP_A> = Counter32: 34
iso.3.6.1.2.1.15.3.1.11.<IP_B> = Counter32: 22
iso.3.6.1.2.1.15.3.1.11.<IP_C> = Counter32: 55
iso.3.6.1.2.1.15.3.1.12.<IP_A> = Counter32: 83590
iso.3.6.1.2.1.15.3.1.12.<IP_B> = Counter32: 83575
iso.3.6.1.2.1.15.3.1.12.<IP_C> = Counter32: 83590
iso.3.6.1.2.1.15.3.1.13.<IP_A> = Counter32: 83603
iso.3.6.1.2.1.15.3.1.13.<IP_B> = Counter32: 83592
iso.3.6.1.2.1.15.3.1.13.<IP_C> = Counter32: 83624
iso.3.6.1.2.1.15.3.1.14.<IP_A> = Hex-STRING: 00 00
iso.3.6.1.2.1.15.3.1.14.<IP_B> = Hex-STRING: 06 07
iso.3.6.1.2.1.15.3.1.14.<IP_C> = Hex-STRING: 00 00
iso.3.6.1.2.1.15.3.1.15.<IP_A> = Counter32: 1
iso.3.6.1.2.1.15.3.1.15.<IP_B> = Counter32: 1
iso.3.6.1.2.1.15.3.1.15.<IP_C> = Counter32: 1
iso.3.6.1.2.1.15.3.1.16.<IP_A> = Gauge32: 2506953
iso.3.6.1.2.1.15.3.1.16.<IP_B> = Gauge32: 2506965
iso.3.6.1.2.1.15.3.1.16.<IP_C> = Gauge32: 2506951
iso.3.6.1.2.1.15.3.1.17.<IP_A> = INTEGER: 120
iso.3.6.1.2.1.15.3.1.17.<IP_B> = INTEGER: 120
iso.3.6.1.2.1.15.3.1.17.<IP_C> = INTEGER: 120
iso.3.6.1.2.1.15.3.1.18.<IP_A> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.18.<IP_B> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.18.<IP_C> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.19.<IP_A> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.19.<IP_B> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.19.<IP_C> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.20.<IP_A> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.20.<IP_B> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.20.<IP_C> = INTEGER: 90
iso.3.6.1.2.1.15.3.1.21.<IP_A> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.21.<IP_B> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.21.<IP_C> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.23.<IP_A> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.23.<IP_B> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.23.<IP_C> = INTEGER: 30
iso.3.6.1.2.1.15.3.1.24.<IP_A> = Gauge32: 2490997
iso.3.6.1.2.1.15.3.1.24.<IP_B> = Gauge32: 2507084
iso.3.6.1.2.1.15.3.1.24.<IP_C> = Gauge32: 2483013

It looks like the main issue is that the BGP uptime is not increasing in LibreNMS. I am not sure when the issue started.
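To read the walk above more easily, here is a small sketch that pulls three columns of the standard BGP4-MIB bgpPeerTable out of snmpbulkwalk output: column 2 (bgpPeerState, where 6 means established), column 15 (bgpPeerFsmEstablishedTransitions) and column 16 (bgpPeerFsmEstablishedTime, in seconds). The function name `summarize` and the output field names are assumptions made for illustration; the sample lines are copied from the walk.

```python
import re

# Column numbers within bgpPeerEntry (OID prefix 1.3.6.1.2.1.15.3.1).
COLUMNS = {2: "state", 15: "established_transitions", 16: "established_seconds"}
LINE = re.compile(r"^iso\.3\.6\.1\.2\.1\.15\.3\.1\.(\d+)\.(\S+) = \w[\w-]*: (\d+)$")


def summarize(walk: str) -> dict:
    """Group the selected numeric columns by peer index."""
    peers = {}
    for line in walk.splitlines():
        m = LINE.match(line.strip())
        if not m or int(m.group(1)) not in COLUMNS:
            continue
        column, peer, value = int(m.group(1)), m.group(2), int(m.group(3))
        peers.setdefault(peer, {})[COLUMNS[column]] = value
    return peers


sample = """\
iso.3.6.1.2.1.15.3.1.2.<IP_A> = INTEGER: 6
iso.3.6.1.2.1.15.3.1.15.<IP_A> = Counter32: 1
iso.3.6.1.2.1.15.3.1.16.<IP_A> = Gauge32: 2506953
"""

for peer, row in summarize(sample).items():
    days = row["established_seconds"] / 86400  # seconds per day
    print(f"{peer}: state={row['state']} "
          f"transitions={row['established_transitions']} up={days:.1f} days")
```

For the sample peer, 2506953 seconds is roughly 29 days with a single established transition, which is consistent with the routers themselves not reporting a flap.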

I have now removed the “Send event log Notices” from bgp-peers.inc.php to at least suppress the spam. I am not sure how to troubleshoot that.

You need to add debug to the output, at least -vv or -vvv

It looks like the established time is stuck at 2.8 seconds even though BGP is up. Something is causing our Nokia routers to not update the established time. I am not sure whether this is the relevant part of the debug output to show, as the full output is very long.

Checking BGP peer <REDACTED_IP_1>
array (
'bgpPeerState' => 'established',
'bgpPeerAdminStatus' => 'receiveKeepAlive',
'bgpPeerInTotalMessages' => 5,070,030,
'bgpPeerOutTotalMessages' => 5,069,750,
'bgpPeerFsmEstablishedTime' => 2.8,
)

BGP Session Flap logged:
"last error: Cease - Connection Collision Resolution"

Checking BGP peer <REDACTED_IP_2>
array (
'bgpPeerState' => 'established',
'bgpPeerAdminStatus' => 'receiveKeepAlive',
'bgpPeerInTotalMessages' => 5,066,000,
'bgpPeerOutTotalMessages' => 5,065,330,
'bgpPeerFsmEstablishedTime' => 2.8,
)

BGP Session Flap logged:
"last error: Cease - Connection Collision Resolution"

Checking BGP peer <REDACTED_IP_3>
array (
'bgpPeerState' => 'established',
'bgpPeerAdminStatus' => 'receiveKeepAlive',
'bgpPeerInTotalMessages' => 5,066,687,
'bgpPeerOutTotalMessages' => 5,072,142,
'bgpPeerFsmEstablishedTime' => 2.8,
)

BGP Session Flap logged:
"last error: Cease - Connection Collision Resolution"
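For reference, the "last error" text corresponds to the two-byte bgpPeerLastError value (error code, then subcode) from BGP4-MIB, which appears as a Hex-STRING such as `06 07` in a raw walk. The sketch below decodes it using the error codes from RFC 4271 and the Cease subcodes from RFC 4486. The function name `decode_last_error` is an assumption for illustration, and this is not the LibreNMS implementation.

```python
# BGP NOTIFICATION error codes (RFC 4271).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Subcodes for the Cease error (RFC 4486).
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}


def decode_last_error(hex_string: str) -> str:
    """Decode a two-byte value such as '06 07' into readable text."""
    code, subcode = (int(byte, 16) for byte in hex_string.split())
    if code == 0:
        return "no error recorded"
    name = ERROR_CODES.get(code, f"unknown code {code}")
    if code == 6 and subcode in CEASE_SUBCODES:
        return f"{name} - {CEASE_SUBCODES[subcode]}"
    return f"{name} (subcode {subcode})"


print(decode_last_error("06 07"))
print(decode_last_error("00 00"))
```

Note that this field only records the most recent error seen on the connection. It carries no timestamp, so the same text can appear in every logged event without a new error having occurred.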

We’re seeing the same thing, I think.

The poller does seem to retrieve the correct values but doesn’t use them. I’m not sure where the inferred “bgpPeerFsmEstablishedTime” value is coming from.

On ours the first BGP neighbor also gets clobbered.

SNMP['/usr/bin/snmpbulkwalk' '-Cr20' '-M' '/opt/librenms/mibs:/opt/librenms/mibs/nokia' '-m' 'SNMPv2-TC:SNMPv2-MIB:IF-MIB:IP-MIB:TCP-MIB:UDP-MIB:NET-SNMP-VACM-MIB' '-v2c' '-c' 'community' '-OQXUt' '-Pu' '-Ob' 'udp:nokia-router:161' 'TIMETRA-BGP-MIB::tBgpPeerNgOperTable']
...
TIMETRA-BGP-MIB::tBgpPeerNgOperEntry.177.[encodedIPv6address].1 = 1026
TIMETRA-BGP-MIB::tBgpPeerNgOperEntry.177.[encodedIPv6address].2 = 1037


Checking BGP peer 2600::1 Checking BGP peer 2600::2 array (
  'bgpPeerState' => 'established',
  'bgpPeerAdminStatus' => 'receiveOpen',
  'bgpPeerInTotalMessages' => 1349,
  'bgpPeerOutTotalMessages' => 3259,
  'bgpPeerFsmEstablishedTime' => 1.94,
)
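One way to confirm the symptom from two consecutive debug runs is to compare the reported bgpPeerFsmEstablishedTime values: for a session that stays established, the value should grow between polls. The sketch below is only an illustration of that check. The function name `classify_uptime` and its labels are assumptions, and it is not the logic the poller uses.

```python
def classify_uptime(previous: float, current: float, elapsed: float) -> str:
    """Compare established-time readings taken `elapsed` seconds apart.

    Returns "reset" if the timer went backwards (a real session reset
    would look like this), "stuck" if it did not move although time
    passed, and "advancing" otherwise.
    """
    if current < previous:
        return "reset"
    if current == previous and elapsed > 0:
        return "stuck"
    return "advancing"


# Values in the style of the debug output above, assuming a 300 second
# polling interval (the interval is an assumption).
print(classify_uptime(2.8, 2.8, 300))
print(classify_uptime(2506653, 2506953, 300))
```

A reading that comes back identical on every run, as with the 2.8 and 1.94 values posted in this thread, falls into the "stuck" case and points at how the value is derived rather than at the BGP session itself.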

Are these supposed to be the “correct” values? This shows an OID that is not defined in the MIB (the MIB is likely out of date).

Hey Tony,

supposed to be the “correct” values

Evidently, and SR-OS doesn’t seem to implement the standard BGP neighbors MIB.

likely MIB out of date

Looks that way.

TIMETRA-BGP-MIB shipping with LibreNMS:

timetraBgpMIBModule              MODULE-IDENTITY
    LAST-UPDATED "201701010000Z"

From my test router:

timetraBgpMIBModule              MODULE-IDENTITY
    LAST-UPDATED "202302150000Z"
tBgpPeerFsmEstablishedTime       OBJECT-TYPE
    SYNTAX      Gauge32
    UNITS       "seconds"
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "This timer indicates how long (in
         seconds) this peer has been in the
         established state or how long
         since this peer was last in the
         established state.  It is set to zero when
         a new peer is configured or when the router is
         booted."
    REFERENCE
        "BGP4-MIB.bgpPeerConnectRetryInterval"
    ::= { tBgpPeerNgOperEntry 177 }

Should I attempt to generate a PR with updated MIBs?

Thanks.

Replace the MIB locally first to see whether that fixes it.
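Before swapping files, it can help to confirm which copy of the MIB is newer by comparing the LAST-UPDATED clauses. A minimal sketch follows, with the function name `mib_last_updated` assumed for illustration. It only handles the four-digit-year form shown in this thread and expects the MIB text to be passed in as a string.

```python
import re
from datetime import datetime

# Matches the four-digit-year form, e.g. LAST-UPDATED "202302150000Z".
LAST_UPDATED = re.compile(r'LAST-UPDATED\s+"(\d{12})Z"')


def mib_last_updated(mib_text: str) -> datetime:
    """Return the first LAST-UPDATED timestamp found in a MIB module."""
    match = LAST_UPDATED.search(mib_text)
    if match is None:
        raise ValueError("no LAST-UPDATED clause found")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


# Excerpts quoted earlier in the thread.
shipped = 'timetraBgpMIBModule MODULE-IDENTITY\n    LAST-UPDATED "201701010000Z"'
router = 'timetraBgpMIBModule MODULE-IDENTITY\n    LAST-UPDATED "202302150000Z"'

if mib_last_updated(router) > mib_last_updated(shipped):
    print("The router's copy of the MIB is newer than the shipped one.")
```

After replacing the file, rerunning the poller with -m bgp-peers and debug enabled should show whether tBgpPeerNgOperEntry.177 now resolves to a named object.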