False Cisco "Stack Ring - Redundant" alert

DR3EVR8u8c · 12 May 2018 02:26

Hello,
I am wondering whether I have hit a bug. LibreNMS has reported that the “Stack Ring - Redundant” check failed on one of our Cisco switches. However, the switch is not stacked.

Steps to reproduce:
./snmp-scan.php <Cisco switch IP>

validate.php output:

====================================
Component | Version
--------- | -------
LibreNMS  | 1.39
DB Schema | 249
PHP       | 7.0.27
MySQL     | 5.5.56-MariaDB
RRDTool   | 1.6.0
SNMP      | NET-SNMP 5.7.2
====================================

[OK]    Composer Version: 1.6.5
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database schema correct
[WARN]  Your local git contains modified files, this could prevent automatic updates.
        [FIX] You can fix this with ./scripts/github-remove
        Modified Files:
         html/includes/common/alerts.inc.php
         html/includes/table/alerts.inc.php

I modified alerts.inc.php to add a filter for delayed alerts, but the false Cisco alert appeared before that modification and continues to occur after it.

Switch Model: WS-C2960S-48FPDL

Also, turning off the alert in the Health settings does not help; I am still getting alerts for it.

I would appreciate any help fixing this issue.
Thanks and regards,
Roger

DR3EVR8u8c · 13 May 2018 10:45

Hello, any help with this?
I forgot to mention that there are two switches of the same model (WS-C2960S-48FPDL) in our network, and neither of them is stacked. However, LibreNMS discovers different sensors on them: one gets cswRingRedundant and the other gets a different set of sensors. See the query below:

MariaDB [librenms]> select device_id,sensor_type,sensor_descr,sensor_current,sensor_prev from sensors where device_id in (227,284);
+-----------+-----------------------------+-------------------------------+----------------+-------------+
| device_id | sensor_type                 | sensor_descr                  | sensor_current | sensor_prev |
+-----------+-----------------------------+-------------------------------+----------------+-------------+
|       227 | ciscoEnvMonTemperatureState | SW#1, Sensor#1, GREEN         |              1 |           0 |
|       227 | ciscoEnvMonFanState         | Switch#1, Fan#1               |              1 |           0 |
|       227 | ciscoEnvMonSupplyState      | Sw1, PS1 Normal, RPS NotExist |              1 |           0 |
|       227 | cisco                       | SW#1, Sensor#1, GREEN         |             34 |          33 |
|       284 | cswRingRedundant            | Stack Ring - Redundant        |              2 |           0 |
+-----------+-----------------------------+-------------------------------+----------------+-------------+
5 rows in set (0.00 sec)
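A side note on reading the row for device 284 (an interpretation, not something confirmed in this thread): as far as I know, cswRingRedundant is defined in CISCO-STACKWISE-MIB with the SNMPv2-TC TruthValue syntax, where 1 means true and 2 means false. If that is right, sensor_current = 2 means the switch itself reports its stack ring as not redundant, which would be plausible for a standalone switch with a single, unconnected StackPort. The helpers below are hypothetical (not part of LibreNMS) and only make that mapping explicit.

```python
# Hypothetical helpers, not part of LibreNMS: interpret the raw value stored
# in sensors.sensor_current for the cswRingRedundant sensor.

# SNMPv2-TC TruthValue: true(1), false(2).
TRUTH_VALUE = {1: True, 2: False}


def decode_truth_value(raw: int) -> bool:
    """Map a raw TruthValue integer to a Python bool."""
    if raw not in TRUTH_VALUE:
        raise ValueError(f"{raw} is not a valid TruthValue (expected 1 or 2)")
    return TRUTH_VALUE[raw]


def ring_status(sensor_current: int) -> str:
    # Assumption: cswRingRedundant uses the TruthValue syntax, so true
    # means "stack ring is redundant" and false means it is not.
    return "redundant" if decode_truth_value(sensor_current) else "not redundant"


# Value from the query above: device 284, cswRingRedundant, sensor_current = 2
print(ring_status(2))  # not redundant
```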

The query below shows that these two devices are the same model with the same specs. Note: I have excluded the GigabitEthernet interfaces for readability:

MariaDB [librenms]> select device_id,entPhysicalDescr,entPhysicalModelName from entPhysical where device_id in (227,284) and entPhysicalDescr not like "GigabitEthernet%";
+-----------+-----------------------------------------------+----------------------+
| device_id | entPhysicalDescr                              | entPhysicalModelName |
+-----------+-----------------------------------------------+----------------------+
|       227 | Catalyst C29xx Switch Stack                   |                      |
|       227 | WS-C2960S-48FPD-L                             | WS-C2960S-48FPD-L    |
|       227 | Switch 1 - WS-C2960S-48FPD-L - Fixed Module 0 |                      |
|       227 | Fan Container                                 |                      |
|       227 | Power Supply Container                        |                      |
|       227 | RPS Container                                 |                      |
|       227 | Switch 1 - WS-C2960S-48FPD-L - Power Supply 0 |                      |
|       227 | Switch 1 - WS-C2960S-48FPD-L - Fan 0          |                      |
|       227 | Switch 1 - WS-C2960S-48FPD-L - Sensor 0       |                      |
|       227 | StackPort1                                    |                      |
|       227 | TenGigabitEthernet Container                  |                      |
|       227 | TenGigabitEthernet Container                  |                      |
|       227 | FastEthernet0                                 |                      |
|       284 | Catalyst C29xx Switch Stack                   |                      |
|       284 | WS-C2960S-48FPD-L                             | WS-C2960S-48FPD-L    |
|       284 | Switch 1 - WS-C2960S-48FPD-L - Fixed Module 0 |                      |
|       284 | Fan Container                                 |                      |
|       284 | Power Supply Container                        |                      |
|       284 | RPS Container                                 |                      |
|       284 | Switch 1 - WS-C2960S-48FPD-L - Power Supply 0 |                      |
|       284 | Switch 1 - WS-C2960S-48FPD-L - Fan 0          |                      |
|       284 | Switch 1 - WS-C2960S-48FPD-L - Sensor 0       |                      |
|       284 | StackPort1                                    |                      |
|       284 | TenGigabitEthernet Container                  |                      |
|       284 | TenGigabitEthernet Container                  |                      |
|       284 | FastEthernet0                                 |                      |
+-----------+-----------------------------------------------+----------------------+
26 rows in set (0.01 sec)

Could you please let me know why different sensors are discovered on these two devices? I have tried deleting and re-scanning the switch, with the same result.
Thanks and regards,
Roger

PipoCanaja · 13 May 2018 11:27

Hello

A possible starting point for this issue:

Do they run the same IOS version ?
Do they have a stacking module installed or not ?

Basically, LibreNMS shows whatever SNMP answer it receives from the switch, so the question to ask is “why do both switches send me these values?”. LibreNMS may then need to be adapted to work around bad SNMP values, or to be more clever in how it interprets them.

Bye

DR3EVR8u8c · 13 May 2018 23:27

Hello @PipoCanaja,
Thanks for your comment.

IOS version:

MariaDB [librenms]> select device_id,hardware,features,type,version from devices where device_id in (227,284);
+-----------+-------------------+-------------+---------+-------------+
| device_id | hardware          | features    | type    | version     |
+-----------+-------------------+-------------+---------+-------------+
|       227 | WS-C2960S-48FPD-L | UNIVERSALK9 | network | 12.2(55)SE5 |
|       284 | WS-C2960S-48FPD-L | UNIVERSALK9 | network | 12.2(55)SE7 |
+-----------+-------------------+-------------+---------+-------------+
2 rows in set (0.02 sec)

Both switches have a stacking module installed, but it is not connected.

There is a minor difference in versions (SE5 vs. SE7). However, that shouldn’t be the reason they are treated differently.

It seems very weird to me: these two switches are almost identical, but different sensors are detected on them. If I could find out what causes that, I may be able to figure out how to fix the false alarm.
Regards,
Roger

PipoCanaja · 14 May 2018 07:45

If possible, I would check whether changing the IOS version changes how the device is discovered. If it does, analysing the two different SNMP answers should put us on the right track.

DR3EVR8u8c · 15 May 2018 07:08

Both switches are located at a remote site and connected via VPN. We would rather not interrupt the switches unless there is a real issue. Is there anything else to check?

PipoCanaja · 15 May 2018 08:27

I don’t think so.

LibreNMS most probably does exactly the same thing whenever it receives the same SNMP reply. So if the behaviour is different, the SNMP reply is most probably not the same…
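To follow this suggestion without upgrading or rebooting anything, one could save an SNMP walk of each switch to a text file (for example with net-snmp's snmpwalk, which prints one `OID = TYPE: value` line per object) and compare the two. The sketch below is a minimal, hypothetical comparison script: the function names are illustrative, it assumes every object fits on one line, and the sample lines are made up rather than captured from these switches.

```python
from typing import Dict, List, Tuple


def parse_walk(text: str) -> Dict[str, str]:
    """Parse snmpwalk lines of the form 'OID = TYPE: value' into a dict.

    Assumption: each object fits on one line; continuation lines of
    multi-line strings (no ' = ' separator) are skipped.
    """
    result: Dict[str, str] = {}
    for line in text.splitlines():
        oid, sep, value = line.partition(" = ")
        if sep:
            result[oid.strip()] = value.strip()
    return result


def diff_walks(
    a: str, b: str
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Return (OIDs only in a, OIDs only in b, OIDs whose values differ)."""
    wa, wb = parse_walk(a), parse_walk(b)
    only_a = sorted(set(wa) - set(wb))
    only_b = sorted(set(wb) - set(wa))
    changed = [
        (oid, wa[oid], wb[oid])
        for oid in sorted(set(wa) & set(wb))
        if wa[oid] != wb[oid]
    ]
    return only_a, only_b, changed


# Illustrative input only (hypothetical lines, not real switch output).
walk_a = "SNMPv2-MIB::sysName.0 = STRING: sw-a\nEXAMPLE-MIB::onlyOnA.0 = INTEGER: 2"
walk_b = "SNMPv2-MIB::sysName.0 = STRING: sw-b"
only_a, only_b, changed = diff_walks(walk_a, walk_b)
print(only_a)   # ['EXAMPLE-MIB::onlyOnA.0']
print(changed)  # [('SNMPv2-MIB::sysName.0', 'STRING: sw-a', 'STRING: sw-b')]
```

Run against real walks of devices 227 and 284, any OID that exists on only one switch, or has a different value, would be a candidate explanation for why discovery creates cswRingRedundant on one and not the other.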

DR3EVR8u8c · 16 May 2018 01:57

In that case, we may just disable the sensor for that device. Besides disabling the sensor in the Health settings, what else do I need to do? It still sends alerts after I do that.

DR3EVR8u8c · 24 May 2018 07:11

Hello,
Although this is not a big enough problem to stop us using LibreNMS, I still think it may be a flaw in the software, caused by discovery. The device's event log shows LibreNMS regularly deleting sensors and adding them back to the health checks, even though nothing has changed on the switch and it has not rebooted. The status of the sensors in question also changes without any reason. Please note that the switch is not stacked.

Sorry, I don’t know how to export the log as text, so I just took a screenshot of the web interface.

I suspect it is related to the discovery-wrapper script, which runs every 6 hours. For some reason, LibreNMS discovers sensors that do not exist, deletes them on the next discovery run, adds them again on the run after that, and so on.
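To check this add/delete cycle more systematically than with screenshots, one could record which sensors each discovery run finds and count how often each sensor appears or disappears between consecutive runs. This is a minimal sketch under an assumed input format (an ordered list of per-run sets of sensor descriptions); extracting that list from LibreNMS is not shown, the function names are hypothetical, and the sample runs are illustrative rather than taken from the event log.

```python
from typing import Dict, List, Set


def count_transitions(runs: List[Set[str]]) -> Dict[str, int]:
    """Count how often each sensor flips between present and absent.

    `runs` is an ordered list with one set of sensor descriptions per
    discovery run (hypothetical input format).
    """
    all_sensors: Set[str] = set().union(*runs)
    flips: Dict[str, int] = {s: 0 for s in all_sensors}
    for prev, curr in zip(runs, runs[1:]):
        # A sensor present in exactly one of two consecutive runs was
        # either added or deleted by the later run.
        for sensor in prev ^ curr:
            flips[sensor] += 1
    return flips


def flapping(runs: List[Set[str]], min_flips: int = 2) -> List[str]:
    """Sensors that were added or removed at least `min_flips` times."""
    return sorted(s for s, n in count_transitions(runs).items() if n >= min_flips)


# Illustrative data: four discovery runs, 6 hours apart.
runs = [
    {"Stack Ring - Redundant", "Switch#1, Fan#1"},
    {"Switch#1, Fan#1"},
    {"Stack Ring - Redundant", "Switch#1, Fan#1"},
    {"Switch#1, Fan#1"},
]
print(flapping(runs))  # ['Stack Ring - Redundant']
```

Counting flips rather than just comparing the first and last run makes an alternating sensor stand out even when it happens to be present at both ends of the window.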

I’m not sure whether anyone else experiences the same problem, but please help me figure out why it happens and how to fix it. Currently, it occurs on all of my standalone Cisco C2960S switches, including both devices mentioned above.
Thanks and regards,
Roger

DR3EVR8u8c · 29 May 2018 00:05

Hello All,
Any ideas why the switch stacking status changes regularly in LibreNMS without any actual change on the device?
Thanks and regards

Kevin_Krumm · 29 May 2018 20:22

What version of Cisco IOS are you running? This could be a bug in that IOS release.

DR3EVR8u8c · 30 May 2018 07:21

Hi Kevin,
All of them run IOS 12.2(55), from SE5 to SE8, and they all seem to behave the same, so I am not sure it is a Cisco bug. I suspect it may be related to the stacking module installed in the devices.
Anyway, I have made a workaround for it, and it seems to be working quite well so far.

Thanks,
Roger