Cisco switches (3560, 2960, 9200) show as down a few hours after being added

Hello,

After adding Cisco switches like the 3560, 2960, and 9200, they show as ‘down’ on the NMS after a few hours. I am adding them as SNMP hosts. Although they respond to pings from our network computers, they don’t respond when pinged from the NMS console. Removing them from the NMS temporarily fixes the issue, and they start responding to pings from the NMS console again.

Please let me know if anyone has faced this issue.

Thank You

When asking for help and support, please provide as much information as possible. This should include:

  • Steps to reproduce an issue.
  • The output of ./validate.php

If it’s an issue with the WebUI then please consider including a screenshot and the browser version you are using.

If you are having troubles with discovery/polling include the pastebin output of:

./discovery.php -h HOSTNAME -d | ./pbin.sh
./poller.php -h HOSTNAME -r -f -d | ./pbin.sh

If you need to post any text longer than a few lines, please use a pastebin service such as https://p.libren.ms using non-expiring pastes.

If you do a discovery on the device while it’s down what errors does it present?

No error, it will say “device will be discovered”

Click on the three dots in the top right area, then Capture, then click on Discovery and run that. Read the output there; it should show you why it can’t communicate with the devices over SNMP, ICMP, or whatever else.


LibreNMS Discovery

SQL[select migration from migrations order by id desc limit 1 0.86ms]

SQL[select count(*) as aggregate from migrations 0.81ms]

SQL[select version() 0.46ms]

===========================================

Component Version
LibreNMS 24.2.0-30-g8c4205c61 (2024-03-19T09:24:07+00:00)
DB Schema 2024_02_07_151845_custom_map_additions (290)
PHP 8.1.2-1ubuntu2.14
Python 3.10.12
Database MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
RRDTool 1.7.2
SNMP 5.9.1
===========================================
DEBUG!

Updating os_def.cache

SQL[SELECT * FROM devices WHERE disabled = 0 AND hostname LIKE ‘192.168.44.46’ ORDER BY device_id DESC 1.13ms]

192.168.44.46 36 iosxe
SQL[select * from devices where device_id = ? limit 1 [36] 0.78ms]

SQL[select * from devices_attribs where devices_attribs.device_id = ? and devices_attribs.device_id is not null [36] 0.64ms]

[FPING] ‘/usr/bin/fping’ ‘-e’ ‘-q’ ‘-c’ ‘3’ ‘-p’ ‘500’ ‘-t’ ‘500’ ‘-O’ ‘0’ ‘192.168.44.46’

response: xmt/rcv/%loss = 3/0/100%

/opt/librenms/discovery.php 192.168.44.46 2024-04-26 05:48:50 - 0 devices discovered in 2.052 secs

SNMP [0/0.00s]:

SQL [7/0.06s]: Select[7/0.06s]

RRD [0/0.00s]:

Override poller modules: unix-agent, core, os, availability, ipmi, sensors, processors, mempools, storage, netstats, hr-mib, ucd-mib, ipSystemStats, ports, xdsl, customoid, bgp-peers, junose-atm-vp, printer-supplies, ucd-diskio, vminfo, wifi, wireless, ospf, isis, cisco-ipsec-flow-monitor, cisco-remote-access-monitor, cisco-cef, slas, cisco-mac-accounting, cipsec-tunnels, cisco-ace-loadbalancer, cisco-ace-serverfarms, cisco-asa-firewall, cisco-voice, cisco-cbqos, cisco-otv, cisco-qfp, cisco-vpdn, nac, netscaler-vsvr, aruba-controller, entity-physical, entity-state, applications, stp, ntp, loadbalancers, mef, mpls

SQL[select migration from migrations order by id desc limit 1 0.83ms]

SQL[select count(*) as aggregate from migrations 0.62ms]

SQL[select version() 0.51ms]

===========================================

Component Version
LibreNMS 24.2.0-30-g8c4205c61 (2024-03-19T09:24:07+00:00)
DB Schema 2024_02_07_151845_custom_map_additions (290)
PHP 8.1.2-1ubuntu2.14
Python 3.10.12
Database MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
RRDTool 1.7.2
SNMP 5.9.1
===========================================

Updating os_def.cache

Starting polling run:

SQL[select device_id from devices where hostname = ? [“192.168.44.46”] 0.73ms]

SQL[select * from devices where device_id = ? limit 1 [36] 0.68ms]

Hostname: 192.168.44.46 (cisco)
ID: 36
OS: iosxe
IP: 192.168.44.46

Attempting to initialize OS: iosxe

OS initialized: LibreNMS\OS\Iosxe

SQL[select * from devices_attribs where devices_attribs.device_id = ? and devices_attribs.device_id is not null [36] 0.61ms]

[FPING] ‘/usr/bin/fping’ ‘-e’ ‘-q’ ‘-c’ ‘3’ ‘-p’ ‘500’ ‘-t’ ‘500’ ‘-O’ ‘0’ ‘192.168.44.46’

response: xmt/rcv/%loss = 3/0/100%

SQL[insert into device_perf (min, max, avg, xmt, rcv, loss, debug, device_id, timestamp) values (?, ?, ?, ?, ?, ?, ?, ?, ?) [0,0,0,3,0,100,“{"poller_name":"nms"}”,36,“2024-04-26 05:54:00”] 1.14ms]

SQL[update devices set last_ping = ? where device_id = ? [“2024-04-26 05:54:00”,36] 0.77ms]

SQL[select * from device_outages where device_outages.device_id = ? and device_outages.device_id is not null and up_again is null order by going_down desc limit 1 [36] 0.59ms]

Load poller module availability

Module enabled: Global + | OS | Device | Manual

SQL[select * from device_outages where device_outages.device_id = ? and device_outages.device_id is not null and up_again >= ? order by going_down asc [36,1714024440] 0.65ms]

SQL[select * from availability where (device_id = ? and duration = ?) limit 1 [36,86400] 0.46ms]

SQL[update availability set availability_perc = ? where availability_id = ? [0,141] 0.4ms]

1 day : 0%

SQL[select * from device_outages where device_outages.device_id = ? and device_outages.device_id is not null and up_again >= ? order by going_down asc [36,1713506040] 0.65ms]

SQL[select * from availability where (device_id = ? and duration = ?) limit 1 [36,604800] 0.57ms]

SQL[update availability set availability_perc = ? where availability_id = ? [0,142] 0.41ms]

1 week : 0%

SQL[select * from device_outages where device_outages.device_id = ? and device_outages.device_id is not null and up_again >= ? order by going_down asc [36,1711518840] 0.71ms]

SQL[select * from availability where (device_id = ? and duration = ?) limit 1 [36,2592000] 0.42ms]

SQL[update availability set availability_perc = ? where availability_id = ? [0,143] 0.3ms]

1 month : 0%

SQL[select * from device_outages where device_outages.device_id = ? and device_outages.device_id is not null and up_again >= ? order by going_down asc [36,1682574840] 0.56ms]

SQL[select * from availability where (device_id = ? and duration = ?) limit 1 [36,31536000] 0.51ms]

SQL[update availability set availability_perc = ? where availability_id = ? [0,144] 0.3ms]

1 year : 0%

SQL[delete from availability where availability.device_id = ? and availability.device_id is not null and availability_id not in (?, ?, ?, ?) [36,141,142,143,144] 0.36ms]

SNMP: [0/0.00s] MySQL: [17/0.09s]

Runtime for poller module ‘availability’: 0.0181 seconds with 188392 bytes

Unload poller module availability

SQL[select * from device_graphs where device_graphs.device_id = ? and device_graphs.device_id is not null [36] 0.55ms]

Enabled graphs (18): uptime poller_modules_perf availability netstat_icmp netstat_icmp_info netstat_ip netstat_ip_frag netstat_snmp netstat_snmp_pkt netstat_udp netstat_tcp ipsystemstats_ipv4 ipsystemstats_ipv4_frag ipsystemstats_ipv6 ipsystemstats_ipv6_frag cisco-voice-ip poller_perf ping_perf

Start Alerts

SQL[select device_groups.*, device_group_device.device_id as pivot_device_id, device_group_device.device_group_id as pivot_device_group_id from device_groups inner join device_group_device on device_groups.id = device_group_device.device_group_id where device_group_device.device_id = ? [36] 0.5ms]

SQL[select exists(select * from alert_schedule where (start <= ? and end >= ? and (recurring = ? or (recurring = ? and ((time(start) < time(end) and time(start) <= ? and time(end) > ?) or (time(start) > time(end) and (time(start) <= ? or time(end) > ?))) and (recurring_day like ? or recurring_day is null)))) and (exists (select * from devices inner join alert_schedulables on devices.device_id = alert_schedulables.alert_schedulable_id where alert_schedule.schedule_id = alert_schedulables.schedule_id and alert_schedulables.alert_schedulable_type = ? and alert_schedulables.alert_schedulable_id = ?))) as exists [“2024-04-26T05:54:00.667220Z”,“2024-04-26T05:54:00.667220Z”,0,1,“05:54:00”,“05:54:00”,“05:54:00”,“05:54:00”,“%”,“device”,36] 0.92ms]

SQL[select * from devices where devices.device_id = ? limit 1 [36] 0.55ms]

SQL[SELECT DISTINCT a.* FROM alert_rules a
LEFT JOIN alert_device_map d ON a.id=d.rule_id AND (a.invert_map = 0 OR a.invert_map = 1 AND d.device_id = ?)
LEFT JOIN alert_group_map g ON a.id=g.rule_id AND (a.invert_map = 0 OR a.invert_map = 1 AND g.group_id IN (SELECT DISTINCT device_group_id FROM device_group_device WHERE device_id = ?))
LEFT JOIN alert_location_map l ON a.id=l.rule_id AND (a.invert_map = 0 OR a.invert_map = 1 AND l.location_id IN (SELECT DISTINCT location_id FROM devices WHERE device_id = ?))
LEFT JOIN devices ld ON l.location_id=ld.location_id AND ld.device_id = ?
LEFT JOIN device_group_device dg ON g.group_id=dg.device_group_id AND dg.device_id = ?
WHERE a.disabled = 0 AND (
(d.device_id IS NULL AND g.group_id IS NULL AND l.location_id IS NULL)
OR (a.invert_map = 0 AND (d.device_id=? OR dg.device_id=? OR ld.device_id=?))
OR (a.invert_map = 1 AND (d.device_id != ? OR d.device_id IS NULL) AND (dg.device_id != ? OR dg.device_id IS NULL) AND (ld.device_id != ? OR ld.device_id IS NULL))
) [36,36,36,36,36,36,36,36,36,36,36] 2.73ms]

Rule #1 (Device Down! Due to no ICMP response.):

SQL[SELECT * FROM devices WHERE (devices.device_id = ?) AND (devices.status = 0 && (devices.disabled = 0 && devices.ignore = 0)) = 1 AND devices.status_reason = “icmp” [36] 0.62ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [1,36] 0.33ms]

Status: NOCHG

SQL[SELECT alert_log.id, alert_log.details FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = ? && alert_log.rule_id = ? && alert_rules.disabled = 0
ORDER BY alert_log.id DESC LIMIT 1 [36,1] 0.62ms]

SQL[UPDATE alert_log set details=? WHERE id = ? 0.86ms]

Rule #2 (Device Down (SNMP unreachable)):

SQL[SELECT * FROM devices WHERE (devices.device_id = ?) AND (devices.status = 0 && (devices.disabled = 0 && devices.ignore = 0)) = 1 AND devices.status_reason = “snmp” [36] 0.79ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [2,36] 0.39ms]

Status: NOCHG
Rule #3 (Device rebooted):

SQL[SELECT * FROM devices WHERE (devices.device_id = ?) AND devices.uptime < 300 AND (devices.disabled = 0 && devices.ignore = 0) = 1 [36] 0.48ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [3,36] 0.38ms]

Status: NOCHG
Rule #4 (Port status up/down):

SQL[SELECT * FROM devices,ports WHERE (devices.device_id = ? AND devices.device_id = ports.device_id) AND (ports.ifOperStatus = “down” && ports.ifAdminStatus != “down” && (ports.deleted = 0 && ports.ignore = 0 && ports.disabled = 0)) = 1 [36] 2.45ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [4,36] 0.41ms]

Status: NOCHG

SQL[SELECT alert_log.id, alert_log.details FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = ? && alert_log.rule_id = ? && alert_rules.disabled = 0
ORDER BY alert_log.id DESC LIMIT 1 [36,4] 0.61ms]

SQL[UPDATE alert_log set details=? WHERE id = ? 1.06ms]

Rule #5 (Ping Latency):

SQL[SELECT * FROM devices WHERE (devices.device_id = ?) AND devices.last_ping_timetaken > 10 [36] 0.84ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [5,36] 0.49ms]

Status: NOCHG

Rule #6 (Port utilisation over threshold):

SQL[SELECT * FROM devices,ports WHERE (devices.device_id = ? AND devices.device_id = ports.device_id) AND (((SELECT IF(ports.ifOutOctets_rate>ports.ifInOctets_rate, ports.ifOutOctets_rate, ports.ifInOctets_rate)*8) / ports.ifSpeed)*100) >= 80 AND (ports.ifOperStatus = “up” && ports.ifAdminStatus = “up” && (ports.deleted = 0 && ports.ignore = 0 && ports.disabled = 0)) = 1 [36] 1.89ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [6,36] 0.52ms]

Status: NOCHG

Rule #7 (Sensor over limit - Check Device Health Settings):

SQL[SELECT * FROM devices,sensors WHERE (devices.device_id = ? AND devices.device_id = sensors.device_id) AND sensors.sensor_current > sensors.sensor_limit AND sensors.sensor_alert = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1 [36] 1ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [7,36] 0.51ms]

Status: NOCHG

Rule #8 (Sensor under limit - Check Device Health Settings):

SQL[SELECT * FROM devices,sensors WHERE (devices.device_id = ? AND devices.device_id = sensors.device_id) AND sensors.sensor_current < sensors.sensor_limit_low AND sensors.sensor_alert = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1 [36] 1.01ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [8,36] 0.51ms]

Status: NOCHG

Rule #9 (Service up/down):

SQL[SELECT * FROM devices,services WHERE (devices.device_id = ? AND devices.device_id = services.device_id) AND services.service_status != 0 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1 [36] 0.91ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [9,36] 0.51ms]

Status: NOCHG

Rule #10 (Wireless Sensor over limit):

SQL[SELECT * FROM devices,wireless_sensors WHERE (devices.device_id = ? AND devices.device_id = wireless_sensors.device_id) AND wireless_sensors.sensor_current >= wireless_sensors.sensor_limit AND wireless_sensors.sensor_alert = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1 [36] 0.99ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [10,36] 0.51ms]

Status: NOCHG

Rule #11 (Wireless Sensor under limit):

SQL[SELECT * FROM devices,wireless_sensors WHERE (devices.device_id = ? AND devices.device_id = wireless_sensors.device_id) AND wireless_sensors.sensor_current <= wireless_sensors.sensor_limit_low AND wireless_sensors.sensor_alert = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1 [36] 0.98ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [11,36] 0.5ms]

Status: NOCHG

Rule #12 (State Sensor Critical):

SQL[SELECT * FROM devices,sensors,sensors_to_state_indexes,state_indexes,state_translations WHERE (devices.device_id = ? AND devices.device_id = sensors.device_id AND sensors.sensor_id = sensors_to_state_indexes.sensor_id AND sensors_to_state_indexes.state_index_id = state_indexes.state_index_id AND state_indexes.state_index_id = state_translations.state_index_id) AND (sensors.sensor_current = state_translations.state_value && state_translations.state_generic_value = 2) = 1 AND sensors.sensor_alert = 1 [36] 1.69ms]

SQL[SELECT state FROM alerts WHERE rule_id = ? AND device_id = ? ORDER BY id DESC LIMIT 1 [12,36] 0.52ms]

Status: NOCHG

End Alerts (0.0621s)

Start Device Groups

SQL[select * from device_groups 0.48ms]

SQL[select * from device_group_device where device_group_device.device_id = ? [36] 0.53ms]

Groups Added: Removed:

End Device Groups (0.0028s)

Device was down, unable to poll.
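
For anyone reading the availability section of the poller output above: the second parameter in each device_outages query (1714024440, 1713506040, 1711518840, 1682574840) appears to be the poll time minus the window length that shows up in the matching availability query (86400, 604800, 2592000, 31536000 seconds). The following is only a sketch of that arithmetic, assuming the poll timestamp 2024-04-26 05:54:00 is UTC (the alert query logs it with a trailing Z); the names DURATIONS and window_starts are made up for illustration and are not LibreNMS code.

```python
from datetime import datetime, timezone

# Poll time taken from the debug output; assumed to be UTC.
poll_time = datetime(2024, 4, 26, 5, 54, 0, tzinfo=timezone.utc)
now_epoch = int(poll_time.timestamp())

# Window lengths in seconds, as they appear in the availability queries.
DURATIONS = {
    "1 day": 86400,
    "1 week": 604800,
    "1 month": 2592000,
    "1 year": 31536000,
}


def window_starts(now):
    """Return the epoch at which each availability window begins."""
    return {label: now - seconds for label, seconds in DURATIONS.items()}


starts = window_starts(now_epoch)
for label, start in starts.items():
    print(label, start)
```

The computed values match the parameters in the log, which is consistent with the device having been down for the whole of every window (0% in each case).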

Ping and reachability are lost in both directions, from switch to NMS and from NMS to switch, but both devices are still accessible from other networks.

This is happening to multiple switches, regardless of IOS version. A few switches are still working without any issue.

If I remove a switch from the NMS, it starts working again after some time.
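
In case it helps with narrowing this down, the [FPING] lines in the debug output show the exact command the poller runs, and the "xmt/rcv/%loss = 3/0/100%" summary is what marks the device as down. Below is a rough sketch for rerunning that same check by hand from the NMS host and reading the summary. The function names parse_fping_summary and ping_from_nms are hypothetical helpers, not part of LibreNMS; the fping path and arguments are copied from the log.

```python
import re
import subprocess

# Matches the summary fping prints with -q -c, e.g. "xmt/rcv/%loss = 3/0/100%".
SUMMARY_RE = re.compile(r"xmt/rcv/%loss = (\d+)/(\d+)/(\d+)%")


def parse_fping_summary(text):
    """Return (sent, received, loss_percent), or None if no summary is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def ping_from_nms(host, fping="/usr/bin/fping"):
    """Run fping with the arguments shown in the debug output and parse it."""
    cmd = [fping, "-e", "-q", "-c", "3", "-p", "500", "-t", "500", "-O", "0", host]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # In quiet mode the per-host summary goes to stderr; check both to be safe.
    return parse_fping_summary(result.stderr + result.stdout)


if __name__ == "__main__":
    # Summary line copied from the poller output above.
    print(parse_fping_summary("xmt/rcv/%loss = 3/0/100%"))  # (3, 0, 100)
```

Calling ping_from_nms("192.168.44.46") on the NMS host while the switch shows as down, and again after removing it from the NMS, would show whether the loss really tracks the device being monitored.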