Alert rule for HA devices: need help creating the rule

I am running the latest version of LibreNMS and have no issues with it, but I am looking for guidance on how to create this alert rule.

Short version for those who don’t want to read the full explanation: is it possible to monitor a port’s link speed and get an alert when the port no longer reports 2.5 Gbps?

I am monitoring many devices over VPN (this is important, more on that below) and I’m having trouble monitoring two devices in HA mode. When the devices are in HA mode, the secondary/backup device does NOT communicate over the VPN. The two devices talk to each other over a physically connected interface; let’s call it interface 10 for now. Their tech support recommends a direct connection using a crossover cable. This has been in place for years and it works well. From the primary (active) device you can confirm that the secondary (backup) device is present.

My issue is that the backup device sometimes becomes unresponsive after a power cycle. This can happen if the site loses power and the UPS batteries drain, or when firmware is pushed to the devices. Sometimes one of the devices gets ‘stuck’ and requires manual intervention to be brought back online. This is a firmware issue that the vendor is actively working on.

In the meantime, I’m trying to get LibreNMS to alert me when one of the HA devices hangs or loses connectivity. The devices are connected with an Ethernet crossover cable, and when everything is working the link speed on this particular model is 2.5 Gbps. I can see it in LibreNMS under the ports section for the device. If the device is unplugged from the other device, the 2.5 Gbps speed goes away and is no longer listed on the ports page in the LibreNMS web GUI.

I can’t figure out how to create a rule that monitors interface 10 and alerts me when it no longer shows an active 2.5 Gbps link. The units have their own IP addresses but, as stated above, the backup/secondary unit is not reachable over the VPN when it is not active. The two units exchange data over interface 10 to determine unit status, which unit has connectivity on its own interfaces, and so on.

Matching on the interface alias and/or port ID doesn’t work, and I’m not sure why. The interface alias macro I use on network switches works fine, but it doesn’t seem to work for these HA devices (SonicWalls).

When the HA device fails, none of its ports are active except a local management port. That means an alert rule monitoring the up/down status of interface 10 should work; I’m just not sure how to create that alert.
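
To make the intent concrete, here is a rough Python sketch of the condition I’m after. It is not LibreNMS rule syntax. The field names `ifIndex`, `ifAdminStatus` and `ifOperStatus` follow the standard IF-MIB objects, while the `port` dict layout, the `ha_link_down` name and the index value 10 are placeholders I made up for illustration.

```python
# Hypothetical sketch of the alert condition, not actual LibreNMS rule syntax.
# Field names follow the standard IF-MIB objects; the dict layout is assumed.

HA_IF_INDEX = 10  # placeholder for "interface 10" from the description


def ha_link_down(port: dict) -> bool:
    """Return True when the HA interface is enabled but not operationally up."""
    if port["ifIndex"] != HA_IF_INDEX:
        return False  # not the HA link, ignore it
    if port["ifAdminStatus"] != "up":
        return False  # administratively disabled, so being down is expected
    return port["ifOperStatus"] != "up"


healthy = {"ifIndex": 10, "ifAdminStatus": "up", "ifOperStatus": "up"}
failed = {"ifIndex": 10, "ifAdminStatus": "up", "ifOperStatus": "down"}
print(ha_link_down(healthy), ha_link_down(failed))
```

The admin-status check is there so that a port someone disabled on purpose would not raise an alert.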

I do have an alert configured as ‘macros.port = 1 AND ports.ifSpeed = “2.5 Gpbs”’, but either that is not the correct way to write it or something else is not working: the port is down, I haven’t received an alert, and the alert rule is not shown in red. Since the rule itself never triggers, I don’t think this is a transport misconfiguration.
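
One thing I have not ruled out is units. The SNMP `ifSpeed` object is an integer in bits per second, so if the stored value follows that convention (an assumption on my part, I have not checked the database), 2.5 Gbps would be 2500000000 rather than a display string. A small Python sketch of that idea, with `speed_mismatch` and `EXPECTED_BPS` as made-up names:

```python
# Sketch only: assumes the stored speed is an integer in bits per second,
# like the SNMP ifSpeed object. This is not LibreNMS code.

EXPECTED_BPS = 2_500_000_000  # 2.5 Gbps expressed in bits per second


def speed_mismatch(if_speed_bps: int, expected_bps: int = EXPECTED_BPS) -> bool:
    """Return True when the reported link speed differs from the expected one."""
    return if_speed_bps != expected_bps


# A display string is never equal to the numeric value, so comparing the
# speed against text would not match even while the link is healthy.
print(2_500_000_000 == "2.5 Gbps")
print(speed_mismatch(2_500_000_000), speed_mismatch(0))
```

Note that this sketch fires when the speed is NOT the expected value, whereas my rule as written tests for equality, which may be the opposite of what I want.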

Thanks.
