False "got worse" notification

Hello,

I have a follow-up on this issue:

We have changed our workflow for port notifications, which I thought would minimize these false “got worse” notifications, but they are still occurring.

We use a rule with a 15-minute delay for notifications:
ports.ifOperStatus = "down" AND ports.ifOperStatus_prev = "up" AND macros.device_up = 1 AND ports.ifAdminStatus != "down"

  • it is used on switches where customers are connected; we inform them about a port outage only if it lasts for a longer period, so that we avoid handling notifications when a customer merely restarts their connected device
  • we use the “Reset Port State” feature in the switch settings for ports that are disconnected for a longer time, but the NOC can't use it during the weekend

We have this in the alert template:
@if ($alert->faults) Faults:

@foreach ($alert->faults as $key => $value)
#{{ $key }}:
Port: {{ $value['ifName'] }}
Port Name: {{ $value['ifAlias'] }}
Port Status: {{ $value['ifOperStatus'] }}
@endforeach
@endif

The delay works correctly when the first port on a switch goes DOWN and back UP within 15 minutes:
2024-04-23 21:55:43 Gi1/0/12 switch2 ifOperStatus: up → down System
2024-04-23 21:55:43 Gi1/0/12 switch2 ifDuplex: fullDuplex → unknown System
2024-04-23 22:00:52 Gi1/0/12 switch2 ifOperStatus: down → up System
2024-04-23 22:00:52 Gi1/0/12 switch2 ifDuplex: unknown → fullDuplex System

  • no notification was sent

Here is an example where one port has been DOWN for a longer time and a second port goes DOWN and then UP:

2024-04-27 12:25:55 Gi2/0/48 switch1 ifDuplex: fullDuplex → unknown System
2024-04-27 12:30:55 Gi2/0/48 switch1 ifOperStatus: up → down System
2024-04-27 12:30:55 Gi2/0/48 switch1 ifSpeed: 1 Gbps → 10 Mbps System
2024-04-27 12:46:02 alert switch1 Issued warning alert for rule ‘060 Port DOWN’ to transport ‘mail’ System
2024-04-27 12:46:02 alert switch1 Issued warning alert for rule ‘060 Port DOWN’ to transport ‘playsms’ System
- correct notification with port Gi2/0/48 in text of mail

2024-04-27 13:41:00 Gi2/0/15 switch1 ifOperStatus: up → down System
2024-04-27 13:41:00 Gi2/0/15 switch1 ifDuplex: fullDuplex → unknown System
2024-04-27 13:45:30 Gi2/0/15 switch1 ifOperStatus: down → up System
2024-04-27 13:45:30 Gi2/0/15 switch1 ifDuplex: unknown → fullDuplex System
2024-04-27 13:56:02 alert switch1 Issued got worse for rule ‘060 Port DOWN’ to transport ‘mail’ System
2024-04-27 13:56:02 alert switch1 Issued got worse for rule ‘060 Port DOWN’ to transport ‘playsms’ System
- false “got worse” notification; the mail text listed only port Gi2/0/48

2024-04-28 00:15:44 Gi2/0/32 switch1 ifOperStatus: up → down System
2024-04-28 00:15:44 Gi2/0/32 switch1 ifDuplex: fullDuplex → unknown System
2024-04-28 00:20:47 Gi2/0/32 switch1 ifOperStatus: down → up System
2024-04-28 00:20:47 Gi2/0/32 switch1 ifDuplex: unknown → fullDuplex System
2024-04-28 00:31:02 alert switch1 Issued got worse for rule ‘060 Port DOWN’ to transport ‘mail’ System
2024-04-28 00:31:02 alert switch1 Issued got worse for rule ‘060 Port DOWN’ to transport ‘playsms’ System
- false “got worse” notification; the mail text listed only port Gi2/0/48

2024-04-29 09:48:49 switch1 Port state history reset by admin admin
2024-04-29 09:51:02 alert switch1 Issued recovery for rule ‘060 Port DOWN’ to transport ‘mail’ System
2024-04-29 09:51:02 alert switch1 Issued recovery for rule ‘060 Port DOWN’ to transport ‘playsms’ System
- cleared by me

It looks like a check is missing for whether the worse condition is still present before the “got worse” notification is sent. These false notifications are really confusing for our NOC colleagues.
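The timestamps in the event log above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the values are copied from the log entries for Gi2/0/15 and Gi2/0/32, the helper name `analyse` is made up for illustration, and the logged times are presumably poll times, so the real outages may have been somewhat shorter or longer.

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=15)  # notification delay configured on the rule
FMT = "%Y-%m-%d %H:%M:%S"

# (port, logged down, logged up, "got worse" issued), copied from the event log
flaps = [
    ("Gi2/0/15", "2024-04-27 13:41:00", "2024-04-27 13:45:30", "2024-04-27 13:56:02"),
    ("Gi2/0/32", "2024-04-28 00:15:44", "2024-04-28 00:20:47", "2024-04-28 00:31:02"),
]


def analyse(port, down, up, notified):
    """Compare one flap against the delay (hypothetical helper)."""
    down_t, up_t, sent_t = (datetime.strptime(s, FMT) for s in (down, up, notified))
    return {
        "port": port,
        "outage": up_t - down_t,                     # how long the port was down
        "down_to_notification": sent_t - down_t,     # should be roughly DELAY
        "recovered_before_send": up_t < sent_t,      # port was already up again
        "outage_shorter_than_delay": (up_t - down_t) < DELAY,
    }


results = [analyse(*flap) for flap in flaps]
for r in results:
    print(r)
```

In both cases the port was back up roughly ten minutes before the notification went out, and the notification followed the DOWN event by just over 15 minutes, which is consistent with the delay timer starting at the DOWN event and not being cancelled on recovery.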

Can you look at this issue?

Thanks

Roman

As far as I can see, the “got worse” condition is not granular; it only counts the results.
There are two ways around this: update the code to make sure the results are the same ones (not sure how feasible that is), or update your alert rules to work around the issue.
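To make the difference concrete, here is a toy model in Python of a count-only comparison next to a comparison that looks at which ports are actually faulty. It is not the LibreNMS code; the function names and the list-of-port-names representation are assumptions made purely for illustration.

```python
def got_worse_by_count(previous_faults, current_faults):
    """Count-only check: 'worse' whenever there are more faults than before."""
    return len(current_faults) > len(previous_faults)


def got_worse_by_identity(notified_faults, current_faults):
    """Granular check: 'worse' only if some fault was not in the last notification."""
    return bool(set(current_faults) - set(notified_faults))


# Port names taken from the log in the original post.
notified = ["Gi2/0/48"]                    # already reported in the first alert
during_flap = ["Gi2/0/48", "Gi2/0/15"]     # second port briefly down
at_send_time = ["Gi2/0/48"]                # second port is up again

# While the second port is down, both checks report "worse".
flagged_during_flap = got_worse_by_count(notified, during_flap)

# Re-evaluating when the delay expires finds nothing new to report.
still_worse_at_send = got_worse_by_identity(notified, at_send_time)

# A count-only check also misses a swap: one port recovers, another fails.
swapped = ["Gi2/0/32"]
missed_by_count = not got_worse_by_count(notified, swapped)
caught_by_identity = got_worse_by_identity(notified, swapped)

print(flagged_during_flap, still_worse_at_send, missed_by_count, caught_by_identity)
```

Under these assumptions, re-running the identity-based check at send time would suppress the false “got worse”, while a count-only check cannot tell which port caused the change.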