Alert only when syslog message received multiple times

In the constant effort to keep your alert-to-noise ratio workable, here’s a way to alert on syslog entries but only care when more than some number of messages arrive in the time window. As a comparison, in the Graylog world this can be done with grace periods.

Use case examples:

  • login failures - one error may be OK, but several in a few minutes may need more investigation
  • MAC flaps - a couple of entries from transient wireless roaming can be ignored, but more than 2 may indicate a bigger issue

This topic has come up a few times and it’s always eluded me, so I’ve tried again and here it is. There is probably a more efficient way that avoids running the query twice, but I can’t find a workable shortcut as yet, so this will have to do as a starting point - feel free to improve it!

Standard Query Explanation

Assuming you have an alert rule built via the UI such as:

The Advanced tab will show the SQL query as:

SELECT * FROM devices,syslog WHERE (devices.device_id = ? AND devices.device_id = syslog.device_id) AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 5 MINUTE))

Reformatting this for clarity:

SELECT * 
FROM 
  devices,syslog 
/*-- ALERT CONDITIONS---*/
WHERE
  (devices.device_id = ? AND devices.device_id = syslog.device_id) 
    AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" 
    AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 5 MINUTE))
/*----------------------*/

This query will return multiple rows if they exist. If we replace the ? with a valid device ID and put a ; on the end, we can test this query on the database CLI.
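
For example, with a hypothetical device ID of 42 (substitute a valid ID from your own install), the test query would be:

SELECT * 
FROM 
  devices,syslog 
WHERE
  (devices.device_id = 42 AND devices.device_id = syslog.device_id) 
    AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" 
    AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 5 MINUTE));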

The ALERT CONDITIONS comment block marks the relevant parts we need to duplicate in the final query.

Counting results as a condition

If we only want to report when more than one row is returned, we can use the alert conditions in a subselect, check the returned count, and use that result as a condition on the main query to suppress output.

The general format is:

SELECT * 
FROM devices,syslog 

<verbatim alert conditions>

    AND (
      SELECT count(*)
      FROM devices,syslog 

      <verbatim alert conditions>

      GROUP BY devices.device_id 
      HAVING count(*) > 1
    ) IS NOT NULL

The bottom subquery only returns a row when its HAVING count(*) > 1 condition is satisfied. If one row or fewer match, the subquery returns nothing, which evaluates to NULL, so the IS NOT NULL check fails, the main query WHERE clause is not satisfied, and nothing is output.
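
To see this NULL mechanic in isolation, here is a minimal sketch you can run on the CLI - it simply returns 1 when the syslog table holds more than one row in total, and 0 otherwise (the should_alert alias is just for illustration):

SELECT (
  SELECT count(*)
  FROM syslog
  HAVING count(*) > 1
) IS NOT NULL AS should_alert;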

Using the existing example, the full query would be:

SELECT * 
FROM devices,syslog 
/*-- ALERT CONDITIONS---*/
WHERE 
  (devices.device_id = ? AND devices.device_id = syslog.device_id)
    AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" 
    AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 5 MINUTE))
/*----------------------*/
    AND (
      SELECT count(*)
      FROM devices,syslog 
/*-- ALERT CONDITIONS---*/
      WHERE 
        (devices.device_id = ? AND devices.device_id = syslog.device_id) 
          AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF"  
          AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 5 MINUTE)) 
/*----------------------*/
      GROUP BY devices.device_id 
      HAVING count(*) > 1
    ) IS NOT NULL

As before, replacing the ? with a valid device ID and putting a ; on the end lets us test this query on the database CLI.

To test the suppression fully, increase the INTERVAL to capture more results, then change the HAVING count(*) > x threshold so that x equals the number of rows returned, and confirm the query outputs nothing.
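
For example, assuming the wider interval returns 3 rows for a hypothetical device ID of 42, the following should output nothing, since the count is never greater than the threshold of 3:

SELECT * 
FROM devices,syslog 
WHERE 
  (devices.device_id = 42 AND devices.device_id = syslog.device_id)
    AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" 
    AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 60 MINUTE))
    AND (
      SELECT count(*)
      FROM devices,syslog 
      WHERE 
        (devices.device_id = 42 AND devices.device_id = syslog.device_id) 
          AND syslog.program = "%SW_MATM-4-MACFLAP_NOTIF" 
          AND syslog.timestamp >= (DATE_SUB(NOW(),INTERVAL 60 MINUTE)) 
      GROUP BY devices.device_id 
      HAVING count(*) > 3
    ) IS NOT NULL;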

Alert template ideas

If the results going to your transports are too verbose, another way to improve readability may be to filter the output in the alert template:

@if ($alert->faults)
Fault count: {{ count($alert->faults) }}
@foreach ($alert->faults as $key => $value)
  @if ($loop->first)
  {{ $key }}: {{ $value['msg'] }}
  @endif
  @if ($loop->last)
  {{ $key }}: {{ $value['msg'] }}
  @endif
@endforeach
@endif

This will print the first and last log entries from the set of results, with the count at the top to give some context - in Slack, it looks like this:

Love it. This is what the community is all about. Thanks @rhinoau :slight_smile: