My alert templates

Customising the alert templates to get them looking just how I wanted took me a fair bit of time, so I thought I’d share my templates and some of the alerts I use with them in case they are of use to others. :slight_smile:

I see there is already a thread like this from 2017; however, it predates the change to the alert template syntax in 2018 (?), so I thought it was better to keep this separate.

My templates started life as some of the examples in the documentation but have diverged quite a bit as I’ve learnt more about the template syntax. I’ve managed to get my alerts down to using just three templates: Default (used by no ICMP response, no SNMP response and Device Rebooted), Low Disk Space Alert and Service Warning Alert.

They are all written and tested with HTML email alerts disabled, and are well formatted for both email and Pushover notifications. So here they are:

Default Alert Template:

Operating System: {{ LibreNMS\Config::getOsSetting($alert->os, 'text') }} {{ $alert->version }}
@if ($alert->hardware)
Hardware: {{ $alert->hardware }}
@endif
@if ($alert->location)
Location: {{ $alert->location }}
@endif
Device URL: http://librenms.yourdomain.local/device/device={{ $alert->hostname }}/


{{ $alert->title }}
Severity: {{ $alert->severity }}
@if ($alert->state == 0)
Time elapsed: {{ $alert->elapsed }}
@endif
Timestamp: {{ $alert->timestamp }}
Unique-ID: {{ $alert->uid }}

The Device URL line above gives you a link in an email or push notification alert (I use Pushover) that takes you directly to the device page, without resorting to formatting all alerts as HTML. Change the hostname in the URL to match your own server, or delete the entire line if you don’t want it. The other templates also have this line.

Low Disk Space Alert Template:

Device Name: {{ $alert->sysName }}
Operating System: {{ LibreNMS\Config::getOsSetting($alert->os, 'text') }} {{ $alert->version }}
@if ($alert->hardware)
Hardware: {{ $alert->hardware }}
@endif
@if ($alert->location)
Location: {{ $alert->location }}
@endif
Device URL: http://librenms.yourdomain.local/device/device={{ $alert->hostname }}/


{{ $alert->title }}
Severity: {{ $alert->severity }}
@if ($alert->state == 0)
Time elapsed: {{ $alert->elapsed }}
@endif
Timestamp: {{ $alert->timestamp }}
Unique-ID: {{ $alert->uid }}

@foreach ($alert->faults as $key => $value)
Drive: {{ $value['storage_descr'] }}

Disk Utilization: {{ $value['storage_perc'] }}%
Disk Size: {{ number_format($value['storage_size']/1073741824,2) }} GB
Disk Free: {{ number_format($value['storage_free']/1073741824,2) }} GB

@endforeach

Disk space is stored internally in bytes, which is not very useful, so I divide by 1073741824 (1024 x 1024 x 1024) to convert to gigabytes (proper base 1024 ones :stuck_out_tongue: ) and use number_format() to round to two decimal places. You can adjust the number of decimal places by changing the number after the comma.
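
To make the conversion easier to check outside LibreNMS, here is a minimal Python sketch of the same arithmetic. It is not part of the template (the template uses PHP's number_format()); the function name bytes_to_gb and the sample byte counts are just made up for illustration.

```python
BYTES_PER_GB = 1073741824  # 1024 ** 3, the divisor used in the template


def bytes_to_gb(size_bytes: int, decimals: int = 2) -> str:
    """Format a byte count as base-1024 gigabytes with a fixed number of decimals."""
    # The ',' adds thousands separators, which PHP's number_format() also does by default.
    return f"{size_bytes / BYTES_PER_GB:,.{decimals}f}"


print(bytes_to_gb(53687091200))       # a 50 GB disk
print(bytes_to_gb(1610612736))        # 1.5 GB free
print(bytes_to_gb(2199023255552, 0))  # a 2 TB disk, no decimal places
```

Note that very large disks come out with a thousands separator (for example 2,048.00), which is also what number_format() does unless you pass it different separators.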

Service Warning Alert Template:

Device Name: {{ $alert->sysName }}
Operating System: {{ LibreNMS\Config::getOsSetting($alert->os, 'text') }} {{ $alert->version }}
@if ($alert->hardware)
Hardware: {{ $alert->hardware }}
@endif
@if ($alert->location)
Location: {{ $alert->location }}
@endif
Device URL: http://librenms.yourdomain.local/device/device={{ $alert->hostname }}/


{{ $alert->title }}
Severity: {{ $alert->severity }}
Rule: @if ($alert->name)
{{ $alert->name }}
@else
{{ $alert->rule }}
@endif
@if ($alert->state == 0)
Time elapsed: {{ $alert->elapsed }}
@endif
Timestamp: {{ $alert->timestamp }}
Unique-ID: {{ $alert->uid }}

@if ($alert->faults)
Faults:

@foreach ($alert->faults as $key => $value)
#{{ $key }}: Service: {{ $value['service_ip'] }}
Description: {{ $value['service_desc'] }}
Message: {{ $value['service_message'] }}

@endforeach 
@endif

Not a lot to say about this one - it enumerates all the individual service faults in a nice readable format.
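
If you want to see what the faults section ends up looking like without triggering a real alert, the loop can be sketched in Python roughly like this. The field names (service_ip, service_desc, service_message) are the ones the template uses; the function name and the sample faults are invented for the example.

```python
def render_service_faults(faults: dict) -> str:
    """Build the 'Faults:' section the way the @foreach loop in the template does."""
    if not faults:  # mirrors @if ($alert->faults)
        return ""
    lines = ["Faults:", ""]
    for key, fault in faults.items():
        lines.append(f"#{key}: Service: {fault['service_ip']}")
        lines.append(f"Description: {fault['service_desc']}")
        lines.append(f"Message: {fault['service_message']}")
        lines.append("")  # blank line between faults
    return "\n".join(lines)


# Made-up sample data, only to show the layout.
sample_faults = {
    1: {"service_ip": "192.0.2.10", "service_desc": "Web server", "service_message": "Connection refused"},
    2: {"service_ip": "192.0.2.11", "service_desc": "Mail server", "service_message": "Timeout after 10 seconds"},
}
print(render_service_faults(sample_faults))
```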

All of these templates use the same boilerplate Alert Title and Recovery Title:

Alert Title:

Alert for device {{ $alert->sysName }} - @if ($alert->name) {{ $alert->name }} @else {{ $alert->rule }} @endif

Recovery Title:

Device {{ $alert->sysName }} recovered from @if ($alert->name) {{ $alert->name }} @else {{ $alert->rule }} @endif
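
Both titles do the same thing: use the alert name if there is one, otherwise fall back to the rule. As a rough Python sketch of that logic (the function name and sample values are hypothetical, and exact spacing in the real Blade output may differ):

```python
def alert_title(sys_name: str, name: str, rule: str, recovered: bool = False) -> str:
    """Pick the alert name when it is set, otherwise fall back to the rule text."""
    label = name if name else rule
    if recovered:
        return f"Device {sys_name} recovered from {label}"
    return f"Alert for device {sys_name} - {label}"


print(alert_title("fileserver01", "Disk space low", "some rule text"))
print(alert_title("fileserver01", "", "some rule text", recovered=True))
```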

Hope these are of use to someone. While there is quite a bit of documentation for the alert system, there are still some significant holes in the template documentation. Getting the formatting tidy is a bit fiddly (especially when using @if), and there is a lot that can be done that isn’t really documented.

Most of the actual alerts I’m using are from the stock examples with minimal modification (device down due to no ICMP response, device down due to no SNMP response, device rebooted, service warning and service critical). However, I didn’t find a good example for low disk space that worked the way I wanted it to, so I thought I’d share that too.

Here are my low disk space and critical disk space warning alerts:

The issue with some of the examples I’ve seen is that they only check the percentage of the disk used. With very large disks on the order of terabytes, checking by percentage doesn’t make sense; you really want to check for a certain minimum amount of free space. Conversely, for relatively small disks (including small ancillary partitions or virtual filesystems on a Linux system), a minimum free space check that is appropriate for a large disk can be unsatisfiable, because the partition is smaller than the minimum allowed free space! :grinning:

So the obvious answer is to somehow combine the two checks, so that’s what I did above. As before, disks are measured in bytes in LibreNMS, hence the unwieldy-looking numbers in the size comparison.

In the disk space low alert I am checking if free space is < 10GB and the disk is >= 80% full. This means that a disk of 50GB or larger has to be below 10GB free to trigger an alert, while a smaller disk has to be at least 80% full. (For example, a 1GB partition would need roughly 200MB or less free to trigger this alert.)

Similarly, the disk space critical alert will trigger if there is < 1GB free and the disk is >= 95% full. So for a drive of 20GB or larger, less than 1GB of free space would trigger this alert, but a drive smaller than 20GB would need to be 95% or more full.
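
For anyone who wants to play with the thresholds before touching their rules, here is a small Python sketch of the logic described above. It is only an illustration of the ANDed check and of where the 50GB and 20GB crossover sizes come from; the function names and sample disks are invented, and the real rules are of course built in the LibreNMS rule builder, not in Python.

```python
GB = 1073741824  # bytes in a base-1024 gigabyte


def disk_alert(free_bytes: int, used_percent: float, min_free_gb: int, min_used_percent: int) -> bool:
    """True only when BOTH hold: free space below the minimum AND percentage used at or above the limit."""
    return free_bytes < min_free_gb * GB and used_percent >= min_used_percent


def crossover_size_gb(min_free_gb: int, min_used_percent: int) -> float:
    """Disk size at which both thresholds describe the same amount of free space."""
    return min_free_gb * 100 / (100 - min_used_percent)


def is_low(free_bytes: int, used_percent: float) -> bool:
    return disk_alert(free_bytes, used_percent, min_free_gb=10, min_used_percent=80)


def is_critical(free_bytes: int, used_percent: float) -> bool:
    return disk_alert(free_bytes, used_percent, min_free_gb=1, min_used_percent=95)


print(crossover_size_gb(10, 80))  # 50.0 -> above 50GB the free space check decides
print(crossover_size_gb(1, 95))   # 20.0 -> above 20GB the free space check decides

print(is_low(9 * GB, 99.1))   # 1000GB disk, 9GB free: True
print(is_low(50 * GB, 95))    # 1000GB disk, 95% full but 50GB free: False
print(is_low(GB // 2, 50))    # 1GB partition, half full: False
print(is_low(GB // 10, 90))   # 1GB partition, 90% full: True
```

The second and third cases are the ones a single check gets wrong: a percentage-only rule would alert on the big disk with 50GB still free, and a free-space-only rule would alert permanently on the small partition.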

The numbers can be tweaked a bit to personal preference, but I like the basic idea of ANDing the check for absolute free space with the percentage used, and it seems to cover all scenarios on the servers I’m monitoring, including Windows, Mac and Linux.