Email alert fallback

Is there a way to get email alerts to fall back to the default contact if a sysContact is not defined?

Almost none of the systems we monitor have a sysContact email defined. That was fine when the only group that needed to be emailed was the one receiving mail for the default contact. Now, however, we’re being asked to include users outside that group on a handful of devices, so on those devices I set the sysContact override and, in the General alert settings, changed “Send alerts to default contact only” to OFF.

In testing this worked great: all the individuals/groups that were added received the email notification as expected. However, the devices that did not have the sysContact overridden stopped sending email notifications and started showing the following error:

[screenshot of the error]

If I change “Send alerts to default contact only” back to ON, email alerts do go out for devices without a sysContact defined, but the overridden devices go back to alerting just the default contact. I realize that not everyone would want emails going to the default contact when the sysContact isn’t defined, but even an option such as “Use default contact when sysContact is not defined” with an ON/OFF toggle would be helpful. Unfortunately I’m severely lacking in the programming arena and don’t have the faintest idea where or how to even start.

I don’t think this is exposed in the webui but try $config['alert']['default_if_none'] = true; in config.php

Gave that a shot and it didn’t seem to make a difference. Still getting the “You must provide at least one recipient email address” error when it goes to send the email.

New alert or an existing one? If it’s an existing one, force a new alert on another rule to test.

Brand new alert. I added the statement to the config last night and then toggled “Send alerts to default contact only” to OFF this morning. About 30 minutes later a new alert came in through Slack; I checked and it had the email error.

Just to clarify though, that alert wasn’t active in the webui before you turned that setting off was it?

Correct, it occurred about 10 minutes after I made the setting change.

I did some more looking and testing and ran test-alert.php with the -d flag. On a box with no sysContact I see this:

SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '984'] 
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = 'override_sysContact_bool' AND device_id = '984'] 
SQL[SELECT user_id FROM devices_perms WHERE access_level >= 0 AND device_id = '984'] 
Issuing Alert-UID #55015/1: pagerduty => OKSQL[SELECT * FROM `vrf_lite_cisco` WHERE `device_id` = '984'] 
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'pagerduty\'','')] 
; ; mail => ERROR: You must provide at least one recipient email address.
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','error',NOW(),'5','Could not issue critical alert for rule \'Device down\' to transport \'mail\' Error: You must provide at least one recipient email address.','')] 
; ; ; ; ; ; ; ; ; ; slack => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'slack\'','')] 
; ; ; syslog => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'syslog\'','')] 
; ; ; ; 

And on a box that does have it filled in:

SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '1000'] 
SQL[SELECT 1 FROM information_schema.COLUMNS WHERE TABLE_NAME = 'devices' && COLUMN_NAME = 'device_id'] 
SQL[SELECT * FROM devices WHERE (devices.device_id = '1000') && (((devices.status = 0  &&  ((devices.disabled = 0  &&  devices.ignore = 0)))) = "1"                   &&  devices.status_reason = "icmp" )] 
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = 'override_sysContact_bool' AND device_id = '1000'] 
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = 'override_sysContact_string' AND device_id = '1000'] 
SQL[SELECT user_id FROM devices_perms WHERE access_level >= 0 AND device_id = '1000'] 
SQL[SELECT hostname, sysName, sysDescr, hardware, version, location, purpose, notes, uptime FROM devices WHERE device_id = '1000'] 
SQL[SELECT `template`,`title`,`title_rec` FROM `alert_templates` JOIN `alert_template_map` ON `alert_template_map`.`alert_templates_id`=`alert_templates`.`id` WHERE `alert_template_map`.`alert_rule_id`='1'] 
Issuing Alert-UID #55014/1: pagerduty => OKSQL[SELECT * FROM `devices` WHERE `device_id` = '1000'] 
SQL[SELECT * FROM `vrf_lite_cisco` WHERE `device_id` = '1000'] 
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('1000','1000','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'pagerduty\'','')] 
; ; mail => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('1000','1000','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'mail\'','')] 
; ; ; ; ; ; ; ; ; ; slack => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('1000','1000','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'slack\'','')] 
; ; ; syslog => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('1000','1000','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'syslog\'','')] 
; ; ; ; 

The debug output doesn’t seem to include enough to show what it’s trying to do when it gets to this point in alerts.inc.php:

# Send email to default contact if no other contact found
if ((count($tmp_contacts) == 0) && ($config['alert']['default_if_none']) && (!empty($config['alert']['default_mail']))) {
    $tmp_contacts[$config['alert']['default_mail']] = 'NOC';
}
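
Since the debug output doesn't show how that condition evaluates, one way to see it would be to print each of its three parts separately. This is only a sketch: `explain_default_fallback()` is a hypothetical helper, not part of LibreNMS, and the sample values are made up.

```php
<?php
// Hypothetical debugging helper (not part of LibreNMS): reports each part of
// the default-contact fallback condition so you can see which one is false.
function explain_default_fallback(array $tmp_contacts, array $config): array
{
    return [
        'no_contacts_found' => count($tmp_contacts) == 0,
        'default_if_none'   => !empty($config['alert']['default_if_none']),
        'default_mail_set'  => !empty($config['alert']['default_mail']),
    ];
}

// Made-up sample values for illustration only.
$config = ['alert' => ['default_if_none' => true, 'default_mail' => 'noc@example.com']];
$tmp_contacts = ['someone@example.com' => 'Some User'];

// The fallback only fires when all three parts are true.
var_dump(explain_default_fallback($tmp_contacts, $config));
```

If `no_contacts_found` comes back false, something is already in the contact list, and the default contact will not be added no matter how the two config options are set.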

Checking in the global settings I can see that default_if_none is set:

[screenshot of the default_if_none setting]

And default_mail as well (intentionally cut off after the @):

[screenshot of the default_mail setting]

Ok I’ve also just tested this and it does work for me :slight_smile:

So default_if_none is actually a db config option, I don’t think changing it in config.php will work. Try running: UPDATE config SET config_value='true' WHERE config_name='alert.default_if_none';

Then see if it works. If it does, we just need to expose that config option.

Set it in the database, and double-checked that it was actually set there :slightly_smiling_face:

And gave it another shot and still got the same result. :frowning:

Show the rest of your mail config options in webui as I can’t replicate this.

Also, ensure you don’t have anything in config.php that might conflict.

The only thing related to alerting that I’ve got in the config.php is a macro to lump together a bunch of subnets and the default_if_none statement that doesn’t do anything right now :slight_smile:

Here’s the general alert settings:

And the email transport settings:

Same settings as me. I can’t replicate this :frowning:

Have you tried running the alerts capture for a device to see what email addresses it says it would send to?

OK so it just got more puzzling:

While the debug run from the command line gives the no-recipient error, the debug capture in the webui does show the correct default contact. In both cases, however, no emails are sent. But when I toggle it to send to the default contact only, it sends just fine to that same default contact. :confounded:

And I did verify that it is showing in the DB:

I can’t replicate this :frowning:

Try editing includes/alerts.inc.php line 623 and add the following below it:

var_dump($obj);

Should then look like:

    $obj = DescribeAlert($alert);
    var_dump($obj);
    if (is_array($obj)) {

Re-run the test-alert script and post the output (in text format please)

Here’s the result with key stuff *'d out:

SQL[SELECT `device_id` FROM `devices` WHERE `hostname` = 'piatl0-itapp01.****.****'] 
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '984'] 
SQL[SELECT 1 FROM information_schema.COLUMNS WHERE TABLE_NAME = 'devices' && COLUMN_NAME = 'device_id'] 
SQL[SELECT * FROM devices WHERE (devices.device_id = '984') && (((devices.status = 0  &&  ((devices.disabled = 0  &&  devices.ignore = 0)))) = "1"                   &&  devices.status_reason = "icmp" )] 
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = 'override_sysContact_bool' AND device_id = '984'] 
SQL[SELECT sysContact FROM devices WHERE device_id = '984'] 
SQL[SELECT user_id FROM devices_perms WHERE access_level >= 0 AND device_id = '984'] 
SQL[SELECT hostname, sysName, sysDescr, hardware, version, location, purpose, notes, uptime FROM devices WHERE device_id = '984'] 
SQL[SELECT `template`,`title`,`title_rec` FROM `alert_templates` JOIN `alert_template_map` ON `alert_template_map`.`alert_templates_id`=`alert_templates`.`id` WHERE `alert_template_map`.`alert_rule_id`='1'] 
array(23) {
  ["hostname"]=>
  string(22) "piatl0-itapp01.****.****"
  ["sysName"]=>
  string(22) "piatl0-itapp01.****.****"
  ["sysDescr"]=>
  string(129) "Hardware: Intel64 Family 6 Model 79 Stepping 1 AT/AT COMPATIBLE - Software: Windows Version 6.3 (Build 14393 Multiprocessor Free)"
  ["hardware"]=>
  string(9) "Intel x64"
  ["version"]=>
  string(20) "Server 2016 (NT 6.3)"
  ["location"]=>
  string(34) ".****.****"
  ["uptime"]=>
  string(6) "898392"
  ["uptime_short"]=>
  string(14) "10d 9h 33m 12s"
  ["uptime_long"]=>
  string(19) "10 days, 9h 33m 12s"
  ["description"]=>
  NULL
  ["notes"]=>
  NULL
  ["device_id"]=>
  string(3) "984"
  ["template"]=>
  string(533) "{if %transport == mail}<b>%title</b><br>
<b>Severity:</b> %severity<br>
{if %state == 0}<b>Time elapsed:</b> %elapsed<br>{/if}
<b>Timestamp:</b> %timestamp<br>
<b>Rule:</b> {if %name}%name{else}%rule{/if}<br>
{if %faults}<b>Faults:</b> 
%hostname is {if %state == 1}down{else}up{/if}<br>{/if}
<b>Alert sent to:</b> {foreach %contacts}%value <%key> {/foreach}
{/if}
{if %transport == slack}
{if %state == 0}Time elapsed: %elapsed{/if}
Timestamp: %timestamp
{if %faults}Faults: %hostname is {if %state == 1}down{else}up{/if}{/if}
{/if}"
  ["title"]=>
  string(35) "Host piatl0-itapp01.****.**** is Down"
  ["faults"]=>
  array(1) {
    [1]=>
    array(48) {
      ["device_id"]=>
      string(3) "984"
      ["hostname"]=>
      string(22) "piatl0-itapp01.****.****"
      ["sysName"]=>
      string(22) "piatl0-itapp01.****.****"
      ["ip"]=>
      string(10) "10.****"
      ["community"]=>
      string(5) "****"
      ["authlevel"]=>
      NULL
      ["authname"]=>
      NULL
      ["authpass"]=>
      NULL
      ["authalgo"]=>
      NULL
      ["cryptopass"]=>
      NULL
      ["cryptoalgo"]=>
      NULL
      ["snmpver"]=>
      string(3) "v2c"
      ["port"]=>
      string(3) "161"
      ["transport"]=>
      string(3) "udp"
      ["timeout"]=>
      NULL
      ["retries"]=>
      NULL
      ["snmp_disable"]=>
      string(1) "0"
      ["bgpLocalAs"]=>
      NULL
      ["sysObjectID"]=>
      string(25) "enterprises.311.1.1.3.1.2"
      ["sysDescr"]=>
      string(129) "Hardware: Intel64 Family 6 Model 79 Stepping 1 AT/AT COMPATIBLE - Software: Windows Version 6.3 (Build 14393 Multiprocessor Free)"
      ["sysContact"]=>
      string(19) "Systems Engineering"
      ["version"]=>
      string(20) "Server 2016 (NT 6.3)"
      ["hardware"]=>
      string(9) "Intel x64"
      ["features"]=>
      string(14) "Multiprocessor"
      ["location"]=>
      string(34) "****.****"
      ["os"]=>
      string(7) "windows"
      ["status"]=>
      string(1) "0"
      ["status_reason"]=>
      string(4) "icmp"
      ["ignore"]=>
      string(1) "0"
      ["disabled"]=>
      string(1) "0"
      ["uptime"]=>
      string(6) "898392"
      ["agent_uptime"]=>
      string(1) "0"
      ["last_polled"]=>
      string(19) "2017-11-22 07:47:06"
      ["last_poll_attempted"]=>
      NULL
      ["last_polled_timetaken"]=>
      string(4) "7.66"
      ["last_discovered_timetaken"]=>
      string(4) "9.74"
      ["last_discovered"]=>
      string(19) "2017-11-22 06:59:59"
      ["last_ping"]=>
      string(19) "2017-11-22 07:47:06"
      ["last_ping_timetaken"]=>
      string(5) "27.20"
      ["purpose"]=>
      NULL
      ["type"]=>
      string(6) "server"
      ["serial"]=>
      NULL
      ["icon"]=>
      string(11) "windows.svg"
      ["poller_group"]=>
      string(1) "0"
      ["override_sysLocation"]=>
      string(1) "0"
      ["notes"]=>
      NULL
      ["port_association_mode"]=>
      string(1) "1"
      ["string"]=>
      string(185) "sysObjectID => enterprises.311.1.1.3.1.2; sysDescr => Hardware: Intel64 Family 6 Model 79 Stepping 1 AT/AT COMPATIBLE - Software: Windows Version 6.3 (Build 14393 Multiprocessor Free); "
    }
  }
  ["elapsed"]=>
  string(6) "4m 13s"
  ["uid"]=>
  string(5) "56053"
  ["severity"]=>
  string(8) "critical"
  ["rule"]=>
  string(78) "%macros.device_down = "1"                  && %devices.status_reason = "icmp" "
  ["name"]=>
  string(11) "Device down"
  ["timestamp"]=>
  string(19) "2017-11-22 07:49:40"
  ["contacts"]=>
  array(1) {
    ["Systems Engineering"]=>
    string(3) "NOC"
  }
  ["state"]=>
  string(1) "1"
}
Issuing Alert-UID #56053/1: pagerduty => OKSQL[SELECT * FROM `devices` WHERE `device_id` = '984'] 
SQL[SELECT * FROM `vrf_lite_cisco` WHERE `device_id` = '984'] 
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'pagerduty\'','')] 
; ; mail => ERROR: You must provide at least one recipient email address.
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','error',NOW(),'5','Could not issue critical alert for rule \'Device down\' to transport \'mail\' Error: You must provide at least one recipient email address.','')] 
; ; ; ; ; ; ; ; ; ; slack => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'slack\'','')] 
; ; ; syslog => OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('984','984','NULL','alert',NOW(),'1','Issued critical alert for rule \'Device down\' to transport \'syslog\'','')] 
; ; ; ; 

Seeing what it’s spitting out, I now know why, and it’s something that predates me. Our Windows servers have SNMP set like this:

[screenshot of the Windows SNMP settings]

So the sysContact is being set “correctly”; however, since it’s not an email address, the send of course fails, and since it’s not truly empty, it doesn’t fall back to using the default.

So it looks like I’ve got a lot of work ahead of me to fix someone else’s “mistake”, and I’ve sent you on a wild goose chase looking for an error that didn’t exist. However, I am very grateful for the assist; without the extra debugging line I’d probably never have found it. :fist_right::fist_left:


We can probably validate the email addresses better. Please create an issue on GitHub.
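
As a rough idea of what that validation might look like: drop any contact whose key is not a syntactically valid email address before the fallback check runs, so a free-text sysContact such as "Systems Engineering" no longer blocks the default. This is a minimal sketch, assuming contacts are keyed by address as in the var_dump output above; `filter_valid_contacts()` is a hypothetical name, the default address is made up, and this is not the actual LibreNMS fix.

```php
<?php
// Hypothetical helper: keep only contacts whose key is a valid email address.
function filter_valid_contacts(array $contacts): array
{
    return array_filter(
        $contacts,
        function ($email) {
            // filter_var() returns false when the value is not a valid address.
            return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
        },
        ARRAY_FILTER_USE_KEY
    );
}

// Sample values: the contact from the dump above plus a made-up default address.
$config = ['alert' => ['default_if_none' => true, 'default_mail' => 'noc@example.com']];
$tmp_contacts = filter_valid_contacts(['Systems Engineering' => 'NOC']);

// Same fallback condition as in alerts.inc.php, applied after filtering.
if ((count($tmp_contacts) == 0) && ($config['alert']['default_if_none']) && (!empty($config['alert']['default_mail']))) {
    $tmp_contacts[$config['alert']['default_mail']] = 'NOC';
}

print_r($tmp_contacts);
```

Filtering before the count check means the existing fallback condition can stay unchanged; whether invalid contacts should be silently dropped or logged is a design decision for the issue.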

Submitted new issue #7814