Only alert recovery emails are being sent for monitored processes

Alert emails aren’t being sent, but recovery emails are. Alerts were working two days ago. We’re on the daily release.

  • Steps to reproduce an issue.
  1. Stop the sshd daemon being monitored with the librenms (check_mk) agent on the remote device.
  2. Poll the device: ./poller.php -h myhost -r -d -m unix-agent
  3. Test the alert for the ssh process monitoring rule on that host (no email is sent): $ ./scripts/test-alert.php -r14 -h myhost -d
    SQL[SELECT device_id FROM devices WHERE hostname = 'spacewalk']
    SQL[SELECT alert_log.id,alert_log.rule_id,alert_log.device_id,alert_log.state,alert_log.details,alert_log.time_logged,alert_rules.rule,alert_rules.severity,alert_rules.extra,alert_rules.name FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = '117' && alert_log.rule_id = '14' && alert_rules.disabled = 0 ORDER BY alert_log.id DESC LIMIT 1]
    SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '117']
    SQL[]
    SQL[SELECT hostname, sysName, sysDescr, hardware, version, location, purpose, notes, uptime FROM devices WHERE device_id = '117']
    SQL[SELECT template,title,title_rec FROM alert_templates JOIN alert_template_map ON alert_template_map.alert_templates_id=alert_templates.id WHERE alert_template_map.alert_rule_id='14']
    SQL[SELECT template,title,title_rec FROM alert_templates WHERE name='Default Alert Template']
    Issuing Alert-UID #6131/1: ; ; mail => ERROR: You must provide at least one recipient email address.
    SQL[SELECT * FROM devices WHERE device_id = '117']
    SQL[SELECT * FROM devices_attribs WHERE device_id = '117']
    SQL[SELECT * FROM vrf_lite_cisco WHERE device_id = '117']
    SQL[INSERT INTO eventlog (host,device_id,reference,type,datetime,severity,message,username) VALUES ('117','117','NULL','error',NOW(),'5','Could not issue critical alert for rule \'ssh daemon is not running\' to transport \'mail\' Error: You must provide at least one recipient email address.','')]
    ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
  4. Poll the device again (recovery email is sent)
    $ ./scripts/test-alert.php -r14 -h myhost -d
    SQL[SELECT device_id FROM devices WHERE hostname = 'spacewalk']
    SQL[SELECT alert_log.id,alert_log.rule_id,alert_log.device_id,alert_log.state,alert_log.details,alert_log.time_logged,alert_rules.rule,alert_rules.severity,alert_rules.extra,alert_rules.name FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = '117' && alert_log.rule_id = '14' && alert_rules.disabled = 0 ORDER BY alert_log.id DESC LIMIT 1]
    SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '117']
    SQL[]
    SQL[SELECT hostname, sysName, sysDescr, hardware, version, location, purpose, notes, uptime FROM devices WHERE device_id = '117']
    SQL[SELECT template,title,title_rec FROM alert_templates JOIN alert_template_map ON alert_template_map.alert_templates_id=alert_templates.id WHERE alert_template_map.alert_rule_id='14']
    SQL[SELECT template,title,title_rec FROM alert_templates WHERE name='Default Alert Template']
    SQL[SELECT alert_log.id,alert_log.time_logged,alert_log.details FROM alert_log WHERE alert_log.state != 2 && alert_log.state != 0 && alert_log.rule_id = '14' && alert_log.device_id = '117' && alert_log.id < '6139' ORDER BY id DESC LIMIT 1]
    Issuing Alert-UID #6139/0: ; ; mail => OKSQL[SELECT * FROM devices WHERE device_id = '117']
    SQL[SELECT * FROM devices_attribs WHERE device_id = '117']
    SQL[SELECT * FROM vrf_lite_cisco WHERE device_id = '117']
    SQL[INSERT INTO eventlog (host,device_id,reference,type,datetime,severity,message,username) VALUES ('117','117','NULL','alert',NOW(),'1','Issued recovery for rule \'ssh daemon is not running\' to transport \'mail\'','')]
    ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
  • The output of ./validate.php
    $ ./validate.php -s

    Component Version
    LibreNMS 1.37-83-g607a7f8
    DB Schema 246
    PHP 7.0.28
    MySQL 5.5.56-MariaDB
    RRDTool 1.4.8
    SNMP NET-SNMP 5.7.2

    ====================================

    [OK] Composer Version: 1.6.3
    [OK] Dependencies up-to-date.
    Checking configuration: OK
    Checking database: OK
    [OK] Database connection successful
    [OK] Database schema correct
    Checking disk: OK
    Checking php: OK
    Checking poller: WARN
    [WARN] Some devices have not been polled in the last 5 minutes. You may have performance issues.
    [FIX] Check your poll log and see: http://docs.librenms.org/Support/Performance/
    Devices:
    // CUT THESE OUT - These devices are actually down
    Checking programs: OK
    Checking rrd: OK
    Checking updates: OK
    Checking user: OK

LibreNMS configuration settings in database:
MariaDB [librenms]> select config_id,config_name,config_value from config where config_group = 'alerting' and config_name like '%mail%' or config_name like '%default%';
+-----------+----------------------------+--------------------+
| config_id | config_name                | config_value       |
+-----------+----------------------------+--------------------+
|       452 | alert.default_only         | false              |
|       453 | alert.default_mail         | foo@bar            |
|       458 | email_backend              | smtp               |
|       462 | alert.transports.mail      | true               |
|       465 | email_from                 | foo@bar            |
|       466 | email_user                 | LibreNMS           |
|       467 | email_sendmail_path        | /usr/sbin/sendmail |
|       468 | email_smtp_host            | a.b.c.d            |
|       469 | email_smtp_port            | 25                 |
|       470 | email_smtp_timeout         | 10                 |
|       471 | email_smtp_secure          |                    |
|       472 | email_smtp_auth            | false              |
|       473 | email_smtp_username        | NULL               |
|       474 | email_smtp_password        | NULL               |
|       739 | oxidized.default_group     |                    |
|       746 | email_html                 | true               |
|       761 | webui.default_dashboard_id | 2                  |
|       770 | email_auto_tls             | true               |
|       779 | alert.default_if_none      | true               |
|       802 | alert.default_copy         | true               |
+-----------+----------------------------+--------------------+
20 rows in set (0.00 sec)
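
A side note on the query above: in SQL, AND binds more tightly than OR, so the WHERE clause as posted is read as (config_group = 'alerting' AND config_name LIKE '%mail%') OR config_name LIKE '%default%'. That may be why rows such as oxidized.default_group and webui.default_dashboard_id show up. A minimal sketch using Python's sqlite3 and a made-up four-row table (the config_group values are assumptions, not taken from a real LibreNMS database):

```python
import sqlite3

# Toy stand-in for the config table; group names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (config_name TEXT, config_group TEXT)")
conn.executemany(
    "INSERT INTO config VALUES (?, ?)",
    [
        ("alert.default_mail", "alerting"),
        ("email_backend", "alerting"),
        ("oxidized.default_group", "external"),   # assumed group
        ("webui.default_dashboard_id", "webui"),  # assumed group
    ],
)

# Same shape as the posted query: the OR branch ignores config_group.
as_posted = """
    SELECT config_name FROM config
    WHERE config_group = 'alerting' AND config_name LIKE '%mail%'
       OR config_name LIKE '%default%'
    ORDER BY config_name
"""

# Parentheses restrict both LIKE patterns to the alerting group.
grouped = """
    SELECT config_name FROM config
    WHERE config_group = 'alerting'
      AND (config_name LIKE '%mail%' OR config_name LIKE '%default%')
    ORDER BY config_name
"""

rows_as_posted = [row[0] for row in conn.execute(as_posted)]
rows_grouped = [row[0] for row in conn.execute(grouped)]

print("as posted:", rows_as_posted)
print("grouped:  ", rows_grouped)
```

With the parentheses, only settings in the alerting group are returned, which makes the output easier to compare between servers.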

Actual email addresses, IP address, and hostnames were replaced. We do receive the recovery emails at the “Default contact” email address configured.
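
To make the expectation explicit, here is one reading of the alert.default_* options from the table, written as a small Python model. It is only an illustration of how the options are commonly described (default_only: send only to the default contact; default_if_none: fall back to it when no contact is found; default_copy: always copy it), not LibreNMS's actual code. The function name and parameters are hypothetical.

```python
def resolve_recipients(contacts, default_mail, default_only,
                       default_if_none, default_copy):
    """Hypothetical model of recipient selection; not the real implementation."""
    if default_only:
        # Only the default contact is used, if one is configured.
        return [default_mail] if default_mail else []

    recipients = list(contacts)

    # Fall back to the default contact when no other contact was found.
    if not recipients and default_if_none and default_mail:
        recipients.append(default_mail)

    # Copy the default contact on every alert, avoiding duplicates.
    if default_copy and default_mail and default_mail not in recipients:
        recipients.append(default_mail)

    return recipients


# Settings from the table: default_only=false, default_if_none=true,
# default_copy=true, default_mail=foo@bar (placeholder address).
no_contacts = resolve_recipients([], "foo@bar", False, True, True)
with_contact = resolve_recipients(["a@b"], "foo@bar", False, True, True)
print(no_contacts, with_contact)
```

Under this reading the recipient list should never be empty with the settings shown, so "You must provide at least one recipient email address" is unexpected for the alert and consistent only with the recovery path behaving correctly.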

I added a syslog alert transport, and that does receive alerts.

Looking through old posts, I think my issue looks similar to this other issue by @stefaned

-bash-4.2$ ./scripts/test-alert.php -r14 -h 117 -d
SQL[SELECT alert_log.id,alert_log.rule_id,alert_log.device_id,alert_log.state,alert_log.details,alert_log.time_logged,alert_rules.rule,alert_rules.severity,alert_rules.extra,alert_rules.name FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = '117' && alert_log.rule_id = '14' && alert_rules.disabled = 0 ORDER BY alert_log.id DESC LIMIT 1]
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = "disable_notify" && device_id = '117']
SQL[]
SQL[SELECT hostname, sysName, sysDescr, hardware, version, location, purpose, notes, uptime FROM devices WHERE device_id = '117']
SQL[SELECT `template`,`title`,`title_rec` FROM `alert_templates` JOIN `alert_template_map` ON `alert_template_map`.`alert_templates_id`=`alert_templates`.`id` WHERE `alert_template_map`.`alert_rule_id`='14']
SQL[SELECT `template`,`title`,`title_rec` FROM `alert_templates`  WHERE `name`='Default Alert Template']
SQL[SELECT alert_log.id,alert_log.time_logged,alert_log.details FROM alert_log WHERE alert_log.state != 2 && alert_log.state != 0 && alert_log.rule_id = '14' && alert_log.device_id = '117' && alert_log.id < '6213' ORDER BY id DESC LIMIT 1]
Issuing Alert-UID #6213/0: ; ; mail => ERROR: You must provide at least one recipient email address.
SQL[SELECT * FROM `devices` WHERE `device_id` = '117']
SQL[SELECT * FROM devices_attribs WHERE `device_id` = '117']
SQL[SELECT * FROM `vrf_lite_cisco` WHERE `device_id` = '117']
SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('117','117','NULL','error',NOW(),'5','Could not issue recovery for rule \'ssh daemon is not running\' to transport \'mail\' Error: You must provide at least one recipient email address.','')]
; ; ; ; ; ; ; ; ; ; ; ; syslog => SQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('117','117','NULL','poller',NOW(),'5','Syslog facility is not an integer: ','')]
OKSQL[INSERT INTO `eventlog` (`host`,`device_id`,`reference`,`type`,`datetime`,`severity`,`message`,`username`)  VALUES ('117','117','NULL','alert',NOW(),'1','Issued recovery for rule \'ssh daemon is not running\' to transport \'syslog\'','')]
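
To compare runs without reading every SQL line, the "Issuing Alert-UID" lines can be pulled out of the debug output. A rough sketch in Python: the regular expression is fitted only to the lines pasted in this thread, and the mapping of the number after the slash (1 = alert, 0 = recovery) is inferred from the matching eventlog messages, so treat both as assumptions.

```python
import re

# Matches e.g. "Issuing Alert-UID #6139/0: ; ; mail => OKSQL[...".
# The result is either "OK" or "ERROR: ..." up to a fused "SQL[" or end of line.
ISSUE_RE = re.compile(
    r"Issuing Alert-UID #(?P<uid>\d+)/(?P<state>\d+):[ ;]*"
    r"(?P<transport>\w+) => (?P<result>OK|ERROR: .*?)(?:SQL\[|$)"
)

# Assumed meaning of the state digit, inferred from the eventlog messages.
STATE_NAMES = {"0": "recovery", "1": "alert"}


def summarize(debug_output):
    """Return (uid, state, transport, result) for each issued notification."""
    results = []
    for line in debug_output.splitlines():
        match = ISSUE_RE.search(line)
        if match:
            results.append((
                match["uid"],
                STATE_NAMES.get(match["state"], "unknown"),
                match["transport"],
                match["result"],
            ))
    return results


sample = """\
Issuing Alert-UID #6131/1: ; ; mail => ERROR: You must provide at least one recipient email address.
Issuing Alert-UID #6139/0: ; ; mail => OKSQL[SELECT * FROM devices WHERE device_id = '117']
Issuing Alert-UID #6213/0: ; ; mail => ERROR: You must provide at least one recipient email address.
"""

summary = summarize(sample)
for entry in summary:
    print(entry)
```

On the three lines from this thread it shows the alert failing on mail, one recovery succeeding, and a later recovery failing with the same error, so the failure is not strictly tied to the alert state.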

I’m having the exact same issue on 2 different LNMS servers. I have a server at home and another at work. Both are sending recovery messages for high CPU usage, but I never receive an email when the devices trigger the alert. The logs all show: “Error: You must provide at least one recipient email address,” but then the recovery email is sent. When I run an alert capture, it appears ok and says at the end:

Found 2 contacts to send alerts to.
xxxx (email address redacted)
xxxxx (email address redacted)

Found 1 transports to send alerts to.
Transport: mail

I wanted to add to the conversation to show that there’s more than 1 person affected.

I’m assuming the developers are reading this. Here are two snippets from the alerts.log file in debug mode, showing a trigger that fails and a recovery that succeeds:
https://p.libren.ms/view/de3bce56

Please post a screenshot of your alert rule and email config.

Thank you for looking at it.

We need the alert rule as well.

This could be an old bug.

If it continues to happen for the same device let us know.

Thanks, @Aaron_Q

@Kevin_Krumm @laf I’m still having this issue too, on the same device. Here’s my alert rule and email config:

Oops. Here’s my rule config:

BTW, @laf This happens on all of the devices across 2 librenms servers.

If you are using LDAP or AD auth then you need to have a bind user set up.

If you use mysql auth I can’t replicate this.

I’m using RADIUS auth, if it makes a difference?

Sorry for the delay. I didn’t get an email that you replied (ironically). I’m using mysql auth on both servers. I also just noticed there’s another thread where someone is having trouble with port utilization triggers. I’m seeing that as well: the recovery emails always arrive, but the alert email is never sent.

@laf Can you tell me where librenms retrieves the email addresses and calls phpmailer? I’d like to run the mysql commands manually and see the mysql response.

Both of my servers updated yesterday (1.39-27-ge898075 - Tue May 08 2018 19:37:06 GMT-0600) and now the alerts are working properly!