Alerts not Transporting


#1

We have been having issues with alerts being transported for some time now. The device is polled correctly, and if it triggers a rule an alert appears under Alerts | Notifications. However, nothing gets transported.

We primarily use Email for transport. We have tested the transport using the test function and the email is delivered correctly.

If I run a debug on alerts for a device that is currently down I get the following error.

SQL Error! SQLSTATE[42000]: Syntax error or access violation: 1139 Got error ‘range out of order in character class at offset 7’ from regexp (SQL: SELECT * FROM devices,storage WHERE (devices.device_id = 209 AND devices.device_id = storage.device_id) AND storage.storage_descr REGEXP “’(’^[a-Z]:’)’” AND storage.storage_perc > 95 AND devices.type LIKE ‘%Server%’ AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1) (SQL: SELECT * FROM devices,storage WHERE (devices.device_id = 209 AND devices.device_id = storage.device_id) AND storage.storage_descr REGEXP “’(’^[a-Z]:’)’” AND storage.storage_perc > 95 AND devices.type LIKE ‘%Server%’ AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1)
/opt/librenms/html/includes/output/query.inc.php:50
/opt/librenms/html/ajax_output.php:35
SQL Error! SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ‘~ “^cisco.*State$” && sensors.sensor_alert = “1”) = 1 AND (devices.status = 1 &&’ at line 1 (SQL: SELECT * FROM devices,sensors WHERE (devices.device_id = 209 AND devices.device_id = sensors.device_id) AND (sensors.sensor_current != “1” && sensors.sensor_current != “5” && sensors.sensor_type ~ “^cisco.*State$” && sensors.sensor_alert = “1”) = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1) (SQL: SELECT * FROM devices,sensors WHERE (devices.device_id = 209 AND devices.device_id = sensors.device_id) AND (sensors.sensor_current != “1” && sensors.sensor_current != “5” && sensors.sensor_type ~ “^cisco.*State$” && sensors.sensor_alert = “1”) = 1 AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1)
/opt/librenms/html/includes/output/query.inc.php:50
/opt/librenms/html/ajax_output.php:35
Rule name: P3 - Warning - Global - Device Rebooted
Alert rule: devices.uptime < 300 AND macros.device = 1
Alert query: SELECT * FROM devices WHERE (devices.device_id = ?) AND devices.uptime < 300 AND (devices.disabled = 0 && devices.ignore = 0) = 1
Rule match: no match

Rule name: P3 - Warning - Network - Port Utilisation Over Threshold
Alert rule: macros.port_usage_perc >= 95 AND macros.port_up = 1 AND macros.port = 1
Alert query: SELECT * FROM devices,ports WHERE (devices.device_id = ? AND devices.device_id = ports.device_id) AND (((ports.ifInOctets_rate*8) / ports.ifSpeed)*100) >= 95 AND (ports.ifOperStatus = “up” && ports.ifAdminStatus = “up” && (ports.deleted = 0 && ports.ignore = 0 && ports.disabled = 0)) = 1 AND (ports.deleted = 0 && ports.ignore = 0 && ports.disabled = 0) = 1
Rule match: no match

The appliance is up to date. We had some problems updating the appliance when the transports changed.

[email protected]:/opt/librenms# ./validate.php

Component Version
LibreNMS 1.46-4-g2061d74
DB Schema 273
PHP 7.0.32-0ubuntu0.16.04.1
MySQL 10.0.36-MariaDB-0ubuntu0.16.04.1
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.0
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[FAIL] Some devices have not completed their polling run in 5 minutes, this will create gaps in data.


#2

Further to this. I have tested the Alerts | Alerts Transports | Test Transport for both my Mail and MS Teams transports. The test messages arrive.


These rules have the Transport group (Mail+Teams) associated with them.


#3
  1. This alert query is malformed; the regex should be `^[a-zA-Z]:`. You can’t go from a-Z…
  2. The Cisco state query has a tilde in it. Not sure how you managed that, but you need to fix that too.

So, go fix your alert rules first.
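
For anyone hitting the same "range out of order" error: in ASCII, `a` (97) sorts after `Z` (90), so `a-Z` is a backwards range. The sketch below uses Python's `re` module purely as a stand-in for MariaDB's regex engine (they are different engines, so treat this as an illustration only), and the storage descriptions are made-up examples.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# 'a' is code point 97 and 'Z' is 90, so the range a-Z runs backwards.
print(ord("a"), ord("Z"))
print(compiles("^[a-Z]:"))     # the range from the failing rule
print(compiles("^[a-zA-Z]:"))  # the corrected range

# Hypothetical storage descriptions, only to show what the fixed pattern matches.
drive = re.compile("^[a-zA-Z]:")
print(bool(drive.match("C:\\ Label:System")))
print(bool(drive.match("/var/log")))
```

The corrected pattern matches descriptions that start with a drive letter and a colon, which appears to be what the original rule intended.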


#4

Hi Murant,

Thanks for getting back to me.

I fixed the Regex.
The Cisco state query tilde is part of a macro; I’m not sure if we wrote it.

macros.device_component_down_cisco

I’ve turned that rule off.
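
The 1064 error in the first post points at `~`, which MariaDB does not accept as a regex match operator, whereas the storage query in the same debug output uses `REGEXP`. Below is a rough sketch, assuming the macro text can be edited by hand, of how the offending comparison could be spotted and rewritten. The helper names are hypothetical and not part of LibreNMS.

```python
import re
from typing import List, Tuple

# Matches a column name followed by the unsupported `~` operator and a
# double-quoted pattern, e.g.  sensors.sensor_type ~ "^cisco.*State$"
TILDE_MATCH = re.compile(
    r'(?P<column>[A-Za-z_][\w.]*)\s+~\s+(?P<pattern>"[^"]*")'
)


def find_tilde_matches(query: str) -> List[Tuple[str, str]]:
    """List (column, pattern) pairs that use `~` as a match operator."""
    return [(m.group("column"), m.group("pattern"))
            for m in TILDE_MATCH.finditer(query)]


def rewrite_tilde(query: str) -> str:
    """Replace `column ~ "pattern"` with `column REGEXP "pattern"`."""
    return TILDE_MATCH.sub(r"\g<column> REGEXP \g<pattern>", query)


# Fragment of the failing query from the debug output, with straight quotes.
fragment = ('sensors.sensor_current != "1" && '
            'sensors.sensor_type ~ "^cisco.*State$" && '
            'sensors.sensor_alert = "1"')

print(find_tilde_matches(fragment))
print(rewrite_tilde(fragment))
```

Disabling the rule, as done here, avoids the error as well; a rewrite like this would only matter if the Cisco state check is still wanted.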

We no longer get errors when running an alert debug.

I have run a number of tests. The first was to get the reboot alert to trigger. I rebooted a linux host, which came back up. Nothing got transported.
I then edited the network processor over 95% rule so it triggers at 50% with no delay on transport.

This has been changed as follows;

The rule triggers and we can see alerts in the notifications.

However, nothing is transported to either mail or MS Teams.


#5

https://docs.librenms.org/Alerting/Testing/#transports


#6

Hi Laf,

Thanks for that. I have run a test against the same rule that I changed.

13 - Processor usage over 95% (50%)


/opt/librenms# ./scripts/test-alert.php -r 13 -d -h ex3400
SQL[SELECT device_id FROM devices WHERE hostname = ? [“ex3400”] 0.27ms]
SQL[SELECT alerts.id, alerts.device_id, alerts.rule_id, alerts.state, alerts.note, alerts.info FROM alerts WHERE alerts.device_id = 228 && alerts.rule_id = 13 [] 0.31ms]
SQL[SELECT alert_log.id,alert_log.rule_id,alert_log.device_id,alert_log.state,alert_log.details,alert_log.time_logged,alert_rules.rule,alert_rules.severity,alert_rules.extra,alert_rules.name,alert_rules.builder FROM alert_log,alert_rules WHERE alert_log.rule_id = alert_rules.id && alert_log.device_id = ? && alert_log.rule_id = ? && alert_rules.disabled = 0 ORDER BY alert_log.id DESC LIMIT 1 [228,13] 3.34ms]
SQL[SELECT DISTINCT a.* FROM alert_rules a
LEFT JOIN alert_device_map d ON a.id=d.rule_id
LEFT JOIN alert_group_map g ON a.id=g.rule_id
LEFT JOIN device_group_device dg ON g.group_id=dg.device_group_id
WHERE a.disabled = 0 AND ((d.device_id IS NULL AND g.group_id IS NULL) OR d.device_id=? OR dg.device_id=?) [228,228] 2.42ms]
SQL[SELECT attrib_value FROM devices_attribs WHERE attrib_type = “disable_notify” && device_id = ? [228] 0.23ms]
SQL[SELECT hostname, sysName, sysDescr, sysContact, os, type, ip, hardware, version, purpose, notes, uptime, status, status_reason, locations.location FROM devices LEFT JOIN locations ON locations.id = devices.location_id WHERE device_id = ? [228] 0.28ms]
SQL[SELECT * FROM devices_attribs WHERE device_id = ? [228] 0.23ms]
SQL[select * from device_perf where device_id = ? order by timestamp desc limit 1 [228] 4.82ms]
SQL[select * from alert_templates where exists (select * from alert_template_map where alert_templates.id = alert_template_map.alert_templates_id and alert_rule_id = ?) limit 1 [13] 0.31ms]
Issuing Alert-UID #72375/1:
SQL[SELECT rule_id FROM alerts WHERE id=? [16421] 0.18ms]
SQL[SELECT b.transport_id, b.transport_type, b.transport_name FROM alert_transport_map AS a LEFT JOIN alert_transports AS b ON b.transport_id=a.transport_or_group_id WHERE a.target_type=‘single’ AND a.rule_id=? UNION DISTINCT SELECT d.transport_id, d.transport_type, d.transport_name FROM alert_transport_map AS a LEFT JOIN alert_transport_groups AS b ON a.transport_or_group_id=b.transport_group_id LEFT JOIN transport_group_transport AS c ON b.transport_group_id=c.transport_group_id LEFT JOIN alert_transports AS d ON c.transport_id=d.transport_id WHERE a.target_type=‘group’ AND a.rule_id=? [13,13] 0.46ms]
SQL[select * from alert_templates where exists (select * from alert_template_map where alert_templates.id = alert_template_map.alert_templates_id and alert_rule_id is null) limit 1 [] 0.21ms]
SQL[select * from alert_templates where name = ? limit 1 [“Default Alert Template”] 0.21ms]
:: Transport mail => SQL[SELECT transport_config FROM alert_transports WHERE transport_id=? [1] 0.19ms]
Attempting to email Alert for device ex3400 - P3 - Warning - Network - Processor Usage Over 95% to: [email protected]
OKSQL[SELECT devices.*, location, lat, lng FROM devices LEFT JOIN locations ON devices.location_id=locations.id WHERE device_id = ? [228] 0.43ms]
SQL[SELECT * FROM devices_attribs WHERE device_id = ? [228] 0.27ms]
SQL[SELECT * FROM vrf_lite_cisco WHERE device_id = ? [228] 0.28ms]
SQL[INSERT IGNORE INTO eventlog (host,device_id,reference,type,datetime,severity,message,username) VALUES (:host,:device_id,:reference,:type,NOW(),:severity,:message,:username) {“host”:228,“device_id”:228,“reference”:“NULL”,“type”:“alert”,“severity”:1,“message”:“Issued warning alert for rule ‘P3 - Warning - Network - Processor Usage Over 95%’ to transport ‘mail’”,“username”:""} 2.81ms]
:: Transport msteams => SQL[SELECT transport_config FROM alert_transports WHERE transport_id=? [2] 0.23ms]
OKSQL[INSERT IGNORE INTO eventlog (host,device_id,reference,type,datetime,severity,message,username) VALUES (:host,:device_id,:reference,:type,NOW(),:severity,:message,:username) {“host”:228,“device_id”:228,“reference”:“NULL”,“type”:“alert”,“severity”:1,“message”:“Issued warning alert for rule ‘P3 - Warning - Network - Processor Usage Over 95%’ to transport ‘msteams’”,“username”:""} 1.92ms]

The test email and Teams message transported correctly.

But this alert triggered at 06:11 this morning and nothing was transported.


#7

Hey Guys,

Could you offer any more help on what else I can check, please?

Duncan


#8

Hi All,

I have checked through the syslog, mail.log, dmesg, mysql/error.log and can’t see any problems.

This was previously working fine. The last time we received an email from the system was in Aug.

We had some issues with the upgrade to 1.44 but we thought these had been resolved.

We are getting alerts in Alerts | Notifications for my test rule.

However, if I check Recent Events for each one of those hosts, I don’t see an event showing that the alert has been transported.
The bottom host shows the following in recent events.

Can you give me any further advice on what else I can look at to try and work out what is happening?

Do I need to completely rebuild the server on the latest version to try and resolve this?

Regards,

Duncan


#9

The Auto update failed on Sat 18/08/2018 17:42

The following email was sent from the server

We just attempted to update your install but failed. The information below should help you fix this.
warning: unable to access ‘/root/.config/git/ignore’: Permission denied
error: Your local changes to the following files would be overwritten by
merge:
html/includes/common/top-devices.inc.php
html/includes/common/top-interfaces.inc.php
html/includes/common/worldmap.inc.php
Please, commit your changes or stash them before you can merge.
Aborting

We just attempted to update your install but failed. The information below should help you fix this.
warning: unable to access ‘/root/.config/git/attributes’: Permission denied
warning: unable to access ‘/root/.config/git/ignore’: Permission denied
error: Your local changes to the following files would be overwritten by
merge:
includes/definitions.inc.php
Please, commit your changes or stash them before you can merge.
Aborting

We solved this issue and updated, but no alert emails have been received since this point.


#10

A quick look at your screen shot above - you don’t have the rule mapped to any devices…


#11

Hi Tadpole,

Thanks for the input, but I think the ‘Map To’ restricts rules to devices or groups. If it is left blank it applies to all devices globally.

Regards,

Duncan


#12

Fair enough… I have seen an error in an alert template cause a silent failure to e-mail. You could try ./scripts/test-template.php


#13

I have tested with both

./scripts/test-template.php
./scripts/test-alert.php

They both work as expected. However, live notifications don’t alert or transport.

I’m trying to find out what I can check further up the chain to work out why this isn’t working.

Regards,

Duncan


#14

Hi All,

Further to this we have found the following;

production.ERROR: Symfony\Component\Debug\Exception\FatalThrowableError: Parse error: syntax error, unexpected ‘}’, expecting end of file in /opt/librenms/includes/alerts.inc.php(404) : eval()'d code:35

Stack trace:

#0 /opt/librenms/LibreNMS/Alert/Template.php(177): RunJail(’$ret .= "<div s…’, Array)

#1 /opt/librenms/LibreNMS/Alert/Template.php(67): LibreNMS\Alert\Template->legacyBody(Array)

#2 /opt/librenms/includes/alerts.inc.php(876): LibreNMS\Alert\Template->getBody(Array)

#3 /opt/librenms/includes/alerts.inc.php(611): ExtTransports(Array)

#4 /opt/librenms/includes/alerts.inc.php(813): IssueAlert(Array)

#5 /opt/librenms/alerts.php(48): RunAlerts()

#6 {main}

Regards,

Duncan


#15

You have syntax errors in your alert templates.
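
The "unexpected ‘}’" in the eval()'d code comes from the legacy template path (`legacyBody`), so one plausible cause is a stray or missing closing tag in a legacy template. The sketch below is only a rough pre-check, assuming the legacy syntax uses `{if ...}` / `{/if}` and `{foreach ...}` / `{/foreach}` pairs. It is a hypothetical helper, not LibreNMS's own parser, and the sample templates are made up.

```python
import re
from typing import List

# Opening and closing block tags assumed for legacy templates.
TAG = re.compile(r"\{(/?)(if|foreach)\b[^}]*\}")


def unbalanced_tags(template: str) -> List[str]:
    """Report closing tags with no opener and openers that are never closed."""
    problems = []
    stack = []
    for m in TAG.finditer(template):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected {{/{name}}} at offset {m.start()}")
    problems.extend(f"unclosed {{{name}}}" for name in stack)
    return problems


# Made-up templates for illustration.
good = "{if %state == 0}recovered{else}alert{/if}{foreach %faults}#%key{/foreach}"
bad = "{if %state == 0}recovered{/if}{/if}"

print(unbalanced_tags(good))
print(unbalanced_tags(bad))
```

A clean result here does not prove a template is valid; `./scripts/test-template.php` or converting the template in the web UI remains the real check.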


#16

Hi All,

I have now looked at the templates. There were three legacy templates: two could be converted and then updated, but one threw an error when it was converted and updated.

We have now deleted all the templates and they have re-created themselves.

We are now receiving alerts via email and Teams again.

Thanks for your input.

Regards,

Duncan