"Active alerts" dashboard widget takes a long time to load (>15 seconds)

frank42 · 15 September 2023 14:40

Hi,

First of all, my validate.php output:

===========================================
Component | Version
--------- | -------
LibreNMS  | 23.8.2-54-g4b98bd760 (2023-09-15T14:29:15+02:00)
DB Schema | 2023_08_30_105156_add_applications_soft_deleted (259)
PHP       | 8.1.23
Python    | 3.9.2
Database  | MariaDB 10.5.19-MariaDB-0+deb11u2
RRDTool   | 1.7.2
SNMP      | 5.9
===========================================

[OK]    Composer Version: 2.6.2
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQl and PHP time match
[OK]    Active pollers found
[OK]    Dispatcher Service is enabled
[OK]    Locks are functional
[OK]    No active python wrapper pollers found
[OK]    Redis is functional
[OK]    rrdtool version ok
[OK]    Connected to rrdcached

For some time now, the “Active alerts” box on my dashboard has been taking about 15-20 seconds to display its content on each refresh, i.e. the page itself has finished loading, but the box keeps showing “Loading…” below the title column. The same problem occurs on the /alerts subpage (via Alerts → Notifications).

According to the dashboard widget, I currently have 124 devices and 50 active alerts (most of them “Port status up/down” alerts, because my networking crew doesn’t remember to disable unused switch ports). I checked the database table “alerts”, which holds 870 entries in total across all 124 devices. This table also contains several entries for hosts that have no active alerts.

Example:

MariaDB [librenms]> SELECT * FROM alerts where device_id = 2;
+------+-----------+---------+-------+---------+------+------+---------------------+------+
| id   | device_id | rule_id | state | alerted | open | note | timestamp           | info |
+------+-----------+---------+-------+---------+------+------+---------------------+------+
| 2889 |         2 |      14 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2890 |         2 |       1 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2891 |         2 |       3 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2892 |         2 |       4 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2893 |         2 |       9 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2894 |         2 |      10 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2895 |         2 |      11 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
| 2896 |         2 |      12 |     0 |       0 |    0 | NULL | 2023-09-15 15:42:20 |      |
+------+-----------+---------+-------+---------+------+------+---------------------+------+
8 rows in set (0.001 sec)

Even after I ran DELETE FROM alerts WHERE device_id = 2; the entries reappeared a few minutes later:

MariaDB [librenms]> SELECT * FROM alerts where device_id = 2;
+------+-----------+---------+-------+---------+------+------+---------------------+------+
| id   | device_id | rule_id | state | alerted | open | note | timestamp           | info |
+------+-----------+---------+-------+---------+------+------+---------------------+------+
| 2897 |         2 |      14 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2898 |         2 |       1 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2899 |         2 |       3 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2900 |         2 |       4 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2901 |         2 |       9 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2902 |         2 |      10 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2903 |         2 |      11 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
| 2904 |         2 |      12 |     0 |       0 |    0 | NULL | 2023-09-15 16:12:23 |      |
+------+-----------+---------+-------+---------+------+------+---------------------+------+
8 rows in set (0.000 sec)

I assume that there’s some function that recreates them based on the alert_log table, but I have no clue why this is done, since there are no active alerts on that machine. I also have no idea whether this could be a reason for the problem here…
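
To see how many of the rows in “alerts” are inactive, the table could be broken down by state. Below is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the MariaDB librenms database. The eight rows for device 2 are copied from the output above; the single state = 1 row for device 7 is a made-up example for contrast, and the helper name count_by_state is hypothetical. That state = 0 means “not active” is only an assumption based on the rows shown above.

```python
import sqlite3

# In-memory SQLite stands in for the MariaDB "librenms" database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts ("
    "id INTEGER PRIMARY KEY, device_id INTEGER, rule_id INTEGER, state INTEGER)"
)

# Rows for device 2 as shown in the query output above (all state 0).
rows = [
    (2889, 2, 14, 0),
    (2890, 2, 1, 0),
    (2891, 2, 3, 0),
    (2892, 2, 4, 0),
    (2893, 2, 9, 0),
    (2894, 2, 10, 0),
    (2895, 2, 11, 0),
    (2896, 2, 12, 0),
    # Hypothetical row, not from the post: one active alert on another device.
    (9999, 7, 1, 1),
]
conn.executemany("INSERT INTO alerts VALUES (?, ?, ?, ?)", rows)


def count_by_state(connection):
    """Return a dict mapping each state value to its number of rows."""
    cur = connection.execute(
        "SELECT state, COUNT(*) FROM alerts GROUP BY state ORDER BY state"
    )
    return dict(cur.fetchall())


counts = count_by_state(conn)
print(counts)
```

The SELECT itself is plain SQL, so it should also run unchanged at the MariaDB prompt against the real table.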

For testing purposes, I removed the suid bit from /usr/bin/fping, which put all 124 configured hosts into ‘ping critical’ in the active alerts. While this condition lasted, the Active alerts box loaded smoothly, within a second, at least as long as I kept the “show 50 entries” setting in the dropdown. If I switch to showing “All” entries, the delay occurs again. The massive delay came back once I restored the suid bit and all the ICMP unreachable alerts cleared, which makes me believe that the delay has something to do with the entries in the alerts database table.

Honestly, I have no idea how to debug this, so any hints are appreciated.

system · 14 December 2023 14:40

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.