Slow database queries after server migration

I’m trying to migrate a single VM librenms install to a single VM + a CloudSQL db in GCP. I followed the process described https://docs.librenms.org/Support/FAQ/#how-do-i-move-my-librenms-install-to-another-server and got the new instance up and running. Unfortunately the resulting performance is really bad. Increasing the instance size hasn’t helped. The new install has about 200 devices and 9000 ports & is currently running on a 16 core / 60G web & poller instance with a 4 core / 16G db instance.

Using the chrome dev tools I can see 15 to 30 second load times on every graph image which seems to be the primary performance culpret. With slow query logging enabled I see the same query repeatedly taking ~15 seconds.

{
“textPayload”: “select count(*) as aggregate from notifications left join notifications_attribs on notifications_attribs.notifications_id = notifications.notifications_id where (notifications_attribs.key = ‘sticky’ and notifications_attribs.value = 1) or (not exists (select 1 from notifications_attribs where notifications.notifications_id = notifications_attribs.notifications_id and notifications_attribs.user_id = 17));”,
“insertId”: “1#291195673885#5570976157431335143#slow#1567701247728874465#267372-0@a5”,
“resource”: {
“type”: “cloudsql_database”,
“labels”: {
“project_id”: “<<>>”,
“database_id”: “<<>>:librenms-prod2-dev”,
“region”: “us-west2”
}
},
“timestamp”: “2019-09-05T16:34:07.661972Z”,
“logName”: “projects/<<>>/logs/cloudsql.googleapis.com%2Fmysql-slow.log”,
“receiveTimestamp”: “2019-09-05T16:34:13.063899706Z”
},
{
“textPayload”: “# Query_time: 15.769743 Lock_time: 0.000019 Rows_sent: 1 Rows_examined: 29320993”,
“insertId”: “1#291195673885#5570976157431335143#slow#1567701247728874465#267372-0@a3”,
“resource”: {
“type”: “cloudsql_database”,
“labels”: {
“project_id”: “<<>>”,
“database_id”: “<<>>:librenms-prod2-dev”,
“region”: “us-west2”
}
},
“timestamp”: “2019-09-05T16:34:07.661972Z”,
“logName”: “projects/<<>>/logs/cloudsql.googleapis.com%2Fmysql-slow.log”,
“receiveTimestamp”: “2019-09-05T16:34:13.063899706Z”
}

Since the original instance runs fine (~2s page loads) the move from mariadb 10.1 to the cloud mysql 5.7 instance would seem to be the culprit but I’m unsure as to whether this is tuning, configuration, data issue, or an application bug.

$ ./validate.php

Component Version
LibreNMS 1.55-9-g9ac70037b
DB Schema 2019_07_09_150217_update_users_widgets_settings (140)
PHP 7.2.19-0ubuntu0.18.04.2
MySQL 5.7.14-google-log
RRDTool 1.7.0
SNMP NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.9.0
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[WARN] Some devices have not been polled in the last 5 minutes. You may have performance issues.
[FIX]:
Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
xx.xx.xx.xx

Dug into this some more. Can replicate the slow query directly via CLI as follows

MySQL [librenms]> select count(*) as aggregate from `notifications` left join `notifications_attribs` on `notifications_attribs`.`notifications_id` = `notifications`.`notifications_id` where (`notifications_attribs`.`key` = 'sticky' and `notifications_attribs`.`value` = 1) or (not exists (select 1 from `notifications_attribs`  where notifications.notifications_id = notifications_attribs.notifications_id and  `notifications_attribs`.`user_id` = 17));
+-----------+
| aggregate |
+-----------+
|         0 |
+-----------+
1 row in set (11.30 sec)

using EXPLAIN I see

MySQL [librenms]> EXPLAIN select count(*) as aggregate from `notifications` left join `notifications_attribs` on `notifications_attribs`.`notifications_id` = `notifications`.`notifications_id` where (`notifications_attribs`.`key` = 'sticky' and `notifications_attribs`.`value` = 1) or (not exists (select 1 from `notifications_attribs`  where notifications.notifications_id = notifications_attribs.notifications_id and  `notifications_attribs`.`user_id` = 17));
+----+--------------------+-----------------------+------------+-------+---------------+------------------------------+---------+------+------+----------+----------------------------------------------------+
| id | select_type        | table                 | partitions | type  | possible_keys | key                          | key_len | ref  | rows | filtered | Extra                                              |
+----+--------------------+-----------------------+------------+-------+---------------+------------------------------+---------+------+------+----------+----------------------------------------------------+
|  1 | PRIMARY            | notifications         | NULL       | index | NULL          | notifications_severity_index | 5       | NULL |   52 |   100.00 | Using index                                        |
|  1 | PRIMARY            | notifications_attribs | NULL       | ALL   | NULL          | NULL                         | NULL    | NULL |  799 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DEPENDENT SUBQUERY | notifications_attribs | NULL       | ALL   | NULL          | NULL                         | NULL    | NULL |  799 |     1.00 | Using where                                        |
+----+--------------------+-----------------------+------------+-------+---------------+------------------------------+---------+------+------+----------+----------------------------------------------------+
3 rows in set, 2 warnings (0.00 sec)

So I added an index

MySQL [librenms]> alter table notifications_attribs add INDEX(notifications_id);
Query OK, 0 rows affected (2.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

And the query time drops from 11s -> 0.01s & all the web performance issues are gone. As expected - running validate.php now complains about the additional index so I suspect this is not the correct solution.

just like ours’ and acritox issue:

hopefully this should be resolved now https://github.com/librenms/librenms/pull/10589

1 Like