LibreNMS getting slow and eventlog widget gets canceled or 504

Hi,

The network team was busy adding devices with a lot of ports and now they complain about degraded performance of the WebUI and I also get 504s for the eventlog widget


After setting fastcgi_read_timeout 300; in /etc/nginx/sites-enabled/librenms.vhost I got consistent canceled messages and no more 504 errors.

Is there a guide, besides Performance - LibreNMS Docs, on how to find the bottlenecks and how to improve performance?

Thanks in advance

This is the issue, librenms by default tries to discover newly added devices every 5 minutes ( below is the cron )

*/5 * * * * librenms /opt/librenms/discovery.php -h new >> /dev/null 2>&1

when huge number of devices with huge number of ports are added, initial discovery process will take long time and simultaneously other polling jobs also running

once initial discovery of new devices is completed, this should become normal

Discovery and polling are also getting slower but this is about the Web UI and the eventlog dashboard widget.

Found the following bottleneck in the db with the help of a coworker:

use librenms;select * from `eventlog` where `eventlog`.`device_id` in (select `device_id` from `device_group_device` where `device_group_id` = 5) order by `datetime` desc limit 50 offset 0;
...
50 rows in set (2 min 53.973 sec)

This is verry slow!

The following sub query isn’t looking too bad but it’s multiplied by the number of rows in eventlog:

select `device_id` from `device_group_device` where `device_group_id` = 5;
...
136 rows in set (0.001 sec)

We see no need to query the group more then once and cache the result:

set @cisco := (select group_concat(`device_id` separator ',') from `device_group_device` where `device_group_id` = 5);

Now it is 21’500 times faster:

use librenms;select * from `eventlog` where `eventlog`.`device_id` in (@cisco) order by `datetime` desc limit 50 offset 0;
...
50 rows in set (0.008 sec)

So, digging around, I found out how this could be implemented.

Before

public function scopeInDeviceGroup($query, $deviceGroup)
    {
        return $query->whereIn($query->qualifyColumn('device_id'), function ($query) use ($deviceGroup) {
            $query->select('device_id')
                ->from('device_group_device')
                ->where('device_group_id', $deviceGroup);
        });
    }

After:

public function scopeInDeviceGroup($query, $deviceGroup)
    {
        // Build the comma-separated list of device IDs in SQL
        $deviceIdsSubquery = \DB::table('device_group_device')
        ->where('device_group_id', $deviceGroup)
        ->pluck('device_id')
        ->implode(',');

        // Use the result in the whereIn clause
        return $query->whereIn($query->qualifyColumn('device_id'), explode(',', $deviceIdsSubquery));
    }

The performance, while still not instant, is way better than before.

1 Like

@murrant what’s your opinion about this solution?

Still some optimizations to be had, but any improvement is welcomed.
Send a PR on Github if you want it included in LibreNMS.

This will solve it: Prefetch devices in group to avoid cost of subquery by slalomsk8er · Pull Request #15511 · librenms/librenms · GitHub

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.