Problem with backend; poller can't keep up; gaps in the graphs

mhmh · 26 September 2024 20:04

12 vCPU; 64 GB RAM; NVMe SSD all-flash SAN with dual 10GbE; network with dual GbE; Debian 12; single LibreNMS instance running on a Proxmox HA cluster; 569 hosts

This system manages Cisco, Juniper and Palo Alto devices.

We were using the cron-based poller, which couldn’t complete an entire polling cycle. We moved to the librenms service and made some tweaks: we had to increase the number of poller workers and raise max_connections for MariaDB.

Problem solved.

Yes, you did read that correctly: 224 poller workers.

laf · 26 September 2024 22:07

That’s a lot of workers. I don’t think you’ll need that many!

What are the typical poll times for the devices you are polling?

mhmh · 26 September 2024 23:07

I agree, it is quite a number. I started out with 16, then 24, and kept increasing the count until the poller could finally get through all devices in under 300 seconds. I found that I had to increase max_connections in MariaDB along with the worker count.

Many devices were taking 100-200+ seconds to poll (though many others were well below that range). The system has been running well for almost 72 hours now.

I also increased max_repeaters (to 150) and max_oids (to 75).

This was after running MySQLTuner and making adjustments there too. I still need to optimize PHP.
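
As a rough sanity check on the worker count, the total polling work can be divided by the polling interval. The sketch below is only an estimate under stated assumptions: each worker polls one device at a time, scheduling and database overhead are ignored, and the function name and the sample poll times are hypothetical, not measurements from this system.

```python
import math


def estimate_workers(poll_times_s, interval_s=300.0):
    """Lower bound on workers needed to poll every device once per interval.

    Assumes each worker handles one device at a time and ignores scheduling
    and database overhead, so a real deployment needs headroom on top.
    """
    if not poll_times_s:
        return 0
    slowest = max(poll_times_s)
    if slowest > interval_s:
        # No number of workers helps if a single device exceeds the interval.
        raise ValueError(
            f"a device takes {slowest}s, longer than the {interval_s}s interval"
        )
    # Total device-seconds of work, spread over workers that each have
    # interval_s seconds available per cycle.
    return math.ceil(sum(poll_times_s) / interval_s)


# Hypothetical mix, not measured data: 100 slow devices at 150 s each
# and 469 fast devices at 20 s each (569 devices in total).
poll_times = [150.0] * 100 + [20.0] * 469
print(estimate_workers(poll_times))
```

Feeding in real per-device poll times instead of the made-up list gives a floor for the worker count; if the number that actually works is far above that floor, the bottleneck is probably somewhere other than raw polling time.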

murrant · 29 September 2024 19:06

Increasing the number often causes more trouble than decreasing it.

I have about 800 devices in my instance and run 32 workers. However, I also have low latency to most devices.

Good luck!
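
One way to compare the two setups in this thread is the average time each device can take before a polling cycle overruns. This is a back-of-the-envelope sketch: the function name is hypothetical, the 300-second interval is taken from the earlier post and only assumed for the 800-device instance, and work is assumed to spread evenly across workers.

```python
def per_device_budget_s(devices, workers, interval_s=300.0):
    """Average seconds each device may take if work spreads evenly.

    Each worker has interval_s seconds per cycle and must cover
    devices / workers devices in that time.
    """
    if devices <= 0 or workers <= 0:
        raise ValueError("devices and workers must be positive")
    return interval_s * workers / devices


# Figures quoted in this thread; the 300 s interval for the second
# instance is an assumption, not something stated by its owner.
print(round(per_device_budget_s(569, 224), 1))  # 569 hosts, 224 workers
print(round(per_device_budget_s(800, 32), 1))   # about 800 devices, 32 workers
```

A large gap between the two budgets would be consistent with one instance having slow devices and the other having low latency to most of its devices.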
