Do you have a specific question?
We have about 2000 devices and have recently been having a lot of issues with polling and keeping the poller running.
When I check poller_wrapper.log I am seeing lots of "disk quota exceeded" errors and sometimes MySQL errors (1040, 'Too many connections').
We have already increased the MySQL max_connections from the default.
The VM has 32 GB of RAM, 2 sockets, and 12 cores.
Any recommendations on what is happening here, or whether we need to change more settings to handle this many devices?
Have you enabled exporting of poller data to an external InfluxDB system? Maybe try disabling that first to see if the poller keeps up on its own with just the internal MySQL and RRD data stores. If so, you know the InfluxDB warnings are the cause and you can work on addressing the disk quota and ingest settings on that side.
We are not exporting poller data to an external InfluxDB; we are running it locally.
Thank you for the idea.
Meaning you installed InfluxDB on your LibreNMS server and then configured something like this to ship metrics to Influx?
If so, I would still disable that temporarily to see if the poller works without metrics shipping first and go from there.
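For reference, the InfluxDB (v1) export in LibreNMS is driven from config.php, so disabling it temporarily is a one-line change. A minimal sketch, assuming the standard influxdb keys from the LibreNMS docs (host/port/db values below are placeholders):

```php
<?php
// config.php -- LibreNMS InfluxDB (v1) export settings.
// Setting 'enable' to false stops the pollers from shipping metrics to Influx
// while leaving the MySQL and RRD data stores untouched.
$config['influxdb']['enable']    = false;       // flip back to true once Influx is healthy
$config['influxdb']['transport'] = 'http';
$config['influxdb']['host']      = '127.0.0.1'; // placeholder: local InfluxDB instance
$config['influxdb']['port']      = '8086';
$config['influxdb']['db']        = 'librenms';
```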
You need to get a handle on what is happening with your InfluxDB and why it is generating disk quota exceeded messages (check the InfluxDB logs, disk usage, and whether you have quotas set on your filesystems, etc.).
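Assuming a stock InfluxDB 1.x install with data under /var/lib/influxdb (adjust paths and service names to your layout), a quick triage could look like:

```bash
# Recent InfluxDB log entries (systemd-based installs)
journalctl -u influxdb --since "1 hour ago" | tail -n 50

# Space used by the Influx data/WAL directories, and what is left on that filesystem
du -sh /var/lib/influxdb/data /var/lib/influxdb/wal
df -h /var/lib/influxdb

# If filesystem quotas are enabled, check whether the influxdb user is hitting one
quota -u influxdb 2>/dev/null || echo "no user quotas configured"
```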
From an operational availability perspective I would put a Telegraf instance between LibreNMS and InfluxDB: Telegraf takes the InfluxDB writes and batches them for sending on to InfluxDB. That way, if InfluxDB is slow or unavailable it won't stop your pollers from working. We have a Telegraf instance co-deployed with each poller.
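A rough sketch of that pattern in telegraf.conf: Telegraf listens on an InfluxDB line-protocol endpoint that LibreNMS writes to, buffers the points in memory, and forwards them in batches to the real InfluxDB. Hostnames, ports and buffer sizes below are placeholders you would size for your own load:

```toml
[agent]
  interval = "10s"
  flush_interval = "10s"
  metric_batch_size = 5000       # points sent per write to InfluxDB
  metric_buffer_limit = 500000   # points held in memory if InfluxDB is slow or down

# Accept InfluxDB v1 line-protocol writes from LibreNMS (point LibreNMS at this port)
[[inputs.influxdb_listener]]
  service_address = ":8186"

# Forward the buffered metrics to the real InfluxDB instance
[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]   # placeholder hostname
  database = "librenms"
```

The key point is the buffer: if InfluxDB stalls, Telegraf holds the poller output (up to metric_buffer_limit points) instead of blocking the pollers.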
You can set up MySQL monitoring in LibreNMS and keep an eye on 'max_connections', 'Max_used_connections', etc. We have ours set to 4,000 and are using about 2.95k connections. From memory you get one connection per poller thread, so if you have 10 pollers running 90 threads each that is roughly 900 connections.
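A hedged example of checking those numbers by hand (standard MySQL/MariaDB variables; the 4,000 figure is just the limit mentioned above):

```sql
-- Configured limit and the high-water mark of connections actually used
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';

-- What is connected right now (handy for spotting per-poller connection counts)
SHOW PROCESSLIST;
```

To raise the limit, `SET GLOBAL max_connections = 4000;` takes effect immediately, and `max_connections = 4000` under `[mysqld]` in my.cnf makes it survive restarts.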
If you have the whole deployment on a single device you may be running into various limits.
If you are using Redis: we needed to raise our connection limit to 12k (currently 8k connected) and adjust the /etc/security/limits.d/redis.conf file.
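For reference, a sketch of the sort of changes involved; the numbers are illustrative and should match your own connection counts, and a systemd-managed Redis may also need LimitNOFILE raised in its unit file:

```
# /etc/security/limits.d/redis.conf
# Raise the open-file limit for the redis user; every client connection uses a file descriptor.
redis  soft  nofile  65535
redis  hard  nofile  65535

# /etc/redis/redis.conf
# Allow more simultaneous client connections (Redis caps this via maxclients and the fd limit).
maxclients 12000
```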
We use distributed pollers (54) and a high-spec central server to run rrdcached/MySQL/InfluxDB, with all its I/O backed by a large multi-NVMe array.
We are doing ~6k devices, ~480k ports