We are seeing very long polling times for a few devices, and in fact the poller.php process gets “stuck”. Here is one poller’s output of ps -auxf | grep poller.php. You can see device #621 is repeated many times. From the date/time columns (#9 and #10), some poller.php processes began days ago.
(You may notice devices 617 and 625 are in a similar state.)
Is there a way to prevent this from happening? Should the distributed pollers know not to poll the same device multiple times?
Maybe there is a way to set a maximum time for the poller process, so it doesn’t stall and multiple polls don’t get started?
Note: we are using three distributed pollers with redis (not memcache).
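To make the "maximum time" idea concrete, here is a minimal Python sketch of what I have in mind. It is not LibreNMS code and I do not know whether the dispatcher service already does something equivalent; `run_with_timeout` is a hypothetical helper name, and the 300-second limit in the comment is only an example value.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    subprocess.run() kills the child and waits for it before raising
    TimeoutExpired, so no stale process is left behind.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Harmless demonstration: a child that sleeps longer than the limit.
    # A real wrapper might instead run something like
    #   run_with_timeout(["./poller.php", "-h", "621"], 300)
    result = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], 1
    )
    print("timed out" if result is None else f"exit code {result}")
```

A wrapper like this would only cap the run time; it would not by itself stop another poller from starting the same device again.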
Output of validate:
$ ./validate.php
====================================
Component | Version
--------- | -------
LibreNMS | 21.6.0
DB Schema | 2021_25_01_0127_create_isis_adjacencies_table (210)
PHP | 7.3.29-1~deb10u1
Python | 3.7.3
MySQL | 10.3.29-MariaDB-0+deb10u1
RRDTool | 1.7.1
SNMP | NET-SNMP 5.7.3
====================================
[OK] Composer Version: 2.1.4
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[INFO] Detected Dispatcher Service
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
rrd/.gitignore
You may ask whether I can post the output of poller.php -v -d -h 621. The answer is that the file is huge: about 8 MB of text. But here is the summary at the end:
./poller.php 621 2021-07-18 09:16:38 - 1 devices polled in 15195 secs
SNMP [46/71.43s]: Get[15/0.81s] Getnext[4/0.22s] Walk[27/70.41s]
MySQL [1101/144158.08s]: Cell[2/0.00s] Row[-1/-0.00s] Rows[17/0.56s] Column[1/0.00s] Update[1081/144157.44s] Insert[1/0.08s] Delete[0/0.00s]
RRD [8242/4961.30s]: Other[4121/4960.87s] Update[4121/0.43s]
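To see where the time goes, the three summary lines above can be ranked with a short Python sketch. The regular expression is my own assumption about the `Name[count/seconds]` layout, based only on the lines pasted here, and `slowest_operations` is a hypothetical helper name.

```python
import re

SUMMARY = """\
SNMP [46/71.43s]: Get[15/0.81s] Getnext[4/0.22s] Walk[27/70.41s]
MySQL [1101/144158.08s]: Cell[2/0.00s] Row[-1/-0.00s] Rows[17/0.56s] Column[1/0.00s] Update[1081/144157.44s] Insert[1/0.08s] Delete[0/0.00s]
RRD [8242/4961.30s]: Other[4121/4960.87s] Update[4121/0.43s]
"""

# Matches per-operation entries such as "Walk[27/70.41s]". The category
# totals ("SNMP [46/71.43s]") have a space before "[" and are skipped.
OP = re.compile(r"(\w+)\[(-?\d+)/(-?\d+(?:\.\d+)?)s\]")


def slowest_operations(summary):
    """Return (seconds, count, label) tuples, slowest first."""
    rows = []
    for line in summary.splitlines():
        category = line.split(" [", 1)[0]
        for name, count, secs in OP.findall(line):
            rows.append((float(secs), int(count), f"{category} {name}"))
    return sorted(rows, reverse=True)


for secs, count, label in slowest_operations(SUMMARY)[:3]:
    print(f"{label}: {secs:.2f}s over {count} calls")
```

For this device the ranking is MySQL Update first, then RRD Other, then SNMP Walk. The MySQL Update total (144157.44 s, roughly 133 s per update) is larger than the 15195 s wall-clock time of the whole poll, so I am not sure how that figure is accumulated.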