Poller.php stuck on same device

We are seeing very long polling times for a few devices, and in fact the poller.php process gets "stuck". Here is one poller's output of ps -auxf | grep poller.php: you can see device #621 repeated many times, and from the date/time columns (9 and 10) you can tell that some poller.php processes began days ago.

(You may notice devices 617 and 625 are in a similar state.)
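
Something like the following makes the elapsed time easier to see than parsing the START/TIME columns (the etimes column from procps is elapsed seconds, and the [p] trick keeps grep from matching itself):

$ ps -eo pid,etimes,args | grep '[p]oller.php' | sort -k2 -rn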

Is there a way to prevent this from happening? Should the distributed pollers know not to poll the same device multiple times?

Maybe there is a way to set a maximum run time for the poller process, so that it doesn't stall and multiple polls of the same device don't get started?
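
For example, if the poller were invoked from cron rather than from the dispatcher service we are running, something like GNU coreutils timeout could cap each run. This is only a sketch; the 900-second limit, install path and log path are placeholders:

# sketch only: kill a single poller run after 15 minutes
$ timeout --signal=TERM 900 /opt/librenms/poller.php -h 621 >> /var/log/librenms/poller.log 2>&1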

Note: we are using distributed pollers with redis (not memcached), and there are three of them.

Output of validate:

$ ./validate.php 
====================================
Component | Version
--------- | -------
LibreNMS  | 21.6.0
DB Schema | 2021_25_01_0127_create_isis_adjacencies_table (210)
PHP       | 7.3.29-1~deb10u1
Python    | 3.7.3
MySQL     | 10.3.29-MariaDB-0+deb10u1
RRDTool   | 1.7.1
SNMP      | NET-SNMP 5.7.3
====================================
[OK]    Composer Version: 2.1.4
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database schema correct
[INFO]  Detected Dispatcher Service
[WARN]  Your local git contains modified files, this could prevent automatic updates.
        [FIX]: 
        You can fix this with ./scripts/github-remove
        Modified Files:
         rrd/.gitignore

You may ask whether I can post the output of poller.php -v -d -h 621. The answer is that the file is huge (about 8 MB of text), but here is the summary from the end:

./poller.php 621 2021-07-18 09:16:38 - 1 devices polled in 15195 secs  
SNMP [46/71.43s]: Get[15/0.81s] Getnext[4/0.22s] Walk[27/70.41s]
MySQL [1101/144158.08s]: Cell[2/0.00s] Row[-1/-0.00s] Rows[17/0.56s] Column[1/0.00s] Update[1081/144157.44s] Insert[1/0.08s] Delete[0/0.00s]
RRD [8242/4961.30s]: Other[4121/4960.87s] Update[4121/0.43s]
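
Almost all of that time is in MySQL Update (1081 updates, ~144,157 s), so the poller looks blocked on database writes rather than SNMP. While a poll is stuck, something like this shows what the database server is actually doing (the librenms user/database names are assumptions; adjust for your install):

# what is the DB doing right now?
$ mysql -u librenms -p librenms -e "SHOW FULL PROCESSLIST;"

# any long-running or lock-waiting InnoDB transactions?
$ mysql -u librenms -p librenms -e "SELECT trx_id, trx_state, trx_started, trx_query FROM information_schema.INNODB_TRX\G"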

Go to https://my-librenms/device/621/graphs/group=poller/ and check which module takes so long, and disable it accordingly.

We know from the poller debug/verbose output that it's the bgpPeer module, but that data is of high importance to our organization and we can't simply disable it.
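
To keep the debug output manageable we can limit the poller to just that module with the -m flag (bgp-peers is, as far as I can tell, the module name behind bgpPeer; check your poller_modules settings if that doesn't match):

$ ./poller.php -h 621 -m bgp-peers -d -v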

Maybe there needs to be a way to kill stuck poller processes automatically?
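
Until something like that exists, a blunt workaround might be a cron job along these lines. This is only a sketch: the one-hour threshold is arbitrary, and it would need care so it doesn't kill legitimately long polls:

#!/bin/bash
# kill any poller.php process that has been running longer than MAX_SECONDS
MAX_SECONDS=3600
ps -eo pid,etimes,args | grep '[p]oller.php' | while read -r pid etimes args; do
    if [ "$etimes" -gt "$MAX_SECONDS" ]; then
        echo "Killing stuck poller (pid $pid, ${etimes}s): $args"
        kill "$pid"
    fi
done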
