Redis value sample by querying “KEYS *”: https://gist.github.com/angryp/c75ba088a480fca8f42f3b68a32dd69c
Redis log sample contents (continuing for a long time):
31501:M 11 Oct 08:28:35.076 * 10 changes in 300 seconds. Saving...
31501:M 11 Oct 08:28:35.080 * Background saving started by pid 6700
6700:C 11 Oct 08:28:35.095 * DB saved on disk
6700:C 11 Oct 08:28:35.098 * RDB: 1 MB of memory used by copy-on-write
31501:M 11 Oct 08:28:35.184 * Background saving terminated with success
Pollers section screenshot: http://prntscr.com/l4rgvj
All pollers are running with INFO log level. One of the hosts that declared himself a Cluster master has the following entry in the logs:
2018-10-10 18:26:57,534 Poller_18-158(ERROR):Poller poller exception! [Errno 24] Too many open files
Traceback (most recent call last):
File "/add_disk/librenms/LibreNMS/queuemanager.py", line 58, in _service_worker
work_func(device_id)
File "/add_disk/librenms/LibreNMS/service.py", line 445, in poll_device
self.call_script('poller.php', ('-h', device_id))
File "/add_disk/librenms/LibreNMS/service.py", line 571, in call_script
return subprocess.check_output(cmd, stderr=subprocess.STDOUT, preexec_fn=os.setsid, close_fds=True).decode()
File "/usr/lib/python3.5/subprocess.py", line 316, in check_output
**kwargs).stdout
File "/usr/lib/python3.5/subprocess.py", line 383, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1185, in _execute_child
errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files
Interesting, this may be the reason. I will update ulimit and restart pollers - will see how it will go.
Here is how switching happens:
2018-10-01 10:28:23,799 poller(INFO):poller is now the master dispatcher
2018-10-01 10:29:12,521 poller(INFO):Creating queue poller:4
2018-10-01 10:29:12,931 poller(INFO):Creating queue discovery:4
2018-10-01 10:29:13,843 poller(INFO):Creating queue poller:8
2018-10-01 10:29:16,791 poller(INFO):Creating queue poller:10
2018-10-01 10:29:24,056 poller(INFO):Creating queue poller:2
2018-10-01 10:29:58,844 poller(INFO):Creating queue poller:14
2018-10-01 10:30:00,384 performance(INFO):Counting up time spent polling
2018-10-01 10:30:00,801 Thread-5(INFO):Using pure python SQL client
2018-10-01 10:30:03,833 poller(INFO):Creating queue poller:11
2018-10-01 10:30:04,351 poller(INFO):Creating queue discovery:11
2018-10-01 10:30:04,868 poller(INFO):Creating queue poller:3
2018-10-01 10:30:10,686 poller(INFO):Creating queue discovery:2
2018-10-01 10:30:11,415 poller(INFO):Creating queue poller:1
2018-10-01 10:30:12,659 poller(INFO):Creating queue discovery:1
2018-10-01 10:30:13,376 poller(INFO):Creating queue discovery:8
2018-10-01 10:30:16,156 poller(INFO):Creating queue poller:12
2018-10-01 10:30:16,666 poller(INFO):Creating queue discovery:12
2018-10-01 10:30:26,956 poller(INFO):Creating queue poller:13
2018-10-01 10:30:27,476 poller(INFO):Creating queue discovery:13
2018-10-01 10:30:42,948 poller(INFO):Creating queue poller:22
2018-10-01 10:30:47,608 poller(INFO):Creating queue poller:19
2018-10-01 10:30:49,767 poller(INFO):Creating queue poller:5
2018-10-01 10:30:50,177 poller(INFO):Creating queue discovery:5
2018-10-01 10:31:02,101 poller(INFO):poller is no longer the master dispatcher