Race condition triggered by the dispatcher service (librenms-service.py)

At least two of us are experiencing a race condition triggered by the dispatcher service. Does anyone with sufficient Python-fu know how we should proceed with debugging this?

Code in question:

The traceback below is with Python 3.6.8:

Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: Traceback (most recent call last):
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/opt/librenms/librenms-service.py", line 48, in <module>
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: service.start()
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/opt/librenms/LibreNMS/service.py", line 404, in start
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: sleep(self.config.master_resolution)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/opt/librenms/LibreNMS/service.py", line 550, in reap
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: handler = signal(SIGCHLD, SIG_DFL)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/usr/lib64/python3.6/signal.py", line 48, in signal
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: return _int_to_enum(handler, Handlers)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/usr/lib64/python3.6/signal.py", line 30, in _int_to_enum
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: return enum_klass(value)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/usr/lib64/python3.6/enum.py", line 293, in __call__
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: return cls.__new__(cls, value)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/usr/lib64/python3.6/enum.py", line 535, in __new__
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: return cls._missing_(value)
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: File "/usr/lib64/python3.6/enum.py", line 548, in _missing_
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: raise ValueError("%r is not a valid %s" % (value, cls.__name__))
Dec 15 11:13:21 librenms.thisisit.it librenms-service.py[19818]: TypeError: 'int' object is not callable


The traceback below is with Python 3.8.6:

Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: Traceback (most recent call last):
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: File "/opt/librenms/librenms-service.py", line 48, in <module>
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: Poller_0-36(INFO):Completed poller run for 1206 in 7.16s
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: service.start()
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: File "/opt/librenms/LibreNMS/service.py", line 404, in start
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: Poller_0-36(INFO):Polling device 622
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: sleep(self.config.master_resolution)
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: File "/opt/librenms/LibreNMS/service.py", line 550, in reap
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: handler = signal(SIGCHLD, SIG_DFL)
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: File "/opt/rh/rh-python38/root/lib64/python3.8/signal.py", line 48, in signal
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: return _int_to_enum(handler, Handlers)
Dec 18 13:40:08 librenms.thisisit.it librenms-service.py[9470]: TypeError: 'int' object is not callable


This traceback is from a different environment, running Python 3.8.5:

Dec 17 11:09:03 HOST librenms-service.py[755]: Traceback (most recent call last):
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/librenms-service.py", line 48, in <module>
Dec 17 11:09:03 HOST librenms-service.py[755]:     service.start()
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/LibreNMS/service.py", line 395, in start
Dec 17 11:09:03 HOST librenms-service.py[755]:     self.dispatch_immediate_polling(device_id, group)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/LibreNMS/service.py", line 425, in dispatch_immediate_polling
Dec 17 11:09:03 HOST librenms-service.py[755]:     self.poller_manager.post_work(device_id, group)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/LibreNMS/queuemanager.py", line 83, in post_work
Dec 17 11:09:03 HOST librenms-service.py[755]:     self.get_queue(queue_id).put(payload)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/LibreNMS/__init__.py", line 321, in put
Dec 17 11:09:03 HOST librenms-service.py[755]:     self._redis.zadd(self.key, {item: time()}, nx=True)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3/dist-packages/redis/client.py", line 2391, in zadd
Dec 17 11:09:03 HOST librenms-service.py[755]:     return self.execute_command('ZADD', name, *pieces, **options)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3/dist-packages/redis/client.py", line 838, in execute_command
Dec 17 11:09:03 HOST librenms-service.py[755]:     conn.send_command(*args)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3/dist-packages/redis/connection.py", line 686, in send_command
Dec 17 11:09:03 HOST librenms-service.py[755]:     self.send_packed_command(self.pack_command(*args),
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3/dist-packages/redis/connection.py", line 666, in send_packed_command
Dec 17 11:09:03 HOST librenms-service.py[755]:     sendall(self._sock, item)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3/dist-packages/redis/_compat.py", line 8, in sendall
Dec 17 11:09:03 HOST librenms-service.py[755]:     return sock.sendall(*args, **kwargs)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/opt/librenms/LibreNMS/service.py", line 550, in reap
Dec 17 11:09:03 HOST librenms-service.py[755]:     handler = signal(SIGCHLD, SIG_DFL)
Dec 17 11:09:03 HOST librenms-service.py[755]:   File "/usr/lib/python3.8/signal.py", line 47, in signal
Dec 17 11:09:03 HOST librenms-service.py[755]:     handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
Dec 17 11:09:03 HOST librenms-service.py[755]: TypeError: 'int' object is not callable
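All three tracebacks show `reap` running on top of unrelated code (`sleep(self.config.master_resolution)` in `start`, and `sock.sendall` inside redis-py), which is what a Python signal handler looks like when it interrupts the main thread. One plausible mechanism, not confirmed by anyone in this thread: if `reap` switches the SIGCHLD disposition to `SIG_DFL` while another SIGCHLD is already pending, the interpreter may later try to call the stored `SIG_DFL` (an integer) as the handler, which would produce exactly `TypeError: 'int' object is not callable`. The sketch below is a hypothetical reaper, not LibreNMS's actual code: it never swaps the handler, drains every exited child with `os.waitpid(-1, os.WNOHANG)`, and records which function and line each signal interrupted, which may help when debugging. All names (`reap_children`, `reaped`, `debug_log`, `main`) are illustrative, and it assumes a POSIX system.

```python
import os
import signal
import time

# Hypothetical sketch; none of these names come from LibreNMS.
reaped = []     # (pid, raw wait status) for every child collected
debug_log = []  # (function name, line number) interrupted by each SIGCHLD


def reap_children(signum, frame):
    """SIGCHLD handler that leaves its own registration untouched."""
    # Debugging aid: record where in the main program the signal landed.
    if frame is not None:
        debug_log.append((frame.f_code.co_name, frame.f_lineno))
    # Signals can coalesce, so one SIGCHLD may stand for several exited
    # children. Collect all of them without blocking.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no child processes remain at all
        if pid == 0:
            return  # children exist, but none have exited yet
        reaped.append((pid, status))


def main():
    # Install the handler once, before any child is forked.
    signal.signal(signal.SIGCHLD, reap_children)
    pids = []
    for code in (0, 1, 2):
        pid = os.fork()
        if pid == 0:
            os._exit(code)  # child: exit immediately with a distinct code
        pids.append(pid)
    # Parent: idle until every child has been reaped (or give up after 5 s).
    deadline = time.monotonic() + 5
    while len(reaped) < len(pids) and time.monotonic() < deadline:
        time.sleep(0.05)
    return pids


if __name__ == "__main__":
    main()
    for pid, status in reaped:
        print(pid, os.WEXITSTATUS(status), debug_log)
```

The design choice here is that the handler only reaps. Nothing inside it calls `signal.signal`, so there is never a window in which the registered handler is a non-callable disposition.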

Did you get around to testing this at all? (I updated the documentation and code a little recently.)

This is so that the service at least restarts itself when it crashes. I cannot put this into production if all my pollers can just exit and leave.

As noted on Discord, the built-in functionality appears to do the job for me. There are still gaps in the data, but the service restarts automatically.
