Zombie Process Issues (second try)

Hi there,

I’m facing the same problem as described in issue 23614, but since that thread was closed without a working solution, I would like to reopen the discussion.

I recently upgraded the server running my LibreNMS installation from Debian 11 to 12 (yes, I know I’m late), and PHP is now on version 8.3.

The usual checks are fine:

librenms@heimdall:~$ ./validate.php
===========================================
Component | Version
--------- | -------
LibreNMS  | 24.11.0-38-g003bc6fdf (2024-12-02T22:36:15+01:00)
DB Schema | 2024_11_22_135845_alert_log_refactor_indexes (308)
PHP       | 8.3.14
Python    | 3.11.2
Database  | MariaDB 10.11.6-MariaDB-0+deb12u1
RRDTool   | 1.7.2
SNMP      | 5.9.3
===========================================

[OK]    Composer Version: 2.8.3
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQL and PHP time match
[OK]    Active pollers found
[OK]    Dispatcher Service is enabled
[OK]    Locks are functional
[OK]    No active python wrapper pollers found
[OK]    Redis is functional
[OK]    rrdtool version ok
[OK]    Connected to rrdcached

At exactly 1 am (CET), the dispatcher service spawns zombie processes.
This has happened daily since the upgrade to PHP 8.3, and the zombies disappear immediately once the dispatcher service is stopped or restarted:

root@heimdall:~# ps auxwww|grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
librenms 1624757  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
librenms 1624909  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
librenms 1626563  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
librenms 1626803  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
librenms 1627102  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
librenms 1627108  0.0  0.0      0     0 ?        Zs   01:00   0:00 [php] <defunct>
root     2829935  0.0  0.0   3324  1500 pts/1    S+   10:05   0:00 grep Z
root@heimdall:~# service librenms stop
root@heimdall:~# ps auxwww|grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     2829993  0.0  0.0   3324  1528 pts/1    S+   10:05   0:00 grep Z
root@heimdall:~#

There are no error messages in storage/logs/, in /var/log/messages or in any other log file, and no cron job runs at exactly this time.

Does anyone have an idea how to debug a problem that produces no error messages at all?
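A zombie only stays in the process table until its parent collects its exit status, so one possible first step is to find out which process is the parent of the defunct php processes (for example, whether it is the dispatcher, presumably its librenms-service.py process). The following is a minimal sketch, assuming the procps `ps` shipped with Debian; `list_zombies` is a hypothetical helper name, not part of LibreNMS.

```shell
#!/bin/sh
# list_zombies (hypothetical helper): print every zombie process together with
# its parent PID and the parent's full command line. A zombie lingers only
# because its parent has not reaped it yet, so the parent is what to examine.
list_zombies() {
  # "pid=,ppid=,stat=,comm=" prints these columns without a header line;
  # awk keeps only the rows whose STAT field starts with Z (zombie).
  ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/' |
  while read -r pid ppid stat comm; do
    # Full command line of the parent; empty if the parent is already gone.
    parent=$(ps -o args= -p "$ppid")
    printf '%s %s %s %s parent: %s\n' "$pid" "$ppid" "$stat" "$comm" "$parent"
  done
}

list_zombies
```

Run it while the zombies are present (for example shortly after 01:00). If the parent turns out to be the dispatcher, that would suggest it is not reaping some of its php workers, which would be useful detail for a bug report.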

Does this actually cause any issues for you, though?

The “working solution” for me was to stop using the dispatcher service, as I mentioned in my thread. Please try that and see whether it works for you.

Jaysen

Hi,

No, not yet. Currently it’s just annoying, but things like this usually don’t stay that way. The zombies might add up and eventually tie up enough resources to hurt performance and reliability.
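To check whether the zombies really add up over time, their number could be logged periodically. This is a minimal sketch, assuming procps `ps`; the function names and the log file passed as an argument are hypothetical. Note that a zombie has already released its memory (the ps output above shows VSZ and RSS of 0), so what accumulates is mainly process table entries and PIDs rather than RAM.

```shell
#!/bin/sh
# count_zombies (hypothetical helper): print the number of processes whose
# STAT field starts with Z; prints 0 when there are none.
count_zombies() {
  ps -eo stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# log_zombie_count (hypothetical helper): append one timestamped line,
# e.g. "2024-12-03 01:05:00 6", to the file given as the first argument.
log_zombie_count() {
  printf '%s %s\n' "$(date '+%F %T')" "$(count_zombies)" >> "$1"
}
```

Saved in a script that calls `log_zombie_count`, this could be run every few minutes from cron; a count that keeps growing from day to day would confirm that the zombies accumulate instead of being cleared.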

Please don’t get me wrong, but that’s not a solution, it’s a workaround. Of course it’s possible to switch from one method to another to get around a problem (like when your right arm hurts and the doctor says “just use the other one”), but I prefer to locate the cause of the problem and find a proper fix first.

Zombie processes don’t usually appear by accident; there must be a reason why they started appearing after the upgrade.

If it turns out that the origin of the defunct processes can’t be found, I can still switch to the workaround.

It might not be “the” solution you are looking for, but it fixed the issue for me as well as for another user, who actually pointed it out to me, and that is good enough for me.

I researched it for at least a month before posting my thread because “the” solution did not stand out, and based on my own research it was more of a PHP issue that the PHP developers did not want to fix. It is not the fault of LibreNMS.

I can appreciate you wanting to find the root cause and fix it. I, however, am not a PHP guru, and the solution presented to me works and was quick.

I do wish you the very best of luck and in fact hope that you or someone else figures this out, as I would prefer to use the dispatcher service.

Jaysen