Zombie/defunct process issue after OS upgrade of distributed pollers from CentOS 7.9 to RHEL 8.9

I’m seeing similar issues with a poller that I added to an existing setup.

There was a version mismatch at one point, since I didn’t check out a specific tag when adding the new poller, but only that one has the problem. The mismatch was between monthly releases in the 24 train, so I’m not sure whether the database side makes a difference here.

So it looks like in my case it was caused by the Graphite integration: the new poller didn’t have firewall permission to connect, so something was hanging and leaving defunct processes behind. After I allowed the traffic I don’t see the defunct processes anymore!
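For anyone else hitting this, a quick way to confirm the poller can actually reach Graphite is a plain TCP connect with a short timeout; a firewall drop usually shows up as a timeout rather than a refusal. Rough sketch below, the hostname is a placeholder for your own Graphite server and 2003 is the default Carbon plaintext port:

```python
import socket

GRAPHITE_HOST = "graphite.example.com"  # placeholder, substitute your Graphite server
GRAPHITE_PORT = 2003                    # default Carbon plaintext line-receiver port

try:
    # A short timeout makes a firewall drop show up quickly instead of
    # hanging, which is what the poller processes were doing here.
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=5):
        print("Graphite port reachable")
except OSError as exc:
    print(f"Cannot reach Graphite: {exc}")
```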


pm.max_children is for the php-fpm service; I haven’t increased it, but I don’t think that’s the issue. Have you tried it, though?

Still no joy here for me: 8 defunct PHP processes overnight after a restart of the LibreNMS dispatcher. I’m just restarting the service every week or so, which isn’t ideal.

Is maintenance running? It should be restarting the process every night (I don’t think that shows in the unit’s run time). Or maybe the way maintenance restarts the process doesn’t clear the zombies.

I am also still facing the issue and have to restart the LibreNMS service every fifth day. Maintenance is definitely running, as the defunct process count goes up every night. On day 1, right after restarting the service, it’s around 70 defunct processes in my case (the pollers poll a lot of network devices); on day 2 it’s roughly double, and so on.

Mine is less pronounced since I’m not running distributed pollers; it increases by around 10 per day, see graph below.


Happy to pull logs etc., but I’m not sure what to look for at this point.
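In the meantime, this is roughly what I’m using to keep track of the zombie count and which parent is holding them (a rough sketch using psutil, assuming it’s available on the box); it’s handy for graphing and for telling whether it’s the dispatcher or the scheduler leaking them:

```python
import psutil

# Collect every process currently in the zombie/defunct state.
zombies = [p for p in psutil.process_iter(["name", "status", "ppid"])
           if p.info["status"] == psutil.STATUS_ZOMBIE]

print(f"{len(zombies)} defunct processes")
for z in zombies:
    try:
        parent_name = psutil.Process(z.info["ppid"]).name()
    except psutil.NoSuchProcess:
        parent_name = "?"
    print(f"pid={z.pid}  name={z.info['name']}  ppid={z.info['ppid']} ({parent_name})")
```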

Bump… has anyone fixed this?

So this is still ongoing for me, and I think I’m over the target.


The above runs at 12 pm on my system, and this is when the defunct processes appear. php artisan, from my understanding, is what puts LibreNMS into maintenance mode, @murrant?

After googling this command and “defunct”, I found this article.

Now, this is above my level of understanding, but it appears this command is not terminating its child PHP processes the right way. Quote…

"The init process (PID 0, such as crond) assumes control over child processes left behind by the primary scheduling process (php artisan schedule:run), which are designed to run in the background (like php artisan command-name with runInBackground).

However, init does not actively take on the responsibility of terminating these child processes. The main scheduling process does not wait for and destroy these background-running child processes, causing them to become zombie processes."

Or this person suggests it’s not waiting long enough?

" I fixed this by including pcntl_waitpid(-1, $status, WNOHANG); in the end of the schedule() function of App/Console/Kernel.php"

thoughts??

I am still facing the issue, but I don’t have the librenms-scheduler service at my end.

As suggested by murrant earlier in the thread, we can try the below.

I simply restart the services every 10 to 14 days. I have a lot of distributed pollers, all in production, so I can’t make these changes in production directly. If you have a test environment, you could try this and let me know.

This is 100% the problem.

I have just discovered I didn’t have the Python module psutil installed. This module is used multiple times in service.py to end PIDs while the maintenance script is running.

I’ve just installed it with “pip install psutil”. You can check whether you have it installed with “pip list”. It is mentioned in requirements.txt, and I never had a problem before upgrading the OS, so I guess it got removed?? The validate script, which I thought checked dependencies, did not say this was missing…

I’ve just installed it, so we’ll see what happens in 20 hours or so when maintenance runs again.
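One other thing I’m double-checking: that psutil is importable by the exact interpreter the dispatcher runs under, since the OS upgrade may have changed the default python3, and “pip list” from a different environment can be misleading. Rough sketch below; the terminate helper is only an illustration of the kind of child cleanup psutil makes possible, not the actual service.py logic:

```python
# Run this with the same interpreter the dispatcher uses (e.g. the python3
# referenced by the librenms service unit).
try:
    import psutil
except ImportError:
    raise SystemExit("psutil is NOT importable from this interpreter")

print("psutil", psutil.__version__, "is importable")

# Illustration only: terminate all children of a given parent PID, then
# force-kill stragglers. Hypothetical PID, not service.py code.
def terminate_children(parent_pid: int, timeout: float = 5.0) -> None:
    parent = psutil.Process(parent_pid)
    children = parent.children(recursive=True)
    for child in children:
        child.terminate()                       # polite SIGTERM first
    gone, alive = psutil.wait_procs(children, timeout=timeout)
    for child in alive:
        child.kill()                            # SIGKILL any stragglers
```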

I already have psutil installed but am still facing the issue. Do let me know the result in your environment, though.

Unfortunately that didn’t fix it. I will try that code modification this week.