Upstream Timed Out

Hi, good day. I’m having an issue with our LibreNMS that has been bugging me for a week now. It returns a 504 Gateway Timeout almost every day, and the only way to bring it back online is to reboot the server. Looking through the LibreNMS error log, this is one of the entries:

2020/06/03 15:16:51 [error] 18232#18232: *71804 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10X.XXX.XXX.XXX, server: nms.xxxxxx.net, request: "POST /ajax/dash/availability-map HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock"

I also checked the nginx error log, but it doesn’t show any errors. I also increased the server’s RAM; it is now 3 GB. We have 120 devices in our LibreNMS. I suspect it may be a lack of resources, but I wanted to hear it from you guys to be sure. Thanks!

Good day guys, just an update on this thread. Increasing the server’s RAM didn’t fix the issue. Our LibreNMS still hits the 504 error every day, sometimes twice a day. Restarting the server is the only way to fix it temporarily.

While investigating the logs related to fpm.sock, I came across the php-fpm log and found these entries:

[08-Jun-2020 09:06:54] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:07:30] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:08:19] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:16:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:48:59] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:49:27] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[08-Jun-2020 09:54:28] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
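To see how often the pool hits its limit, the warnings can be counted per hour. This is only a minimal sketch, assuming the php-fpm log format shown above; the function name `count_max_children_warnings` and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches php-fpm lines such as:
# [08-Jun-2020 09:06:54] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
PATTERN = re.compile(
    r"^\[(?P<day>\d{2}-\w{3}-\d{4}) (?P<hour>\d{2}):\d{2}:\d{2}\] WARNING: "
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children setting \((?P<limit>\d+)\)"
)


def count_max_children_warnings(lines):
    """Count 'reached pm.max_children' warnings per (day, hour, pool)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            counts[(match["day"], match["hour"], match["pool"])] += 1
    return counts


# Three of the log lines quoted in this thread, used as sample input.
sample_lines = [
    "[08-Jun-2020 09:06:54] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it",
    "[08-Jun-2020 09:07:30] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it",
    "[08-Jun-2020 09:08:19] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it",
]

for key, total in count_max_children_warnings(sample_lines).items():
    print(key, total)
```

In practice the lines would come from the real php-fpm log file; its path depends on the distribution, so it is not hard-coded here.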

So I decided to increase pm.max_children in /etc/php/7.2/fpm/pool.d/www.conf. The default value was 5, and I have raised it to 20. Now I just have to monitor our LibreNMS to see whether it hits the 504 error again tomorrow.
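A common rule of thumb for sizing pm.max_children is to divide the RAM left over after other services by the average size of one php-fpm worker. The sketch below only illustrates that arithmetic. The 3072 MB figure is the server mentioned in this thread, while the reserved amount and the per-worker size are placeholder assumptions that would have to be measured on the actual host.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_worker_mb):
    """Rule-of-thumb estimate: how many php-fpm workers fit in the leftover RAM."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never suggest fewer than one worker, even if the reserve exceeds total RAM.
    return max(1, usable_mb // avg_worker_mb)


# total_ram_mb: 3 GB server from the thread.
# reserved_mb and avg_worker_mb: assumed placeholder values, not measurements.
suggested = estimate_max_children(total_ram_mb=3072, reserved_mb=1536, avg_worker_mb=64)
print(suggested)
```

The result is only as good as the two measured inputs, and raising the limit does not help if the workers are all stuck waiting on the same slow request.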

Bumping this post. Still getting UPSTREAM TIMED OUT

If both RAM and pm.max_children are increased, what error do you get?

Also, when it loads, does it load “quick”? Does the 504 happen on any page or in a particular place?

Good day, thanks for responding.

Still getting the same error after increasing pm.max_children. Here is the error message from my LibreNMS error log:

[error] 21957#21957: *118646 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10X.XXX.XXX.XXX, server: nms.XXXXXX.ph, request: "GET /graph.php?device=51&type=device_bits&from=1565365800&to=1596901800&height=60&width=113&legend=no HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock", host: "nms.t2g.net.ph", referrer: "https://nms.XXXXXX.ph/graphs/device=51/type=device_bits/from=1594860553/to=1594864153"

I think the error occurs when viewing the graphs of any device in our LibreNMS.
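To check whether the timeouts cluster on a particular page, the request path can be pulled out of each "upstream timed out" line. This is a rough sketch, assuming the nginx error log format quoted above (real logs use straight double quotes); `summarize_timeout` is a hypothetical helper name. For graph requests it also reports the time span between the `from` and `to` parameters, since a long range might be a factor.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Pulls the method and request target out of: request: "GET /path?query HTTP/1.1"
REQUEST_RE = re.compile(r'request: "(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def summarize_timeout(line):
    """Return method, path and (if present) graph span in days for a timeout line."""
    if "upstream timed out" not in line:
        return None
    match = REQUEST_RE.search(line)
    if not match:
        return None
    parts = urlsplit(match["target"])
    query = parse_qs(parts.query)
    summary = {"method": match["method"], "path": parts.path}
    if "from" in query and "to" in query:
        try:
            seconds = int(query["to"][0]) - int(query["from"][0])
            summary["span_days"] = seconds / 86400
        except ValueError:
            pass  # non-numeric timestamps: leave the span out
    return summary


# Shortened copy of the log entry quoted in this thread.
sample_line = (
    "[error] 21957#21957: *118646 upstream timed out (110: Connection timed out) "
    "while reading response header from upstream, client: 10X.XXX.XXX.XXX, "
    'request: "GET /graph.php?device=51&type=device_bits&from=1565365800'
    '&to=1596901800&height=60&width=113&legend=no HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock"'
)

print(summarize_timeout(sample_line))
```

Running this over the whole error log and tallying the paths would show whether it is mostly /graph.php or other pages as well.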

Is it behind a reverse proxy or something?

I think it is behind NAT, but port forwarded.