504 Gateway time-out Error after running out of RAM

We’ve encountered a 502 Bad Gateway and a 504 Gateway time-out error in our LibreNMS web UI. Here are the details:

  1. Environment:
  • Operating System: Ubuntu 20.04 LTS (virtual machine)
  • Web Server: Nginx
  2. Observations:
  • The error occurs consistently.
  • Restarting the machine did not resolve the issue.
  3. Troubleshooting Steps Taken:
  • Reviewed Nginx configuration (proxy settings, upstream server).
  • Verified PHP-FPM configuration.
  • Monitored system resource utilization (CPU, memory).
  4. Request for Assistance:
  • Seeking guidance on further troubleshooting steps.
  • Any insights or recommendations would be appreciated.

Logs Checked:

  • LibreNMS logs
  • Nginx logs
  • System logs

Recently our VPS ran out of RAM, so LibreNMS crashed. After we added more RAM to the VPS, the web UI did not come back online; it now returns a 502 or a 504 error when I try to open the website.
We run a distributed polling setup, so the server in question only hosts the web UI and discovery.

Unfortunately I can’t share the output of ./daily.sh and ./validate.php because neither runs properly. They just don’t output anything :(.

Sounds like some type of php configuration corruption?

Running ./validate.php as the librenms user does nothing???

php -v ?

fpm issue?

root       16907       1  0 Oct25 ?        00:04:38 php-fpm: master process (/etc/php/8.1/fpm/php-fpm.conf)
librenms   65086   16907  0 Oct26 ?        00:00:02 php-fpm: pool librenms
librenms   90688   16907  0 Oct26 ?        00:00:01 php-fpm: pool librenms
librenms   90951   16907  0 Oct26 ?        00:00:04 php-fpm: pool librenms
librenms 2010317 1993004  0 00:55 pts/5    00:00:00 grep fpm

Log files for FPM etc?

root@lnmsdev:/var/log# ls -l | grep fpm
-rw------- 1 root root 56 Dec 3 00:00 php8.1-fpm.log
-rw------- 1 root root 56 Nov 26 00:00 php8.1-fpm.log.1


Also an old one but a good one -- check you haven't run out of disk space :)

When running ./validate.php there is no output or anything. It looks like it is running something but it never shows the result.
librenms@librenms-web-ui:~$ ./validate.php
I can wait for hours but it never gets any further than this! Same for the ./daily.sh
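
One way to tell a real hang from a very slow run is to put a deadline on the command. The following is only a sketch: the function name run_with_deadline and the 120-second limit are assumptions for illustration, and it relies on the timeout utility from GNU coreutils.

```shell
#!/usr/bin/env bash
# Run a command with a deadline and report whether it finished or hung.
# `timeout` (GNU coreutils) exits with status 124 when the deadline is hit.
run_with_deadline() {
  local seconds=$1
  shift
  local status=0
  timeout "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${seconds}s: $*" >&2
  fi
  return "$status"
}

# Usage, as the librenms user from the LibreNMS directory:
#   run_with_deadline 120 ./validate.php
#   run_with_deadline 120 php -r 'echo "php cli ok\n";'
```

If the plain php -r test returns immediately but validate.php hits the deadline, the PHP CLI itself is probably fine and the script is more likely waiting on something external, such as a database or cache connection.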

When running php -v as the librenms user, I get:

PHP 8.2.13 (cli) (built: Nov 24 2023 08:46:50) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.2.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.2.13, Copyright (c), by Zend Technologies

Also, when I run grep fpm in the /var/log directory, I get this output:

-rw-------  1 root      root                678630 Dec  2 16:08 php8.2-fpm.log.1
-rw-------  1 root      root                 12456 Sep 30 06:36 php8.2-fpm.log.1.0.gz
-rw-------  1 root      root                 11960 Sep 23 06:46 php8.2-fpm.log.1.1.gz
-rw-------  1 root      root                  6684 Sep 16 06:46 php8.2-fpm.log.1.2.gz
-rw-------  1 root      root                146027 Nov 25 22:05 php8.2-fpm.log.2.gz
-rw-------  1 root      root                  8119 Nov 16 12:52 php8.2-fpm.log.3.gz
-rw-------  1 root      root                 12704 Nov 11 06:11 php8.2-fpm.log.4.gz
-rw-------  1 root      root                 15444 Nov  4 06:16 php8.2-fpm.log.5.gz
-rw-------  1 root      root                 14294 Oct 28 05:59 php8.2-fpm.log.6.gz
-rw-------  1 root      root                 14669 Oct 21 05:58 php8.2-fpm.log.7.gz
-rw-------  1 root      root                 13757 Oct 14 05:56 php8.2-fpm.log.8.gz
-rw-------  1 root      root                 13785 Oct  7 06:41 php8.2-fpm.log.9.gz

Disk space is not the problem: Usage of /: 33.9% of 72.53GB

The /etc/php/8.2/fpm/pool.d/*.conf file:

[librenms]
user = librenms
group = librenms

I also want to share /etc/nginx/conf.d/librenms.conf:

server {
 server_name librenms.dentech.nl;
 root        /opt/librenms/html;
 index       index.php;

 charset utf-8;
 gzip on;
 gzip_types text/css application/javascript text/javascript application/x-javascript image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon;
 location / {
  try_files $uri $uri/ /index.php?$query_string;
 }
 location ~ [^/]\.php(/|$) {
  fastcgi_pass unix:/run/php-fpm-librenms.sock;
 }
 location ~ /\.(?!well-known).* {
  deny all;
 }
location /nginx-status {
    stub_status on;
    access_log   off;
    allow 127.0.0.1;
    allow ::1;
    allow 172.17.255.200;
    allow 172.17.255.201;
    allow 172.17.255.202;
    allow all;
#    deny all;
}

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/librenms.dentech.nl/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/librenms.dentech.nl/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
    if ($host = librenms.dentech.nl) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


 listen      80;
 server_name librenms.dentech.nl;
    return 404; # managed by Certbot
}

And here is the /var/log/nginx/error.log file:

2023/12/05 09:54:56 [crit] 130650#130650: *2 connect() to unix:/run/php-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm.sock:", host: "librenms.dentech.nl"
2023/12/05 09:55:05 [crit] 130651#130651: *18 connect() to unix:/run/php-fpm.sock failed (2: No such file or directory) while connecting to upstream, client: ******, server: librenms.dentech.nl, request: "HEAD /remote/fgt_lang?lang=en HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm.sock:", host: "145.131.3.75"
2023/12/05 10:05:04 [crit] 132153#132153: *1 connect() to unix:/run/php-fpm-librenms.sock failed (2: No such file or directory) while connecting to upstream, client: *******, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm-librenms.sock:", host: "145.131.3.75:443"
2023/12/05 10:06:20 [crit] 132428#132428: *1 connect() to unix:/run/php-fpm-librenms.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm-librenms.sock:", host: "librenms.dentech.nl"
2023/12/05 10:07:26 [crit] 132427#132427: *5 connect() to unix:/run/php-fpm-librenms.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm-librenms.sock:", host: "librenms.dentech.nl"
2023/12/05 10:08:40 [crit] 132427#132427: *7 connect() to unix:/run/php-fpm-librenms.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm-librenms.sock:", host: "librenms.dentech.nl"
2023/12/05 10:08:49 [crit] 132428#132428: *8 connect() to unix:/run/php-fpm-librenms.sock failed (2: No such file or directory) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm-librenms.sock:", host: "librenms.dentech.nl"
2023/12/05 10:10:03 [crit] 133079#133079: *2 connect() to unix:/var/run/php8.2-fpm-librenms.sock failed (13: Permission denied) while connecting to upstream, client: 172.16.1.168, server: librenms.dentech.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php8.2-fpm-librenms.sock:", host: "librenms.dentech.nl"
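
The log lines above show nginx trying three different socket paths: /run/php-fpm.sock and /run/php-fpm-librenms.sock (no such file), then /var/run/php8.2-fpm-librenms.sock (permission denied). A possible next check, sketched here, is to compare the socket in the nginx fastcgi_pass line with the listen line of the PHP-FPM pool. The function names and the pool file name librenms.conf are assumptions for illustration; the pool excerpt posted in this thread does not show its listen line.

```shell
#!/usr/bin/env bash
# Sketch: compare the socket nginx is told to use with the one PHP-FPM creates.

# Print the socket path from the first "fastcgi_pass unix:...;" line in a file.
nginx_socket() {
  sed -n 's/^[[:space:]]*fastcgi_pass[[:space:]]*unix:\([^;]*\);.*/\1/p' "$1" | head -n 1
}

# Print the socket path from the first "listen = ..." line in a pool file.
# Lines such as "listen.owner = ..." are not matched.
fpm_socket() {
  sed -n 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Report whether the two files agree on the socket path (plain string compare).
compare_sockets() {
  local n f
  n=$(nginx_socket "$1")
  f=$(fpm_socket "$2")
  if [ -n "$n" ] && [ "$n" = "$f" ]; then
    echo "match: $n"
  else
    echo "mismatch: nginx='$n' fpm='$f'"
    return 1
  fi
}

# Usage (the pool file name is hypothetical):
#   compare_sockets /etc/nginx/conf.d/librenms.conf /etc/php/8.2/fpm/pool.d/librenms.conf
#   ls -l "$(fpm_socket /etc/php/8.2/fpm/pool.d/librenms.conf)"
```

The comparison is textual, so /run/... and /var/run/... count as different even though /var/run is normally a symlink to /run on Ubuntu; a reported mismatch still needs a manual look. For the permission-denied case, the owner and mode that ls -l prints for the socket should allow the user nginx runs as (often www-data on Ubuntu, an assumption here) to connect.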

After letting the ./daily.sh script run for about 4 hours, this is the only output I get:

librenms@librenms-web-ui:~$ ./daily.sh

In MemcachedConnector.php line 69:

  Class "Memcached" not found


Updating SQL-Schema                                FAIL

In MemcachedConnector.php line 69:

  Class "Memcached" not found

Cleaning up DB
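
The Class "Memcached" not found message suggests that the PHP CLI is not loading the memcached extension while the application is configured to use it. A quick way to confirm, sketched below, is to look for the module in the output of php -m. The helper name has_php_module is an assumption for illustration, and whether the fix is installing a package such as php8.2-memcached depends on how PHP was installed on this machine.

```shell
#!/usr/bin/env bash
# Read a module list (as printed by `php -m`) on stdin and succeed only if
# the named module appears as a whole line, ignoring case.
# Whole-line matching keeps "memcache" from passing as "memcached".
has_php_module() {
  grep -qix -- "$1"
}

# Usage:
#   php -m | has_php_module memcached && echo loaded || echo missing
#   php --ini    # shows which ini files the CLI reads
```

The CLI and PHP-FPM read separate configuration directories on Ubuntu, so a module that is enabled for one is not necessarily enabled for the other.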

I would be looking in these files for why your php-fpm is borked.

We ended up rebuilding the LibreNMS VPS with the web-ui on it. The database was on a different server, so the data is still intact. After rebuilding, the web-ui works like a charm.

Now we only have to activate monitoring again and we are back to monitoring.

Thanks for the responses!
