We have been running LibreNMS for the last 2+ years and have been very happy…
Until recently when the web UI started partially or completely failing.
We traced this down to an out-of-memory error reported by PHP. We increased PHP’s memory limit from 128M to 192M, which solved the problem for about a week.
Today the problem returned, so we increased PHP’s memory limit to 256M, which did not fix the issue.
The specific error we are seeing is:
2017/12/08 15:23:56 [error] 1227#0: *23 FastCGI sent in stderr: “PHP message: PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 8388608 bytes) in /opt/librenms/includes/dbFacile.php on line 376” while reading response header from upstream,
While we can keep increasing PHP’s memory limit, it is concerning that we need to, and we suspect there is an underlying issue.
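As a quick sanity check on the numbers in that error, here is a minimal Python sketch that pulls the byte counts out of the message and converts them to MiB. The regular expression and the `parse_oom` helper are illustrative assumptions written for this one message format, not part of LibreNMS or PHP.

```python
import re

# Error text copied from the nginx log line above (typographic quotes dropped).
LOG_LINE = (
    "PHP Fatal error: Allowed memory size of 268435456 bytes exhausted "
    "(tried to allocate 8388608 bytes) in "
    "/opt/librenms/includes/dbFacile.php on line 376"
)

# Hypothetical pattern for this specific PHP out-of-memory message.
PATTERN = re.compile(
    r"Allowed memory size of (?P<limit>\d+) bytes exhausted "
    r"\(tried to allocate (?P<request>\d+) bytes\) "
    r"in (?P<file>\S+) on line (?P<line>\d+)"
)


def parse_oom(message):
    """Return the limit and failed request in MiB, plus file and line, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    mib = 1024 * 1024
    return {
        "limit_mib": int(match.group("limit")) / mib,
        "request_mib": int(match.group("request")) / mib,
        "file": match.group("file"),
        "line": int(match.group("line")),
    }


info = parse_oom(LOG_LINE)
print(info)
```

The limit works out to exactly 256 MiB, so the new setting was in effect, and the allocation that failed was a single 8 MiB request inside the database helper.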
We are monitoring 400+ devices and add a handful more each week.
Probably doesn’t matter but for complete info we’re using nginx, php-fpm and memcached.
We do not track commits closely so are wondering if something changed recently that would affect memory usage.
Any ideas or pointers would be much appreciated.
Thanks
Mark
$ ./validate.php
Component | Version
--------- | -------
LibreNMS | 1.33-218-g46cbc69
DB Schema | 222
PHP | 5.4.16
MySQL | 5.5.56-MariaDB
RRDTool | 1.4.8
SNMP | NET-SNMP 5.7.2
====================================
[OK] Database connection successful
[OK] Database schema correct
[WARN] PHP version 5.6.4 will be the minimum supported version on January 10, 2018. We recommend you update to PHP a supported version of PHP (7.1 suggested) to continue to receive updates. If you do not update PHP, LibreNMS will continue to function but stop receiving bug fixes and updates.
Upgrading to PHP 7.1 is the plan, but I would like to resolve this issue first. It will be hard to tell whether upgrading breaks things when I’m starting with things already broken…
The problem appears to affect all pages. The left part of the nav bar is displayed all the way to the Ports menu, then the thin blue page update line crosses left to right on the top. At this point the rest of the page never displays.
When I inspect the page in Chrome, the HTML stops after the incomplete nav bar and the main page reports a 500 Internal Server Error.
FWIW, after chasing this number past 1G, I finally just set my PHP memory_limit to 4G. This has covered about 2k devices and allows even the memory-intensive Maps->Network tool to work without crashing.
On that note, the Maps->Network tool is great for benchmarking your memory setting.
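For comparing the limits mentioned in this thread with the byte counts PHP prints in its errors, here is a small sketch that converts `memory_limit` shorthand to bytes. It assumes PHP's documented shorthand, where K, M and G are powers of 1024; the `shorthand_to_bytes` helper is a hypothetical name used only for illustration.

```python
# Multipliers for PHP ini shorthand suffixes (powers of 1024).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def shorthand_to_bytes(value):
    """Convert a value such as '256M' or '4G' to bytes; bare numbers pass through."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


# Limits mentioned in this thread.
for setting in ("128M", "192M", "256M", "4G"):
    print(setting, shorthand_to_bytes(setting))
```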
Nothing changed; the current usage of memcached is, and has been, rubbish.
Someone started to experiment with it and didn’t finish the implementation.
Bad things include caching the output of all SQL queries and not properly invalidating things so stale data is returned.
In general, memcached is a good thing, which I think is why many people enable it.
I’m actually working on a PR now to rip it out.
Use Varnish or something similar if you need caching, and enable PHP opcache.
We only have 255 devices and probably had 245 a couple of weeks ago. So our installation is not very big.
The last (most recent) git commit that we have is:
commit 46cbc696604b54f04bb4bf910a89a4f7d05e6c29
Author: Kevin Krumm [email protected]
Date: Thu Dec 7 18:09:41 2017 -0600
docs: minor fix to device sensors doc (#7874)
I foobar the backticks.
So fairly recent.
Given the line of code generating the OOM error, I’m actually wondering if it is a SQL query change that is producing a lot more rows than it should. It feels like that sort of issue.
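To illustrate why an oversized result set would show up as a memory error in the database layer, here is a rough sketch using Python and an in-memory SQLite database as stand-ins. It assumes the database helper loads every row of a result into an array before returning, which has not been confirmed here for dbFacile.php; the `ports` table, its columns, and the row count are made up for the demonstration.

```python
import sqlite3
import tracemalloc


def make_db(rows):
    """Build an in-memory database with a hypothetical 'ports' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ports (id INTEGER PRIMARY KEY, descr TEXT)")
    conn.executemany(
        "INSERT INTO ports (descr) VALUES (?)",
        ((f"port-{i}" + "x" * 100,) for i in range(rows)),
    )
    conn.commit()
    return conn


def peak_bytes(func, conn):
    """Run func(conn) and return its result with the peak traced Python memory."""
    tracemalloc.start()
    result = func(conn)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


def count_fetch_all(conn):
    # Materialises every row at once, so memory grows with the row count.
    rows = conn.execute("SELECT id, descr FROM ports").fetchall()
    return len(rows)


def count_streaming(conn):
    # Holds one row at a time, so memory stays roughly constant.
    count = 0
    for _row in conn.execute("SELECT id, descr FROM ports"):
        count += 1
    return count


conn = make_db(20000)
n_all, peak_all = peak_bytes(count_fetch_all, conn)
n_stream, peak_stream = peak_bytes(count_streaming, conn)
print("fetch all:", n_all, "rows, peak bytes:", peak_all)
print("streaming:", n_stream, "rows, peak bytes:", peak_stream)
```

If a query started returning far more rows than intended, a fetch-all style helper would hit the memory limit at the point where rows are accumulated, which would be consistent with the file and line reported in the error.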