PHP Out of Memory (after running fine for months)

Hello,

We have been running LibreNMS for the last 2+ years and have been very happy…

Until recently when the web UI started partially or completely failing.

We traced this down to an out-of-memory issue reported by PHP. We increased PHP’s memory from 128M to 192M, which solved the problem for about a week.

Today the problem returned, so we increased PHP’s memory to 256M, which did not fix the issue.

The specific error we are seeing is:

2017/12/08 15:23:56 [error] 1227#0: *23 FastCGI sent in stderr: “PHP message: PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 8388608 bytes) in /opt/librenms/includes/dbFacile.php on line 376” while reading response header from upstream,
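
For reference, the two byte counts in that message convert as follows. This is only a small arithmetic sketch; `bytesToMiB` is a hypothetical helper name, not something from LibreNMS.

```php
<?php
// Hypothetical helper: convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
function bytesToMiB($bytes)
{
    return $bytes / (1024 * 1024);
}

// The limit PHP reports as exhausted: matches the 256M setting.
echo bytesToMiB(268435456), " MiB limit\n";     // prints "256 MiB limit"

// The single allocation that failed.
echo bytesToMiB(8388608), " MiB requested\n";   // prints "8 MiB requested"
```

So the 256M limit was in effect when the request failed, and the failure came from one 8 MiB allocation on top of what was already in use.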

While we can keep increasing PHP’s memory, it is concerning that we need to, and we suspect there is an underlying issue.

We are monitoring 400+ devices and add a handful more each week.

Probably doesn’t matter but for complete info we’re using nginx, php-fpm and memcached.

We do not track commits closely so are wondering if something changed recently that would affect memory usage.

Any ideas or pointers would be very appreciated.

Thanks
Mark


$ ./validate.php

Component Version
LibreNMS 1.33-218-g46cbc69
DB Schema 222
PHP 5.4.16
MySQL 5.5.56-MariaDB
RRDTool 1.4.8
SNMP NET-SNMP 5.7.2
====================================

[OK] Database connection successful
[OK] Database schema correct
[WARN] PHP version 5.6.4 will be the minimum supported version on January 10, 2018. We recommend you update to PHP a supported version of PHP (7.1 suggested) to continue to receive updates. If you do not update PHP, LibreNMS will continue to function but stop receiving bug fixes and updates.

https://docs.librenms.org/#Support/FAQ/#why-do-i-get-blank-pages-sometimes-in-the-webui

I highly suggest you update to PHP 7.1 :wink:

but also what pages are you having an issue with?

Upgrading to PHP 7.1 is the plan, but I would like to resolve this issue first. It will be hard to tell if upgrading breaks things when I’m starting with things broken…

The problem appears to affect all pages. The left part of the nav bar is displayed all the way to the Ports menu, then the thin blue page update line crosses left to right on the top. At this point the rest of the page never displays.

When I inspect the page in Chrome, the html stops after the incomplete nav bar and the main page reports a 500 Internal system error.

Hmm, can you post a screenshot?

This isn’t a memory issue, otherwise </html> would not be there.

Go through and see what the last bit of ports menu is in the html. Likely something causing an error.

FWIW, after chasing this number past 1G I finally just set my PHP memory_limit to 4G, this has covered about 2k devices and allows even the mem-intensive Maps->Network tool to work without crashing.

On that note, the Maps->Network tool is great for benchmarking your memory setting.

Holy crap. Probably some optimizations to be had :slight_smile:

I “tail -f” the nginx error log and it emits the error msg included in the original post upon every page request.

Chrome inserts closing tags to deal with errors. When I “show source”, the HTML is incomplete.

Ok, that makes sense. I guess something is fetching a massive amount of data.

The error doesn’t tell us what it is, which is why I wanted to see where the html stopped and I could guess.

Line 376 of /opt/librenms/includes/dbFacile.php

354 function dbFetchRows($sql, $parameters = array(), $nocache = false)
355 {
356     global $config;
357
358     if ($config['memcached']['enable'] && $nocache === false) {
359         $result = $config['memcached']['resource']->get(hash('sha512', $sql.'|'.serialize($parameters)));
360         if (!empty($result)) {
361             return $result;
362         }
363     }
364
365     $time_start = microtime(true);
366     $result = dbQuery($sql, $parameters);
367
368     if (mysqli_num_rows($result) > 0) {
369         $rows = array();
370         while ($row = mysqli_fetch_assoc($result)) {
371             $rows[] = $row;
372         }
373
374         mysqli_free_result($result);
375         if ($config['memcached']['enable'] && $nocache === false) {
376             $config['memcached']['resource']->set(hash('sha512', $sql.'|'.serialize($parameters)), $rows, $config['memcached']['ttl']);
377         }
378         recordDbStatistic('fetchrows', $time_start);
379         return $rows;
380     }
381
382     mysqli_free_result($result);
383
384     // no records, thus return empty array
385     // which should evaluate to false, and will prevent foreach notices/warnings
386     recordDbStatistic('fetchrows', $time_start);
387     return array();
388 }//end dbFetchRows()

Is there an easy way to dump the SQL stmt and the results?

Well, that gives a little info, since I was looking at line 376 in the current version of the code and you must be on the monthly release.

Step 1: set $config['memcached']['enable'] to false. Having that enabled will only cause you problems.

Step 2, send me the last few lines of HTML that are sent to the browser.
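
Step 1 comes down to a one-line change in the LibreNMS config file. A minimal sketch, assuming the config lives at /opt/librenms/config.php (inferred from the install path in the error message, so adjust if yours differs):

```php
<?php
// In /opt/librenms/config.php (path is an assumption based on the install
// location shown in the error message).
// Turn off the memcached query cache that dbFetchRows() checks.
$config['memcached']['enable'] = false;
```

If an existing line already sets this key to true, change that line rather than adding a second one.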

<!-- PORTS -->
    <li class="dropdown">
      <a href="ports/" class="dropdown-toggle" data-hover="dropdown" data-toggle="dropdown"><i class="fa fa-link fa-fw fa-lg fa-nav-icons hidden-md" aria-hidden="true"></i> <span class="hidden-sm">Ports</span></a>
      <ul class="dropdown-menu">
        <li><a href="ports/"><i class="fa fa-link fa-fw fa-lg" aria-hidden="true"></i> All Ports</a></li>

        <li><a href="pseudowires/"><i class="fa fa-arrows-alt fa-fw fa-lg" aria-hidden="true"></i> Pseudowires</a></li>            <li role="presentation" class="divider"></li>            <li><a href="customers/"><i class="fa fa-users fa-fw fa-lg" aria-hidden="true"></i> Customers</a></li>            <li><a href="iftype/type=transit/"><i class="fa fa-truck fa-fw fa-lg" aria-hidden="true"></i> Transit</a></li>            <li><a href="iftype/type=peering/"><i class="fa fa-handshake-o fa-fw fa-lg" aria-hidden="true"></i> Peering</a></li>            <li><a href="iftype/type=peering,transit/"><i class="fa fa-rocket fa-fw fa-lg" aria-hidden="true"></i> Peering + Transit</a></li>            <li><a href="iftype/type=core/"><i class="fa fa-code-fork fa-fw fa-lg" aria-hidden="true"></i> Core</a></li>            <li role="presentation" class="divider"></li>

We’ve been using memcached without problem since I installed LibreNMS 2+ years ago. Did something recently change?

And it is news to me that using it would be problematic.

Disabling memcache usage resolves the issue. Good I guess… but something odd is going on.

Nothing changed, the current usage of memcached is and has been rubbish.

Someone started to experiment with it and didn’t finish the implementation.
Bad things include caching the output of all SQL queries and not properly invalidating things so stale data is returned.

In general, memcached is a good thing, which is why many people enable it I think.

I’m actually working on a PR now to rip it out.

Use varnish or something if you need caching and enable PHP opcache :slight_smile:

Got it re the memcache implementation being poor.

We only have 255 devices and probably had 245 a couple of weeks ago. So our installation is not very big.

The last (most recent) git commit that we have is:

commit 46cbc696604b54f04bb4bf910a89a4f7d05e6c29
Author: Kevin Krumm [email protected]
Date: Thu Dec 7 18:09:41 2017 -0600
docs: minor fix to device sensors doc (#7874)
I foobar the backticks. :slight_smile:

So fairly recent.

Given the line of the code generating the OOM error, I’m actually wondering if it is a SQL query change that is producing a lot more rows than it should. Feels like that sort of issue.

dbFetchRows("SELECT * FROM `ports` AS P, `devices` as D WHERE P.`deleted` = '1' AND D.device_id = P.device_id")

So, looks like you have way too many deleted ports.

I suggest you go purge them.

Yeah I just found that query and it results in 12K+ rows.

Is there a preferred way to purge deleted ports?

There is a button in the webui called purge all if you click on the deleted ports menu item.
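
To get a feel for why a large result set fails at the memcached set() call, here is a rough, self-contained sketch. It assumes the cache client serializes the array before storing it, which is my assumption and not something confirmed in this thread. The row and column counts are made up, and `buildSyntheticRows` is a hypothetical helper, so the numbers it prints are illustrative only.

```php
<?php
// Hypothetical helper: build a synthetic result set shaped like the output
// of dbFetchRows(), i.e. a list of associative arrays (one per row).
function buildSyntheticRows($rowCount, $columnCount)
{
    $rows = array();
    for ($i = 0; $i < $rowCount; $i++) {
        $row = array();
        for ($c = 0; $c < $columnCount; $c++) {
            $row["col_$c"] = "value_{$i}_{$c}";
        }
        $rows[] = $row;
    }
    return $rows;
}

// Assumed sizes, kept small so the demo runs under a default memory_limit.
$before = memory_get_usage();
$rows = buildSyntheticRows(1000, 50);
$afterBuild = memory_get_usage();

// Caching the rows needs a second, serialized copy while the array
// itself is still held in memory.
$payload = serialize($rows);

printf("rows held in memory: %.2f MiB\n", ($afterBuild - $before) / 1048576);
printf("serialized copy:     %.2f MiB\n", strlen($payload) / 1048576);
```

Both figures grow roughly linearly with the row count, so a SELECT * join returning 12K+ wide rows costs many times what this small run reports, and the cached copy comes on top of the array.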