MariaDB throwing OOM errors

  • Steps to reproduce an issue: issue occurs every day
    • The output of ./validate.php:
Component | Version
--------- | -------
LibreNMS  | 24.4.0 (2024-04-27T00:15:14-04:00)
DB Schema | 2024_04_22_161711_custom_maps_add_group (292)
PHP       | 8.1.2-1ubuntu2.17
Python    | 3.10.12
Database  | MariaDB 10.6.16-MariaDB-0ubuntu0.22.04.1
RRDTool   | 1.7.2
SNMP      | 5.9.1

[OK]    Composer Version: 2.7.6
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database Schema is current
[OK]    SQL Server meets minimum requirements
[OK]    lower_case_table_names is enabled
[OK]    MySQL engine is optimal
[OK]    Database and column collations are correct
[OK]    Database schema correct
[OK]    MySQL and PHP time match
[FAIL]  No active polling method detected
[OK]    Dispatcher Service not detected
[OK]    Locks are functional
[FAIL]  No active python wrapper pollers found
[OK]    Redis is unavailable
[WARN]  Could not check Python dependencies because this script is not running as librenms
        The install docs show how this is done on a new install:
[OK]    rrd_dir is writable
[OK]    rrdtool version ok
[FAIL]  Failed to fetch version from local git: fatal: detected dubious ownership in repository at '/opt/librenms'
To add an exception for this directory, call:

        git config --global --add /opt/librenms
[WARN]  Your local git branch is not master, this will prevent automatic updates.
        You can switch back to master with git checkout master
[FAIL]  You need to run this script as 'librenms' or root
XXXX@librenms:/opt/librenms$ sudo ./validate.php
[sudo] password for XXXX: 
Do not run validate.php as root

We are getting the following in the log file: the system runs out of memory and the mariadbd process gets killed. We have to reboot to get things working again.

```
May  9 09:00:28 librenms kernel: [161562.179900] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=php8.1-fpm.service,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=2866,uid=112
May  9 09:00:28 librenms kernel: [161562.183195] Out of memory: Killed process 2866 (mariadbd) total-vm:2754808kB, anon-rss:122352kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:1032kB oom_score_adj:0
May  9 09:00:34 librenms systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
May  9 09:00:39 librenms systemd[1]: mariadb.service: Main process exited, code=killed, status=9/KILL
May  9 09:00:39 librenms systemd[1]: mariadb.service: Failed with result 'oom-kill'.
May  9 09:00:39 librenms systemd[1]: mariadb.service: Consumed 12min 52.269s CPU time.
```

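From that output I don’t think we have enough info to say MariaDB itself threw an OOM error. The global_oom and constraint=CONSTRAINT_NONE fields mean the system as a whole ran out of memory, and the kernel’s oom-killer chose the MariaDB process to kill. That is probably because its memory footprint gave it the highest oom score at that moment, and its oom_score_adj of 0 did nothing to protect it.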
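If you want to see which processes the oom-killer currently favours, here is a rough Python sketch that ranks them by the kernel’s own score. It only assumes the standard Linux /proc files oom_score, oom_score_adj and comm; the function name rank_by_oom_score is made up for this example. The scores change constantly, so this is a snapshot, not a prediction.

```python
from pathlib import Path


def rank_by_oom_score(proc_root="/proc", top=5):
    """Return up to `top` (oom_score, oom_score_adj, pid, name) tuples,
    highest oom_score first. Hypothetical helper, for illustration only."""
    root = Path(proc_root)
    if not root.is_dir():
        return []  # not a Linux system, or /proc is not mounted
    rows = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            score = int((entry / "oom_score").read_text())
            adj = int((entry / "oom_score_adj").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited while we were reading, or no access
        rows.append((score, adj, int(entry.name), name))
    rows.sort(reverse=True)  # highest score = most likely victim
    return rows[:top]


if __name__ == "__main__":
    for score, adj, pid, name in rank_by_oom_score():
        print(f"{score:>6} {adj:>6} {pid:>7} {name}")
```

If mariadbd sits at the top of that list most of the time, it will keep being the victim whenever the box runs out of memory, no matter which process asked for the last allocation.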

You might be able to figure out what actually caused the OOM condition from the kernel log lines just before the kill. Usually somewhere in there you’ll see a line like "something invoked oom-killer: ...", and that something is more likely to be the app that requested more memory than was available, which triggered the oom-killer. But regardless, the system ran out of memory and MariaDB got killed. If it’s constantly running near max memory utilization and any random usage spike triggers this, then you probably just need to give it more memory or swap.
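To pull those lines out of a big log, something like the following Python sketch could help. It pairs each "invoked oom-killer" line with the "Killed process" line that follows it. The regexes are based on the log format shown in the question; the function name summarize_oom is made up for this example, and process names containing spaces would not be captured correctly.

```python
import re

# "<comm> invoked oom-killer: ..." names the task whose allocation failed.
INVOKED = re.compile(r"(?P<comm>\S+) invoked oom-killer:")
# "Out of memory: Killed process <pid> (<comm>) ... anon-rss:<n>kB" names the victim.
KILLED = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def summarize_oom(lines):
    """Return one dict per kill, with the invoking task if it was logged.
    Hypothetical helper, for illustration only."""
    events = []
    invoker = None
    for line in lines:
        m = INVOKED.search(line)
        if m:
            invoker = m.group("comm")
            continue
        m = KILLED.search(line)
        if m:
            events.append({
                "invoker": invoker,  # None if the invoking line is missing
                "victim": m.group("comm"),
                "pid": int(m.group("pid")),
                "anon_rss_kb": int(m.group("anon_rss_kb")),
            })
            invoker = None  # reset for the next OOM event
    return events
```

You could feed it an open log file, for example the kernel log (on many Ubuntu installs that is /var/log/kern.log, but that path is an assumption about your setup). An invoker that differs from the victim, or shows up repeatedly, is the first place to look.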