FreeBSD Dispatch Service "graph gap"

./validate.php

Component   Version
LibreNMS    21.5.1
DB Schema   2021_04_08_151101_add_foreign_keys_to_port_group_port_table (208)
PHP         7.4.19
Python      3.8.10
MySQL       10.5.10-MariaDB-log
RRDTool     1.7.2
SNMP        NET-SNMP 5.9

====================================

[OK] Installed from package; no Composer required
[OK] Database connection successful
[OK] Database schema correct

pkg info librenms

librenms-21.5.1_2,1
Name           : librenms
Version        : 21.5.1_2,1
Installed on   : Thu Aug 26 08:08:12 2021 CDT
Origin         : net-mgmt/librenms
Architecture   : FreeBSD:13:*
Prefix         : /usr/local
Categories     : net-mgmt
Licenses       : GPLv3
Maintainer     : [email protected]
WWW            : http://www.librenms.org
Comment        : Autodiscovering PHP/MySQL/SNMP based network monitoring
Options        :
    DOCS           : on
    EXAMPLES       : on
    FPING          : on
    IPMITOOL       : on
    LIBVIRT        : on
    MYSQLD         : off
    NAGPLUGINS     : on
    NMAP           : on
    WMIC           : on
    X11            : off

Misc Info

Distributed Polling: NO
NAME=FreeBSD
VERSION=13.0-STABLE    
py38-psutil-5.8.0
hw.realmem: 34359738368
hw.ncpu: 14

Traceback

Traceback (most recent call last):
  File "/usr/local/www/librenms/librenms-service.py", line 68, in <module>
    service.start()
  File "/usr/local/www/librenms/LibreNMS/service.py", line 507, in start
    self.reap_psutil()
  File "/usr/local/www/librenms/LibreNMS/service.py", line 420, in reap_psutil
    for p in psutil.Process().children(recursive=False):
  File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 272, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 899, in children
    ppid_map = _ppid_map()
  File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 256, in _ppid_map
    for pid in pids():
  File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 1365, in pids
    ret = sorted(_psplatform.pids())
  File "/usr/local/lib/python3.8/site-packages/psutil/_psbsd.py", line 513, in pids
    ret = cext.pids()
OSError: [Errno 12] Cannot allocate memory (originated from sysctl)
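
The traceback shows the service exiting because a single `psutil.Process().children()` call raised `OSError` with errno 12 (ENOMEM). As a rough sketch only, and not the LibreNMS code or a confirmed fix, the helper below shows how such a call could be retried instead of ending the process. The name `call_with_enomem_retry` and its parameters are hypothetical, and it assumes the ENOMEM is transient, which this post has not established.

```python
import errno
import time


def call_with_enomem_retry(fn, retries=3, delay=0.5):
    """Call fn(), retrying only when it raises OSError with errno ENOMEM.

    Any other exception, or ENOMEM on the final attempt, propagates
    to the caller unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except OSError as exc:
            # Re-raise anything that is not ENOMEM, or ENOMEM once
            # the retry budget is used up.
            if exc.errno != errno.ENOMEM or attempt == retries:
                raise
            time.sleep(delay)


# Hypothetical use around the call shown in the traceback
# (requires psutil; not part of this sketch):
#   import psutil
#   children = call_with_enomem_retry(
#       lambda: psutil.Process().children(recursive=False))
```

Retrying hides the symptom rather than explaining why `sysctl` returned ENOMEM on a host with free memory, so it would be worth logging each retry.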

Synopsis

Our server has been experiencing “graph gap” since June 4th, 2021. We do not use distributed polling at this time. The server has 32G of RAM, 14 vCPUs, and plenty of resources to spare.

When the service crashes, the logs continue to be populated with information about the billing runs. We don’t have the watchdog enabled, but it would not help in this case: because the logs are still being written, the watchdog would not trigger a restart. The scheduled maintenance does run and restarts the service; however, that can leave up to 24 hours of “graph gap”.
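
Since log activity does not reveal this failure, one alternative signal is whether graph data is still being written. The sketch below is an assumption-laden illustration, not a LibreNMS feature: it assumes RRD files are updated on every successful poll and that they live under a single directory (the location is an assumption and must be supplied by the operator). The function names are hypothetical. It only reports staleness; what to do about it, such as restarting the service, is left out.

```python
import os
import time


def newest_mtime(root, suffix=".rrd"):
    """Return the most recent modification time among files under root
    whose names end in suffix, or None if no such file exists."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def polling_is_stale(root, max_age_seconds, now=None):
    """Return True if no matching file under root was modified within
    the last max_age_seconds (or if there are no matching files)."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    return newest is None or (now - newest) > max_age_seconds
```

Run from cron with a threshold of a few polling intervals, a check like this would flag a stalled poller well before the daily maintenance restart.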

We have yet to find anything that correlates with the service crashing; it appears to be random. We only just turned on verbose logging with timestamps, so we don’t yet have enough data to correlate the event with a particular host.

Questions

  1. Has anyone else experienced this? If so, how did you resolve it?
  2. How can we configure the watchdog to restart the Dispatch Service when the logs are still being populated?
