./validate.php
Component | Version |
---|---|
LibreNMS | 21.5.1 |
DB Schema | 2021_04_08_151101_add_foreign_keys_to_port_group_port_table (208) |
PHP | 7.4.19 |
Python | 3.8.10 |
MySQL | 10.5.10-MariaDB-log |
RRDTool | 1.7.2 |
SNMP | NET-SNMP 5.9 |
====================================
[OK] Installed from package; no Composer required
[OK] Database connection successful
[OK] Database schema correct
pkg info librenms
librenms-21.5.1_2,1
Name : librenms
Version : 21.5.1_2,1
Installed on : Thu Aug 26 08:08:12 2021 CDT
Origin : net-mgmt/librenms
Architecture : FreeBSD:13:*
Prefix : /usr/local
Categories : net-mgmt
Licenses : GPLv3
Maintainer : [email protected]
WWW : http://www.librenms.org
Comment : Autodiscovering PHP/MySQL/SNMP based network monitoring
Options :
DOCS : on
EXAMPLES : on
FPING : on
IPMITOOL : on
LIBVIRT : on
MYSQLD : off
NAGPLUGINS : on
NMAP : on
WMIC : on
X11 : off
Misc Info
Distributed Polling: NO
NAME=FreeBSD
VERSION=13.0-STABLE
py38-psutil-5.8.0
hw.realmem: 34359738368
hw.ncpu: 14
Traceback
Traceback (most recent call last):
File "/usr/local/www/librenms/librenms-service.py", line 68, in <module>
service.start()
File "/usr/local/www/librenms/LibreNMS/service.py", line 507, in start
self.reap_psutil()
File "/usr/local/www/librenms/LibreNMS/service.py", line 420, in reap_psutil
for p in psutil.Process().children(recursive=False):
File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 272, in wrapper
return fun(self, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 899, in children
ppid_map = _ppid_map()
File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 256, in _ppid_map
for pid in pids():
File "/usr/local/lib/python3.8/site-packages/psutil/__init__.py", line 1365, in pids
ret = sorted(_psplatform.pids())
File "/usr/local/lib/python3.8/site-packages/psutil/_psbsd.py", line 513, in pids
ret = cext.pids()
OSError: [Errno 12] Cannot allocate memory (originated from sysctl)
Synopsis
Our server has been experiencing “graph gap” since June 4th, 2021. We do not use distributed polling at this time. The server has 32G of RAM, 14 vCPUs, and plenty of resources to spare.
When the service crashes, the logs continue to be populated with information about the billing runs. We don’t have the watchdog enabled, but it would not help here: because the logs are still being written, the watchdog would not trigger a restart. The scheduled maintenance does run and restarts the service, but that can still leave up to 24 hours of “graph gap”.
We have yet to find anything that correlates with the service crashing; it appears to be random. We only just turned on verbose logging with timestamps, so we don’t yet have enough data to correlate the crashes with a particular host.
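For context, the call that fails in the traceback is psutil listing process IDs, which on FreeBSD goes through sysctl. Below is a minimal sketch of how a caller might tolerate that error, assuming the ENOMEM is transient (we have not confirmed this). The helper name `retry_on_enomem` and its parameters are hypothetical and are not part of LibreNMS or psutil.

```python
import errno
import time


def retry_on_enomem(fn, attempts=3, delay=0.5):
    """Call fn(); retry only when it raises OSError with errno ENOMEM."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError as exc:
            # Re-raise anything that is not ENOMEM, and the final ENOMEM too.
            if exc.errno != errno.ENOMEM or attempt == attempts:
                raise
            time.sleep(delay)


# Hypothetical usage, mirroring the call shown at service.py line 420:
#   import psutil
#   children = retry_on_enomem(
#       lambda: psutil.Process().children(recursive=False)
#   )
```

This would only keep the service alive across an occasional failure; it does not explain why sysctl reports ENOMEM in the first place.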
Questions
- Has anyone else experienced this? If so, how did you resolve it?
- How can we configure the watchdog to restart the Dispatcher Service when the logs are still being populated?
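As a stopgap that does not depend on the logs, we are considering an external check on how recently the RRD files were written, since a “graph gap” should show up as RRD files that stop updating. This is a rough sketch under assumptions, not the LibreNMS watchdog: the function names, the `*.rrd` pattern, and the directory in the usage comment are all hypothetical and would need to match the actual install.

```python
import time
from pathlib import Path


def newest_mtime(root, pattern="*.rrd"):
    """Return the most recent modification time under root, or None if no file matches."""
    mtimes = [p.stat().st_mtime for p in Path(root).rglob(pattern)]
    return max(mtimes) if mtimes else None


def is_stale(root, max_age_seconds, now=None):
    """True when no matching file has been written within max_age_seconds."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    return newest is None or (now - newest) > max_age_seconds


# Hypothetical usage from cron (the path is an assumption, adjust as needed):
#   if is_stale("/usr/local/www/librenms/rrd", max_age_seconds=900):
#       ...restart the service here...
```

The threshold would have to be longer than the polling interval, otherwise the check would fire between normal polls.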