RRD graphs having gaps in data

So I have RRDs stored on a remote server. It was working fine until about 3 weeks ago, and now I’m getting gaps in the graphs. Any help would be great.

You’re really not giving us much to go on here. Output of validate? Have you checked any log files, including the LibreNMS one?

./validate.php

===========================================
Component | Version
--------- | -------
LibreNMS  | 23.11.0-12-g3be233af5 (2023-11-29T17:18:26-05:00)
DB Schema | 2023_11_21_172239_increase_vminfo.vmwvmguestos_column_length (274)
PHP       | 8.1.2-1ubuntu2.14
Python    | 3.10.12
Database  | MariaDB 10.6.12-MariaDB-0ubuntu0.22.04.1
RRDTool   | 1.7.2
SNMP      | 5.9.1
===========================================

[OK] Composer Version: 2.6.5
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQL and PHP time match
[OK] Distributed Polling setting is enabled globally
[OK] Connected to rrdcached
[OK] Active pollers found
[FAIL] Some dispatcher nodes have not checked in recently
Inactive Nodes:
csp-lnms-srv01********* (this node)
[OK] Locks are functional
[OK] Python wrapper cron entry is not present
[OK] Redis is functional
[OK] rrdtool version ok
[OK] Connected to rrdcached

I know about the dispatcher… I don’t have this server doing polling; I have other servers for that.

I’m more of a consumer of the setup in question, but I can add some more detail:
We have a number of VMs dedicated to various LibreNMS tasks:
MariaDB & RRD storage: 8 cores, 16 GB
nginx LibreNMS web interface: 4 cores, 8 GB
pollers: 2x (16 cores, 8 GB) [2 more pollers were added yesterday with no change in the graphing issue]

We have roughly 1200 devices / 200,000 ports.

Each poller has 52 workers, and prior to adding the additional pollers yesterday they only used about 20% of their available worker seconds. With the additional pollers, they’re only using about 10% of the available worker seconds.

Poller Cluster Health shows a relatively even split of devices actioned per poller and no devices pending, yet we have a large number of what we’re calling “zebra graphs” [graphs like the one above, or worse, in which samples are missing from the RRDs]. Some of our graphs look fine; others are consistently missing data. This suggests to me that in each poller run some devices are not getting data into their RRDs… but the poller health page says everything is fine. (One way to confirm that directly is sketched below.)

Our poller librenms.log files are full of
ErrorException: mkdir(): Permission denied in /opt/librenms/LibreNMS/Poller.php:247

This is from the mkdir() call in initRrdDirectory failing… and it seems to me this is a bug, as the poller shouldn’t be bothering to mkdir($host_rrd) to initialize an RRD directory when rrdcached is in use.

Try upping your MySQL connections? I had gaps in my graphs too, and raising the limit to 1000 connections fixed it.

[mariadb]
max_connections=1000

added to:
/etc/my.cnf.d/server.cnf

You can also check MySQL connection stats from the mysql client with: show status like '%onn%';

and look for aborted connections.

So these gaps are in the RRDs, not SQL, and our connection max is at 5k right now.
