Changed polling time and now I have gaps in the graphs

Hello,

I’m in the process of migrating an instance of LibreNMS to a new server, and I’m testing the polling of some devices to see how they will work out.

I am going to use three servers. Two of them will have a polling time of 5 minutes, and the last one will have a polling time of 10 minutes; it will poll some slow devices that have a lot of ports.

On this server I was polling a Palo Alto box (PA-3050), without polling the ports and with a polling time of 5 minutes, and it was working fine. I changed the polling time to 10 minutes and now I have gaps in the graphs. I already changed the rrd step in config.php and ran the rrdstep.php script, but I still have gaps.

What can I do to avoid those gaps? The device is being polled in less than five minutes.

It’s possible the change from the script hasn’t taken effect. If you can, remove one of the rrd files you care less about and let it be re-created.

No change. It still does not show some of the polls, which gives me gaps.

In the log file I have the information that the device was polled.

/opt/librenms/discovery.php new 2017-12-04 10:00:02 - 0 devices discovered in 0.232 secs
/opt/librenms/poller.php 14 2017-12-04 10:01:11 - 1 devices polled in 69.55 secs
/opt/librenms/discovery.php new 2017-12-04 10:10:01 - 0 devices discovered in 0.001 secs
/opt/librenms/poller.php 14 2017-12-04 10:11:37 - 1 devices polled in 95.84 secs
/opt/librenms/discovery.php new 2017-12-04 10:20:01 - 0 devices discovered in 0.001 secs
/opt/librenms/poller.php 14 2017-12-04 10:21:35 - 1 devices polled in 93.41 secs
/opt/librenms/discovery.php new 2017-12-04 10:30:01 - 0 devices discovered in 0.001 secs
/opt/librenms/poller.php 14 2017-12-04 10:31:22 - 1 devices polled in 81.35 secs
/opt/librenms/discovery.php new 2017-12-04 10:40:02 - 0 devices discovered in 0.009 secs
/opt/librenms/poller.php 14 2017-12-04 10:41:31 - 1 devices polled in 89.22 secs
/opt/librenms/discovery.php new 2017-12-04 10:50:01 - 0 devices discovered in 0.001 secs
/opt/librenms/poller.php 14 2017-12-04 10:51:49 - 1 devices polled in 108.5 secs
/opt/librenms/discovery.php new 2017-12-04 11:00:01 - 0 devices discovered in 0.001 secs
/opt/librenms/poller.php 14 2017-12-04 11:01:54 - 1 devices polled in 112.2 secs
/opt/librenms/discovery.php new 2017-12-04 11:10:01 - 0 devices discovered in 0.012 secs
/opt/librenms/poller.php 14 2017-12-04 11:12:04 - 1 devices polled in 122.7 secs

But some of those polls do not show up in the rrd. For example, the polls at 10:41 and 10:51 do not appear in the graphs, which gives me the gap.

On my config.php for this poller I have the following configuration:

$config['rrd']['step'] = 600;

The librenms in cron.d is configured as below:

*/10  *    * * *   librenms    /opt/librenms/discovery.php -h new >> /dev/null 2>&1
*/10  *    * * *   librenms    /opt/librenms/cronic /opt/librenms/poller-wrapper.py 16

The rrdinfo output for the files shows that the step is set to 600.


filename = "processor-hr-1.rrd"
rrd_version = "0003"
step = 600
last_update = 1512393023
header_size = 3080

What else do I need to check to fix those gaps?

Are you sure polling is completing within 10 minutes in the first place?

You should also set the heartbeat value. Have you done that? If so, I’m guessing it’s set to 1200 seconds, which in part is the length of time between those gaps. I’ve never tested a longer time between polls, as most people want to go the other way.

Yes, the device is being polled within the 10 minutes; that is what is shown in the log file.

The same device is polled on the older server we have here and it has no gaps, even when polling the ports and taking longer to finish the polling.

When the polling time was 5 minutes I had no gaps; they only appeared after I changed it to 10 minutes.

The rrdinfo output shows that the heartbeat is set to 600. Do I need to change it to a higher value?

filename = "processor-hr-1.rrd"
rrd_version = "0003"
step = 600
last_update = 1512400204
header_size = 3080
ds[usage].index = 0
ds[usage].type = "GAUGE"
ds[usage].minimal_heartbeat = 600

You should set it to double the step value.
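For anyone who wants to check their own files against this rule, here is a rough Python sketch that reads rrdinfo text output (in the format pasted above) and flags any data source whose minimal_heartbeat is less than twice the step. The function name `check_heartbeat` and the `factor` parameter are made up for illustration; they are not part of LibreNMS or rrdtool.

```python
import re

def check_heartbeat(rrdinfo_text, factor=2):
    """Parse rrdinfo text output.

    Returns (step, {ds_name: heartbeat}, [ds names with heartbeat < factor * step]).
    """
    step = None
    heartbeats = {}
    for line in rrdinfo_text.splitlines():
        line = line.strip()
        m = re.match(r"step\s*=\s*(\d+)$", line)
        if m:
            step = int(m.group(1))
            continue
        m = re.match(r"ds\[(.+?)\]\.minimal_heartbeat\s*=\s*(\d+)$", line)
        if m:
            heartbeats[m.group(1)] = int(m.group(2))
    if step is None:
        raise ValueError("no 'step' line found in rrdinfo output")
    too_low = [name for name, hb in heartbeats.items() if hb < factor * step]
    return step, heartbeats, too_low

# rrdinfo output copied from the post above.
SAMPLE = """\
filename = "processor-hr-1.rrd"
rrd_version = "0003"
step = 600
last_update = 1512400204
header_size = 3080
ds[usage].index = 0
ds[usage].type = "GAUGE"
ds[usage].minimal_heartbeat = 600
"""

step, heartbeats, too_low = check_heartbeat(SAMPLE)
print(step, heartbeats, too_low)  # 600 {'usage': 600} ['usage']
```

In practice the text would come from running rrdinfo on a real file rather than from a pasted string.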


Thank you,

Changing the heartbeat solved the problem.

$config['rrd']['step'] = 600;
$config['rrd']['heartbeat'] = 1200;
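A quick way to see why the heartbeat matters here: rrdtool treats a data source as unknown when the time between two updates exceeds the heartbeat. The Python sketch below takes the poller completion times from the log earlier in this thread and measures the spacing between them. Keep in mind this is only an approximation, since the log records when the poller finished and not the exact moment each rrd file was written; the helper names are made up for illustration.

```python
from datetime import datetime

# Poller completion times copied from the log earlier in the thread.
# Assumption: these approximate the times of the rrd updates.
POLL_TIMES = [
    "2017-12-04 10:01:11",
    "2017-12-04 10:11:37",
    "2017-12-04 10:21:35",
    "2017-12-04 10:31:22",
    "2017-12-04 10:41:31",
    "2017-12-04 10:51:49",
    "2017-12-04 11:01:54",
    "2017-12-04 11:12:04",
]

def intervals(stamps):
    """Seconds between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

def over_heartbeat(gaps, heartbeat):
    """Intervals longer than the heartbeat, i.e. candidates for unknown data."""
    return [g for g in gaps if g > heartbeat]

gaps = intervals(POLL_TIMES)
print(gaps)                        # [626, 598, 587, 609, 618, 605, 610]
print(over_heartbeat(gaps, 600))   # [626, 609, 618, 605, 610]
print(over_heartbeat(gaps, 1200))  # []
```

With a heartbeat equal to the step, a few seconds of jitter in the poll runtime is enough to push an interval past the limit. With the heartbeat doubled to 1200, none of these intervals come close.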


TL;DR - rrdtool was doing exactly what it was told and we were graphing the space between the polling and heartbeat (dead air).

This solution also fixes NaN or empty graphs. We had an issue where a blind adjustment from the default 300 seconds caused empty graphs, yet rrdtool was running flawlessly (running the rrdtool command inside ~/rrd/ returned OK along with the output values). In our case, the timing and heartbeat were off by just enough that the gaps made up the entire graph. What kept us digging was that we were also using InfluxDB → Grafana, and those dashboards were all still fully functional. We set the heartbeat to double the step value and then ran the ./scripts/rrdstep.php -h all command.

Thanks again for posting this solution!