1 minute polling - rrdstep vs rrd heartbeat timeout. Which one affects graph gaps?

I’m using 1 minute polling, which mostly works great; however, I do get gaps in my graphs.

Per this page:
https://docs.librenms.org/Support/1-Minute-Polling/ it says

Your polling MUST complete in the time you configure for the heartbeat step value. See /pollers/tab=pollers/ in your WebUI for your current value.

So which one is it - the rrdstep value (in my case 60 seconds), or rrd heartbeat value (in my case 180 seconds)?

Here’s the result from GEAR → Pollers → Pollers
Standard Pollers
Poller Name | Devices Polled | Total Poll Time | Last Ran            | Actions
monitoring  | 25             | 87 Seconds      | 2019-04-21 18:05:29 |

The ‘total poll time’ varies between 85-130 seconds, mostly hovering around 95 seconds.

Thanks.

I’m not sure about heartbeat; since the next poll has already started after 60 seconds, I think it’s just a timeout catch. I would go with step 60 and heartbeat 120.

For devices with gaps, check your device poller graphs. It sounds like they are not completing within 60 seconds.

Did you convert all RRD graphs, as described at the bottom of the doc you linked?

Thanks for the reply.

For devices with gaps, check your device poller graphs. It sounds like they are not completing within 60 seconds.

Yes, I believe this is what is happening, which is why I wasn’t sure whether the 60-second polling window would be extended by a grace period up to the heartbeat’s 180 seconds.

Did you convert all RRD graphs, as described at the bottom of the doc you linked?

I created the devices with config.php already set to 60/180 so I didn’t need to convert them later.

Thanks.

If it’s always exceeding 60s, you can check Settings -> poller -> History

If you see gaps in the graph at a specific time, check Device -> Graph -> Poller, where you will also see the modules that consume the most time. Perhaps you could optimize there by turning off some polling modules for that device.

Heartbeat will put a gap in the graph if the time between updates exceeds it, but raising it won’t fix the issue. See https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html#STEP%2C_HEARTBEAT%2C_and_Rows_As_Durations for more about step and heartbeat. You would need to switch back to 5 minute polling if the device cannot finish polling in 1 minute.
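To make the heartbeat rule concrete, here is a rough Python sketch. It is only a simplified model of what the rrdcreate docs describe (an interval between two updates longer than the heartbeat is treated as unknown), not actual RRDtool or LibreNMS code. The function name and the timestamps are made up for illustration, and it ignores how RRDtool consolidates data points.

```python
# Simplified model of the RRDtool heartbeat rule; not RRDtool or LibreNMS code.

def unknown_intervals(update_times, heartbeat):
    """Return (previous, current) pairs of update timestamps, in seconds,
    that are spaced further apart than the heartbeat. In this model those
    are the intervals that would show up as gaps in the graph."""
    gaps = []
    for previous, current in zip(update_times, update_times[1:]):
        if current - previous > heartbeat:
            gaps.append((previous, current))
    return gaps


# Hypothetical update timestamps: one update arrives 150s after the
# previous one, and another arrives 240s after the previous one.
updates = [0, 60, 120, 270, 330, 570]

print(unknown_intervals(updates, heartbeat=120))  # [(120, 270), (330, 570)]
print(unknown_intervals(updates, heartbeat=180))  # [(330, 570)]
```

In this model a larger heartbeat only hides the shorter delay. The late updates are still late, so the underlying slow poll is not fixed.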

If it’s always exceeding 60s, you can check Settings → poller → History

A few devices are routinely exceeding 60 seconds (the remote devices), while the devices on the LAN are polling fine within 20-45 seconds.

Device → Graph → Poller, and you will see the modules that consume the most time too

Good tip. Bad news is that these are the ports on the router. And there’s lots of them. Without the ports the poller is about 40 seconds. The ports push it up to double that.
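As a back-of-the-envelope check of those numbers, here is a small Python sketch. The module names and the 40/40 split are just my rough figures from above put into a dict, and `fits_in_step` is a made-up helper, not anything from LibreNMS.

```python
STEP = 60  # seconds; the rrdstep value used in this thread


def fits_in_step(module_times, step=STEP, skip=()):
    """Return True if the summed per-module poll times (seconds),
    ignoring any module named in `skip`, finish within one step."""
    total = sum(t for name, t in module_times.items() if name not in skip)
    return total < step


# Rough figures for the router: about 40s for ports, about 40s for the rest.
router = {"ports": 40, "other_modules": 40}

print(fits_in_step(router))                  # False: about 80s in total
print(fits_in_step(router, skip={"ports"}))  # True: about 40s without ports
```

So the device would only fit in a 60 second step with the ports module turned off, and on a router the ports are the data I actually want.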

Thanks for the heartbeat link, that clarifies a lot.

What I see here is that the device will not complete in <60s, so I’ll need to raise the poller cycle to 5 minutes. That’s a shame because it loses accuracy, but it’s better than not having data at all.

Thanks for helping.