Interface graph spikes after periods of not receiving SNMP data

Continuing the discussion from Very spiky graph on one host:

I also see such spikes, and they destroy port utilisation graphs. Since I have never seen such spikes on other NMS platforms, I suggest the following approach:

  • Validate in/out octet counter values before storing them (the implied rate cannot exceed the interface speed)
  • Handle “coming back alive after an outage” situations in one of two ways (see the sketch after this list):

For those who want to recover as soon as possible and can accept some inaccuracy:

  • Spread the newly read counter delta evenly across the x polling intervals where no data was received, and use only that 1/x share of octets for the current interval as an approximation.

Or, for those who need correct graphing or nothing at all:

  • Keep the interface or the whole device marked as unreachable for one additional polling period; store the new octet counter value but do not graph it. Use that updated counter value to calculate correct byte counts from the next polling period onward.
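
A minimal sketch of both ideas in Python, assuming a poller that knows how many polling intervals have elapsed since the last successful poll; the names (octets_for_interval, IF_SPEED_BPS, POLL_INTERVAL) are made up for illustration and are not LibreNMS code:

```python
IF_SPEED_BPS = 1_000_000_000   # assumed 1 Gbit/s interface
POLL_INTERVAL = 300            # assumed 5-minute polling

def octets_for_interval(prev_counter, new_counter, intervals_elapsed,
                        graph_first_poll=True):
    """Return the octet count to graph for the current interval, or None.

    intervals_elapsed (x) is the number of polling intervals since the
    last successful poll; x == 1 means no data was missed.
    """
    delta = new_counter - prev_counter
    if delta < 0:
        return None  # counter wrapped or device rebooted: skip this sample

    # Strategy 2: after an outage, store the counter but graph nothing;
    # correct rates resume from the next polling period onward.
    if intervals_elapsed > 1 and not graph_first_poll:
        return None

    # Strategy 1: spread the accumulated delta evenly over the x elapsed
    # intervals and graph only the 1/x share as an approximation.
    share = delta / intervals_elapsed

    # Validation: the implied rate may not exceed the interface speed.
    if share * 8 / POLL_INTERVAL > IF_SPEED_BPS:
        return None  # impossible value: discard rather than graph a spike
    return share

# Example: 6 intervals elapsed (a 30-minute outage at 5-minute polling)
print(octets_for_interval(1_000_000_000, 1_180_000_000, 6))  # 30000000.0
```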

BR

  1. LibreNMS can already cap values based on ifSpeed.
  2. RRD cannot write to past periods of time. How do you tell the difference between skipped intervals and intervals where the device/port was simply inactive?
  3. Remember how counters work: the value is not a rate but the total number of transferred bytes (for ifInOctets). See the worked example after this list.
  4. The “remove spikes” script was not originally written for LibreNMS, so clearly others deal with this issue too.
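
To illustrate point 3 with made-up numbers: the counter keeps incrementing during an outage, so the first poll afterwards sees a large delta. Dividing that delta by the real elapsed time gives the true average rate; attributing it all to a single poll interval is exactly what produces the spike:

```python
# ifInOctets is a monotonically increasing byte counter, not a rate.
c_before = 1_000_000_000   # counter at 12:00, last successful poll
c_after  = 1_180_000_000   # counter at 12:30, first poll after a 30 min outage

# Correct: divide the delta by the real elapsed time (1800 s).
print((c_after - c_before) * 8 / 1800)   # 800,000 bit/s

# Spike: attribute the whole delta to one 300 s poll interval.
print((c_after - c_before) * 8 / 300)    # 4,800,000 bit/s, a 6x spike
```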

The problem is mainly related to how RRD works. I will try to explain later. If anyone knows whether MRTG or other tools that use RRD have solved this, please chime in.
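
For reference, plain rrdtool already offers two relevant knobs on a COUNTER data source: the heartbeat (gaps longer than it become UNKNOWN rather than guessed rates) and the DS maximum (impossible rates are stored as UNKNOWN). A minimal sketch, assuming rrdtool is installed; the file name and retention are illustrative only:

```python
import subprocess

subprocess.run([
    "rrdtool", "create", "port-traffic.rrd",
    "--step", "300",                    # one expected update per 5 minutes
    # COUNTER DS: rrdtool derives the rate as delta / elapsed seconds.
    # heartbeat=600: if two updates are more than 600 s apart, the gap is
    # stored as UNKNOWN instead of being filled with a guessed rate.
    # max=125000000: rates above 1 Gbit/s (in bytes/s) are rejected as
    # impossible, which caps outage spikes at the source.
    "DS:traffic_in:COUNTER:600:0:125000000",
    "RRA:AVERAGE:0.5:1:8640",           # keep 30 days of 5-minute averages
], check=True)
```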
