Cannot Graph over 4Gbps Traffic

Hi all,

We currently monitor around 1,500 active devices with a distributed setup, which has been running flawlessly for almost a year.
Here is our system diagram:

[image: system diagram]

  • Storage server running rrdcached and memcached
  • DB server running MariaDB 10.1.44 with Galera Cluster
  • Load balancer using HAProxy (single write, multi read setup)

Our distributed pollers are running healthy.

Here is the validate output from one of the pollers:

[image: poller validate output]

The problem started on 05 February 2020: several ports carrying more than 4 Gbps of traffic are no longer graphing, especially on Cisco ASR9K devices.
Here is an example:



I’ve tried deleting the RRD file, and the port then graphs above 4 Gbps again, but the history is lost (of course :smile:).

Please advise.
Thanks!

Have you looked here?
https://docs.librenms.org/Support/FAQ/

Look at the sections on RRD tune and spikes.
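
For context on why tuning matters: RRDtool treats any rate above a data source's configured maximum as unknown, so a port whose RRD was created with too low a maximum stops graphing once traffic exceeds it. Below is a minimal Python sketch of that behaviour. The `store_rate` helper and the limits are made up for illustration; this is not LibreNMS or RRDtool code.

```python
from typing import Optional


def store_rate(rate: float, ds_max: Optional[float]) -> Optional[float]:
    """Mimic an RRD data source maximum: rates above ds_max become unknown (None)."""
    if ds_max is not None and rate > ds_max:
        return None  # RRDtool would record this sample as unknown
    return rate


# Hypothetical values, all in the same unit (here bits per second).
OLD_MAX = 4_000_000_000      # assumed limit baked into the RRD at creation time
TUNED_MAX = 10_000_000_000   # assumed limit after tuning to the real port speed
samples = [3_200_000_000, 4_600_000_000]

before_tune = [store_rate(s, OLD_MAX) for s in samples]
after_tune = [store_rate(s, TUNED_MAX) for s in samples]

print("before tune:", before_tune)
print("after tune: ", after_tune)
```

If this is what is happening, raising the maximum on the existing RRD (which is what tuning does) should bring the graph back without deleting the history.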

Hi Kevin,

Thanks for your reply.
The RRDtool tune option is enabled globally in our LibreNMS.

I’ve also tried it from the CLI:
[image: CLI output]

I waited 10 minutes and it is still the same :sob:

Please advise.

Hi,

I found something interesting:
I changed the permissions on the device folder and its RRD files to 777, and the graph started working again.
[image: graph]

Yes, I know it’s dangerous. Could this be a bug in LibreNMS?
Please advise.

I suspect it has to do with Counter32 overflowing.

See my fix here:

And for more debug info / how I found this issue

Make sure LibreNMS is set to poll your device with SNMP v2c (or v3), not SNMP v1. SNMP v1 is limited to 32-bit counters, which roll over at 2^32 octets (about 4.29 GB).
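
To put numbers on the rollover, here is a rough Python sketch. It assumes a steady 4 Gbps of traffic and a 300-second polling interval (both are assumptions for illustration), and the helper names are hypothetical.

```python
COUNTER32_BITS = 32
COUNTER64_BITS = 64


def seconds_to_wrap(rate_bps: float, bits: int = COUNTER32_BITS) -> float:
    """Seconds until an octet counter of the given width wraps at a steady bit rate."""
    bytes_per_second = rate_bps / 8
    return 2**bits / bytes_per_second


def apparent_octets(true_octets: int, bits: int) -> int:
    """Octets a poller can infer from two readings of a counter that may have wrapped."""
    return true_octets % 2**bits


RATE_BPS = 4_000_000_000   # assumed steady traffic
POLL_INTERVAL_S = 300      # assumed polling interval

true_octets = RATE_BPS // 8 * POLL_INTERVAL_S

print(f"32-bit counter wraps every {seconds_to_wrap(RATE_BPS):.2f} s")
print("octets really sent:      ", true_octets)
print("seen via 32-bit counter: ", apparent_octets(true_octets, COUNTER32_BITS))
print("seen via 64-bit counter: ", apparent_octets(true_octets, COUNTER64_BITS))
```

With a 32-bit counter wrapping many times between polls, the poller cannot recover the real delta, which is why the 64-bit counters available from SNMP v2c onward are needed at these speeds.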


Hi @Satrio_Adi
Are there any permission issues on the RRDs according to validate.php? Has anything changed on the shared RRD volume between the pollers and the web server? This really looks like a storage issue; we haven’t had any reports of RRD issues recently.
Bye

Hi all, sorry for the very late reply.

It was solved by running chmod 777 on all RRD files and then running tune_port for all ports. Afterwards I changed the RRD permissions back.
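
For anyone hitting the same thing, a less drastic first step than chmod 777 may be to find out which RRD files the polling user cannot write. Here is a minimal Python sketch. The `unwritable_rrds` helper is hypothetical, and the path in the usage comment is only an assumed install location, so adjust it to your setup.

```python
import os
from pathlib import Path
from typing import Callable, List


def unwritable_rrds(
    rrd_root: str,
    can_write: Callable[[Path], bool] = lambda p: os.access(p, os.W_OK),
) -> List[Path]:
    """Return the .rrd files under rrd_root that the current user cannot write to.

    can_write is injectable so the check can be swapped out; by default it
    asks the OS whether the user running this script has write access.
    """
    return sorted(p for p in Path(rrd_root).rglob("*.rrd") if not can_write(p))


# Usage (run as the user that polls and tunes, since the result depends on who runs it):
#   for path in unwritable_rrds("/opt/librenms/rrd"):   # assumed path
#       print(path)
```

Note that root can write to almost anything, so running this as root will usually report nothing.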