Erratic Polling Times (Recently)

Hi,

I know some other things can impact this, but in case others have seen it - an interesting change in polling times recently (plot below). And the device is running the same firmware that has been in it for a VERY long time (i.e. a year or two, no updates at all).

[plot: recent change in polling times]

Thoughts?

Thanks!

@arrmo just one device, or all of them?

Sorry for the delay! Been on travel … :-(.

It happened on several devices, but not all. Of course, LibreNMS is being updated (daily), and the underlying Linux machine every so often. So it could be either - any thoughts on how to debug?

Thanks!

BTW, I was curious, so I checked the longer-term history - not sure what changed, but something did … :smile:

[plots: longer-term polling time history]

I admit, just not sure where to look to help debug this one.

You need to dig in more. Is it a specific kind of device giving you issues?

My production polling times are relatively stable.

Yes, it seems to be a specific OS. The trouble is that it's not very consistent, so it's hard to capture, of course.

I can run “manual” queries in a loop. Is there a specific command to try out to get more detailed debug information from the poller?
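(For reference, something like this loop is what I had in mind - a rough sketch, where the host, pass count, and log names are all placeholders, and it assumes you run it from the LibreNMS install directory:)

```bash
#!/bin/bash
# Poll one device repeatedly and note how long each pass takes.
# HOST is a placeholder for the device hostname as LibreNMS knows it.
HOST=192.0.2.10
for i in $(seq 1 20); do
    start=$(date +%s)
    ./poller.php -h "$HOST" > "poll-$i.log" 2>&1
    echo "pass $i: $(( $(date +%s) - start ))s"
    sleep 60
done
```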

Thanks!

If you want to get crazy, you can record debug info for all polls to log files. These will include times for modules and queries. Just add -d to the wrapper in your cron.
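(Roughly like this - the path and thread count vary by install, and the exact flag placement here is an assumption:)

```bash
# Stock LibreNMS poller cron line with -d appended for debug logging
*/5 * * * * librenms /opt/librenms/poller-wrapper.py 16 -d >> /dev/null 2>&1
```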

OK, seems I did it right (added -d) … :smile:

I was able to capture a case where the response was very slow (~1 min). It seems to be related to the ports module (i.e. I limited the poll to -m ports). Is there a good place to store the log (i.e. a LibreNMS-preferred pastebin)?
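(For reference, this is the sort of invocation I used - the hostname is a placeholder, run from the LibreNMS install directory:)

```bash
# Re-poll just the ports module with debug output, and keep a copy for the pastebin
./poller.php -h 192.0.2.10 -m ports -d 2>&1 | tee ports-debug.log
```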

Thanks!

No preferred one, but there is https://p.libren.ms

So it definitely sounds like the device acting wonky.

Yes, very odd. I worked through it (I think - fingers still crossed … ;-)). I did a few things:

  1. Pulled LibreNMS code from before the shift in performance. Made no difference.
  2. Manually executed the commands that were being sent => found that snmpbulkwalk was causing the issue, so,
  3. Temporarily disabled bulk walk for this OS (see the timing sketch below) - seems to work (so far)! Odd that something triggered this (a Linux change?) … as the OS on the box hasn't changed in a VERY long time. But let's see. If this sticks, I'll submit a PR for it.
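For anyone who hits something similar: a quick way to confirm that bulk requests are the culprit is to time the two walk variants side by side (the community string, host, and table below are placeholders - adjust for your device):

```bash
# Compare a GETBULK-based walk against a plain GETNEXT walk of the interface table
time snmpbulkwalk -v2c -c public -OQUs udp:192.0.2.10:161 IF-MIB::ifTable
time snmpwalk     -v2c -c public -OQUs udp:192.0.2.10:161 IF-MIB::ifTable
```

If the bulk variant is the slow one, LibreNMS can (as far as I know) turn bulk off per OS via snmp_bulk: false in the OS definition - which is the kind of change the PR would carry.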

BTW, it would be handy for the debug output to show (in text) the actual, full snmp command being executed - just to make sure that when trying things manually the command is exactly right. Just a thought.

And - the latest plot:

[plot: polling times after disabling bulk walk]