Looking for testers for port polling update

Thanks :slight_smile:

Would one of you be able to get me the following info:

Polling time before the change
Polling time after the change.
The ‘time’ value of this command: time snmpbulkwalk -OUneb -m IF-MIB -M /opt/librenms/mibs -v2c -c COMMUNITY HOSTNAME ifOperStatus

Here are two gists (I wasn’t sure exactly which time values would be useful to you, so I used ./poller.php -h HOSTNAME | grep " seconds$\\|s\]$" to get all of the timing-related output):

A fresh standalone Juniper EX3300-48P (nothing active, nothing disabled)

A production Juniper EX3300-48P virtual-chassis with 8 member switches (many active/inactive access ports, many deactivated ports)

I hope that helps.

Actually I was just after the output at the end but that is even better :slight_smile:

Ok so I’ve updated the code to now ignore ports which are currently down and were down on the last poll, if anyone wants to try that then just git pull on my branch
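
For anyone wanting to picture the rule, here is a minimal sketch of the skip condition as described in this post. It is written in Python purely for illustration and is not the actual LibreNMS PHP code; the `Port` class and `should_skip` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    oper_status: str       # status seen on the current poll
    oper_status_prev: str  # status stored from the previous poll


def should_skip(port: Port) -> bool:
    """Skip the detailed poll only if the port is down now and was down last time."""
    return port.oper_status == "down" and port.oper_status_prev == "down"


ports = [
    Port("ge-0/0/1", "up", "up"),
    Port("ge-0/0/2", "down", "up"),    # just went down: still polled this cycle
    Port("ge-0/0/3", "down", "down"),  # down twice in a row: skipped
]

to_poll = [p.name for p in ports if not should_skip(p)]
print(to_poll)
```

A port that has only just gone down is still polled once, so the transition itself gets recorded before the port starts being skipped.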

Hang fire, 1 bug

All updated again. I can’t compare against timings from before the down-port check, but I’ve seen a few seconds off a couple of devices which have about 1300 non-deleted ports, a lot of which are down.

Hi Laf,
any intentions to make this also work with Python 3?
I already “hacked” the existing script to make it work and I am more than happy to hand it over into professional hands for wider consumption. Just let me know.

Wrong topic?

Ok, after running this for a while, I think there may be a need (for me) to better understand what this is doing. Some observations and possible issues:

  1. ports that are disabled in the switch configuration don’t show up in the Overview and Ports tabs. This isn’t really a problem for me, just an observation that may be confusing.
  2. ports that are disabled for a time, and then are reactivated aren’t polled until after the switch gets re-discovered.
  3. some weirdness with Juniper where an active logical interface, say ge-0/0/46.0, is polled, but the parent physical interface, ge-0/0/46, is marked as deleted and is not polled… Not sure how this gets messed up, nor how to fix.

I’d have to spend some more time learning what this code does in order to even begin to recommend specific changes, but here are my thoughts:

  1. What are the criteria for selecting ports to not poll? I’m guessing checking whether ifAdminStatus == down. Is that correct?
  2. To flag the ports that we don’t want to waste time polling, I’m guessing the mechanism is to set deleted = 1. Is that correct?
  3. Regardless of the flag, I would say that each polling cycle we should check the ifAdminStatus of each port (poll the switch for this value, not looking at the DB)
    • if (ifAdminStatus != down and deleted != 0) then set deleted = 0
    • if (ifAdminStatus == down and ifAdminStatus_prev == down and deleted != 1) then set deleted = 1
  4. After step 2 above, do the detailed interface poll on the ports where deleted != 1
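
The two rules in item 3 can be written out as a small sketch. This is Python for illustration only, assuming the flag really is `deleted` as guessed above; `next_deleted_flag` is a hypothetical helper, not a LibreNMS function.

```python
def next_deleted_flag(admin_status: str, admin_status_prev: str, deleted: int) -> int:
    """Return the new value of the deleted flag for one port.

    admin_status is the value just polled from the switch,
    admin_status_prev is the value stored from the previous poll.
    """
    # Port is no longer admin-down: make sure it is polled again.
    if admin_status != "down" and deleted != 0:
        return 0
    # Port has been admin-down for two polls in a row: stop polling it.
    if admin_status == "down" and admin_status_prev == "down" and deleted != 1:
        return 1
    # Otherwise leave the flag untouched.
    return deleted


# A re-enabled port comes back without waiting for rediscovery.
print(next_deleted_flag("up", "down", 1))
# A port that is down for the first time is not flagged yet.
print(next_deleted_flag("down", "up", 0))
# A port that is down on two consecutive polls gets flagged.
print(next_deleted_flag("down", "down", 0))
```

Because the admin status is read from the switch on every cycle, a reactivated port is picked up on the next poll instead of waiting for discovery to run.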

From the code, the only way a port will be deleted is if it doesn’t pass the is_port_valid() check which would happen without this change in code from what I can see.

Ideally I’d need to see the output of ./poller.php -h HOSTNAME -r -f -d -m ports

Ok, I hacked together some code that seems to do what I think it should be doing. Replace your includes/ports.inc.php file with the contents of this gist: https://gist.github.com/twilley/117043c1f2b1112469bf8e9325135d79. A diff can be seen here: https://gist.github.com/twilley/8af93d10a0c924691ff3063f4d8bf285

my changes are:

  1. add a block of code where we check to see the admin and oper status of the deleted ports, if they are up/up, we update the database and set deleted = 0 for these ports. this happens before we loop through the nondeleted ports
  2. instead of checking ifOperStatus, ifOperStatus_prev, ifAdminStatus, and ifAdminStatus_prev to see if they’re “down”, check to see if they’re “not up” - this is important at least for ifOperStatus, as the value can be up, down, or lowerLayerDown (maybe other values too?)
  3. later on, there is some code that would only update the ifOperStatus_prev and ifAdminStatus_prev values in the database if the value is null. I was finding that the values would sometimes not be updated at all, so removed the conditional.
  4. I added/modified some echo statements that could be cut out before merging.
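
To show the idea behind changes 1 and 2 without reading the diff, here is a rough sketch in Python (the real change is in PHP in includes/ports.inc.php). The function names and the dictionary layout are made up for the example; only the status values and the ifAdminStatus/ifOperStatus field names come from the discussion above.

```python
def is_not_up(status: str) -> bool:
    """Treat anything other than 'up' as down (covers lowerLayerDown and similar)."""
    return status != "up"


def ports_to_restore(deleted_ports):
    """Return names of deleted ports that are now up/up and should have deleted set back to 0."""
    return [
        p["ifName"]
        for p in deleted_ports
        if p["ifAdminStatus"] == "up" and p["ifOperStatus"] == "up"
    ]


# Hypothetical rows for ports currently flagged deleted = 1.
deleted_ports = [
    {"ifName": "ge-0/0/10", "ifAdminStatus": "up", "ifOperStatus": "up"},
    {"ifName": "ge-0/0/11", "ifAdminStatus": "up", "ifOperStatus": "lowerLayerDown"},
    {"ifName": "ge-0/0/12", "ifAdminStatus": "down", "ifOperStatus": "down"},
]

restored = ports_to_restore(deleted_ports)
print(restored)

# A plain comparison against "down" misses lowerLayerDown; the "not up" check catches it.
print("lowerLayerDown" == "down", is_not_up("lowerLayerDown"))
```

The restore step runs before the loop over the non-deleted ports, so a port that has come back up/up is included in the same polling cycle.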

Hi Folks, I think it would be beneficial to have some tweaking especially if we’re heading for 1 minute fast polling. I guess it would make sense to have something like enable modules per device group or bake more intelligence into the polling. Why even try polling for OTV on an ASA or a Linux server or IPMI on a virtual machine?
Also I just wonder if it’s feasible to dynamically group devices or modules by OID response time. Like while I would like to poll IPSLA every minute don’t do that for my full BGP table.

Further enhancement to this feature if anyone wants to test:

https://github.com/librenms/librenms/pull/6037