I used to get this a lot on remote sites where sporadic packet loss interrupted long discovery runs. It has improved a lot over the last few months - I'm not sure whether the discovery code improved or several of my sites did - but I also tuned SNMP max repeaters to a different value per site to find the optimal polling speed, which I think significantly reduced how often it happened (Debugging graph spikes from high latency links - #6 by rhinoau). That whole post has a lot of existential meanderings I went through which may help.
Pretty sure it will happen if you lose connection to a host that is being actively discovered - it might be mid-way through parsing sensors, and you'll see an SFP or something go nuts until the next discovery. The same goes for an interface changing ID: it can be removed on a failed run, then re-added with a new ID on the next discovery run.
I have a love/hate relationship with weathermaps, but I lean on them so much that this was causing me such an issue many months ago that I ended up writing a really ugly shell script which parses node and link titles of a particular format, then replaces the relevant parts with the matching ID looked up from the API.
In my case it’s quite particular to some non-standard ways I use nodes (I use the NOTES field to mark them, as I have several ON/OFF node scales to display things), but the basics are that I name things like this:
LINK FROM_HOST:IF=TO_NODE
- or -
NODE xxxx
NOTES FROM_HOST:IF
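To make that concrete, here's a hypothetical weathermap .conf fragment following that convention - the hostname, interface, and IDs are entirely invented, but the id= and port-idNNNN.rrd bits are the parts the script later rewrites:

```
LINK core-sw1:Gi0/1=edge-rtr1
	INFOURL /graphs/id=1234/type=port_bits/
	OVERLIBGRAPH /graph.php?width=595&height=190&id=1234&type=port_bits
	TARGET /opt/librenms/rrd/core-sw1/port-id1234.rrd

NODE edge-rtr1
	NOTES core-sw1:Gi0/1
	INFOURL /graphs/id=1234/type=port_bits/
```

When discovery hands the port a new ID, every 1234 above goes stale at once, which is why the script patches all three URL/RRD forms per block.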
Then I have a bunch of really lazy sed calls which process the text after them depending on what I’m doing.
It really is super lazy; if I ever make it prime-time ready I will contribute it to the project, but the basics are:
# awesome secrets management, documentation, sane config variable use and other setup stuff, you'll just have to trust me :)
for line in $(grep -Eh '^(LINK.+=.+|[[:space:]]+NOTES.+=)$' ${TARGETPATH}*.conf | cut -f2 -d' ' | sort -u)
do
# ... escape/URI encode and expand stuff, like GigabitEthernet to Gi / vendor dependent in parts etc.
# obtain port ID from LibreNMS API
PID=$(curl -s --insecure -H "X-Auth-Token: $LNMSAPI" "${APIPATH}${HOST}/ports/${PORTESC}" | jq .port.port_id | tr -d '\n')
# lazy SED calls:
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\/id=[[:digit:]]\+\//\/id='"${PID}"'\//g}' ${TARGETPATH}*.conf
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\&id=[[:digit:]]\+\&/\&id='"${PID}"'\&/g}' ${TARGETPATH}*.conf
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+\.rrd/port-id'"${PID}"'\.rrd/g}' ${TARGETPATH}*.conf
# to handle NODE entries HOST:PORT=
sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/id=[[:digit:]]\+/id='"${PID}"'/g}' ${TARGETPATH}*.conf
sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+/port-id'"${PID}"'/g}' ${TARGETPATH}*.conf
done
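The escape/URI-encode step I glossed over matters because the LibreNMS ports endpoint wants the ifName in the URL, so a / in something like GigabitEthernet0/1 has to become %2F. A minimal sketch of just that part, with a made-up port name and only the slash handled (real port names may need more characters encoded - jq's @uri filter is a more complete option):

```shell
# Percent-encode the '/' in an interface name before putting it
# in the API URL path. Only handles the slash case; other reserved
# characters would need their own substitutions.
PORT="GigabitEthernet0/1"
PORTESC=$(printf '%s' "$PORT" | sed 's,/,%2F,g')
echo "$PORTESC"   # GigabitEthernet0%2F1
```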
If I have any aggregate port graphs defined on a dashboard, they get affected too, but I've not yet put any effort into automating them.
If you’re interested, PM me and I can share more of the script - I’ll need to sanitise some parts of it.