LibreNMS changes port-id values for interfaces

I recently noticed that some of my Weathermaps stopped graphing, and I get the “No Data file port-id.rrd” error.

When I looked for the port-id.rrd file that was supposed to be there, it turned out the port had been renumbered.

For instance, port-id 150452 became 9951. Most of the other interfaces are still the same. Has anyone else had a similar experience?
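A quick way to find which weathermap links are affected is to compare the RRD files a map references with what exists on disk. The following is only a sketch: `missing_rrds` is a hypothetical helper, and it assumes the map's TARGET lines contain `hostname/port-idNNN.rrd` and that RRD files live under a per-host directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of LibreNMS or the weathermap plugin.
# Usage: missing_rrds <weathermap.conf> <rrd root directory>
# Prints every "hostname/port-idNNN.rrd" reference that has no file on disk.
missing_rrds() {
    local conf="$1" rrd_root="$2" ref
    # Assumption: references look like ".../hostname/port-id123.rrd"
    grep -oE '[^ :/]+/port-id[0-9]+\.rrd' "$conf" | sort -u |
    while read -r ref; do
        [ -e "$rrd_root/$ref" ] || printf '%s\n' "$ref"
    done
}
```

Any reference it prints points at a port-id that no longer has an RRD file, which is where the "No Data file" error would come from.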

I noticed a couple of other threads referring to the same issue, but they are all just closed.

Is LibreNMS supposed to renumber port-ids under certain circumstances? If so, what triggers this behavior?

Interestingly enough, it looks like the data from the old rrd file was successfully stitched into the new rrd file.

ls -l
total 38328
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 ping-perf.rrd
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 poller-perf-core.rrd
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 poller-perf-os.rrd
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 poller-perf-ports.rrd
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 poller-perf.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id119593.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150446.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150447.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150448.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150449.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150451.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id150453.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id186783.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id186786.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id206552.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id206553.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id9949.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id9951.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id9956.rrd
-rw-rw-r--. 1 librenms librenms 2543768 Oct 4 11:57 port-id9979.rrd
-rw-rw-r--. 1 librenms librenms 171272 Oct 4 11:57 uptime.rrd

./validate.php

Component Version
LibreNMS 22.9.0-34-ge4fdbbd (2022-10-03T19:55:49+02:00)
DB Schema 2021_02_09_122930_migrate_to_utf8mb4 (246)
PHP 8.1.11
Python 3.6.8
Database MariaDB 10.3.35-MariaDB
RRDTool 1.7.1
SNMP 5.7.2

===========================================

[OK] Composer Version: 2.4.2
[OK] Dependencies up-to-date.

I used to get this a lot on remote sites where sporadic packet loss interrupts long discovery runs. It has improved a lot over the last few months. I’m not sure whether the discovery code got better or several of my sites did, but I also changed my SNMP max repeaters to a different value for each site to find the optimal polling speed, and I think that significantly reduced the likelihood of it happening (Debugging graph spikes from high latency links - #6 by rhinoau). That whole post has a lot of existential meanderings I went through, which may help.

I’m pretty sure it will happen if you lose connection to a host while it is being actively discovered. If it is parsing sensors at the time, you’ll see an SFP or something go nuts until the next discovery. The same goes for an interface changing ID: it potentially gets removed on the failure, then re-added on the next discovery run.

I have a love/hate relationship with weathermaps, but I lean on them so much, and this was causing me such an issue many months ago, that I ended up writing a really ugly shell script. It parses node and link titles of a particular format, then replaces the relevant parts with the matching ID looked up from the API.

In my case it’s quite particular to some non-standard ways I use nodes (I use the NOTES field to mark them, as I have several ON/OFF node scales to display things), but the basics are that I name things like this:

LINK FROM_HOST:IF=TO_NODE
  - or -
NODE xxxx
      NOTES FROM_HOST:IF

Then I have a bunch of really lazy sed calls which process the text after them depending on what I’m doing.

It really is super lazy. If I make it ready for prime time I will contribute it to the project, but the basics are:

# awesome secrets management, documentation, sane config variable use and other setup stuff, you'll just have to trust me :)

for line in $(grep -Eh '^(^LINK.+=.+$|\s+NOTES.+=$)' ${TARGETPATH}*.conf | cut -f2 -d' ' | sort | uniq)
do

        # ... escape/URI encode and expand stuff, like GigabitEthernet to Gi / vendor dependent in parts etc.
        # (not shown; presumably where HOST, PORTESC and ESCAPE get set from $line)

        # obtain port ID from LibreNMS API
        PID=$(curl -s --insecure -H "X-Auth-Token: $LNMSAPI" "${APIPATH}${HOST}/ports/${PORTESC}" | jq .port.port_id | tr -d '\n')

        # lazy SED calls: within each LINK block (up to the next blank line), swap the old ID for $PID
                sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\/id=[[:digit:]]\+\//\/id='"${PID}"'\//g}' ${TARGETPATH}*.conf
                sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\&id=[[:digit:]]\+\&/\&id='"${PID}"'\&/g}' ${TARGETPATH}*.conf
                sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+\.rrd/port-id'"${PID}"'\.rrd/g}' ${TARGETPATH}*.conf

                # to handle NODE entries HOST:PORT=
                sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/id=[[:digit:]]\+/id='"${PID}"'/g}' ${TARGETPATH}*.conf
                sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+/port-id'"${PID}"'/g}' ${TARGETPATH}*.conf
done
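The escape/encode step is left out of the excerpt above, so the following is not the original code. It is only one possible shape for that step, under the assumption that each key looks like `HOST:IF=TO_NODE`. The helper names `split_key` and `urlencode` are made up for illustration. Interface names containing a slash (such as Gi0/1) need URL-encoding before they go into the API path.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names and behaviour are assumptions, not the author's script.

# Split "HOST:IF=TO_NODE" (or "HOST:IF=") into host (line 1) and interface (line 2).
split_key() {
    local key="${1%%=*}"              # drop "=TO_NODE"
    printf '%s\n%s\n' "${key%%:*}" "${key#*:}"
}

# Percent-encode everything except unreserved URL characters (ASCII input assumed).
urlencode() {
    local LC_ALL=C s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
        esac
    done
    printf '%s\n' "$out"
}
```

With these, something like `PORTESC=$(urlencode "$(split_key "$line" | sed -n 2p)")` could feed the curl call. Vendor-specific name shortening is not covered here.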

If I have any aggregate port graphs defined in a dashboard, they get affected too, but I’ve not yet put any effort into automating them.

If you’re interested, PM me and I can share more of the script - I’ll need to sanitise some parts of it.

Thanks, that makes sense. I think :)

I like your sed solution there. I might take it a bit further: perhaps build a db mapping interfaces to port-ids, then do a diff run once every hour to check what changed. If there was any interface activity during that time, go and look for the old port-id and update it in the wm.conf file.
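A rough sketch of that diff idea, using flat files rather than a real db. It assumes each hourly snapshot is a tab-separated file of `host:ifName<TAB>port_id` lines, which could be produced from the API or a database query. The function names `changed_ids` and `renumber_conf` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the snapshot format (key<TAB>port_id) is an assumption.

# Print "old_id new_id" for every key whose port_id differs between two snapshots.
changed_ids() {
    awk -F'\t' 'NR == FNR { old[$1] = $2; next }
                ($1 in old) && old[$1] != $2 { print old[$1], $2 }' "$1" "$2"
}

# Rewrite references to one port-id in a weathermap config file.
renumber_conf() {
    local old="$1" new="$2" conf="$3" tmp
    tmp="$(mktemp)" || return 1
    sed -E \
        -e "s#port-id${old}\.rrd#port-id${new}.rrd#g" \
        -e "s#([?&/]id=)${old}([^0-9]|\$)#\1${new}\2#g" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example wiring (file names are placeholders):
# changed_ids ports.prev.tsv ports.now.tsv | while read -r old new; do
#     renumber_conf "$old" "$new" wm.conf
# done
```

One caveat: if a new ID happens to equal another port's old ID, applying the renames one after another could rewrite the same reference twice, so a real version would need to guard against that.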

It is a bit concerning that you have to implement a workaround for something which should not change…in theory.