I used to get this a lot on remote sites where sporadic packet loss interrupted long discovery runs. It has improved a lot over the last few months - I'm not sure whether the discovery code improved or several of my sites did - but I also tuned SNMP max repeaters to a different value per site to find the optimal polling speed, which I think significantly reduced how often it happened (Debugging graph spikes from high latency links - #6 by rhinoau). That whole post has a lot of existential meanderings I went through which may help.
Pretty sure it will happen if you lose connection to a host that is being actively discovered - it might be mid-way through parsing sensors, and you'll see an SFP or something go nuts until the next discovery. The same goes for an interface changing ID: it can be removed on a failed run, then re-added with a new ID on the next discovery run.
I have a love/hate relationship with weathermaps, but I lean on them so much that this was causing me such an issue many months ago that I ended up writing a really ugly shell script which parses node and link titles of a particular format, then replaces the relevant parts with the matching ID looked up from the API.
In my case it’s quite particular to some non-standard ways I use nodes (I use the NOTES field to mark them, as I have several ON/OFF node scales to display things), but the basics are that I name things like this:
LINK FROM_HOST:IF=TO_NODE
- or -
NODE xxxx
NOTES FROM_HOST:IF
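To make that concrete, here's a hypothetical weathermap .conf fragment following that convention - the hostname, interface, and IDs are entirely invented, but the id= and port-idNNNN.rrd bits are the parts the script later rewrites:

```
LINK core-sw1:Gi0/1=edge-rtr1
	INFOURL /graphs/id=1234/type=port_bits/
	OVERLIBGRAPH /graph.php?width=595&height=190&id=1234&type=port_bits
	TARGET /opt/librenms/rrd/core-sw1/port-id1234.rrd

NODE edge-rtr1
	NOTES core-sw1:Gi0/1
	INFOURL /graphs/id=1234/type=port_bits/
```

When discovery hands the port a new ID, every 1234 above goes stale at once, which is why the script patches all three URL/RRD forms per block.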
Then I have a bunch of really lazy sed calls which process the text after them depending on what I’m doing.
It really is super lazy; if I ever make it prime-time ready I will contribute it to the project, but the basics are:
# awesome secrets management, documentation, sane config variable use and other setup stuff, you'll just have to trust me :)
for line in $(grep -Eh '^(LINK.+=.+|[[:space:]]+NOTES.+=)$' ${TARGETPATH}*.conf | cut -f2 -d' ' | sort -u)
do
# ... escape/URI encode and expand stuff, like GigabitEthernet to Gi / vendor dependent in parts etc.
# obtain port ID from LibreNMS API
PID=$(curl -s --insecure -H "X-Auth-Token: $LNMSAPI" "${APIPATH}${HOST}/ports/${PORTESC}" | jq .port.port_id | tr -d '\n')
# lazy SED calls:
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\/id=[[:digit:]]\+\//\/id='"${PID}"'\//g}' ${TARGETPATH}*.conf
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/\&id=[[:digit:]]\+\&/\&id='"${PID}"'\&/g}' ${TARGETPATH}*.conf
sed -i '/^LINK '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+\.rrd/port-id'"${PID}"'\.rrd/g}' ${TARGETPATH}*.conf
# to handle NODE entries HOST:PORT=
sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/id=[[:digit:]]\+/id='"${PID}"'/g}' ${TARGETPATH}*.conf
sed -i '/NOTES '"${ESCAPE}"'/,/^$/{s/port-id[[:digit:]]\+/port-id'"${PID}"'/g}' ${TARGETPATH}*.conf
done
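The escape/URI-encode step I glossed over matters because the LibreNMS ports endpoint wants the ifName in the URL, so a / in something like GigabitEthernet0/1 has to become %2F. A minimal sketch of just that part, with a made-up port name and only the slash handled (real port names may need more characters encoded - jq's @uri filter is a more complete option):

```shell
# Percent-encode the '/' in an interface name before putting it
# in the API URL path. Only handles the slash case; other reserved
# characters would need their own substitutions.
PORT="GigabitEthernet0/1"
PORTESC=$(printf '%s' "$PORT" | sed 's,/,%2F,g')
echo "$PORTESC"   # GigabitEthernet0%2F1
```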
If I have any aggregate port graphs defined on a dashboard, they get affected too, but I've not yet put any effort into automating them.
If you’re interested, PM me and I can share more of the script - I’ll need to sanitise some parts of it.