Graph spikes with unrealistic values

Is there a graph limit that can be enabled so that the graph only displays traffic up to a specified maximum?

I have several WAN links that I monitor, and most of them are asymmetrical and limited to a specific speed, for example 100 Mbps. I clicked on one of the WAN links to see historical data, and I see that last month LibreNMS graphed/captured that link hitting 18 Gbps. That one spike makes my graphs unreadable, unless there is a setting I don’t know about.

Thanks.
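To put numbers on it: 18 Gbps on a link limited to 100 Mbps is 18,000,000,000 / 100,000,000 = 180 times the link speed, so it cannot be real traffic. A minimal sketch of the kind of check a graph limit performs, using a hypothetical function name and assuming the readings are whole numbers of bits per second, one per line:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LibreNMS): read traffic readings in bits/s,
# one whole number per line on stdin, and print only those above the link speed.
above_link_speed() {
    local limit=$1 bps
    while read -r bps; do
        # Skip blank lines
        [[ -z $bps ]] && continue
        if (( bps > limit )); then
            echo "$bps"
        fi
    done
}
```

For example, `printf '50000000\n18000000000\n80000000\n' | above_link_speed 100000000` prints only the 18 Gbps reading.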

Please go through the FAQ below and check whether it matches your issue.


Thank you, that helped; I changed the setting on the device. I don’t understand this part:

Now when a port interface speed changes (this can happen because of a physical change or just because the device has misreported) the max value is set. If you don’t want to wait until a port speed changes then you can run the included script:

./scripts/tune_port.php -h <hostname> -p <ifName>

Does this only apply to traffic going forward, or is there a way to fix the large spike that happened in the past? It throws off my graph when I look at the 1- or 2-year history.

Thanks.
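As I understand RRDtool, tuning writes a maximum into the RRD file’s data source definitions, and that maximum is only checked when a new sample arrives: a rate above it is stored as unknown (a gap in the graph) instead of a spike. Rows already in the archives are not rewritten, so tuning only helps going forward, and existing spikes have to be removed separately. The sketch below illustrates both parts. It is not tune_port.php’s code; the function names are hypothetical, and it assumes the port RRD’s traffic data sources are named INOCTETS and OUTOCTETS and store octet (byte) counters.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not LibreNMS code.
# Assumptions: traffic DSes are named INOCTETS/OUTOCTETS and count octets;
# interface speed is given in bits per second.

# Print (do not run) an rrdtool command that caps both traffic DSes at the
# interface speed converted to bytes per second (bits / 8).
tune_cmd_for_port() {
    local rrd=$1 speed_bps=$2
    local max_octets=$(( speed_bps / 8 ))
    echo "rrdtool tune $rrd --maximum INOCTETS:$max_octets --maximum OUTOCTETS:$max_octets"
}

# Simplified model of what happens to each NEW sample on a counter-type DS:
# two counter readings become a bytes/s rate, and a rate above the maximum
# is stored as "U" (unknown). Counter wraps and fractional rates are ignored.
# Pass "U" (or nothing) as the 4th argument for "no maximum set".
rate_or_unknown() {
    local prev=$1 curr=$2 seconds=$3 max=${4:-U}
    local rate=$(( (curr - prev) / seconds ))
    if [[ $max != U ]] && (( rate > max )); then
        echo U
    else
        echo "$rate"
    fi
}
```

For a 100 Mbps port the cap works out to 12,500,000 bytes/s. A 5-minute sample averaging 18 Gbps (2,250,000,000 bytes/s) would then become `U`, while the same sample with no maximum set is stored as-is.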

I made this change to a device a couple of hours ago and the device had a 25G spike on the graph about 20 minutes ago. How is that possible if the setting was toggled on just two hours prior?

I went to the device, then Edit > Misc, and toggled ‘Enable RRD Tune for all ports?’ to ON.

Thanks.
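Going by the FAQ excerpt quoted earlier, turning the option on does not by itself change the existing port RRD files; the maximum is set when the port speed changes or when tune_port.php is run. That could explain a spike shortly after enabling it. A minimal way to check whether a given port RRD actually has a maximum, assuming `rrdtool info` prints lines of the form `ds[INOCTETS].max = 1.2500000000e+07` (with `NaN` when no maximum is set); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `rrdtool info <file>` output on stdin and print
# the configured maximum of one data source. "NaN" means no maximum is set,
# so any rate, however unrealistic, is accepted.
ds_max() {
    local ds=$1
    sed -nE "s/^ds\[${ds}\]\.max = (.*)$/\1/p"
}
```

Usage: `rrdtool info /opt/librenms/rrd/hostname/port-id1234.rrd | ds_max INOCTETS`.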

In my setup, I have enabled it in the global config. I have never tried it on individual ports or with the script.
Let’s see if anyone else can help with the script.

I had to Google for a while, but it turns out this script needs to be used:

./scripts/removespikes.php --rrdfile=/opt/librenms/rrd/device-hostname/port-idxxxx.rrd

xxxx is the ID of the port where the spike is located, and the script may have to be run several times to eliminate the spike.

For example, I had another device with a spike at 160G. I ran the script and refreshed the web GUI, and the spike only dropped to 120G, which was still unrealistic for this device/port. Each further run lowered it: 80G, then 40G, 10G, 5G, and finally 30M, which was a realistic value for this device/port, so I did not run the script again.
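Because it can take several runs, a small wrapper saves retyping the path. A minimal sketch with hypothetical function names, assuming the default /opt/librenms install path and the port-id<ID>.rrd naming shown above, and passing only the `--rrdfile` option from the command above:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around scripts/removespikes.php; not part of LibreNMS.
# Assumes the default install path (override by exporting LIBRENMS_DIR).
LIBRENMS_DIR=${LIBRENMS_DIR:-/opt/librenms}

# Build the RRD path for a device hostname and a numeric port ID.
port_rrd_path() {
    printf '%s/rrd/%s/port-id%s.rrd\n' "$LIBRENMS_DIR" "$1" "$2"
}

# Run removespikes.php PASSES times (default 1) against one RRD file,
# stopping early if a run fails.
remove_spikes_passes() {
    local rrd=$1 passes=${2:-1} n
    if [[ ! -f $rrd ]]; then
        echo "No such RRD file: $rrd" >&2
        return 1
    fi
    for (( n = 1; n <= passes; n++ )); do
        echo ">> pass $n of $passes: $rrd"
        php "$LIBRENMS_DIR/scripts/removespikes.php" --rrdfile="$rrd" || return
    done
}
```

Running one pass at a time, e.g. `remove_spikes_passes "$(port_rrd_path hostname 1234)" 1`, and refreshing the graph in between, matches the approach above of stopping once the value looks realistic.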

If you need help finding the port ID, the easiest way I found was to open a graph and click the ‘Show RRD Command’ button. The output below the graph contains this path in several places:

/opt/librenms/rrd/hostname/port-idxxxx.rrd

That string is what I used when running the removespikes.php script.
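Pulling the path out of that output can also be scripted. A minimal sketch with hypothetical function names, assuming the output contains unescaped paths ending in `port-id<number>.rrd` followed by a colon or space, as in the `DEF:` arguments of an rrdtool graph command:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read the "Show RRD Command" text on stdin.

# Print each distinct port RRD path found in the text.
extract_port_rrds() {
    grep -oE '/[^ :"]*/port-id[0-9]+\.rrd' | sort -u
}

# Print just the numeric port ID from a path such as .../port-id1234.rrd.
port_id_from_path() {
    sed -E 's#.*/port-id([0-9]+)\.rrd$#\1#'
}
```

Paste the command text into a file and run `extract_port_rrds < file` to get the string to pass as `--rrdfile=`.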


Thanks for the details.

If it helps, I have some sporadic sites where I have to do this so frequently that I wrapped it in this lazy script to process all the interfaces on a device. I can only get it to work reliably with the 1.1 version of the script. It’s been too long and I forget where I found 1.1, but its subtle changes helped me and I stuck with it. I just found a matching copy here: https://github.com/Carlotronics/observium-mirror/blob/2d87e3ab0eab8f8f1bc924b6a138152109b2a2c3/scripts/removespikes.php

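#!/bin/bash
# Run removespikes v1.1 against every port RRD of one device.
# Usage: <this script> <device-hostname>

DEVICE=$1
RRDPATH=/opt/librenms/rrd/
TARGETDIR=${RRDPATH}${DEVICE}

# Spike-detection method passed with -M (variance or stddev)
#METHOD=variance
METHOD=stddev

# Leave BACKUP empty to run without --backup
#BACKUP=
BACKUP="--backup"

# Switch to the second line to pass --dryrun
DRYRUN=""
#DRYRUN="--dryrun"

if [[ -z $DEVICE ]]; then
        echo "Please enter device name" >&2
        exit 1
fi

if [[ -d $TARGETDIR ]]; then
        # Without nullglob, a device with no port RRDs would loop once over the literal pattern
        shopt -s nullglob
        for i in "${TARGETDIR}"/port-id*[0-9].rrd; do
                 echo ">> $i - $METHOD"
                 # BACKUP and DRYRUN stay unquoted so that an empty value adds no argument
                 php '/opt/librenms/scripts/removespikes_v1.1.php' ${BACKUP} -R="$i" -M="${METHOD}" ${DRYRUN}
        done
else
        echo "Can't find device directory: ${TARGETDIR}" >&2
        exit 1
fi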

For really nasty spikes it can take a few passes. It’s better to catch them early, before they get consolidated into the historical averages in the RRD. Depending on the profile of the spikes, switching between the variance and stddev methods also helps.
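A minimal sketch of the general idea behind a standard-deviation spike check, written with awk so it can be tried on a column of numbers. It is not removespikes.php’s actual algorithm; the function name and threshold argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one number per line on stdin and print the values
# lying more than N population standard deviations from the mean.
flag_outliers() {
    awk -v n="$1" '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            sd = sqrt(ss / NR)
            for (i = 1; i <= NR; i++)
                if (sd > 0 && (v[i] - mean > n * sd || mean - v[i] > n * sd))
                    print v[i]
        }'
}
```

One design consequence worth knowing: a spike also inflates the mean and standard deviation it is measured against. With n samples, a single outlier can sit at most sqrt(n - 1) standard deviations from the mean, so with ten samples it can never exceed 3. A high threshold on a short window may therefore miss a spike, which is one plausible reason a different method or several passes can be needed.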

I’m also experimenting with a stddev threshold that depends on the typical issues I see at a site; the script then runs a fix if a port is out of bounds. The goal is to detect spikes programmatically and alert me in the future so I don’t miss them. Again, a very lazy script:

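                 # Fragment from inside the for-loop above. LIMIT (the stddev threshold)
                 # and FIX (non-empty to apply fixes) must be set earlier in the script.
                 # Detection pass always uses --dryrun so nothing is modified yet.
                 if php '/opt/librenms/scripts/removespikes_v1.1.php' ${BACKUP} -R="$i" -M="${METHOD}" --stddev="${LIMIT}" --dryrun | grep -E '^NOTE: NO Standard' > /dev/null; then
                        # The script reported no standard-deviation spikes
                        echo ""
                 else
                        echo " VARIANCE FOUND"
                        if [[ -n $FIX ]]; then
                                echo "will fix"
                                php '/opt/librenms/scripts/removespikes_v1.1.php' ${BACKUP} -R="$i" -M="${METHOD}" --stddev="${LIMIT}"
                        else
                                echo "will NOT fix"
                        fi
                 fi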

I’m a bit confused by the LibreNMS FAQ. I thought that following those steps, either globally or per device, would stop spikes from the moment the setting was enabled. I had one device that didn’t have any spikes, so I enabled the setting as a precaution per the FAQ, and a short time later there was a traffic spike.

Not sure why that happened. Anyway, removespikes.php is good enough for me now that I know it needs to be run. The spikes themselves aren’t a problem for me; they only become one when I need to look at a graph, go to the date/time, and find a 100G spike that throws everything off. Before I knew about the script, I kept changing the date/time range to just before or just after the spike to see realistic usage. Now I have the script command saved, I’ve documented the .rrd file locations for all the WAN links, and I remove spikes whenever I need to look at a graph that has one in view.
