So I’ve got a new ISP, one that’s going to bill me based on 95th percentile use. This means I have an interest in accurately calculating said value.
It appears that the LibreNMS graphs are pulling the 95th percentile by calculating the value at time-of-graphing using the RRDs as the source data. This produces values which are wrong.
The problems are:
- LibreNMS is producing two 95th percentile values, one for in, one for out. My ISP calculates 95th based on total (in+out) for a given five minute interval. Due to the peaks of the inputs and outputs being in different timeslots, you can’t just add the two values that LibreNMS is providing.
- RRD values are aggregated in the database to save space. In all but the shortest of time windows (which have to end NOW) this means that the peak values are averaged down, which is a problem because it is the peak values we want to inspect.
- In any case, rrdtool takes your start and end times, pulls data from the RRD, picks the points it is going to graph based on your graph geometry, and then calculates the 95th percentile of those points. Picking the graph points is itself an aggregation that happens before the 95th percentile calculation, which again hides the peak values we want to inspect.
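To make the first two points concrete, here is a rough Python sketch with made-up numbers (not my real traffic). It assumes a nearest-rank 95th percentile, which may not be exactly the method your ISP uses, and `consolidate` is a hypothetical stand-in for RRD-style averaging, not anything taken from rrdtool or LibreNMS.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: sort, then take the value at rank ceil(0.95 * n)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n, 1-based
    return ordered[rank - 1]


def consolidate(samples, factor):
    """Average each consecutive group of `factor` samples (stand-in for RRD averaging)."""
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, len(samples) - factor + 1, factor)
    ]


# Twenty invented five-minute samples in Mbit/s.
# Inbound peaks in slots 0-1, outbound peaks in slots 10-11.
inbound = [10] * 20
outbound = [10] * 20
inbound[0] = inbound[1] = 100
outbound[10] = outbound[11] = 100

# What the ISP bills on: in+out for each interval, then one percentile.
total = [i + o for i, o in zip(inbound, outbound)]

sum_of_separate = percentile_95(inbound) + percentile_95(outbound)
billed = percentile_95(total)
after_averaging = percentile_95(consolidate(total, 4))

print("sum of separate in/out 95ths:", sum_of_separate)   # 200
print("95th of per-interval totals: ", billed)            # 110
print("95th after 4:1 averaging:    ", after_averaging)   # 65.0
```

With this data, adding the two separate values overstates the billed figure (200 vs 110) because the peaks fall in different slots, and averaging four samples into one understates it (65 vs 110) because the peaks get flattened before the percentile is taken.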
Given this, 95th percentile values probably shouldn’t be calculated or displayed unless the underlying data is stored at full resolution, so that no aggregation happens between collection and calculation.
More details on my blog below.