Making Physical Memory graphs valid and worthwhile

Hi. I’ve recently created various Dashboards in LibreNMS and many of them show “physical memory” usage near 100%, where the physical machine has nowhere near that type of memory usage.

I’ve reviewed various links where this has been discussed in relation to LibreNMS, on GitHub:

https://github.com/librenms/librenms/issues/3179

https://github.com/librenms/librenms/issues/5660

on the forums here:

https://community.librenms.org/t/memory-pool/8539

and so on.

I personally believe the LibreNMS memory graphs for Linux are wrong and I’ll explain why. Note, I’m not trying to start a debate here, I just want to highlight practicality and why LibreNMS should have this fixed.

Generally speaking, the purpose of a monitoring solution is to alert an admin on problems that need attention. As admins, we typically set thresholds of 80% usage, 90% usage etc and get notifications in various transports, which alert us of things which need attention.

In the case of “memory usage” in Linux, I’m getting 100% usage for many of the Linux servers when in fact they are much less. So in essence, these particular checks and graphs are useless, since they don’t give a real world indication of memory usage and / or memory free on the real Linux server.

So in essence, I don’t agree with people on GitHub (like @paulgear) about what a server monitoring solution is for, since (again generally speaking) the primary reason admins use a server monitoring solution is not to see pretty graphs, but to be notified when things need attention. The pretty graphs, IMHO, come second to that.

We also employ other monitoring solutions (like OMDlabs / Nagios) which give the real memory usage and notify correctly, but my comments here are about LibreNMS. Making this product better and more useful is the reason I’m posting here.

Michael.

It would be nice if you could contribute code to help resolve the issue. What drives LibreNMS is people contributing time and code.

yeah would love to see a fix for this!

Hi Kevin. You’re assuming here I’m a coder, I’m not. I’m a sys admin with experience in bash, some perl, etc. I have no idea what LibreNMS is programmed in and how the internals work, how can I possibly code for it?

I’m saying you don’t have to be a programmer to help. I’m asking you to offer a solution instead of just talk. A lot of us here are not “programmers” but are passionate about LibreNMS and volunteer time and code.
All I have heard so far is talk about a possible issue but no solution and an excuse of “I’m a sys admin, can’t help.”

And most importantly, @micoots, we should remember that SNMP is providing LibreNMS with the values. So if you don’t like the values you see, you need to ask the kernel and SNMP developers to provide better values, not the LibreNMS team at all…

LibreNMS does not “compute” those…

That’s a good response to drive people out of the community, thanks.

The issue as I see it is that the SNMP values it’s picking up are the “cached” values, which aren’t the correct ones in this instance.

Again, @micoots, if the “available” value is not provided by SNMP, there is nothing really we can do in an SNMP monitoring tool :)
You can have a look here for a quick description of the OIDs:
http://www.debianadmin.com/linux-snmp-oids-for-cpumemory-and-disk-statistics.html
Then if you want some other value, as @Kevin_Krumm said, we are all network engineers, sysadmins, etc, and none of us, to my knowledge, are developers of the Linux SNMP implementation, nor of the kernel memory implementation. We have no other choice than to use the values available there.
I can understand you expected another answer, but you have to understand there is no other answer you can get from a community-driven project. I am just like you, a LibreNMS user, not a dev, doing this in my free time, helping as much as I can. And I cannot rewrite the Linux kernel and the snmpd server right now …

OK thanks for the reference. I see in that link we have:

Memory Statistics

Total Swap Size: .1.3.6.1.4.1.2021.4.3.0
Available Swap Space: .1.3.6.1.4.1.2021.4.4.0
Total RAM in machine: .1.3.6.1.4.1.2021.4.5.0
Total RAM used: .1.3.6.1.4.1.2021.4.6.0
Total RAM Free: .1.3.6.1.4.1.2021.4.11.0
Total RAM Shared: .1.3.6.1.4.1.2021.4.13.0
Total RAM Buffered: .1.3.6.1.4.1.2021.4.14.0
Total Cached Memory: .1.3.6.1.4.1.2021.4.15.0

So I run an snmpwalk on that node and I see:

UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 8388604 kB
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 8207356 kB
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 16412812 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 771748 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 8979104 kB
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000 kB
UCD-SNMP-MIB::memShared.0 = INTEGER: 825640 kB
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 136408 kB
UCD-SNMP-MIB::memCached.0 = INTEGER: 4724628 kB
UCD-SNMP-MIB::memSwapError.0 = INTEGER: noError(0)
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:

So LibreNMS is picking these two values:

UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 16412812 kB
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 771748 kB

How can I tell it to pick these values instead:

UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 16412812 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 8979104 kB

which would make more sense to me, i.e. graph the free memory (memTotalFree) instead, so that I could then generate an alert if it drops too low.

Thanks.

Michael.
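
For readers following the numbers, here is a minimal sketch of the arithmetic behind the complaint, using the values from the snmpwalk output above. The "adjusted" figure assumes that buffers and page cache can be treated as reclaimable, which is the common convention on Linux but is not something LibreNMS or the thread confirms as its method; the function and variable names are made up for illustration.

```python
# Values in kB, copied from the snmpwalk output in the post above.
mem_total_real = 16412812  # UCD-SNMP-MIB::memTotalReal.0
mem_avail_real = 771748    # UCD-SNMP-MIB::memAvailReal.0
mem_buffer = 136408        # UCD-SNMP-MIB::memBuffer.0
mem_cached = 4724628       # UCD-SNMP-MIB::memCached.0


def used_percent(total_kb, avail_kb):
    """Percentage of total memory that is not counted as available."""
    return 100.0 * (total_kb - avail_kb) / total_kb


# What a graph based only on memTotalReal and memAvailReal would show.
raw_used = used_percent(mem_total_real, mem_avail_real)

# Assumption: buffers and cache are reclaimable, so count them as available.
adjusted_used = used_percent(
    mem_total_real, mem_avail_real + mem_buffer + mem_cached
)

print(f"raw used:      {raw_used:.1f}%")
print(f"adjusted used: {adjusted_used:.1f}%")
```

With these figures the raw calculation comes out at roughly 95% used, while the adjusted one is roughly 66%, which matches the gap between what the dashboard shows and what the host is really doing.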

According to the doc:

memTotalFree: 
The total amount of memory free or available for use on this host. This value typically covers both real memory and swap space or virtual memory.

So this cannot be used in a “physical” metric, because it includes both physical and virtual memory.

The value we use now is exactly the one that provides the expected value for a physical metric.

memAvailReal:
The amount of real/physical memory currently unused or available.

So there is nothing that we can change for a “physical memory” metric.

You could indeed make your own graph that uses :

UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 16412812 kB
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 8979104 kB

The tool is open source, so you can easily change the OID that is polled. But according to the doc, it makes no sense, because totalFree (physical + swap) can be greater than totalReal (physical). You could have more totalFree memory than the total physical memory (totalReal). That is most probably true just after booting the host, I suppose. I don’t know what you could conclude out of it.
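
As a quick check of that point, the walk output earlier in the thread can be used to confirm that memTotalFree on this host is the sum of free physical memory and free swap. The sketch below does that and then plugs in a hypothetical "just after boot" case; the post-boot numbers and the helper name are assumptions for illustration only.

```python
# Values in kB, copied from the snmpwalk output earlier in the thread.
mem_total_real = 16412812  # UCD-SNMP-MIB::memTotalReal.0
mem_avail_real = 771748    # UCD-SNMP-MIB::memAvailReal.0
mem_total_swap = 8388604   # UCD-SNMP-MIB::memTotalSwap.0
mem_avail_swap = 8207356   # UCD-SNMP-MIB::memAvailSwap.0
mem_total_free = 8979104   # UCD-SNMP-MIB::memTotalFree.0

# On this host, memTotalFree equals free physical memory plus free swap.
matches_sum = (mem_avail_real + mem_avail_swap) == mem_total_free


def free_percent_of_physical(total_free_kb, total_real_kb):
    """memTotalFree expressed as a percentage of physical memory."""
    return 100.0 * total_free_kb / total_real_kb


observed = free_percent_of_physical(mem_total_free, mem_total_real)

# Hypothetical host just after boot (assumed numbers, not from the thread):
# most physical memory is free and swap is completely unused.
hypothetical_avail_real = 15000000
hypothetical_total_free = hypothetical_avail_real + mem_total_swap
hypothetical = free_percent_of_physical(hypothetical_total_free, mem_total_real)

print(f"memTotalFree == memAvailReal + memAvailSwap: {matches_sum}")
print(f"observed free vs physical:     {observed:.1f}%")
print(f"hypothetical free vs physical: {hypothetical:.1f}%")
```

In the hypothetical case the "free" percentage exceeds 100% of physical memory, which is why graphing memTotalFree against memTotalReal is hard to interpret.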

To solve this problem, I modified the file /opt/librenms/includes/polling/ucd-mib-inc.php and added the line

$memTotalReal = $memTotalReal + $memShared + $memBuffer;

on line 142, just before the line that begins with $fields = array(

You should submit a pull request on GitHub so this change can help others.