High CPU usage


#1

I'm trying to fix an issue where four HPE switches are reporting high CPU usage (>80%) in LibreNMS. When I verify with show system info on the switch, the CPU utilization is <10%.
And when I run snmpget from the same LibreNMS server with the same OID, I also get <10.

From debug:
Attempting to initialize OS: procurve

OS initilized as Generic

SQL[SELECT * FROM processors WHERE device_id=? [90] 0.74ms]

SNMP['/usr/bin/snmpget' '-v2c' '-c' 'COMMUNITY' '-OUQn' '-M' '/opt/librenms/mibs:/opt/librenms/mibs/hp' '-t' '15' '-r' '3' 'udp:HOSTNAME:161' '.1.3.6.1.4.1.11.2.14.11.5.1.9.6.1.0']

.... = 95

array (
'.1.3.6.1.4.1.11.2.14.11.5.1.9.6.1.0' => '95',
)

95%

From server:
[email protected]:/opt/librenms# snmpget -v 2c -c public -OUQn -t 15 -r 3 HOSTNAME .1.3.6.1.4.1.11.2.14.11.5.1.9.6.1.0

.1.3.6.1.4.1.11.2.14.11.5.1.9.6.1.0 = 0
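
Note that the manual command above is not quite the one the poller ran: it has no -M MIB path and uses HOSTNAME rather than udp:HOSTNAME:161. To rule that out as a cause, one option is to rebuild the poller's exact arguments from the debug line and take a few readings in a row. The following is only a rough sketch; the helper names (build_snmpget, parse_value, sample) are made up for illustration, and the sample count and delay are arbitrary assumptions.

```python
import subprocess
import time

# OID taken from the debug output above.
CPU_OID = ".1.3.6.1.4.1.11.2.14.11.5.1.9.6.1.0"


def build_snmpget(host, community, oid=CPU_OID):
    """Return the same argument list that the poller debug line shows."""
    return [
        "/usr/bin/snmpget", "-v2c", "-c", community, "-OUQn",
        "-M", "/opt/librenms/mibs:/opt/librenms/mibs/hp",
        "-t", "15", "-r", "3",
        f"udp:{host}:161", oid,
    ]


def parse_value(output):
    """Extract the integer from a '-OUQn' style line such as '<oid> = 95'."""
    return int(output.strip().rsplit("=", 1)[1])


def sample(host, community, count=5, delay=2.0):
    """Hypothetical helper: read the CPU OID several times in a row."""
    readings = []
    for _ in range(count):
        result = subprocess.run(
            build_snmpget(host, community),
            capture_output=True, text=True, check=True,
        )
        readings.append(parse_value(result.stdout))
        time.sleep(delay)
    return readings


# Example (needs a reachable switch, so it is left commented out):
# print(sample("HOSTNAME", "COMMUNITY"))
```

If the readings from this match the manual snmpget rather than the poller's 95, the difference is more likely about when the value is read than about how the command is built.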

./validate.php

====================================

Component | Version
LibreNMS  | 1.47-23-g05458c006
DB Schema | 279
PHP       | 7.0.33-0+deb9u1
MySQL     | 10.1.26-MariaDB-0+deb9u1
RRDTool   | 1.6.0
SNMP      | NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.8.0
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[WARN] Your install is over 24 hours out of date, last update: Wed, 09 Jan 2019 12:29:22 +0000
[FIX]:
Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.


#2

That's not supposed to happen?


#3

Hi,

One reason might be the polling itself. Depending on how the switch calculates CPU usage (instantaneous or averaged), the burst of SNMP requests can itself create a peak at each and every poll. The LibreNMS graph then shows a flat 80%, whereas the real graph would show an 80% peak at each poll and about 10% the rest of the time.
Unfortunately, there is no easy way to avoid this bias. And the peak load is a real peak of CPU usage, so it is still interesting to know it happened …
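
To illustrate the bias, here is a toy simulation, not a measurement from a real switch. All numbers are assumptions: a 5-minute poll interval, a 10-second burst at 80% while the switch answers the poll, and 10% load otherwise. The function cpu_load is a hypothetical model of an instantaneous CPU counter.

```python
POLL_INTERVAL = 300  # seconds between polls (assumed 5-minute interval)
BURST_SECONDS = 10   # assumed time the switch spends answering one poll
IDLE_LOAD = 10       # percent, load between polls
BURST_LOAD = 80      # percent, assumed load while SNMP requests are served


def cpu_load(second):
    """Hypothetical instantaneous CPU load of the switch at a given second."""
    if second % POLL_INTERVAL < BURST_SECONDS:
        return BURST_LOAD
    return IDLE_LOAD


def simulate(duration=3600):
    """Return the values a poller would record and the true average load."""
    # The CPU OID is read a few seconds into each poll, i.e. during the burst.
    polled = [cpu_load(t + 5) for t in range(0, duration, POLL_INTERVAL)]
    true_average = sum(cpu_load(t) for t in range(duration)) / duration
    return polled, true_average


polled, true_average = simulate()
print("values seen by the poller:", polled)
print("true average load: %.1f%%" % true_average)
```

Under these assumptions every polled value is 80 while the true average is about 12%, which is the kind of flat-high graph described above.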

PipoCanaja


#4

How would I go about fixing that?


#5

I checked other switches of the same type that don't have this issue, and the same “OS initilized as Generic” line was in their output.


#6

I have around 30 switches of the same type with CPU at ~10%, only 4 have this “high CPU” issue.


#7

Are they the same model? Maybe they have a less powerful CPU and get more stress from the SNMP polls? Maybe they have more ports and get more stress from the SNMP polls, and so on.

I have this issue on some devices, even on fairly expensive Cisco chassis … It really depends on how SNMP replies are prioritized in the OS of the device, and how the CPU usage is calculated.


#8

Same model, same port count, same OS version, etc.
From reviewing the historic CPU usage graphs, it seems this started after upgrading from 1.3x to 1.4x a couple of months back.


#9

Interesting. Same discovery and poller modules loaded etc etc?


#10

Not sure which version was on before, but I have only one server that was upgraded a couple months back. So all the discovery and poller modules will be the same.


#11

I mean on the devices: you can enable/disable modules per device. Do they all have the same modules activated?
Sometimes the SNMP code on a device has crashed in some way. If you have an opportunity to reload one of the 4 culprits, you can also try that …

Those are the ideas I have so far.


#12

Any idea what this is set to by default? And can I manually run the command to see each of these results?


#13

In /opt/librenms (or wherever you installed LibreNMS):
./discovery.php -h xxx
(with xxx being the device id)
You can add “-v” for verbosity and “-d” for debug.

Default values are visible from the GUI (in the “module” list of the device)