I am currently monitoring an Oracle Linux 7.3 VM running on top of Oracle VM Server. I installed the SNMP agent on the Linux box, and everything seems to be fine except that the reported CPU usage is always 100%. Please see the snmpd.conf from the Linux box below.
group MyROGroup v2c readonly
access MyROGroup "" any noauth 0 all none none
extend .1.3.6.1.4.1.2021.7890.1 distro /usr/bin/distro
view systemview included .1.3.6.1.2.1.1
view systemview included .1.3.6.1.2.1.25.1.1
view all included .1 80
The CPU usage on the Linux box is very low when I run top.
Thanks for your help.
Run ./poller.php -d -m processors -h HOSTNAME
and pastebin the output. Also check what data the device returns for the SNMP request.
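To make the "what data the device returns" step concrete, here is a minimal sketch that reads numeric-OID lines of the form `OID = value` (the shape snmpget prints with -OUQn) and reports the per-core load. The function names are made up for illustration and are not part of LibreNMS; the base OID is hrProcessorLoad from HOST-RESOURCES-MIB.

```python
import re

# Base OID of hrProcessorLoad (HOST-RESOURCES-MIB).
HR_PROCESSOR_LOAD = ".1.3.6.1.2.1.25.3.3.1.2"

# Matches numeric-OID lines such as ".1.3.6.1.2.1.25.3.3.1.2.196608 = 100".
_LINE = re.compile(r"^\s*(\.[0-9.]+)\s*=\s*(\d+)\s*$")


def parse_processor_load(snmp_output):
    """Return {hrDeviceIndex: load_percent} for hrProcessorLoad lines.

    Hypothetical helper: lines that are not 'numeric OID = integer'
    or that belong to another OID subtree are ignored.
    """
    loads = {}
    prefix = HR_PROCESSOR_LOAD + "."
    for line in snmp_output.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        oid, value = match.groups()
        if not oid.startswith(prefix):
            continue
        suffix = oid[len(prefix):]
        if suffix.isdigit():  # expect a single index component
            loads[int(suffix)] = int(value)
    return loads


def all_pinned(loads, threshold=100):
    """True if at least one core was reported and all are >= threshold."""
    return bool(loads) and all(v >= threshold for v in loads.values())


if __name__ == "__main__":
    sample = (
        ".1.3.6.1.2.1.25.3.3.1.2.196608 = 100\n"
        ".1.3.6.1.2.1.25.3.3.1.2.196609 = 100\n"
    )
    loads = parse_processor_load(sample)
    print(loads, "all pinned:", all_pinned(loads))
```

If every core comes back at exactly 100 while the box is idle, the value is coming from the agent itself and not from anything LibreNMS calculates.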
LibreNMS Poller
SQL[SELECT version FROM `dbSchema` ORDER BY version DESC LIMIT 1]
SQL[SELECT version()]
===================================
Version info:
Commit SHA: 45d35efd00c56a53a1080db4beedf74ec69c14e7
Commit Date: 1534026892
DB Schema: 258
PHP: 7.2.5-1+ubuntu16.04.1+deb.sury.org+1
MySQL: 10.0.31-MariaDB-0ubuntu0.16.04.2
RRDTool: 1.5.5
SNMP: NET-SNMP 5.7.3
==================================DEBUG!
Updating os_def.cache... Done
Override poller modules: processors
Starting polling run:
SQL[SELECT * FROM `devices` WHERE `disabled` = 0 AND `hostname` = 'IP_ADDR' ORDER BY `device_id` ASC]
SQL[SELECT * FROM devices_attribs WHERE `device_id` = '162']
Hostname: IP_ADDR
Device ID: 162
OS: linux
Warning: inet_pton(): Unrecognized address in /opt/librenms/includes/polling/functions.inc.php on line 248
(unix)
[FPING] /usr/bin/fping -e -q -c 3 -p 500 -t 500 IP_ADDR
Array
(
[xmt] => 3
[rcv] => 3
[loss] => 0
[min] => 0.36
[max] => 0.42
[avg] => 0.38
[exitcode] => 0
)
SQL[INSERT INTO `device_perf` (`xmt`,`rcv`,`loss`,`min`,`max`,`avg`,`device_id`,`timestamp`) VALUES ('3','3','0','0.36','0.42','0.38','162',NOW())]
SNMP Check response code: 0
Modules status: Global+ OS Device
#### Load poller module core ####
SNMP[/usr/bin/snmpget -v2c -c COMMUNITY -OQnUt -m SNMPv2-MIB -M /opt/librenms/mibs:/opt/librenms/mibs/supermicro:/opt/librenms/mibs/dell -t 5 -r 5 udp:HOSTNAME:161 sysUpTime.0 sysLocation.0 sysContact.0 sysName.0 sysObjectID.0 sysDescr.0]
.*.*.0 = PhoneNo
.*.*.0 = Company
.*.*.0 = Server Team
.*.*.0 = serverxxxxxx
.*.*.0 = .*.4.1.8*
.*.*.0 = Linux hostname 4.1.12-94.3.6.el7uek.x86_64 #2 SMP Tue May 30 19:25:15 PDT 2017 x86_64
SNMP[/usr/bin/snmpget -v2c -c COMMUNITY -OQnUst -m HOST-RESOURCES-MIB:SNMP-FRAMEWORK-MIB -M /opt/librenms/mibs:/opt/librenms/mibs/supermicro:/opt/librenms/mibs/dell -t 5 -r 5 udp:HOSTNAME:161 snmpEngineTime.0 hrSystemUptime.0]
snmpEngineTime.0 = 239834
hrSystemUptime.0 = 3387214467
Uptime seconds: 33872145
RRD[update /opt/librenms/rrd/IP_ADDR/uptime.rrd N:33872145]
Uptime: 1 years, 27 days, 55m 45s
SQL[SELECT `lat`,`lng` FROM `locations` WHERE `location`='Company' LIMIT 1]
Using cached lat/lng from other device
>> Runtime for poller module 'core': 0.0195 seconds with 34264 bytes
>> SNMP: [2/0.02s] MySQL: [1/0.00s] RRD: [1/0.00s]
#### Unload poller module core ####
RRD[update /opt/librenms/rrd/IP_ADDR/poller-perf-core.rrd N:0.019529104232788]
Modules status: Global+ OS Device
#### Load poller module processors ####
Attempting to initialize OS: linux
Attempting to initialize OS: unix
OS initilized as Generic
SQL[SELECT * FROM processors WHERE device_id='162']
SNMP[/usr/bin/snmpget -v2c -c COMMUNITY -OUQn -M /opt/librenms/mibs:/opt/librenms/mibs/supermicro:/opt/librenms/mibs/dell -t 5 -r 5 udp:HOSTNAME:161 .1.3.6.1.2.1.25.3.3.1.2.196608 .1.3.6.1.2.1.25.3.3.1.2.196609 .1.3.6.1.2.1.25.3.3.1.2.196610 .1.3.6.1.2.1.25.3.3.1.2.196611 .1.3.6.1.2.1.25.3.3.1.2.196612 .1.3.6.1.2.1.25.3.3.1.2.196613 .1.3.6.1.2.1.25.3.3.1.2.196614 .1.3.6.1.2.1.25.3.3.1.2.196615]
.*.*.*608 = 100
.*.*.*609 = 100
.*.*.*610 = 100
.*.*.*611 = 100
.*.*.*612 = 100
.*.*.*613 = 100
.*.*.*614 = 100
.*.*.*615 = 100
Array
(
[.1.3.6.1.2.1.25.3.3.1.2.196608] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196609] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196610] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196611] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196612] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196613] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196614] => 100
[.1.3.6.1.2.1.25.3.3.1.2.196615] => 100
)
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196608.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196609.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196610.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196611.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196612.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196613.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196614.rrd N:100]
100%
RRD[update /opt/librenms/rrd/IP_ADDR/processor-hr-196615.rrd N:100]
>> Runtime for poller module 'processors': 0.0106 seconds with 102488 bytes
>> SNMP: [1/0.01s] MySQL: [1/0.00s] RRD: [9/0.00s]
#### Unload poller module processors ####
RRD[update /opt/librenms/rrd/IP_ADDR/poller-perf-processors.rrd N:0.010571002960205]
### Start Device Groups ###
SQL[SELECT * FROM device_groups ORDER BY name]
SQL[SELECT DISTINCT(devices.device_id) FROM devices WHERE devices.device_id='162' AND (devices.sysName REGEXP "cn") LIMIT 1]
SQL[SELECT DISTINCT(devices.device_id) FROM devices WHERE devices.device_id='162' AND (devices.sysName REGEXP "Print") LIMIT 1]
SQL[SELECT `device_group_id` FROM `device_group_device` WHERE `device_id`='162']
Groups Added:
Groups Removed:
### End Device Groups ###
RRD[update /opt/librenms/rrd/IP_ADDR/ping-perf.rrd N:0.38]
RRD[update /opt/librenms/rrd/IP_ADDR/poller-perf.rrd N:1.048]
SQL[UPDATE `devices` set `uptime` ='33872145',`last_ping` =NOW(),`last_ping_timetaken` ='0.38' WHERE `device_id` = '162']
Updating IP_ADDR
Polled in 1.048 seconds
#### Start Alerts ####
SQL[SELECT `device_group_id` FROM `device_group_device` WHERE `device_id`='162']
SQL[SELECT alert_schedule.schedule_id FROM alert_schedule LEFT JOIN alert_schedule_items ON alert_schedule.schedule_id=alert_schedule_items.schedule_id WHERE ( alert_schedule_items.target = '162' ) && ((alert_schedule.recurring = 0 AND (NOW() BETWEEN alert_schedule.start AND alert_schedule.end)) OR (alert_schedule.recurring = 1 AND (alert_schedule.start_recurring_dt <= date_format(NOW(), '--%d') AND (end_recurring_dt >= date_format(NOW(), '--%d') OR end_recurring_dt is NULL OR end_recurring_dt = '0000-00-00' OR end_recurring_dt = '')) AND (date_format(now(), '%H:%i:%s') BETWEEN `start_recurring_hr` AND end_recurring_hr) AND (recurring_day LIKE CONCAT('%',date_format(now(), ''),'%') OR recurring_day is null or recurring_day = ''))) LIMIT 1]
SQL[SELECT DISTINCT a.* FROM alert_rules a LEFT JOIN alert_device_map d ON a.id=d.rule_id LEFT JOIN alert_group_map g ON a.id=g.rule_id LEFT JOIN device_group_device dg ON g.group_id=dg.device_group_id WHERE a.disabled = 0 AND ((d.device_id IS NULL AND g.group_id IS NULL) OR d.device_id='162' OR dg.device_id='162')]
Rule #38 (Service critical):
SQL[SELECT * FROM devices,services WHERE (devices.device_id = '162' AND devices.device_id = services.device_id) AND services.service_status = 2]
SQL[SELECT state FROM alerts WHERE rule_id = '38' AND device_id = '162' ORDER BY id DESC LIMIT 1]
Status: NOCHG
Rule #39 (Devices up/down):
SQL[SELECT * FROM devices WHERE (devices.device_id = '162') AND (devices.status = 0 && (devices.disabled = 0 && devices.ignore = 0)) = 1 AND devices.status_reason = "icmp" AND devices.hostname != "10.106.32.237" AND devices.hostname != "10.106.33.201" AND devices.hostname != "10.106.36.202" AND devices.hostname != "10.106.8.124" AND devices.hostname != "10.107.2.100" AND devices.hostname != "172.16.242.48" AND devices.hostname != "IP_ADDR"]
SQL[SELECT state FROM alerts WHERE rule_id = '39' AND device_id = '162' ORDER BY id DESC LIMIT 1]
Status: NOCHG
#### End Alerts ####
SQL[INSERT INTO `perf_times` (`type`,`doing`,`start`,`duration`,`devices`,`poller`) VALUES ('poll','IP_ADDR','1534133825.958','7.698','1','hostnamexxxx')]
/opt/librenms/poller.php IP_ADDR 2018-08-13 12:17:13 - 1 devices polled in 7.698 secs
SNMP [4/0.03s]: Get[4/0.03s] Getnext[0/0.00s] Walk[0/0.00s]
MySQL [22/0.01s]: Cell[7/0.00s] Row[1/0.00s] Rows[9/0.00s] Column[2/0.00s] Update[1/0.00s] Insert[2/0.00s] Delete[0/0.00s]
RRD [13/0.00s]: Update[13/0.00s] Create [0/0.00s] Other[0/0.00s]
It looks like the device is returning 100 for the snmpget request.
When I run top, the result is as below; the CPU usage should not always be 100%.
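For a number to compare against the SNMP value, independent of top, one option is to sample /proc/stat twice on the Linux box. This is only a rough sketch, assuming the standard Linux /proc/stat field order (user nice system idle iowait irq softirq steal ...); the function names are made up for illustration, and counting iowait as idle is a choice made here, not something the agent necessarily does.

```python
import time


def parse_cpu_line(line):
    """Parse an aggregate 'cpu ...' line into (idle_ticks, total_ticks)."""
    values = [int(v) for v in line.split()[1:]]
    # iowait (5th field) is counted as idle time in this sketch.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Only the first 8 fields are summed; guest time is already
    # included in user/nice by the kernel.
    total = sum(values[:8])
    return idle, total


def cpu_usage_percent(before, after):
    """Percentage of non-idle time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (all cores combined)."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate 'cpu' line found")


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(5)
    second = parse_cpu_line(read_cpu_line())
    print("CPU usage over 5s: %.1f%%" % cpu_usage_percent(first, second))
```

A result far below 100% here, while the agent still reports 100 for every core, would support the idea that the agent's value is wrong.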
Thanks
I agree, but LibreNMS cannot correct incorrect data it receives. It has to trust that the device sends correct data.
I suggest you check with the device vendor and ask them why it sends incorrect data.