High CPU load with only 150 devices

Hello,

On the VM where I run LibreNMS, I have 100% CPU load.
The web GUI is very slow when moving from page to page, and running ./validate.php is also very slow.
I have already checked the performance documentation, but it did not solve the problem.

The devices I poll are all Cisco C891F or C892.

Output of validate.php:
[root@RFH-PI-GSN-523 librenms]# ./validate.php

Component Version
LibreNMS 1.61
DB Schema 2020_02_10_223323_create_alert_location_map_table (159)
PHP 7.2.11
MySQL 10.3.17-MariaDB
RRDTool 1.7.0
SNMP NET-SNMP 5.8

====================================

[OK] Composer Version: 1.10.1
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct
[FAIL] The poller (RFH-PI-GSN-523) has not completed within the last 5 minutes, check the cron job.
[WARN] Your local git contains modified files, this could prevent automatic updates.
[FIX]:
You can fix this with ./scripts/github-remove
Modified Files:
discovery-wrapper.py
poller-wrapper.py

Output of top:

[root@RFH-PI-GSN-523 librenms]# top
top - 10:09:47 up 20:36, 1 user, load average: 18.73, 24.11, 33.88
Tasks: 254 total, 13 running, 241 sleeping, 0 stopped, 0 zombie
%Cpu(s): 55.2 us, 42.8 sy, 0.0 ni, 0.0 id, 0.0 wa, 1.0 hi, 1.0 si, 0.0 st
MiB Mem : 7766.5 total, 3127.9 free, 2961.7 used, 1676.8 buff/cache
MiB Swap: 4024.0 total, 4008.7 free, 15.3 used. 4494.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
112655 librenms 20 0 262552 49676 18424 S 7.6 0.6 0:00.23 php
1609 mysql 20 0 1790016 250772 21240 S 5.9 3.2 54:56.67 mysqld
112572 librenms 20 0 262552 49756 18492 S 5.6 0.6 0:00.27 php
2701 librenms 20 0 268696 56148 18724 S 5.0 0.7 0:47.93 php
32978 librenms 20 0 266648 53836 18464 S 4.6 0.7 0:45.04 php
183161 librenms 20 0 266648 54016 18660 S 4.6 0.7 0:29.09 php
212763 librenms 20 0 264600 52048 18748 S 4.6 0.7 0:25.46 php
123221 librenms 20 0 266648 54196 18836 S 4.3 0.7 0:35.73 php
153224 librenms 20 0 266648 54016 18656 S 4.3 0.7 0:32.25 php
27021 librenms 20 0 264600 51860 18568 S 4.0 0.7 0:16.28 php
242479 librenms 20 0 264600 51856 18556 S 3.6 0.7 0:21.12 php
63082 librenms 20 0 266648 53920 18548 S 2.6 0.7 0:41.99 php
85597 librenms 20 0 262552 49820 18592 S 2.6 0.6 0:06.34 php
93316 librenms 20 0 266648 54148 18784 S 2.6 0.7 0:38.99 php
56469 librenms 20 0 262552 49784 18544 S 2.0 0.6 0:11.45 php
110157 librenms 20 0 262552 50040 18752 S 0.7 0.6 0:00.85 php
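To check the mysqld suspicion against the snapshot above, here is a minimal Python sketch that totals %CPU per command from the process rows (copied verbatim from that top output). `cpu_by_command` is a hypothetical helper, and it assumes top's default column layout, with %CPU as the 9th field and COMMAND as the 12th.

```python
from collections import defaultdict

# Process rows copied from the top snapshot above:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
TOP_ROWS = """\
112655 librenms 20 0 262552 49676 18424 S 7.6 0.6 0:00.23 php
1609 mysql 20 0 1790016 250772 21240 S 5.9 3.2 54:56.67 mysqld
112572 librenms 20 0 262552 49756 18492 S 5.6 0.6 0:00.27 php
2701 librenms 20 0 268696 56148 18724 S 5.0 0.7 0:47.93 php
32978 librenms 20 0 266648 53836 18464 S 4.6 0.7 0:45.04 php
183161 librenms 20 0 266648 54016 18660 S 4.6 0.7 0:29.09 php
212763 librenms 20 0 264600 52048 18748 S 4.6 0.7 0:25.46 php
123221 librenms 20 0 266648 54196 18836 S 4.3 0.7 0:35.73 php
153224 librenms 20 0 266648 54016 18656 S 4.3 0.7 0:32.25 php
27021 librenms 20 0 264600 51860 18568 S 4.0 0.7 0:16.28 php
242479 librenms 20 0 264600 51856 18556 S 3.6 0.7 0:21.12 php
63082 librenms 20 0 266648 53920 18548 S 2.6 0.7 0:41.99 php
85597 librenms 20 0 262552 49820 18592 S 2.6 0.6 0:06.34 php
93316 librenms 20 0 266648 54148 18784 S 2.6 0.7 0:38.99 php
56469 librenms 20 0 262552 49784 18544 S 2.0 0.6 0:11.45 php
110157 librenms 20 0 262552 50040 18752 S 0.7 0.6 0:00.85 php
"""


def cpu_by_command(rows: str) -> dict[str, tuple[int, float]]:
    """Return {command: (process_count, summed %CPU)} for top's default columns."""
    totals: dict[str, list] = defaultdict(lambda: [0, 0.0])
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        cpu = float(fields[8])  # 9th column is %CPU
        command = fields[11]    # 12th column is COMMAND
        totals[command][0] += 1
        totals[command][1] += cpu
    return {cmd: (n, round(total, 1)) for cmd, (n, total) in totals.items()}


if __name__ == "__main__":
    ranked = sorted(cpu_by_command(TOP_ROWS).items(), key=lambda kv: -kv[1][1])
    for cmd, (n, total) in ranked:
        print(f"{cmd:8} processes={n:3} cpu={total:5.1f}%")
```

For these rows it reports 15 php processes at about 58.7% combined, against 5.9% for the single mysqld process, which points at the number of concurrent PHP pollers rather than at MySQL. Note that top reports per-process %CPU relative to one CPU, so these totals are not shares of the whole VM.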

Could the problem be mysqld?
I don't know how to fix it.
I have read several topics in this community, but none of them solved this problem.
Maybe the hardware is too small for LibreNMS. What are the recommended VM specs for LibreNMS to poll around 1,000 devices and around 12,000 ports?

Please help me

Kind regards

Hello,

Somehow the high CPU load is solved for the moment.
I think I had too many pollers started from the cron job; I fixed this by starting only one poller.

But validate.php still shows a FAIL for polling the localhost:

The poller (RFH-PI-GSN-523) has not completed within the last 5 minutes, check the cron job.

This is the server LibreNMS runs on.
I try to reach it on the local address (127.0.0.1).

If I run ./poller.php -h localhost, I get the following output:

[root@RFH-PI-GSN-523 librenms]# ./poller.php -h localhost
LibreNMS Poller
Starting polling run:

Hostname: localhost
Device ID: 19
OS: linux
Resolved IP: 127.0.0.1
(unix)

Load poller module core

Uptime: 21 hours 48 minutes 3 seconds

Runtime for poller module ‘core’: 0.0252 seconds with 80184 bytes
SNMP: [2/0.02s] MySQL: [0/0.00s] RRD: [1/0.00s]

Unload poller module core

Module [ unix-agent ] disabled globally.

Load poller module os

Hardware: Generic x86 64-bit
Version: 4.18.0-147.5.1.el8_1.x86_64
Features: CentOS 8.1.1911
Serial:

Runtime for poller module ‘os’: 0.2153 seconds with 57632 bytes
SNMP: [10/0.21s] MySQL: [0/0.00s] RRD: [1/0.00s]

Unload poller module os

Module [ ipmi ] disabled globally.

Load poller module sensors

Runtime for poller module ‘sensors’: 0.0008 seconds with 1464 bytes
SNMP: [0/0.00s] MySQL: [1/0.00s] RRD: [1/0.00s]

Unload poller module sensors

Load poller module processors

29%
28%

Runtime for poller module ‘processors’: 0.0302 seconds with 94136 bytes
SNMP: [1/0.02s] MySQL: [3/0.00s] RRD: [3/0.00s]

Unload poller module processors

Load poller module mempools

Mempool Physical memory: 49.52%
Mempool Virtual memory: 32.74%
Mempool Swap space: 0.36%

Runtime for poller module ‘mempools’: 0.0246 seconds with 21232 bytes
SNMP: [2/0.02s] MySQL: [4/0.01s] RRD: [4/0.00s]

Unload poller module mempools

Load poller module storage

Storage /dev/shm: hrstorage

0%
Storage /run: hrstorage

0%
Storage /sys/fs/cgroup: hrstorage

0%
Storage /: hrstorage

49%
Storage /boot: hrstorage

25%
Storage /boot/efi: hrstorage

1%
Storage /run/user/986: hrstorage

0%
Storage /run/user/1001: hrstorage

0%

Runtime for poller module ‘storage’: 0.0056 seconds with 3040 bytes
SNMP: [0/0.00s] MySQL: [9/0.00s] RRD: [9/0.00s]

Unload poller module storage

Load poller module netstats

ICMP IP IP-FORWARD SNMP TCP TCPHC UDP

Runtime for poller module ‘netstats’: 0.0770 seconds with 35600 bytes
SNMP: [7/0.08s] MySQL: [0/0.00s] RRD: [6/0.00s]

Unload poller module netstats

Load poller module hr-mib

Processes Users

Runtime for poller module ‘hr-mib’: 0.0080 seconds with 3184 bytes
SNMP: [1/0.01s] MySQL: [0/0.00s] RRD: [3/0.00s]

Unload poller module hr-mib

Load poller module ucd-mib

Runtime for poller module ‘ucd-mib’: 0.0212 seconds with 11032 bytes
SNMP: [3/0.02s] MySQL: [0/0.00s] RRD: [18/0.00s]

Unload poller module ucd-mib

Load poller module ipSystemStats

ipv4 ipv6

Runtime for poller module ‘ipSystemStats’: 0.0114 seconds with 17808 bytes
SNMP: [1/0.01s] MySQL: [0/0.00s] RRD: [3/0.00s]

Unload poller module ipSystemStats

Load poller module ports

Caching Oids: Full ports polling ifDescr ifAdminStatus ifOperStatus ifLastChange ifType ifPhysAddress ifMtu ifInErrors ifOutErrors ifInDiscards ifOutDiscards dot3StatsDuplexStatus
Port lo: lo (1 / #422) VLAN = lobps(5.91 kbps/5.91 kbps)bytes(26.71 kB/26.71 kB)pkts(4.59 pps/4.59 pps)
Port eth0: eth0 (2 / #423) dot3Duplex VLAN = eth0bps(361.75 kbps/148.76 kbps)bytes(1.6 MB/671.89 kB)pkts(183.59 pps/179.49 pps)

Runtime for poller module ‘ports’: 0.1117 seconds with 30168 bytes
SNMP: [14/0.10s] MySQL: [5/0.00s] RRD: [3/0.00s]

Unload poller module ports

Load poller module customoid

Runtime for poller module ‘customoid’: 0.0005 seconds with 2040 bytes
SNMP: [0/0.00s] MySQL: [1/0.00s] RRD: [1/0.00s]

Unload poller module customoid

Module [ bgp-peers ] disabled on os.

Module [ junose-atm-vp ] disabled globally.

Module [ toner ] disabled globally.

Load poller module ucd-diskio

sda sda1 sda2 sda3 dm-0 dm-1

Runtime for poller module ‘ucd-diskio’: 0.0093 seconds with 7720 bytes
SNMP: [1/0.01s] MySQL: [1/0.00s] RRD: [7/0.00s]

Unload poller module ucd-diskio

Module [ wifi ] disabled globally.

Module [ wireless ] disabled globally.

Module [ ospf ] disabled on os.

Module [ cisco-ipsec-flow-monitor ] disabled globally.

Module [ cisco-remote-access-monitor ] disabled globally.

Module [ cisco-cef ] disabled globally.

Module [ cisco-sla ] disabled globally.

Module [ cisco-mac-accounting ] disabled globally.

Module [ cipsec-tunnels ] disabled globally.

Module [ cisco-ace-loadbalancer ] disabled globally.

Module [ cisco-ace-serverfarms ] disabled globally.

Module [ cisco-asa-firewall ] disabled globally.

Module [ cisco-voice ] disabled globally.

Module [ cisco-cbqos ] disabled globally.

Module [ cisco-otv ] disabled globally.

Module [ cisco-qfp ] disabled globally.

Module [ cisco-vpdn ] disabled globally.

Module [ nac ] disabled globally.

Module [ netscaler-vsvr ] disabled globally.

Module [ aruba-controller ] disabled globally.

Load poller module entity-physical

Runtime for poller module ‘entity-physical’: 0.0005 seconds with 1976 bytes
SNMP: [0/0.00s] MySQL: [1/0.00s] RRD: [1/0.00s]

Unload poller module entity-physical

Module [ entity-state ] disabled globally.

Load poller module applications

Runtime for poller module ‘applications’: 0.0004 seconds with 1472 bytes
SNMP: [0/0.00s] MySQL: [1/0.00s] RRD: [1/0.00s]

Unload poller module applications

Module [ mib ] disabled globally.

Module [ stp ] disabled on os.

Load poller module ntp

Runtime for poller module ‘ntp’: 0.0001 seconds with 320 bytes
SNMP: [0/0.00s] MySQL: [0/0.00s] RRD: [1/0.00s]

Unload poller module ntp

Module [ loadbalancers ] disabled globally.

Module [ mef ] disabled globally.

Module [ mpls ] disabled globally.

Load poller module services

Runtime for poller module ‘services’: 0.0000 seconds with 0 bytes
SNMP: [0/0.00s] MySQL: [0/0.00s] RRD: [1/0.00s]

Unload poller module services

Start Device Groups

End Device Groups, runtime: 0.0096s

Enabling graphs: uptime netstat_icmp netstat_icmp_info netstat_ip netstat_ip_frag netstat_snmp netstat_snmp_pkt netstat_tcp netstat_udp hr_processes hr_users ucd_cpu ucd_swap_io ucd_io ucd_contexts ucd_interrupts ucd_memory ucd_load ipsystemstats_ipv4 ipsystemstats_ipv4_frag ipsystemstats_ipv6 ipsystemstats_ipv6_frag

Polled in 1.61 seconds

Start Alerts

End Alerts

SNMP [43/0.52s]: Get[19/0.32s] Getnext[4/0.03s] Walk[20/0.17s]
MySQL [33/0.03s]: Cell[1/0.00s] Row[-1/-0.00s] Rows[12/0.01s] Column[0/0.00s] Update[19/0.02s] Insert[2/0.00s] Delete[0/0.00s]
RRD [67/0.00s]: Update[67/0.00s] Create [0/0.00s] Other[0/0.00s]

Hi!

You didn't say what your VM specs are (CPU/memory).

Also, the datastore is very important.

Regards,

Hello TheGreatDoc,

Thank you for your reply.
But as you can see, my CPU issue is solved.
I made a change in the crontab:
every 5 minutes I was starting 3 or 4 polling runs against the same devices, and I think that was too much.
So the CPU/memory problem is solved for now.

The last problem is also solved.

So this topic can be marked as solved; maybe next time I need a little more patience.

Thanks for now.

Sorry, I don't understand.

What have you changed in the cron file?

This is my crontab at the moment, and it's working without high CPU load:

20   0    * * *   librenms    /usr/bin/python3 /opt/librenms/snmp-scan.py
#*/5  *    * * *   librenms    /opt/librenms/discovery-wrapper.py 2 >> /dev/null 2>&1
#*/5  *    * * *   librenms    /opt/librenms/discovery.php -h new >> /dev/null 2>&1
#*/5  *    * * *   librenms    /opt/librenms/cronic /opt/librenms/poller-wrapper.py 2 
*/5  *    * * *   librenms    /opt/librenms/poller.php -h all >> /dev/null 2>&1
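Because the high load came from several poller jobs starting every five minutes, it helps to list which active (uncommented) cron lines launch a poller. This is a minimal sketch, assuming the poller entry points are the poller.php and poller-wrapper.py scripts seen in this thread; `active_poller_entries` is a hypothetical helper, and since jobs can also live in other cron files (for example root's crontab), each file has to be checked.

```python
import re

# The two poller entry points that appear in this thread.
POLLER_RE = re.compile(r"(poller\.php|poller-wrapper\.py)")


def active_poller_entries(crontab: str) -> list[str]:
    """Return uncommented crontab lines that launch a LibreNMS poller."""
    entries = []
    for line in crontab.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or commented-out job
        if POLLER_RE.search(stripped):
            entries.append(stripped)
    return entries


# The crontab posted above.
CRONTAB = """\
20   0    * * *   librenms    /usr/bin/python3 /opt/librenms/snmp-scan.py
#*/5  *    * * *   librenms    /opt/librenms/discovery-wrapper.py 2 >> /dev/null 2>&1
#*/5  *    * * *   librenms    /opt/librenms/discovery.php -h new >> /dev/null 2>&1
#*/5  *    * * *   librenms    /opt/librenms/cronic /opt/librenms/poller-wrapper.py 2
*/5  *    * * *   librenms    /opt/librenms/poller.php -h all >> /dev/null 2>&1
"""

if __name__ == "__main__":
    entries = active_poller_entries(CRONTAB)
    for entry in entries:
        print(entry)
    if len(entries) > 1:
        print(f"WARNING: {len(entries)} active poller jobs, they may overlap")
```

For this crontab it finds exactly one active poller job (poller.php -h all); uncommenting the poller-wrapper.py line as well would give two overlapping pollers.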

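This is not good at all: poller.php -h all polls every device one after another in a single process, the regular discovery jobs are commented out, and daily.sh is not scheduled. Here is a working file. If you want more help, you need to give us your specifications (number of CPUs, memory).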

librenms@monitoring1:~$ cat /etc/cron.d/librenms 
# Using this cron file requires an additional user on your system, please see install docs.
#night discovery, once a day
33  4   * * *   librenms    /opt/librenms/discovery.php -h all >> /dev/null 2>&1

#Poller wrapper, mandatory, number at the end of the line must be 
# adapted so polling takes less than 5 minutes
*/5  *    * * *   librenms    /opt/librenms/cronic /opt/librenms/poller-wrapper.py 16

#Night daily jobs, mandatory
13   2   * * *   librenms    /opt/librenms/daily.sh >> /dev/null 2>&1

#New device discovery, not mandatory, once every 20 minutes here
*/20  *    * * *   librenms    /opt/librenms/discovery.php -h new >> /dev/null 2>&1

#Not mandatory:
20 0 * * * librenms /usr/bin/python3 /opt/librenms/snmp-scan.py
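The comment about the number at the end of the poller-wrapper.py line can be turned into a rough estimate. A minimal sketch, assuming devices are spread evenly over the workers and aiming at 80% of the 5-minute window to leave slack; the function name and the headroom value are hypothetical choices, not LibreNMS settings.

```python
import math


def min_poller_workers(poll_seconds: list[float],
                       window: float = 300.0,
                       headroom: float = 0.8) -> int:
    """Estimate the smallest worker count so one polling cycle fits in `window`.

    Assumes an even spread of devices over workers (an optimistic lower bound)
    and targets `headroom` of the window. A single device slower than the
    target cannot be sped up by adding workers.
    """
    if not poll_seconds:
        return 1
    target = window * headroom
    if max(poll_seconds) > target:
        raise ValueError("one device alone exceeds the polling window")
    return max(1, math.ceil(sum(poll_seconds) / target))


if __name__ == "__main__":
    # Hypothetical: 200 devices at 6 seconds each.
    print(min_poller_workers([6.0] * 200))
```

For example, 200 devices at a hypothetical 6 seconds each sum to 1200 seconds, so at least 5 workers are needed. Treat the result as a lower bound and confirm it against the measured polling duration.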

Hello,

Thanks for your reply.

I have tested the cron job you sent me.

This one works fine, thanks.

I have also added discovery of a new part of my network, so we grew from almost 150 devices to almost 200 devices, all Cisco C800.

The server's CPU is now at 78%.

And I have only added 25% of all my devices to LibreNMS, and it is already running at 78%.

I'm afraid that if I add more devices to LibreNMS, the CPU will go up to 100%.

The server has 2 CPUs (Intel® Xeon® Gold 6148 Processor) and 8 GB of memory.

With kind regards,

Jorg Schrievers

The CPU running at 100% is expected; it does not make sense to run at half speed :). But the critical thing is to check whether 5 minutes are enough to poll all devices:

You have to check how long the polling takes:
[Screenshot 2020-04-01 at 14.10.37]

From this polling duration, you can extrapolate the number of devices you can poll.
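The extrapolation can be written out as simple arithmetic. A minimal sketch, assuming polling time grows linearly with the number of devices, that new devices cost about the same as the ones already polled, and that the poller-wrapper worker count stays the same; the 80% headroom is an arbitrary safety margin and `device_capacity` is a hypothetical helper.

```python
import math


def device_capacity(devices_polled: int, poll_duration_s: float,
                    window_s: float = 300.0, headroom: float = 0.8) -> int:
    """Extrapolate how many devices fit in one polling window.

    Linear model: if `devices_polled` devices take `poll_duration_s` seconds
    with the current worker count, the window (times headroom) holds
    proportionally more or fewer devices.
    """
    if devices_polled <= 0 or poll_duration_s <= 0:
        raise ValueError("need a positive device count and duration")
    return math.floor(devices_polled * window_s * headroom / poll_duration_s)


if __name__ == "__main__":
    # Hypothetical measurement: 200 devices polled in 150 seconds.
    print(device_capacity(200, 150.0))
```

With that hypothetical measurement the result is about 320 devices. If the real figure is well below the ~1,000 devices targeted earlier in the thread, you need more workers (while the CPU is not yet saturated) or more CPUs.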

For the CPU, please check it in real time on the CLI with the command “top”. If the CPU is not at 100%, you can (and should) try to use it fully, for example by raising the number of poller-wrapper workers.
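top is the interactive way to watch this. For a scripted check, here is a minimal Python sketch (Linux only; the helper names are hypothetical) that samples the aggregate cpu line of /proc/stat twice and reports the share of time that was neither idle nor waiting on I/O.

```python
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(v) for v in line.split()[1:]]
    # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    idle = fields[3] + fields[4]  # idle + iowait
    total = sum(fields[:8])
    return idle, total


def busy_percent(before: str, after: str) -> float:
    """CPU busy share between two samples of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    d_total = total1 - total0
    if d_total <= 0:
        return 0.0
    return 100.0 * (d_total - (idle1 - idle0)) / d_total


def sample_busy_percent(interval: float = 1.0) -> float:
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    def read_cpu_line() -> str:
        with open("/proc/stat") as f:
            return f.readline()  # first line is the aggregate 'cpu' line
    before = read_cpu_line()
    time.sleep(interval)
    return busy_percent(before, read_cpu_line())


if __name__ == "__main__":
    print(f"CPU busy: {sample_busy_percent():.1f}%")
```

Run during a polling cycle, a value well below 100% suggests there is room for more poller-wrapper workers.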