I don't know what to call what I'm asking for, so I'll simply describe what happens:
I have several device models (for example, the D-Link DGS-3100-24TG) where polling drives their CPU load to 100% and the devices sometimes reboot. I don't want to exclude them from polling entirely; maybe they can be polled in some kind of gentle mode? Excluding such devices from polling is inconvenient anyway: you cannot do it from the web UI, you have to do it in the database (and that won't cover devices added later). You can group such devices by OID, but you cannot disable polling for a group…
Is it the poller or discovery that causes the 100% CPU?
Anyway, I would start by identifying which module is causing that: go to
Device -> Graphs -> Poller and, if you don't need the module, disable it by OS in your config.php, for example:
$config['os']['netmanplus']['discovery_modules']['entity-physical'] = false;
$config['os']['netmanplus']['poller_modules']['entity-physical'] = false;
That disables the entity-physical discovery and poller modules on Netman Plus devices.
I manually run poller.php -h hostname, and when the poller gets data from the ports module I see 100% CPU load on the device.
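For reference, restricting the run to just that module makes this easy to reproduce: poller.php accepts -m to limit which modules run and -d for debug output (the hostname below is a placeholder):

```shell
# Run only the ports module against one device, with debug output,
# while watching the device's CPU. "sw-dlink-01" is a placeholder hostname.
./poller.php -h sw-dlink-01 -m ports -d
```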
I'm not sure that disabling modules as you suggest is the right approach, because besides this model I have other models from the same vendor that behave normally.
Well… if it is the ports module, you could try per-port polling (https://docs.librenms.org/Support/Performance/#per-port-polling-experimental) on those devices.
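As a hedged sketch, enabling the experimental feature globally might look like this in config.php; the option name below is taken from the Performance docs and should be verified against your LibreNMS version before relying on it:

```php
// config.php — enable experimental selected/per-port polling globally.
// Once enabled, only ports flagged for polling are walked each cycle,
// which shrinks the SNMP work done on each device.
$config['polling']['selected_ports'] = true;
```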
Also, a firmware upgrade might fix the issue with your switches.
Thanks, I'll look into the firmware; maybe that will really make a difference.
But a question about per-port polling: which OS do I need to use for this device (D-Link DGS-3100-24TG)?
I suggest you do it per device instead of per OS.
I have 200 of these devices (D-Link DGS-3100-24TG).
Try it first with one device and check whether it gets better or worse.
OK, I tried one device. Yes, I see that the load on the device was lower and the polling was faster, but maybe something else can be done? The load was only shorter in duration, but still present, and I would call it significant: now it is 95–98% for 3–5 seconds vs. 100% for 20–30 seconds.
Perhaps for such devices it would be worth optionally providing a small (possibly configurable) pause between SNMP requests? Or not fetching the whole tree at once, but fetching it in parts to reduce the load on the device?
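The "fetch it in parts" idea roughly corresponds to tuning the SNMP bulk-request parameters LibreNMS already exposes. As a sketch, assuming the option names from the LibreNMS Performance/SNMP configuration docs (verify them against your version), lowering them makes each request smaller at the cost of more requests:

```php
// config.php — smaller SNMP requests to ease the load on fragile devices.
// These set global defaults; LibreNMS also allows per-device overrides of
// Max Repeaters / Max OIDs in the device's SNMP settings in the web UI.
$config['snmp']['max_repeaters'] = 10; // fewer rows returned per GETBULK
$config['snmp']['max_oid'] = 5;        // fewer OIDs bundled into one request
```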