Load_average alert rule?

Hello,

So far we haven’t found a way to create an alert based on average CPU load. There is a graph generated by RRDtool, but we haven’t found any relation between the existing (available) entities and the RRD graph data. The specific use for this alert is on multicore platforms, where a rule on CPU usage raises an alert based on the load of a single core. It would be much more convenient if there were a way to trigger an alert on the average load across all CPU cores.
If you need any additional explanation of our question, I will gladly provide it.

This isn’t possible at present.

I run into this issue as well with some Linux-based VPN servers. They are multi-core, and they alert on CPU because certain processes are single-threaded and can briefly spike a single CPU. I’d love to see a CPU average across multiple cores.

-Jeff

I’d also like this feature. It’s an odd omission from a monitoring system. Single core CPU usage isn’t terribly useful as a measurement for a multicore *nix box, especially wrt alerting.

There is a workaround in the “Overall CPU usage alerting” topic, but I’d very much prefer an official solution that’s maintained by the developers rather than a script someone wrote.

This would indeed be very nice.

The topic of an overall CPU average (aggregated rather than per core) has come up several times with different proposed solutions, but I could not find one that worked for me, so I came up with something that did. Even the example provided in the official documentation didn’t work.

The problem with earlier solutions seems to be that a query using AVG(processors.processor_usage) needs either a GROUP BY or a sub-query. The GROUP BY option conflicts with Laravel strict mode, while the sub-query option produces an ungrouped column, which causes the same kind of conflict. These errors, when they happen, are visible on the Overview → Eventlog page of LibreNMS.

There might be other workarounds (like configuring strict mode to allow a loose GROUP BY), but the most straightforward approach seemed to be moving the GROUP BY somewhere else. In the end I chose to create an SQL VIEW, and it worked. To do so, log in to MariaDB, select the “librenms” database, and add the view:

CREATE VIEW avgcpu
AS SELECT devices.device_id, AVG(processors.processor_usage) AS avgcpu
FROM devices, processors
WHERE devices.device_id = processors.device_id
GROUP BY devices.device_id;

Make sure you don’t add any other columns to the view, otherwise you may step on the same GROUP BY rake again.
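
To sanity-check the idea outside LibreNMS, here is a minimal sketch in Python using an in-memory SQLite database rather than MariaDB. The hostnames and usage figures are made up, the tables are cut down to only the columns the view touches, and `devices_over` is a hypothetical helper; only the view definition follows the one above.

```python
import sqlite3

# Assumption: simplified stand-ins for the LibreNMS "devices" and "processors"
# tables, holding only the columns the view needs. Sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (device_id INTEGER PRIMARY KEY, hostname TEXT);
CREATE TABLE processors (
    processor_id INTEGER PRIMARY KEY,
    device_id INTEGER,
    processor_usage INTEGER
);

INSERT INTO devices VALUES (1, 'vpn-01'), (2, 'db-01');
-- vpn-01: one core spiking, three idle; db-01: all four cores busy
INSERT INTO processors (device_id, processor_usage) VALUES
    (1, 95), (1, 5), (1, 5), (1, 5),
    (2, 80), (2, 85), (2, 90), (2, 85);

-- One row per device: the GROUP BY lives inside the view
CREATE VIEW avgcpu AS
SELECT devices.device_id, AVG(processors.processor_usage) AS avgcpu
FROM devices
JOIN processors ON devices.device_id = processors.device_id
GROUP BY devices.device_id;
""")


def devices_over(threshold):
    """Return (device_id, avgcpu) rows whose per-device average meets the threshold."""
    return conn.execute(
        "SELECT device_id, avgcpu FROM avgcpu WHERE avgcpu >= ? ORDER BY device_id",
        (threshold,),
    ).fetchall()


print(devices_over(75))
```

With this sample data a per-core rule at 75% would fire for both hosts, while the averaged view only reports the device whose cores are all busy.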

Then run these commands on the LibreNMS web instance:

# Create a macro
lnms config:set alert.macros.rule.cpu_used_avg_perc "(SELECT %avgcpu.avgcpu WHERE %avgcpu.device_id = %devices.device_id)"

# Clear the config cache so the new macro becomes available
lnms config:clear

Now reload the Alerts page, and you can add a new alert using “macros.cpu_used_avg_perc” from the GUI:

This generates SQL like the following (just FYI, you don’t need to do anything with it):

SELECT * FROM devices, avgcpu
WHERE (devices.device_id = ?)
AND (devices.status = 1 && (devices.disabled = 0 && devices.ignore = 0)) = 1
AND (SELECT avgcpu.avgcpu WHERE avgcpu.device_id = devices.device_id) >= 75

I saw complaints that some previous workarounds used to work and then stopped. This one works with version:
