Slow Prometheus metrics push: discussing possible solutions

As also described in this GitHub issue: https://github.com/librenms/librenms/issues/13826
We’d like to work around the existing slowness of pushing Prometheus metrics from the poller process. Currently the push is what slows down the polling loop itself, and we are actually missing metrics when Prometheus metrics are enabled.
We have two (slightly hacky) ideas described in that issue and would appreciate your feedback on potential ways forward. 🙂

Thank you for your valuable work,
Mike.

Why does it take so long to push a metric to Prometheus? 8 s is very long. It should take ~0.1 s or less.

Most requests do indeed get an answer in under 200 ms, but a few take longer, sometimes because the Pushgateway is overloaded by the sheer number of requests. One request is sent per measurement, and it is sent from the polling thread, so polling blocks while requests are pending. As the Pushgateway holds more metrics in memory (which is our case, given how many metrics we get from LibreNMS), it also starts responding more slowly.
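To illustrate the blocking problem, here is a minimal sketch (in Python, not LibreNMS’s PHP) of decoupling the push from the polling loop: the poller only enqueues measurements, and a background thread drains the queue and sends each batch as a single Prometheus text-format body instead of one request per measurement. The class and function names, the (name, labels, value) measurement shape, and the gateway URL and job name are hypothetical; only the Pushgateway’s `POST /metrics/job/<job>` endpoint and the text exposition format are real.

```python
import queue
import threading
import urllib.request
from typing import Callable, Dict, List, Tuple

# Hypothetical measurement shape: (metric_name, labels, value).
Measurement = Tuple[str, Dict[str, str], float]


def _escape(value: str) -> str:
    # The text exposition format escapes backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_batch(batch: List[Measurement]) -> str:
    """Render a batch as one Prometheus text-format body.

    A later sample for the same series overwrites an earlier one, and output is
    sorted so all samples of one metric name stay grouped together.
    """
    latest: Dict[Tuple[str, Tuple[Tuple[str, str], ...]], float] = {}
    for name, labels, value in batch:
        latest[(name, tuple(sorted(labels.items())))] = value
    lines = []
    for (name, labels), value in sorted(latest.items()):
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "".join(line + "\n" for line in lines)


def push_http(gateway: str, job: str, body: str, timeout: float = 5.0) -> None:
    # One POST per batch to the Pushgateway's /metrics/job/<job> endpoint.
    req = urllib.request.Request(
        f"{gateway}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass


class BackgroundPusher:
    """Hypothetical helper: enqueue() returns immediately; a daemon thread sends batches."""

    def __init__(self, send_batch: Callable[[List[Measurement]], None], max_batch: int = 500):
        self._queue: queue.Queue = queue.Queue()
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, measurement: Measurement) -> None:
        self._queue.put(measurement)  # never blocks the polling loop on network I/O

    def close(self) -> None:
        self._queue.put(None)  # sentinel: flush what is already queued, then stop
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()  # wait for at least one measurement
            if item is None:
                return
            batch, stop = [item], False
            while len(batch) < self._max_batch:  # grab whatever else is already queued
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
                if item is None:
                    stop = True
                    break
                batch.append(item)
            try:
                self._send_batch(batch)
            except Exception:
                pass  # a failed push must not kill the worker; real code should log it
            if stop:
                return
```

Usage would look like `BackgroundPusher(lambda b: push_http("http://pushgateway:9091", "librenms", format_batch(b)))`, with 9091 being the Pushgateway’s default port. This only moves the waiting off the polling thread and cuts the request count; it does not make the Pushgateway itself scale, and a real version would probably bound the queue (`queue.Queue(maxsize=...)`) and drop samples when the gateway falls behind.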

Doing this in a single thread will not really work with a push model; various GitHub threads confirm that the Pushgateway does not scale to large numbers of metrics. This use is also discouraged in the Prometheus documentation: When to use the Pushgateway | Prometheus

For now we have gone with the local-file approach: metrics are dumped to a local file and served by a separate, multithreaded Python web server process that exposes a /metrics endpoint, which works around the issue. We would like to either contribute the local-file metrics dumping back to LibreNMS, or test an actual /metrics endpoint in LibreNMS itself, if that is possible.
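A minimal sketch of the local-file approach described above, assuming the poller writes a complete metrics snapshot to a file and a separate Python process serves it. The file is replaced atomically so a scrape never sees a half-written file, and `ThreadingHTTPServer` handles each scrape in its own thread. The path, port, and function names are placeholders, not what the linked PR actually uses.

```python
import os
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

METRICS_FILE = "/tmp/librenms_metrics.prom"  # hypothetical path


def write_metrics_file(path: str, body: str) -> None:
    # Write to a temp file in the same directory, then atomically swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(body)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if writing or renaming failed
        raise


def make_handler(path: str):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except FileNotFoundError:
                data = b""  # nothing polled yet: serve an empty exposition
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; a real service would log scrapes

    return MetricsHandler


def serve(path: str, host: str = "0.0.0.0", port: int = 9999) -> ThreadingHTTPServer:
    # Port 9999 is an arbitrary placeholder.
    return ThreadingHTTPServer((host, port), make_handler(path))
```

Running `serve(METRICS_FILE).serve_forever()` and pointing a Prometheus scrape job at that host and port on `/metrics` turns the push into a pull, so slow consumers no longer stall the poller. If several poller processes write metrics, each would need its own file (or a merge step), since each write replaces the whole snapshot.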

I’m sure the user who originally contributed the Prometheus Pushgateway support is long gone (remember, this is a community project).

PHP 8.1 supports Fibers, which should allow async I/O, but LibreNMS likely won’t raise the minimum PHP version to 8.1 for a year or so (see the php.net supported versions schedule).

I don’t know much about Prometheus, but if there is another approach that works, by all means, go for it. Feel free to ask any questions on Discord in #devel.

Cool 🙂 The changes to LibreNMS that add this support are here: Update Prometheus.php by vincnt · Pull Request #1 · vincnt/librenms · GitHub
Do you know where we should open a PR to get this merged (or just reviewed)?
The new behavior is behind a flag, so it shouldn’t affect any existing setup, and it is still useful overall.
We do expect that a couple of things will need changing after your review, of course.

Thanks for responding!
