Distributed Poller Ideas

Hi All,

We have moved from a mix of Nagios and Observium at work to LibreNMS. I have set it up at 6 sites plus a master (all bar the master are behind NAT, with no VPNs between them for operational reasons) and gone through the setup of distributed polling, which I have found to be not so reliable in our environment. That made me wonder if there might be a cleaner way.

Currently we have a number of devices that can be seen via the existing master, and these can continue to be polled by the master in the current fashion. This proposal is mainly aimed at the remote sites; however, I would envisage it also allowing polling of local devices to be offloaded to one or more other servers, as if they were a remote site.

These are my thoughts on a possible replacement and I welcome any suggestions or criticisms on them.

  • Add a method of creating pollers with some sort of unique per poller token.
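As a rough illustration of the token idea, here is a minimal Python sketch. The function names are hypothetical and nothing here is existing LibreNMS code; it assumes the server shows the token to the operator once and stores only a hash of it.

```python
import hashlib
import hmac
import secrets


def create_poller_token():
    """Return (token, token_hash).

    The token is handed to the poller; only the hash would be stored
    server-side (an assumption for this sketch).
    """
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, token_hash


def verify_poller_token(presented, stored_hash):
    """Check a presented token against the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

Storing a hash rather than the token itself means a leaked database row would not be enough to impersonate a poller.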

Devices

Add an endpoint on the API something like /poller/devices

This will produce all of the JSON required for a poller to poll the devices assigned to it.

The poller would then pass the poll data back to an endpoint like /devices/:device_id/result

There it would be parsed and logged in LibreNMS like a normal poll would be.
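To make the device flow concrete, a poller loop could look roughly like the sketch below. The two endpoints are the ones proposed above and do not exist yet; the base URL, the "devices" key in the response, and sending the per-poller token in the X-Auth-Token header are all assumptions.

```python
import json
import urllib.request

# Hypothetical base URL; /poller/devices and /devices/:device_id/result
# are the proposed endpoints, not part of the current API.
API_BASE = "https://librenms.example.com/api/v0"


def build_request(path, token, payload=None):
    """Build a GET (no payload) or a JSON POST (with payload)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if data is None else "POST"
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    # Assumption: the per-poller token travels in this header.
    req.add_header("X-Auth-Token", token)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_poll_cycle(token, poll_device, transport=send):
    """Fetch assigned devices, poll each one, and post each result back."""
    listing = transport(build_request("/poller/devices", token))
    devices = listing["devices"]  # assumed response shape
    for device in devices:
        result = poll_device(device)
        path = "/devices/{}/result".format(device["device_id"])
        transport(build_request(path, token, result))
    return len(devices)
```

Because the poller only ever makes outbound HTTPS requests, this shape should work from behind NAT without a VPN. The poll_device callable is where the actual SNMP work would go.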

Services

Services would operate similarly by using an endpoint something like /poller/services

The poller would use that data to run the checks against the relevant devices using the Nagios plugins installed on the distributed poller.
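Running a plugin on the poller could be as simple as the following Python sketch. The exit codes 0 to 3 are the standard Nagios plugin convention; the run_check name and the shape of the returned dict are assumptions made for illustration.

```python
import subprocess

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=30.0):
    """Run one plugin command (list of arguments) and normalise the result."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        code = proc.returncode
        lines = proc.stdout.strip().splitlines()
        first_line = lines[0] if lines else ""
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing plugin or a timeout is reported as UNKNOWN.
        code = 3
        first_line = "check failed to run: {}".format(exc)
    if code not in NAGIOS_STATES:
        code = 3
    # Plugins put human-readable text before "|" and perfdata after it.
    message, _, perfdata = first_line.partition("|")
    return {
        "status": code,
        "state": NAGIOS_STATES[code],
        "message": message.strip(),
        "perfdata": perfdata.strip(),
    }
```

The returned dict is what a poller might serialise as JSON when reporting the service result.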

This data would then be returned via an endpoint similar to /services/:service_id/result

Relevant alerts would then be set up to flag any pollers, devices or services that have not reported within a given period.
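The "not reported on in a period" check could be sketched like this in Python. The find_stale helper and the idea of keeping a last-reported timestamp per poller, device or service are assumptions, not existing LibreNMS behaviour.

```python
from datetime import datetime, timedelta, timezone


def find_stale(last_reported, max_age, now=None):
    """Return the sorted names whose last report is missing or too old.

    last_reported maps a name (poller, device or service) to a
    timezone-aware datetime, or None if it has never reported.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(
        name
        for name, seen in last_reported.items()
        if seen is None or now - seen > max_age
    )
```

An alert rule could then fire for every name this returns, so a dead remote poller shows up as one alert rather than silence.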

This is a very rough process, but I feel it would allow expandability and allow users to potentially write their own pollers or come up with ingenious methods of using these endpoints.

I would also like to investigate a way to mark services as “passive”, meaning they would be skipped by the existing polling methods. This would allow services that want to report their own status back to LibreNMS to do so via the above API, for instance backup jobs reporting that they have completed.
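A small Python sketch of both halves of the passive idea: the scheduler skipping passive services, and a job pushing its own result. The "passive" field, the payload keys, the base URL and the use of the X-Auth-Token header for this purpose are all assumptions; the endpoint is the one proposed above.

```python
import json
import urllib.request

API_BASE = "https://librenms.example.com/api/v0"  # hypothetical base URL


def services_to_poll(services):
    """Scheduler side: skip anything flagged passive (hypothetical field)."""
    return [s for s in services if not s.get("passive", False)]


def passive_result_request(service_id, status, message, token):
    """Job side: build the POST that reports a result for one service."""
    body = json.dumps({"status": status, "message": message}).encode("utf-8")
    url = "{}/services/{}/result".format(API_BASE, service_id)
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("X-Auth-Token", token)  # assumed auth header
    req.add_header("Content-Type", "application/json")
    return req
```

A backup script could finish with something like urllib.request.urlopen(passive_result_request(42, 0, "nightly backup completed", token)), where 42 is a made-up service ID. A passive service that stops reporting would then need to be caught by a staleness alert, since nothing else polls it.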

After some discussion I would be looking to develop these API endpoints as well as the pollers.

Many Thanks

Dub

Tagging @murrant as he’s recently talked about using the API for polling.

I’ll have a read when I get chance.

This is basically what we are looking at doing. But I’m thinking of utilizing websockets to control the pollers as well.

The server would then send a request for data to the poller and the poller would respond with the data, all over a single socket.
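For the request/response idea, the message handling on the poller side might look something like this Python sketch. The message format (id, action, params) is purely an assumption, and the websocket transport itself is left out; this only shows how replies could be matched to requests on one shared socket.

```python
import json


def handle_request(raw, handlers):
    """Turn one JSON request frame into one JSON reply frame.

    The reply carries the same "id" as the request, so the server can
    match responses to requests even when several are in flight.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps({"id": None, "ok": False, "error": "invalid JSON"})
    if not isinstance(msg, dict):
        return json.dumps({"id": None, "ok": False, "error": "invalid message"})
    handler = handlers.get(msg.get("action"))
    if handler is None:
        return json.dumps(
            {"id": msg.get("id"), "ok": False, "error": "unknown action"}
        )
    data = handler(msg.get("params", {}))
    return json.dumps({"id": msg.get("id"), "ok": True, "data": data})
```

Since the poller would open the socket outbound and keep it open, the server could push requests to pollers behind NAT without any inbound connectivity.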

My thoughts on how we should approach the V2 web rewrite have shifted a bit. At this time, I want to try to push out a new version of the API with complete coverage. Then I want to modify the current discovery and poller to use the new API. After that we can resume work on the web UI rewrite. It also opens the door for a new poller, possibly in a different language.