Do I need more pollers? 1900+ devices up but 400+ down when they are not

Hello,

Do I need more pollers?

If I go to one of these down devices I see this:

What can I check?

Thanks

Hello, unfortunately I can't help, but do you know how to add more pollers?

Are the pollers able to reach the monitored devices? What does the resource usage look like on the pollers?

I don’t believe there’s a hard limit (could be wrong). Where you’ll run into issues first is on MariaDB, but a knowledgeable DBA can work around that. I have 13 pollers ATM in my deployment.

You have 13 pollers!? How many devices are you polling?

My pollers all look healthy.

It seems to affect only certain devices.

We have 3 groups of 4 pollers in different geographical regions, then a ‘frontend’ poller which monitors all of the other pollers and is where users access Libre. We currently have about 500-600 devices. It’s overkill ATM, but we plan on adding a lot more devices in the future. Also, with the spare pollers we can take one out of service for system upgrades etc. with no issue. I mostly disclosed my poller count for Maria, who asked if there was a limit.

For your gap issues, I’d suggest running the poller in debug mode against the affected devices to see which modules are taking up time and which of those you can go without. Libre and the device likely aren’t finishing their exchange within the polling window, so your options are either disabling modules or extending the polling window. (Extending the polling window is a Libre-wide config, so disabling modules is your best bet.)
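To make the "which modules are taking up time" step concrete, here is a minimal Python sketch that ranks modules by runtime from saved poller debug output. It assumes the output contains lines of the form `>> Runtime for poller module 'ports': 12.3456 seconds ...`; that format is an assumption and may differ between LibreNMS versions, and `slowest_modules` is a made-up helper name, not part of LibreNMS.

```python
import re

# Assumed line format (check against your own debug output):
#   >> Runtime for poller module 'ports': 12.3456 seconds with 1234 bytes
RUNTIME_RE = re.compile(
    r"Runtime for poller module '(?P<module>[^']+)':\s*"
    r"(?P<seconds>[0-9]+(?:\.[0-9]+)?) seconds"
)


def slowest_modules(debug_text, top=5):
    """Return up to `top` (module, seconds) pairs, slowest first."""
    timings = {}
    for match in RUNTIME_RE.finditer(debug_text):
        module = match.group("module")
        # Sum repeated entries in case the text covers several poll runs.
        timings[module] = timings.get(module, 0.0) + float(match.group("seconds"))
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)[:top]
```

Save the debug output of one poll run to a file, read it into a string, and pass it to `slowest_modules`. The modules at the top of the list are the first candidates to disable for that device.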

Edit: Additionally, you might want to verify that all of your pollers can reach the devices in question (firewall rules). Depending on how your poller groups are set up, polling of a device can jump to any of the pollers. If that poller can’t reach the device, it’ll show as down and create gaps.

Hello, I’ve gone into the pollers and they can ping the devices with the gaps, but nice idea.

On each poller where are the debug logs kept? I’d like to find a device that has gaps and look on each poller to see if it managed to poll it.

I just have 1 group and it looks healthy:

According to your image, it looks healthy.

Do all graphs have gaps or just these? The two graphs you posted happen to show the last data that gets updated at the end of a poll. Perhaps something is interrupting the process before it completes.

Scratch that. If your device is considered down, then polling won’t happen (and those graphs won’t be updated).

I would run the poller by hand, with debug output enabled, against one device that fails consistently, and repeat until I see it fail.

You mean run this on a ‘down’ device?

./poller.php -h -d