It would be feasible to have support for, let's say, remote data polling. In my head it looks something like this:
monitored host (some appliance that cannot export data via SNMP, but which has some custom producer tools)
data collector host (which gets the data out of the monitored host and exports it via snmp extend)
For this to be possible, I imagine there should be some kind of poller module that checks whether the monitored host has a data collector host attached to it and, if it does, gets the data from that host.
At this time I cannot imagine how to "bind" the remote data (let's say CPU usage) to the real host.
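For context, the snmp extend step on the data collector host would be a line like this in snmpd.conf (the script name and path here are hypothetical; the script itself would query the monitored host and print the metric to stdout, which LibreNMS could then read via NET-SNMP-EXTEND-MIB):

```
# /etc/snmp/snmpd.conf on the data collector host
extend remote-cpu /usr/local/bin/poll_remote_cpu.sh
```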
The aim for us has always been to allow other data collection methods.
I don't think using SNMP 'somewhere' to then go off and collect this is the best option; it could be one option, but really we should add hooks to allow collection of metrics via JSON, XML, scripts, etc.
I was thinking that the goal should be to have a series of transport modules (http/snmp/telnet/ssh/local/etc). Then each discovery/poller module can call these to retrieve their data. Something like:
$transport = new LibreNMS\Transports\SNMP();
$data = $transport->getData($OID);
$transport = new LibreNMS\Transports\SSH();
$response = $transport->getData('command to run over ssh');
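To make the idea above concrete, here is a minimal, self-contained sketch of what such a transport layer could look like. The `Transport` interface and the class internals are hypothetical; nothing here reflects actual LibreNMS code, and the real transports would obviously do network I/O rather than return stubbed strings.

```php
<?php
// Hypothetical sketch of the transport-module idea. A poller module only
// depends on the Transport interface, so swapping SNMP for SSH/HTTP is trivial.

interface Transport
{
    /** Fetch raw data for a query (an OID, a shell command, a URL, ...). */
    public function getData(string $query): string;
}

class SnmpTransport implements Transport
{
    public function __construct(private string $hostname) {}

    public function getData(string $query): string
    {
        // A real implementation would call snmpget (with a timeout) here;
        // stubbed so the sketch stays self-contained.
        return "value-for-{$query}@{$this->hostname}";
    }
}

class SshTransport implements Transport
{
    public function __construct(private string $hostname) {}

    public function getData(string $query): string
    {
        // A real implementation would run $query over SSH and capture stdout.
        return "output-of-{$query}@{$this->hostname}";
    }
}

// Example poller module: it doesn't know or care which transport it gets.
function pollUptime(Transport $t): string
{
    return $t->getData('.1.3.6.1.2.1.1.3.0');
}

echo pollUptime(new SnmpTransport('router1')), "\n";
echo pollUptime(new SshTransport('appliance1')), "\n";
```

The point of keeping the interface this small is that discovery/poller modules stay transport-agnostic: only the query string changes shape (OID vs. command vs. URL).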
However, the concern with using any non-SNMP data source is the delay in writing RRDs if we are waiting for multiple data sources to time out.
I have a HTTP library I use in #2196 - Adding Cisco CallManager metrics. It is mostly complete but I have kept it back until we can resolve this issue.
Why do you see a difference between non-SNMP data sources timing out and SNMP? We see the same issue now anyway if SNMP isn't responding or is responding slowly, so as long as sane timeouts are set it should be doable.
You are right, but as all modules are currently SNMP, if it is not responding now we are not going to get graphs for anything on the device. In a future state using many transport protocols, SNMP failing for several poller modules would slow down other non-SNMP modules, or vice versa.
I can see HTTP-collected data suffering because 20 or so SNMP modules, each with a 10-second timeout, all have to fail first: that's up to 200 seconds of waiting before the HTTP module even runs.
A thought I had was: if the transport was a unique (per-device) instance of a class, each data collection function could check/set an internal status flag, which would be set to false on timeout and mark the transport down. Or we could look at forking the poller to run separately per module; then we don't really care if a single module goes rogue.
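The per-device status flag could be sketched like this. Again, the class and method names are hypothetical, and the fetch method is stubbed to always time out so the behaviour is visible; a real implementation would do an SNMP get with a timeout:

```php
<?php
// Sketch of the per-device status flag: the first timeout marks the
// transport down, so subsequent modules skip it instantly instead of
// each waiting out their own timeout.

class SnmpTransport
{
    private bool $up = true;

    public function isUp(): bool
    {
        return $this->up;
    }

    public function getData(string $oid): ?string
    {
        if (!$this->up) {
            return null; // transport already marked down, bail out immediately
        }
        try {
            return $this->fetch($oid);
        } catch (RuntimeException $e) {
            $this->up = false; // first timeout marks the transport down
            return null;
        }
    }

    protected function fetch(string $oid): string
    {
        // Stub that simulates an unreachable device.
        throw new RuntimeException('timeout');
    }
}

$t = new SnmpTransport();
var_dump($t->getData('.1.3.6.1.2.1.1.3.0')); // first call eats the timeout
var_dump($t->isUp());                        // transport now marked down
var_dump($t->getData('.1.3.6.1.2.1.1.2.0')); // returns immediately
```

With this pattern, only the first module on a device pays the timeout cost; the other 19 see `isUp() === false` (or a `null` return) and move on.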
I'm not sure I see it the same way. At present, if SNMP times out we mark the device as down, and that happens pretty quickly. In my dev env only 2 devices out of 20 are up, and I can discover them all in under 10 seconds; 4-5 seconds of that are the 2 devices that work. In the future we would just not mark the device as down, but merely mark 'snmp' modules as unavailable so other items can be attempted.
With regards to forking the poller, I think we really need to revisit the poller rewrite overall. We could gain better performance by switching from PHP to something with lower overhead and better forking/multi-threading capabilities, but that's a bit of a pipe dream at present.