It would be feasible to have support for, let's say, remote data polling. In my head it looks something like this:
monitored host (some appliance that cannot export data via SNMP, but which has some custom producer tools)
data collector host (which gets the data out of the monitored host and exports it via snmp extend)
For this to be possible, I imagine there should be some kind of poller module that checks whether the monitored host has a data collector host attached to it and, if it does, gets the data from that host.
At this time I cannot imagine how to "bind" the remote data (let's say CPU usage) to the real host.
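For context, the snmp extend step on the data collector host would be a line like this in snmpd.conf (the script name and path here are hypothetical; the script itself would query the monitored host and print the metric to stdout, which LibreNMS could then read via NET-SNMP-EXTEND-MIB):

```
# /etc/snmp/snmpd.conf on the data collector host
extend remote-cpu /usr/local/bin/poll_remote_cpu.sh
```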
The aim for us has always been to allow other data collection methods.
I don't think using SNMP 'somewhere' to then go off and collect this is the best option; it could be one option, but really we should add hooks to allow collection of metrics via JSON, XML, scripts, etc.
I was thinking that the goal should be to have a series of transport modules (http/snmp/telnet/ssh/local/etc). Then each discovery/poller module can call these to retrieve their data. Something like:
$transport = new LibreNMS\Transports\SNMP();
$data = $transport->getData($OID);
$transport = new LibreNMS\Transports\SSH();
$response = $transport->getData('command to run over ssh');
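To make the idea above concrete, here is a minimal, self-contained sketch of what such a transport layer could look like. The `Transport` interface and the class internals are hypothetical; nothing here reflects actual LibreNMS code, and the real transports would obviously do network I/O rather than return stubbed strings.

```php
<?php
// Hypothetical sketch of the transport-module idea. A poller module only
// depends on the Transport interface, so swapping SNMP for SSH/HTTP is trivial.

interface Transport
{
    /** Fetch raw data for a query (an OID, a shell command, a URL, ...). */
    public function getData(string $query): string;
}

class SnmpTransport implements Transport
{
    public function __construct(private string $hostname) {}

    public function getData(string $query): string
    {
        // A real implementation would call snmpget (with a timeout) here;
        // stubbed so the sketch stays self-contained.
        return "value-for-{$query}@{$this->hostname}";
    }
}

class SshTransport implements Transport
{
    public function __construct(private string $hostname) {}

    public function getData(string $query): string
    {
        // A real implementation would run $query over SSH and capture stdout.
        return "output-of-{$query}@{$this->hostname}";
    }
}

// Example poller module: it doesn't know or care which transport it gets.
function pollUptime(Transport $t): string
{
    return $t->getData('.1.3.6.1.2.1.1.3.0');
}

echo pollUptime(new SnmpTransport('router1')), "\n";
echo pollUptime(new SshTransport('appliance1')), "\n";
```

The point of keeping the interface this small is that discovery/poller modules stay transport-agnostic: only the query string changes shape (OID vs. command vs. URL).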
However, the concern with using any non-SNMP data source is the delay in writing RRDs if we are waiting for multiple data sources to time out.
I have a HTTP library I use in #2196 - Adding Cisco CallManager metrics. It is mostly complete but I have kept it back until we can resolve this issue.
Why do you see a difference between non-SNMP data sources timing out and SNMP? We see the same issue now anyway if SNMP isn't responding or is responding slowly, so as long as sane timeouts are set it should be doable.
You are right, but as all modules are currently SNMP, if it is not responding now we are not going to get graphs for anything on the device. In a future state using many transport protocols, SNMP failing for several poller modules would slow down other non-SNMP modules, or vice versa.
I can see HTTP-collected data suffering because 20 or so SNMP modules, each with a 10-second timeout, all have to fail first: that's up to 200 seconds of waiting before the HTTP module even runs.
A thought I had was: if the transport was a unique (per-device) instance of a class, each data collection function could check/set an internal status flag, which would be set to false on timeout and mark the transport down. Or we could look at forking the poller to run separately per module; then we don't really care if a single module goes rogue.
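The per-device status flag could be sketched like this. Again, the class and method names are hypothetical, and the fetch method is stubbed to always time out so the behaviour is visible; a real implementation would do an SNMP get with a timeout:

```php
<?php
// Sketch of the per-device status flag: the first timeout marks the
// transport down, so subsequent modules skip it instantly instead of
// each waiting out their own timeout.

class SnmpTransport
{
    private bool $up = true;

    public function isUp(): bool
    {
        return $this->up;
    }

    public function getData(string $oid): ?string
    {
        if (!$this->up) {
            return null; // transport already marked down, bail out immediately
        }
        try {
            return $this->fetch($oid);
        } catch (RuntimeException $e) {
            $this->up = false; // first timeout marks the transport down
            return null;
        }
    }

    protected function fetch(string $oid): string
    {
        // Stub that simulates an unreachable device.
        throw new RuntimeException('timeout');
    }
}

$t = new SnmpTransport();
var_dump($t->getData('.1.3.6.1.2.1.1.3.0')); // first call eats the timeout
var_dump($t->isUp());                        // transport now marked down
var_dump($t->getData('.1.3.6.1.2.1.1.2.0')); // returns immediately
```

With this pattern, only the first module on a device pays the timeout cost; the other 19 see `isUp() === false` (or a `null` return) and move on.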
I'm not sure I see it the same way. At present, if SNMP times out we mark the device as down, and that happens pretty quickly. In my dev env only 2 devices out of 20 are up, and I can discover them all in under 10 seconds; 4-5 seconds of that are the 2 devices that work. In the future we would just not mark the device as down, but merely mark 'snmp' modules as unavailable so other items can be attempted.
With regards to forking the poller, I think we really need to revisit the poller rewrite overall. We could gain better performance by switching from PHP to something with lower overhead and better forking/multi-threading capabilities, but that's a bit of a pipe dream at present.