I have my main LibreNMS cluster in the United States and various pollers around the world that poll the local devices in each datacenter. They are set up to use rrdcached on the main server in the United States, and they also use the main MySQL server there.
The problem I’m having is that my Singapore nodes are extremely slow: it takes 300+ seconds to poll a node, and I’m trying to track down why.
When I run the poller with -vv, I see this:
(For example, 1.1.1.1 is my device I’m checking locally, 2.2.2.2 is the main server).
>> SNMP: [0/0.00s] MySQL: [1/4.72s] RRD: [2/0.24s]
>> Runtime for poller module 'wireless': 0.4738 seconds with 65248 bytes
RRD[last 1.1.1.1/poller-perf-wireless.rrd --daemon 2.2.2.2:42217]
RRD[update 1.1.1.1/poller-perf-wireless.rrd N:0.47375202178955 --daemon 2.2.2.2:42217]
#### Unload poller module wireless ####
#### Load poller module ospf ####
Module enabled: Global + | OS | Device | Manual
SQL[select * from `vrf_lite_cisco` where `vrf_lite_cisco`.`device_id` = ? and `vrf_lite_cisco`.`device_id` is not null [214] 471.49ms]
>>> Polled 1.1.1.1 (214) in 273.634 seconds <<<
SNMP [368/5.89s]: Snmpget[328/3.54s] Snmpwalk[40/2.34s]
SQL [533/2544.40s]: Select[111/543.96s] Update[417/1976.80s] Delete[5/23.64s]
RRD [276/32.87s]: Other[138/32.86s] Update[138/0.02s]
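To see where the time goes, the three summary lines above can be reduced to per-call averages. This is only a rough sketch for reading the output, not part of LibreNMS: the function name per_call_averages and the regular expression are my own assumptions about the line format, based solely on the lines pasted here. Note that the SQL total is larger than the 273-second runtime, so the unit label may not be literal seconds; the ratios between SNMP, SQL and RRD are the useful part.

```python
import re

# Summary lines copied verbatim from the poller output above.
SUMMARY = """\
SNMP [368/5.89s]: Snmpget[328/3.54s] Snmpwalk[40/2.34s]
SQL [533/2544.40s]: Select[111/543.96s] Update[417/1976.80s] Delete[5/23.64s]
RRD [276/32.87s]: Other[138/32.86s] Update[138/0.02s]
"""

# Assumed entry format: a name, then "[<count>/<total>s]",
# e.g. "SQL [533/2544.40s]" or "Select[111/543.96s]".
ENTRY = re.compile(r"(\w+)\s*\[(\d+)/([\d.]+)s\]")


def per_call_averages(text):
    """Return a list of (label, count, total, total / count) rows.

    The first entry on each line is treated as the group (SNMP, SQL, RRD);
    the remaining entries are labelled "group/name" so that, for example,
    SQL/Update and RRD/Update stay separate.
    """
    rows = []
    for line in text.splitlines():
        entries = ENTRY.findall(line)
        if not entries:
            continue
        group = entries[0][0]
        for index, (name, count, total) in enumerate(entries):
            label = group if index == 0 else f"{group}/{name}"
            n, t = int(count), float(total)
            rows.append((label, n, t, t / n if n else 0.0))
    return rows


if __name__ == "__main__":
    for label, n, t, avg in per_call_averages(SUMMARY):
        print(f"{label:14s} calls={n:4d} total={t:9.2f} avg={avg:.4f}")
```

Run against this output, the SQL rows dominate by a wide margin, with far more cost per call than SNMP, which fits the single select above taking 471.49ms against the remote database.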
Is this the best way to set this up? Is there a way to have the poller store the results locally and upload them to the main server afterwards, instead of during the poll? The other locations are fine, but Singapore is extremely slow, which makes sense if it has to talk back to the US for every SQL query and RRD update.
I think so – or at least on each large land mass separated by large oceans.
We run a local instance of Telegraf, feed the ‘influx’ output into it, and then forward from Telegraf to a central instance of InfluxDB, where we build dashboards in Grafana for ‘single pane of glass’ views.
I was hoping it would send the data to Influx after the poll, but it appears to send it during the poll. The Grafana server is in our main location in the US, so polling went from 20 seconds to 100+ seconds.