Devices take 300+ seconds in remote datacenters

thaz · 7 July 2024 04:16

Howdy.

I have my main LibreNMS cluster in the United States and various pollers around the world, each of which polls the local devices in its datacenter. They are set up to use rrdcached on the main server in the United States, and they also use the main MySQL server there.

The problem I’m having is that my Singapore nodes are extremely slow: it takes 300+ seconds to poll a device, and I’m trying to track down why.

When I run the poller with -vv, I see this:

(In this example, 1.1.1.1 is the device being polled locally and 2.2.2.2 is the main server.)

>> SNMP: [0/0.00s] MySQL: [1/4.72s] RRD: [2/0.24s]
>> Runtime for poller module 'wireless': 0.4738 seconds with 65248 bytes
RRD[last 1.1.1.1/poller-perf-wireless.rrd  --daemon 2.2.2.2:42217]
RRD[update 1.1.1.1/poller-perf-wireless.rrd N:0.47375202178955 --daemon 2.2.2.2:42217]
#### Unload poller module wireless ####

#### Load poller module ospf ####

Module enabled: Global + | OS   | Device   | Manual
SQL[select * from `vrf_lite_cisco` where `vrf_lite_cisco`.`device_id` = ? and `vrf_lite_cisco`.`device_id` is not null [214] 471.49ms] 

>>> Polled 1.1.1.1 (214) in 273.634 seconds <<<

SNMP [368/5.89s]: Snmpget[328/3.54s] Snmpwalk[40/2.34s]
SQL [533/2544.40s]: Select[111/543.96s] Update[417/1976.80s] Delete[5/23.64s]
RRD [276/32.87s]: Other[138/32.86s] Update[138/0.02s]

Is this the best way to do this? Is there a way to have the poller store the results locally and then upload them to the main server later, instead of during the poll? The other locations are fine, but Singapore is extremely slow… which makes sense if it has to talk back to the US for every packet.

murrant · 8 July 2024 00:29

Your poller spent 2544 seconds running SQL queries (this is what happens when you put that much latency between the poller and the SQL server).
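As a rough cross-check of the latency point, here is a back-of-the-envelope sketch in Python. It uses only numbers from the poller output above: the 533 SQL queries, the one `vrf_lite_cisco` select timed at 471.49ms, and the 273.634 second poll. The big assumption is that every query costs about as much as that one sampled select, which is a simplification. The summary's own SQL total is larger than the poll's wall time, so this sketch works from the single timed query instead.

```python
def estimated_sql_wait(query_count: int, seconds_per_query: float) -> float:
    """Total time spent waiting on SQL if each query pays one round trip."""
    return query_count * seconds_per_query


queries = 533          # from "SQL [533/...]" in the poller summary
per_query = 0.47149    # the single select that was timed at 471.49ms
poll_total = 273.634   # from "Polled 1.1.1.1 (214) in 273.634 seconds"

# ASSUMPTION: every query costs roughly the same as the sampled one.
sql_wait = estimated_sql_wait(queries, per_query)
share = sql_wait / poll_total

print(f"estimated SQL wait: {sql_wait:.1f}s of {poll_total:.1f}s ({share:.0%})")
```

Under that assumption, most of the poll is spent waiting on round trips to the database rather than on SNMP or RRD work.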

Offline + Remote polling is not something anyone has ever contributed to LibreNMS. Also, rrdtool does not support offline polling.

thaz · 8 July 2024 00:44

Fair enough. What would be the best option? Having a local LibreNMS instance for each DC?

pjchilds · 8 July 2024 04:55

I think so – or at least one on each large land mass separated by large oceans.

We run a local instance of telegraf, feed LibreNMS’s ‘influx’ output into it, and then forward from telegraf to a central instance of influxdb, where we build dashboards in grafana for ‘single pane of glass’ stuff.

thaz · 10 July 2024 00:25

That’s a good idea. I only really care about having all the transit links on a single page, so I can send that to Grafana. Thanks!

thaz · 11 July 2024 00:19

I was hoping it would send the data to Influx post-poll, but it appears to send it during the poll. The Grafana server is in our main location in the US, so polling went from 20 seconds to 100+.

Oh well… Will deal with this setup for now.

pjchilds · 11 July 2024 02:27

Yes, this is why we run telegraf on the same box as the poller(s).

You just need an input config and an output config and ta da …

[[inputs.influxdb_listener]]
  service_address = ":8086"
  [inputs.influxdb_listener.tags]
    influxdb_database = "librenms"

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
  urls = ["https://your-influx-server:8686"] # required
  database = "librenms" # required
  retention_policy = ""
  write_consistency = "any"
  username = "REDACTED"
  password = "REDACTED"
  skip_database_creation = true
  insecure_skip_verify = true
  [outputs.influxdb.tagpass]
    influxdb_database = ["librenms"]
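To check that a local listener like the one above is accepting writes before pointing the poller at it, something along these lines might help. This is only a sketch: the `smoke_test` measurement, the tag values, and the helper names are made up for illustration, and it assumes the listener is reachable on 127.0.0.1:8086 and exposes the InfluxDB 1.x style `/write` endpoint. The `db` query parameter is included for compatibility with that API; whether the listener uses it depends on its configuration.

```python
from urllib import parse, request


def escape_tag(value: str) -> str:
    """Escape commas, equals signs and spaces, which delimit tags and fields."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def build_line(measurement: str, tags: dict, fields: dict) -> str:
    """Build one line-protocol point with float fields and no timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_tag(k)}={float(v)}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part}"


def write_url(host: str, port: int, database: str) -> str:
    """URL of the 1.x style write endpoint (ASSUMPTION: plain HTTP, no auth)."""
    return f"http://{host}:{port}/write?" + parse.urlencode({"db": database})


def send_line(url: str, line: str, timeout: float = 5.0) -> int:
    """POST one point and return the HTTP status code."""
    req = request.Request(url, data=line.encode("utf-8"), method="POST")
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Hypothetical test point; nothing is sent until send_line() is called.
line = build_line("smoke_test", {"host": "sg poller"}, {"value": 1})
url = write_url("127.0.0.1", 8086, "librenms")
# status = send_line(url, line)  # run this on the poller box itself
```

A 2xx status from `send_line` would suggest the local listener took the point; whether it then shows up in the central database depends on the output section and its `tagpass` filter.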
