Dispatcher Service Issue with Redis


Hi all,

I’m having issues with the Dispatcher Service on a brand new RHEL8 build. I think the problem may lie somewhere within the python-redis version I’m using, but I haven’t tracked it down exactly yet. I’ve obfuscated some info for obvious reasons. Below is my info and troubleshooting:

Red Hat Enterprise Linux release 8.2 (Ootpa)
SELinux enabled.

# su librenms -c ./validate.php 
====================================
Component | Version
--------- | -------
LibreNMS  | 1.68-68-g774d95c11
DB Schema | 2020_10_21_124101_allow_nullable_ospf_columns (187)
PHP       | 7.3.23
Python    | 3.6.8
MySQL     | 10.3.17-MariaDB
RRDTool   | 1.7.0
SNMP      | NET-SNMP 5.8
====================================

[OK]    Composer Version: 1.10.15
[OK]    Dependencies up-to-date.
[OK]    Database connection successful
[OK]    Database schema correct

# su librenms -c './librenms-service.py -d -v'
MainThread(INFO):Configuring LibreNMS service
MainThread(INFO):Attempting to load .env from '/opt/librenms/.env'
MainThread(INFO):Using pure python SQL client
 poller1.mycompany.domain(INFO):Attaching signal handlers on thread  poller1.mycompany.domain
 poller1.mycompany.domain(CRITICAL):ERROR: Redis connection required for distributed polling
 poller1.mycompany.domain(CRITICAL):Could not connect to Redis. Error 111 connecting to 127.0.0.1:6379. Connection refused.

## Note: 127.0.0.1 is definitely not what’s set in /opt/librenms/.env

To check what the service actually sees, I went into service.py at https://github.com/librenms/librenms/blob/41ed0537b417a13d5ba47edeee7c87c1b61e2476/LibreNMS/service.py#L516 and added the following debug line at that point (service.py already imports os):

critical("Could not connect to Redis. {}".format(os.getenv('REDIS_HOST')))

The error output now displays the proper IP from the .env file:

# su librenms -c './librenms-service.py -d -v'
MainThread(INFO):Configuring LibreNMS service
MainThread(INFO):Attempting to load .env from '/opt/librenms/.env'
MainThread(INFO):Using pure python SQL client
 poller1.mycompany.domain(INFO):Attaching signal handlers on thread  poller1.mycompany.domain
 poller1.mycompany.domain(CRITICAL):ERROR: Redis connection required for distributed polling
 poller1.mycompany.domain(CRITICAL):Could not connect to Redis. Error 111 connecting to 127.0.0.1:6379. Connection refused.
 poller1.mycompany.domain(CRITICAL):Could not connect to Redis. 172.my.internal.ip
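So os.getenv() does return the right host, yet the client still dials 127.0.0.1. For anyone wanting to reproduce the lookup outside the dispatcher, here’s a minimal stdlib sketch of how a dotenv-style KEY=value file should resolve into the environment (the values are hypothetical stand-ins, not my real ones):

```python
import os

# Hypothetical stand-in for the contents of /opt/librenms/.env
env_text = """\
REDIS_HOST=172.16.0.10
REDIS_PORT=6379
REDIS_DB=0
"""

# Parse KEY=value lines the same simple way dotenv-style loaders do,
# then publish them into the process environment.
for line in env_text.splitlines():
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip()

# This mirrors the lookup I patched into service.py for debugging.
print(os.getenv("REDIS_HOST"))  # -> 172.16.0.10
```

If this resolves correctly (as it does for me above), the stale 127.0.0.1 has to be coming from somewhere later in the connection setup, not from the .env load.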

# python3 -V
Python 3.6.8

# su librenms -c 'pip3 show redis'
Name: redis
Version: 3.3.8
Summary: Python client for Redis key-value store
Home-page: https://github.com/andymccurdy/redis-py
Author: Andy McCurdy
Author-email: [email protected]
License: MIT
Location: /usr/lib/python3.6/site-packages
Requires: 

Is anyone running CentOS/RHEL8 with a working dispatcher service/python poller who could confirm any of these for me, or share their working config?

Thanks!

It should also be noted that I have another RHEL8 poller that is working. However, that one is the redis-master in my cluster, so being forced to connect to 127.0.0.1 happens to work out fine there. It occurs to me that I could run the Redis instance on this server standalone, but then I’d of course be double-polling my devices. Any additional insight is welcome.
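For reference, my intent is for each poller to follow the current master via Sentinel rather than hard-coding a host. Something along these lines in .env should do that; the variable names below are what I believe the distributed-poller docs use, so please verify against the docs before copying:

```text
# Hedged sketch of Sentinel-based discovery in /opt/librenms/.env --
# variable names assumed from the LibreNMS distributed poller docs.
REDIS_SENTINEL=sentinel1.mycompany.domain:26379
REDIS_SENTINEL_SERVICE=mymaster
```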

So as a sanity check, I spun up a new poller on Ubuntu 20.04.1 to see what kind of results I’d get. I’m again seeing problems with Redis. Here’s the output:

root@poller2:/opt/librenms# su librenms -c './librenms-service.py -v -d'
MainThread(INFO):Configuring LibreNMS service
MainThread(INFO):Attempting to load .env from '/opt/librenms/.env'
MainThread(INFO):Using pure python SQL client
poller2.mycompany.domain(INFO):Attaching signal handlers on thread poller2.mycompany.domain
poller2.mycompany.domain(WARNING):psutil is not available, polling gap possible
poller2.mycompany.domain(INFO):Created redis lock manager with socket_timeout of 60s
poller2.mycompany.domain(INFO):Starting watchdog timer for log file: /opt/librenms/logs/librenms.log
poller2.mycompany.domain(INFO):Entering main LibreNMS service loop on 150258/poller2.mycompany.domain...
poller2.mycompany.domain(DEBUG):Performing startup checks...
poller2.mycompany.domain(DEBUG):Starting up queue managers...
poller2.mycompany.domain(INFO):Groups: [0]
poller2.mycompany.domain(INFO):Poller QueueManager created: 20 workers, 300s frequency
Poller_0-1(DEBUG):Worker started Poller_0-1
Poller_0-1(INFO):Creating queue poller:0
Poller_0-2(DEBUG):Worker started Poller_0-2
Poller_0-3(DEBUG):Worker started Poller_0-3
Poller_0-4(DEBUG):Worker started Poller_0-4
Poller_0-5(DEBUG):Worker started Poller_0-5
Poller_0-6(DEBUG):Worker started Poller_0-6
Poller_0-7(DEBUG):Worker started Poller_0-7
Poller_0-1(INFO):Created redis queue with socket_timeout of 60s
Poller_0-8(DEBUG):Worker started Poller_0-8
Poller_0-9(DEBUG):Worker started Poller_0-9
Poller_0-10(DEBUG):Worker started Poller_0-10
Poller_0-11(DEBUG):Worker started Poller_0-11
Poller_0-12(DEBUG):Worker started Poller_0-12
Poller_0-1(DEBUG):Worker Poller_0-1 checking queue 0 (38) for work
Poller_0-13(DEBUG):Worker started Poller_0-13
Poller_0-14(DEBUG):Worker started Poller_0-14
Poller_0-15(DEBUG):Worker started Poller_0-15
Poller_0-4(DEBUG):Worker Poller_0-4 checking queue 0 (38) for work
Poller_0-16(DEBUG):Worker started Poller_0-16
Poller_0-17(DEBUG):Worker started Poller_0-17
Poller_0-18(DEBUG):Worker started Poller_0-18
Poller_0-2(DEBUG):Worker Poller_0-2 checking queue 0 (38) for work
Poller_0-19(DEBUG):Worker started Poller_0-19
Poller_0-20(DEBUG):Worker started Poller_0-20
Poller_0-13(DEBUG):Worker Poller_0-13 checking queue 0 (38) for work
Poller_0-12(DEBUG):Worker Poller_0-12 checking queue 0 (38) for work
poller2.mycompany.domain(DEBUG):Started 20 poller threads for group 0
Poller_0-6(DEBUG):Worker Poller_0-6 checking queue 0 (38) for work
Poller_0-14(DEBUG):Worker Poller_0-14 checking queue 0 (38) for work
Poller_0-7(DEBUG):Worker Poller_0-7 checking queue 0 (38) for work
Poller_0-11(DEBUG):Worker Poller_0-11 checking queue 0 (38) for work
Poller_0-15(DEBUG):Worker Poller_0-15 checking queue 0 (38) for work
Poller_0-16(DEBUG):Worker Poller_0-16 checking queue 0 (38) for work
Poller_0-17(DEBUG):Worker Poller_0-17 checking queue 0 (38) for work
Poller_0-5(DEBUG):Worker Poller_0-5 checking queue 0 (38) for work
Poller_0-4(ERROR):Poller poller exception! The previous master is now a slave
Poller_0-10(DEBUG):Worker Poller_0-10 checking queue 0 (38) for work
Poller_0-1(ERROR):Poller poller exception! The previous master is now a slave
Poller_0-9(DEBUG):Worker Poller_0-9 checking queue 0 (38) for work
Poller_0-3(DEBUG):Worker Poller_0-3 checking queue 0 (38) for work
Poller_0-8(DEBUG):Worker Poller_0-8 checking queue 0 (38) for work
Poller_0-18(DEBUG):Worker Poller_0-18 checking queue 0 (38) for work
Poller_0-2(ERROR):Poller poller exception! The previous master is now a slave
Poller_0-20(DEBUG):Worker Poller_0-20 checking queue 0 (38) for work
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/redis/sentinel.py", line 55, in read_response
    return super(SentinelManagedConnection, self).read_response()
  File "/usr/lib/python3/dist-packages/redis/connection.py", line 718, in read_response
    raise response
redis.exceptions.ReadOnlyError: You can't write against a read only replica.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/librenms/LibreNMS/queuemanager.py", line 54, in _service_worker
    device_id = self.get_queue(queue_id).get(True, 10)
  File "/opt/librenms/LibreNMS/__init__.py", line 325, in get
    item = self._redis.bzpopmin(self.key, timeout=timeout)
  File "/usr/lib/python3/dist-packages/redis/client.py", line 2477, in bzpopmin
    return self.execute_command('BZPOPMIN', *keys)
  File "/usr/lib/python3/dist-packages/redis/client.py", line 839, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/usr/lib/python3/dist-packages/redis/client.py", line 853, in parse_response
    response = connection.read_response()
  File "/usr/lib/python3/dist-packages/redis/sentinel.py", line 64, in read_response
    raise ConnectionError('The previous master is now a slave')
redis.exceptions.ConnectionError: The previous master is now a slave

[All 20 Poller_0-* worker threads hit this same pair of exceptions; their interleaved tracebacks are trimmed above for readability.]

Redis info:

# Server
redis_version:5.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:636cde3b5c7a3923
redis_mode:standalone
os:Linux 5.4.0-52-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.2.1
process_id:149323
run_id:818023e9a03ed8ae85b609dfcd0b43dc6a17218a
tcp_port:6379
uptime_in_seconds:784
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9640377
executable:/usr/bin/redis-server
config_file:/etc/redis/redis.conf

# Clients
connected_clients:7
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0

# Memory
used_memory:2144264
used_memory_human:2.04M
used_memory_rss:6004736
used_memory_rss_human:5.73M
used_memory_peak:2865248
used_memory_peak_human:2.73M
used_memory_peak_perc:74.84%
used_memory_overhead:2031110
used_memory_startup:796320
used_memory_dataset:113154
used_memory_dataset_perc:8.39%
allocator_allocated:2276336
allocator_active:2674688
allocator_resident:5894144
total_system_memory:8319873024
total_system_memory_human:7.75G
used_memory_lua:41984
used_memory_lua_human:41.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.17
allocator_frag_bytes:398352
allocator_rss_ratio:2.20
allocator_rss_bytes:3219456
rss_overhead_ratio:1.02
rss_overhead_bytes:110592
mem_fragmentation_ratio:2.91
mem_fragmentation_bytes:3943424
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:183998
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:365
rdb_bgsave_in_progress:0
rdb_last_save_time:1603475715
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:421888
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

# Stats
total_connections_received:1568
total_commands_processed:13565
instantaneous_ops_per_sec:12
total_net_input_bytes:930006
total_net_output_bytes:5752119
instantaneous_input_kbps:0.63
instantaneous_output_kbps:11.80
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:1079
keyspace_misses:44
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:481
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:83947277
master_link_down_since_seconds:73
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:06da7d208267c0e22b19e28b40b8e79ce969d886
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:83947277
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:83537050
repl_backlog_histlen:410228

# CPU
used_cpu_sys:1.968231
used_cpu_user:1.344047
used_cpu_sys_children:0.003935
used_cpu_user_children:0.000000

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=27,expires=26,avg_ttl=0
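Reading the Replication section above, I think this is the smoking gun: the node is a read-only replica (role:slave, slave_read_only:1) whose link to its configured master (itself, 127.0.0.1:6379) is down, so a write command like BZPOPMIN is refused, matching the ReadOnlyError the workers throw. A quick stdlib sketch that flags exactly that condition from INFO-style output (the sample text is copied from the values above):

```python
# Flag the failure mode visible in the INFO Replication section:
# a read-only replica with a dead master link cannot service writes
# such as BZPOPMIN, which the poller queue relies on.
info_text = """\
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:down
slave_read_only:1
"""

# INFO lines are simple key:value pairs, so a dict comprehension suffices.
info = dict(
    line.split(":", 1)
    for line in info_text.splitlines()
    if ":" in line
)

problems = []
if info.get("role") == "slave" and info.get("slave_read_only") == "1":
    problems.append("node is a read-only replica; writes (e.g. BZPOPMIN) will fail")
if info.get("master_link_status") == "down":
    problems.append("replication link to %s:%s is down"
                    % (info.get("master_host"), info.get("master_port")))

for p in problems:
    print(p)
```

Both checks fire against my output, so whatever is steering clients to 127.0.0.1 is also what left this node replicating from itself.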