Smokeping failures on reload

I have Smokeping integrated with LibreNMS running in the Packer VM, with an hourly cron task that regenerates the Smokeping configs and reloads the systemd service.

# cat /etc/cron.hourly/librenms-smokeping
#! /usr/bin/env bash

sudo -u librenms /opt/librenms/lnms smokeping:generate --targets > /etc/smokeping/config.d/librenms-targets.conf
sudo -u librenms /opt/librenms/lnms smokeping:generate --probes  > /etc/smokeping/config.d/librenms-probes.conf

systemctl reload smokeping > /dev/null 2>&1

I found that occasionally, when the cron task fires, the parent process (864410 below) exits uncleanly during the reload. The systemd unit then goes into a failed state, and every subsequent hourly reload fails because the unit is inactive.

Mar 03 22:17:01 librenms systemd[1]: Reloading Latency Logging and Graphing System.
Mar 03 22:17:01 librenms systemd[1]: Reloaded Latency Logging and Graphing System.
Mar 03 22:17:01 librenms smokeping[864410]: Reloading configuration.
Mar 03 22:17:02 librenms smokeping[1609673]: Got HUP signal, exiting gracefully.
Mar 03 22:17:02 librenms smokeping[1609673]: Exiting due to HUP signal.
Mar 03 22:17:02 librenms smokeping[1609674]: Got HUP signal, exiting gracefully.
Mar 03 22:17:02 librenms smokeping[1609674]: Exiting due to HUP signal.
Mar 03 22:17:02 librenms smokeping[864410]: Waiting for child processes to terminate.
Mar 03 22:17:02 librenms smokeping[864410]: Child processes terminated, restarting with new configuration.
Mar 03 22:17:02 librenms smokeping[864410]: Entering multiprocess mode.
Mar 03 22:17:02 librenms smokeping[864410]: No targets defined for probe FPing6, skipping.
Mar 03 22:17:02 librenms smokeping[864410]: No targets defined for probe lnmsFPing6-0, skipping.
Mar 03 22:17:02 librenms smokeping[864410]: No targets defined for probe FPing, skipping.
Mar 03 22:17:02 librenms smokeping[864410]: Child process 1615858 started for probe lnmsFPing-0.
Mar 03 22:17:02 librenms smokeping[864410]: Child process 1615859 started for probe lnmsFPing-1.
Mar 03 22:17:02 librenms smokeping[864410]: No targets defined for probe lnmsFPing6-1, skipping.
Mar 03 22:17:02 librenms smokeping[864410]: All probe processes started successfully.
Mar 03 22:17:02 librenms smokeping[1615859]: lnmsFPing-1: probing 4 targets with step 300 s and offset 191 s.
Mar 03 22:17:02 librenms smokeping[1615858]: lnmsFPing-0: probing 5 targets with step 300 s and offset 169 s.
Mar 03 23:17:02 librenms systemd[1]: Reloading Latency Logging and Graphing System.
Mar 03 23:17:02 librenms smokeping[864410]: Reloading configuration.
Mar 03 23:17:02 librenms systemd[1]: Reloaded Latency Logging and Graphing System.
Mar 03 23:17:03 librenms smokeping[1615858]: Got HUP signal, exiting gracefully.
Mar 03 23:17:03 librenms smokeping[1615858]: Exiting due to HUP signal.
Mar 03 23:17:03 librenms smokeping[1615859]: Got HUP signal, exiting gracefully.
Mar 03 23:17:03 librenms smokeping[1615859]: Exiting due to HUP signal.
Mar 03 23:17:03 librenms smokeping[864410]: Waiting for child processes to terminate.
Mar 03 23:17:03 librenms smokeping[864410]: Can't call method "step" on an undefined value at /usr/share/perl5/Smokeping.pm line 4406.
Mar 03 23:17:03 librenms systemd[1]: smokeping.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 23:17:03 librenms systemd[1]: smokeping.service: Failed with result 'exit-code'.
Mar 04 00:17:02 librenms systemd[1]: smokeping.service: Unit cannot be reloaded because it is inactive.
Mar 04 01:17:01 librenms systemd[1]: smokeping.service: Unit cannot be reloaded because it is inactive.
Mar 04 02:17:01 librenms systemd[1]: smokeping.service: Unit cannot be reloaded because it is inactive.
Mar 04 03:17:01 librenms systemd[1]: smokeping.service: Unit cannot be reloaded because it is inactive.

To remedy this, I modified the script to run systemctl restart if the systemctl reload fails:

(systemctl reload smokeping || systemctl restart smokeping) > /dev/null 2>&1
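
Note that in the log above systemd reports the reload as successful and the parent process only dies a second later, so the fallback takes effect on the next hourly run rather than immediately. A slightly more explicit variant is sketched below: it checks whether the unit is active before attempting a reload, which should also avoid the "Unit cannot be reloaded because it is inactive" message. The function name reload_or_restart and the SYSTEMCTL variable are my own inventions for illustration, not part of LibreNMS or Smokeping.

```shell
# Sketch only: reload the unit if it is running, otherwise restart it.
# SYSTEMCTL is a hypothetical override so the logic can be dry-run
# without touching systemd; it defaults to the real systemctl.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

reload_or_restart() {
    unit="$1"
    # Only attempt a reload when the unit is currently active.
    if "$SYSTEMCTL" is-active --quiet "$unit" && "$SYSTEMCTL" reload "$unit"; then
        return 0
    fi
    # Unit is inactive or failed, or the reload itself failed: restart it.
    "$SYSTEMCTL" restart "$unit"
}
```

With this function defined in the cron script, the last line would become: reload_or_restart smokeping > /dev/null 2>&1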

I wasn’t sure if this should be raised as an issue, but hopefully this will help someone in the future.

Thanks for posting this. I was experiencing the same thing: it seems to run fine for a random amount of time, then stops and is not heard from again until someone intervenes manually. This certainly helped me!