I’ve written a custom Munin-style agent to collect data on job queues for HTCondor. When a submit node gets very busy, the command that fetches job statuses sometimes times out, and waiting for that takes much longer than the default 10s unix-agent timeout.
When this timeout happens, a new munin_plugins_ds entry is created with an empty string for ds_name. After that the RRD graphs fail, because the graphing code tries to construct a graph command with a malformed data source, DEF:=/path/to/munin_htcondor_.rrd, where the empty ds_name leaves nothing between the : and the = where a real name should be.
I have written timeouts into the Munin script and raised the unix-agent timeout in config.php, but it would be nicer if the empty ds_name entries weren’t created in the first place.
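
For anyone hitting the same thing, here is a minimal sketch of what a script-side timeout could look like. It is not my actual plugin: the field names (idle, running, held), the "name count" output format, and the helper names are all hypothetical, and the command to run is passed in rather than hard-coded. The idea is that the plugin always prints one complete line per known field and falls back to U (Munin's marker for an unknown value) when the command fails or exceeds the timeout, so it never emits a blank or partial line. Whether that alone stops the empty ds_name entry depends on how the poller parses the agent output, which I have not verified.

```python
import subprocess
import sys

# Hypothetical field names; a real plugin would list its own data sources.
FIELDS = ("idle", "running", "held")


def run_with_timeout(cmd, timeout_s):
    """Return the command's stdout, or None if it fails or exceeds timeout_s."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout


def parse_counts(output):
    """Parse hypothetical 'name count' lines, ignoring anything malformed."""
    counts = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in FIELDS and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def render_values(counts):
    """Emit every named field; missing ones get 'U' (unknown) instead of a blank."""
    return [f"{name}.value {counts.get(name, 'U')}" for name in FIELDS]


def main(cmd, timeout_s=8.0):
    """Run cmd with a timeout below the agent's own and print Munin-style values."""
    output = run_with_timeout(cmd, timeout_s)
    counts = parse_counts(output) if output is not None else {}
    sys.stdout.write("\n".join(render_values(counts)) + "\n")
```

The script timeout (8 seconds here, an arbitrary choice) is deliberately kept under the unix-agent timeout so the plugin finishes with well-formed output before the agent connection is cut off.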