Proxmox and Ceph graphs stopped working on October 5th

When asking for help and support, please provide as much information as possible. This should include:

===========================================

Component Version
LibreNMS 23.9.1-82-g8e8fa8365 (2023-10-13T00:09:41+02:00)
DB Schema 2023_10_07_231037_application_metrics_add_primary_key (262)
PHP 8.2.11
Python 3.9.2
Database MariaDB 10.5.22-MariaDB-log
RRDTool 1.7.2
SNMP 5.9
===========================================

[OK] Composer Version: 2.6.5
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database Schema is current
[OK] SQL Server meets minimum requirements
[OK] lower_case_table_names is enabled
[OK] MySQL engine is optimal
[OK] Database and column collations are correct
[OK] Database schema correct
[OK] MySQl and PHP time match
[OK] Active pollers found
[OK] Dispatcher Service not detected
[OK] Locks are functional
[OK] Python poller wrapper is polling
[OK] Redis is unavailable
[OK] rrdtool version ok
[OK] Connected to rrdcached

If you are having troubles with discovery/polling include the pastebin output of:

./discovery.php -h HOSTNAME -d | ./pbin.sh
./poller.php -h HOSTNAME -r -f -d | ./pbin.sh

lnms device:poll -vv -x -m unix-agent,applications
seems to successfully pull data from the unix-agent for both ceph and proxmox:

'app' =>
array (
'ceph' => '
.mgr:0:0:0
S3520:4:1108:156611
S4510:33:249674:8530

osd.0:1:1
osd.9:1:1
osd.3:0:0
osd.6:0:0
osd.11:0:0
osd.10:0:0
osd.8:0:0
osd.7:0:0
osd.5:0:0
osd.4:0:0
osd.2:2:2
osd.1:1:1

c:17283258777600:1799918342144:15483340435456
.mgr:2953434103808:538718208:44
S3520:4360961523712:530356441402:46681
S4510:492237520896:1254776979787:121047',
'proxmox' => 'clustername
1002/net0/687329343/654589757/vm1002
1004/net0/6873465217/21532207545/vm1004
1005/net0/6820106436/684764491/vm1005
1007/net0/4159822726/886573315/vm1007
1008/net0/9239700387/2363092556/vm1008
1009/net0/14700688165/8178852725/vm1009
100/net0/3164129311381/1600494972925/vm100
100/net1/1634852400920/3281317807698/vm100
1011/net0/339862362/345573012/vm1011
1017/net0/297578918/131951028/vm1017
102/net0/393748098998/413770976681/vm102
1102/net0/566802761/403142687/vm1102
1103/net0/14155649923/21334881532/vm1103
1104/net0/893151470/312148994/vm1104
1107/net0/376739040/210655459/vm1107',
),
)
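
For anyone who wants to sanity-check a proxmox payload like the one above outside of LibreNMS, here is a minimal sketch. The function name and the field labels are my own assumptions based on the sample: the thread does not say which counter is inbound and which is outbound, so they are kept neutral.

```php
<?php
// Sketch only: parseProxmoxPayload() is a hypothetical helper, not LibreNMS code.
// Assumed layout: first line = cluster name, then "vmid/port/counter/counter/vmname".
function parseProxmoxPayload(string $payload): array
{
    $lines = preg_split('/\R/', trim($payload));
    $cluster = array_shift($lines); // first line carries the cluster name

    $interfaces = [];
    foreach ($lines as $line) {
        $fields = explode('/', $line, 5);
        if (count($fields) !== 5) {
            continue; // skip anything that does not match the five-field layout
        }
        [$vmid, $port, $counterA, $counterB, $vmName] = $fields;
        $interfaces[] = [
            'vmid' => $vmid,
            'port' => $port,
            // Kept as strings: the direction of each counter is an open assumption.
            'counter_a' => $counterA,
            'counter_b' => $counterB,
            'name' => $vmName,
        ];
    }

    return ['cluster' => $cluster, 'interfaces' => $interfaces];
}

$sample = "clustername\n"
    . "1002/net0/687329343/654589757/vm1002\n"
    . "100/net1/1634852400920/3281317807698/vm100";
$parsed = parseProxmoxPayload($sample);
echo $parsed['cluster'], ': ', count($parsed['interfaces']), " interfaces\n";
```

If this parses cleanly, the agent script itself is probably fine and the problem is further along in the poller.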

But the data is not found when the poller wants to update the RRDs; instead, it tries to fetch it again over SNMP:

Application: proxmox, app_id=19SNMP['/usr/bin/snmpget' '-v2c' '-c' 'COMMUNITY' '-Oqv' '-M' '/opt/librenms/mibs' 'udp:HOSTNAME:161' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.7.112.114.111.120.109.111.120']
No Such Instance currently exists at this OID
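
As a side note on the OID in that snmpget call: as far as I can tell, the prefix .1.3.6.1.4.1.8072.1.3.2.3.1.2 is nsExtendOutputFull from NET-SNMP-EXTEND-MIB, and the tail is the extend name encoded as a length followed by ASCII codes. A small sketch to decode it (decodeExtendName() is a name I made up for illustration):

```php
<?php
// Sketch only: decodes "<length>.<ascii>.<ascii>..." into the extend name.
function decodeExtendName(string $suffix): string
{
    $parts = array_map('intval', explode('.', trim($suffix, '.')));
    $length = array_shift($parts); // first number is the string length

    if ($length !== count($parts)) {
        throw new InvalidArgumentException('Length prefix does not match the number of characters');
    }

    // Convert each remaining number to its ASCII character.
    return implode('', array_map('chr', $parts));
}

$name = decodeExtendName('7.112.114.111.120.109.111.120');
echo $name, PHP_EOL; // proxmox
```

So "No Such Instance" here just means that no extend named proxmox is configured on the host.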

includes/polling/applications/proxmox.inc.php contains:

$name = 'proxmox';
if (\LibreNMS\Config::get('enable_proxmox') && ! empty($agent_data['app'][$name])) {
    $proxmox = $agent_data['app'][$name];
} elseif (\LibreNMS\Config::get('enable_proxmox')) {
    $options = '-Oqv';
    $oid = '.1.3.6.1.4.1.8072.1.3.2.3.1.2.7.112.114.111.120.109.111.120';
    $proxmox = snmp_get($device, $oid, $options);
    $proxmox = preg_replace('/^.+\n/', '', $proxmox);
    $proxmox = str_replace("<<>>\n", '', $proxmox);
}

So I assume that the test for ! empty($agent_data['app'][$name]) fails to find data.
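
To make that branch order explicit, here is a rough standalone sketch. chooseProxmoxSource() is a hypothetical helper that only mirrors the if/elseif in the quoted snippet; the real code calls \LibreNMS\Config::get() and snmp_get() instead of taking arguments.

```php
<?php
// Sketch only: mirrors the branch order of the quoted proxmox.inc.php excerpt.
function chooseProxmoxSource(bool $enabled, array $agentData): string
{
    if ($enabled && ! empty($agentData['app']['proxmox'])) {
        return 'unix-agent'; // agent payload present, use it
    } elseif ($enabled) {
        return 'snmp';       // nothing from the agent, fall back to snmpget
    }

    return 'none';           // proxmox support disabled
}

$withPayload = chooseProxmoxSource(true, ['app' => ['proxmox' => "clustername\n100/net0/1/2/vm100"]]);
$withoutPayload = chooseProxmoxSource(true, []);

echo $withPayload, PHP_EOL;    // unix-agent
echo $withoutPayload, PHP_EOL; // snmp
```

With an empty array the second branch is taken, which matches the snmpget call seen in the poller output.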

If I add
'extend proxmox /usr/bin/sudo /usr/lib/check_mk_agent/local/proxmox'
to the SNMP configuration on the Proxmox host, the data gets fetched a second time and the Proxmox graphs start updating again in LibreNMS:

Application: proxmox, app_id=4SNMP['/usr/bin/snmpget' '-v2c' '-c' 'COMMUNITY' '-Oqv' '-M' '/opt/librenms/mibs' 'udp:HOSTNAME:161' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.7.112.114.111.120.109.111.120']
"<<>>
clustername
1002/net0/695290798/661863191/vm1002
1004/net0/6936145115/21687126085/vm1004
1005/net0/6888318507/691611814/vm1005
1007/net0/4161748379/887419609/vm1007
1008/net0/9328342305/2382567269/vm1008
1009/net0/14822337265/8243321299/vm1009
100/net0/3173956244017/1602257937208/vm100
100/net1/1636863777001/3292091194460/vm100
1011/net0/341615444/348937740/vm1011
1017/net0/299183535/133337231/vm1017
102/net0/396072904734/416160300817/vm102
1102/net0/571450340/407523237/vm1102
1103/net0/14261869368/21441745466/vm1103
1104/net0/902949301/315534693/vm1104
1107/net0/379339055/213219311/vm1107"

The ceph poller in includes/polling/applications/ceph.inc.php has no SNMP fallback; it only checks for the data collected from the unix-agent:

$name = 'ceph';
if (! empty($agent_data['app'][$name])) {
    $ceph_data = $agent_data['app'][$name];

So device:poll only returns
Application: ceph, app_id=13
and no RRDs or application_metrics are updated.
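
Since the ceph module has no fallback, it may help to confirm offline that the agent payload itself is well-formed. The sketch below only splits the payload into its blank-line-separated groups and colon-separated fields. It makes no claim about what each field means, and splitAgentSections() is a hypothetical name.

```php
<?php
// Sketch only: splits a payload into blank-line-separated sections of
// colon-separated rows, without interpreting the fields.
function splitAgentSections(string $payload): array
{
    $sections = [];
    foreach (preg_split('/\R{2,}/', trim($payload)) as $block) {
        $rows = [];
        foreach (preg_split('/\R/', trim($block)) as $line) {
            $rows[] = explode(':', $line);
        }
        $sections[] = $rows;
    }

    return $sections;
}

$sample = "\n.mgr:0:0:0\nS3520:4:1108:156611\n\n"
    . "osd.0:1:1\nosd.9:1:1\n\n"
    . "c:17283258777600:1799918342144:15483340435456";
$sections = splitAgentSections($sample);
echo count($sections), " sections\n";
```

On the sample from the first post this should give three groups, so an empty result would point at the agent rather than the poller.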

I do not have any errors in logs/librenms.log, only INFO lines about the devices polled.

Run the poller in debug mode and check the output. lnms device:poll -vvv <hostname>

OK, I only used -vv in the first post, but I tried it again with -vvv.
It did not seem to give any new insight into why the data is not found in the applications module.

unix-agent finds data for ceph and proxmox with the same output as in the first post, and the applications module still seems to fail the test for ! empty($agent_data['app'][$name]).

The output is:
#### Load poller module applications ####

Module enabled: Global + | OS | Device | Manual +
SQL[select * from applications where applications.device_id = ? and applications.device_id is not null and applications.deleted_at is null [33] 0.59ms]

Application: ceph, app_id=13
Application: proxmox, app_id=4SNMP['/usr/bin/snmpget' '-v2c' '-c' '<snip>' '-Oqv' '-M' '/opt/librenms/mibs' 'udp:<host>:161' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.7.112.114.111.120.109.111.120']

and then it proceeds to fetch the proxmox data over snmp and update only the proxmox graphs.

Is there something specific that I should look for in the lnms device:poll -vvv output?

I added print_r($agent_data) just before the !empty test in includes/polling/applications/ceph.inc.php.

The only output from that print_r is:
Array
(
)

I then tried adding 'global $agent_data;' in place of the print_r, and now the data is found and the graphs are updated, but that is most likely not a very good solution to the issue.

But the issue seems to be that the $agent_data variable is not available to the ceph application module.
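
For reference, this is how PHP variable scope can produce exactly these symptoms. It is a generic illustration, not the actual LibreNMS code path: the function names are made up, and I am only assuming the module ends up running in a scope where $agent_data is a separate, empty variable.

```php
<?php
// Sketch only: generic PHP scoping demo, not LibreNMS code.
$agent_data = ['app' => ['ceph' => 'payload']]; // populated at global scope

function pollWithLocalCopy(): bool
{
    $agent_data = []; // a local variable; the global one is not visible here
    return ! empty($agent_data['app']['ceph']);
}

function pollWithGlobal(): bool
{
    global $agent_data; // the workaround tried in this thread
    return ! empty($agent_data['app']['ceph']);
}

function pollWithArgument(array $agent_data): bool
{
    // Passing the data in explicitly avoids relying on global state.
    return ! empty($agent_data['app']['ceph']);
}

var_dump(pollWithLocalCopy());            // bool(false)
var_dump(pollWithGlobal());               // bool(true)
var_dump(pollWithArgument($agent_data));  // bool(true)
```

Code pulled in with include inside a function sees that function's variables, not the global ones, which would explain why print_r shows an empty array while unix-agent clearly collected data.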


After applying 7a8e479, both proxmox and ceph update their graphs with data collected from the unix-agent again.

Thanks for the quick solution to the issue.
