Single device randomly showing as offline despite being online

Since I switched my homelab from ESXi to Proxmox, I've had a strange issue. LibreNMS (lnms) periodically claims the Proxmox host is down for hours, while happily graphing all the VMs running on this supposedly offline machine.

The lnms event log says "Device status changed to Down from snmp check.", then a few hours later it says Up from snmp check, and so on.

Whenever lnms sees it as offline, I can log on to the lnms machine and successfully ping the Proxmox host, and fping returns "… is alive". Both snmpbulkwalk and telnet 6556 correctly fetch all values. The Graylog plugin keeps feeding the syslog messages from the Proxmox host to lnms, so they keep popping up, but the machine name is red and there are huge gaps in the graphs. All the machines finish polling in time; the Proxmox host takes 10-15 s to poll on average.

Any idea what could be going on? I already tried the Windows solution of restarting everything, from the router to each switch and machine, to no avail.

I've enabled 1-minute ping, but even though both ping and fping can reach the machine, lnms won't pick it up. The only difference from the ESXi host I can think of is that the Proxmox host only has an IPv4 address. I set it up that way on purpose: I don't understand IPv6 addresses at all, and since my ISP doesn't provide IPv6 connectivity, I have no use for them.

./validate.php output:

Component    Version
LibreNMS     1.54-31-gd64c884b4
DB Schema    2019_07_09_150217_update_users_widgets_settings (140)
PHP          7.2.9-1+b2
MySQL        10.3.15-MariaDB-1
RRDTool      1.7.1
SNMP         NET-SNMP 5.7.3

====================================

[OK] Composer Version: 1.9.0
[OK] Dependencies up-to-date.
[OK] Database connection successful
[OK] Database schema correct

So, it gets weirder. When I woke up today, I noticed the Proxmox host was "offline" in lnms again; minutes later it came back "online" from the snmp check. Looking at the graphs, the gaps in the Proxmox host's graphs and the "up/down from snmp check" messages in the event log line up almost perfectly with the gaps in the graphs of my main/gaming computer in the bedroom (which I turn off when I'm at work or asleep).

These are two separate physical machines; the gaming PC runs Windows 10 and has absolutely nothing to do with any part of the lnms install. It's as if lnms (which runs on yet another physical machine) only polls the Proxmox host when I open the lnms web page from my gaming computer, which makes no sense at all.

I'm at a complete loss here. Any ideas, anybody?

Edit: I just confirmed this by switching my main computer off a few times. Within 10 minutes of switching it off, lnms reports the Proxmox host as down from snmp check as well. What the hell is going on? The two machines have literally nothing to do with one another.

You sure about that? I would be doing some packet captures, etc.

"Figured" out what it was: while redoing the homelab, I was also redoing the cabling and swapped a few cables and switches around. So I swapped the switches back to where they were, and for the last 9 hours everything has worked as it should.

I've no idea why, or whether that even had anything to do with it. The switches are unmanaged, and my understanding is that they simply forward packets based on source IP or something to that effect (I'm no IT expert, just an enthusiast). So I thought swapping cables around wouldn't cause any issues; turns out I was wrong yet again.
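
For reference, unmanaged switches generally forward by MAC address rather than IP: they learn which port each source MAC was last seen on, send frames for a known destination MAC only out of that port, and flood frames for unknown or broadcast destinations to every other port. Below is a minimal Python sketch of that learning behaviour. It is a simplified model under stated assumptions (the `LearningSwitch` class and its method are hypothetical, and there is no entry ageing, VLAN handling or loop protection), not how any particular switch is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningSwitch:
    """Hypothetical toy model of an unmanaged (learning) Ethernet switch."""

    ports: list[int]
    # MAC address -> port it was last seen on (real switches also age these out)
    mac_table: dict[str, int] = field(default_factory=dict)

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the ports a frame arriving on in_port is sent out of."""
        # Learn: the sender is reachable via the port the frame came in on.
        self.mac_table[src_mac] = in_port
        if dst_mac == BROADCAST or dst_mac not in self.mac_table:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        out_port = self.mac_table[dst_mac]
        if out_port == in_port:
            # Destination is on the same segment it came from: filter (drop).
            return []
        return [out_port]


sw = LearningSwitch(ports=[1, 2, 3])
print(sw.receive(1, "aa:00", "bb:00"))  # bb:00 not learned yet, flood
print(sw.receive(2, "bb:00", "aa:00"))  # aa:00 was learned on port 1
print(sw.receive(1, "aa:00", "bb:00"))  # bb:00 now known on port 2
```

Because the table is learned from traffic, moving cables or chaining switches differently changes which ports a host's frames travel through, so a cabling change can affect reachability even on unmanaged gear.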

Unless it was something else, in which case I'm just happy it works now. I hate issues like this; at times it feels like I'm not dealing with strictly organised systems but with some technological voodoo :smiley: