Device Storage % warning getting reset

bbach · 3 October 2017 14:38

Hi, I have a device (Juniper SRX220) where I am setting the % warn value for a storage device to 100%. Every night it seems to get reset to 60%. Looking at the event log, it appears that certain attributes of the device are getting removed and then re-added about 6 hours later. I’m assuming this has something to do with the discovery process. That said, I don’t see anything about the storage device in the event log. Any hints as to what might be going on or how to debug it? Thanks. – Bud

bbach · 3 October 2017 14:54

Just another data point.

There is a 6 hour gap in some of the charts for the device too:
Processor, Memory, Disk Usage, Temperature, State, the ports.

Some charts seem fine:
Running Processes, Users Logged in, System Uptime.

Kevin_Krumm · 4 October 2017 02:38

Please run ./validate.php and pastebin the output.

Thank you

bbach · 4 October 2017 13:10

Sorry, meant to include this originally…

====================================

Component	Version
LibreNMS	1.32-4-ga1d7826
DB Schema	209
PHP	5.5.9-1ubuntu4.20
MySQL	5.5.52-MariaDB-1ubuntu0.14.04.1
RRDTool	1.4.7
SNMP	NET-SNMP 5.7.2

====================================

[OK] Database connection successful
[OK] Database schema correct

Kevin_Krumm · 4 October 2017 13:31

Use Pastebin and post the output of the discovery and poller for this device.

bbach · 4 October 2017 15:54

discovery.php: https://pastebin.com/RP1p09zr
poller.php: https://pastebin.com/Yt4t69Sj

Kevin_Krumm · 4 October 2017 16:07

try ./discovery.php -r -f -h HOSTNAME -d

bbach · 4 October 2017 17:29

Here you go Kevin…

Output of ./discovery.php -r -f -h HOSTNAME -d

https://pastebin.com/E0i2aFnP

laf · 4 October 2017 19:01

That last discovery doesn’t show any storage being removed and/or added.

Can you post the screenshot of the eventlog showing the storage changes?

bbach · 4 October 2017 19:32

Events where you can see stuff getting removed and added:

bbach · 4 October 2017 19:33

So I’m wondering if the device is slow to respond and LibreNMS gives up on it. Below, you can see a number of “reboots” but the device has not rebooted (look at the “after days”):

laf · 4 October 2017 19:33

That’s the processor, not storage. Which one is not working?

bbach · 4 October 2017 19:38

I never see the storage getting deleted in the event log. The symptom is that the %warn gets reset from my user-defined 100 back to the default 60 (currently set to 100):

bbach · 4 October 2017 19:47

Here is the poller graph for that device which is interesting:

[Image: Selection_137]

CPU spikes around those times too. Also, during those spikes, the storage device in question, /junos, shows zero use (even though it is 100% used).

murrant · 5 October 2017 03:25

Most of the time when I see that behavior, the SNMP response is truncated. So it may be an SNMP bug in the device or a bad network connection.

A firmware update may fix this.

If you can verify this is not a network connectivity issue, you may need to open a ticket with Juniper support.
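
One rough way to check for truncated responses, sketched in Python under a few assumptions: save the output of several `snmpwalk -On` runs against the storage table (hrStorageTable in HOST-RESOURCES-MIB, 1.3.6.1.2.1.25.2.3) and compare how many rows each column returned. In a complete walk every column should have the same number of rows. The function names and the sample text below are made up for illustration. They are not LibreNMS code and not output from this device.

```python
from collections import Counter

# hrStorageEntry in HOST-RESOURCES-MIB; OIDs look like <prefix><column>.<index>
HR_STORAGE_ENTRY = ".1.3.6.1.2.1.25.2.3.1."


def rows_per_column(walk_text):
    """Count rows per hrStorage column in numeric (`snmpwalk -On`) output."""
    counts = Counter()
    for line in walk_text.splitlines():
        oid = line.split(" = ", 1)[0].strip()
        if not oid.startswith(HR_STORAGE_ENTRY):
            continue  # not part of the storage table
        column = oid[len(HR_STORAGE_ENTRY):].split(".", 1)[0]
        if column.isdigit():
            counts[int(column)] += 1
    return dict(counts)


def looks_truncated(walk_text):
    """True if the walk is empty or its columns have differing row counts."""
    counts = rows_per_column(walk_text)
    return not counts or len(set(counts.values())) > 1


# Hypothetical sample data (values invented): two storage entries,
# columns 3 (descr), 5 (size) and 6 (used).
SAMPLE_FULL = """\
.1.3.6.1.2.1.25.2.3.1.3.1 = STRING: /dev/ad0s1a
.1.3.6.1.2.1.25.2.3.1.3.2 = STRING: /junos
.1.3.6.1.2.1.25.2.3.1.5.1 = INTEGER: 1000
.1.3.6.1.2.1.25.2.3.1.5.2 = INTEGER: 2000
.1.3.6.1.2.1.25.2.3.1.6.1 = INTEGER: 400
.1.3.6.1.2.1.25.2.3.1.6.2 = INTEGER: 2000
"""

# Same walk with the last row missing, as if the response was cut short.
SAMPLE_CUT = "\n".join(SAMPLE_FULL.splitlines()[:-1])
```

Capturing the walk a few times around the nightly discovery window and passing each capture to `looks_truncated` would show whether the table comes back short at those times. A walk that is cut off exactly at a column boundary would not be caught by this check, so comparing row counts between runs is still worthwhile.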

bbach · 5 October 2017 14:58

OK, Thanks Tony. – Bud

bbach · 5 October 2017 19:12

As an FYI, I found out I could restart the snmp process on junos with “restart snmp”. Going to see if that clears up the issue at all.

bbach · 6 October 2017 14:39

The ‘restart snmp’ seems to have helped this device. CPU never went over 75% on the last discovery and previously it was running at 100%. No weird gaps on any charts last night and nothing in the eventlog so it seems much better. We’ll see over the next few days… Thanks for your help! – Bud

bbach · 10 October 2017 14:44

One more update. The ‘restart snmp’ did not actually help. Still seems like there is an issue with snmp responses from this device. Probably the SRX…