Strange behaviour after December update

Hello, I’ve got a rather large setup with 100+ devices (mostly switches) being monitored. Everything worked fine until last Friday. Since then, most of my devices are going unpolled, and the behaviour seems erratic: validate.php shows everything is OK, then I run it again a bit later and it says almost 50 devices are unpolled. I’m attaching a screenshot of one random device. Could anyone please give me a clue? Could this be due to a LibreNMS update? Thanks in advance.

I would start with this -> https://docs.librenms.org/#Support/Performance/

Also, check the poller log in the Web UI -> Settings Icon -> Pollers -> Poller History. See if you have any devices taking a long time to complete polling.

Hi Kevin, I started using LibreNMS about a year ago. I followed the instructions you suggest when I initially set up the system, and everything has worked fine since then. This problem arose last Friday, which is why I associated it with an update. Additionally, the poller history shows the same polling duration as before (about 35s for the slowest device).

Are you monitoring the server that the poller is running on? If you have any data on the actual server, that may indicate something.

-J

You really should have provided the info we ask for when creating a new thread, most specifically the output of validate.

Hi laf, sorry for that. This is the validate output at a given time (it varies from everything ok to 50+ devs unpolled):

[email protected]:~$ sudo ./validate.php
[sudo] password for librenms:

Component Version
LibreNMS 1.33-202-g782dee6
DB Schema 221
PHP 7.0.22-0ubuntu0.16.04.1
MySQL 10.0.31-MariaDB-0ubuntu0.16.04.2
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

====================================

[OK] Database connection successful
[OK] Database schema correct
[WARN] Some devices have not been polled in the last 5 minutes. You may have performance issues.
[FIX] Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
192.168.76.167
192.168.76.178
192.168.76.146
192.168.76.198
[email protected]:~$

LibreNMS is running on a VM on ESXi 6.5. Nothing changed in its behaviour either.

What was the version you were on before?

Look for "Updated from ..... to 782dee6" in logs/daily.log
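If it helps, here is a rough sketch of how to pull every update out of that log in one go. The function name `list_updates` is made up for illustration, and it assumes the log lines look like the "Updated from X to Y" lines quoted in this thread. The timestamp it prints is simply the nearest bracketed timestamp seen before each update line, so treat it as approximate.

```shell
#!/bin/sh
# list_updates (hypothetical helper): print one line per recorded update,
# in the form "<nearest preceding timestamp>: <old commit> -> <new commit>".
# Assumes daily.log contains "[ <date> ] ..." lines and
# "Updated from <old> to <new>" lines, as in the excerpts in this thread.
list_updates() {
    # $1: path to the log, e.g. logs/daily.log in the install directory
    awk '
        # Remember the most recent "[ timestamp ]" prefix we have seen.
        /^\[ / { ts = $0; sub(/^\[ /, "", ts); sub(/ \].*$/, "", ts) }
        # On an update line, fields 3 and 5 are the old and new commits.
        /^Updated from / { print ts ": " $3 " -> " $5 }
    ' "$1"
}
```

Run it as `list_updates logs/daily.log` from the install directory and compare the dates with when the problem started.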

This is the only instance of that version I can find (it’s from today anyway):
Fetching notifications
[ Mon, 04 Dec 2017 00:15:36 -0300 ] http://www.librenms.org/notifications.rss (23)
[ Mon, 04 Dec 2017 00:15:37 -0300 ] misc/notifications.rss (30)
[ Mon, 04 Dec 2017 00:15:37 -0300 ] Updating DB Done
Returned: 0
Caching PeeringDB data
Returned: 0
Checking PHP version
Returned: 0
Updating to latest codebase
Returned: 0
Updated from de1e47aa7052d0114d0aa4c263a5c99c5f238db2 to 782dee60fe77950b2b44fadca4b6556cf4104084
Returned: 0

And this is the update which seems to be coincident with my issue:
Fetching notifications
[ Fri, 01 Dec 2017 00:15:38 -0300 ] http://www.librenms.org/notifications.rss (23)
[ Fri, 01 Dec 2017 00:15:38 -0300 ] misc/notifications.rss (30)
[ Fri, 01 Dec 2017 00:15:38 -0300 ] Updating DB Done
Returned: 0
Caching PeeringDB data
Returned: 0
Checking PHP version
Returned: 0
Updating to latest codebase
Returned: 0
Updated from e3082873f6f42cdcfed9d1b230b02c902c44ec75 to 3dcadcccce26d3bd1d29dfd77952721ae1872b84
Returned: 0

You can go back to the commit you were on before Friday’s update and see whether it helps: git checkout e3082873f6f42cdcfed9d1b230b02c902c44ec75

However, going off the commits in between, nothing should have broken polling like that.
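If you want to look through those commits yourself, something along these lines should list them. This is only a sketch: `commits_between` is a made-up helper name, and it assumes you run it from inside the LibreNMS git checkout.

```shell
#!/bin/sh
# commits_between (hypothetical helper): list the commits that are in the
# newer revision but not in the older one, one per line.
# Assumes the current directory is the LibreNMS git checkout.
commits_between() {
    # $1: older commit, $2: newer commit
    git log --oneline "$1..$2"
}

# Example with the two hashes from the Friday update in daily.log:
# commits_between e3082873f6f42cdcfed9d1b230b02c902c44ec75 \
#                 3dcadcccce26d3bd1d29dfd77952721ae1872b84
```

Each line of output is an abbreviated hash plus the commit subject, which is usually enough to spot anything that touches the poller.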

OK, I’ll try that tomorrow.

This is what ./validate.php says after that:

[email protected]:~$ sudo ./validate.php

====================================

Component Version
LibreNMS 1.33-186-ge308287
DB Schema 221
PHP 7.0.22-0ubuntu0.16.04.1
MySQL 10.0.31-MariaDB-0ubuntu0.16.04.2
RRDTool 1.5.5
SNMP NET-SNMP 5.7.3

====================================

[OK] Database connection successful
[WARN] Your schema (221) is newer than than expected (217). If you just switch to the stable release from the daily release, your database is in between releases and this will be resolved with the next release.
[FAIL] Database: extra table (application_metrics)
[FAIL] Database: extra table (entityState)
[FAIL] We have detected that your database schema may be wrong, please report the following to us on IRC or the community site (link removed):
[FIX] Run the following SQL statements to fix.
SQL Statements:
DROP TABLE application_metrics;
DROP TABLE entityState;
[WARN] Some devices have not been polled in the last 5 minutes. You may have performance issues.
[FIX] Check your poll log and see: http://docs.librenms.org/Support/Performance/
Devices:
192.168.76.132
192.168.76.129
192.168.76.237
192.168.76.174
192.168.76.178
192.168.76.198
[WARN] Your install is over 24 hours out of date, last update: Thu, 30 Nov 2017 22:51:20 +0000
[FIX] Make sure your daily.sh cron is running and run ./daily.sh by hand to see if there are any errors.
[WARN] Your local git branch is not master, this will prevent automatic updates.
[FIX] You can switch back to master with git checkout master

Should I apply the recommended fixes to the DB?

No, leave those for now and just see how you get on.

Well, nothing changed. But I think I’m taking the wrong approach here. Friday’s update occurred at
Fri, 01 Dec 2017 00:15:38 -0300
and all my graphs (like the one I posted above) show the issue started on Friday at approx. 12:00,
so it doesn’t seem to be due to the update as I initially thought.
I’m reverting the changes. Should I use:
git checkout master
or just run ./daily.sh?

Hi laf, I’ve finally solved it. It had nothing to do with the update or with LibreNMS. It was a misconfigured switch (an Allied Telesis with terrible VLAN support and extremely buggy firmware). I sincerely apologize for taking up your time with this. I don’t think this thread can help anyone, so if any mod thinks it should be deleted, please do so. Thanks again for your time and support.

It’s all good. Thanks for at least replying back.

Yes, git checkout master and then you may as well run ./daily.sh after.