Validate Fail cron job

==========================================================

Component Version
LibreNMS 2f5a1742c3a9b8ed515a69f7794751f88cdb5e63
DB Schema 188
PHP 7.0.18
MySQL 5.5.52-MariaDB
RRDTool 1.4.8
SNMP NET-SNMP 5.7.2

==========================================================

[OK] Database connection successful
[OK] Database schema correct
[FAIL] The poller has not run in the last 5 minutes, check the cron job

I keep getting a FAIL on "The poller has not run in the last 5 minutes, check the cron job".

I know the poller is still polling every 5 minutes; I can see it working in top and in the WebUI poller history. I double-checked the cron job and it all looks good and working. Not sure why validate keeps reporting a cron job failure.

Not sure what the issue is.

Are you sure all your devices are finishing in 5 minutes?

Check /poll-log/ in your webui.

Yes they are

Do you have the same timezone in MySQL as in your local install?

Not sure, where would I check that?

Yes, the timezone is correct. The database is getting it from the host's system time.

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'time_zone';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| time_zone     | SYSTEM |
+---------------+--------+
1 row in set (0.01 sec)

run select * from pollers;

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'run select * from pollers' at line 1

Don't type "run", just the query itself.

Sorry, I got it now:
+----+---------------+---------------------+---------+------------+
| id | poller_name   | last_polled         | devices | time_taken |
+----+---------------+---------------------+---------+------------+
|  1 | nms.nbisd.edu | 2017-05-02 11:33:19 |     279 |       1698 |
+----+---------------+---------------------+---------+------------+
1 row in set (0.00 sec)

Run that query straight after validate.php and also run date in the CLI. Post the output of all three.
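To make that comparison concrete, here is a minimal sketch that works out how old a last_polled value is relative to a reference time. It assumes GNU date, assumes both timestamps are in the shell's local timezone, and assumes the validate check is simply comparing last_polled against the current time; poll_age is a made-up helper name, not part of LibreNMS.

```shell
# Seconds elapsed between a last_polled timestamp and a reference time.
# Hypothetical helper: both values are parsed in the shell's local timezone
# using GNU date's -d option.
poll_age() {
    last_polled="$1"                                  # e.g. "2017-05-02 18:41:42"
    now="${2:-$(date '+%Y-%m-%d %H:%M:%S')}"          # defaults to the current time
    echo $(( $(date -d "$now" +%s) - $(date -d "$last_polled" +%s) ))
}

# A poll that finished 4 minutes before the reference time:
poll_age '2017-05-02 18:41:42' '2017-05-02 18:45:42'   # prints 240
```

If the result is well over 300 even though the poller just ran, a clock or timezone mismatch between the database and the host would be one thing to rule out.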

MariaDB [librenms]> select * from pollers;
+----+---------------+---------------------+---------+------------+
| id | poller_name   | last_polled         | devices | time_taken |
+----+---------------+---------------------+---------+------------+
|  1 | nms.nbisd.edu | 2017-05-02 18:41:42 |     279 |        101 |
+----+---------------+---------------------+---------+------------+
1 row in set (0.00 sec)

MariaDB [librenms]> select * from pollers date;
+----+---------------+---------------------+---------+------------+
| id | poller_name   | last_polled         | devices | time_taken |
+----+---------------+---------------------+---------+------------+
|  1 | nms.nbisd.edu | 2017-05-02 18:41:42 |     279 |        101 |
+----+---------------+---------------------+---------+------------+
1 row in set (0.00 sec)

Well, it got worse this morning. The poller randomly stopped polling. Not sure why.

You must have devices that are taking too long to poll. It could be intermittent, but it looks like that's what's happening.

Run this via ssh:

tail -2000 logs/librenms.log| grep -P 'secs$' |grep poller

Do you see any times over 100/200 seconds?
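If the output is too long to scan by eye, something like the following sketch prints only the slow runs. It assumes the log line format ends in "polled in N secs"; slow_polls is a made-up helper name, and the 100-second default is just the lower figure mentioned above.

```shell
# Print only poller log lines whose run time exceeds a threshold in seconds.
# Hypothetical helper: assumes each line ends in "... polled in 5.434 secs",
# so the second-to-last field is the duration.
slow_polls() {
    threshold="${1:-100}"
    grep -E 'secs$' | grep poller |
        awk -v t="$threshold" '{ if ($(NF-1) + 0 > t) print }'
}

# Example usage against the last 2000 log lines:
# tail -2000 logs/librenms.log | slow_polls 100
```

The second field of each matching line is the device ID passed to poller.php, which tells you which device to look at.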


/opt/librenms/poller.php 194 2017-05-03 10:25:38 - 1 devices polled in 5.434 secs
/opt/librenms/poller.php 254 2017-05-03 10:25:38 - 1 devices polled in 6.776 secs
/opt/librenms/poller.php 70 2017-05-03 10:25:38 - 1 devices polled in 5.616 secs
/opt/librenms/poller.php 160 2017-05-03 10:25:38 - 1 devices polled in 5.507 secs
/opt/librenms/poller.php 48 2017-05-03 10:25:38 - 1 devices polled in 3.660 secs
/opt/librenms/poller.php 133 2017-05-03 10:25:38 - 1 devices polled in 3.643 secs
/opt/librenms/poller.php 38 2017-05-03 10:25:38 - 1 devices polled in 5.974 secs
/opt/librenms/poller.php 251 2017-05-03 10:25:38 - 1 devices polled in 2.284 secs
/opt/librenms/poller.php 123 2017-05-03 10:25:38 - 1 devices polled in 3.648 secs
/opt/librenms/poller.php 174 2017-05-03 10:25:39 - 1 devices polled in 4.097 secs
/opt/librenms/poller.php 166 2017-05-03 10:25:39 - 1 devices polled in 6.534 secs
/opt/librenms/poller.php 259 2017-05-03 10:25:39 - 1 devices polled in 5.813 secs
/opt/librenms/poller.php 213 2017-05-03 10:25:39 - 1 devices polled in 6.791 secs
/opt/librenms/poller.php 56 2017-05-03 10:25:39 - 1 devices polled in 4.915 secs

On the other hand, I just removed our WLC. It was massive, with lots of interfaces, and I suspect that was causing the long poller time.

Should everything return to normal after that?

It should, if that was the issue. Which poller module was taking the longest for that device?

That was the issue: our HP MSM 765. It is always overloaded with its CPU pegged (it's going to be replaced this summer). It has 1,350 WLAN interfaces on it, and it looks like that was causing the poller to take so long.

Thank you for your help.

You can enable selective port polling and disable a load of interfaces you don’t care about to improve that - vastly in some cases.