Is anyone able to help, please? I have a three-poller cluster monitoring around 750 hosts. I have some gaps on Juniper hosts, which appear to be an issue within their SNMP stack. However, every day I get around an hour's worth of gaps between 12am and 1am, and I cannot find what is causing it. My cron jobs are the defaults and, when run manually, do not appear to take long.
Has anyone seen this, please?
Are you using the Dispatcher Service? I have been seeing similar behavior since switching from the cron jobs to the service. I am in GMT-0400 and have gaps every evening from 20:00 - 20:20.
Anyone else running the service care to comment on whether you’ve seen this behavior?
I actually think this is correct. Looking at the information from the application, it showed 24 million rows in my event log, and daily.sh was locking while running. I manually dealt with the 24 million rows, so I will see if the gaps disappear tonight. Despite the three polling machines and off-board SQL, CPU usage on all of them is running terribly high, and I do wish Juniper would sort their SNMP engine out so it polled within the polling window.
In the daily logs I see it was dealing with around 1M rows, so what I've tried is to purge a bit more: I set syslog messages to be cleaned each day and re-ran ./daily.sh:
Syslog cleared for entries over 1 days (about 6158000 rows)
And no gaps.
It seems to be something else, but failing to purge unused data does lock the poller.
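For anyone else hitting this, the retention behind that cleanup is driven by settings in config.php; a minimal sketch (the values here are illustrative for my install, not recommendations, so adjust them to your own retention needs):

```php
// config.php -- log retention in days (values are illustrative)
$config['syslog_purge']   = 1;   // purge syslog entries older than 1 day
$config['eventlog_purge'] = 30;  // purge eventlog entries older than 30 days
```

The next daily.sh run picks these up during the "Cleaning up DB" step.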
Updating to latest codebase
fatal: unable to access 'https://github.com/librenms/librenms.git/': Operation timed out after 300018 milliseconds with 0 out of 0 bytes received
Returned: 1
Updating Composer packages
[Composer\Downloader\TransportException]
The "https://getcomposer.org/versions" file could not be downloaded: SSL: Handshake timed out
Failed to enable crypto
failed to open stream: operation failed
self-update [-r|--rollback] [--clean-backups] [--no-progress] [--update-keys] [--stable] [--preview] [--snapshot] [--set-channel-only] [--] [<version>]
> LibreNMS\ComposerHelper::preInstall
Loading composer repositories with package information
Installing dependencies from lock file
Nothing to install or update
Generating optimized autoload files
> Illuminate\Foundation\ComposerScripts::postAutoloadDump
> @php artisan package:discover --ansi
Discovered Package: darkghosthunter/larapoke
Discovered Package: fideloper/proxy
Discovered Package: laravel/laravel
Discovered Package: laravel/tinker
Discovered Package: martinlindhe/laravel-vue-i18n-generator
Discovered Package: nesbot/carbon
Discovered Package: oriceon/toastr-5-laravel
Discovered Package: spatie/laravel-cors
Discovered Package: tightenco/ziggy
Package manifest generated successfully.
> LibreNMS\ComposerHelper::postInstall
> Illuminate\Foundation\ComposerScripts::postInstall
> @php artisan vue-i18n:generate --multi-locales --format=umd
Returned: 0
Updating SQL-Schema
Nothing to migrate.
Returned: 0
Updating submodules
Returned: 0
Cleaning up DB
Refreshing alert rules queries
Clearing OS cache
Refreshing device group table relationships
Syslog cleared for entries over 30 days (about 0 rows)
Authlog cleared for entries over 30 days
Perf times cleared for entries over 30 days
Device perf cleared for entries over 7 days
Deleting alert_logs more than 365 days that are not active
Deleting history of active alert_logs more than 365 days
All deleted ports now purged
Returned: 0
Fetching notifications
[ Thu, 05 Mar 2020 01:06:24 +0100 ] http://www.librenms.org/notifications.rss (1)
[ Thu, 05 Mar 2020 01:07:24 +0100 ] misc/notifications.rss (13)
[ Thu, 05 Mar 2020 01:07:24 +0100 ] Updating DB Done
Returned: 0
Caching PeeringDB data
Peering DB integration disabled
Returned: 0
Could this issue be related to this error while accessing GitHub? It looks like a proxy variable issue, as daily.sh works nicely when I run it manually:
librenms@no1-librenms-1:~/logs$ …/./daily.sh
Updating to latest codebase OK
Updating Composer packages OK
Updated from baee55a to 2c91724 OK
Updating SQL-Schema OK
Updating submodules OK
Cleaning up DB OK
Fetching notifications OK
Caching PeeringDB data OK
I have now set the environment variables in the service definition file, defining the proxy URL. Tomorrow, when the daily upgrade runs again, I will validate the change.
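In case it helps anyone, this is roughly the shape of the change (a sketch only; the drop-in path and proxy URL are placeholders for my environment):

```ini
# /etc/systemd/system/librenms.service.d/proxy.conf (hypothetical drop-in)
[Service]
Environment="http_proxy=http://proxy.example.com:3128"
Environment="https_proxy=http://proxy.example.com:3128"
```

After adding it, run `systemctl daemon-reload` and restart the service so the dispatcher inherits the variables.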
Update:
The problem didn't happen last night with discovery running with only 2 workers.
I am still facing the issue. Today I increased the snmpbulkwalk max-repeaters to speed up the polls; let's see what happens. Overall poller time was reduced by 20%. The problem is that these gaps usually appear in the early hours, so I have to wait until tomorrow.
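For context, the repeater change was a one-line config tweak; a sketch (the value 30 is just what I tried, not a recommendation, and it should be verified against what your devices can handle):

```php
// config.php -- raise GETBULK max-repeaters so each SNMP request returns
// more rows and a full walk needs fewer round trips (value is illustrative)
$config['snmp']['max_repeaters'] = 30;
```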
In the meantime, I have asked the server team to scale up the VM. Today I have an all-in-one 16-CPU box monitoring 410 devices and over 20k ports, so it is very likely that a performance issue is also at play.
Hello,
Confirming this issue with the dispatcher service, without a distributed poller setup.
No gaps with crontab, but a daily ~30-minute gap with the systemd service; the service just stops polling without any errors in the log.
I also found out that LibreNMS waits for all processes to complete, and that causes the gaps:
Mar 18 00:04:31 librenms librenms-service.py[50328]: maintenance(INFO):Restarting service...
...
Mar 18 00:10:51 librenms librenms-service.py[50328]: Discovery_0-4(INFO):Completed discovery run for 18 in 583.28s
Mar 18 00:10:51 librenms librenms-service.py[50328]: MainThread(INFO):Configuring LibreNMS service
It might be, but as you can see in the logs above, there is a strict correlation between the completion of the last discovery process and the restart of librenms-service.
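To illustrate the effect (this is a toy sketch, not the actual librenms-service.py logic): if the maintenance restart joins every worker before reconfiguring, a single slow discovery run holds up all polling for its full duration, which matches the gap in the logs:

```python
import threading
import time

def worker(name, duration, log):
    # Simulate a poller/discovery job of a given length.
    time.sleep(duration)
    log.append(f"{name} done")

def restart_service(workers, log):
    # Hypothetical restart logic: wait for *every* worker to finish
    # before the service reconfigures and resumes polling.
    for t in workers:
        t.join()
    log.append("service restarted")

log = []
workers = [
    threading.Thread(target=worker, args=("poller", 0.1, log)),
    threading.Thread(target=worker, args=("slow-discovery", 0.5, log)),
]
for t in workers:
    t.start()
restart_service(workers, log)

# The restart entry is always last: polling cannot resume until the
# slowest job has completed.
print(log[-1])  # prints "service restarted"
```

A restart that cancels or detaches long-running discovery jobs instead of joining them would avoid the stall, at the cost of re-running the interrupted discovery later.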