My experience with a new setup, for the next guy

Harmon20 · 8 August 2019 14:02

I’ve had a heckuva time getting LibreNMS up and working. They’re probably stupid n00b problems, but I thought I’d post my experience - and solutions - here for the next n00b who could use the help. I know this is a stupidly long post, but I’m putting in all the boring details so googlers can find this if they’re looking for the exact same problems.

I started at www.librenms.org and clicked the Downloads link in the header. I went for a VM and clicked the OVA Images link, which took me to github.com/librenms/packer-builds/releases/tag/1.54, where I downloaded the librenms-centos-7.6-x86_64.ova file.

PROBLEM 1 - no import of .ova to VMware vSphere

In theory this is an open format that I should be able to import into my vSphere environment (vSphere Standard 6.5 with vCenter Standard). I tried to import the .ova file from the vCenter HTML5 GUI using “Deploy OVF Template”. Validation failed with the error:

The provided manifest file is invalid: Invalid OVF checksum algorithm: SHA1.

I tried to load the .ova file into VirtualBox (went fine) and export a new .ova from VirtualBox (went fine). However, when importing into vCenter I got the same error.
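For anyone hitting the same SHA1 error: an .ova is a plain tar archive, so its manifest (.mf) can be inspected without any VMware tools to see which checksum algorithm it declares. This is a minimal diagnostic sketch, assuming GNU tar or bsdtar is installed; the function name ova_manifest_algo is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the checksum algorithm(s) declared in an OVA's manifest.
# An OVA is a plain tar archive, so tar can list and read its members directly.
ova_manifest_algo() {
  local ova="$1" mf
  # Find the first manifest (.mf) member inside the archive.
  mf=$(tar -tf "$ova" | grep '\.mf$' | head -n 1)
  if [ -z "$mf" ]; then
    echo "no manifest found in $ova" >&2
    return 1
  fi
  # Manifest lines look like "SHA1(name.ovf)= <hex digest>";
  # keep only the algorithm name in front of the opening parenthesis.
  tar -xOf "$ova" "$mf" | sed -n 's/^\([A-Z0-9]*\)(.*$/\1/p' | sort -u
}
```

Usage: `ova_manifest_algo librenms-centos-7.6-x86_64.ova`. Given the vCenter message, SHA1 is what one would expect to see for this image.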

PROBLEM 2 - no import of .ova to VMware Workstation

I tried whatever I could get out of Google, but what ended up working was an attempt I really didn’t expect to work, so obstinate was this problem. I tried to import the original .ova into VMware Workstation Player and immediately got an error that, as far as meaning goes, appeared to be the same one I got in vCenter. This one said:

The import failed because [.ova filename] did not pass OVF specification conformance or virtual hardware compliance checks.

SOLUTION 2 - RTFM

The popup with the above error also had a paragraph right below it instructing me to click the Retry button to ignore the error. Clicking the Retry button caused the .ova to go ahead and import into Workstation.

I didn’t read that section of text in the error popup the first couple of times it appeared. I simply dismissed it: I’d seen the error before and thought I knew what it was about, so I didn’t read past that point, and so I stayed unable to import into Workstation. All I had to do was read the whole thing instead of dismissing the popup after the first few words, as if I knew what the rest of it said. I hate it when Users do that.

User: It doesn’t work.
Me: What does “doesn’t work” mean?
User: I try to run it but I just get an error.
Me: What does the error say?
User: I don’t know. It’s just an error.

Yeah, I can be a User, too. If it matters, I felt appropriately (I think) abashed after I figured out all my problems would go away if I would just read the thing.

SOLUTION 1 - vCenter Converter Standalone

After closing VMware Workstation, I was able to use VMware vCenter Converter Standalone to push the VM to my vSphere datacenter. I set the source type to “VMware Workstation or other VMware virtual machine” and pointed it at the Workstation .vmx file, then set the destination type to “VMware Infrastructure virtual machine” and pointed it at the vCenter instance. The rest of the settings are as you would expect for vSphere. I A/B tested changing the hardware config during this conversion versus leaving it as-is. It made no difference, so I went ahead and upped the specs from 1 vCPU to 4 and from 512MB RAM to 8GB, and set the disk type to Thin provision.

PROBLEM 3 - vboxguest error at login screen

When starting up the VM in vSphere everything seems fine but there are errors on the login screen that read

[timestamp] vboxguest: loading out-of-tree module taints kernel.
[timestamp] vboxguest: module verification failed: signature and/or required key missing - tainting kernel
[timestamp] vboxguest: PCI device not found, probably running on physical hardware.

SOLUTION 3 - [unresolved]

These errors are the same whether I leave the hardware from the original .ova as-is or modify it in the conversion. They don’t seem to cause a problem, so I haven’t dealt with them yet. I’ll update this post if and when I figure it out.
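The messages come from vboxguest, the VirtualBox guest module, which presumably shipped with the image. A quick way to confirm the situation is to check which hypervisor the VM reports (on CentOS 7, `systemd-detect-virt` should print `vmware` under vSphere) and which vbox* modules are actually loaded. A minimal sketch; the helper name is hypothetical, and the file argument exists only so it can be pointed at a sample instead of /proc/modules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list loaded VirtualBox guest modules (vboxguest, vboxsf, ...).
# /proc/modules has one loaded module per line; the first field is the module name.
vbox_modules() {
  local modules_file="${1:-/proc/modules}"
  awk '$1 ~ /^vbox/ { print $1 }' "$modules_file"
}
```

Usage: run `systemd-detect-virt` and then `vbox_modules`. If vbox modules show up on a VMware host, they are leftovers from the VirtualBox build; whether removing them is safe on this appliance has not been tested here.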

I log in to the VM, run a yum update, and install NetworkManager, NetworkManager-tui, nano, and NTP. (Because I like them, that’s why. Don’t judge.) Then I set the hostname, IP info, and NTP info.
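The package and hostname steps, sketched as shell. Assumptions: the account has sudo rights, and libre.example.com is a placeholder hostname, not a real one. Setting DRY_RUN=1 prints the commands instead of running them, which is handy for a preview.

```shell
#!/usr/bin/env bash
# Sketch of the post-install steps. Set DRY_RUN=1 to print the commands
# instead of running them with sudo.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

post_install() {
  local new_hostname="$1"   # placeholder value supplied by the caller
  run yum -y update
  run yum -y install NetworkManager NetworkManager-tui nano ntp
  run hostnamectl set-hostname "$new_hostname"
  # IP settings can then be done interactively with nmtui (from NetworkManager-tui),
  # and NTP servers by editing /etc/ntp.conf.
}
```

Usage: `DRY_RUN=1 post_install libre.example.com` to preview, then drop DRY_RUN to apply.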

PROBLEM 4 - NTP won’t run at startup

After installing this stuff I reboot, just because Windows is in my head and that’s what I do when too many things change at once. When it comes back up, ntpd isn’t running. I get a status of:

ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Active: inactive (dead)

I’m trying to get NTP going at this point because doing so seems to prevent some time errors that have popped up in other installation attempts.

SOLUTION 4 - systemctl disable chronyd

That’s it. That fixed it.
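Why this appears to work: CentOS 7 ships with chronyd enabled, and its unit file seems to list ntpd.service under Conflicts=, so with both enabled only one of them survives boot. `systemctl show -p Conflicts chronyd` shows whether that is the case on a given box. A sketch of the switch, assuming sudo rights; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: switch from chronyd (CentOS 7 default) to ntpd so ntpd survives a reboot.
# Set DRY_RUN=1 to print the commands instead of running them with sudo.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

use_ntpd_instead_of_chronyd() {
  # Stop chronyd now and keep it from starting at boot.
  run systemctl disable --now chronyd
  # Start ntpd now and at every boot.
  run systemctl enable --now ntpd
}
```

Afterwards, `systemctl is-enabled chronyd` should report disabled and `systemctl is-active ntpd` should report active, including after a reboot.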

So I move on to LibreNMS. I run validate.php from the CLI and get some errors to be sorted out.

PROBLEM 5 - Swedish?

Validation gives me the error

[FAIL] MySQL Database collation is wrong: latin1 latin1_swedish_ci
[FIX]:
Check https://t.libren.ms/-zdwk for info on how to fix.

SOLUTION 5 - MySQL command

Happily that link does provide the fix. It is the command

echo 'ALTER DATABASE librenms CHARACTER SET utf8 COLLATE utf8_unicode_ci;' | mysql -p -u librenms librenms

After running it, this error disappears from validation. The command appears to be made for copy-pasting into a terminal window so you don’t have to mess with MySQL directly (-p makes it prompt for the librenms database user’s password), so that’s what I did. Works as intended.
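To double-check the change before re-running validation, MySQL/MariaDB can report a database's default character set and collation through information_schema. Note that ALTER DATABASE only changes the database default; tables that already exist keep their own collation. A minimal sketch; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query showing a schema's default charset and collation.
# The database name is inserted as-is, so only pass trusted names.
collation_check_sql() {
  local db="${1:-librenms}"
  printf "SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = '%s';\n" "$db"
}
```

Usage: `collation_check_sql | mysql -p -u librenms librenms`, piped the same way as the fix. After the ALTER, the row should show utf8 and utf8_unicode_ci.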

PROBLEM 6 - Discovery not run

The next error in validation is

[FAIL] Discovery has not completed in the last 24 hours.
[FIX]:
Check the cron job to make sure it is running and using discovery-wrapper.py

Even if I wait around this won’t resolve. I assumed it is because I’ve not added any device yet, but when I go to the web GUI I see that the localhost is added as a device. So…?

SOLUTION 6 - Run Discovery

So I go to the machine edit screen for the localhost in the device list, set my hostname and location, then use the “Rediscover Device” button. I let the page reload, and just for good measure reboot. (Windows has warped my fragile little mind.) The Discovery error is gone, but I get a new error, which I recall seeing before but can’t recall the details of the circumstances.
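Since validation points at the cron job, it is worth confirming the cron entries actually exist. On a standard install they live in /etc/cron.d/librenms (copied from librenms.nonroot.cron per the install docs); that this appliance uses the same path is an assumption. The hypothetical helper below reports whether uncommented lines mention discovery-wrapper.py and poller-wrapper.py. Discovery for a single device can also be run by hand from /opt/librenms with `./discovery.php -h <hostname>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a cron file has active (uncommented) entries
# for the LibreNMS discovery and poller wrappers. Returns 1 if any are missing.
check_librenms_cron() {
  local cron_file="${1:-/etc/cron.d/librenms}" script status=0
  for script in discovery-wrapper.py poller-wrapper.py; do
    # Drop comment lines, then look for the script name as a fixed string.
    if grep -v '^[[:space:]]*#' "$cron_file" | grep -F "$script" >/dev/null; then
      echo "ok: $script"
    else
      echo "missing: $script"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_librenms_cron` (or pass another cron file path).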

PROBLEM 7 - Poller not running

The error now is

[FAIL] The poller (localhost.localdomain) has not completed within the last 5 minutes, check the cron job.

I go to Settings (gear in upper right of page) > Pollers > Pollers and see that I have a listing for both my custom hostname and one for “localhost.localdomain”. Odd, since they are the same machine. Why didn’t my edit of the device details simply change the existing poller instead of creating a new instance?

SOLUTION 7 - Delete

Since they are really the same VM, and the one with the custom hostname is the most accurate and up to date, I delete the generic one with the little trash can icon beside it. The first time I looked at this list was a few minutes after a reboot, and at that point deleting anything wasn’t an option. After a few more minutes I came back to this page and the trash can icon had appeared. Waiting for a scheduled event to come around, I guess?
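A possible answer to the “why”: LibreNMS appears to name each poller after the machine’s own hostname rather than after the device entry, so the localhost.localdomain poller was presumably left over from before the hostname was set. A quick sanity check is to compare the poller named in the validation failure with the current hostname. A hypothetical sketch:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the poller name out of a validate.php failure line
# on stdin, e.g. "[FAIL] The poller (localhost.localdomain) has not completed ...".
poller_from_fail() {
  sed -n 's/.*The poller (\([^)]*\)).*/\1/p'
}

# Compare that poller name with this machine's hostname (or a name passed as $2).
check_stale_poller() {
  local line="$1" name current
  name=$(printf '%s\n' "$line" | poller_from_fail)
  current="${2:-$(hostname)}"
  if [ -z "$name" ]; then
    echo "no poller name found"
    return 2
  elif [ "$name" = "$current" ]; then
    echo "poller $name is this host; check the cron job"
  else
    echo "poller $name is not the current hostname ($current); possibly stale"
  fi
}
```

Usage: `check_stale_poller "<the [FAIL] line from validate.php>"`.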

I run validation again on the CLI and it is now all clear. At this point I ran into the problem that caused me to delete my first somewhat successful full install of LibreNMS: the creation of device groups.

PROBLEM 8 - Can’t create device groups.

I tried every solution out there and what finally worked was posted by Rodeorat316 in this thread a few hours before my writing this. Whenever I tried to click the “New Device Group” button I would land on an error page that says

Whoops, looks like something went wrong. Check your librenms.log.

Check your log for more details. (librenms.log)

If you need additional help, you can find how to get help at https://docs.librenms.org/Support.

When I check librenms.log with tail, I see a 77-line stack trace that kicked off with the error:

[timestamp] production.ERROR: file_put_contents(/opt/librenms/cache/devices_relationships.cache): failed to open stream: Permission denied {"userId":1,"exception":"[object] (ErrorException(code: 0): file_put_contents(/opt/librenms/cache/devices_relationships.cache): failed to open stream: Permission denied at /opt/librenms/LibreNMS/DB/Schema.php:173)

What the heck? This is a custom VM built by someone who presumably knew what they were doing, and it should have all the users, groups, and directory permissions sorted out already. I check for the file /opt/librenms/cache/devices_relationships.cache only to find it doesn’t exist.
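With a 77-line stack trace, it helps to pull out just the file paths that hit “Permission denied”. A minimal sketch, assuming the log lives at /opt/librenms/logs/librenms.log and that errors follow the file_put_contents(...) format shown above; the helper name is hypothetical. When a directory looks writable by the right user, as the cache directory does here, but writes still fail, SELinux is a likely suspect, and `sudo ausearch -m avc -ts recent` (from the audit package) lists recent denials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct files that failed with "Permission denied"
# in a LibreNMS log, based on the file_put_contents(...) error format.
denied_paths() {
  local log="${1:-/opt/librenms/logs/librenms.log}"
  grep 'Permission denied' "$log" \
    | grep -o 'file_put_contents([^)]*)' \
    | sed 's/^file_put_contents(\(.*\))$/\1/' \
    | sort -u
}
```

Usage: `denied_paths` with no argument reads the default log path.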

SOLUTION 8 - tame SELinux

I won’t bore you, and embarrass myself any further, with the details of everything I tried to get around this roadblock to usability. Below is the fix. The lines after the [librenms@libre cache]$ prompt are the commands I issued; the touch, chcon, and setenforce lines are the ones that actually change anything, and the ls output is there for context:

[librenms@libre cache]$ ls -lah
total 248K
drwxr-xr-x. 2 librenms librenms 45 Aug 5 15:15 .
drwxr-xr-x. 32 librenms librenms 4.0K Aug 7 22:21 ..
-rw-rw-r--. 1 librenms librenms 71 Aug 5 15:12 .gitignore
-rw-r--r--. 1 librenms librenms 239K Aug 7 21:35 os_defs.cache
[librenms@libre cache]$ touch devices_relationships.cache
[librenms@libre cache]$ ls -Z
-rw-rw-r--. librenms librenms unconfined_u:object_r:usr_t:s0 devices_relationships.cache
-rw-r--r--. librenms librenms system_u:object_r:usr_t:s0 os_defs.cache
[librenms@libre cache]$ chcon -t httpd_sys_rw_content_t devices_relationships.cache
[librenms@libre cache]$ sudo setenforce 1
[librenms@libre cache]$

And just like that, the New Device Group page works.
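One caveat: a label set with chcon does not survive a filesystem relabel or a restorecon run. The usual way to make it stick is to record a rule with semanage fcontext (from the policycoreutils-python package on CentOS 7) and apply it with restorecon. A sketch under that assumption, covering the whole cache directory rather than the single file; it has not been tested on this appliance. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: make the httpd_sys_rw_content_t label on the LibreNMS cache directory
# persistent. Set DRY_RUN=1 to print the commands instead of running them with sudo.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

persist_cache_label() {
  local dir="${1:-/opt/librenms/cache}"
  # Record a file-context rule for the directory and everything under it.
  run semanage fcontext -a -t httpd_sys_rw_content_t "${dir}(/.*)?"
  # Apply the recorded rule to the files that already exist.
  run restorecon -RFv "$dir"
}
```

Afterwards, `ls -Z /opt/librenms/cache` should show httpd_sys_rw_content_t on the files.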

PROBLEM 9 - The next morning.

I wrote this out at the end of a work day. At the time of writing a validation on CLI came back clean. I came in the next morning to find validation failing with

[FAIL] Some folders have incorrect file permissions, this may cause issues.
[FIX]:
sudo chown -R librenms:librenms /opt/librenms
sudo setfacl -d -m g::rwx /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
sudo chmod -R ug=rwX /opt/librenms/rrd /opt/librenms/logs /opt/librenms/bootstrap/cache/ /opt/librenms/storage/
Files:
/opt/librenms/bootstrap/cache/packages.php

SOLUTION 9 - follow instructions

I ran the commands as instructed and validation came back clean again.
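To spot this kind of drift before validation does, one option is to look for anything under /opt/librenms not owned by librenms:librenms. A hypothetical helper; it only covers ownership, not the group-write part of the validate.php check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list paths under a directory whose owner or group differs
# from the expected user and group (defaults match the chown fix above).
wrong_owner() {
  local dir="${1:-/opt/librenms}" user="${2:-librenms}" group="${3:-librenms}"
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

Usage: `sudo bash -c "$(declare -f wrong_owner); wrong_owner"` or simply run it as a user that can read the whole tree; empty output means ownership is consistent.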

RANDOM OTHER PROBLEMS

I had a smattering of other issues that seem to have been resolved by doing things in the above order. One of them was time issues.

Time between this server and the mysql database is off

and

You have a different system timezone (CDT) than the php configured timezone (UTC)

Installing and configuring NTP at the point I did in this walk-through apparently prevents these errors from popping up. They appeared during one of my many attempts at using provided appliances or doing ground-up builds.
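If the timezone warning does show up, the fix the LibreNMS docs point to is setting date.timezone in php.ini to match the system timezone (`timedatectl` shows the latter on CentOS 7). A hypothetical helper to read the php.ini value; /etc/php.ini is assumed as the path, which may differ depending on where PHP came from. `php -r 'echo date_default_timezone_get(), PHP_EOL;'` shows the value PHP actually uses on the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the date.timezone value set in a php.ini file.
# Commented-out lines (starting with ';') are ignored; empty output means unset.
# If the setting appears more than once, the last one wins, as in PHP itself.
php_ini_timezone() {
  local ini="${1:-/etc/php.ini}"
  sed -n 's/^[[:space:]]*date\.timezone[[:space:]]*=[[:space:]]*"\{0,1\}\([^"; ]*\).*$/\1/p' "$ini" | tail -n 1
}
```

Usage: compare `php_ini_timezone` with the “Time zone” line from `timedatectl`.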

Another popup concerned composer. It occurred when using a provided VM, and I never did find a solution to it. Uninstall, re-install, daily.sh, … nothing worked. It never appeared in the above walk-through. The message as it appeared in the web GUI was:

Fail: No composer available, please install composer
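For anyone chasing the same composer message, a first check is whether any composer is reachable at all, either as a composer.phar in the LibreNMS directory or on the PATH. Treating /opt/librenms/composer.phar as a place the GUI check looks is an assumption, and the helper is hypothetical; what the web GUI check actually tests was never confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where composer can be found, checking a local
# composer.phar in the install directory first, then the PATH.
find_composer() {
  local dir="${1:-/opt/librenms}"
  if [ -f "$dir/composer.phar" ]; then
    echo "$dir/composer.phar"
  elif command -v composer >/dev/null 2>&1; then
    command -v composer
  else
    echo "no composer found" >&2
    return 1
  fi
}
```

Usage: `find_composer`; running it as the same user the web server runs PHP as gives the closest picture of what the GUI sees.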

That’s all I have for now. I’ll come back and update this thread as I find more issues.

Harmon

Heath_Barnhart · 8 August 2019 15:43

I haven’t had much luck with the VM appliance either. I just set up a generic CentOS VM and then follow the instructions for a full build. I’ve gone through that setup three times now without issues, and it takes about an hour.

https://docs.librenms.org/Installation/Installation-CentOS-7-Apache/