I have successfully deployed LibreNMS to monitor some of the more critical locations at my company. The team really likes what this tool provides and has asked me to take it past the proof-of-concept stage. With that in mind, I am curious what the best way to set this up would be.
Currently I have one server for around 120 network devices, and it is using approximately 33 GB of disk space. The server has 16 GB of memory and 4 virtual CPUs (Intel Xeon Gold 2148 clocked at 2.4 GHz).
With this load, and polling intervals set to 5 minutes, htop shows the CPU consistently between 90 and 100%. I am already ignoring polling/alerting on interfaces that I don’t care about (i.e. end-user switchports). I am also concerned about redundancy: if this server goes down, we lose visibility into everything.
I was thinking of standing up another instance on a VM in another data center, so I could split the polling of the 120 devices between the two data centers. We also want to add around 80 customer edge routers, so it would essentially be 200 devices split between the two instances.
What I’m not sure of is the following:
- Considering the number of devices we are adding, are two instances enough to support them? Even if I put 100 on each, I’m at almost the same processing load as I am now with 118 devices, so I’m not sure it buys me much from a performance perspective. Also, I can quite possibly see us adding 2 more instances in a more secure part of the network, and those instances would report back to the server I have today. I’m thinking around 300-400 routers/switches/load balancers is the highest this could go within the next couple of years.
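To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes polling load scales linearly with device count and that devices are split evenly, which is a simplification (port counts and enabled modules vary a lot per device). The function names are made up for illustration and are not part of LibreNMS.

```python
import math

# Device count on the single server today (from the post above).
BASELINE_DEVICES = 118


def devices_per_instance(total_devices: int, instances: int) -> int:
    """Largest device count any one instance gets with an even split."""
    return math.ceil(total_devices / instances)


def relative_load(total_devices: int, instances: int) -> float:
    """Per-instance load relative to today's server.

    Assumption: load is proportional to the number of devices polled.
    """
    return devices_per_instance(total_devices, instances) / BASELINE_DEVICES


if __name__ == "__main__":
    # Scenarios: 200 devices on 2 instances, then the 400-device ceiling
    # on 2 and on 4 instances.
    for total, n in [(200, 2), (400, 2), (400, 4)]:
        print(
            f"{total} devices over {n} instances: "
            f"{devices_per_instance(total, n)} each, "
            f"{relative_load(total, n):.0%} of today's load"
        )
```

Under those assumptions, two instances at 200 devices each carry about 85% of what the current server does, and at 400 devices it takes four instances to get back to that same figure.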
With that being said, if that one server goes kaboom, we lose everything. My thought is to have two full-blown instances set up as head ends (web UI, memcache/DB, alerting), distribute the polling between the two, and receive polling data from the other 2 servers in our DMZ. I imagine I would have to turn off email alerting on the second head-end instance so as not to annoy the hell out of everyone with double alerts, and manually enable it on the secondary if the primary fails. Does this sound like a good approach, or would you recommend something else? I don’t want to reinvent the wheel if this has been done by folks already :slight_smile:
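To make the alerting part concrete, here is a rough Python sketch of the logic I have in mind if the manual step were automated: the secondary stays quiet unless the primary has failed several health checks in a row. This is purely illustrative. `AlertGate`, its fields, and the threshold of 3 are hypothetical and are not a LibreNMS feature or API.

```python
from dataclasses import dataclass


@dataclass
class AlertGate:
    """Decides whether this head end should send email alerts.

    Hypothetical helper: the primary always alerts; the secondary only
    alerts after the primary has failed `failures_needed` consecutive
    health checks, to avoid double alerts and flapping.
    """

    role: str                      # "primary" or "secondary"
    failures_needed: int = 3       # assumed threshold, tune to taste
    consecutive_failures: int = 0

    def record_primary_check(self, primary_ok: bool) -> None:
        # Any successful check resets the failure streak.
        if primary_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def alerts_enabled(self) -> bool:
        if self.role == "primary":
            return True
        return self.consecutive_failures >= self.failures_needed


if __name__ == "__main__":
    gate = AlertGate(role="secondary")
    for ok in [True, False, False, False]:
        gate.record_primary_check(ok)
        print(f"primary ok={ok} -> secondary alerts enabled: {gate.alerts_enabled()}")
```

How the health check itself would be done (ping, HTTP check against the web UI, etc.) is left open here.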