I'm interested in using LibreNMS to monitor the state of a software RAID on a Linux server. I have a separate post asking whether this is currently possible, but it appears that it will require a new feature in LibreNMS. I am more of a sysadmin than a developer.
The states I’m referring to are things like: “Degraded”, “Fail”, “FailSpare”, “Rebuild”. Any status which an admin would like to be notified of and may require physical interaction to replace a disk and prevent data loss.
Some of these cases do not necessarily mean that a disk is missing, as the software may choose to fail a disk based on its own criteria.
MD Raid provides a status of multiple RAID devices in /proc/mdstat:
# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[1] sdb[0]
      7813772288 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [===================>.]  recovery = 97.8% (3821043136/3906886144) finish=127.3min speed=11229K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk
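For anyone wanting to experiment, here is a minimal sketch of how output like the above could be turned into a per-array status. It is written in Python purely for illustration; the function names (`parse_mdstat`, `classify`), the status labels, and the regular expressions are my own assumptions, not anything LibreNMS provides, and they only cover the simple layout shown in this post.

```python
import re

# Sample copied from the post above. On a real host you would read
# /proc/mdstat instead; this sketch does not touch the filesystem.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[1] sdb[0]
      7813772288 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [===================>.]  recovery = 97.8% (3821043136/3906886144) finish=127.3min speed=11229K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk
"""

# Assumption: array lines look like "md0 : active raid5 ...".
# Inactive arrays omit the level, which this simple pattern does not handle.
ARRAY_RE = re.compile(r"^(md\w+) : (\w+) (\w+)")
# "[3/2] [UU_]" means 3 devices expected, 2 active, one slot per device.
SLOTS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
# Progress line for a running sync operation.
SYNC_RE = re.compile(r"(recovery|resync|reshape|check)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return {array_name: info_dict} for each md array found in text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = {"state": m.group(2), "level": m.group(3),
                       "expected": None, "active": None,
                       "slots": "", "sync": None}
            arrays[m.group(1)] = current
            continue
        if current is None:
            continue  # header lines before the first array
        m = SLOTS_RE.search(line)
        if m:
            current["expected"] = int(m.group(1))
            current["active"] = int(m.group(2))
            current["slots"] = m.group(3)
        m = SYNC_RE.search(line)
        if m:
            current["sync"] = (m.group(1), float(m.group(2)))
    return arrays


def classify(info):
    """Map parsed info to a coarse status label (labels are illustrative)."""
    if info["state"] != "active":
        return "inactive"
    if info["expected"] is None:
        return "unknown"
    if info["active"] < info["expected"]:
        rebuilding = info["sync"] is not None and info["sync"][0] == "recovery"
        return "rebuilding" if rebuilding else "degraded"
    return "ok"


for name, info in parse_mdstat(SAMPLE).items():
    print(name, classify(info), info["slots"], info["sync"])
```

With the sample above, md0 comes out as "rebuilding": two of three members are active and a recovery is in progress. Once the recovery line disappears but a slot is still `_`, the same array would be reported as "degraded".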
This is definitely something that can be generalized as RAID reporting. Overall, the concept and the code are both easy enough. The big question I'm currently wondering about is how to do this in terms of code structure and the like.
The best way I can think of is via CPAN, but that sort of breaks with how we handle a lot of the SNMP extend stuff. Going via CPAN, though, allows for breaking each backend out into its own module. It also makes it easy for people to extend it by writing new backend modules.
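To make the one-module-per-backend idea concrete, here is a rough sketch of the structure. The post is talking about Perl modules on CPAN; this is only a Python illustration of the same shape, and every name in it (`RaidBackend`, `register`, `collect`, `StaticBackend`) is hypothetical rather than taken from any existing module.

```python
from abc import ABC, abstractmethod

# Registry of known backends, keyed by backend name.
BACKENDS = {}


def register(cls):
    """Class decorator: make a backend discoverable by name."""
    BACKENDS[cls.name] = cls
    return cls


class RaidBackend(ABC):
    """Interface that every backend module would implement."""

    name = "base"

    @abstractmethod
    def usable(self):
        """Return True if this backend applies to the current host."""

    @abstractmethod
    def status(self):
        """Return {array_name: status_string} for the arrays it manages."""


@register
class StaticBackend(RaidBackend):
    """Stand-in backend with canned data, so the sketch runs anywhere.

    A real backend (md, MegaCLI, a FreeBSD one, ...) would probe the
    system in usable() and query the tool or kernel in status().
    """

    name = "static"

    def usable(self):
        return True

    def status(self):
        return {"md0": "rebuilding"}


def collect():
    """Ask every usable backend for its arrays and merge the results."""
    results = {}
    for name, cls in sorted(BACKENDS.items()):
        backend = cls()
        if backend.usable():
            for array, state in backend.status().items():
                # Prefix with the backend name to avoid collisions.
                results[f"{name}:{array}"] = state
    return results


print(collect())
```

The point of the layout is that adding support for another RAID type means adding one class (or one module) and nothing else; the code that reports to SNMP or Nagios only ever calls `collect()`.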
A bit closer on this. I wrote up a framework for both an SNMP extend and Nagios/Icinga checks. I plan to work on adding MegaCLI and a few more FreeBSD software RAID backends next.
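As a rough idea of how one set of statuses can feed both outputs, here is a small Python sketch. It is not the framework mentioned above. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), but the status labels, the mapping of labels to severities, and the JSON layout for the extend are all assumptions made for illustration.

```python
import json

# Assumed mapping from status label to Nagios plugin exit code.
EXIT_CODES = {"ok": 0, "rebuilding": 1, "degraded": 2, "failed": 2}
LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nagios_check(arrays):
    """Return (exit_code, message) for a {array_name: status} dict."""
    if not arrays:
        return 3, "RAID UNKNOWN - no arrays found"
    # Unrecognized status labels are treated as UNKNOWN (3).
    codes = {EXIT_CODES.get(status, 3) for status in arrays.values()}
    # Report the most severe state present: CRITICAL, WARNING, UNKNOWN, OK.
    worst = next(code for code in (2, 1, 3, 0) if code in codes)
    detail = ", ".join(f"{name}: {status}"
                       for name, status in sorted(arrays.items()))
    return worst, f"RAID {LABELS[worst]} - {detail}"


def extend_payload(arrays):
    """Serialize the same data as JSON for an SNMP extend script to print.

    The {"arrays": ...} layout is illustrative only; whatever consumes
    the extend would define the real format.
    """
    return json.dumps({"arrays": arrays}, sort_keys=True)


statuses = {"md0": "rebuilding"}
code, message = nagios_check(statuses)
print(code, message)
print(extend_payload(statuses))
```

A Nagios/Icinga wrapper would print the message and exit with the returned code, while the SNMP extend script would just print the JSON line.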