Feature Request: Software MD RAID Monitoring

Hello,

I have an interest in using LibreNMS to monitor the state of a software RAID within a Linux server. I have a separate post inquiring whether this is possible at the moment, but it appears that it will require a new feature within LibreNMS. I am more of a sys admin and not a developer.

The states I’m referring to are things like “Degraded”, “Fail”, “FailSpare”, and “Rebuild”: any status that an admin would want to be notified of and that may require physical intervention to replace a disk and prevent data loss.

Some of these cases do not necessarily mean that a disk is missing, as the software may choose to fail a disk based on its own criteria.

MD RAID reports the status of all its RAID devices in /proc/mdstat:
# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[1] sdb[0]
7813772288 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
[===================>.] recovery = 97.8% (3821043136/3906886144) finish=127.3min speed=11229K/sec
bitmap: 0/30 pages [0KB], 65536KB chunk

    unused devices: <none>
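As a rough illustration of how that file could be checked, here is a minimal shell sketch that lists arrays whose member map (the [UU_] part) contains an underscore, which marks a slot that is not up. The function name mdstat_degraded is made up for this example, and the optional file argument exists only so the parsing can be tried on a saved copy of /proc/mdstat.

```shell
#!/bin/sh
# Sketch only: print "<array> <member map>" for every md array whose
# member map contains "_" (a slot that is not up).
# mdstat_degraded is an illustrative name, not an existing tool.
mdstat_degraded() {
    awk '
        /^md[^ ]* :/ { name = $1 }              # remember the current array
        name != "" {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^\[[U_]+\]$/) {       # member map such as [UU_]
                    if (index($i, "_") > 0) print name, $i
                    name = ""                   # done with this array
                    break
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling mdstat_degraded with no argument reads /proc/mdstat; no output means no array reported a missing member.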

There are also some projects which seek to make this data available through SNMP:
https://wiki.opennms.org/wiki/Linux_Mdadm_monitoring_thru_snmp


With a simple script, like the exim-stats one, we can list the arrays with
mdadm --detail --scan
then pick out the md0/1/2/... arrays and run --detail on each of them:

mdadm --detail --scan

ARRAY /dev/md/0 metadata=1.2 name=rescue:0 UUID=646d966e:934836ac:6505a39a:bbc57181
ARRAY /dev/md/2 metadata=1.2 name=rescue:2 UUID=7328a393:5c91efeb:7b30d226:42fcc619
ARRAY /dev/md/1 metadata=1.2 name=rescue:1 UUID=00a3d19e:d99c9bd4:e2df84bf:58bb9357

mdadm --detail /dev/md/0

/dev/md/0:
Version : 1.2
Creation Time : Thu Dec 21 14:48:27 2017
Raid Level : raid1
Array Size : 1047552 (1023.00 MiB 1072.69 MB)
Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

   Update Time : Sat Apr  6 17:17:43 2019
         State : clean
Active Devices : 3

Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Consistency Policy : resync

          Name : rescue:0
          UUID : 646d966e:934836ac:6505a39a:bbc57181
        Events : 236

From that, I think we need the “State” (for alerts) and the device counts, something like the following, but for every array found with --scan:

mdadm --detail /dev/md/0 | grep "State :"

         State : clean

mdadm --detail /dev/md/0 | grep "Devices"

  Raid Devices : 3
 Total Devices : 3
Active Devices : 3

Working Devices : 3
Failed Devices : 0
Spare Devices : 0
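One possible way to turn the State line into something an alert rule can use is to map it to a numeric level. The sketch below is only an assumption about how that could look: the function names are invented here, the 0/1/2 levels follow the Nagios convention rather than anything LibreNMS defines, and the state keywords (degraded, FAILED, inactive, recovering, resyncing, reshaping) are the ones I would expect from mdadm --detail, so check them against your own output.

```shell
#!/bin/sh
# Sketch only: both function names are illustrative.

# Read `mdadm --detail` output on stdin and print the text after "State :".
md_state_from_detail() {
    sed -n -e 's/ *$//' -e 's/^ *State : *//p'
}

# Map a state string to a level: 0 = ok, 1 = warning, 2 = critical.
md_state_level() {
    case "$1" in
        *inactive*|*FAILED*|*degraded*)       echo 2 ;;
        *recovering*|*resyncing*|*reshaping*) echo 1 ;;
        *clean*|*active*)                     echo 0 ;;
        *)                                    echo 1 ;;  # unknown: flag for review
    esac
}
```

For example, md_state_level "$(mdadm --detail /dev/md/0 | md_state_from_detail)" would print a single digit for that array.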

After that, I still don’t know how to proceed. Help needed :slight_smile:

I’ve managed to get output like this easily, with no extra configuration:

./mdadm-stats.sh

/dev/md/0:
Raid Level : raid1
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
/dev/md/2:
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
/dev/md/1:
Raid Level : raid1
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

With this:
for i in $( mdadm --detail --scan | awk '{print $2}' ); do
    # -E is needed so that | is treated as alternation
    mdadm --detail "$i" | grep -E 'md|State :|Devices|Level :'
done
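The report printed by that loop is easy to read but awkward for a poller to parse. As a sketch, it could be condensed to one line per array with awk; md_summary and the key=value layout are invented for this example and are not a format LibreNMS expects.

```shell
#!/bin/sh
# Sketch only: condense the multi-line report from the loop (read on stdin)
# into one line per array. md_summary and the output layout are illustrative.
md_summary() {
    awk -F' : ' '
        function emit() {
            if (dev != "")
                printf "%s level=%s state=%s failed=%s spare=%s\n", dev, level, state, failed, spare
        }
        /^\/dev\/md/ {                      # header line such as /dev/md/0:
            emit()
            dev = $0; sub(/:$/, "", dev)
            level = state = failed = spare = ""
            next
        }
        {
            key = $1; gsub(/^ +| +$/, "", key)   # trim padding around the label
            val = $2; gsub(/^ +| +$/, "", val)   # and around the value
        }
        key == "Raid Level"     { level = val }
        key == "State"          { state = val }
        key == "Failed Devices" { failed = val }
        key == "Spare Devices"  { spare = val }
        END { emit() }
    '
}
```

Usage would be along the lines of ./mdadm-stats.sh | md_summary.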

But I have no experience with how to integrate it into LibreNMS.

Ohh! This seems like fun.

Thanks for the idea!

This is definitely something that can be generalized to RAID reporting. Overall, the concept and the code are easy enough. The big question I’m currently wondering about is how to do this in terms of code structure and the like.

The best way I can think of is via CPAN, but that sort of breaks with how we handle a lot of the SNMP extend stuff. Going via CPAN, though, allows each backend to be broken out into its own module. It also makes it easy for people to extend it by writing new backend modules.

I had the exim stats scripts in mind.

Speaking of mdadm (strictly software RAID): it’s Linux only, so a shell script used as an extend in /etc/snmp/ is doable.
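To make the extend idea a bit more concrete, here is a minimal sketch of what such a script could contain. Everything named here is an assumption: the path /etc/snmp/mdadm-state.sh, the extend name mdadm-state, the function md_extend_report, and the device=state output layout. It is not the format any existing LibreNMS application expects, and mdadm usually needs root, so snmpd would need suitable privileges.

```shell
#!/bin/sh
# Hypothetical /etc/snmp/mdadm-state.sh. It could be wired into snmpd.conf
# with a net-snmp extend line such as:
#   extend mdadm-state /etc/snmp/mdadm-state.sh
# MDADM can be overridden, e.g. to point at a stub while testing.
MDADM="${MDADM:-mdadm}"

# Print one "device=state" line for every array found by --scan.
md_extend_report() {
    "$MDADM" --detail --scan | awk '$1 == "ARRAY" { print $2 }' |
    while read -r dev; do
        state=$("$MDADM" --detail "$dev" |
                sed -n -e 's/ *$//' -e 's/^ *State : *//p')
        printf '%s=%s\n' "$dev" "$state"
    done
}
```

The script would end with a plain call to md_extend_report, and snmpd would then return those lines for the extend.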

But still, I have no experience with how to integrate it into LibreNMS :confused:

Howdy!

A bit closer on this. I wrote up a framework for both an SNMP extend and Nagios/Icinga checks. I plan to work on adding MegaCLI and a few more FreeBSD software RAID backends next.

MDADM Application state issues: