Feature Request: Software MD RAID Monitoring

Nicholas_P · 5 July 2018 17:07

Hello,

I have an interest in using LibreNMS to monitor the state of a software RAID within a Linux server. I have a separate post inquiring whether this is possible at the moment, but it appears that it will require a new feature within LibreNMS. I am more of a sys admin and not a developer.

The states I’m referring to are things like: “Degraded”, “Fail”, “FailSpare”, “Rebuild”. Any status which an admin would like to be notified of and may require physical interaction to replace a disk and prevent data loss.

Some of these cases may not necessarily mean that a disk is missing as software may choose to fail a disk based on its own criteria.

MD Raid provides a status of multiple RAID devices in /proc/mdstat:
# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[1] sdb[0]
7813772288 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
[===================>.] recovery = 97.8% (3821043136/3906886144) finish=127.3min speed=11229K/sec
bitmap: 0/30 pages [0KB], 65536KB chunk

    unused devices: <none>
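A degraded array can be spotted directly from the member map in /proc/mdstat, where `_` marks a missing or failed member (as in `[UU_]` above). The following is only a minimal sketch of that idea; the function name `mdstat_degraded` is made up for illustration and is not part of LibreNMS or mdadm.

```shell
#!/bin/sh
# Hypothetical helper: read mdstat-formatted text on stdin and print each
# array whose member map (e.g. [UU_]) contains at least one "_".
mdstat_degraded() {
    awk '
        /^md[^ ]* :/ { name = $1 }                 # remember the current array name
        name != "" && match($0, /\[[U_]+\]/) {     # member map such as [UU_]
            map = substr($0, RSTART, RLENGTH)
            if (index(map, "_") > 0) print name, map
            name = ""                              # only one map per array
        }
    '
}
```

On a live system it would be called as `mdstat_degraded < /proc/mdstat`. Note that an array that is still rebuilding also shows `_` until recovery completes, so this flags "Rebuild" as well as "Degraded".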

There are also some projects that aim to make this data available through SNMP:
https://wiki.opennms.org/wiki/Linux_Mdadm_monitoring_thru_snmp

chrismfz · 6 April 2019 16:56

With a simple script, like the one used for exim-stats, we can list the arrays with
mdadm --detail --scan
then pick out the md0/1/2/... arrays and run --detail on each of them:

mdadm --detail --scan

ARRAY /dev/md/0 metadata=1.2 name=rescue:0 UUID=646d966e:934836ac:6505a39a:bbc57181
ARRAY /dev/md/2 metadata=1.2 name=rescue:2 UUID=7328a393:5c91efeb:7b30d226:42fcc619
ARRAY /dev/md/1 metadata=1.2 name=rescue:1 UUID=00a3d19e:d99c9bd4:e2df84bf:58bb9357

mdadm --detail /dev/md/0

/dev/md/0:
Version : 1.2
Creation Time : Thu Dec 21 14:48:27 2017
Raid Level : raid1
Array Size : 1047552 (1023.00 MiB 1072.69 MB)
Used Dev Size : 1047552 (1023.00 MiB 1072.69 MB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Update Time : Sat Apr  6 17:17:43 2019
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Consistency Policy : resync

Name : rescue:0
UUID : 646d966e:934836ac:6505a39a:bbc57181
Events : 236

From that, I think we need the “State” (for alerts) and the device counts, something like the following, but for every array found with --scan:

mdadm --detail /dev/md/0 | grep "State :"

         State : clean

mdadm --detail /dev/md/0 | grep "Devices"

Raid Devices : 3
Total Devices : 3
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Beyond that, I still don’t know how to proceed. Help needed.
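The two greps above can be folded into a single pass that pulls out only the fields relevant for alerting. This is a rough sketch, not an existing LibreNMS script; the function name `md_state_summary` and the `state=... failed=...` output format are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and print the
# State value and the Failed Devices count on one line.
md_state_summary() {
    awk -F' : ' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the key
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
        }
        key == "State"          { state = val }
        key == "Failed Devices" { failed = val }
        END { printf "state=%s failed=%s\n", state, failed }
    '
}
```

It would be used as `mdadm --detail /dev/md/0 | md_state_summary`. The State value can hold several comma-separated flags (for example `clean, degraded`), so an alert rule would probably match on substrings rather than on the whole string.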

chrismfz · 6 April 2019 17:57

I’ve managed to get output like the following easily, with no extra configuration:

./mdadm-stats.sh

/dev/md/0:
Raid Level : raid1
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
/dev/md/2:
Raid Level : raid5
Raid Devices : 3
Total Devices : 3
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
/dev/md/1:
Raid Level : raid1
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

With this:
for i in $(mdadm --detail --scan | awk '{print $2}'); do
    # -E is needed so that | is treated as alternation
    mdadm --detail "$i" | grep -E 'md|State :|Devices|Level :'
done

But I have no experience with how to integrate it into LibreNMS.
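For a poller or an alert rule, one line per array is usually easier to consume than the multi-line blocks printed by the loop. The sketch below condenses that output; the function name `md_one_line` and the output layout are invented for illustration, and this is not a format LibreNMS is known to expect.

```shell
#!/bin/sh
# Hypothetical helper: read the per-array "Key : Value" blocks on stdin and
# print one line per array: <device> <level> <state> failed=<n>
md_one_line() {
    awk -F' : ' '
        function emit() { if (dev != "") print dev, level, state, "failed=" failed }
        /^\/dev\// { emit(); dev = $0; sub(/:$/, "", dev); next }   # new array header
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == "Raid Level"     { level = $2 }
        key == "State"          { state = $2 }
        key == "Failed Devices" { failed = $2 }
        END { emit() }                                              # last array
    '
}
```

Usage would be `./mdadm-stats.sh | md_one_line`, which for the output above would give lines such as `/dev/md/0 raid1 clean failed=0`.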

VVelox · 11 April 2019 18:58

Ohh! This seems like fun.

Thanks for the idea!

This is definitely something that can be generalized to RAID reporting. Overall, the concept and all the code are easy enough. The big question I am currently wondering about is how to do this in terms of code structure and the like.

The best way I can think of is via CPAN, but that sort of breaks with how we handle lots of SNMP extend stuff. Going via CPAN, though, allows for breaking out each backend into its own module. It also makes it easy for people to extend it by writing new backend modules.

chrismfz · 12 April 2019 11:34

I had in mind the exim stats scripts:

github.com

librenms/librenms/blob/master/doc/Extensions/Applications.md#exim-stats

## Introduction

You can use Application support to graph performance statistics of many applications.

Different applications support a variety of ways to collect data: 1) by direct connection to the application, 2) snmpd extend, or 3) [the agent](Agent-Setup.md). The monitoring of applications could be added before or after the hosts have been added to LibreNMS.

If multiple methods of collection are listed you only need to enable one.

### SNMP Extend

When using the snmp extend method, the application discovery module will pick up which applications you have set up for monitoring automatically, even if the device is already in LibreNMS. The application discovery module is enabled by default for most \*nix operating systems, but in some cases you will need to manually enable the application discovery module. 

### Enable the application discovery module

1. Edit the device for which you want to add this support
2. Click on the *Modules* tab and enable the `applications` module.
3. This will be automatically saved, and you should get a green confirmation pop-up message.


Speaking of mdadm (strictly software RAID), it’s Linux only, so a shell script in /etc/snmp/ used as an SNMP extend is doable.

But I still have no experience with how to integrate it into LibreNMS.
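On the snmpd side, exposing a script is a single `extend <name> <program>` line in snmpd.conf. The helper below only prints such a line; the function name `snmp_extend_line`, the extend name `mdadm`, and the script path are assumptions, and whatever name and output format LibreNMS would need for discovery is not settled in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print an snmpd.conf "extend" directive.
#   $1 = extend name (assumed: mdadm)
#   $2 = full path to the script snmpd should run
snmp_extend_line() {
    printf 'extend %s %s\n' "$1" "$2"
}
```

For example, `snmp_extend_line mdadm /etc/snmp/mdadm-stats.sh` prints the line to add to /etc/snmp/snmpd.conf, after which snmpd has to be restarted. Since `mdadm --detail` normally needs root, the script may have to be run through sudo if snmpd runs as an unprivileged user.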

VVelox · 24 April 2019 10:55

Howdy!

A bit closer to this. I wrote up a framework for both an SNMP extend and Nagios/Icinga checks. I plan to work on adding MegaCLI and a few more FreeBSD software RAID backends next.

DerDanilo · 31 March 2020 21:01

MDADM Application state issues: