But how do we tell the system to ignore the value when it is -nan? That is what HDDs report.
We could build a rule that simply ignores everything that is 0 or below. But then it wouldn't tell us when a disk reaches 0% of its calculated SSD Life Left (it would only alert down to 1% left).
Warning: application_metrics.metric LIKE '%_id231' AND application_metrics.value <= 20 AND application_metrics.value > 0
Critical: application_metrics.metric LIKE '%_id231' AND application_metrics.value <= 5 AND application_metrics.value > 0
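The two rule conditions above can be restated in a few lines of Python to show the gap: both require value > 0, so a value of exactly 0 (what HDDs end up with) is dropped, and an SSD that really reaches 0% life left is dropped with it. This is only a sketch; `id231_alert_level` and its return values are hypothetical and simply mirror the SQL conditions, not anything in LibreNMS.

```python
from typing import Optional


def id231_alert_level(value: float) -> Optional[str]:
    """Mirror the Warning/Critical rule conditions for SMART attribute 231.

    Hypothetical helper: it restates the two LibreNMS rule conditions.
    Returns None when neither rule can ever match the value.
    """
    # Both rules require value > 0, so 0 (what HDDs report) and anything
    # below it are never alerted on. A worn-out SSD at 0 is ignored too.
    if value <= 0:
        return None
    # A value <= 5 matches both rules; report the more severe one.
    if value <= 5:
        return "critical"
    if value <= 20:
        return "warning"
    return "ok"
```

For example, `id231_alert_level(0)` returns None whether the disk is an HDD or an SSD with no life left, which is exactly why the device type has to be known some other way.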
Any idea how to detect whether it's an HDD?
If the SMART output is 'NULL', it still writes '0' to the DB, so we cannot check for that. Referencing previous values doesn't work either, since they might not exist after a disk is replaced (when using serial numbers).
Combining it with a check for e.g. Airflow Temp (which NVMes don't report) doesn't work either, since I don't think there is a way to make the check reference the Airflow Temp value of exactly the same disk; it could match any disk's value.
I wish there were a plugin and a separate app just for SSD/NVMe wearout and spare.
Yes, agreed on the wearout app. I checked my drives in LibreNMS, and the HDDs are all 0 (as you say), and of course an SSD could report that value too (i.e. it could reach zero). I don't see a good way to detect the device type.
We would need an app that either uses the existing SMART data but checks specific values that only exist on HDDs or on SSDs/NVMes and then shows alerts,
or a new app that does this entirely differently, e.g. using nvme-cli via an SNMP extension script.
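The first option (deciding the device type on the host, from values only one class of drive reports) could look roughly like the sketch below. It assumes the input is the parsed JSON of `smartctl -a -j <device>`, and that the `device.type`, `rotation_rate` and `ata_smart_attributes.table` fields are present as I understand smartctl's JSON output; verify them on your smartmontools version. The set of "HDD-only" attribute IDs is purely an assumption and should be checked against your own drives.

```python
from typing import Any, Dict

# Assumption: attribute IDs treated as "HDD-only" for the fallback check
# (3 = Spin_Up_Time, 10 = Spin_Retry_Count). Some SSD firmwares may still
# report them, so verify against your own drives.
HDD_ONLY_ATTRIBUTE_IDS = {3, 10}


def detect_device_type(smart: Dict[str, Any]) -> str:
    """Classify a disk as 'nvme', 'ssd', 'hdd' or 'unknown'.

    `smart` is assumed to be the parsed JSON of `smartctl -a -j <device>`.
    """
    device = smart.get("device", {})
    if device.get("type") == "nvme" or device.get("protocol") == "NVMe":
        return "nvme"

    # smartctl reports a rotation rate of 0 for solid state devices.
    rotation = smart.get("rotation_rate")
    if isinstance(rotation, int):
        return "ssd" if rotation == 0 else "hdd"

    # Fallback: look for attributes that (by assumption) only HDDs provide.
    table = smart.get("ata_smart_attributes", {}).get("table", [])
    ids = {row.get("id") for row in table}
    if ids & HDD_ONLY_ATTRIBUTE_IDS:
        return "hdd"
    if ids:
        return "ssd"
    return "unknown"
```

Because the type is decided before anything is written to the DB, the NULL-versus-0 ambiguity never reaches the alert rule.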
@Community: Please don't ask us to submit a PR for this feature. I am not a programmer, nor do I have enough experience to write such tools. We can help with feature design and testing, though.
We could have one tool for wearout, spare left, and flash drive health in general. It should use SMART data as well as SMART data from tools like nvme-cli to provide better device support.
Idea for flash media:
Use SMART data, but run checks to detect SSDs and NVMes (e.g. check whether any value exists that only an HDD provides)
For NVMes (optional): nvme smart-log -H -o json /dev/nvme0n1
Use nvme-cli via an SNMP extension
Use SMART data from nvme-cli instead of smartctl data
Get better support, via nvme-cli, for values like "spare left" that smartctl data doesn't provide in some cases
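The nvme-cli part of this idea could be a small script run through the SNMP extension that turns the JSON from `nvme smart-log ... -o json` into wearout and spare values. This is a sketch: the keys `avail_spare`, `spare_thresh` and `percent_used` are what I recall nvme-cli emitting, and the value format may differ between versions (the parser therefore accepts both `100` and `"100%"`); `nvme_flash_health` is a hypothetical name.

```python
import json
from typing import Any, Dict


def _as_percent(raw: Any) -> int:
    """Accept 100 or "100%" (the format may differ between nvme-cli versions)."""
    if isinstance(raw, str):
        raw = raw.strip().rstrip("%")
    return int(raw)


def nvme_flash_health(smart_log_json: str) -> Dict[str, Any]:
    """Summarise wearout and spare from `nvme smart-log -o json` output.

    Assumption: the keys "avail_spare", "spare_thresh" and "percent_used"
    are what nvme-cli emits; check them against your nvme-cli version.
    """
    log = json.loads(smart_log_json)
    spare = _as_percent(log["avail_spare"])
    threshold = _as_percent(log["spare_thresh"])
    used = _as_percent(log["percent_used"])
    return {
        "spare_left": spare,
        # The drive itself considers spare critical below this threshold.
        "spare_below_threshold": spare < threshold,
        # percent_used may exceed 100 on drives past their rated endurance.
        "wearout": used,
    }
```

Since these values come straight from the NVMe health log, a 0 here really means 0 and does not need to be filtered out.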
Yes, very much like that, and that's a very helpful nvme command, thanks! Of course, other SSDs should be supported as well (i.e. SATA). I just checked here, and my drive (an 840 EVO) doesn't report 231, which makes it a bit messier, agreed?
Some devices only support other values. For EVOs, it seems you want to check id177; a value of 95 there seems to mean 5% wearout.
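If both attributes count down from 100, the wearout can be computed the same way for either, e.g. 100 - 95 = 5% for the EVO reading above. A minimal sketch under that assumption: the id177 interpretation is taken from the observation in this thread and has not been verified for other vendors, and `sata_ssd_wearout` is a hypothetical helper taking normalized SMART values keyed by attribute ID.

```python
from typing import Dict, Optional

# Assumption: for both attributes the normalized SMART value counts down
# from 100, so wearout = 100 - value. The id177 reading is based on the
# 840 EVO observation in this thread; verify it for other vendors.
LIFE_LEFT_ATTRIBUTES = (231, 177)  # SSD_Life_Left, Wear_Leveling_Count


def sata_ssd_wearout(normalized: Dict[int, int]) -> Optional[int]:
    """Return wearout in percent from the first supported attribute, if any."""
    for attr_id in LIFE_LEFT_ATTRIBUTES:
        if attr_id in normalized:
            return 100 - normalized[attr_id]
    # None (not 0) when the drive reports neither attribute.
    return None
```

Returning None instead of 0 for drives without either attribute keeps "not supported" distinct from "fully worn out".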
I think the available spare or replaced sectors values are more important. We would need an application that knows the common values to check for SSD/NVMe media health, based on data collected from multiple sources for the same device.
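One way to picture such an application is a merge step that collects readings per disk serial number from several tools and keeps a missing value as missing (None) rather than 0, which is the ambiguity this thread started with. Everything here (the `FlashHealth` record, `merge_readings`, the reading dict layout and the "first source wins" rule) is a hypothetical design, not an existing LibreNMS component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlashHealth:
    """Hypothetical per-device record, keyed by serial number."""
    serial: str
    wearout: Optional[int] = None     # percent used
    spare_left: Optional[int] = None  # percent of spare remaining
    sources: List[str] = field(default_factory=list)


def merge_readings(readings: List[Dict]) -> Dict[str, FlashHealth]:
    """Combine readings from several tools into one record per disk.

    Each reading is a hypothetical dict such as
    {"serial": "S1", "source": "nvme-cli", "spare_left": 100}.
    Later readings only fill fields that are still missing, so list the
    preferred source (e.g. nvme-cli for spare) first.
    """
    merged: Dict[str, FlashHealth] = {}
    for r in readings:
        rec = merged.setdefault(r["serial"], FlashHealth(serial=r["serial"]))
        for key in ("wearout", "spare_left"):
            if getattr(rec, key) is None and r.get(key) is not None:
                setattr(rec, key, r[key])
        rec.sources.append(r["source"])
    return merged
```

Keying by serial number means a replaced disk simply starts a fresh record instead of inheriting the old disk's values.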