Get SMART
I’ve been looking into ways to better protect my data. You may remember that I recently experienced a failure on a large hard disk that nearly devastated me. So, now I’ve gotten SMART: Self-Monitoring, Analysis and Reporting Technology System. SMART is built into most modern ATA and SCSI hard disks. In many cases, it can provide advance warning of hard disk failure. When it’s enabled, the disk records attributes like seek/read error rate, calibration retries, and reallocated sectors. (Note that my disk failed after it filled its sector reallocation table.)
So I installed the SmartMon Tools on my Linux boxes. They enable SMART on your disks and monitor changes in performance attributes. The disk manufacturer decides which attributes to expose and which threshold values indicate that you should worry. An attribute may indicate that the disk will fail (“pre-fail”) or that it has worn out past its intended life (“old-age”). For example, power-on hours is an old-age attribute, while calibration retries is pre-fail.
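To make the pre-fail versus old-age distinction concrete, here is a minimal Python sketch of how a threshold check could work. It is not how SmartMon is implemented. The `Attribute` class, the `assess` function, and all of the sample numbers are made up for illustration. It assumes the usual SMART convention that a normalized value counts down toward the manufacturer's threshold and the attribute is considered failed once the value is at or below it.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One SMART attribute as reported by a disk (hypothetical structure)."""
    name: str
    kind: str       # "pre-fail" or "old-age"
    value: int      # normalized value; higher means healthier
    threshold: int  # manufacturer-defined threshold


def assess(attr):
    """Classify an attribute by comparing its value with its threshold."""
    if attr.value <= attr.threshold:
        if attr.kind == "pre-fail":
            return "failure predicted"
        return "worn past design life"
    return "ok"


# Illustrative numbers only; these are not readings from a real disk.
SAMPLE = [
    Attribute("Reallocated_Sector_Ct", "pre-fail", 36, 36),
    Attribute("Power_On_Hours", "old-age", 95, 0),
    Attribute("Calibration_Retry_Count", "pre-fail", 100, 51),
]

for attr in SAMPLE:
    print(f"{attr.name}: {assess(attr)}")
```

In this sketch only the first sample attribute trips its threshold, which mirrors what happened to my disk when its reallocation table filled up.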
SmartMon just enables and monitors the SMART attributes. It can take action when an attribute reaches its threshold, or when it simply changes. An action can be sending an e-mail message or running a script. The best part about SMART is that it doesn’t impact the performance of your drive, as long as you use online attributes. SmartMon can also occasionally (every four hours) check offline attributes; this brings the drive offline during an idle period to run a short but more thorough test, usually lasting just seconds.
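The "act when an attribute changes" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not SmartMon's code: `diff_attributes` and `notify` are hypothetical helpers, the two snapshots are invented values, and `print` stands in for sending an e-mail or running a script.

```python
def diff_attributes(previous, current):
    """Return (name, old, new) for every attribute whose value changed."""
    changes = []
    for name, new in current.items():
        old = previous.get(name)
        if old is not None and old != new:
            changes.append((name, old, new))
    return changes


def notify(changes, send=print):
    """Report each change; 'send' could be swapped for a mailer or a script."""
    for name, old, new in changes:
        send(f"SMART attribute {name} changed from {old} to {new}")


# Two invented snapshots taken at different polling times.
previous = {"Reallocated_Sector_Ct": 0, "Seek_Error_Rate": 12}
current = {"Reallocated_Sector_Ct": 8, "Seek_Error_Rate": 12}

notify(diff_attributes(previous, current))
```

Passing the reporting function in as `send` keeps the comparison logic separate from whatever action you want taken.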
Add this to my list of improvements for a distro: have SmartMon running as a default service. (Along with starting services asynchronously to improve the boot time.) Man, are Microsoft OSes doing SMART? Because they are missing out once again.
September 22nd, 2003 at 1:51 pm
This sounds like a total necessity for all hard drives these days. When you’ve got GIGABYTES of data all residing on a single hard drive, a HD failure can wipe out your entire electronic world. Granted, when we were all running 800MB hard drives, a HD failure would wipe you out just the same.
This raises the question that you alluded to … why isn’t this technology enabled by default?
Question: Does this then keep you from making a RAID array? Is it necessary now that you can intimately monitor the health of your hard drives?
September 22nd, 2003 at 2:53 pm
If you use software RAID, then each disk can still be monitored with SMART. With hardware RAID solutions, you would probably need modified SMART monitoring software. The SmartMon tools were modified to talk to disks connected to a 3ware RAID card.
Even with redundancy provided by RAID, I would still want to know if one of the disks is about to fail. One thing that scares me about RAID is that if something bad happens and you are in a data recovery situation, the data format makes it extremely difficult to scan the disk looking for file systems. I still haven’t decided if I want to run RAID or just use extra disks to archive data. Right now, I’m leaning towards the latter.