#1
Bad sectors/blocks - automating discovery of hard drives 'going bad'
I'm not sure if this is the right group for this discussion, but I had a couple of questions about bad sectors and how well they correlate with a hard drive nearing the point of failure.

We currently use software to monitor, among other things, event log errors on Windows machines. Windows writes error messages to the system log when it finds a bad disk block. Sometimes these come in large numbers (groups of 10+ messages at a time) and/or appear frequently even after running, say, chkdsk.

My questions mainly concern stand-alone IDE or SATA hard drives, not RAID configurations of any sort. I'm not sure what SMART status is available, given that I'm thinking in very general terms about a large number of different computers & networks.

How accurate are the Windows event log messages in indicating that a hard drive is likely to go bad soon and should be replaced? Is there a threshold of sorts? Are there better software tools (small Linux-distro utilities, perhaps) to monitor the actual physical health of a disk, or to get a better picture of disk health going forward?

In general, I'm looking for a good way to automate disk health checking in order to accurately tell a client "You need to buy a new hard drive" before the disk itself is mucked past the point of simple data backup/recovery operations.
#2
Phil wrote:
> I'm not sure if this is the right group for this discussion,

Yes it is.

> but I had a couple questions in relation to bad sectors and the correlation of a hard drive nearing a point of failure. We currently use software to monitor, among other things, event log errors on Windows machines. Windows will write error messages to the system log when it finds a bad disk block. Sometimes these come in large numbers (groups of 10+ messages at a time) and/or appear frequently even after running, say, chkdsk.

The hard drive's SMART data is much better at showing bad sectors as they appear. Everest shows that data most readably, and you need to focus on the actual numbers reported, not just the OKs.
http://www.majorgeeks.com/download.php?det=4181

> My questions primarily reside in the nature of stand-alone IDE or SATA hard drives, not RAID configurations of any sort, though not sure of potential SMART status given that I'm thinking in very general terms with a large amount of different computers & networks. How accurate are the Windows event log messages in indicating that a hard drive has a good potential of going bad soon and should be replaced?

Nowhere near as good as the SMART data.

> Is there a threshold of sorts?

Yes. One or two reallocated sectors are nothing to worry about; many more than that, with more showing up over time, is an indication that something is going bad. Not necessarily the hard drive, though: it can be just the drive running at too high a temperature, or a power supply going bad.

> Are there better software tools (small Linux-distro utilities, perhaps) to monitor the actual physical health of a disk, or to get a better picture of disk health going forward?

Yes, Everest or smartctl.

> In general, I'm looking for a good way to automate disk health checking in order to accurately tell a client "You need to buy a new hard drive" before the disk itself is mucked past the point of simple data backup/recovery operations.
#3
Previously Phil wrote:
> I'm not sure if this is the right group for this discussion, but I had a couple questions in relation to bad sectors and the correlation of a hard drive nearing a point of failure. We currently use software to monitor, among other things, event log errors on Windows machines. Windows will write error messages to the system log when it finds a bad disk block. Sometimes these come in large numbers (groups of 10+ messages at a time) and/or appear frequently even after running, say, chkdsk. My questions primarily reside in the nature of stand-alone IDE or SATA hard drives, not RAID configurations of any sort, though not sure of potential SMART status given that I'm thinking in very general terms with a large amount of different computers & networks. How accurate are the Windows event log messages in indicating that a hard drive has a good potential of going bad soon and should be replaced?

Not very.

> Is there a threshold of sorts?

No.

> Are there better software tools (small Linux-distro utilities, perhaps) to monitor the actual physical health of a disk, or to get a better picture of disk health going forward?

Definitely. For bad sectors, look at the reallocated sector count in the SMART attributes. It will give you a far more accurate bad sector estimate than the event log, since marginal sectors are included as well. You can also look for other exceeded or suspicious SMART attributes. The tool would simply be smartmontools, with smartd for automatic monitoring (actions and thresholds are user-defined) and smartctl for direct querying.

> In general, I'm looking for a good way to automate disk health checking in order to accurately tell a client "You need to buy a new hard drive" before the disk itself is mucked past the point of simple data backup/recovery operations.

What has worked well for me is to monitor the reallocated sector count for an increase of, say, more than 10 in a week, and the other attributes for an exceeded threshold. I have smartd send email if the reallocated count increases. It is also a good idea to run a full SMART self-test (smartctl -t long device) regularly. I usually run one every 14 days from a cron job (anacron for machines that are not always on). YMMV.

Arno
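The "more than 10 in a week" rule above could be checked roughly as follows. This is only a sketch in Python, not the setup described in the post: the names `reallocated_growth` and `should_alert`, and the idea of keeping a list of timestamped samples, are assumptions made for illustration. The limit of 10 and the 7-day window come from the post and are rules of thumb, not fixed values.

```python
from datetime import datetime, timedelta


def reallocated_growth(history, now, window=timedelta(days=7)):
    """Return how much the reallocated sector count grew inside the window.

    history is a list of (datetime, raw_count) samples in any order.
    The newest sample is compared with the oldest sample that is still
    inside the window; with fewer than two such samples the result is 0.
    """
    recent = sorted((t, c) for t, c in history if now - t <= window)
    if len(recent) < 2:
        return 0
    return recent[-1][1] - recent[0][1]


def should_alert(history, now, max_increase=10):
    """True if the count grew by more than max_increase within a week."""
    return reallocated_growth(history, now) > max_increase
```

Where the samples come from (smartctl output, a database, a log file) is left open here; the function only needs the timestamps and raw counts.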
#4
On Feb 16, 9:21 pm, Arno Wagner wrote:
>> Are there better software tools (small Linux-distro utilities, perhaps) to monitor the actual physical health of a disk, or to get a better picture of disk health going forward?

> Definitely. For bad sectors, look at the reallocated sector count in the SMART attributes. It will give you a far more accurate bad sector estimate than the event log, since marginal sectors are included as well. You can also look for other exceeded or suspicious SMART attributes. The tool would simply be smartmontools, with smartd for automatic monitoring (actions and thresholds are user-defined) and smartctl for direct querying.

> What has worked well for me is to monitor the reallocated sector count for an increase of, say, more than 10 in a week, and the other attributes for an exceeded threshold. I have smartd send email if the reallocated count increases. It is also a good idea to run a full SMART self-test (smartctl -t long device) regularly. I usually run one every 14 days from a cron job (anacron for machines that are not always on). YMMV.

Thanks for the tips. I'll have to mess around with smartctl & smartd more to figure out how to extract the reallocated sector count and any other pertinent SMART data I would need. If I can get enough information from smartctl alone, that would be best, since I can handle things like scheduling and automated email alerts elsewhere.
#5
Previously Phil wrote:
> On Feb 16, 9:21 pm, Arno Wagner wrote:

>>> Are there better software tools (small Linux-distro utilities, perhaps) to monitor the actual physical health of a disk, or to get a better picture of disk health going forward?

>> Definitely. For bad sectors, look at the reallocated sector count in the SMART attributes. It will give you a far more accurate bad sector estimate than the event log, since marginal sectors are included as well. You can also look for other exceeded or suspicious SMART attributes. The tool would simply be smartmontools, with smartd for automatic monitoring (actions and thresholds are user-defined) and smartctl for direct querying.

>> What has worked well for me is to monitor the reallocated sector count for an increase of, say, more than 10 in a week, and the other attributes for an exceeded threshold. I have smartd send email if the reallocated count increases. It is also a good idea to run a full SMART self-test (smartctl -t long device) regularly. I usually run one every 14 days from a cron job (anacron for machines that are not always on). YMMV.

> Thanks for the tips. I'll have to mess around with smartctl & smartd more to figure out how to enumerate the reallocated sector count (if I can get enough information from just smartctl, that'd be best, for I can handle things like scheduling and automated email alerts elsewhere) and any other pertinent SMART data I would need.

That is definitely possible. I used to have a cron job that ran smartctl every hour and evaluated the results with a Perl script against the stored previous values. It took about a day to write and ran for several years on 24 PCs without problems.

Arno
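The original script was written in Perl and is not shown in the thread, so the following is only a Python sketch of the same general idea: run smartctl from a scheduled job and compare each reading with a stored previous value. The JSON state file and the function names `read_smart_attributes` and `record_and_diff` are assumptions for illustration. `smartctl -A` is the real smartmontools option for printing the attribute table.

```python
import json
import os
import subprocess


def read_smart_attributes(device):
    """Run `smartctl -A` on device and return its text output.

    Requires smartmontools and usually administrator rights. The exit
    status is deliberately not checked, because smartctl uses it as a
    bit mask that is non-zero when the disk reports problems.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def record_and_diff(state_path, device, current_count):
    """Store current_count for device and return the change since last run.

    The state file is a small JSON object mapping device names to the
    last seen count. The first reading for a device returns 0.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)
    previous = state.get(device)
    state[device] = current_count
    with open(state_path, "w") as fh:
        json.dump(state, fh)
    return 0 if previous is None else current_count - previous
```

A scheduled job (cron, or the Windows Task Scheduler) would call `read_smart_attributes`, extract the count from the text, and pass it to `record_and_diff`; a positive return value means new sectors were reallocated since the previous run.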
#6
On Feb 20, 11:52 am, Arno Wagner wrote:
> That is definitely possible. I used to have a cron job that ran smartctl every hour and evaluated the results with a Perl script against the stored previous values. It took about a day to write and ran for several years on 24 PCs without problems.

Did you just run a regex against one or more specific lines of the smartctl -a output? I was thinking of something along those lines, or a conditional on WHEN_FAILED and TYPE = Pre-fail.

I'm not sure which will take more time:

- getting smartd to run how I'd want it (I'd like to run smartd selectively, if anything, so the service isn't running at all times on the machines, but still have it able to throw errors to the event log);
- grinding my teeth through trying to do regex in VB so I can easily call cscript script.vbs on Windows client machines; or
- brushing up on my Perl and distributing a small Windows-based Perl compiler to all the managed workstations so I can run a script.
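The conditional on WHEN_FAILED and TYPE = Pre-fail mentioned above might look something like this in Python. It is a sketch only: the function name `failed_prefail_attributes` and the sample rows are made up for illustration, and it assumes the usual ten-column attribute table that `smartctl -A` prints for ATA drives.

```python
# Illustrative sample in the layout of `smartctl -A` output (values invented).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   020   020   036    Pre-fail  Always   FAILING_NOW 1563
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
"""


def failed_prefail_attributes(smartctl_output):
    """Return (name, when_failed) for Pre-fail attributes that have failed.

    An attribute counts as failed when its WHEN_FAILED column is anything
    other than "-" (smartctl prints FAILING_NOW or In_the_past).
    """
    failed = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, attr_type, when_failed = fields[1], fields[6], fields[8]
        if attr_type == "Pre-fail" and when_failed != "-":
            failed.append((name, when_failed))
    return failed
```

Splitting on whitespace avoids regular expressions entirely, which may matter if the same logic has to be ported to VBScript. Note that this only catches attributes that have already crossed the vendor threshold, so it complements rather than replaces watching the raw reallocated count.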
#7
Previously Phil wrote:
> On Feb 20, 11:52 am, Arno Wagner wrote:

>> That is definitely possible. I used to have a cron job that ran smartctl every hour and evaluated the results with a Perl script against the stored previous values. It took about a day to write and ran for several years on 24 PCs without problems.

> Did you just run a regex against a/specific line(s) of the smartctl -a output? I was thinking something along those lines, or a conditional on WHEN_FAILED and TYPE = Pre-fail.

I basically isolated temperature and reallocated count with regexps.

> I'm not sure which will take more time - getting smartd to run how I'd want it (I'd like to run smartd selectively, if anything, so the service wasn't running at all times on the machines...but still have it able to throw errors to the event log), grinding my teeth through trying to do regex in VB so I can easily call a cscript script.vbs on Windows client machines, or touching up on my perl and distributing a small Windows-based perl compiler out to all the managed workstations so I can run a script.

If you have to do this on Windows, I would suggest trying out smartd first, although I do not know whether it can send email on Windows. If you write something yourself, it is best to install Perl for Windows, I think, since regexps in Perl are superior to any other implementation I have seen.

Arno
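Isolating the temperature and the reallocated count with regular expressions could be sketched like this. The thread describes a Perl script that is not shown, so this Python version is an assumption about the approach, not a reconstruction. The names `raw_value_pattern` and `extract_health` are hypothetical, and the attribute IDs and names (5 `Reallocated_Sector_Ct`, 194 `Temperature_Celsius`) are the common ones but vary between drive vendors.

```python
import re


def raw_value_pattern(attr_id, name):
    """Build a regex capturing the leading integer of an attribute's RAW_VALUE.

    After the ID and name, seven columns are skipped (FLAG, VALUE, WORST,
    THRESH, TYPE, UPDATED, WHEN_FAILED) before the raw value is captured.
    """
    return re.compile(
        rf"^[ \t]*{attr_id}[ \t]+{re.escape(name)}[ \t]+(?:\S+[ \t]+){{7}}(\d+)",
        re.MULTILINE,
    )


REALLOCATED = raw_value_pattern(5, "Reallocated_Sector_Ct")
TEMPERATURE = raw_value_pattern(194, "Temperature_Celsius")

# Illustrative sample in the layout of `smartctl -A` output (values invented).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       4
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (0 18 0 0 0)
"""


def extract_health(smartctl_output):
    """Return the two raw values as integers, or None where not found."""
    found = {}
    for key, pattern in (("reallocated", REALLOCATED), ("temperature", TEMPERATURE)):
        match = pattern.search(smartctl_output)
        found[key] = int(match.group(1)) if match else None
    return found
```

Capturing only the leading integer matters for the temperature line, where some drives append extra figures in parentheses after the current reading.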