A computer components & hardware forum. HardwareBanter


Bad sectors/blocks - automating discovery of hard drives 'going bad'



 
 
  #1  
February 16th 07, 06:19 PM, posted to comp.sys.ibm.pc.hardware.storage
Phil[_2_]

I'm not sure if this is the right group for this discussion, but I had
a couple questions in relation to bad sectors and the correlation of a
hard drive nearing a point of failure.


We currently use software to monitor, among other things, event log
errors on Windows machines. Windows will write error messages to the
system log when it finds a bad disk block. Sometimes these come in
large numbers (groups of 10+ messages at a time) and/or appear
frequently even after running, say, chkdsk.


My questions mainly concern stand-alone IDE or SATA
hard drives, not RAID configurations of any sort, though I'm not sure of
potential SMART status given that I'm thinking in very general terms
about a large number of different computers & networks. How accurate
are the Windows event log messages in indicating that a hard drive has
a good chance of going bad soon and should be replaced? Is there a
threshold of sorts? Are there better software tools (small Linux-distro
utilities, perhaps) to monitor the actual physical health of a
disk, or to get a better picture of disk health going forward?


In general, I'm looking for a good way to automate disk health
checking in order to accurately tell a client "You need to buy a new
hard drive" before the disk itself is mucked past the point of simple
data backup/recovery operations.

  #2  
February 17th 07, 02:09 AM, posted to comp.sys.ibm.pc.hardware.storage
Rod Speed

Phil wrote:

I'm not sure if this is the right group for this discussion,


Yes it is.

but I had a couple questions in relation to bad sectors and
the correlation of a hard drive nearing a point of failure.


We currently use software to monitor, among other things,
event log errors on Windows machines. Windows will write
error messages to the system log when it finds a bad disk block.
Sometimes these come in large numbers (groups of 10+ messages
at a time) and/or appear frequently even after running, say, chkdsk.


The hard drive SMART data is much better for bad sectors that show up.

Everest shows that data most readably and you need to
focus on the actual numbers reported, not just the OKs.
http://www.majorgeeks.com/download.php?det=4181

My questions primarily reside in the nature of stand-alone IDE or SATA
hard drives, not RAID configurations of any sort, though not sure of
potential SMART status given that I'm thinking in very general terms
with a large amount of different computers & networks. How accurate
are the Windows event log messages in indicating that a hard drive
has a good potential of going bad soon and should be replaced?


Nowhere near as good as the SMART data.

Is there a threshold of sorts?


Yes, one or two reallocated sectors are nothing to worry about; many
more than that, with more showing up over time, is an indication that
something is going bad. Not necessarily the hard drive tho, it can be just
the drive running at too high a temperature or a power supply going bad.

Are there better software tools (small Linux-distro utilities,
perhaps) to monitor the actual physical health of a disk,
or to get a better picture of disk health going forward?


Yes, Everest or smartctl.

In general, I'm looking for a good way to automate disk
health checking in order to accurately tell a client "You
need to buy a new hard drive" before the disk itself is mucked
past the point of simple data backup/recovery operations.



  #3  
February 17th 07, 04:21 AM, posted to comp.sys.ibm.pc.hardware.storage
Arno Wagner

Previously Phil wrote:
I'm not sure if this is the right group for this discussion, but I had
a couple questions in relation to bad sectors and the correlation of a
hard drive nearing a point of failure.



We currently use software to monitor, among other things, event log
errors on Windows machines. Windows will write error messages to the
system log when it finds a bad disk block. Sometimes these come in
large numbers (groups of 10+ messages at a time) and/or appear
frequently even after running, say, chkdsk.



My questions primarily reside in the nature of stand-alone IDE or SATA
hard drives, not RAID configurations of any sort, though not sure of
potential SMART status given that I'm thinking in very general terms
with a large amount of different computers & networks. How accurate
are the Windows event log messages in indicating that a hard drive has
a good potential of going bad soon and should be replaced?


Not very.

Is there a
threshold of sorts?


No.

Are there better software tools (small Linux-
distro utilities, perhaps) to monitor the actual physical health of a
disk, or to get a better picture of disk health going forward?


Definitely. For bad sectors, look at the reallocated sector count in the
SMART attributes. It will give you a far more accurate bad sector
estimate than the event log, since marginal sectors are in here as well.
You can also look for other exceeded or suspicious SMART attributes.
The tool would just be smartmontools, with automatic monitoring done
by smartd (actions and thresholds are user-defined) and smartctl for
direct querying.

In general, I'm looking for a good way to automate disk health
checking in order to accurately tell a client "You need to buy a new
hard drive" before the disk itself is mucked past the point of simple
data backup/recovery operations.


What has worked well for me is to monitor the
reallocated sector count for an increase of, say, more than 10 in a
week and the other attributes for an exceeded threshold. I have smartd
send email in case the reallocated count increases. It is also a good idea to
run a full SMART self-test (smartctl -t long device) regularly.
I usually run one every 14 days from a cron job (anacron for
machines that are not always on). YMMV.

Arno
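
The weekly check described above (alert when the reallocated sector count grows by more than 10 within a week) could be sketched roughly as follows in Python. This is only an illustration, not Arno's actual setup: the function names, the (timestamp, count) sample format, and the choice to compare against the oldest sample inside the window are all assumptions.

```python
from datetime import datetime, timedelta


def reallocated_increase(samples, now, window=timedelta(days=7)):
    """Growth of the reallocated sector count inside the window ending at `now`.

    `samples` is a list of (datetime, count) pairs in any order
    (a hypothetical format chosen for this sketch).
    """
    recent = sorted((t, c) for t, c in samples if now - window <= t <= now)
    if len(recent) < 2:
        return 0  # not enough history in the window to judge
    return recent[-1][1] - recent[0][1]


def should_alert(samples, now, limit=10):
    """True when the count grew by more than `limit` within the last week."""
    return reallocated_increase(samples, now) > limit


# Made-up readings for illustration only.
history = [
    (datetime(2007, 2, 1), 0),   # outside the 7-day window, ignored
    (datetime(2007, 2, 14), 2),
    (datetime(2007, 2, 17), 5),
    (datetime(2007, 2, 20), 15),
]
print(should_alert(history, now=datetime(2007, 2, 20)))
```

The limit of 10 and the 7-day window come from the post; both are parameters so they can be tuned per site.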


  #4  
February 20th 07, 04:32 PM, posted to comp.sys.ibm.pc.hardware.storage
Phil[_2_]

On Feb 16, 9:21 pm, Arno Wagner wrote:
Are there better software tools (small Linux-
distro utilities, perhaps) to monitor the actual physical health of a
disk, or to get a better picture of disk health going forward?


Definitely. For bad sectors, look at the reallocated sector count in the
SMART attributes. It will give you a far more accurate bad sector
estimate than the event log, since marginal sectors are in here as well.
You can also look for other exceeded or suspicious SMART attributes.
The tool would just be smartmontools, with automatic monitoring done
by smartd (actions and thresholds are user-defined) and smartctl for
direct querying.


What has worked well for me is to monitor the
reallocated sector count for an increase of, say, more than 10 in a
week and the other attributes for an exceeded threshold. I have smartd
send email in case the reallocated count increases. It is also a good idea to
run a full SMART self-test (smartctl -t long device) regularly.
I usually run one every 14 days from a cron job (anacron for
machines that are not always on). YMMV.




Thanks for the tips. I'll have to mess around with smartctl & smartd
more to figure out how to enumerate the reallocated sector count (if I
can get enough information from just smartctl, that'd be best, for I
can handle things like scheduling and automated email alerts
elsewhere) and any other pertinent SMART data I would need.


  #5  
February 20th 07, 06:52 PM, posted to comp.sys.ibm.pc.hardware.storage
Arno Wagner

Previously Phil wrote:
On Feb 16, 9:21 pm, Arno Wagner wrote:
Are there better software tools (small Linux-
distro utilities, perhaps) to monitor the actual physical health of a
disk, or to get a better picture of disk health going forward?


Definitely. For bad sectors, look at the reallocated sector count in the
SMART attributes. It will give you a far more accurate bad sector
estimate than the event log, since marginal sectors are in here as well.
You can also look for other exceeded or suspicious SMART attributes.
The tool would just be smartmontools, with automatic monitoring done
by smartd (actions and thresholds are user-defined) and smartctl for
direct querying.


What has worked well for me is to monitor the
reallocated sector count for an increase of, say, more than 10 in a
week and the other attributes for an exceeded threshold. I have smartd
send email in case the reallocated count increases. It is also a good idea to
run a full SMART self-test (smartctl -t long device) regularly.
I usually run one every 14 days from a cron job (anacron for
machines that are not always on). YMMV.




Thanks for the tips. I'll have to mess around with smartctl & smartd
more to figure out how to enumerate the reallocated sector count (if I
can get enough information from just smartctl, that'd be best, for I
can handle things like scheduling and automated email alerts
elsewhere) and any other pertinent SMART data I would need.


That is definitely possible. I used to have a cron job that ran
smartctl every hour and evaluated the results with a Perl script and
the stored previous values. It took about a day to write and ran
for several years on 24 PCs without problems.

Arno
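
The "stored previous values" part of such a job might look something like the Python sketch below. It is not the Perl script mentioned above, which is not shown in the thread; the JSON state file, the function name, and the dictionary layout are assumptions made for illustration. How the current values are obtained from smartctl is left out here.

```python
import json
from pathlib import Path


def compare_with_previous(state_file, current):
    """Return {name: (old, new)} for every value that rose since the last run.

    `current` maps attribute names to raw integer values (hypothetical layout).
    The current readings are then saved so the next run can compare against them.
    """
    path = Path(state_file)
    previous = json.loads(path.read_text()) if path.exists() else {}
    increased = {
        name: (previous[name], value)
        for name, value in current.items()
        if name in previous and value > previous[name]
    }
    path.write_text(json.dumps(current))
    return increased
```

Called once per scheduled run, the first invocation only records a baseline and reports nothing; later runs report any attribute whose raw value went up, which the caller can turn into an email or event log entry.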


  #6  
February 20th 07, 07:14 PM, posted to comp.sys.ibm.pc.hardware.storage
Phil[_2_]

On Feb 20, 11:52 am, Arno Wagner wrote:

That is definitely possible. I used to have a cron job that ran
smartctl every hour and evaluated the results with a Perl script and
the stored previous values. It took about a day to write and ran
for several years on 24 PCs without problems.



Did you just run a regex against a specific line (or lines) of the smartctl -a
output? I was thinking something along those lines, or a conditional
on WHEN_FAILED and TYPE = Pre-fail.


I'm not sure which will take more time - getting smartd to run how I'd
want it (I'd like to run smartd selectively, if anything, so the
service wasn't running at all times on the machines...but still have
it able to throw errors to the event log), grinding my teeth through
trying to do regex in VB so I can easily call a cscript script.vbs
on Windows client machines, or touching up on my perl and distributing
a small Windows-based perl compiler out to all the managed
workstations so I can run a script.
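
The conditional on WHEN_FAILED and TYPE = Pre-fail mentioned in this post could be sketched in Python roughly as below. It assumes the usual ten-column attribute table that smartctl prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), where WHEN_FAILED is "-" for attributes that have never failed. The sample text is made up for illustration and is not output from a real drive; the function name is hypothetical.

```python
# Illustrative sample shaped like the smartctl attribute table; values are invented.
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 2890
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
194 Temperature_Celsius     0x0022   034   020   025    Old_age   Always   In_the_past 41
"""


def failed_prefail_attributes(report):
    """Return names of Pre-fail attributes whose WHEN_FAILED column is not '-'."""
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, attr_type, when_failed = fields[1], fields[6], fields[8]
        if attr_type == "Pre-fail" and when_failed != "-":
            flagged.append(name)
    return flagged


print(failed_prefail_attributes(SAMPLE_REPORT))
```

Splitting on whitespace avoids regexes entirely for this check, which may matter if the script ends up being written in a language with weaker regex support.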

  #7  
February 20th 07, 09:09 PM, posted to comp.sys.ibm.pc.hardware.storage
Arno Wagner

Previously Phil wrote:
On Feb 20, 11:52 am, Arno Wagner wrote:


That is definitely possible. I used to have a cron job that ran
smartctl every hour and evaluated the results with a Perl script and
the stored previous values. It took about a day to write and ran
for several years on 24 PCs without problems.



Did you just run a regex against a specific line (or lines) of the smartctl -a
output? I was thinking something along those lines, or a conditional
on WHEN_FAILED and TYPE = Pre-fail.


I basically isolated temperature and reallocated count with regexps.

I'm not sure which will take more time - getting smartd to run how I'd
want it (I'd like to run smartd selectively, if anything, so the
service wasn't running at all times on the machines...but still have
it able to throw errors to the event log), grinding my teeth through
trying to do regex in VB so I can easily call a cscript script.vbs
on Windows client machines, or touching up on my perl and distributing
a small Windows-based perl compiler out to all the managed
workstations so I can run a script.


If you have to do this on Windows, I would suggest trying
out smartd first, although I do not know whether it can send
email on Windows. If you write something yourself, it is best to
install Perl for Windows, I think, since regexps in Perl
are superior to any other implementation I have seen.

Arno
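
Isolating temperature and reallocated count with a regular expression, as described above, might look like this in Python (the thread's author used Perl; his script is not shown). The attribute names Reallocated_Sector_Ct and Temperature_Celsius are the ones smartctl commonly prints, but drives differ, so treat them as assumptions. The sample text is invented for illustration.

```python
import re

# Illustrative sample shaped like the smartctl attribute table; values are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       41 (0 14 0 0 0)
"""


def raw_value(report, attribute_name):
    """Return the leading integer of RAW_VALUE for the named attribute, or None."""
    pattern = re.compile(
        r"^[ \t]*\d+[ \t]+" + re.escape(attribute_name) + r"[ \t]+"  # ID and name
        r"(?:\S+[ \t]+){7}"  # skip FLAG through WHEN_FAILED
        r"(\d+)",            # first number in RAW_VALUE
        re.MULTILINE,
    )
    match = pattern.search(report)
    return int(match.group(1)) if match else None


print(raw_value(SAMPLE, "Reallocated_Sector_Ct"), raw_value(SAMPLE, "Temperature_Celsius"))
```

Returning None for a missing attribute lets the caller distinguish "drive does not report this" from a genuine zero.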

 





All times are GMT +1.


Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.