Bad sectors on new IDE drive

#1 April 17th 04, 12:19 AM

I have a brand new IDE drive (250GB Maxtor) that has a sector
that consistently triggers a CRC error on read. I find this
puzzling, as I thought IDE disks were supposed to automatically
remap bad sectors, and in my past experience they always have.

However, this is not the first time in the past few years that I
have come across IDE disks that don't remap. Can someone clue me
in as to what's happening?

--
André Majorel URL:http://www.teaser.fr/~amajorel/
"Finally I am becoming stupider no more." -- Paul Erdös' epitaph

#2 April 17th 04, 01:50 AM

Andre Majorel wrote:
> I have a brand new IDE drive (250GB Maxtor) that has a sector
> that consistently triggers a CRC error on read. I find this
> puzzling, as I thought IDE disks were supposed to automatically
> remap bad sectors, and in my past experience they always have.
>
> However, this is not the first time in the past few years that I
> have come across IDE disks that don't remap. Can someone clue me
> in as to what's happening?

They should silently remap bad sectors internally until all of the
"spare" sectors are used up. When that point is reached, the bad
sectors become visible.

Perhaps the drive in question is actually a refurb, or was mis-handled
during shipping. Or it just happens to have a manufacturing defect.

-WD

#3 April 18th 04, 02:26 PM

Andre Majorel wrote:
> I have a brand new IDE drive (250GB Maxtor) that has a sector
> that consistently triggers a CRC error on read. I find this
> puzzling, as I thought IDE disks were supposed to automatically
> remap bad sectors, and in my past experience they always have.
>
> However, this is not the first time in the past few years that I
> have come across IDE disks that don't remap. Can someone clue me
> in as to what's happening?

Well, it could be that all the "spare" sectors are used up. If that's
happened it won't remap at all, because there's no place to remap to.

If that happens on a NEW disk, get it replaced...

Also, you're talking about CRC errors on read. Remember that
remapping on WRITE is trivial, since the data that was lost wasn't
interesting anyway (it was going to be overwritten).

For READ it's more complicated. If the drive can recover the data
using the ECC codes, it can just remap the sector and be done with
it. But if the data is lost, that must be handled specially: the
error MUST be reported back to the upper layers, so that they know
data has been lost.

It can either remap the sector immediately, but mark it as
"temporarily bad", or defer the remapping until the sector is written
the next time (marking the sector as needing remapping). In either
case it will continue to report CRC errors until the sector has been
written, since that's the only way to indicate that the real data has
been lost!
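
To make the deferred-remap option concrete, here is a toy Python
model of it. It is only a sketch of the behaviour described above,
not how any real firmware is written; the names ToyDrive, damaged
and physically_bad are invented for the illustration. A failed read
marks the sector as pending and keeps failing; the next write either
reuses the sector (media was fine) or takes a spare (media was bad).

```python
class UncorrectableError(OSError):
    """Stands in for the UNC error the drive reports to the host."""


class ToyDrive:
    """Toy model of deferred remapping. Not real firmware."""

    def __init__(self, spares, damaged=(), physically_bad=()):
        self.spares = spares                        # spare sectors still free
        self.physically_bad = set(physically_bad)   # media cannot hold data
        # A physically bad sector has also lost its contents.
        self.damaged = set(damaged) | self.physically_bad
        self.pending = set()    # failed a read, waiting for a write
        self.remapped = set()   # now served from a spare
        self.data = {}

    def read(self, lba):
        if lba in self.damaged:
            # The data is gone, so keep reporting the error on every read.
            self.pending.add(lba)
            raise UncorrectableError(f"UNC at LBA {lba}")
        return self.data.get(lba, bytes(512))

    def write(self, lba, payload):
        if lba in self.physically_bad and lba not in self.remapped:
            if self.spares == 0:
                raise UncorrectableError("no spare sectors left")
            self.spares -= 1
            self.remapped.add(lba)
        # New data replaces the lost data, so the error condition ends.
        self.damaged.discard(lba)
        self.pending.discard(lba)
        self.data[lba] = payload


if __name__ == "__main__":
    drive = ToyDrive(spares=1, damaged=[10], physically_bad=[20])
    for lba in (10, 20):
        try:
            drive.read(lba)
        except UncorrectableError as err:
            print(err)
    drive.write(10, b"x" * 512)   # media was fine: sector is reused
    drive.write(20, b"y" * 512)   # media was bad: sector is remapped
    print(sorted(drive.remapped), drive.spares)
```

Note that only the write changes anything: reading a pending sector
again just repeats the error.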

It's of course also possible to ignore the data loss and just silently
remap unreadable sectors during read, but one could at least HOPE
that no IDE manufacturer cares that little about their customers'
data! :-)

#4 April 18th 04, 07:06 PM

On 2004-04-18, Torbjorn Lindgren wrote:
> Andre Majorel wrote:
> > I have a brand new IDE drive (250GB Maxtor) that has a sector
> > that consistently triggers a CRC error on read. I find this
> > puzzling, as I thought IDE disks were supposed to automatically
> > remap bad sectors, and in my past experience they always have.
> >
> > However, this is not the first time in the past few years that I
> > have come across IDE disks that don't remap. Can someone clue me
> > in as to what's happening?
>
> Well, it could be that all the "spare" sectors are used up. If that's
> happened it won't remap at all, because there's no place to remap to.
>
> If that happens on a NEW disk, get it replaced...

This doesn't seem to be the case:

| # smartctl -a /dev/hdc
| Device Model: Maxtor 7Y250P0
| SMART support is: Available - device has SMART capability.
| SMART support is: Enabled

| capabilities: (0x5b) SMART execute Offline immediate.
|                      Auto Offline data collection on/off support.
|                      Suspend Offline collection upon new command.
|                      Offline surface scan supported.
|                      Self-test supported.
|                      No Conveyance Self-test supported.
|                      Selective Self-test supported.
|
| SMART Attributes Data Structure revision number: 16
| Vendor Specific SMART Attributes with Thresholds:
| ID# ATTRIBUTE_NAME          FLAG   VAL WOR THR TYPE     UPDATED RAW_VALUE
|   3 Spin_Up_Time            0x0027 252 252 063 Pre-fail Always  1311
|   4 Start_Stop_Count        0x0032 253 253 000 Old_age  Always  4
|   5 Reallocated_Sector_Ct   0x0033 253 253 063 Pre-fail Always  2

I'm not quite sure how to read SMART values but 253 seems to
mean "very good" or "perfect". The raw value (2) may be the
number of remapped sectors.
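
For reading the table: VAL is a normalized health figure and THR is
the failure threshold, and per the smartctl documentation an
attribute counts as failed once VAL drops to THR or below. The raw
value is vendor-specific. A minimal Python sketch of that check,
assuming the abbreviated nine-column layout pasted here (the names
SmartAttr, parse_attr and has_failed are made up for the example):

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int    # normalized value (VAL)
    worst: int    # lowest normalized value seen so far (WOR)
    thresh: int   # failure threshold (THR)
    kind: str     # Pre-fail or Old_age
    updated: str
    raw: str      # vendor-specific, so kept as text


def parse_attr(line: str) -> SmartAttr:
    """Parse one row in the nine-column layout pasted in this post."""
    f = line.lstrip("| ").split(None, 8)
    return SmartAttr(int(f[0]), f[1], int(f[2], 16), int(f[3]),
                     int(f[4]), int(f[5]), f[6], f[7], f[8].strip())


def has_failed(attr: SmartAttr) -> bool:
    # An attribute counts as failed once VAL drops to THR or below.
    return attr.value <= attr.thresh


if __name__ == "__main__":
    row = "| 5 Reallocated_Sector_Ct 0x0033 253 253 063 Pre-fail Always 2"
    attr = parse_attr(row)
    print(attr.name, attr.value, attr.thresh, has_failed(attr))
```

So 253 against a threshold of 63 is comfortably healthy as far as
the normalized value goes; the raw count has to be judged separately.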

Is there any way to get the details of reallocated sectors from
the drive, by the way?

|   6 Read_Channel_Margin     0x0001 253 253 100 Pre-fail Offline 0
|   7 Seek_Error_Rate         0x000a 253 252 000 Old_age  Always  0
|   8 Seek_Time_Performance   0x0027 252 251 187 Pre-fail Always  53547

Brand new:

|   9 Power_On_Minutes        0x0032 253 253 000 Old_age  Always  39h+19m

The drive seems to be in fairly good shape so far:

|  10 Spin_Retry_Count        0x002b 252 252 157 Pre-fail Always  0
|  11 Calibration_Retry_Count 0x002b 252 252 223 Pre-fail Always  0
|  12 Power_Cycle_Count       0x0032 253 253 000 Old_age  Always  6
| 192 Power-Off_Retract_Count 0x0032 253 253 000 Old_age  Always  0
| 193 Load_Cycle_Count        0x0032 253 253 000 Old_age  Always  0
| 194 Temperature_Celsius     0x0032 253 253 000 Old_age  Always  43
| 195 Hardware_ECC_Recovered  0x000a 253 252 000 Old_age  Always  12285
| 196 Reallocated_Event_Count 0x0008 253 253 000 Old_age  Offline 0
| 197 Current_Pending_Sector  0x0008 253 253 000 Old_age  Offline 2
| 198 Offline_Uncorrectable   0x0008 253 253 000 Old_age  Offline 0
| 199 UDMA_CRC_Error_Count    0x0008 199 199 000 Old_age  Offline 0
| 200 Multi_Zone_Error_Rate   0x000a 253 252 000 Old_age  Always  0
| 201 Soft_Read_Error_Rate    0x000a 253 252 000 Old_age  Always  0
| 202 TA_Increase_Count       0x000a 253 252 000 Old_age  Always  0
| 203 Run_Out_Cancel          0x000b 253 252 180 Pre-fail Always  0
| 204 Shock_Count_Write_Opern 0x000a 253 252 000 Old_age  Always  0
| 205 Shock_Rate_Write_Opern  0x000a 253 252 000 Old_age  Always  0
| 207 Spin_High_Current       0x002a 252 252 000 Old_age  Always  0
| 208 Spin_Buzz               0x002a 252 252 000 Old_age  Always  0
| 209 Offline_Seek_Performnce 0x0024 149 149 000 Old_age  Offline 0
|  99 Unknown_Attribute       0x0004 253 253 000 Old_age  Offline 0
| 100 Unknown_Attribute       0x0004 253 253 000 Old_age  Offline 0
| 101 Unknown_Attribute       0x0004 253 253 000 Old_age  Offline 0

The last five errors were uncorrectable read errors like this
one:

| Error 352 occurred at disk power-on lifetime: 22 hours
| When the command that caused the error occurred, the device
| was in an unknown state.
|
| After command completion occurred, registers were:
| ER ST SC SN CL CH DH
| -- -- -- -- -- -- --
| 40 51 2a d6 bc 43 e0 Error: UNC 42 sectors at LBA = 0x0043bcd6 = 4439254
|
| Commands leading to the command that caused the error were:
| CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
| -- -- -- -- -- -- -- -- --------- --------------------
| 25 00 2a d6 bc 43 e0 08 13047.376 READ DMA EXT
| 25 00 2c d4 bc 43 e0 08 12980.816 READ DMA EXT
| 25 00 2e d2 bc 43 e0 08 12979.792 READ DMA EXT
| 25 00 30 d0 bc 43 e0 08 12978.736 READ DMA EXT
| 25 00 32 ce bc 43 e0 08 12977.712 READ DMA EXT
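
The LBA that smartctl prints can be rebuilt from the SN, CL, CH and
DH registers in the log. The Python sketch below assumes the 28-bit
layout (low nibble of DH, then CH, CL, SN), which matches the value
shown above; for READ DMA EXT the log only shows the low-order bytes,
so the high-order ones are assumed to be zero. The function names
are made up for the example.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def command_range(row: str):
    """Return (start LBA, sector count) for one CR..DC command row."""
    cr, fr, sc, sn, cl, ch, dh, dc = bytes.fromhex(row)
    return lba28_from_registers(sn, cl, ch, dh), sc


if __name__ == "__main__":
    # Register row after the error: ER ST SC SN CL CH DH
    er, st, sc, sn, cl, ch, dh = bytes.fromhex("40 51 2a d6 bc 43 e0")
    lba = lba28_from_registers(sn, cl, ch, dh)
    print(f"UNC {sc} sectors at LBA = 0x{lba:08x} = {lba}")

    # The five READ DMA EXT commands from the log.
    for row in ("25 00 2a d6 bc 43 e0 08",
                "25 00 2c d4 bc 43 e0 08",
                "25 00 2e d2 bc 43 e0 08",
                "25 00 30 d0 bc 43 e0 08",
                "25 00 32 ce bc 43 e0 08"):
        start, count = command_range(row)
        print(f"start 0x{start:08x}, {count} sectors, end 0x{start + count:08x}")
```

Worked through by hand, all five commands end at the same LBA
(start + count = 0x0043bd00); only the starting point moves.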

> Also, you're talking about CRC errors on read. Remember that
> remapping on WRITE is trivial, since the data that was lost wasn't
> interesting anyway (it was going to be overwritten).
>
> For READ it's more complicated. If the drive can recover the data
> using the ECC codes, it can just remap the sector and be done with
> it. But if the data is lost, that must be handled specially: the
> error MUST be reported back to the upper layers, so that they know
> data has been lost.
>
> It can either remap the sector immediately, but mark it as
> "temporarily bad", or defer the remapping until the sector is written
> the next time (marking the sector as needing remapping). In either
> case it will continue to report CRC errors until the sector has been
> written, since that's the only way to indicate that the real data has
> been lost!

OK, got it. Thank you.

--
André Majorel URL:http://www.teaser.fr/~amajorel/
"Finally I am becoming stupider no more." -- Paul Erdös' epitaph

#5 April 19th 04, 04:49 PM

> | 197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline 2

Andre, this is a sign of trouble. There are two sectors on the disk
that could not be read by the operating system.

> | 40 51 2a d6 bc 43 e0 Error: UNC 42 sectors at LBA = 0x0043bcd6 = 4439254

One of the unreadable sectors is at LBA = 0x0043bcd6 = 4439254. It is
uncorrectable, meaning that the ECC bytes are inconsistent.

Have a look at http://smartmontools.sourceforge.net/BadBlockHowTo.txt
for some suggestions.

If you run an extended self-test '-t long' on the disk, it should fail
at these unreadable LBAs.

Bruce

#6 April 21st 04, 02:57 AM

On 2004-04-19, Bruce Allen wrote:

> > | 197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline 2
>
> Andre, this is a sign of trouble. There are two sectors on the disk
> that could not be read by the operating system.
>
> > | 40 51 2a d6 bc 43 e0 Error: UNC 42 sectors at LBA = 0x0043bcd6 = 4439254
>
> One of the unreadable sectors is at LBA = 0x0043bcd6 = 4439254. It is
> uncorrectable, meaning that the ECC bytes are inconsistent.
>
> Have a look at http://smartmontools.sourceforge.net/BadBlockHowTo.txt
> for some suggestions.

Interesting reading, thank you for the link (and for
smartmontools, too).

I used a destructive bad blocks scanner and the UNC errors went
away. Interestingly, the raw value for Reallocated_Sector_Ct is
now zero. This suggests that, after rewriting the sector, the
drive found it reliable enough to keep using it. A bit scary,
but at that price, I guess I can't complain.
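
For reference, the part of a destructive scan that matters here is
simply the write: once the pending sector is overwritten, the drive
can stop reporting the error. A minimal Python sketch of overwriting
one sector by LBA, assuming 512-byte sectors; the function names are
made up, and the demo deliberately runs against a scratch file. Run
against a real device node it would destroy that sector's contents.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def byte_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte position of a sector within the device or image."""
    return lba * sector_size


def overwrite_sector(path: str, lba: int,
                     sector_size: int = SECTOR_SIZE) -> None:
    """Zero one sector of an existing file or device node.

    Destroys whatever that sector held.
    """
    with open(path, "r+b") as dev:
        dev.seek(byte_offset(lba, sector_size))
        dev.write(bytes(sector_size))
        dev.flush()
        os.fsync(dev.fileno())


def demo() -> bytes:
    """Try it on a scratch file of 8 sectors rather than a real disk."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\xff" * (8 * SECTOR_SIZE))
    try:
        overwrite_sector(tmp.name, 3)
        with open(tmp.name, "rb") as f:
            return f.read()
    finally:
        os.unlink(tmp.name)


if __name__ == "__main__":
    image = demo()
    print(image[3 * SECTOR_SIZE:4 * SECTOR_SIZE] == bytes(SECTOR_SIZE))
    print(byte_offset(4439254))
```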

--
André Majorel URL:http://www.teaser.fr/~amajorel/
"Finally I am becoming stupider no more." -- Paul Erdös' epitaph

#7 April 21st 04, 12:03 PM

> > > | 197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline 2
> >
> > Andre, this is a sign of trouble. There are two sectors on the disk
> > that could not be read by the operating system.
> >
> > > | 40 51 2a d6 bc 43 e0 Error: UNC 42 sectors at LBA = 0x0043bcd6 = 4439254
> >
> > One of the unreadable sectors is at LBA = 0x0043bcd6 = 4439254. It is
> > uncorrectable, meaning that the ECC bytes are inconsistent.
> >
> > Have a look at http://smartmontools.sourceforge.net/BadBlockHowTo.txt
> > for some suggestions.
>
> Interesting reading, thank you for the link (and for
> smartmontools, too).

You're welcome.

> I used a destructive bad blocks scanner and the UNC errors went
> away.

Good -- problem fixed.

> Interestingly, the raw value for Reallocated_Sector_Ct is
> now zero. This suggests that, after rewriting the sector, the
> drive found it reliable enough to keep using it.

That's possible. If the drive was powered off when the sector was
being written, that might have made it uncorrectable because it had a
corrupted ECC value written to the disk. In which case, when the
sector is written again, the ECC code gets laid down consistently and
all is well.
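
A toy illustration of that scenario in Python, with CRC-32 standing
in for the drive's ECC (an assumption for the sketch: real ECC is a
different and stronger code that can also correct errors, and the
function names are made up). An interrupted write leaves new data
next to old check bytes, so the read fails; a complete rewrite makes
the two agree again on the very same sector.

```python
import zlib

SECTOR = 512


def encode(data: bytes) -> bytes:
    """Return data followed by 4 check bytes (CRC-32 standing in for ECC)."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def read_sector(stored: bytes) -> bytes:
    """Return the data, or fail if data and check bytes disagree."""
    data, check = stored[:-4], stored[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != check:
        raise OSError("UNC: data and check bytes disagree")
    return data


def torn_write(stored: bytes, new_data: bytes, written: int) -> bytes:
    """Simulate power loss after only `written` bytes of new_data landed."""
    return new_data[:written] + stored[written:]


if __name__ == "__main__":
    on_disk = encode(b"A" * SECTOR)
    on_disk = torn_write(on_disk, b"B" * SECTOR, 4)   # power cut mid-write
    try:
        read_sector(on_disk)
    except OSError as err:
        print(err)
    on_disk = encode(bytes(SECTOR))                   # full rewrite
    print(read_sector(on_disk) == bytes(SECTOR))
```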

> A bit scary, but at that price, I guess I can't complain.

There's nothing to be scared about if you back up your data. So don't
complain, make backups instead.

Cheers,
Bruce
