HardwareBanter - a computer components & hardware forum
Disk to disk copying with overclocked memory



 
 
#1 - March 11th 04, 02:07 AM - JT

On Thu, 11 Mar 2004 00:40:47 GMT, Mark M wrote:

I use a partition copier which boots off a floppy disk before any
other OS is launched.

If I copy a partition from one hard drive to another, then is there
any risk of data corruption if the BIOS has been changed to
aggressively speed up the memory settings?

For example the BIOS might set the memory to CAS=2 rather than
CAS=3. Or other memory timing intervals might also be set to be
shorter than is normal.

I am thinking that maybe the IDE cable and drive controllers handle
data fairly independently of the memory on the motherboard. So
maybe data just flows up and down the IDE cable and maybe the
motherboard is not involved except for sync pulses.

There are three scenarios I am thinking about:

(1) Copying a partition from one hard drive on one IDE cable to
another hard drive on a different IDE cable.

(2) Copying a partition from one hard drive to another which is on
the same IDE cable.

(3) Copying one partition to another on the same hard drive.

How much effect would "over-set" memory have on these situations?

Do the answers to any of the above three scenarios change if the
copying of large amounts of data files is done from within WinXP?
Personally, I would guess that it is more likely that motherboard
memory comes into play if Windows is involved.


1. All copies go through memory, using at least a block-sized RAM buffer.
Buffers at least large enough to hold an entire track will be used,
probably larger for efficiency. Data is always copied from a drive to
a memory buffer first. The transfer might be done by DMA (the M stands
for memory), but it is always to and from memory. Which part of memory
is used varies with the program and whether you are running it under
Windows, but a single bit error in the wrong place in memory can be a
major problem. (See the sketch after point 2.)

2. If your memory timing is aggressive enough that errors are likely, then
there are a number of things that could go wrong. There could be an error
in the data that gets copied. You could also have the wrong disk address
stored in RAM, so the data goes to the wrong place. It could be the wrong
instruction, so the program crashes. It could be any one of hundreds of
possible single-bit failures that might go unnoticed. ECC would help here
(it would catch most possible memory errors). If you want reliability in
anything (not just copying disks), don't push your memory (or other
components) to the edge.
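
A minimal sketch of point 1 in Python, to make it concrete: even the
simplest disk-to-disk copy stages every block in a RAM buffer on its way
between drives. The device paths and buffer size here are made up for
illustration.

SRC = "/dev/sdb1"   # hypothetical source partition
DST = "/dev/sdc1"   # hypothetical destination partition
BLOCK = 1 << 20     # 1 MiB buffer; every byte passes through this RAM

with open(SRC, "rb") as src, open(DST, "wb") as dst:
    while True:
        buf = src.read(BLOCK)   # drive -> memory buffer
        if not buf:             # end of source partition
            break
        dst.write(buf)          # memory buffer -> drive

If that buffer (or the pointers and counters around it) picks up a flipped
bit from marginal RAM, the corruption lands on the destination drive with
no error reported.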

JT
#2 - March 11th 04, 02:49 AM - Colin Painter

If I can add a bit to JT's reply...

If you are overclocking your memory you risk getting more errors than the
guys who built the memory planned on. If the memory is not ECC memory then
you may get more single bit errors which will cause your machine to stop
when they occur. ECC memory can correct single bit errors but non-ECC memory
can only detect them and when that happens windows will blue screen. Most
home PCs have non-ECC memory because it's cheaper.

Overclocking could also cause the occasional double bit error which non-ECC
memory cannot detect. This would be bad. As JT indicates, this could cause
all sorts of mayhem. If you're lucky, windows could execute a broken
instruction or reference a memory address in outer space and then blue
screen. If you are unlucky it could blunder on using bad data and do
something nasty to your file system (or it could harmlessly stick an umlaut
onto the screen somewhere.) Hard to predict.

cp




"Mark M" wrote in message
...
I use a partition copier which boots off a floppy disk before any
other OS is launched.

If I copy a partition from one hard drive to another, then is there
any risk of data corruption if the BIOS has been changed to
aggressively speed up the memory settings?

For example the BIOS might set the memory to CAS=2 rather than
CAS=3. Or other memory timing intervals might also be set to be
shorter than is normal.

I am thinking that maybe the IDE cable and drive controllers handle
data fairly independently of the memory on the motherboard. So
maybe data just flows up and down the IDE cable and maybe the
motherboard is not involved except for sync pulses.

There are three scenarios I am thinking about:

(1) Copying a partition from one hard drive on one IDE cable to
another hard drive on a different IDE cable.

(2) Copying a partition from one hard drive to another which is on
the same IDE cable.

(3) Copying one partition to another on the same hard drive.

How much effect would "over-set" memory have on these situations?

Do the answers to any of the above three scenarios change if the
copying of large amounts of data files is done from within WinXP?
Personally, I would guess that it is more likely that motherboard
memory comes into play if Windows is involved.



#3 - March 11th 04, 06:21 AM - CBFalconer

Colin Painter wrote:
[...] ECC memory can correct single bit errors but non-ECC memory
can only detect them and when that happens windows will blue
screen. [rest snipped]


Correction here - non-ECC memory won't even detect any errors; it
will just use the wrong value. Sometimes that MAY cause the OS to
crash. Unfortunately the rest of the thread is lost due to
top-posting.

--
Chuck F ) )
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net USE worldnet address!

#4 - March 11th 04, 06:37 AM - CJT

CBFalconer wrote:
[quote of Colin Painter snipped]

Correction here - non-ECC memory won't even detect any errors; it
will just use the wrong value. Sometimes that MAY cause the OS to
crash. Unfortunately the rest of the thread is lost due to
top-posting.

You seem to have confused ECC and parity. ECC means error checking
and correcting, which involves more redundancy than simple single-bit
parity error checking.

--
The e-mail address in our reply-to line is reversed in an attempt to
minimize spam. Our true address is of the form .
#5 - March 11th 04, 06:49 AM - Rod Speed


"CJT" wrote in message ...
[earlier quotes snipped]

You seem to have confused ECC and parity.


Or you have. **** all ram is parity anymore.

ECC means error checking and correcting, which involves
more redundancy than simple single bit parity error checking.


Which isn't seen much anymore.


#6 - March 11th 04, 07:41 AM - CBFalconer

CJT wrote:
[earlier quotes snipped]

You seem to have confused ECC and parity. ECC means error checking
and correcting, which involves more redundancy than simple single bit
parity error checking.


Nothing uses parity checking today - that would require writing
individual 9-bit bytes. Expanded to a 64-bit-wide word (for the
various Pentia etc.) the parity or ECC bits both fit in an extra 8
bits, i.e. a 72-bit-wide word. If today's systems have no ECC they
have no checking of any form. ECC is actually no harder to handle
on wide words.

Memory configurations that can use parity can use ECC; the reverse
is not true.

Exception - some embedded systems with smaller memory paths may
use parity.
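
To make the parity half of this concrete, here is a toy sketch in Python
(illustrative only, not any real chipset's logic): one even-parity bit
flags any single flipped bit in a word, but a second flip restores the
parity and the error sails through. ECC adds enough extra bits to locate
single errors instead of merely flagging them.

def parity64(word):
    """Even parity over a 64-bit word. (Real parity memory keeps one
    bit per 8-bit byte; a single bit is used here for brevity.)"""
    p = 0
    for i in range(64):
        p ^= (word >> i) & 1
    return p

data = 0xDEADBEEFCAFEF00D
stored_parity = parity64(data)

one_flip = data ^ (1 << 17)                  # single-bit error
assert parity64(one_flip) != stored_parity   # detected

two_flips = one_flip ^ (1 << 42)             # double-bit error
assert parity64(two_flips) == stored_parity  # slips through unnoticed

print("single flip detected, double flip missed - as expected")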

--
Chuck F ) )
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net USE worldnet address!

#7 - March 11th 04, 02:21 PM - Arno Wagner

In comp.sys.ibm.pc.hardware.storage CBFalconer wrote:
[quote of Colin Painter snipped]

Correction here - non-ECC memory won't even detect any errors; it
will just use the wrong value. Sometimes that MAY cause the OS to
crash. Unfortunately the rest of the thread is lost due to
top-posting.


Crashes are not your worst enemy. Undetected data corruption is.

I once debugged a fileserver that flipped one bit on average per
2GB read or written. It had been used in this condition for
several months by several people on a daily basis. Then one person
noticed that he sometimes got a corrupted archive (it was a large
file) when reading it, and sometimes not. There were likely quite
a few changed files on disk at that time. If you have files that
react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively.
I made it two steps slower, without noticeable impact on performance.
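
For anyone who wants to hunt for this kind of silent corruption, here is a
rough sketch in Python of the sort of test that exposes it (one plausible
way to do it, not Arno's actual procedure): write a reproducible
pseudorandom pattern through the suspect machine, read it back, and count
differing bits. The OS cache may serve some reads from RAM, which is fine
here since RAM is the suspect. The file name and sizes are arbitrary.

import os, random

PATH = "scratch.bin"     # hypothetical scratch file on the suspect box
CHUNK = 1 << 20          # 1 MiB per chunk
CHUNKS = 2048            # ~2 GiB total, about Arno's one-flip horizon

rng = random.Random(42)  # seeded, so the pattern is reproducible
with open(PATH, "wb") as f:
    for _ in range(CHUNKS):
        f.write(rng.randbytes(CHUNK))   # randbytes needs Python 3.9+

rng = random.Random(42)  # regenerate the identical pattern to compare
flips = 0
with open(PATH, "rb") as f:
    for _ in range(CHUNKS):
        expected = rng.randbytes(CHUNK)
        actual = f.read(CHUNK)
        # count differing bits between what was written and what came back
        flips += sum(bin(a ^ b).count("1") for a, b in zip(actual, expected))

print("flipped bits over 2 GiB:", flips)
os.remove(PATH)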

Note on ECC: if you get very few single-bit errors without ECC
active, ECC will likely solve your problem. If you get a lot of
single-bit errors, or even only a very few multiple-bit errors, then
ECC will not really help and will let errors through. For my scenario
(a single random bit every 2GB), ECC would have done fine.

Arno
--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus


#8 - March 11th 04, 02:43 PM - J. Clarke

Arno Wagner wrote:

[earlier quotes and fileserver story snipped]

Note on ECC: if you get very few single-bit errors without ECC
active, ECC will likely solve your problem. If you get a lot of
single-bit errors, or even only a very few multiple-bit errors, then
ECC will not really help and will let errors through. For my scenario
(a single random bit every 2GB), ECC would have done fine.


The ECC implemented on PCs can typically correct 1-bit errors and detect
2-bit errors.
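
That behaviour comes from a SECDED (single-error-correct, double-error-
detect) code over each 64-bit word - the 72-bit path described above. As a
toy illustration only (the same idea shrunk to 4 data bits, assuming
nothing about any real memory controller), here is an extended
Hamming(8,4) sketch in Python:

def encode(nibble):
    """Extended Hamming(8,4): 4 data bits + 3 Hamming parity bits
    + 1 overall parity bit = SECDED for a nibble."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]   # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]   # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]   # parity over positions with bit 2 set
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]  # overall parity
    return c

def check(c):
    syndrome = 0
    for i in range(1, 8):       # XOR together the positions holding a 1
        if c[i]:
            syndrome ^= i
    overall = 0
    for bit in c:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        return "clean"
    if overall == 1:            # odd number of flips: assume single error
        return "single-bit error at position %d - correctable" % syndrome
    return "double-bit error - detected but NOT correctable"

word = encode(0b1011)
print(check(word))   # clean
word[5] ^= 1
print(check(word))   # single-bit error at position 5 - correctable
word[6] ^= 1
print(check(word))   # double-bit error - detected but NOT correctable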

One machine I worked with came up with a parity error one day. It was about
a week old at the time, so I sent it back to the distributor, who, being one
of these little hole-in-the-wall places and not Tech Data or the like,
instead of swapping the machine or the board had one of his
high-school-dropout techs "fix" it. The machine came back sans parity
error. It ran fine for a while, then I started getting complaints of data
corruption. I finally tracked it down to a bad bit in the memory. Sure
enough, the guy had "fixed" it by disabling parity. Should have sued.

This is one of the pernicious notions surrounding the testing of PCs--the
notion that the only possible failure mode is a hang, totally ignoring the
possibility that there will be data corruption that does not cause a hang,
at least not of the machine, although it may cause the tech to be hung by
the users.

But if you're getting regular errors then regardless of the kind of memory
you're using something is broken. Even with ECC if you're getting errors
reported in the log you should find out why and fix the problem rather than
just trusting the ECC--ECC is like RAID--it lets you run a busted machine
without losing data--doesn't mean that the machine isn't busted and doesn't
need fixing.



--
--John
Reply to jclarke at ae tee tee global dot net
(was jclarke at eye bee em dot net)
#9 - March 11th 04, 05:10 PM - CBFalconer

"J. Clarke" wrote:
Arno Wagner wrote:
CBFalconer wrote:
Colin Painter wrote:

If I can add a bit to JT's reply...

If you are overclocking your memory you risk getting more errors
than the guys who built the memory planned on. If the memory is
not ECC memory then you may get more single bit errors which will
cause your machine to stop when they occur. ECC memory can
correct single bit errors but non-ECC memory can only detect them
and when that happens windows will blue screen. Most home PCs
have non-ECC memory because it's cheaper.


Correction here - non ECC memory won't even detect any errors, it
will just use the wrong value. Sometimes that MAY cause the OS to
crash. Unfortunately the rest of the thread is lost due to
top-posting.


Crashes are not your worst enemy. Undetected data corruption is.

I once debugged a fileserver that did flip one bit on average per
2GB read or written. This thing had been used in this condition for
several months by several people on a daily basis. Then one person
noted that he got a corrupted archive sometimes (was a large file)
when reading it, and sometimes not. There where likely quite
a few changed files on disk at that time. If you have files that
react badly to changed bits, that is a desaster.

The solution was just to set the memory timing more conservatively.
I made it two steps slower, without noticable impact on performance.

Note on ECC: If you get very little single bit-errors without
ECC active, ECC will likely solve your problem. If you a lot of
single-bit errors, or even only very fwe multiple-bit errors, then
ECC wil not really help and will let errors through. For my scenario
(single, random bit every 2GB), ECC would have done fine.


The ECC implemented on PCs can typically correct 1-bit errors and
detect 2-bit errors.

One machine I worked with came up with a parity error one day. It
was about a week old at the time so I sent it back to the distributer,
who, being one of these little hole in the wall places and not Tech
Data or the like, instead of swapping the machine or the board,
instead had one of his high-school dropout techs "fix" it. The
machine came back sans parity error. Ran fine for a while, then
started getting complaints of data corruption. Tracked it down
finally to a bad bit in the memory. Sure enough the guy had "fixed"
it by disabling parity. Should have sued.

This is one of the pernicious notions surrounding the testing of
PCs--the notion that the only possible failure mode is a hang,
totally ignoring the possibility that there will be data corruption
that does not cause a hang, at least not of the machine, although
it may cause the tech to be hung by the users.

But if you're getting regular errors then regardless of the kind of
memory you're using something is broken. Even with ECC if you're
getting errors reported in the log you should find out why and fix
the problem rather than just trusting the ECC--ECC is like RAID--it
lets you run a busted machine without losing data--doesn't mean
that the machine isn't busted and doesn't need fixing.


Well, this is somewhat refreshing. Usually when I get on my horse
about having ECC memory I am greeted with a chorus of pooh-poohs,
and denials about sneaky soft failures, cosmic rays, useless
backups, etc. etc. In fact, walk into most computer stores and
start talking about ECC and you will be greeted with blank stares.

--
Chuck F ) )
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net USE worldnet address!


#10 - March 11th 04, 04:53 PM - Alexander Grigoriev

I've had an MB which occasionally corrupted bit 0x80000000, but only during
disk I/O! And the corrupted bit position was unrelated to the I/O buffers!
Of course, a standalone memory test didn't find anything. I had to modify
the test to make it run under Windows and also run parallel disk I/O
threads. In that mode, the failure was detected within a minute. Had to
dump the MB; replacing memory and CPU didn't help.
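
In the same spirit, here is a rough Python sketch of such a combined test
(a guess at the general approach, not Alexander's actual code): hold a
known pattern in RAM and keep re-verifying it while other threads generate
heavy disk traffic, so errors that only appear under bus load get a chance
to show up. The scratch file name and sizes are arbitrary.

import os, threading

STOP = threading.Event()
SCRATCH = "scratch.bin"              # hypothetical file for the I/O load

def disk_load():
    """Generate continuous disk traffic in the background."""
    blob = os.urandom(1 << 20)       # 1 MiB of noise, rewritten repeatedly
    while not STOP.is_set():
        with open(SCRATCH, "wb") as f:
            for _ in range(64):
                f.write(blob)
        with open(SCRATCH, "rb") as f:
            while f.read(1 << 20):
                pass

def memory_check(rounds=100000):
    """Hold a known pattern in RAM and keep re-verifying it."""
    pattern = bytes(range(256)) * (1 << 12)   # 1 MiB repeating pattern
    buf = bytearray(pattern)
    for r in range(rounds):
        if buf != pattern:                    # re-check under I/O load
            print("round %d: RAM buffer corrupted under I/O load!" % r)
            buf[:] = pattern                  # restore and keep hunting
    STOP.set()                                # tell the I/O threads to quit

threads = [threading.Thread(target=disk_load) for _ in range(2)]
for t in threads:
    t.start()
memory_check()
for t in threads:
    t.join()
os.remove(SCRATCH)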

"Arno Wagner" wrote in message
...
Crashes are not your worst enemy. Undetected data corruption is.

I once debugged a fileserver that did flip one bit on average per
2GB read or written. This thing had been used in this condition for
several months by several people on a daily basis. Then one person
noted that he got a corrupted archive sometimes (was a large file)
when reading it, and sometimes not. There where likely quite
a few changed files on disk at that time. If you have files that
react badly to changed bits, that is a desaster.

The solution was just to set the memory timing more conservatively.
I made it two steps slower, without noticable impact on performance.



 



