#1
Disk to disk copying with overclocked memory
On Thu, 11 Mar 2004 00:40:47 GMT, Mark M wrote:

> I use a partition copier which boots off a floppy disk before any other
> OS is launched. If I copy a partition from one hard drive to another, is
> there any risk of data corruption if the BIOS has been changed to
> aggressively speed up the memory settings? For example, the BIOS might
> set the memory to CAS=2 rather than CAS=3, or other memory timing
> intervals might be set shorter than normal.
>
> I am thinking that maybe the IDE cable and drive controllers handle data
> fairly independently of the memory on the motherboard. So maybe data
> just flows up and down the IDE cable and the motherboard is not involved
> except for sync pulses.
>
> There are three scenarios I am thinking about:
> (1) Copying a partition from one hard drive on one IDE cable to another
> hard drive on a different IDE cable.
> (2) Copying a partition from one hard drive to another on the same IDE
> cable.
> (3) Copying one partition to another on the same hard drive.
>
> How much effect would "over-set" memory have on these situations? Do the
> answers to any of the above three scenarios change if the copying of
> large amounts of data files is done from within WinXP? Personally, I
> would guess that it is more likely that motherboard memory comes into
> play if Windows is involved.

1. All copies go through memory, using at least a block-sized RAM buffer.
Buffers at least large enough to hold an entire track will be used,
probably larger for more efficiency. Data is always copied from a drive to
a memory buffer first. It might be done directly, using DMA (the M is
memory), but it will be to and from memory. What part of memory is used
will vary depending on the program and whether you are running it under
Windows, but a single bit error in the wrong place in memory can be a
major problem.

2. If your memory timing is aggressive enough that errors are likely, then
there are a number of things that could go wrong. There could be an error
in the data that gets copied. You could also have the wrong disk address
stored in RAM, so the data goes to the wrong place. It could be the wrong
instruction, so the program crashes. It could be any one of hundreds of
possible single-bit failures that might go unnoticed. ECC would help here
(it would catch most possible memory errors).

If you want reliability in anything (not just copying disks), then don't
push your memory (or other components) to the edge.

JT
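JT's point that every copy stages through a RAM buffer, so a single flipped bit silently corrupts the copy, can be illustrated with a toy simulation (my sketch, not from the thread; the buffer and flip logic are invented purely for illustration):

```python
import hashlib
import random

def copy_with_flaky_ram(data: bytes, flip_probability: float) -> bytes:
    """Copy data through a simulated RAM buffer that may flip one bit."""
    buf = bytearray(data)                 # every copy stages through memory
    if random.random() < flip_probability:
        i = random.randrange(len(buf) * 8)
        buf[i // 8] ^= 1 << (i % 8)       # a single bit error in the buffer
    return bytes(buf)

source = b"partition image block" * 1000
random.seed(1)
good = copy_with_flaky_ram(source, flip_probability=0.0)
bad = copy_with_flaky_ram(source, flip_probability=1.0)

print(hashlib.md5(source).hexdigest() == hashlib.md5(good).hexdigest())  # True
print(hashlib.md5(source).hexdigest() == hashlib.md5(bad).hexdigest())   # False
```

The corrupted copy differs by exactly one bit, yet nothing in the copy loop itself ever notices; only a checksum comparison afterwards reveals it.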
#2
If I can add a bit to JT's reply...
If you are overclocking your memory, you risk getting more errors than the
people who built the memory planned on. If the memory is not ECC memory,
then you may get more single-bit errors which will cause your machine to
stop when they occur. ECC memory can correct single-bit errors, but
non-ECC memory can only detect them, and when that happens windows will
blue screen. Most home PCs have non-ECC memory because it's cheaper.

Overclocking could also cause the occasional double-bit error, which
non-ECC memory cannot detect. This would be bad. As JT indicates, this
could cause all sorts of mayhem. If you're lucky, windows could execute a
broken instruction or reference a memory address in outer space and then
blue screen. If you are unlucky, it could blunder on using bad data and do
something nasty to your file system (or it could harmlessly stick an
umlaut onto the screen somewhere). Hard to predict.

cp

"Mark M" wrote in message ...

> I use a partition copier which boots off a floppy disk before any other
> OS is launched. If I copy a partition from one hard drive to another, is
> there any risk of data corruption if the BIOS has been changed to
> aggressively speed up the memory settings? [snip]
> How much effect would "over-set" memory have on these situations? Do the
> answers to any of the above three scenarios change if the copying of
> large amounts of data files is done from within WinXP?
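The correct-one-detect-two behavior discussed in this thread can be seen in miniature with a Hamming SEC-DED code over just 4 data bits (an illustrative sketch of the coding principle, not how any particular DIMM or chipset implements it; all function names are invented):

```python
def hamming_secded_encode(nibble: int) -> int:
    """Encode 4 data bits as Hamming(7,4) plus an overall parity bit."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]               # covers codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]               # covers positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]               # covers positions 4,5,6,7
    code = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # positions 1..7
    overall = 0
    for bit in code:
        overall ^= bit                    # extra bit upgrades SEC to SEC-DED
    return sum(b << i for i, b in enumerate(code + [overall]))

def hamming_secded_decode(word: int):
    """Return (data, status): status is 'ok', 'corrected', or 'double-bit'."""
    bits = [(word >> i) & 1 for i in range(8)]
    code, overall = bits[:7], bits[7]
    syndrome = 0
    for pos in range(1, 8):               # XOR of positions holding a 1 bit
        if code[pos - 1]:
            syndrome ^= pos               # nonzero syndrome = error position
    parity_ok = (sum(code) + overall) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:                   # odd parity change: one bit flipped
        if syndrome:
            code[syndrome - 1] ^= 1       # correct it in place
        status = "corrected"
    else:                                 # syndrome set, parity even: 2 flips
        status = "double-bit"
    data = code[2] | (code[4] << 1) | (code[5] << 2) | (code[6] << 3)
    return data, status

word = hamming_secded_encode(0b1011)
print(hamming_secded_decode(word))                  # (11, 'ok')
print(hamming_secded_decode(word ^ 0b0000100))      # one flip -> (11, 'corrected')
print(hamming_secded_decode(word ^ 0b0000011)[1])   # two flips -> 'double-bit'
```

A single flipped bit is repaired transparently; two flips are flagged but not repaired, which is exactly the failure mode the posters below argue about.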
#3
Colin Painter wrote:
> ECC memory can correct single bit errors but non-ECC memory can only
> detect them and when that happens windows will blue screen. Most home
> PCs have non-ECC memory because it's cheaper.

Correction here - non-ECC memory won't even detect any errors; it will
just use the wrong value. Sometimes that MAY cause the OS to crash.
Unfortunately the rest of the thread is lost due to top-posting.

--
Chuck F
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net  USE worldnet address!
#4
CBFalconer wrote:
> Correction here - non ECC memory won't even detect any errors, it will
> just use the wrong value. Sometimes that MAY cause the OS to crash.

You seem to have confused ECC and parity. ECC means error checking and
correcting, which involves more redundancy than simple single-bit parity
error checking.

--
The e-mail address in our reply-to line is reversed in an attempt to
minimize spam. Our true address is of the form .
#5
"CJT" wrote in message ...

> You seem to have confused ECC and parity.

Or you have. **** all ram is parity anymore.

> ECC means error checking and correcting, which involves more redundancy
> than simple single bit parity error checking.

Which isn't seen much anymore.
#6
CJT wrote:
> You seem to have confused ECC and parity. ECC means error checking and
> correcting, which involves more redundancy than simple single bit parity
> error checking.

Nothing uses parity checking today - that requires writing individual
9-bit bytes. Expanded to a 64-bit-wide word (for the various Pentia etc.),
the parity or ECC bits both fit in an extra 8 bits, i.e. a 72-bit-wide
word. If today's systems have no ECC, they have no checking of any form.
ECC is actually no harder to handle on wide words. Memory configurations
that can use parity can use ECC; the reverse is not true. Exception - some
embedded systems with smaller memory paths may use parity.

--
Chuck F
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net  USE worldnet address!
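The 72-bit arithmetic above checks out: Hamming single-error correction needs the smallest k with 2^k >= data_bits + k + 1, plus one extra bit for double-error detection, which lands at exactly 8 check bits for a 64-bit word - the same overhead as one parity bit per byte. A quick sketch:

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest k with 2**k >= data_bits + k + 1 (Hamming bound for SEC)."""
    k = 0
    while 2 ** k < data_bits + k + 1:
        k += 1
    return k

for width in (8, 16, 32, 64):
    secded = sec_check_bits(width) + 1    # extra overall-parity bit gives DED
    print(f"{width}-bit word: {secded} ECC bits vs {width // 8} parity bits")
```

For an 8-bit path, SEC-DED would cost 5 bits against parity's 1, which is why narrow designs historically used parity; at 64 bits wide both cost 8, so ECC becomes the obvious choice.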
#7
In comp.sys.ibm.pc.hardware.storage CBFalconer wrote:
> Correction here - non ECC memory won't even detect any errors, it will
> just use the wrong value. Sometimes that MAY cause the OS to crash.

Crashes are not your worst enemy. Undetected data corruption is.

I once debugged a fileserver that flipped one bit on average per 2GB read
or written. This thing had been used in this condition for several months
by several people on a daily basis. Then one person noticed that he
sometimes got a corrupted archive (it was a large file) when reading it,
and sometimes not. There were likely quite a few changed files on disk at
that time. If you have files that react badly to changed bits, that is a
disaster. The solution was just to set the memory timing more
conservatively. I made it two steps slower, without noticeable impact on
performance.

Note on ECC: if you get very few single-bit errors without ECC active, ECC
will likely solve your problem. If you get a lot of single-bit errors, or
even only a very few multiple-bit errors, then ECC will not really help
and will let errors through. For my scenario (a single random bit every
2GB), ECC would have done fine.

Arno

--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus
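To put a one-bit-per-2GB error rate in perspective, the chance that a given file picks up at least one flipped bit grows quickly with file size. A back-of-envelope sketch, assuming independent, uniformly distributed bit errors (an idealization; real failures often cluster):

```python
GB = 2 ** 30
ERROR_RATE = 1 / (2 * GB * 8)     # one flipped bit per 2 GB transferred

def p_corrupted(file_bytes: int) -> float:
    """Probability of at least one bit flip while transferring the file."""
    return 1 - (1 - ERROR_RATE) ** (file_bytes * 8)

for size_mb in (1, 100, 700):
    p = p_corrupted(size_mb * 2 ** 20)
    print(f"{size_mb:4d} MB file: {p:.2%} chance of corruption per pass")
```

A CD-sized archive comes out near a 29% chance of corruption per read or write, which matches the "sometimes corrupted, sometimes not" symptom described above.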
#8
Arno Wagner wrote:
> Crashes are not your worst enemy. Undetected data corruption is. I once
> debugged a fileserver that did flip one bit on average per 2GB read or
> written. [snip] For my scenario (single, random bit every 2GB), ECC
> would have done fine.

The ECC implemented on PCs can typically correct 1-bit errors and detect
2-bit errors.

One machine I worked with came up with a parity error one day. It was
about a week old at the time, so I sent it back to the distributor, who,
being one of these little hole-in-the-wall places and not Tech Data or the
like, instead of swapping the machine or the board had one of his
high-school dropout techs "fix" it. The machine came back sans parity
error. It ran fine for a while, then started getting complaints of data
corruption. I finally tracked it down to a bad bit in the memory. Sure
enough, the guy had "fixed" it by disabling parity. Should have sued.

This is one of the pernicious notions surrounding the testing of PCs: the
notion that the only possible failure mode is a hang, totally ignoring the
possibility of data corruption that does not cause a hang - at least not
of the machine, although it may cause the tech to be hung by the users.

But if you're getting regular errors, then regardless of the kind of
memory you're using, something is broken. Even with ECC, if you're getting
errors reported in the log, you should find out why and fix the problem
rather than just trusting the ECC. ECC is like RAID: it lets you run a
busted machine without losing data. It doesn't mean that the machine isn't
busted and doesn't need fixing.

--
--John
Reply to jclarke at ae tee tee global dot net
(was jclarke at eye bee em dot net)
#9
"J. Clarke" wrote:
> Even with ECC, if you're getting errors reported in the log, you should
> find out why and fix the problem rather than just trusting the ECC.

Well, this is somewhat refreshing. Usually when I get on my horse about
having ECC memory I am greeted with a chorus of pooh-poohs and denials
about sneaky soft failures, cosmic rays, useless backups, etc. etc. In
fact, walk into most computer stores and start talking about ECC and you
will be greeted with blank stares.

--
Chuck F
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net  USE worldnet address!
#10
I've had a motherboard which occasionally corrupted bit 0x80000000, but
only during disk I/O! And the corrupted bit position was unrelated to the
I/O buffers! Of course, a standalone memory test didn't find anything. I
had to modify the test to make it run under Windows and also run parallel
disk I/O threads. In that mode, the failure was detected in a minute. Had
to dump the motherboard - replacing the memory and CPU didn't help.

"Arno Wagner" wrote in message ...

> Crashes are not your worst enemy. Undetected data corruption is.
> [snip] The solution was just to set the memory timing more
> conservatively. I made it two steps slower, without noticeable impact
> on performance.
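The trick described above - running a memory pattern test while disk I/O hammers the bus in parallel - can be sketched like this. This is a toy illustration only: a real test like the poster's runs much closer to the metal, since Python's interpreter and the OS page cache mask most genuine hardware faults. All names here are invented:

```python
import os
import tempfile
import threading

def pattern_test(words: int = 1 << 20, passes: int = 2) -> int:
    """Address-in-address test: each word stores its own index, then verify."""
    errors = 0
    for _ in range(passes):
        buf = list(range(words))            # fill: word i holds value i
        for i, value in enumerate(buf):     # verify while I/O runs in parallel
            if value != i:
                errors += 1
    return errors

def disk_churn(stop: threading.Event) -> None:
    """Background disk traffic to keep the bus and DMA engines busy."""
    block = os.urandom(1 << 20)
    with tempfile.TemporaryFile() as f:
        while not stop.is_set():
            f.seek(0)
            f.write(block)
            f.flush()
            f.seek(0)
            f.read()

stop = threading.Event()
churn = threading.Thread(target=disk_churn, args=(stop,))
churn.start()
errors = pattern_test()
stop.set()
churn.join()
print("mismatches:", errors)    # 0 unless something is badly wrong
```

The point is the structure, not the Python: a fill-then-verify sweep only catches faults that manifest while the sweep runs, so the concurrent I/O thread is what recreated the failure conditions the standalone test missed.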