#1
Do SSD drives really fail a lot ?
http://www.codinghorror.com/blog/201...ive-scale.html

"… I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty."

Lynn
#2
"Lynn McGuire" wrote in message ...

> Do SSD drives really fail a lot ?
> http://www.codinghorror.com/blog/201...ive-scale.html
>
> "… I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty."

LM omitted from the next page:

"Solid state hard drives are so freaking amazing performance wise, and the experience you will have with them is so transformative, that I don't even care if they fail every 12 months on average! I can't imagine using a computer without a SSD any more; it'd be like going back to dial-up internet ..."

--
Don Phillipson
Carlsbad Springs (Ottawa, Canada)
#3
Lynn McGuire wrote:

> Do SSD drives really fail a lot ?
> http://www.codinghorror.com/blog/201...ive-scale.html
>
> "… I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty."
>
> Lynn

It depends on your usage pattern and the SSD. Failure rate is a designed feature with SSDs, i.e. the manufacturers know pretty well how much writing an SSD can take. By designing the wear-leveling and spare capacity, they can design for a specific write load that kills a drive. In the beginning this process is shaky, though, and a whole drive series can have worse reliability. The typical reliability design goal is a 5% failure rate per year for an average usage pattern. Consumers are willing to tolerate that. That is a real failure rate, but it is not "all the time".

There are people who think that because SSDs are not susceptible to mechanical damage, they can do without backups. Those people will lose their data, no matter what storage medium it is on, until some day no money can be saved by aiming for that 5% and reliability slowly goes up.

That said, I think the Coding Horror person (who has some pretty nice things about coding in his blog) has a census of mostly early models. These, like any new technology, have increased failure rates, as the manufacturers try to aim for that 5%/year but make mistakes in the process. It could also just be a statistical anomaly.

There is one additional thing: SSDs are susceptible to heat, just like any other electronics, and to bad power. It is possible that the guy with the 8 of 8 dead drives just killed them by overheating or by voltage spikes from a cheap/bad PSU. For heat, the rule of thumb for semiconductors is half the lifetime for every 10C, and this works pretty well. I have seen it several times now, once in a 22-unit network card sample.

As SSDs contain power circuitry, some parts of them run much hotter (step-up regulators for converting 5V to the write voltage needed), and a lifetime of 5 years is typically calculated at 40C environmental temperature. Run them at 60C and you get 1.25 years average lifetime. Another example: memory and logic chips have something like 30 years at 25C (figure from a very old Intel databook). Run them at 65C and you get around 2 years lifetime. That means you get the first failures (depending on sample size) after 1-1.5 years, and after 3 years most are dead. This incidentally was my initial measurement and prediction for the 22 network cards, and it is what happened. Note that high-performance CPUs are different, as they are designed more like power semiconductors. But chipsets are not. I have seen several fail from inadequate cooling in 1-3 years.

There is one other effect at work here: a lot of people expected SSDs to be much more reliable than HDDs. They are not in general, see above. This can lead to disappointment, which causes overstatement of the problem.

Altogether, I don't believe we are seeing more than early-adopter problems, and they are always the same. Also, there are certainly cheap SSDs and better SSDs, just like always, and it is possible to treat SSDs well or badly.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans
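For anyone who wants to plug their own temperatures into the "half the lifetime every 10C" rule of thumb from the post above, here is a minimal sketch. The function name and parameters are made up for illustration, and the rule itself is only the rough approximation described in the post, not a datasheet formula.

```python
def expected_lifetime_years(rated_years, rated_temp_c, actual_temp_c,
                            halving_step_c=10.0):
    """Rule-of-thumb lifetime: halves for every `halving_step_c` degrees
    above the temperature the rated lifetime was calculated for.
    (Hypothetical helper; the 10C step is the rule of thumb from the post.)"""
    steps = (actual_temp_c - rated_temp_c) / halving_step_c
    return rated_years / 2 ** steps


if __name__ == "__main__":
    # SSD example from the post: 5 years at 40C, run at 60C.
    print(expected_lifetime_years(5, 40, 60))    # 1.25
    # Memory/logic example from the post: 30 years at 25C, run at 65C.
    print(expected_lifetime_years(30, 25, 65))   # 1.875
```

The second result (1.875) is the "around 2 years" figure quoted in the post.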
#4
On Tue, 03 May 2011 10:30:46 -0500, Lynn McGuire put finger to keyboard and composed:

> Do SSD drives really fail a lot ?
> http://www.codinghorror.com/blog/201...ive-scale.html

The most common reason for failure (90%) in flash drives appears to be translator corruption (damaged lookup tables), especially if the power fails while the translator is being updated. Afterwards the drive powers up in safe mode with a very small capacity.

What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html

I suspect that SSDs may be similarly affected. Perhaps that's why some newer models have large super capacitors for power backup.

- Franc Zabkar
--
Please remove one 'i' from my address when replying by email.
#5
Franc Zabkar wrote:

> On Tue, 03 May 2011 10:30:46 -0500, Lynn McGuire put finger to keyboard and composed:
>> Do SSD drives really fail a lot ?
>> http://www.codinghorror.com/blog/201...ive-scale.html
>
> The most common reason for failure (90%) in flash drives appears to be translator corruption (damaged lookup tables), especially if the power fails while the translator is being updated. Afterwards the drive powers up in safe mode with a very small capacity.

That should not happen if the firmware designers know how to do this. The trick is to have a log structure. In addition, enough stored power to complete one write is also a good idea, but not strictly needed.

I did have USB flash drives lose all data and return different data on each read. Translator corruption would be an explanation. The problem went away after a full overwrite. I guess the developers of these devices are still learning how to do this right. Note that the relevant algorithms have been around for several decades. This possibly is an education problem.

> What are the Flash drives' typical failures [Public Forum]:
> http://www.salvationdata.com/forum/topic1873.html
>
> I suspect that SSDs may be similarly affected. Perhaps that's why some newer models have large super capacitors for power backup.

With a supercap you can always complete the write. It is possible to deal with this issue at the filesystem level by accepting that writes issued some time (seconds) before the power failure may get lost. The filesystem needs to be aware of the SSD block size, though. Otherwise you can get corruption in data that was not actually requested to be written, which is really bad. I guess how to do this in practice is still being hashed out at this time.

Personally, I do not trust SSDs at the moment, because of this error amplification property and for other reasons. The one SSD I have with critical data is in a RAID1 with normal disks. Reads are done from the SSD unless there is an error, which gives me SSD speeds for my application.

Arno
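To make the "log structure" remark above a bit more concrete, here is a minimal sketch of the general idea: mapping updates are only ever appended, each with a checksum, and on power-up the log is replayed until the first torn or corrupt record. This is an illustration under assumed names and an assumed record layout (two 32-bit block numbers plus a CRC32), not how any particular flash controller does it.

```python
import struct
import zlib

# Hypothetical on-media record: logical block, physical block, CRC32 of both.
RECORD = struct.Struct("<III")
PAYLOAD = struct.Struct("<II")


def encode_record(logical, physical):
    """Serialize one mapping update with a checksum so torn writes are detectable."""
    payload = PAYLOAD.pack(logical, physical)
    return payload + struct.pack("<I", zlib.crc32(payload))


def replay_log(log_bytes):
    """Rebuild the logical->physical map from an append-only log.

    Replay stops at the first incomplete or corrupt record, so a power
    failure during an append loses at most that last update and never
    damages the mappings written before it.
    """
    mapping = {}
    last_start = len(log_bytes) - RECORD.size
    for offset in range(0, last_start + 1, RECORD.size):
        logical, physical, crc = RECORD.unpack_from(log_bytes, offset)
        if zlib.crc32(log_bytes[offset:offset + PAYLOAD.size]) != crc:
            break  # corrupt record: everything after it is untrusted
        mapping[logical] = physical  # later records override earlier ones
    return mapping


if __name__ == "__main__":
    log = encode_record(0, 100) + encode_record(1, 101) + encode_record(0, 200)
    torn = log + encode_record(1, 300)[:5]  # power lost mid-append
    print(replay_log(torn))  # {0: 200, 1: 101}
```

The point of the design is that old records are never overwritten in place, so there is no moment at which the only copy of the table is half-updated.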
#6
On Tue, 17 May 2011 10:43:49 +1000 Franc Zabkar wrote in Message id: :

> On Tue, 03 May 2011 10:30:46 -0500, Lynn McGuire put finger to keyboard and composed:
>> Do SSD drives really fail a lot ?
>> http://www.codinghorror.com/blog/201...ive-scale.html
>
> The most common reason for failure (90%) in flash drives appears to be translator corruption (damaged lookup tables), especially if the power fails while the translator is being updated. Afterwards the drive powers up in safe mode with a very small capacity.
>
> What are the Flash drives' typical failures [Public Forum]:
> http://www.salvationdata.com/forum/topic1873.html
>
> I suspect that SSDs may be similarly affected. Perhaps that's why some newer models have large super capacitors for power backup.

Be wary of the new Intel SSD 320 series. Currently, there's a bug in the controller that can cause the device to revert to 8MB during a power failure. AFAIK they have not yet publicly announced it, and won't have a firmware fix ready for release until the end of July.

We had an SSD 320 600GB 2.5" SATA drive in for evaluation from our Intel rep. I was able to kill it in two or three hours by power cycling it. Apparently (according to the Intel rep) when the power failure is happening, the SSD device tries to reconnect with the SATA port instead of initiating a proper shutdown. Something to do with interrupt priority being higher for reconnection rather than a proper shutdown. I was able to kill their 80GB device as well.

We've sent both drives back to Intel and they're going to give us their pre-release firmware for testing.
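For anyone wanting to run a similar power-cycle test, one piece that is easy to automate is checking after each cycle whether the drive still reports its full capacity. The sketch below is an assumption-laden illustration, not the test platform described in the post: the helper names and the 5% tolerance are made up, it relies on seeking to the end of the device to learn its size (which works for block devices on Linux and for ordinary files), and the power cycling itself is left to external hardware.

```python
import os


def reported_size_bytes(path):
    """Return the size the OS reports for a block device or file,
    found by seeking to its end (read-only, nothing is written)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def capacity_collapsed(path, expected_bytes, tolerance=0.05):
    """True if the reported size is more than `tolerance` below what the
    drive is supposed to have, e.g. a 600GB drive that now reports 8MB.
    (Hypothetical check; the tolerance value is an arbitrary assumption.)"""
    return reported_size_bytes(path) < expected_bytes * (1 - tolerance)


if __name__ == "__main__":
    # "/dev/sdX" is a placeholder device name; reading it usually needs root.
    device = "/dev/sdX"
    if os.path.exists(device):
        print(capacity_collapsed(device, expected_bytes=600 * 10**9))
```

Run once after every power cycle, this would flag the "reverts to 8MB" symptom without a person having to look at the drive each time.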
#7
JW wrote:

> Be wary of the new Intel SSD 320 series. Currently, there's a bug in the controller that can cause the device to revert to 8MB during a power failure. AFAIK they have not yet publicly announced it, and won't have a firmware fix ready for release until the end of July.
>
> We had an SSD 320 600GB 2.5" SATA drive in for evaluation from our Intel rep. I was able to kill it in two or three hours by power cycling it. Apparently (according to the Intel rep) when the power failure is happening, the SSD device tries to reconnect with the SATA port instead of initiating a proper shutdown. Something to do with interrupt priority being higher for reconnection rather than a proper shutdown. I was able to kill their 80GB device as well.
>
> We've sent both drives back to Intel and they're going to give us their pre-release firmware for testing.

Interesting. Goes to show that firmware development is apparently not done any better than other software development. I am tempted to run my next SSD through similar tests before using it.

Arno
#8
On Tue, 17 May 2011 06:32:45 -0400 JW wrote in Message id: :

> Be wary of the new Intel SSD 320 series. Currently, there's a bug in the controller that can cause the device to revert to 8MB during a power failure. AFAIK they have not yet publicly announced it, and won't have a firmware fix ready for release until the end of July.
>
> We had an SSD 320 600GB 2.5" SATA drive in for evaluation from our Intel rep. I was able to kill it in two or three hours by power cycling it. Apparently (according to the Intel rep) when the power failure is happening, the SSD device tries to reconnect with the SATA port instead of initiating a proper shutdown. Something to do with interrupt priority being higher for reconnection rather than a proper shutdown. I was able to kill their 80GB device as well.
>
> We've sent both drives back to Intel and they're going to give us their pre-release firmware for testing.

The pre-release firmware also had the problem. I ended up supplying Intel SSD engineering with my test platform, and they reproduced the problem and have a fix pending. See:
http://communities.intel.com/thread/24121?tstart=0

The firmware is not yet released, however.

Looks like this Usenet thread caused quite a bit of commotion on their forum:
http://communities.intel.com/thread/22227?tstart=0
#9
JW wrote:

> The Pre-release firmware also had the problem. I ended up supplying Intel SSD engineering with my test platform and they reproduced the problem and have a fix pending. See:
> http://communities.intel.com/thread/24121?tstart=0

This is rather pathetic on their side (not so at all on your side, obviously).

> The firmware is not yet released however. Looks like this Usenet thread caused quite a bit of commotion on their forum:
> http://communities.intel.com/thread/22227?tstart=0

Understandable. The conclusion can only be to stay away from Intel SSDs for the next few years, until they have demonstrated that they have their QA under control and have started to take the data safety of their customers seriously.

It also underlines something I have been saying for a while, namely that SSDs should be regarded as less reliable than HDDs at this time, because of engineering screw-ups like this one. My SSDs are either in a RAID with non-SSDs (with "write mostly", which gives SSD read speeds under Linux software RAID) or do not have critical data on them.

Arno
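For readers who have not used the "write mostly" setup mentioned above: with Linux software RAID, `mdadm` has a `--write-mostly` option that marks the devices listed after it, so a RAID1 avoids reading from them when it can. A rough sketch of how such a command line could be assembled is below. The device names are placeholders, the helper is hypothetical, and the script only prints the command rather than running it, since creating an array destroys existing data on the members.

```python
import shlex


def raid1_write_mostly_cmd(md_device, ssd, hdds):
    """Build (but do not run) an mdadm command for a RAID1 in which the SSD
    serves reads and the HDDs are flagged write-mostly.

    The SSD is listed before --write-mostly so the flag applies only to
    the HDDs that follow it.
    """
    if not hdds:
        raise ValueError("need at least one HDD to mirror the SSD")
    return [
        "mdadm", "--create", md_device,
        "--level=1",
        f"--raid-devices={1 + len(hdds)}",
        ssd,
        "--write-mostly", *hdds,
    ]


if __name__ == "__main__":
    # /dev/md0, /dev/sdX1 and /dev/sdY1 are placeholder names.
    cmd = raid1_write_mostly_cmd("/dev/md0", "/dev/sdX1", ["/dev/sdY1"])
    print(shlex.join(cmd))
```

Writes still go to every member, so write speed is bounded by the slower disks; only reads get the SSD benefit.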