August 9th 18, 12:28 PM posted to alt.comp.hardware.pc-homebuilt
Paul[_28_]
external usenet poster
 
Posts: 1,467
"Why I Will Never Buy a Hard Drive Again"

Bill wrote:
mike wrote:

How's the reliability?
I'm still reading that they fail catastrophically without warning.


I've heard that the reliability of SSDs far exceeds that of the
mechanical hard drives (for, in fact, an obvious reason--no moving
parts). The "trim" software for my Intel SSD even provides an indication
of the drive's reliability (I'm not sure how well that works). I do
regular backups too.


You're the perfect customer for an SSD.

You're mixing up reliability and wear life.

Reliability consists of two components. Say a solder joint
on the PCB fails. It causes the device to stop delivering
the intended function. That's part of the reliability
number. Let's pretend for the sake of argument, it's
an MTBF of 2 million hours. In some cases, just the
tiny power converter inside, making VCore for some chip,
might dominate the reliability calc (you can't make
a power converter better than about 10 million hours
or 100 FITS).
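
To put numbers on the MTBF and FIT figures: one FIT is one failure per billion device-hours, so 10 million hours works out to 100 FIT. Here is a minimal Python sketch, assuming constant failure rates and a series model (any one part failing kills the device). The function names and the 2.5 million hour figure for "everything else on the board" are made up for illustration.

```python
# FIT = failures per 10^9 device-hours (constant failure rate assumed).
HOURS_PER_BILLION = 1e9


def mtbf_hours_to_fit(mtbf_hours):
    """Convert an MTBF in hours to a failure rate in FIT."""
    return HOURS_PER_BILLION / mtbf_hours


def fit_to_mtbf_hours(fit):
    """Convert a failure rate in FIT back to an MTBF in hours."""
    return HOURS_PER_BILLION / fit


def series_mtbf_hours(component_mtbfs):
    """Series model: failure rates of the parts simply add up."""
    total_fit = sum(mtbf_hours_to_fit(m) for m in component_mtbfs)
    return fit_to_mtbf_hours(total_fit)


if __name__ == "__main__":
    converter = 10e6        # power converter, from the text
    everything_else = 2.5e6  # hypothetical figure, illustration only
    print(mtbf_hours_to_fit(converter))
    print(series_mtbf_hours([converter, everything_else]))
```

With those made-up inputs the converter alone accounts for 100 of the 500 FIT total, which is how one small part can end up dominating the calc.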

OK, well, at what rate do bugs show up in the SSD firmware ?
We don't know. We do know that early SSDs "bricked"
due to firmware. In some cases, the drive even "bricked"
during a firmware update (but of course the owner backed
up the data first, making the situation not quite the same).
In a system at work, our reliability expert (a guy with
a PhD in the subject) warned that, for some large products
we were selling, it was quite possible the software
was dropping the system reliability by a factor of 10.

Apply that factor to the pretend 2 million hours, and now
the MTBF is down to 200,000 hours. You will find
Seagate and WDC unwilling to factor this in. While our
reliability expert argued for this, only field data
could indicate how sucky our software was.
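
For a feel of what a factor of ten means to an owner, here is a rough Python sketch. It assumes an exponential (constant failure rate) model and a drive powered 24/7, which is a textbook simplification and not anything a drive vendor publishes. The function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # drive assumed powered all year


def derated_mtbf(hardware_mtbf_hours, software_factor):
    """Divide the hardware-only MTBF by an assumed firmware penalty."""
    return hardware_mtbf_hours / software_factor


def annual_failure_probability(mtbf_hours, powered_hours=HOURS_PER_YEAR):
    """Chance of at least one failure in the period, exponential model."""
    return 1.0 - math.exp(-powered_hours / mtbf_hours)


if __name__ == "__main__":
    hardware_only = 2_000_000                       # the pretend figure
    with_firmware = derated_mtbf(hardware_only, 10)  # 200,000 hours
    print(annual_failure_probability(hardware_only))
    print(annual_failure_probability(with_firmware))
```

Under those assumptions the hardware-only figure is a bit over 0.4% per powered year, and the derated one is a bit over 4%.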

*******

Wear life is different. Both hard drives and SSDs wear.
In the case of the SSD, the mechanism is known and
predictable. If you know the temperature when the
writes were done, and you know the temperature of the
media over long-term life, you can make a reasonably
accurate prediction of wear. (High temperatures
anneal defects, but high temperatures might also
shorten retention time.)

Hard drives are different. The manufacturer won't admit
to wear. The manufacturer won't prepare large quantities
of drives, and simulate life conditions, and provide
curves related to wear. But, third party studies have
noted wear characteristics in the failure population
curves. Instead of a traditional bathtub curve, drive
failures have another shape in the graph. There are
tremendous differences between various model numbers
for this (things that might be noted by Newegg reviewers
if a model is for sale for long enough).

*******

Now, let's summarize:

What do you have to know as an SSD owner ?

1) Consider the history of the technology. You're doing
basically what my PhD guy at work was doing, consulting
a "field return data" log and noting brickage, brickage
caused by bad firmware. For early SSD drives, you
wouldn't touch them with a barge pole. Especially
the ones with "predictable brickage", where the
device fails after being powered for exactly
30 days. Owners who didn't hear about the 30 day
brickage, might not have known (in time) that there
was a firmware update for it, to be applied in advance.
If it bricked and you had no backup (because it was
"reliable"), well, "fool you once". Now you're learning.

2) Consider the wear life. The drives are taking fewer
and fewer write cycles per flash location, as the
technology "advances". The storage cells are getting
"mushy". SLC, MLC, TLC, QLC. SLC is great stuff. Maybe
100,000 write cycles and 10 year retention. QLC might
be 1,000 write cycles and ?? year retention. A Samsung
TLC was showing signs of being "mushy", by requiring
significant error correction inside (to the point it
was slowing the read rate). Roughly 10% of the storage
capacity on the drive, is reserved for ECC code storage,
protecting the data from errors. That is a very high ratio,
much higher than hard drives in the past. It's quite possible
every sector has at least one error in it, corrected
by the CPU inside before you get it. And now, they're just
starting to ship QLC.

3) Consider the end of life policy. Not all the drive
brands have the same policy. Some return an error
on each write at end of life (as a cheap way of warning
you), causing the SSD to enter "read-only state". That
is a reasonable policy, helping to warn and cover people
who refuse to make backups. Windows won't run on a read only
device, so you'll be smothered in error dialogs. That
will get your attention, and make you back up the drive.

But Intel just "bricks" the drive, when the *computed*
wear value is exceeded. With an Intel brand SSD, you
had better be monitoring the "life remaining percentage"
*very very carefully* . That's why the promotion in that
Toms article above is particularly egregious. The dude is
promoting an Intel QLC SSD (yuck!) which has a total-brickage
end-of-life policy (double yuck!). What could go wrong ?
If you're not paying attention, Bueller, you suddenly
lose access to your data. Did you have backups ? No ?
"Fool you twice".
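
The wear life in point 2 can be roughed out with back-of-envelope arithmetic. This Python sketch uses the 100,000 and 1,000 cycle figures from above. The 500 GB capacity, 50 GB per day of host writes, and write amplification of 2 are assumptions picked for illustration, and it pretends wear leveling is perfect. It says nothing about retention time or firmware.

```python
def endurance_tb_written(capacity_gb, pe_cycles, write_amplification=2.0):
    """Host terabytes writable before the cells hit their cycle limit."""
    return capacity_gb * pe_cycles / write_amplification / 1000.0


def years_of_wear_life(capacity_gb, pe_cycles, host_gb_per_day,
                       write_amplification=2.0):
    """Years until the cycle limit at a steady daily write load."""
    total_host_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_gb / host_gb_per_day / 365.0


if __name__ == "__main__":
    # Cycle counts are the rough figures from the post; the rest is assumed.
    for name, cycles in (("SLC", 100_000), ("QLC", 1_000)):
        tbw = endurance_tb_written(500, cycles)
        years = years_of_wear_life(500, cycles, host_gb_per_day=50)
        print(name, tbw, "TB written,", round(years, 1), "years")
```

On those assumptions the QLC case comes out around 14 years of writes, and a heavier write load shrinks that in direct proportion.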

So, yeah, SSDs have no moving parts, and hey, they're
"reliable". A stupid MIL spec calc prepared by the
marketing department (not by engineers), says so.

The firmware could have bugs. Not quantified in a
MIL spec calc. They could have included field data in the
MIL spec calc, but they'd be nuts to do so. No one is
there to slap their fingers for failing to do this.
The history of SSDs would mean dropping the MIL spec calc by a
factor of ten. No marketing guy is going to allow that.
But if your Sherman Tank is booted off an SSD, you
can be damn sure two PhDs got into a spat about what
the real reliability is. Between big companies doing
business, the MTBF is "negotiated". The customer
would say "hey, idiot, include firmware reliability
in your calc".

The wear life is tangible. There's an indicator in
SMART. What is the brickage policy of your brand ?
Pay attention!
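
One way to "pay attention" is to note the life remaining percentage now and then, and extrapolate. The SMART attribute name and scaling differ from brand to brand, so reading it is left out here: this Python sketch takes the percentage as an input. The straight-line projection, the function names, and the 20% and 10% thresholds are all assumptions for illustration, not anybody's official policy.

```python
from datetime import date


def days_until_worn_out(first_date, first_pct, later_date, later_pct):
    """Straight-line projection of days left from two readings.

    Returns None when no wear was measured between the readings.
    """
    elapsed_days = (later_date - first_date).days
    used_pct = first_pct - later_pct
    if elapsed_days <= 0 or used_pct <= 0:
        return None
    return later_pct * elapsed_days / used_pct


def should_replace(pct_remaining, bricks_at_end_of_life):
    """Replace earlier when the drive bricks rather than going read-only."""
    threshold = 20.0 if bricks_at_end_of_life else 10.0  # assumed margins
    return pct_remaining <= threshold


if __name__ == "__main__":
    left = days_until_worn_out(date(2018, 1, 1), 100.0,
                               date(2018, 4, 11), 98.0)
    print(left)
    print(should_replace(15.0, bricks_at_end_of_life=True))
```

The projection only covers wear. It does nothing for firmware brickage, so the backups are still needed.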

Is an SSD the same as a hard drive ?

No, it is not.

HTH,
Paul