October 16th 14, 03:44 PM, posted to comp.sys.ibm.pc.hardware.storage
Mark F
Subject: SSD life self-monitoring question

Do any SSDs dedicate any pages to monitoring the expected life
of the product?

Pages in various physical locations could be set to known values.
These pages would not be refreshed by the usual periodic rewrites
or wear-leveling moves.

As the device had data written to it, additional pages would
be recruited for monitoring. These additional pages would be
selected by virtue of having already been rewritten some
threshold number of times (say 10%, 20%, ... 100%, 110%, ... of
the expected average rewrite lifetime for pages).

The pages being monitored would be checked every once in a while.
If "enough" pages showed "enough" decay or needed "enough"
error correction, then all of the pages that had been
rewritten that many times or more, and which hadn't been refreshed
for the same length of time or more, would have their data
refreshed in place or moved. The SSD could be divided
into areas by physical location on the device, with the
"extra" rewrites in each area driven by the monitored pages
within that area; a rough sketch of this policy follows.
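
Here is a minimal simulation sketch of that policy in Python.
Everything in it -- the thresholds, the toy decay model, and the
names Page, maybe_recruit(), and scrub_area() -- is my own
illustration of the idea, not anything a real controller is known
to implement:

import random

# All numbers below are illustrative assumptions.
EXPECTED_LIFETIME = 3000  # assumed average P/E cycles per page
RECRUIT_FRACTIONS = [i / 10 for i in range(1, 12)]  # 10%, ... 110%
DECAY_THRESHOLD = 8       # correctable bit errors counted as "decay"
ENOUGH_CANARIES = 3       # decayed canaries that trigger an area scrub

class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.rewrites = 0       # P/E cycles so far
        self.written_at = 0.0   # when the page was last (re)written
        self.is_canary = False  # holds a known pattern, never refreshed

def bit_errors(page, now):
    """Stand-in for reading the page and counting ECC corrections."""
    age = now - page.written_at
    wear = page.rewrites / EXPECTED_LIFETIME
    return int(age * (1.0 + wear) * random.random() * 0.01)  # toy model

def maybe_recruit(page, now):
    """Turn a page into a canary as it crosses a wear milestone."""
    frac = page.rewrites / EXPECTED_LIFETIME
    if not page.is_canary and any(abs(frac - f) < 0.005
                                  for f in RECRUIT_FRACTIONS):
        page.is_canary = True
        page.written_at = now   # the known pattern is written now

def scrub_area(pages, now):
    """Check one area's canaries; refresh comparably worn, stale pages."""
    decayed = [p for p in pages if p.is_canary
               and bit_errors(p, now) >= DECAY_THRESHOLD]
    if len(decayed) < ENOUGH_CANARIES:
        return
    min_rewrites = min(p.rewrites for p in decayed)
    min_age = min(now - p.written_at for p in decayed)
    for p in pages:
        if (not p.is_canary and p.rewrites >= min_rewrites
                and now - p.written_at >= min_age):
            p.rewrites += 1     # the refresh itself costs a rewrite
            p.written_at = now

Running scrub_area() once per physical area, each area with its
own canaries, gives the finer-than-whole-device granularity I am
asking about.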

Simpler alternatives:
1. only refresh a page when its read error rate exceeds the
typical value for pages
2. only refresh a page when its read error rate indicates the
data will be lost soon, compared to typical values for pages
3. refresh everything that hasn't been refreshed in some
amount of time. Perhaps this interval is automatically adjusted
based on experience with this particular device. Perhaps the
interval is based on the current total number of writes
for this particular device. (See the sketch after this list.)
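
For alternative 3, the adaptive interval might look something like
the following. Again this is just a sketch; the factors, bounds,
and function names are my assumptions:

THIRTY_DAYS = 30 * 24 * 3600.0

def next_refresh_interval(interval, errors_seen, error_budget):
    """Shrink the interval if the last full refresh pass found more
    correctable errors than budgeted; stretch it if far fewer."""
    if errors_seen > error_budget:
        interval *= 0.5          # decaying faster than expected
    elif errors_seen < error_budget // 4:
        interval *= 1.25         # plenty of margin, refresh less often
    # Clamp to something sane for the device's design life.
    return max(7 * 24 * 3600.0, min(interval, 6 * THIRTY_DAYS))

def interval_from_total_writes(total_writes, rated_writes,
                               base=THIRTY_DAYS):
    """The other variant: tie the interval to cumulative wear, so a
    heavily written device is refreshed more often."""
    wear = total_writes / rated_writes
    return base / (1.0 + wear)   # e.g. halved at the rated write limit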

My question/proposal is about adding monitoring at a
finer grain than the entire device.

NOTE:
The manufacturers keep everything secret, so I
can't guess how much the data loss rate would decrease,
how much the read speed would increase, or whether the average
usable life, in total data written by the user, would increase
or decrease.

It might be the case that refreshing everything once a month
would be enough to greatly decrease read error correction
time and greatly reduce data loss, while at the same time using
less than 10% of the life of a device.
(A 10-year design life means 120 monthly rewrites used.
Typical MLC life numbers for higher quality devices
are 1 full write/day for 5 years = 365*1*5 = 1825 average writes
of the user capacity amount. Even if you take
over-provisioning into account, you still have an average of more
than 1200 writes/cell available. These devices might actually
have an expected life of about 3000 writes/cell.)

Lower quality devices are typically rated for 5 years and
probably have a design life of 5 years as well. These can
be written 0.1 of capacity/day, which would indicate that the
expected average writes/cell is only 180 or so; but judging
by the press, I think the expected average life is 700 or so.
60 monthly rewrites might "waste" 1/3 to 1/10 of the device
life. Thus I think that finer-grained monitoring could pay
for these devices. The arithmetic for both classes of device
is sketched below.
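
To keep my arithmetic honest, here it is as a few lines of Python
(the 1.5x over-provisioning factor and the 700-writes figure are
the guesses discussed above):

# Higher quality MLC: ~1 full drive write/day for 5 years.
hq_rated_writes = 365 * 1 * 5           # 1825 writes of user capacity
hq_per_cell = hq_rated_writes / 1.5     # assume 1.5x raw:user capacity
print(hq_per_cell)                      # ~1217 writes/cell available
monthly_10y = 10 * 12                   # 120 rewrites, 10-year life
print(monthly_10y / hq_per_cell)        # ~0.099, i.e. under 10% of life

# Lower quality devices: ~0.1 drive writes/day for 5 years.
lq_rated = 365 * 0.1 * 5                # ~182 writes/cell rated
lq_optimistic = 700                     # rough figure from the press
monthly_5y = 5 * 12                     # 60 rewrites, 5-year life
print(monthly_5y / lq_rated)            # ~1/3 of the rated life
print(monthly_5y / lq_optimistic)       # ~1/12, near the 1/10 above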

I started thinking about this due to the
Samsung 840 EVO performance drop, which turned out to
be related to excessive time taken by read
error recovery of "old" data, as indicated by
trade publications.

I haven't seen a press release or page at www.samsung.com that
confirms that the problem is due to read error recovery,
but here is a pointer to a description of the patch:
https://www.samsung.com/global/busin...downloads.html
at: "Samsung SSD 840 EVO Performance Restoration Software"