A computer components & hardware forum. HardwareBanter


Is EMC's CAS Centera considered "permanent data"?



 
 
  #1  
Old December 6th 05, 10:27 PM posted to comp.arch.storage
external usenet poster
 
Posts: n/a
Default Is EMC's CAS Centera considered "permanent data"?

My EMC rep couldn't answer this question... EMC's Centera is supposed
to replace optical disk ("permanent data"), but an optical disk's life
is 35 years plus. Is the Centera considered "permanent data"?

What are existing Centera customers supposed to do when EMC eventually
EOLs and stops supporting the first generations of Centera, purchased
3 to 4 years ago, in the next 3 to 5 years? Buy another newer
generation Centera and migrate the data?

R-

  #2  
Old December 7th 05, 06:44 PM posted to comp.arch.storage

In article ,
HVB wrote:
....
The unspoken policy (not just from EMC) has always been to get disk
customers to upgrade to the latest technology within 3 to 5 years of
initial purchase. Usually this is achieved by making the maintenance
costs higher than acquisition costs for the new equipment. However, I
still know some people using 10 year old Symmetrix arrays because they
dare not change anything.


This is not just unspoken policy. It is also
- sensible: Disks have a finite MTBF. If you care about your data,
you have to store it on redundant disks. And even redundancy can't
deal with frequent disk failures - because as soon as disks in a
redundancy group (for example in a mirror pair) fail, the group is
no longer redundant, until a spare disk can be brought into the
redundancy group. And with really old equipment, it is no longer
feasible to acquire spare disks that are compatible with the old
ones. Therefore you have to move your data to new disks (or in
general, to new media, which often means moving to a next-generation
system).
- openly known: I doubt that any sales/marketing info coming from the
major vendors will make you believe that it is both technically
possible and economically viable to run your storage systems (disk
arrays) for very long times, for example in excess of 10 or 20
years. It is always possible that there are dishonest sales people
that might imply that, or even say it (without backing it up on
paper); I would even guess that not even my past, present and future
employers are immune from having at least a few dishonest sales
people. But as a rule, storage systems (both disk and tape) have
finite lifetimes, and everyone knows it.

With disk MTBFs traditionally having been around 100K hours (about 12
years), and the requirement that within a redundancy group no more
than 1 disk (with traditional RAID levels 1...5) fail at a time, an
expected lifetime of about 5 years for a storage system makes sense.
Massively increasing that requires either longer-life disks, or
regularly replacing the disks.
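
To put rough numbers on the MTBF argument above, here is a minimal sketch. It assumes a constant failure rate (a simple exponential model), which is an illustrative simplification and not a claim about how real drives age; the function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF given in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def failure_probability(mtbf_hours: float, years: float) -> float:
    """Probability that a single disk fails within `years`.

    Assumption: constant failure rate (exponential model).
    """
    return 1.0 - math.exp(-years * HOURS_PER_YEAR / mtbf_hours)


# Compare the traditional 100K-hour figure with a 1M-hour spec.
for mtbf in (100_000, 1_000_000):
    print(f"MTBF {mtbf:>9,} h = {mtbf_years(mtbf):6.1f} years, "
          f"P(fail within 5 years) = {failure_probability(mtbf, 5):.1%}")
```

Under that assumption, 100K hours works out to about 11.4 years and roughly a 35% chance that any given disk fails within 5 years; at 1M hours the same 5 years gives roughly 4%.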

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy
  #3  
Old December 7th 05, 08:29 PM posted to comp.arch.storage

wrote:

....

with really old equipment, it is no longer
feasible to acquire spare disks that are compatible with the old
ones.


Do contemporary arrays really still typically require any real degree of
compatibility in replacement disks, as long as they are at least as
large as what they're replacing (and use the appropriate interface, of
course)? Any array that virtualizes the redundancy at any higher level
than the original physical RAID layouts should certainly not need to
(not necessarily even the 'at least as large' requirement if sufficient
spare distributed space is available).

....

With disk MTBFs traditionally having been around 100K hours (about 12
years), and the requirement that within a redundancy group no more
than 1 disk (with traditional RAID levels 1...5) fail at a time, an
expected lifetime of about 5 years for a storage system makes sense.


Not to be unduly pedantic, but service life (normally 5 years for array
disks) and MTBF (which these days is at least specced at more like 1M+
hours) should be largely independent of each other as long as the
latter significantly exceeds the former. And as long as you replace
disks as they near the end of their service lives, it's not clear why
you couldn't keep your array running safely as long as disks were made
that it could use.

- bill
  #4  
Old December 8th 05, 01:14 AM posted to comp.arch.storage

On 6 Dec 2005 14:27:56 -0800, "RobertDavid"
wrote:

My EMC rep couldn't answer this question... EMC's Centera is supposed
to replace optical disk ("permanent data"), but an optical disk's life
is 35 years plus. Is the Centera considered "permanent data"?

What are existing Centera customers supposed to do when EMC eventually
EOLs and stops supporting the first generations of Centera, purchased
3 to 4 years ago, in the next 3 to 5 years? Buy another newer
generation Centera and migrate the data?

R-


Sorry to piggy back a question on someone else's post but is the
Centera the CAS solution from EMC? The one that uses an MD5 hash
table to determine uniqueness?

~F
  #5  
Old December 8th 05, 12:49 PM posted to comp.arch.storage

HVB wrote:
On Thu, 08 Dec 2005 01:14:26 GMT, Faeandar wrote:


Sorry to piggy back a question on someone else's post but is the
Centera the CAS solution from EMC? The one that uses an MD5 hash
table to determine uniqueness?



Yes, that's the one.

HVB.

It depends on what you mean by permanent.
Does your retention time exceed 5 years, or 10 years or more?

Francois
Brennus Solutions
  #6  
Old December 9th 05, 12:04 AM posted to comp.arch.storage

On Thu, 08 Dec 2005 09:10:21 +0000, HVB wrote:

On Thu, 08 Dec 2005 01:14:26 GMT, Faeandar wrote:

Sorry to piggy back a question on someone else's post but is the
Centera the CAS solution from EMC? The one that uses an MD5 hash
table to determine uniqueness?


Yes, that's the one.

HVB.


Now, not being a math geek I can't personally verify this but a friend
who actually is a math geek tells me that using MD5 as the method to
determine uniqueness is seriously flawed. Something to do with the
maximum number of unique combinations not being near enough for most
of today's medium to large storage environments.

Anyone got info on that?

~F
  #7  
Old December 9th 05, 01:41 AM posted to comp.arch.storage

Faeandar writes:
Now, not being a math geek I can't personally verify this but a friend
who actually is a math geek tells me that using MD5 as the method to
determine uniqueness is seriously flawed. Something to do with the
maximum number of unique combinations not being near enough for most
of today's medium to large storage environments.


The number of unique combinations isn't a problem. MD5 is deprecated
for a different reason, which is a security problem. Using some
sophisticated methods it turns out to be possible to construct, with
considerable effort, pairs of differing files that have the same MD5
hash. That's not the same as being able to construct a file that
hashes to a specific number, but it's still not good. However,
whether it affects Centera depends on how Centera is used. There's no
significant chance of such collisions occurring by accident; it has to
be a deliberate attack.
  #8  
Old December 9th 05, 06:23 AM posted to comp.arch.storage

Hi Bill,

In article ,
Bill Todd wrote:
with really old equipment, it is no longer
feasible to acquire spare disks that are compatible with the old
ones.

Do contemporary arrays really still typically require any real degree of
compatibility in replacement disks, as long as they are at least as
large as what they're replacing (and use the appropriate interface, of
course)?


As far as I know (and I'm not really an administrator or user of large
disk arrays, I just have them around as part of my job), for a
high-end array you need to find replacement disks that
- have the same interface (SCSI, FC, SATA, SSA, ...)
- have the right connector (for example for SCSI, this usually means
the 80-pin single connector that has both power and data, although
older Hitachi arrays used dual-ported SCSI disks with two 50-pin
Centronics connectors),
- have firmware versions the array controllers recognize (most
high-end arrays will only accept drives of known model and firmware
version),
- and have the correct capacity (not always required, but with traditional
RAID controllers, if one disk in the RAID group has higher capacity,
the extra capacity can't be used unless the array virtualizes RAID
groups across physical drives, which, if done at all, is a recent development)
which pretty much restricts you to same-model replacement drives,
obtained through the array vendor (not off the street) to get the
correctly modified drive firmware.

Everything except the first two requirements sounds pretty harsh. But
look at it from the point of view of the array vendor: they have spent
a huge amount of time and money qualifying and testing the whole array
and its constituent components to work correctly, when using disk
drives model X capacity Y firmware version Z. Now some idiot comes,
and gets a replacement drive at Fry's or CompUSA or Best Buy, and it
is model A capacity BY and firmware version C. At this point, any
bug or incompatibility puts all the data on the disk array at risk.
And high-end disk arrays are not supposed to lose data, and customers
tend to get very ticked off when it happens. So the array software is
being so restrictive to protect both the customer (who is being
penny-wise pound-foolish) and the array vendor.

Or to put it differently: customers who are willing to spend on the
order of $1M on a disk array should be smart enough to budget the
expected maintenance and part replacement outlays.

Now, with low-end arrays (for example my favorite for home use, the
3Ware cards using IDE drives), this is a different story. I think on
a 3Ware card you can mix and match drives with reckless abandon -
except that your RAID group will have a capacity which is the minimum
of the drives in the RAID group.
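
The "minimum of the drives" rule above can be sketched as follows. This is the textbook behavior for simple RAID levels, not something checked against 3Ware firmware; the function name and the drive sizes are hypothetical.

```python
def usable_capacity_gb(drive_sizes_gb, level):
    """Usable capacity of a simple RAID group built from mixed drives.

    Every member contributes only as much space as the smallest drive;
    anything beyond that on the larger drives is wasted.
    Covers RAID 0, 1 and 5 only.
    """
    count = len(drive_sizes_gb)
    smallest = min(drive_sizes_gb)
    if level == 0:                      # striping, no redundancy
        return count * smallest
    if level == 1:                      # mirroring
        return smallest
    if level == 5:                      # one drive's worth of parity
        if count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (count - 1) * smallest
    raise ValueError(f"unsupported RAID level: {level}")


# Four mismatched drives in RAID 5: each counts as 250 GB.
print(usable_capacity_gb([250, 250, 300, 400], level=5))
```

With those four drives the group behaves like four 250 GB disks, so RAID 5 yields 750 GB and the extra 200 GB on the two larger drives goes unused.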

The big difference here is one of customer expectation. If a large
bank or insurance company buys a disk array from one of the big
vendors [EHI][BDM][CMS], and the array loses a lot of data, the CEO of
the bank/insurance will call Joe or Shinjiro or Sam on the cellphone
while they are in the middle of a golf game, and verbally tear into
them, thereby disrupting their golf scores. Bad scene. If my home
machine loses its data because I used a cheap flea market disk on my
3Ware or Adaptec or LSI RAID controller, I go to the kitchen, pour
myself a stiff drink, tell my wife to not use the computer for a day
or two, kick the dog for good measure, and start looking on the bottom
shelf of the gun safe for my most recent set of backup tapes (FYI, we
don't have a dog at home, that part of the anecdote was a joke).
Therefore, big disk arrays built by the big companies are more
paranoid than PCI cards intended for a different user population.

Not to be unduly pedantic, but service life (normally 5 years for array
disks) and MTBF (which these days is at least specced at more like 1M+
hours) should be largely independent of each other as long as the
latter significantly exceeds the former.


True - economically viable service life is today much shorter than
MTBF. It is very tempting to replace 9GB disks with 180GB disks, as
the 180GB disk is as fast and uses as little power. This might change
in the future, as we are expecting the capacity growth of disk drives
to dramatically slow; it is very possible that disks bought in 2006
will still be economically viable (capacity-wise) in 2011 or 2016.
And if they actually achieve the rated million-hour MTBF (this is a
big if), enough of them will still be running to make it sensible to
continue using them that far into the future.

We are sort of at a strange inflection point. The MTBF of drives has
in the last few years increased massively, from O(100K) to O(1M)
hours. Whether this million-hour MTBF can actually be delivered in
practice remains to be seen (ask me again in a decade or two). At the
same time, the capacity of drives has been increasing by 60% or 80%
per year, even faster than Moore's law for CPUs. This means that there
are lots of old, failing drives out there, and it is extremely
tempting to replace them with new drives with much higher capacity. I
would expect that capacity increase will slow down massively, while
the new drives have extremely high MTBF (which may actually decrease
again in the future, as we replace well-built enterprise-grade FC/SCSI
disks with consumer-grade SATA disks, but it is starting at a very
high level). This is likely to change the economics of the storage
industry significantly.
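
As a back-of-the-envelope check on those growth rates, here is a small sketch of how long compound growth takes to turn a 9GB drive into a 180GB one (a factor of 20). The function name is made up for this example, and steady year-over-year growth is an assumption.

```python
import math


def years_to_grow(factor: float, annual_growth: float) -> float:
    """Years of compound growth needed to multiply capacity by `factor`.

    `annual_growth` is a fraction, e.g. 0.60 for 60% per year.
    Assumption: growth is steady from year to year.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


# 9 GB -> 180 GB is a factor of 20.
for growth in (0.60, 0.80):
    print(f"{growth:.0%} per year: {years_to_grow(180 / 9, growth):.1f} years")
```

At 60% per year the factor of 20 takes a little over 6 years, and at 80% per year about 5, which is roughly the service life discussed in this thread.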

And as long as you replace
disks as they near the end of their service lives, it's not clear why
you couldn't keep your array running safely as long as disks were made
that it could use.


Today, that would be possible, but expensive - you'd be paying
hundreds of $$$ for replacement drives, when for the same money you
can get much bigger capacity drives. So much bigger that it can
easily pay for replacing all the disk array including controllers. I
think this is one of the big reasons driving the trend to replace
high-end disk arrays with mid-range arrays (here defined as: high-end
looks like a set of multiple refrigerators, while mid-range looks like
a 4U or 7U rackmount box): The purchase cost of a new mid-range array
at the same capacity point is much lower than the maintenance and
power and floor-space cost for the old dinosaur.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy
  #9  
Old December 9th 05, 06:43 AM posted to comp.arch.storage

In article ,
Faeandar wrote:
On Thu, 08 Dec 2005 09:10:21 +0000, HVB wrote:

Now, not being a math geek I can't personally verify this but a friend
who actually is a math geek tells me that using MD5 as the method to
determine uniqueness is seriously flawed. Something to do with the
maximum number of unique combinations not being near enough for most
of today's medium to large storage environments.


AFAIK, the Centera uses a 128-bit hash code. So the probability that
two documents have the same hash code is 2^(-128), which is about
10^(-39). But that's not the relevant question; the relevant one is the
birthday paradox. Remember: If you have 23 people in a room, the
probability that two of them have the same birthday is about 1/2 -
even though there are 365 distinct birthdays. That's because with 23
people, there are lots of combinations which could give you a match.
For a hash code system, the probability of a hash collision begins to
be significant when the number of hashes is roughly the square root of
the number of distinct hash values, or in our case 2^64. Now let's
assume that the typical object or file stored in a Centera is about
1MB in size (I made up that number completely out of thin air, but it
sounds plausible). To store 2^64 = 1.8 * 10^19 objects, you would
need a Centera that can store about 1.8 * 10^10 PB (or 18 billion
petabytes). I don't know what the largest Centera is you can buy, but
it is less than 1PB in size (in practice, it is probably closer to
1/10 of that). So we have lots and lots of safety margin.
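
The birthday estimate above can be reproduced with a short sketch. It uses the standard approximation p = 1 - exp(-n(n-1)/(2N)) for n items drawn uniformly from N possible values; the function name is hypothetical, and the 1MB object size is the same made-up figure as in the text.

```python
import math


def collision_probability(n_items: int, n_values: float) -> float:
    """Approximate chance of at least one accidental collision.

    Birthday approximation: p = 1 - exp(-n * (n - 1) / (2 * N)).
    Assumes values are uniformly distributed (no deliberate attack).
    expm1 keeps precision when the probability is extremely small.
    """
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * n_values))


# Classic check: 23 people, 365 birthdays.
print(collision_probability(23, 365))

# 1 PB of 1 MB objects is 10^9 objects, hashed to 128 bits.
print(collision_probability(10**9, 2**128))

# At 2^64 objects the collision chance finally becomes substantial.
print(collision_probability(2**64, 2**128))
```

The first call comes out at about 0.5, the second at around 10^-21, and the third at roughly 0.39, which matches the "square root of the hash space" rule of thumb.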

Now obviously, this ignores the possibility of a smart attacker
(already pointed out by Paul Rubin) deliberately gaming the system to
create a hash collision (a bad thing to do). And even with
infinitesimally small probabilities, I would sleep better at night
knowing that in the case of a hash collision (or a hash match during
object lookup), a full-content comparison has been performed. But
then, I store my personal data on crappy consumer grade equipment
myself, so who am I to talk.
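
For what it's worth, the full-content comparison I have in mind could look like the sketch below. This is a toy in-memory store written for illustration, not a description of how Centera works; the class and method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store that never trusts the hash alone."""

    def __init__(self):
        # digest -> list of distinct contents that share that digest
        self._buckets = {}

    def put(self, data: bytes):
        """Store `data` and return its address as (digest, index)."""
        digest = hashlib.md5(data).hexdigest()
        bucket = self._buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:        # full-content comparison
                return digest, index    # genuine duplicate, store once
        # New digest, or same digest with different bytes (a collision):
        # keep the new object as a separate entry either way.
        bucket.append(data)
        return digest, len(bucket) - 1

    def get(self, address) -> bytes:
        """Fetch the object stored at an address returned by put()."""
        digest, index = address
        return self._buckets[digest][index]


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")   # deduplicated
print(first == second, store.get(first))
```

The price is one extra read and compare whenever a hash already exists, in exchange for never silently treating two different objects as the same one.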

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy
  #10  
Old December 10th 05, 08:16 PM posted to comp.arch.storage

wrote:
petabytes). I don't know what the largest Centera is you can buy, but
it is less than 1PB in size (in practice, it is probably closer to
1/10 of that). So we have lots and lots of safety margin.


Several sets of different files with the same MD5 hash are available here:
http://www1.corest.com/corelabs/proj...rch_topics.php

And here you can get an MD5 collision generator:
http://www.stachliu.com/collisions.html

It would be nice if someone could pass them through a Centera and see what
comes out.

But then, I store my personal data on crappy consumer grade equipment
myself, so who am I to talk.


Hehe, a Centera consists of 250GB ATA hard disks enclosed in several 1U Dell
servers interconnected via an Allied Telesyn Ethernet switch. Nothing fancy
or enterprise grade here.
 






All times are GMT +1.

