#11
How would you store 100TB data?
flux wrote:
> In article , HVB wrote:
>> On Tue, 28 Feb 2006 00:14:40 -0500, flux wrote:
>>> I wonder if there are ANY data centers that store 100 TB let alone...
>> ?!?! I know one data centre that has 1PB (yes, you read that right) of usable storage, with over half of it actually consumed. I'm currently designing a new data centre which will require over 2.5PB of usable storage capacity. The amount of actual raw storage required is very much higher than this.
> These sound like special exceptions.
>> So, yes, plenty of data centres actually store more than 100TB of data.
> You seem to be saying that the 2 cases you just quoted count as plenty.

While 100TB shops are certainly at the larger end of the spectrum, they're hardly uncommon. If they were, why would the big storage vendors all sell single arrays that can store more than that? For example, IBM's DS8000 can store 192TB; EMC's Symmetrix DMX2000 holds 118TB, the DMX3000 230TB, and the DMX-3 1052TB; and HP's StorageWorks XP12000 supports 332TB.
#12
On Tue, 28 Feb 2006 00:14:40 -0500, flux wrote:
> In article 1140933623.229616@smirk, wrote:
>> Since disks are so unreliable (they are typically the least reliable
> Oh really?

Really. Rarely do I need to have a motherboard or power supply swapped. Same goes for CPUs, although a few bugs have caused more RAM swaps than I would like. But disks fail every day. I manage a decent sized NAS environment, and of the 400TB of usable storage I've only once had to have a motherboard replaced, twice RAM, and the occasional power supply/cable/misc. But drives are replaced by the shipment every week.

>> computer thing in a data center, excluding the air conditioning, which
> What about electronics, motherboards, CPUs, memory?

Rarely any issues with these, unless you are unlucky enough to run into a bug.

>> In today's data centers, 100TB systems are common;
> I wonder if there are ANY data centers that store 100 TB let alone...

I wonder if you have a clue.

~F
#13
In article .com, wrote:
> I already have right at a TB of storage in my home and I do not consider myself unique.

Oh - you have two disk drives, I see. :-)

That joke was a little glib and cruel; not that many 500GB drives have shipped in the consumer channel yet. I'm still at 300-some GB in my server (3 drives), but the disks are not full yet, so I haven't seen a need to upgrade for a few years. I know several people who have multi-TB systems at home. The easy way to need and fill that disk space is to build your own PVR, or to rip all your DVDs onto disk, which makes it easier for the kids to watch the movies they want to watch (like Nemo or Toy Story) without risk of the DVDs getting scratched.

Clearly, the way consumers use disk space at home and the way corporations use disk space are very different. Interestingly, digital movie production is a large consumer of disk space; supposedly, making a feature film today consumes many PB of temporary space.

I'm not sure how close Google and Yahoo are to an EB of storage, but I suspect one or both will reach that level soon. There are no firm numbers in public about their storage capacities; those are closely guarded secrets. From usually reliable sources (lots of people live in the Bay Area, and people talk), I hear that Google had at minimum several times 3PB in the Mountain View data center alone about 2 years ago; if you include their remote data centers, they are probably at dozens or hundreds of PB today.

100TB? As mentioned earlier, this is now only two standard 19" racks of storage. How many corporate data centers have only two racks in those beautiful computer rooms they built and manage? There are pictures of large data centers around the web; google for them. They typically have hundreds of racks. A good fraction of that is storage. It is not uncommon to see a dozen Sharks, Lightnings or Symmetrix in one room; with up-to-date models, that is a PB right there.

This is not even counting racks and racks of 1U or 2U servers being used as storage devices. The largest single file system I know of (and I probably missed a few) is over 2PB (single file system meaning you can mount it at a single mount point and access it as a single name space with a single data space). Google for "ASCI Purple". Quite a few other customers have storage plants that size, just not in a single file system.

In article , Faeandar wrote:
> On Tue, 28 Feb 2006 00:14:40 -0500, flux wrote:
>>> In today's data centers, 100TB systems are common;
>> I wonder if there are ANY data centers that store 100 TB let alone...
> I wonder if you have a clue.

Possibly he doesn't. Which is OK: anyone in the storage industry who claims that 100TB systems don't exist will be irrelevant in a short period. Or possibly he is a troll with a clue. Either way is fine with me.

To be honest, I've not built a 100TB system myself yet. Somewhere on the public web is a picture of a 30TB system I built 2.5 years ago, with another guy and me standing proudly in front of it; it took 4 racks back then (using SCSI disks). But then, I don't work with real customers in real data centers.

--
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _).
Ralph Becker-Szendy
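The "100TB is only two racks" claim above is easy to sanity-check with back-of-the-envelope arithmetic. This sketch is not from the thread; the shelf geometry (42 drives per 4U enclosure, 500GB drives, 40U usable per rack) is an assumption based on the densest 2006-era boxes mentioned later in the thread:

```python
import math

def racks_needed(target_tb, drive_gb=500, drives_per_shelf=42,
                 shelf_u=4, usable_u_per_rack=40):
    """Rough rack count for a target raw capacity (no RAID overhead)."""
    tb_per_shelf = drives_per_shelf * drive_gb / 1000   # 21 TB per 4U shelf
    shelves = math.ceil(target_tb / tb_per_shelf)
    total_u = shelves * shelf_u
    racks = math.ceil(total_u / usable_u_per_rack)
    return shelves, total_u, racks

print(racks_needed(100))    # 5 shelves, 20U: raw 100TB fits in half a rack
print(racks_needed(2000))   # a 2PB plant: ~10 racks of shelves alone
```

With these assumed numbers, raw 100TB needs only half a rack of dense SATA shelves; the "two racks" figure leaves comfortable room for RAID overhead, controllers, and switches.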
#14
In article , Faeandar wrote:
>>> Since disks are so unreliable (they are typically the least reliable
>> Oh really?
> Really. Rarely do I need to have a motherboard or power supply swapped. Same goes for CPUs, although a few bugs have caused more RAM swaps than I would like.

My experience is essentially the opposite.

> But disks fail every day. I manage a decent sized NAS environment and of the 400TB of usable storage I've only once had to have a

Well, 400 TB is an awful lot of storage. I really have to wonder what's on there.

> motherboard replaced, twice ram, and the occasional power supply/cable/misc. But drives are replaced by the shipment every week.

400 TB at 500 GB per drive is 800 drives. So how many motherboards are there?
#16
flux wrote:
> My experience is essentially the opposite.
>> But disks fail every day. I manage a decent sized NAS environment and of the 400TB of usable storage I've only once had to have a
> Well, 400 TB is an awful lot of storage. I really got to wonder what's on there.
>> motherboard replaced, twice ram, and the occasional power supply/cable/misc. But drives are replaced by the shipment every week.
> 400 TB at 500 GB drives is 800 drives. So how many motherboards are there?

He's almost certainly not using all 500GB drives. Assuming it's 150GB drives, you'd expect a failed drive every 2.5 weeks based on the (optimistic) published MTBFs (typically 1.2M hours for high-end SCSI/FC drives, divided by 2700 drives). If a chunk of his array is performance-critical, he may well be using 36 or 72GB drives in that portion.

While you start with the 2.5 weeks per "real" failure, remember that all high-end arrays do a considerable amount of monitoring and tend to call for drive replacements when correctable error counts start increasing (and whatever other events they're monitoring), hopefully *before* the drives actually fail. The typical process is that the drive that's acting up is migrated to the hot spare, the questionable drive is remarked as the hot spare, and its replacement is scheduled. The high-end arrays will all phone home to let the support folks know, to make sure a new drive gets shipped.

Construction of the big arrays varies considerably, but you typically have 7-15 drives plugging into a single backplane. The backplane isn't usually too smart, but does have the power management and isolation circuitry needed to isolate and hot-swap the drives, plus various indicators and whatnot (usually a few LEDs for each drive, sometimes powered locks for each drive). Those backplanes are typically plugged into controller boards, which contain the actual smarts of the array. Controllers in big arrays typically handle 4-16 backplanes each. Then you have some interconnect, I/O cards for the host interface, and often a higher level of management hardware. Again, actual implementations are all over the place.
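The failure-interval arithmetic above can be written out explicitly. This is just a restatement of the post's own numbers (400TB usable on 150GB drives, 1.2M-hour published MTBF), treating drive failures as independent so the fleet's expected time between failures is MTBF divided by the drive count:

```python
# Expected drive-failure interval for a large fleet, from vendor MTBF.
# Figures are the ones quoted in the post; real failure rates in the
# field are typically worse than published MTBFs.

MTBF_HOURS = 1.2e6   # published MTBF for high-end SCSI/FC drives
usable_tb = 400
drive_gb = 150

drives = usable_tb * 1000 / drive_gb            # ~2667 drives, call it 2700
hours_between_failures = MTBF_HOURS / drives    # fleet-wide expected interval
weeks = hours_between_failures / (24 * 7)

print(round(drives), round(hours_between_failures), round(weeks, 1))
# ~2667 drives -> one expected failure every ~450 hours, i.e. ~2.7 weeks
```

That matches the "failed drive every 2.5 weeks" estimate; rounding 2667 drives up to 2700 is what shaves it from ~2.7 to ~2.5 weeks.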
#17
HVB writes:
> When you ring a call centre and they tell you that your call "may be monitored", what they really mean is "your call *is* being recorded". This data is kept for a long time, if not forever.
> One manufacturing client of mine creates huge amounts of video data. For quality control purposes they use video to check their production runs. They keep this for a long time, in case they need to check for faults. An outsider to the business would probably never consider storing data like this. They used to use video tape, but that has its own problems, and for them the advantages of online storage outweighed the costs.
> Again, those are just two examples.

Applications like that probably tend to mostly use the most recent data. Is it really worth keeping so much older, rarely used data spinning all the time, instead of having some big tape robots (or even cabinets full of tape cartridges) like we used to see before disks got so cheap?
#18
HVB writes:
> They keep the data on ATA drives, so they get relatively low cost storage and practically instant visual access to any manufacturing run.

Do they keep those hundreds (thousands?) of ATA drives spinning all the time in case of some rare access to any particular one, or do they have some way of powering them up only when needed? The dozen or so seconds of latency from that would probably be tolerable.
#19
Paul Rubin wrote:
> HVB writes:
>> They keep the data on ATA drives, so they get relatively low cost storage and practically instant visual access to any manufacturing run.
> Do they keep those hundreds (thousands?) of ATA drives spinning all the time in case of some rare access to any particular one, or do they have some way of powering them up only when needed? The dozen or so seconds of latency from that would probably be tolerable.

Spinning down the disks would presumably have both benefits and drawbacks (spin-up can cause failures...), but there are ready-made products that seem to be designed for scenarios like this and will automatically manage the disks.

I see that the densest storage box I know of (Nexsan ATABeast/SATABeast, 42 disks in 4U!) is now available in a SATA version, which seems to have added something they call AutoMAID(TM) (Massive Array of Idle Disks), which seems tailored for this. There's no mention of this for the older ATABeast; I wonder if they have, or potentially could, add it via new firmware.

They list 210 TB in a standard rack (40U, leaving 2U for two FC switches), but that's the raw capacity before removing RAID overhead or hot spares (and using 500GB disks). Say 150 TB usable, perhaps (somewhere in the 100-170TB range depending on RAID array size and hot spares). So if 100+ TB on multiple (FC/SAN) volumes is OK, it can actually be done in less than a rack (29-40U depending on the degree of redundancy required).
#20
In article , Paul Rubin wrote:
> HVB writes:
> Do they keep those hundreds (thousands?) of ATA drives spinning all the time in case of some rare access to any particular one, or do they have some way of powering them up only when needed? The dozen or so seconds of latency from that would probably be tolerable.

Nearly all disk drives have been kept on and spinning. Traditionally, there has been a lot of scepticism towards spinning disks down, as it is not clear that they will ever spin back up. The old "stiction" problems come to mind. There are also questions about what happens to spindle lubricants if the spindle isn't rotating for long periods.

In spite of these questions, systems are now being built in which the bulk of all (SATA) disks are kept spun down; this technology has even acquired a new acronym, namely MAID (Massive Array of Idle Disks, as mentioned above). Please google for Copan Systems. To my knowledge (which is guaranteed to be only partial), the Copan system has the highest density of storage in TB per square foot of floor space, or TB per cubic foot of data center volume, or TB/kW of power used, or some metric like that (I don't remember the details). Probably Copan's website will have such information. I would bet that a Copan system is several hundred disks in a rack. There may be other vendors providing disk systems that spin down or have similar densities.

--
The address in the header is invalid for obvious reasons. Please reconstruct the address from the information below (look for _).
Ralph Becker-Szendy