Managing data sprawl

**[email protected]** · #1 February 6th 07, 10:15 PM posted to comp.arch.storage

Over the last couple of years, certain types of data have been exploding in
our enterprise (probably true for others too). These include source
code, binary images (and many, many variations of those), all kinds of
documents (Word, Excel), wikis, etc. These are kept as regular files
(available via NAS) because they must be available on demand. They
are not used heavily, but this is not data that can simply be backed
up and retrieved when someone asks for it. In a sense it is like reference data, but some
modifications do happen from time to time.

This kind of data is growing like mushrooms, by 20-40 TB+ per year, and
the rate keeps increasing. Is there a way to reduce the amount
of this data by:
- Inline compression? I saw some discussion about Storwize, but
nobody seems to have used it. I wonder why.
- Moving it transparently to some kind of CAS (content-addressable storage) box and then
pulling it back in when it is accessed?

Is this type of growth happening in other enterprises and businesses?
I suspect it is.
How do you manage it?

Sorry if I am repeating some earlier discussion.

Thanks

Sam
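
The two ideas in the list above, inline compression and a CAS box, can be combined in one illustration. What follows is a minimal Python sketch under stated assumptions, not how Storwize, Centera or any other product works: files are split into fixed 4 KiB chunks (an arbitrary choice), each chunk is addressed by its SHA-256 digest so identical chunks are stored only once, and each unique chunk is zlib-compressed. `ChunkStore` and its methods are hypothetical names.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems may use variable-size chunking


class ChunkStore:
    """Toy content-addressable store: chunks are keyed by SHA-256 digest and zlib-compressed."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> compressed chunk (stored once)
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk digests
        self.sizes: dict[str, int] = {}        # file name -> logical (uncompressed) size

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # duplicate chunks cost nothing extra
                self.chunks[digest] = zlib.compress(chunk)
            digests.append(digest)
        # Overwriting a name leaves its old chunks behind; a real store would garbage-collect them.
        self.files[name] = digests
        self.sizes[name] = len(data)

    def get(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])

    def logical_bytes(self) -> int:
        return sum(self.sizes.values())

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

Storing the same build image twice adds no new chunks, which is where many variations of binary images could pay off. One caveat of the fixed-size design: inserting a few bytes near the start of a file shifts every later chunk boundary, so near-duplicates stop matching; content-defined (variable-size) chunking is the usual remedy.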

**[email protected]** · #2 February 6th 07, 11:13 PM posted to comp.arch.storage

Some new companies are looking into these types of technologies... it's
very new, but promising... I know we could use something like Storwize
in our data center... The issue I've had with them so far is their
scalability. If they performed faster, I think my boss would probably
consider them.

Dvy

**[email protected]** · #3 February 7th 07, 12:46 AM posted to comp.arch.storage

Thanks.
What other companies? Any pointers?

Also, come to think of it, doing this might require a newer type of
filesystem. That means letting go of our big iron, which may not happen so
easily. If the data were read-only it might be easier, but read/write (I
should say updates) is going to make it harder. Do you see this
sprawl in your data center?
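
Sam's worry about updates can be made concrete with a toy model of transparent tiering, the "move it to a CAS box and pull it in on access" idea. In this hypothetical Python sketch (class and method names are illustrative, not any vendor's interface), `migrate` moves a file's bytes into an immutable, content-addressed archive and leaves a stub; reads follow the stub, but an update must first fetch the current contents and then write a new version to primary storage, so the archived copy no longer backs the file.

```python
import hashlib
from typing import Callable


class TieredNamespace:
    """Toy HSM model: files live on primary storage or as stubs pointing into an immutable archive."""

    def __init__(self) -> None:
        self.primary: dict[str, bytes] = {}  # path -> contents of files still on primary storage
        self.stubs: dict[str, str] = {}      # path -> archive digest for migrated files
        self.archive: dict[str, bytes] = {}  # digest -> contents; objects are never modified in place

    def write(self, path: str, data: bytes) -> None:
        # Writes always land on primary storage; any stub for this path is dropped,
        # so the archived object (if there was one) no longer backs this file.
        self.stubs.pop(path, None)
        self.primary[path] = data

    def migrate(self, path: str) -> str:
        """Move a file's bytes to the archive tier and leave a stub behind."""
        data = self.primary.pop(path)
        digest = hashlib.sha256(data).hexdigest()
        self.archive.setdefault(digest, data)  # identical files share one archive object
        self.stubs[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        if path in self.primary:
            return self.primary[path]
        # Served from the archive via the stub (a real HSM might also copy it back to primary).
        return self.archive[self.stubs[path]]

    def update(self, path: str, edit: Callable[[bytes], bytes]) -> None:
        """Modify a file: fetch its current contents, then write the edited version to primary."""
        self.write(path, edit(self.read(path)))

    def is_migrated(self, path: str) -> bool:
        return path in self.stubs
```

The old archive object is simply left behind once nothing points at it; a real system would need reference counting or garbage collection for that, which is part of why read/write data is harder to tier than read-only data.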

**[email protected]** · #4 February 7th 07, 12:57 AM posted to comp.arch.storage

I don't know the companies by name. I do see this sprawl in my
DC... I suspect everyone has similar problems...

**[email protected]** · #5 February 7th 07, 05:43 PM posted to comp.arch.storage

Where do you store these not-so-often-used (but often enough)
files... NetApp R300 types?

**[email protected]** · #6 February 7th 07, 06:02 PM posted to comp.arch.storage

We leave them on our 960. We don't know when a file will be accessed,
so we leave them there.

**Faeandar** · #7 February 7th 07, 06:07 PM posted to comp.arch.storage

Simply put in a DFS or automount infrastructure (DFS for CIFS, automount
for NFS) and move them as you see fit. As long as users are not mounting
the filers directly, you can move them to R200s, 3020s with SATA, or even
a Sun host with disk hanging off of it:
anything that's cheaper than a 900-class filer.

~F
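
The indirection described above works because clients only ever see a logical path, and the map decides which physical box serves it. The hypothetical Python model below is not DFS or autofs; it only mimics the shape of an indirect automount map entry (`key  [options]  server:/path`), the map is just a string here, and the server names `filer960` and `r200` are invented for the example.

```python
class NamespaceMap:
    """Toy model of automount/DFS-style indirection: logical key -> physical location."""

    def __init__(self, map_text: str) -> None:
        self.entries: dict[str, str] = {}
        for raw in map_text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split()
            # First field is the key, last is the location; mount options in between are ignored.
            self.entries[fields[0]] = fields[-1]

    def resolve(self, logical_path: str) -> str:
        """Map a client-visible path such as /builds/rel1/image.bin to a physical location."""
        key, _, rest = logical_path.strip("/").partition("/")
        location = self.entries[key]
        return f"{location}/{rest}" if rest else location

    def move(self, key: str, new_location: str) -> None:
        # Called after the data has been copied to the cheaper box;
        # clients keep using the same logical path.
        self.entries[key] = new_location
```

Moving a share then comes down to copying the data and calling `move`; the client-visible path `/builds/...` never changes, which is exactly why users must not mount the filers directly.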

Jc · #8 February 8th 07, 05:50 PM posted to comp.arch.storage

You could look at products such as Centera from EMC or RISS from
HP. Both are designed to store reference data, and both have good
indexing tools to help you find the information once you take it
offline!

Jc
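
As a generic illustration of what such indexing does (not how Centera or RISS implement it), an inverted index maps each word to the documents that contain it, so archived material can be found without reading it back. This is a minimal Python sketch assuming plain-text content and a naive lowercase tokenizer; `InvertedIndex` is a hypothetical name.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # naive tokenizer: lowercase alphanumeric runs


class InvertedIndex:
    """Toy inverted index: word -> set of document ids containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in TOKEN.findall(text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every word in the query (AND semantics)."""
        words = TOKEN.findall(query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```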

**[email protected]** · #9 February 9th 07, 02:23 AM posted to comp.arch.storage

Yes, but that requires moving data from one type of storage to
another, so the data first has to be classified. I haven't looked at
these products; do they offer an NFS front end? Also, modification is
rare but not exactly uncommon. In that case the data has to move from the RISS-type
platform to another filer and then be migrated.
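
The classification step mentioned above is often done with a simple age policy. Here is a minimal Python sketch, assuming that last-access time (`st_atime`) is a usable signal and that 90 days is a reasonable cutoff; both are assumptions. `classify_by_access` is a hypothetical helper that walks a tree and splits files into hot candidates (keep on the filer) and cold candidates (migrate).

```python
import os
import time
from typing import Optional


def classify_by_access(root: str, cold_after_days: float = 90,
                       now: Optional[float] = None) -> tuple[list[str], list[str]]:
    """Split files under root into (hot, cold) lists by last-access time."""
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # 86400 seconds per day
    hot, cold = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; leave it alone
            (cold if atime < cutoff else hot).append(path)
    return sorted(hot), sorted(cold)
```

Access times are unreliable on volumes mounted with `noatime`, and backup or virus-scan software may update them, so in practice a policy like this might key on modification time or on directory instead.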

**[email protected]** · #10 February 9th 07, 04:17 AM posted to comp.arch.storage

Sumandra, there are many types of platforms that try to move data
inline, like NeoPath and Acopia... From my experience with them, they
amount to a forklift change of my network and are too much hassle... If
what you are experiencing is too much file growth, then do look at
Storwize... it will actually help reduce duplicates and manage chaotic
growth. I'd caution you to really test their performance... unless
they have made major improvements, you are dead in the water.
Another area where we could use their help is backup... I had a
separate post related to this, which Faeandar eloquently helped me with
as usual, but if I could keep my backups on tier 2 and have them
compressed by a device like Storwize, that would truly help me. I
don't know whether they can do that. It would involve them talking
to a backup server, I suppose.
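
The advice to test performance before committing can start with a quick look at your own data. The hypothetical Python sketch below uses the standard `zlib` module to report how much a sample of files shrinks and how fast; zlib level 6 is only a stand-in for whatever algorithm an appliance actually uses, and single-threaded Python throughput says nothing definitive about how a box like Storwize will perform.

```python
import time
import zlib


def compression_trial(samples: list[bytes], level: int = 6) -> dict:
    """Compress each sample with zlib and report the overall ratio and throughput."""
    raw = packed = 0
    start = time.perf_counter()
    for data in samples:
        raw += len(data)
        packed += len(zlib.compress(data, level))
    elapsed = time.perf_counter() - start
    return {
        "raw_bytes": raw,
        "compressed_bytes": packed,
        "ratio": raw / packed if packed else float("inf"),
        "mb_per_s": raw / 1e6 / elapsed if elapsed > 0 else float("inf"),
    }
```

Feeding it a few hundred representative files (source trees, build images, Office documents) gives a rough ratio; already-compressed formats such as zip archives or JPEGs will show ratios close to 1.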
