#1
How do you guys do archives and keep track of them?
Storage Gurus,

We have been using NetBackup for years and have used it for archives, but our end users are not comfortable with this solution. Saving the bplist output is not an elegant solution for them either. They lose track of where their data was when it was archived, and so are unable to give us in IT the details of where it needs to be recovered from.

How do most of you do archives? Does your solution have some kind of searchable file or database for the end user to look at? Any pointers on products, home-grown scripts, freeware utilities...?

Thanks in advance,
G
#2
"Sto RageŠ" writes: [quoted post trimmed]

Consider Storage Migrator for the archives. You create a filesystem to receive the archival data; Migrator then moves the data blocks to tape, leaving the directory structure on disk for future browsing.
#3
Interesting, but it's overkill, and it doesn't quite meet the requirements.

First, HSM is not an archive, in that it does not create point-in-time copies of data: you can change data on the Storage Migrator volume in a way that makes the original data irretrievable. Furthermore, HSM is better suited to data that will be referenced more often than it sounds like this will be. The Storage Migrator solution is also heavy in cost, learning curve, and management. If you determine HSM is the way to go, I recommend re-posting with details on your environment, including platform info, read intensity, etc.

Legato includes archiving functionality, but I'm sure you don't want to switch just to get this.

It's relatively easy to do your bplist and load the output into a database (SQL Server, PostgreSQL, Oracle, whatever). From there you can write some queries and a simple web front end where a user can search for a filename and get back a report of the versions of the file that are available.
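The bplist-into-a-database idea can be sketched in a few lines. This is a minimal illustration, not a production loader: it uses SQLite for brevity (substitute SQL Server, PostgreSQL, or Oracle), and it assumes each listing line ends with the file path, as in typical `ls -l`-style bplist output — adjust the parsing to your actual format. The table and column names are made up for the example.

```python
import sqlite3

def load_bplist(db_path, bplist_file, archive_date, tape_label):
    """Load one bplist listing into a searchable archive index.

    Assumes the last whitespace-separated field of each line is the
    file path (adjust to match your listing format). Each row records
    where and when the file was archived, so users can look it up later.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS archived_files (
            path         TEXT,
            archive_date TEXT,
            tape_label   TEXT
        )""")
    with open(bplist_file) as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            path = line.split()[-1]   # last field = file path (assumed)
            conn.execute(
                "INSERT INTO archived_files VALUES (?, ?, ?)",
                (path, archive_date, tape_label))
    conn.commit()
    conn.close()
```

A cron job could run this after each archive job, and a trivial web page could then query the table by filename.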
#5
I agree with your assessment of HSM, and in a sense with your definition of backup. But a lot of folks want to archive data outside of their backup scheme so they can go back at some point in the future to retrieve data that might have been deleted. They don't want these backups indexed, though, since the catalog would grow beyond manageability.

I guess the original poster should clarify what the goals of this archiving are. If his or her goal is to reclaim disk space from data that is rarely accessed, then HSM is the way to go. I caution, however, that HSM is not something to be undertaken lightly. For elegance and maintainability, I usually recommend that HSM be managed outside of the backup scheme, since they are different functions, and things like backup vaulting can interfere and complicate matters.

A point-in-time copy of data is a backup; a self-consistent collection of related data transferred somewhere for safekeeping is an archive. So you can use an HSM as an archive if you control access to it throughout the lifecycle of the data. Reference frequency relates more to the media type used than to the software.

It's true you can do archiving with Legato or Tivoli, which keeps the data after the original has been deleted. That's fine, but you don't get good future browsability unless you do as suggested (bplist into Oracle, etc.). HSM solves that problem by keeping the directory structure on disk.
#7
rant
Oh! Please don't tell me to go "back" to HSM. We have been using Veritas HSM on Unix for about three years now, with close to 10 TB of data on tapes. That's another nightmare I can talk about for days. The problem with that system is that it's good for sending data off to tape, but it fails miserably when files are to be retrieved, especially if the retrieves are large. We have been increasing the size of the primary volumes over the years and they always seem to be filled up; there is never enough capacity available for recalls, causing a "recall storm".

We originally thought these recall storms were caused by the fact that we have these volumes shared out via Samba, so we got their Windows version. Boy, the two products are so totally different, and the Windows version sucks big time. Guess what: when we pulled some tapes out of the library because it became full, VSM lost track of the tapes! We had to use Backup Exec to read those tapes back in; NBU 4.5 couldn't read them! Veritas tech support said the solution to our problem is to upgrade to a bigger library so that we don't have to pull out tapes. Can you believe that?
/rant
#8
Thanks Paul,

You nailed the problem right there. We want to keep the archives out of the backup catalogs. And yes, the goal is to reclaim disk space, but as I said in my other post, if we were to use HSM we would invariably have to make more and more room on primary storage for retrievals. We don't want the data to come back to the expensive primary storage; it's OK for us to restore it to a different location, as long as users have the ability to search for the data (by filename, folder name, or some other metadata) and put in a helpdesk request for retrieving it. We were thinking along similar lines to what you mentioned, using some kind of database, but I thought there were other solutions already available that work.

thanks
-G
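The self-service lookup described here — search by filename or folder, then file a helpdesk ticket quoting the result — only needs one query over an index built from the bplist listings. A rough sketch, assuming a hypothetical `archived_files(path, archive_date, tape_label)` table in SQLite; the schema and names are illustrative, not any product's actual layout:

```python
import sqlite3

def search_archive(db_path, pattern):
    """Substring search by filename or folder over the archive index.

    Returns every archived version matching the pattern, so the user
    can cite the archive date and tape label in a retrieval request.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path, archive_date, tape_label FROM archived_files "
        "WHERE path LIKE ? ORDER BY archive_date",
        ("%" + pattern + "%",)).fetchall()
    conn.close()
    return rows
```

Wrapped in a one-page web front end, this gives end users the browsability they are missing, with IT still doing the actual restore.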
#9
There are only two problems with HSM: the hardware and the software ;-). Seriously, many people buy HSM because they don't have money to buy disk, so they end up getting cheap tape technology. Then they're surprised when time-to-data approaches 5 minutes per file. It took me about 8 months to get 12 TB off a DLT-based HSM and onto disk, and even then I was successful with only a little over 99% of the data (admittedly, a lot of that time was spent trying to get irretrievable files). To top it all off, you have everyone and their brother modifying their backup software to act as an HSM, which makes for a kludgy software implementation. I don't know much about the Windows HSM solutions out there, but I know enough to stay the heck away from ADIC's Data Manager (aka Data Mangler) on UNIX. The best HSM implementation I've seen is SAM-FS (bought by Sun and renamed to the StorEdge Utilization Suite).

The next problem is that people don't examine the usage patterns for the data on the HSM. I had a customer with an HSM that archived seismic data to tape as it rolled in. When they wanted to retrieve data over a long period of time for a particular region, they had to touch almost every tape in the system, even though they only wanted a fraction of the data. That made certain types of studies and experiments literally impossible. To fix that problem, they would have had to archive everything twice.

I read somewhere about a year ago that the break-even point for HSM and tape was at 20 TB. If your usage patterns dictate that you archive everything twice, you're looking at 10 TB. Since then, all the big players have come out with IDE-backend solutions that have probably tripled that estimate. If you're looking at an HSM of less than 30 TB, take a good hard look at going to disk for it. On the high end you have EMC's Centera, NetApp's R150, StorageTek's Bladestor, and NExSAN has some offerings, I believe. A ton of other smaller players are in the market as well, such as Zzyzx. The only downside is that you have to buy tape to back it up. However, tape is surprisingly inexpensive nowadays too; you can usually put 300 GB of data on an LTO-2 tape that costs less than $100.
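The break-even arithmetic here is easy to play with yourself. A back-of-the-envelope sketch, using only the $100-per-300-GB LTO-2 figure from above; everything else (and the raw-media-only scope, ignoring drives, libraries, and power) is an assumption you should replace with your own numbers:

```python
def cost_per_tb(media_price_usd, media_capacity_gb):
    """Raw media cost per terabyte (drives, libraries, power excluded)."""
    return media_price_usd / (media_capacity_gb / 1000.0)

def archive_media_cost(archive_tb, per_tb_usd, copies=1):
    """Media-only archive cost; copies=2 models archiving everything twice."""
    return archive_tb * per_tb_usd * copies

lto2 = cost_per_tb(100, 300)                          # ~$333/TB at $100 per 300 GB tape
double_tape = archive_media_cost(10, lto2, copies=2)  # 10 TB archived twice to tape
```

Plugging in a per-TB disk price for the IDE-backend arrays mentioned above then shows directly where, for your capacity and copy count, disk starts to compete.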
#10
Thanks for the rant. We use Veritas HSM on Unix and manage 25 TB and 40 million files. Sometimes we see recall storms too, but the problem isn't the product; it's the way it's used (or perhaps the expectations of it). If someone expects HSM to be totally transparent and to support all file-access patterns, they will be disappointed. HSM assumes older data has low access rates, and you configure enough tape drives to meet that expected demand. In reality, data can sit untouched for 5 years; then you're audited or something and need it all yesterday. Then HSM goes nuts: thrashing tape drives, full filesystems, hours or days to get a single file.

Also, the amount of disk space must be sized for the average demand, not the peak demand, the theoretical peak being that everyone wants all their files back now, so you'd need enough space for everything. In that scenario there's no cost benefit to having HSM in the first place.

We have issues with VSM 4.1 on Windows too: it doesn't support a proper vaulting process or out-of-library tapes. It doesn't really seem datacenter-ready. I'd love to hear some good experiences deploying any HSM on Windows 2000.
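The size-for-average-not-peak point reduces to one line of arithmetic. A rough sizing sketch; the recall rate, residency time, and headroom factor are all illustrative placeholders you would measure in your own environment, not figures from any vendor:

```python
def recall_cache_tb(avg_recall_tb_per_day, residency_days, headroom=1.5):
    """Disk cache needed to absorb average recall traffic.

    Recalled files are assumed to stay on disk for residency_days
    before being re-migrated; headroom leaves slack so a modest burst
    doesn't immediately fill the filesystem and stall recalls.
    """
    return avg_recall_tb_per_day * residency_days * headroom

# e.g. 0.2 TB recalled per day, recalled files kept on disk for 10 days
needed = recall_cache_tb(0.2, 10)
```

If the number this produces approaches your total archived capacity, the recall pattern is too hot for HSM and you are back to the no-cost-benefit scenario described above.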