#1
How do you guys do archives and keep track of them?
Storage Gurus,

We have been using NetBackup for years and have used it for archives, but our end users are not comfortable with this solution. Saving the bplist output is not an elegant solution for them either. They lose track of where their data was when it was archived, and so are unable to give us in IT the details of where it needs to be recovered from.

How do most of you do archives? Does your solution have some kind of searchable file or database for the end user to look at? Any pointers on products, home-grown scripts, freeware utilities...?

Thanks in advance,
G
#2
"Sto RageŠ" writes: [quoted post trimmed]

Consider Storage Migrator for the archives. You create a filesystem to receive the archival data; Migrator then moves the data blocks to tape, leaving the directory structure on disk for future browsing.
#3
Interesting, but it's overkill, and it doesn't quite meet the requirements.

First, HSM is not an archive, in that it does not create point-in-time copies of data: you can change data on the Storage Migrator volume in a way that makes the original data irretrievable. Furthermore, HSM is better suited to data that will be referenced more often than it sounds like this will be. The Storage Migrator solution is also heavy in cost, learning curve, and management. If you determine HSM is the way to go, I recommend re-posting with details on your environment, including platform info, read intensity, etc.

Legato includes archiving functionality, but I'm sure you don't want to switch just to get this.

It's relatively easy to do your bplist and load the output into a database (SQL Server, PostgreSQL, Oracle, whatever). From there you can write some queries and a simple web front end where a user can search for a filename and get back a report of the versions of the file that are available.
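The bplist-into-a-database idea can be sketched in a few lines. This is a minimal illustration, not a production loader: it uses SQLite for brevity (substitute SQL Server, PostgreSQL, or Oracle), and it assumes each listing line ends with the file path, as in typical `ls -l`-style bplist output — adjust the parsing to your actual format. The table and column names are made up for the example.

```python
import sqlite3

def load_bplist(db_path, bplist_file, archive_date, tape_label):
    """Load one bplist listing into a searchable archive index.

    Assumes the last whitespace-separated field of each line is the
    file path (adjust to match your listing format). Each row records
    where and when the file was archived, so users can look it up later.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS archived_files (
            path         TEXT,
            archive_date TEXT,
            tape_label   TEXT
        )""")
    with open(bplist_file) as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            path = line.split()[-1]   # last field = file path (assumed)
            conn.execute(
                "INSERT INTO archived_files VALUES (?, ?, ?)",
                (path, archive_date, tape_label))
    conn.commit()
    conn.close()
```

A cron job could run this after each archive job, and a trivial web page could then query the table by filename.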
#5
I agree with your assessment of HSM, and in a sense with your definition of backup. But a lot of folks want to archive data outside of their backup scheme so they can go back at some point in the future to retrieve data that might have been deleted. They don't want these backups indexed, though, since the catalog would grow beyond manageability.

I guess the original poster should clarify what the goals of this archiving are. If his or her goal is to reclaim disk space from data that is rarely accessed, then HSM is the way to go. I caution, however, that HSM is not something to be undertaken lightly. For elegance and maintainability, I usually recommend that HSM be managed outside of the backup scheme, since they are different functions, and things like backup vaulting can interfere and complicate matters.

A point-in-time copy of data is a backup; a self-consistent collection of related data transferred somewhere for safekeeping is an archive. So you can use an HSM as an archive if you control access to it throughout the lifecycle of the data. Reference frequency relates more to the media type used than to the software.

It's true you can do archiving with Legato or Tivoli, which keeps the data after the original has been deleted. That's fine, but you don't get good future browsability unless you do as suggested (bplist into Oracle, etc.). HSM solves that problem by keeping the directory structure on disk.
#7
rant
Oh! Please don't tell me to go "back" to HSM. We have been using Veritas HSM on Unix for about three years now, with close to 10 TB of data on tapes. That's another nightmare I can talk about for days. The problem with that system is that it's good for sending data off to tape, but it fails miserably when files are to be retrieved, especially if the retrieves are large. We have been increasing the size of the primary volumes over the years and they always seem to be filled up; there is never enough capacity available for recalls, causing a "recall storm".

We originally thought these recall storms were caused by the fact that we have these volumes shared out via Samba, so we got their Windows version. Boy, the two products are so totally different, and the Windows version sucks big time. Guess what: when we pulled some tapes out of the library because it became full, VSM lost track of the tapes! We had to use Backup Exec to read those tapes back in; NBU 4.5 couldn't read them! Veritas tech support said the solution to our problem is to upgrade to a bigger library so that we don't have to pull out tapes. Can you believe that?
/rant
#8
Thanks Paul,

You nailed the problem right there. We want to keep the archives out of the backup catalogs. And yes, the goal is to reclaim disk space, but as I said in my other post, if we were to use HSM we would invariably have to make more and more room on primary storage for retrievals. We don't want the data to come back to the expensive primary storage; it's OK for us to restore it to a different location, as long as users have the ability to search for the data (by filename, folder name, or some other metadata) and put in a helpdesk request for retrieving it. We were thinking along similar lines to what you mentioned, using some kind of database, but I thought there were other solutions already available that work.

thanks
-G
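The self-service lookup described here — search by filename or folder, then file a helpdesk ticket quoting the result — only needs one query over an index built from the bplist listings. A rough sketch, assuming a hypothetical `archived_files(path, archive_date, tape_label)` table in SQLite; the schema and names are illustrative, not any product's actual layout:

```python
import sqlite3

def search_archive(db_path, pattern):
    """Substring search by filename or folder over the archive index.

    Returns every archived version matching the pattern, so the user
    can cite the archive date and tape label in a retrieval request.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path, archive_date, tape_label FROM archived_files "
        "WHERE path LIKE ? ORDER BY archive_date",
        ("%" + pattern + "%",)).fetchall()
    conn.close()
    return rows
```

Wrapped in a one-page web front end, this gives end users the browsability they are missing, with IT still doing the actual restore.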
#9
There are only two problems with HSM: the hardware and the software ;-). Seriously, many people buy HSM because they don't have money to buy disk, so they end up getting cheap tape technology. Then they're surprised when time-to-data approaches 5 minutes per file. It took me about 8 months to get 12 TB off a DLT-based HSM and onto disk, and even then I was successful with only a little over 99% of the data (admittedly, a lot of that time was spent trying to get irretrievable files). To top it all off, you have everyone and their brother modifying their backup software to act as an HSM, which makes for a kludgy software implementation. I don't know much about the Windows HSM solutions out there, but I know enough to stay the heck away from ADIC's Data Manager (aka Data Mangler) on UNIX. The best HSM implementation I've seen is SAM-FS (bought by Sun and renamed to the StorEdge Utilization Suite).

The next problem is that people don't examine the usage patterns for the data on the HSM. I had a customer with an HSM that archived seismic data to tape as it rolled in. When they wanted to retrieve data over a long period of time for a particular region, they had to touch almost every tape in the system, even though they only wanted a fraction of the data. That made certain types of studies and experiments literally impossible. To fix that problem, they would have had to archive everything twice.

I read somewhere about a year ago that the break-even point for HSM and tape was at 20 TB. If your usage patterns dictate that you archive everything twice, you're looking at 10 TB. Since then, all the big players have come out with IDE-backend solutions that have probably tripled that estimate. If you're looking at an HSM of less than 30 TB, take a good hard look at going to disk for it. On the high end you have EMC's Centera, NetApp's R150, StorageTek's Bladestor, and NExSAN has some offerings, I believe. A ton of other smaller players are in the market as well, such as Zzyzx. The only downside is that you have to buy tape to back it up. However, tape is surprisingly inexpensive nowadays too; you can usually put 300 GB of data on an LTO-2 tape that costs less than $100.
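The break-even arithmetic here is easy to play with yourself. A back-of-the-envelope sketch, using only the $100-per-300-GB LTO-2 figure from above; everything else (and the raw-media-only scope, ignoring drives, libraries, and power) is an assumption you should replace with your own numbers:

```python
def cost_per_tb(media_price_usd, media_capacity_gb):
    """Raw media cost per terabyte (drives, libraries, power excluded)."""
    return media_price_usd / (media_capacity_gb / 1000.0)

def archive_media_cost(archive_tb, per_tb_usd, copies=1):
    """Media-only archive cost; copies=2 models archiving everything twice."""
    return archive_tb * per_tb_usd * copies

lto2 = cost_per_tb(100, 300)                          # ~$333/TB at $100 per 300 GB tape
double_tape = archive_media_cost(10, lto2, copies=2)  # 10 TB archived twice to tape
```

Plugging in a per-TB disk price for the IDE-backend arrays mentioned above then shows directly where, for your capacity and copy count, disk starts to compete.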
#10
Thanks for the rant. We use Veritas HSM on Unix and manage 25 TB and 40 million files. Sometimes we see recall storms too, but the problem isn't the product; it's the way it's used (or perhaps the expectations of it). If someone expects HSM to be totally transparent and to support all file-access patterns, they will be disappointed. HSM assumes older data has low access rates, and you configure enough tape drives to meet that expected demand. In reality, data can sit untouched for 5 years; then you're audited or something and need it all yesterday. Then HSM goes nuts: thrashing tape drives, full filesystems, hours or days to get a single file.

Also, the amount of disk space must be sized for the average demand, not the peak demand, the theoretical peak being that everyone wants all their files back now, so you'd need enough space for everything. In that scenario there's no cost benefit to having HSM in the first place.

We have issues with VSM 4.1 on Windows too: it doesn't support a proper vaulting process or out-of-library tapes. It doesn't really seem datacenter-ready. I'd love to hear some good experiences deploying any HSM on Windows 2000.
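The size-for-average-not-peak point reduces to one line of arithmetic. A rough sizing sketch; the recall rate, residency time, and headroom factor are all illustrative placeholders you would measure in your own environment, not figures from any vendor:

```python
def recall_cache_tb(avg_recall_tb_per_day, residency_days, headroom=1.5):
    """Disk cache needed to absorb average recall traffic.

    Recalled files are assumed to stay on disk for residency_days
    before being re-migrated; headroom leaves slack so a modest burst
    doesn't immediately fill the filesystem and stall recalls.
    """
    return avg_recall_tb_per_day * residency_days * headroom

# e.g. 0.2 TB recalled per day, recalled files kept on disk for 10 days
needed = recall_cache_tb(0.2, 10)
```

If the number this produces approaches your total archived capacity, the recall pattern is too hot for HSM and you are back to the no-cost-benefit scenario described above.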