SDLT wear & tear (small files vs. big files)



 
 
  #11  
Old September 28th 03, 10:30 PM
Eric Lee Green

In article , Peter da Silva ruminated:
In article ,
Eric Lee Green wrote:
As far as staging of entire backups via backup software, if we're
talking about a small office network that may be satisfactory, but I'm
having difficulty conceiving how to handle terabyte-sized backups in a
reasonable manner that way.


Instead of interleaving the streams straight to tape, write them to disk.


That part is easy enough. Tapio didn't care where it was sending its
data. The actual tape writer accepted a stream (interleaved or not)
and stored it to tape, ticking out stream ID, stream block ID, and
tape location info as it did so in order that they could be registered
in the location database so that the data could be easily restored. It
didn't care where the stream was coming from.
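
For illustration, a minimal sketch of that kind of location bookkeeping, assuming a SQLite catalog. This is not Tapio's actual schema; the table and column names are invented for the example:

    import sqlite3

    # Sketch of the location bookkeeping described above -- not the actual
    # Tapio schema; table and column names are invented for the example.
    def open_catalog(path="locations.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS block_locations (
                          stream_id    TEXT,
                          stream_block INTEGER,   -- block number within the stream
                          tape_label   TEXT,      -- which tape the block landed on
                          tape_block   INTEGER,   -- physical offset on that tape
                          PRIMARY KEY (stream_id, stream_block))""")
        return db

    def record_block(db, stream_id, stream_block, tape_label, tape_block):
        """Called by the tape writer after each block is committed to tape."""
        db.execute("INSERT OR REPLACE INTO block_locations VALUES (?, ?, ?, ?)",
                   (stream_id, stream_block, tape_label, tape_block))
        db.commit()

    def blocks_for_restore(db, stream_id):
        """Restore-time lookup: which tape, and where on it, holds each block."""
        return db.execute("SELECT stream_block, tape_label, tape_block "
                          "FROM block_locations WHERE stream_id = ? "
                          "ORDER BY stream_block", (stream_id,)).fetchall()

At restore time a single indexed lookup then tells you which tape to mount and where to seek for every block of the stream, regardless of where the stream originally came from.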

When a stream is full, or you have a tape's worth, write to tape.


The problem is knowing when you have a tape's worth, given the uneven
compressibility of data and an unknown compression algorithm on the
part of the tape drive. Either you end up wasting space, or you end up
having to span multiple tapes. Firmware compression algorithms
complicate things greatly. One notion I considered was to simply
disable any firmware compression algorithm, and do a block-by-block
compression at the software level. The problem there is that then we
become compute-bound on the tape server rather than hardware-bound. At
the time, server hardware really wasn't very CPU-heavy and wasn't
capable of handling the load. Bumping the compression out to the
client level was also a possibility. That actually probably would have
worked okay, but at the time (a 300MHz Pentium II with 128MB of RAM was
normal back then, and a dual PII-450 or Xeon 450 with 512MB of RAM was the
super-deluxe server hardware) client hardware just didn't have much
oomph.
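
As a rough sketch of that block-by-block software compression, assuming zlib and a made-up 64KB block size: because the software sees each block's compressed size before anything reaches the drive, remaining tape capacity is known exactly rather than guessed from the firmware's ratio.

    import zlib

    TAPE_BLOCK = 64 * 1024   # assumed fixed block size, purely for the example

    def compress_blocks(stream, level=6):
        """Compress a backup stream block by block in software, with the
        drive's firmware compression switched off.  Every block's compressed
        size is known before it reaches the drive, so the remaining tape
        capacity can be tracked exactly instead of guessed."""
        while True:
            raw = stream.read(TAPE_BLOCK)
            if not raw:
                break
            packed = zlib.compress(raw, level)
            if len(packed) >= len(raw):
                yield b"R" + raw      # incompressible block stored raw: never worse than 1:1
            else:
                yield b"Z" + packed

The catch, as noted above, is that this loop is exactly where the tape server (or the client, if you push it out that far) becomes compute-bound.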

If one disk won't keep the tape happy, slice the stream across multiple
drives, either explicitly or using RAID.


Disk throughput is not a big deal nowadays. Faster computers are making
many things feasible now that back in the day weren't really
credible. For example, something like LUFS (Linux Userland File
System) used as a framework for a time-based snapshotting filesystem
for a storage appliance would have been utterly ludicrous even three
years ago. CPUs were so slow back then that the only way to get
acceptable performance from a filesystem was to run it in kernel-land,
where you had direct access to the unified buffer cache and driver
layer without any kernel/userland transitions. When I benchmarked LUFS
on modern hardware (a 2.4GHz P4 with 512MB of RAM) back in May, with
some minor optimizations I obtained well over 150MB/sec raw
throughput, which would have been utterly ludicrous with that 300MHz
Pentium II a few years ago.

--
Eric Lee Green
Linux/Unix Software Engineer seeks employment
see http://badtux.org for resume


  #12  
Old September 29th 03, 04:39 PM
Peter da Silva

In article ,
Eric Lee Green wrote:
When a stream is full, or you have a tape's worth, write to tape.


The problem is knowing when you have a tape's worth, given the uneven
compressibility of data and an unknown compression algorithm on the
part of the tape drive.


Before I go into compression, let's clarify that point. When I say
"a tape's worth" here, I mean "enough that it's worthwhile to start
dumping to tape". If it's more than the tape can hold then you handle
the end of tape and leave the rest of the stream on disk until you
switch tapes. Whether you pick up the next tape with the same stream
or not is a policy decision, really.
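
A rough sketch of that spool-then-dump policy, with invented names for the staging directory, the threshold, and the tape-writer callbacks:

    import os

    SPOOL_DIR = "/var/spool/backup"   # hypothetical staging area
    WORTH_DUMPING = 20 * 10**9        # "a tape's worth": only start the drive once this much is spooled

    def spooled_bytes():
        return sum(os.path.getsize(os.path.join(SPOOL_DIR, f))
                   for f in os.listdir(SPOOL_DIR))

    def maybe_dump(write_to_tape, request_tape_change):
        # write_to_tape(path, offset) is a hypothetical callback that writes from
        # `offset` and returns how many bytes made it to tape before end-of-tape.
        if spooled_bytes() < WORTH_DUMPING:
            return                            # not enough spooled yet to bother starting the drive
        for name in sorted(os.listdir(SPOOL_DIR)):
            path = os.path.join(SPOOL_DIR, name)
            offset, size = 0, os.path.getsize(path)
            while offset < size:
                offset += write_to_tape(path, offset)
                if offset < size:
                    # Hit end of tape: the unwritten tail stays on disk until the
                    # next tape is loaded.  Whether that tape picks up the same
                    # stream or starts another is the policy decision noted above.
                    request_tape_change()     # assumed to block until a fresh tape is ready
            os.remove(path)                   # fully on tape; free the spool space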

But...

Don't compress on the tape drive: make sure it's compressed by the time it
spools to disk, and run the drive without compression. Tape compression
gets you 2:1 at most, while with the better algorithms you can use on the server
I've got 10:1 for some partitions.

At the very worst, you will never do *worse* than tape compression.

One notion I considered was to simply
disable any firmware compression algorithm, and do a block-by-block
compression at the software level.


That's exactly what I do with Amanda, except I'm using streaming
compression (gzip -9) rather than block compression.
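
As a minimal sketch of that server-side streaming compression into the spool, with Python's gzip module standing in for the gzip -9 pipeline and an invented spool path; the point is that the spool file is already compressed, so its on-disk size is exactly what will land on tape:

    import gzip
    import shutil

    def spool_compressed(dump_stream, spool_path):
        """Stream a client's dump through gzip on its way to the staging disk.
        The data is compressed before it ever nears the drive, so the spool
        file's size is the size that will actually land on tape."""
        with gzip.open(spool_path, "wb", compresslevel=9) as out:
            shutil.copyfileobj(dump_stream, out, length=256 * 1024)

    # e.g. spool_compressed(client_pipe, "/var/spool/backup/hostA.dump.gz")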

The problem there is that then we
become compute-bound on the tape server rather than hardware-bound.


CPU is even cheaper than disk. We use Alphas so we've been "CPU rich" for
years, but at home I've been using a K6-3/400 for my Amanda server and it's
not breathing hard doing server-side compression for a couple of bitty
boxes. Most of them compress on the client.

But that's an implementation detail... the results are similar.

--
I've seen things you people can't imagine. Chimneysweeps on fire over the roofs
of London. I've watched kite-strings glitter in the sun at Hyde Park Gate. All
these things will be lost in time, like chalk-paintings in the rain. `-_-'
Time for your nap. | Peter da Silva | Har du kramat din varg, idag? 'U`
  #13  
Old September 29th 03, 11:07 PM
George Sarlas

Thanks to everyone for their input. I have thought about staging the
data to some cheap IDE drives first. I'll play around with it some
more.

-george


"Scott" wrote in message ...
"Peter da Silva" wrote in message
...
I asked a question: can Netbackup use a local disk as cache to buffer tape
writes and prevent shoeshining?


Only as a two-step process.
1) Backup from servers to disk
2) Disk-to-tape copy

So, it works, but does increase the amount of time it takes to complete
backups (though it may decrease the amount of time that the servers being
backed up are busy)

Scott
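
For illustration only (this is not how NetBackup itself implements its disk-to-tape copy), a sketch of step 2 of that two-step process, assuming a Linux no-rewind tape device: the staged image is read from fast local disk and fed to the drive in large fixed-size writes so the drive can stream instead of shoeshining while waiting on slow clients.

    def copy_spool_to_tape(spool_path, tape_device="/dev/nst0", block=256 * 1024):
        """Generic sketch of the disk-to-tape step (not NetBackup's own
        mechanism): feed the staged image to the drive in large fixed-size
        writes from fast local disk so the drive streams instead of
        stop-starting while waiting on slow clients."""
        with open(spool_path, "rb") as src, open(tape_device, "wb") as tape:
            while True:
                buf = src.read(block)
                if not buf:
                    break
                tape.write(buf)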

 



