#1
Open source storage
So in the past few months there have been some interesting moves towards Open Source Storage: ZFS on Solaris, and Nexenta's software appliance. Has anyone out there deployed it somewhere it actually does anything useful? The cost savings are phenomenal, but nothing is truly free; you pay for it one way or another. On the flip side, it's the LAST part of the stack which is still proprietary, and a part of me thinks it's inevitable.

SC
#2
Open source storage
S writes:
> So in the past few months there have been some interesting moves towards Open Source Storage: ZFS on Solaris, and Nexenta's software appliance.

Of course, Linux has had more sophisticated file systems, including several clustered file systems, available as open source for some time....

> Has anyone out there deployed it somewhere it actually does anything useful? The cost savings are phenomenal, but nothing is truly free; you pay for it one way or another.

Fundamentally, you pay by doing the support and maintenance yourself, and by not having as much focused tuning expertise, formal testing, and relationships with database, operating-system, and backup vendors. Of course, you also take on the responsibility of making sure that whatever disks/tapes you buy work reliably with the controllers, motherboard, and operating system. (Does that SYNCHRONIZE CACHE command really work?) If you're saving very much, you've probably also lost the hardware redundancy that's built into a hardware RAID system -- dual-ported access to disks, independent buses (not sharing a controller chip), etc.

> On the flip side, it's the LAST part of the stack which is still proprietary, and a part of me thinks it's inevitable.

Actually it's not; the firmware on the controllers is proprietary in nearly all cases, and the firmware in the drives is as well.

-- Anton
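[Editor's aside: Anton's parenthetical about SYNCHRONIZE CACHE is testable. Below is a minimal sketch -- not from this thread -- of issuing SCSI SYNCHRONIZE CACHE(10) through the Linux SG_IO ioctl, which is how tools like sg_sync from sg3_utils do it. The device path is a placeholder and it needs appropriate privileges; if the drive or a lying controller ignores the command, data in its write-back cache can still be lost on power failure.]

/* Sketch: issue SCSI SYNCHRONIZE CACHE(10) via Linux SG_IO. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/sda"; /* placeholder */
    unsigned char cdb[10] = { 0x35 };  /* SYNCHRONIZE CACHE(10), whole medium */
    unsigned char sense[32];
    struct sg_io_hdr io;

    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror(dev); return 1; }

    memset(&io, 0, sizeof io);
    io.interface_id    = 'S';
    io.cmd_len         = sizeof cdb;
    io.cmdp            = cdb;
    io.dxfer_direction = SG_DXFER_NONE;  /* no data phase */
    io.sbp             = sense;
    io.mx_sb_len       = sizeof sense;
    io.timeout         = 20000;          /* ms */

    if (ioctl(fd, SG_IO, &io) < 0) { perror("SG_IO"); return 1; }
    if ((io.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        fprintf(stderr, "SYNCHRONIZE CACHE failed (status 0x%x)\n", io.status);
    else
        printf("drive reports cache synchronized\n");
    close(fd);
    return 0;
}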
#3
Open source storage
In article , Anton Rang wrote:
> Of course, Linux has had more sophisticated file systems, including several clustered file systems, available as open source for some time....

More sophisticated than ZFS?
#4
Open source storage
the wharf rat wrote:
> In article , Anton Rang wrote:
>> Of course, Linux has had more sophisticated file systems, including several clustered file systems, available as open source for some time....
>
> More sophisticated than ZFS?

ReiserFS (especially Reiser4) is beyond question more sophisticated than ZFS - not only in concept (generic data-clustering ability, for example) but in execution (e.g., it incorporates batch-update mechanisms somewhat similar to ZFS's without losing sight of the importance of on-disk file contiguity).

Extent-based XFS also does a significantly better job of promoting on-disk contiguity than ZFS does (even leaving aside the additional depredations caused by ZFS's brain-damaged 'RAID-Z' design) - and contributed the concept of allocate-on-write to ZFS (and Reiser) IIRC. GFS (and perhaps GPFS) support concurrent device sharing among the clustered systems that Anton mentioned (last I knew, ZFS had no similar capability).

ZFS is something of a one-trick pony. Its small-write performance is very good (at least when RAID-Z is not involved), but with access patterns that create fragmented files its medium-to-large read performance is just not competitive - and last I knew it didn't even have a defragmenter to alleviate that situation (defragmenting becomes awkward when you perform snapshots at the block level). And despite its hype about eliminating the LVM layer, as soon as you need to incorporate redundancy in your storage up it pops again in the form of device groups - so there's relatively little net gain in that respect over a well-designed LVM interface (not that a ZFS-like approach *couldn't* have done a better job of eliminating LVM-level management, mind you).

I wouldn't be so critical of ZFS if its marketeers and accompanying zealots hadn't hyped it to the moon and back: it's a refreshing change from the apparent complete lack of corporate interest in file-system development over the last decade or so, even if its design leaves a bit to be desired and its implementation is less than full-fledged - and it should be very satisfactory for environments that don't expose its weaknesses. (And yes, I do like its integral integrity checksums, but their importance has been over-hyped as well - given the number of significantly higher-probability hazards that data is subject to.)

- bill
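[Editor's aside: for concreteness on those "integral integrity checksums" -- ZFS computes a Fletcher-style checksum (or SHA-256) over every block and stores it in the parent block pointer rather than with the block itself, so corruption anywhere between memory and platter, including misdirected writes, is caught on read. Below is a from-scratch sketch of the Fletcher-4 variant; it is not Sun's code, and the mismatch demo is illustrative.]

/* Sketch of end-to-end block checksumming in the ZFS style. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct cksum { uint64_t w[4]; };

static void fletcher4(const void *buf, size_t size, struct cksum *c)
{
    const uint32_t *ip = buf, *end = ip + size / sizeof(uint32_t);
    uint64_t a = 0, b = 0, cc = 0, d = 0;

    for (; ip < end; ip++) {
        a += *ip;   /* running sums of increasing order */
        b += a;
        cc += b;
        d += cc;
    }
    c->w[0] = a; c->w[1] = b; c->w[2] = cc; c->w[3] = d;
}

/* On read: recompute and compare with the checksum stored in the parent. */
static int block_ok(const void *blk, size_t size, const struct cksum *stored)
{
    struct cksum actual;
    fletcher4(blk, size, &actual);
    return memcmp(&actual, stored, sizeof actual) == 0;
}

int main(void)
{
    uint32_t block[1024] = { 1, 2, 3 };  /* stand-in for a 4 KB file block */
    struct cksum c;
    fletcher4(block, sizeof block, &c);

    block[7] ^= 1;                       /* simulate silent corruption */
    printf("block ok after bit flip? %s\n",
           block_ok(block, sizeof block, &c) ? "yes" : "no");
    return 0;
}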
#5
Open source storage
One might argue that Reiser has done a pretty slick job of marketing his FS as well. I have heard that he hasn't really run his FS on any enterprise-class storage. Understandable, considering he's a small shop. Maybe this has changed.

XFS has had its own issues. Yes, you have on-disk contiguity, but if you lose power while XFS is building its extent, you've got data corruption.

I don't have any direct experience with ZFS... I'm trying to talk one of my buddies into letting me play with it on a system at his site, though.

So I really think the issues stopping people from deploying open source storage are:

1. Lack of snapshots, which may not be an issue if ZFS gains traction.
2. No coherent DR strategy. I don't consider rsync a mirroring solution if it needs to walk the tree each time (see the sketch after this post).
3. It seems like storage admins still need to have that support hotline printed out and pinned next to their workstation :-)

Anyone think any different?

On Feb 17, 4:59 pm, Bill Todd wrote:
> [full quote of post #4 snipped]
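[Editor's aside: on point 2, the complaint about rsync is that it must stat() every file on every pass just to discover what changed, whereas a block-level mirror forwards only the writes. A toy sketch of that tree walk follows; the path and cutoff time are hypothetical, and real rsync also compares sizes and optionally checksums.]

/* Toy illustration of why tree-walking replication scales badly:
 * even if nothing changed, every entry must be stat()ed on every run. */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <time.h>

static time_t last_sync;           /* time of the previous "sync" */
static long scanned, changed;

static int visit(const char *path, const struct stat *st,
                 int type, struct FTW *ftw)
{
    (void)ftw;
    scanned++;
    if (type == FTW_F && st->st_mtime > last_sync) {
        changed++;                 /* rsync would transfer this one */
        printf("changed: %s\n", path);
    }
    return 0;                      /* keep walking */
}

int main(void)
{
    last_sync = time(NULL) - 3600; /* pretend we synced an hour ago */
    if (nftw("/srv/data", visit, 64, FTW_PHYS) != 0)
        perror("nftw");
    printf("stat()ed %ld entries to find %ld changed files\n",
           scanned, changed);
    return 0;
}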
#6
Open source storage
S wrote:
> One might argue that Reiser has done a pretty slick job of marketing his FS as well.

Yes, he has - but there's more relative substance behind that marketing than there is behind ZFS's (after all, when you promote yourself as "The Last Word In File Systems" it's easy to fall quite embarrassingly short).

> I have heard that he hasn't really run his FS on any enterprise-class storage.

The subject was not breadth of existing deployment but sophistication.

....

> XFS has had its own issues. Yes, you have on-disk contiguity, but if you lose power while XFS is building its extent, you've got data corruption.

I'd like to see a credible reference for that allegation (unless you're simply referring to the potential inconsistency that virtually all update-in-place file systems have when *updating* - rather than writing for the first time - multiple sectors at once).

....

> So I really think the issues stopping people from deploying open source storage are:
> 1. Lack of snapshots, which may not be an issue if ZFS gains traction.

My impression is that snapshots have been available in Linux, BSD, and for that matter Solaris itself for many years, in various forms associated with LVMs and/or file systems.

> 2. No coherent DR strategy. I don't consider rsync a mirroring solution if it needs to walk the tree each time.

Synchronous mirroring at the driver level has been available for ages, and is entirely feasible across distances of at least 100 miles - enough to survive any disaster which your business is likely to survive, as long as your remote site is reasonably robust. If write performance requirements can be relaxed a bit, distances can be significantly greater. I haven't looked recently, so I don't know how well those facilities deal with temporary link interruptions and subsequent catch-up (if you've got dedicated fiber to a robust back-up site that may not be too likely to occur, but in other circumstances it would be very desirable).

- bill
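[Editor's aside: to pin down what "synchronous mirroring" means here -- the write is acknowledged only after both replicas have it durably, so the remote copy never lags. Below is a stripped-down user-space sketch of just those semantics; real implementations such as md RAID-1 or DRBD sit below the file system, and the files here stand in for block devices.]

/* Sketch of synchronous-mirroring semantics: acknowledge a write only
 * after BOTH replicas have it durably. Paths are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int mirrored_write(int local, int remote,
                          const void *buf, size_t len, off_t off)
{
    /* Issue to both replicas... */
    if (pwrite(local,  buf, len, off) != (ssize_t)len) return -1;
    if (pwrite(remote, buf, len, off) != (ssize_t)len) return -1;

    /* ...and only report success once both are stable. The added
     * round trip is why distance directly costs write latency. */
    if (fsync(local) < 0 || fsync(remote) < 0) return -1;
    return 0;
}

int main(void)
{
    int local  = open("/tmp/replica.local",  O_RDWR | O_CREAT, 0600);
    int remote = open("/tmp/replica.remote", O_RDWR | O_CREAT, 0600);
    if (local < 0 || remote < 0) { perror("open"); return 1; }

    const char block[512] = "some block of data";
    if (mirrored_write(local, remote, block, sizeof block, 0) < 0) {
        perror("mirrored_write");
        return 1;
    }
    puts("write acknowledged: both replicas durable");
    return 0;
}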
#7
Open source storage
On Feb 18, 4:52 pm, Bill Todd wrote:
>> One might argue that Reiser has done a pretty slick job of marketing his FS as well.
>
> Yes, he has - but there's more relative substance behind that marketing than there is behind ZFS's (after all, when you promote yourself as "The Last Word In File Systems" it's easy to fall quite embarrassingly short).

That's pretty funny, and I would have to agree :-)

>> I have heard that he hasn't really run his FS on any enterprise-class storage.
>
> The subject was not breadth of existing deployment but sophistication.

Right, but if Reiser hasn't run his FS on any enterprise-class storage, how can we assume it's ready for prime-time, enterprise-class deployment?

>> XFS has had its own issues. Yes, you have on-disk contiguity, but if you lose power while XFS is building its extent, you've got data corruption.
>
> I'd like to see a credible reference for that allegation (unless you're simply referring to the potential inconsistency that virtually all update-in-place file systems have when *updating* - rather than writing for the first time - multiple sectors at once).

See section 6.1: Delaying allocation
http://oss.sgi.com/projects/xfs/pape...nix/index.html

I remember reading another paper with detailed descriptions of causing data corruption on XFS through power manipulation, but of course I can't find it anymore.

>> 1. Lack of snapshots, which may not be an issue if ZFS gains traction.
>
> My impression is that snapshots have been available in Linux, BSD, and for that matter Solaris itself for many years, in various forms associated with LVMs and/or file systems.

I believe you can only have one snapshot at a time in LVM. Nowhere near the sophistication of WAFL snapshots.

>> 2. No coherent DR strategy. I don't consider rsync a mirroring solution if it needs to walk the tree each time.
>
> Synchronous mirroring at the driver level has been available for ages, and is entirely feasible across distances of at least 100 miles [...]

Can you name some examples of synchronous mirroring at the driver level? Is it open source? Easy to deploy?

Bottom line: I'd like to see people deploy Open Source Storage in their data centers. I'm just wondering why it hasn't happened yet and offering possible reasons.

S
#8
Open source storage
S wrote:
> .... if Reiser hasn't run his FS on any enterprise-class storage, how can we assume it's ready for prime-time, enterprise-class deployment?

Because any failure of enterprise-class storage to faithfully mimic (e.g.) SCSI behavior should be considered an enterprise-storage bug rather than any problem with the file system?

>>> XFS has had its own issues. Yes, you have on-disk contiguity, but if you lose power while XFS is building its extent, you've got data corruption.
>>
>> I'd like to see a credible reference for that allegation (unless you're simply referring to the potential inconsistency that virtually all update-in-place file systems have when *updating* - rather than writing for the first time - multiple sectors at once).
>
> See section 6.1: Delaying allocation
> http://oss.sgi.com/projects/xfs/pape...nix/index.html

There's nothing there that even remotely hints at data corruption on power loss: the defined semantics of any normal Unix-style file system (including ZFS) specify that any user data which hasn't been explicitly flushed to disk may or may not be on the disk, in whole or in part, should power fail. That's what write-back caching is all about: if you want atomic on-disk persistence, you use fsync or per-request write-through - though even those won't necessarily guarantee full-request, let alone multi-request, atomicity beyond the individual file-block level should power fail before the request completes, even on ZFS. About the only difference with ZFS is that individual file-block disk writes are guaranteed to be atomic, rather than just the near-guarantee that disks provide that individual sector writes will be atomic.

It's been many years since I read that paper, though, and it provided a pleasant trip down memory lane. XFS did a lot of interesting things for the early '90s, even if not all of them were necessarily optimal.

....

> I believe you can only have one snapshot at a time in LVM. Nowhere near the sophistication of WAFL snapshots.

But that's all you need to do an on-line backup, one of the most important consumers of snapshot technology. Other uses of snapshots tend to be more like inferior substitutes for 'continuous data protection' facilities, though the advent of writable snapshots (clones) has opened up new uses (at least new imaginable uses: how much actual utility they have I'm not sure).

The old Solaris fssnap mechanism may have been limited to a single snapshot. Peter Braam et al. produced alpha and beta releases of a more general snapshot facility called snapfs in 2001, which I thought either got further developed or was replaced with another product of the same name, but I didn't find further information on it. The Linux LVM and LVM2 support snapshots (the latter including writable snapshots) - and a quick glance at the documentation didn't seem to indicate that they support only one at a time.

> Can you name some examples of synchronous mirroring at the driver level? Is it open source? Easy to deploy?

I'm not all that familiar with the offerings, but my impression is that DRBD may be the current Linux standard in this area; a 2003 description can be found at http://www.linux-mag.com/id/1502 , and it's still being developed (just Google it). You may have been able to roll your own remote replication before DRBD by using a remote disk paired (RAID-1-style) with a local disk under local LVM facilities.

- bill
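[Editor's aside: Bill's point about defined semantics in the corruption-vs.-caching argument above is easy to demonstrate -- a plain write() only reaches the OS write-back cache, and durability requires an explicit flush. A minimal sketch, with a hypothetical filename:]

/* A plain write() lands in the write-back cache and may vanish on
 * power loss; durability comes only from fsync()/fdatasync() or an
 * O_DSYNC open. That is the contract, not corruption. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/important.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    const char rec[] = "committed record\n";
    if (write(fd, rec, sizeof rec - 1) != (ssize_t)(sizeof rec - 1)) {
        perror("write");          /* still only in the page cache here */
        return 1;
    }

    /* Without this, the file system is allowed to lose the data,
     * in whole or in part, on power failure. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }

    /* Alternative: open with O_DSYNC for per-request write-through.
     * Neither approach promises multi-request atomicity. */
    close(fd);
    return 0;
}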
#9
Open source storage
Bill Todd wrote:
> There's nothing there that even remotely hints at data corruption on power loss: the defined semantics of any normal Unix-style file system (including ZFS) specify that any user data which hasn't been explicitly flushed to disk may or may not be on the disk, in whole or in part, should power fail. That's what write-back caching is all about: [...]

Sorry to go off on a tangent, but I think it is somewhat relevant since S was talking about enterprise storage: how common is it for enterprise storage vendors to have disks with firmware that makes it impossible to enable the write-back cache? We have an SGI NAS (IS4500) where this is the case, and it took me a little by surprise, although it does make a lot of sense when you have 100 TB of storage. Does most or all enterprise storage permanently disable the write-back cache?

Thanks,
Steve
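[Editor's aside: one way to check what Steve describes from the host side -- the write-cache-enable (WCE) bit lives in the SCSI Caching mode page (0x08) and is readable with MODE SENSE; this is the same bit sdparm reports. A sketch via Linux SG_IO; the device path is a placeholder and error handling is minimal.]

/* Sketch: read the Caching mode page with MODE SENSE(10) and report
 * the WCE (write cache enable) bit. Needs appropriate privileges. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(int argc, char **argv)
{
    const char *dev = (argc > 1) ? argv[1] : "/dev/sda"; /* placeholder */
    unsigned char cdb[10] = { 0x5A, 0, 0x08 }; /* MODE SENSE(10), page 8 */
    unsigned char buf[256], sense[32];
    struct sg_io_hdr io;

    cdb[7] = sizeof buf >> 8;
    cdb[8] = sizeof buf & 0xFF;                /* allocation length */

    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror(dev); return 1; }

    memset(&io, 0, sizeof io);
    io.interface_id    = 'S';
    io.cmd_len         = sizeof cdb;
    io.cmdp            = cdb;
    io.dxfer_direction = SG_DXFER_FROM_DEV;
    io.dxferp          = buf;
    io.dxfer_len       = sizeof buf;
    io.sbp             = sense;
    io.mx_sb_len       = sizeof sense;
    io.timeout         = 10000;                /* ms */

    if (ioctl(fd, SG_IO, &io) < 0 || (io.info & SG_INFO_OK_MASK) != SG_INFO_OK) {
        fprintf(stderr, "MODE SENSE failed\n");
        return 1;
    }

    /* 8-byte mode parameter header; bytes 6-7 give the block
     * descriptor length, and the caching page follows that. */
    int bdlen = (buf[6] << 8) | buf[7];
    unsigned char *page = buf + 8 + bdlen;
    printf("write-back cache (WCE): %s\n",
           (page[2] & 0x04) ? "enabled" : "disabled");
    close(fd);
    return 0;
}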
#10
Open source storage
Steve Cousins wrote:
> [...] How common is it for enterprise storage vendors to have disks with firmware that makes it impossible to enable the write-back cache? We have an SGI NAS (IS4500) where this is the case [...]

Ha, I'm more shocked that anything from SGI is still in use.