#1
comments on Panasas Object Storage
Hi all,
Just came across this article:
http://www.infoworld.com/article/03/...rinside_1.html
http://www.panasas.com/activescaleos.html
Sounds like an interesting product. Anybody got any comments or experience with this?
#2
"darren" wrote in message ... Hi all, Just came across this article: http://www.infoworld.com/article/03/...rinside_1.html http://www.panasas.com/activescaleos.html Sounds like an interesting product. Anybopdy got any comments or experience with this? It is indeed an interesting product, but Infoworld doesn't seem to have been able to separate hype from fact. Multi-protocol file servers (which can serve up any individual file the way the client wants it, whether via NFS or CIFS) are nothing new. Clustered file servers actually aren't anything new, either (VMS has supported them for decades, and some commercial Unixes have more recently), and while they're relatively new to commodity hardware Panasas' isn't the first (Tricord, which went belly-up last year, actually shipped a product called 'Illumina' with some similar characteristics, for example - including IIRC the ability to reshuffle data to restore redundancy around failed components and shift load to added hardware). Their CTO is Garth Gibson, who's been touting 'object storage devices' for many years (starting with his work at CMU on 'network-attached secure disks'). After several somewhat misguided shots at the problem this one looks worthwhile, though still somewhat sub-optimal in a few respects. And while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandible set of storage servers: as I said, a worthwhile product, but hardly unique (see, for example, IBM's 'Storage Tank' and the emerging Lustre product on Linux). - bill |
#3
Hello Bill,
To your rejoinder points... a couple of questions, if I may.

> ...while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandable set of storage servers...

To realize a "true" OSD (as defined by, e.g., Intel, et al.), is it the case that the physical drives themselves in such a configuration (the drive manufacturer's firmware) and/or the disk controllers must possess the capability of handling the Object Storage model, along with correspondingly aligned "filesystem" (objsystem) software? I don't know the Panasas product offering(s), but I'm curious how far they go toward the idealized OBS/OSD model, and to what degree they might have [dis]advantages vis-a-vis Lustre and something like Sistina's GFS (the latter may be a stretch, but I'm trying to understand whether an OSD model could be imposed on something like GFS as well).

I'll appreciate any info. Thank you.

-- VS
#4
"VirtualSean" wrote in message om... Hello Bill, To your rejoinder points... a couple of questions, if I may. ...while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandible set of storage servers... To realize a "true" OSD (as defined by Intel, et al), I hadn't noticed that Intel had presumed to 'define' what an OSD was. is it the case that the physical drives themselves in such a configuration (the drive manufacturer's firmware) and/or disk controllers must posses the capability of "handling" the Object Storage model, along with correspondingly aligned "filesystem" (objsystem) software systems? That would, I suspect, depend upon exactly whose definition of what an OSD was you used. I don't know the Panasas product offering(s), but I'm curious as to how far they go toward the idealized OBS/OSD model, And that would as well. and to what degree they might have [dis]advantages vis-a-vis Lustre and something like Sistina's GFS (the latter may be a stretch, but I'm trying to understand if an OSD model could be imposed on something like a GFS as well). The current definitions of an OSD that I'm aware of state that the 'device' (which if undefined could be something that many people would call a 'server' of some kind rather than a disk) should encapsulate the physical placement of data on the disk(s) such that it can be addressed externally by object-identifier/offset/length triples. This allows the device to make optimization decisions on its own about placement (IIRC HP's AutoRAID product had some primitive capabilities in this area, but did not present the result in the form of 'objects'). The 'object' may also support (possibly extendible) 'attributes'. 
One problem with this approach is that rather than simply moving complexity like a file 'inode' from one place (the higher-level software) to another (the OSD), it adds a layer - since files can exceed the size of the OSD and hence must be able to span multiple OSDs. So there's still metadata in the 'inode', plus more at the OSD - and more stages of metadata introduce more potential for requiring additional random accesses before you can get to the data you want (not to mention adding more points at which any *changes* in metadata must be reliably stored before a modification operation can be considered 'stable' - though suitable introduction of NVRAM at these points can help mitigate any additional latency that this might cause).

Another problem is that a high-level view of a file as an 'object' often includes any storage redundancy used to store that file, whereas individual 'objects' at an OSD by definition do not encapsulate redundancy across multiple OSDs (hence for this reason as well a 'file' cannot necessarily map 1:1 to an OSD object). A third issue is that of locking granularity, which also can span OSDs when a file does (Garth had some early problems with this).

OTOH, decentralizing this metadata can help support increased scalability in large systems (where many clients may be able to interrogate many OSDs directly without going through a single 'inode server' as a bottleneck - though there are other ways to accomplish this as well, such as client-caching some or all of the metadata when it doesn't change all that often). IMO, the industry is still groping around for something approaching an 'ideal' decomposition of work in distributed file systems.
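The "extra layer" point - inode metadata plus per-OSD metadata - can be sketched as a two-stage lookup: the inode maps a file offset to a particular OSD object, and only then does the OSD's own metadata come into play. The round-robin striping layout, stripe size, and field names below are hypothetical, purely for illustration (not how Panasas or any particular system lays files out).

```python
STRIPE = 4  # tiny stripe unit, for illustration only

def locate(inode, file_offset):
    """Stage 1 of the lookup: map a file offset to
    (osd_id, object_id, object_offset). Stage 2 - finding the
    physical blocks - happens inside the target OSD's own metadata."""
    stripe_index = file_offset // STRIPE
    n = len(inode["stripes"])
    # Round-robin striping: each OSD object holds every n-th stripe unit.
    osd_id, object_id = inode["stripes"][stripe_index % n]
    object_offset = (stripe_index // n) * STRIPE + file_offset % STRIPE
    return osd_id, object_id, object_offset

# A file striped across three OSDs still needs this inode-level map...
inode = {"stripes": [(0, "obj-a"), (1, "obj-b"), (2, "obj-c")]}
print(locate(inode, 0))    # (0, 'obj-a', 0)
print(locate(inode, 5))    # (1, 'obj-b', 1)
print(locate(inode, 13))   # (0, 'obj-a', 5)
```

Every read thus traverses two metadata stages rather than one, which is exactly where the extra random accesses (and the extra stable-update points on writes) come from.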
The SNIA includes disk vendors who would *love* to find ways to standardize what an 'OSD' is so that they could build added value into their products and price them accordingly, but it's just not yet clear what that standard should be - or even that a disk is the right level for *any* such standardization (and if it's not, whether standardization at a higher - storage server - level makes sense, rather than just accepting that the possibilities there are sufficiently rich that proprietary solutions should be allowed to bloom to address varying needs).

My own current views are that:

1) OSDs don't make a great deal of sense when talking about block-level storage (i.e., they don't add much value to that idiom);

2) OSDs don't make a great deal of sense as higher-than-block-level devices attached to single hosts (since they don't off-load sufficient work to be very interesting: whatever you off-load tends to be balanced by the additional host interface complexity required to control it as closely as a host may often wish to); and

3) *standardized* OSDs don't make much sense in distributed systems, since they unnecessarily constrain creativity in the design of those systems (i.e., such systems are sufficiently complex that no single decomposition is even close to ideal for all circumstances - and no current decomposition that I've seen is close to ideal even for many common ones).

- bill
#5
Hi Bill,
Thanx for the insight. I was looking for a high-throughput storage solution to handle random reads and writes of a large number of small files. Wonder if the Panasas solution will do better than a NetApp. We are currently facing some problems when loading a large number of small files into a NetApp F820 at high speed... the machine actually "core-dumped" on us!
#6
The Panasas equipment is pretty interesting; you should also check out the Spinnaker Networks equipment and the NAS equipment from SGI. NetApp equipment has the write penalty in part due to its use of RAID 4. You might want to talk to the folks at www.zerowait.com; they know a lot of the tricks for tuning NetApp equipment.

Grey
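The RAID 4 write penalty mentioned above comes from the parity read-modify-write cycle: every small write costs four I/Os, and two of them hit the single dedicated parity disk, which becomes the bottleneck under many small random writes. A toy model (hypothetical function, bytes standing in for blocks):

```python
def small_write(data_disks, parity_disk, disk, index, new):
    """One RAID 4 small write = 4 I/Os, 2 of them on the parity disk."""
    old = data_disks[disk][index]                  # 1. read old data
    old_parity = parity_disk[index]                # 2. read old parity
    data_disks[disk][index] = new                  # 3. write new data
    # new_parity = old_parity XOR old_data XOR new_data keeps the
    # stripe's parity consistent without touching the other data disks.
    parity_disk[index] = old_parity ^ old ^ new    # 4. write new parity
    return 4

# Three data disks, one dedicated parity disk (XOR of the others).
data = [[1, 2], [3, 4], [5, 6]]
parity = [1 ^ 3 ^ 5, 2 ^ 4 ^ 6]
ios = small_write(data, parity, disk=0, index=0, new=9)
print(ios)  # 4
```

Note that writes to *any* data disk funnel through the one parity disk, which is why RAID 5 rotates parity across all disks instead.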
#7
[Responded privately before noticing this newsgroup post:]
> Hi Bill, Thanx for the insight. I was looking for a high-throughput storage solution to handle random reads and writes of a large number of small files. Wonder if the Panasas solution will do better than a NetApp. We are currently facing some problems when loading a large number of small files into a NetApp F820 at high speed... the machine actually "core-dumped" on us!

Needless to say, that should *never* happen due to simple load. Either your hardware has a problem, or WAFL does: I would expect NetApp to take this problem fairly seriously if you reported it.

Log-structured (or sort-of-log-structured, like NetApp's) file systems should be a good choice for the workload you describe - especially when supplemented with NVRAM to handle the most recent portion of the 'log' (as WAFL is). I think Sun may have one that they support, and Linux may as well.

Since I have no idea what internal algorithms Panasas uses to place data on disk, there's no way to SWAG how well they'd do: I have a recollection of recently seeing IOPS figures for a system that were *clearly* far beyond the capabilities of the underlying disk storage (i.e., I expect they referred to accessing cached data), and it may well have been in the Panasas literature (which is the most recent stuff I've looked at) - so that won't necessarily shed much light on the subject.

Reiserfs on Linux is supposedly optimized to handle small files well, but may not offer as much flexibility as you'd need (e.g., the ability to load them at high speed without much robustness, because you could just start the load over in the unlikely event of, say, a power failure). I've been working on a file system of my own for a while that should do the job flexibly and well, but I suspect you may want something sooner than it's likely to be available (if you're looking a year or so out, we could talk).

- bill
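The reason a log-plus-NVRAM design suits many small random writes can be sketched in a few lines: each write is acknowledged as soon as it is appended to the battery-backed log, and many pending writes later reach disk as one large sequential flush. This is an idealized toy, not WAFL's actual mechanism; the class and its parameters are hypothetical.

```python
class NVLogFS:
    """Toy log-structured store: ack on NVRAM append, flush in batches."""

    def __init__(self, nvram_capacity=4):
        self.nvram = []              # battery-backed log of pending writes
        self.disk = {}               # eventual on-disk state
        self.capacity = nvram_capacity

    def write(self, path, data):
        # One cheap sequential append to NVRAM, then acknowledge the client.
        self.nvram.append((path, data))
        if len(self.nvram) >= self.capacity:
            self.flush()

    def flush(self):
        # One large sequential disk write replaces many random ones.
        for path, data in self.nvram:
            self.disk[path] = data
        self.nvram.clear()

fs = NVLogFS()
for i in range(5):
    fs.write(f"file{i}", b"x")
print(len(fs.disk), len(fs.nvram))   # 4 1
```

The small-file loading workload described above is close to the best case for this pattern: per-file latency is just an NVRAM append, and the disks only ever see batched sequential traffic.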
#8
Hi Bill,
>> To realize a "true" OSD (as defined by Intel, et al),...
> I hadn't noticed that Intel had presumed to 'define' what an OSD was.

I neglected my "e.g.," in that statement. I was simply attempting to settle on one definition (or some acknowledged _set_ of aligned definitions) for reference. It was not my intention to assert that Intel has any more clout or credibility in defining OSD/OBS.

Thank you very much for your thoughtful and info-packed reply, Bill. Much obliged.

-- VS