#1
comments on Panasas Object Storage
Hi all,
Just came across this article:
http://www.infoworld.com/article/03/...rinside_1.html
http://www.panasas.com/activescaleos.html
Sounds like an interesting product. Anybody got any comments or experience with this?
#2
"darren" wrote in message ... Hi all, Just came across this article: http://www.infoworld.com/article/03/...rinside_1.html http://www.panasas.com/activescaleos.html Sounds like an interesting product. Anybopdy got any comments or experience with this? It is indeed an interesting product, but Infoworld doesn't seem to have been able to separate hype from fact. Multi-protocol file servers (which can serve up any individual file the way the client wants it, whether via NFS or CIFS) are nothing new. Clustered file servers actually aren't anything new, either (VMS has supported them for decades, and some commercial Unixes have more recently), and while they're relatively new to commodity hardware Panasas' isn't the first (Tricord, which went belly-up last year, actually shipped a product called 'Illumina' with some similar characteristics, for example - including IIRC the ability to reshuffle data to restore redundancy around failed components and shift load to added hardware). Their CTO is Garth Gibson, who's been touting 'object storage devices' for many years (starting with his work at CMU on 'network-attached secure disks'). After several somewhat misguided shots at the problem this one looks worthwhile, though still somewhat sub-optimal in a few respects. And while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandible set of storage servers: as I said, a worthwhile product, but hardly unique (see, for example, IBM's 'Storage Tank' and the emerging Lustre product on Linux). - bill |
#3
Hello Bill,
To your rejoinder points... a couple of questions, if I may.

> ...while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandable set of storage servers...

To realize a "true" OSD (as defined by, e.g., Intel, et al.), is it the case that the physical drives themselves in such a configuration (the drive manufacturer's firmware) and/or the disk controllers must possess the capability of handling the Object Storage model, along with correspondingly aligned "filesystem" (objsystem) software? I don't know the Panasas product offering(s), but I'm curious how far they go toward the idealized OBS/OSD model, and to what degree they might have [dis]advantages vis-a-vis Lustre and something like Sistina's GFS (the latter may be a stretch, but I'm trying to understand whether an OSD model could be imposed on something like GFS as well).

I'll appreciate any info. Thank you.

-- VS
#4
"VirtualSean" wrote in message om... Hello Bill, To your rejoinder points... a couple of questions, if I may. ...while there may be some 'object' flavor to some of the internal mechanisms used, it's still basically just a file system that distributes files across an expandible set of storage servers... To realize a "true" OSD (as defined by Intel, et al), I hadn't noticed that Intel had presumed to 'define' what an OSD was. is it the case that the physical drives themselves in such a configuration (the drive manufacturer's firmware) and/or disk controllers must posses the capability of "handling" the Object Storage model, along with correspondingly aligned "filesystem" (objsystem) software systems? That would, I suspect, depend upon exactly whose definition of what an OSD was you used. I don't know the Panasas product offering(s), but I'm curious as to how far they go toward the idealized OBS/OSD model, And that would as well. and to what degree they might have [dis]advantages vis-a-vis Lustre and something like Sistina's GFS (the latter may be a stretch, but I'm trying to understand if an OSD model could be imposed on something like a GFS as well). The current definitions of an OSD that I'm aware of state that the 'device' (which if undefined could be something that many people would call a 'server' of some kind rather than a disk) should encapsulate the physical placement of data on the disk(s) such that it can be addressed externally by object-identifier/offset/length triples. This allows the device to make optimization decisions on its own about placement (IIRC HP's AutoRAID product had some primitive capabilities in this area, but did not present the result in the form of 'objects'). The 'object' may also support (possibly extendible) 'attributes'. 
One problem with this approach is that rather than simply moving complexity like a file 'inode' from one place (the higher-level software) to another (the OSD), it adds a layer - since files can exceed the size of the OSD and hence must be able to span multiple OSDs. So there's still metadata in the 'inode', plus more at the OSD - and more stages of metadata introduce more potential for requiring additional random accesses before you can get to the data you want (not to mention adding more points at which any *changes* in metadata must be reliably stored before a modification operation can be considered 'stable' - though suitable introduction of NVRAM at these points can help mitigate any additional latency that this might cause).

Another problem is that a high-level view of a file as an 'object' often includes any storage redundancy used to store that file, whereas individual 'objects' at an OSD by definition do not encapsulate redundancy across multiple OSDs (hence for this reason as well a 'file' cannot necessarily map 1:1 to an OSD object). A third issue is that of locking granularity, which also can span OSDs when a file does (Garth had some early problems with this).

OTOH, decentralizing this metadata can help support increased scalability in large systems (where many clients may be able to interrogate many OSDs directly without going through a single 'inode server' as a bottleneck - though there are other ways to accomplish this as well, such as client-caching some or all of the metadata when it doesn't change all that often). IMO, the industry is still groping around for something approaching an 'ideal' decomposition of work in distributed file systems.
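The "extra layer" point - inode metadata plus per-OSD metadata - can be sketched as a two-stage lookup: the inode maps a file offset to a particular OSD object, and only then does the OSD's own metadata come into play. The round-robin striping layout, stripe size, and field names below are hypothetical, purely for illustration (not how Panasas or any particular system lays files out).

```python
STRIPE = 4  # tiny stripe unit, for illustration only

def locate(inode, file_offset):
    """Stage 1 of the lookup: map a file offset to
    (osd_id, object_id, object_offset). Stage 2 - finding the
    physical blocks - happens inside the target OSD's own metadata."""
    stripe_index = file_offset // STRIPE
    n = len(inode["stripes"])
    # Round-robin striping: each OSD object holds every n-th stripe unit.
    osd_id, object_id = inode["stripes"][stripe_index % n]
    object_offset = (stripe_index // n) * STRIPE + file_offset % STRIPE
    return osd_id, object_id, object_offset

# A file striped across three OSDs still needs this inode-level map...
inode = {"stripes": [(0, "obj-a"), (1, "obj-b"), (2, "obj-c")]}
print(locate(inode, 0))    # (0, 'obj-a', 0)
print(locate(inode, 5))    # (1, 'obj-b', 1)
print(locate(inode, 13))   # (0, 'obj-a', 5)
```

Every read thus traverses two metadata stages rather than one, which is exactly where the extra random accesses (and the extra stable-update points on writes) come from.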
The SNIA includes disk vendors who would *love* to find ways to standardize what an 'OSD' is so that they could build added value into their products and price them accordingly, but it's just not yet clear what that standard should be - or even that a disk is the right level for *any* such standardization (and if it's not, whether standardization at a higher - storage server - level makes sense, rather than just accepting that the possibilities there are sufficiently rich that proprietary solutions should be allowed to bloom to address varying needs).

My own current views are that:

1) OSDs don't make a great deal of sense when talking about block-level storage (i.e., they don't add much value to that idiom);

2) OSDs don't make a great deal of sense as higher-than-block-level devices attached to single hosts (since they don't off-load sufficient work to be very interesting: whatever you off-load tends to be balanced by the additional host interface complexity required to control it as closely as a host may often wish to); and

3) *standardized* OSDs don't make much sense in distributed systems, since they unnecessarily constrain creativity in the design of those systems (i.e., such systems are sufficiently complex that no single decomposition is even close to ideal for all circumstances - and no current decomposition that I've seen is close to ideal even for many common ones).

- bill
#5
Hi Bill,
Thanx for the insight. I was looking for a high-throughput storage solution to handle random reads and writes of a large number of small files. Wonder if the Panasas solution will do better than a NetApp. We are currently facing some problems when loading a large number of small files into a NetApp F820 at high speed... the machine actually "core-dumped" on us!
#6
The Panasas equipment is pretty interesting; you should also check out the Spinnaker Networks equipment and the NAS equipment from SGI. NetApp equipment has the write penalty in part due to its use of RAID 4. You might want to talk to the folks at www.zerowait.com; they know a lot of the tricks for tuning NetApp equipment.

Grey
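The RAID 4 write penalty mentioned above comes from the parity read-modify-write cycle: every small write costs four I/Os, and two of them hit the single dedicated parity disk, which becomes the bottleneck under many small random writes. A toy model (hypothetical function, bytes standing in for blocks):

```python
def small_write(data_disks, parity_disk, disk, index, new):
    """One RAID 4 small write = 4 I/Os, 2 of them on the parity disk."""
    old = data_disks[disk][index]                  # 1. read old data
    old_parity = parity_disk[index]                # 2. read old parity
    data_disks[disk][index] = new                  # 3. write new data
    # new_parity = old_parity XOR old_data XOR new_data keeps the
    # stripe's parity consistent without touching the other data disks.
    parity_disk[index] = old_parity ^ old ^ new    # 4. write new parity
    return 4

# Three data disks, one dedicated parity disk (XOR of the others).
data = [[1, 2], [3, 4], [5, 6]]
parity = [1 ^ 3 ^ 5, 2 ^ 4 ^ 6]
ios = small_write(data, parity, disk=0, index=0, new=9)
print(ios)  # 4
```

Note that writes to *any* data disk funnel through the one parity disk, which is why RAID 5 rotates parity across all disks instead.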
#7
[Responded privately before noticing this newsgroup post:]
> Hi Bill, Thanx for the insight. I was looking for a high-throughput storage solution to handle random reads and writes of a large number of small files. Wonder if the Panasas solution will do better than a NetApp. We are currently facing some problems when loading a large number of small files into a NetApp F820 at high speed... the machine actually "core-dumped" on us!

Needless to say, that should *never* happen due to simple load. Either your hardware has a problem, or WAFL does: I would expect NetApp to take this problem fairly seriously if you reported it.

Log-structured (or sort-of-log-structured, like NetApp's) file systems should be a good choice for the workload you describe - especially when supplemented with NVRAM to handle the most recent portion of the 'log' (as WAFL is). I think Sun may have one that they support, and Linux may as well.

Since I have no idea what internal algorithms Panasas uses to place data on disk, there's no way to SWAG how well they'd do: I have a recollection of recently seeing IOPS figures for a system that were *clearly* far beyond the capabilities of the underlying disk storage (i.e., I expect they referred to accessing cached data), and it may well have been in the Panasas literature (which is the most recent stuff I've looked at) - so that won't necessarily shed much light on the subject.

Reiserfs on Linux is supposedly optimized to handle small files well, but may not offer as much flexibility as you'd need (e.g., the ability to load them at high speed without much robustness, because you could just start the load over in the unlikely event of, say, a power failure). I've been working on a file system of my own for a while that should do the job flexibly and well, but I suspect you may want something sooner than it's likely to be available (if you're looking a year or so out, we could talk).

- bill
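The reason a log-plus-NVRAM design suits many small random writes can be sketched in a few lines: each write is acknowledged as soon as it is appended to the battery-backed log, and many pending writes later reach disk as one large sequential flush. This is an idealized toy, not WAFL's actual mechanism; the class and its parameters are hypothetical.

```python
class NVLogFS:
    """Toy log-structured store: ack on NVRAM append, flush in batches."""

    def __init__(self, nvram_capacity=4):
        self.nvram = []              # battery-backed log of pending writes
        self.disk = {}               # eventual on-disk state
        self.capacity = nvram_capacity

    def write(self, path, data):
        # One cheap sequential append to NVRAM, then acknowledge the client.
        self.nvram.append((path, data))
        if len(self.nvram) >= self.capacity:
            self.flush()

    def flush(self):
        # One large sequential disk write replaces many random ones.
        for path, data in self.nvram:
            self.disk[path] = data
        self.nvram.clear()

fs = NVLogFS()
for i in range(5):
    fs.write(f"file{i}", b"x")
print(len(fs.disk), len(fs.nvram))   # 4 1
```

The small-file loading workload described above is close to the best case for this pattern: per-file latency is just an NVRAM append, and the disks only ever see batched sequential traffic.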
#8
Hi Bill,
>> To realize a "true" OSD (as defined by Intel, et al),...
> I hadn't noticed that Intel had presumed to 'define' what an OSD was.

I neglected my "e.g.," in that statement. I was simply attempting to settle on one definition (or some acknowledged _set_ of aligned definitions) for reference. It was not my intention to assert that Intel has any more clout or credibility in defining OSD/OBS.

Thank you very much for your thoughtful and info-packed reply, Bill. Much obliged.

-- VS