#1
Cleversafe
Has anyone looked at/used Cleversafe for anything serious?

http://www.cleversafe.org/

Thanks,
-Wendell
#2
Previously Wendell III wrote:
> Has anyone looked at/used Cleversafe for anything serious?
> http://www.cleversafe.org/

It has the usual problems of distributed, unmanaged storage:

- You need to contribute twice as much as you get, both in storage space and in bandwidth.
- You need to have enough upstream bandwidth.
- 6 out of 11 is not too reliable, especially for longer-term storage.
- It is not clear how long this will stay operational.

I used to do a bit of research in the area, but concluded that the idea, while seemingly attractive, does not work well and does not make economic sense. This is one of those stupid things that result when people take several concepts, here Internet and storage, and try to merge them at all costs.

Arno
#3
Arno Wagner;711472 Wrote:
> Previously Wendell III wrote:
>> Has anyone looked at/used Cleversafe for anything serious?
>> http://www.cleversafe.org/
>
> It has the usual problems of distributed, unmanaged storage:
> - You need to contribute twice as much as you get, both in storage space and in bandwidth.
> - You need to have enough upstream bandwidth.
> - 6 out of 11 is not too reliable, especially for longer-term storage.
> - It is not clear how long this will stay operational.
>
> I used to do a bit of research in the area, but concluded that the idea, while seemingly attractive, does not work well and does not make economic sense. This is one of those stupid things that result when people take several concepts, here Internet and storage, and try to merge them at all costs.
>
> Arno

A couple of comments:

The 11 lose 5 scenario means that you'd need to sustain 5 simultaneous node failures to lose any data. The odds of that happening are slim. In fact, it creates a "twelve 9" availability situation, which is far more reliable than any other storage solution today.

The blowup (i.e., the amount of total storage needed relative to the original data set) is 2.1x the original data set size in the 11 lose 5 scenario. While this might seem high, it's significantly less than the number of copies of data that companies typically make to ensure that their data is available when they want to access it. It's generally accepted that high availability environments create 4-10x the original data set size.

And, we actually are working on a yet-to-be-released version which will reduce the blowup to ~1.3x.

Finally, with people and companies wanting to keep data around for a long time on secure, cost-effective storage solutions that are accessible and don't degrade (like tape), information dispersal is far and away the best solution.

--
PlanetRudy
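The availability figure argued over in this post can be sanity-checked with a short binomial calculation. The sketch below assumes that "11 lose 5" means any 6 of the 11 slices are enough to read the data, that nodes fail independently, and that every node has the same availability. The function names and the per-node availability values are illustrative assumptions, not Cleversafe's published model.

```python
from math import comb


def unavailability(n, k, node_availability):
    """Probability that fewer than k of n independent nodes are up."""
    q = 1.0 - node_availability  # per-node probability of being down
    # Data is unreadable once more than n - k nodes are down at the same time.
    return sum(
        comb(n, down) * q**down * node_availability ** (n - down)
        for down in range(n - k + 1, n + 1)
    )


def raw_blowup(n, k):
    """Stored bytes per original byte for k-of-n dispersal, ignoring metadata."""
    return n / k


if __name__ == "__main__":
    # Assumed per-node availabilities, chosen only for illustration.
    for p in (0.99, 0.999):
        print(f"node availability {p}: unavailability {unavailability(11, 6, p):.3e}")
    print(f"raw blowup for 6-of-11: {raw_blowup(11, 6):.2f}")
```

The result is only as good as the independence assumption: correlated outages would make the real figure worse than the binomial one. Note also that the raw n/k ratio for 6-of-11 is about 1.83, somewhat below the 2.1x quoted in the post.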
#4
Previously PlanetRudy wrote:
> Arno Wagner;711472 Wrote:
>> Previously Wendell III wrote:
>>> Has anyone looked at/used Cleversafe for anything serious?
>>> http://www.cleversafe.org/
>>
>> It has the usual problems of distributed, unmanaged storage:
>> - You need to contribute twice as much as you get, both in storage space and in bandwidth.
>> - You need to have enough upstream bandwidth.
>> - 6 out of 11 is not too reliable, especially for longer-term storage.
>> - It is not clear how long this will stay operational.
>>
>> I used to do a bit of research in the area, but concluded that the idea, while seemingly attractive, does not work well and does not make economic sense. This is one of those stupid things that result when people take several concepts, here Internet and storage, and try to merge them at all costs.
>>
>> Arno
>
> A couple of comments:
>
> The 11 lose 5 scenario means that you'd need to sustain 5 simultaneous node failures to lose any data. The odds of that happening are slim. In fact, it creates a "twelve 9" availability situation, which is far more reliable than any other storage solution today.

You are overlooking the time factor. Due to bandwidth and storage space limitations, re-replication of data can take significant time. And unlike traditional storage media, you have absolutely no hard numbers on reliability; instead you need to make wild guesses about the behaviour of your user population.

> The blowup (i.e., the amount of total storage needed relative to the original data set) is 2.1x the original data set size in the 11 lose 5 scenario. While this might seem high, it's significantly less than the number of copies of data that companies typically make to ensure that their data is available when they want to access it. It's generally accepted that high availability environments create 4-10x the original data set size.

This solution is not high-availability. You can have lots of temporary failures from PCs that are not running, laptops that do not have Internet connectivity, etc.

Also, you get the same blowup (more if re-replication is a frequent event) in network bandwidth usage.

> And, we actually are working on a yet-to-be-released version which will reduce the blowup to ~1.3x.

This sounds too good to be true without some major drawback hidden in it.

> Finally, with people and companies wanting to keep data around for a long time on secure, cost-effective storage solutions that are accessible and don't degrade (like tape), information dispersal is far and away the best solution.

I strongly disagree. True, this is usually quoted as an advantage of this type of system. But it is bogus: instead of needing to monitor tape or MOD degradation (which is well understood and has basically no risks), the user now needs to monitor the state of your network. If your users leave in significant numbers, the remaining users will be in trouble. This is an entirely unquantifiable risk compared to the well-understood risk of tape, MOD or other traditional archival media solutions.

As I said, intuitively this is intriguing. But if you look at the numbers, it turns out that traditional in-house or external archival storage has risks that are well understood and quantifiable. This system is a wild card whose risks are not well understood, and some of them can be entirely non-obvious. You might get lucky or you might not. And then, traditional archival storage is not that expensive. If done in-house, it also does not have the bandwidth problem.

Personally I think this is nice to play around with, but only a fool would depend on it. Also, it is unusable for larger amounts of data. If you store larger amounts of data, you get completely unrealistic numbers of users that need to participate in this long-term.

Arno
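The time-factor argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a classic k-of-n erasure code in which rebuilding one lost slice means fetching k surviving slices, i.e. roughly the full size of the stored object. The function names, object size and link speed are made-up example values, not measurements of any Cleversafe deployment.

```python
def slice_bytes(object_bytes, k):
    """Size of one slice when an object is split k ways before coding."""
    return object_bytes / k


def repair_download_bytes(object_bytes, k):
    """Bytes fetched to rebuild one lost slice: k surviving slices."""
    return k * slice_bytes(object_bytes, k)


def repair_hours(object_bytes, k, link_bits_per_second):
    """Hours needed to pull the repair traffic over a single link."""
    seconds = repair_download_bytes(object_bytes, k) * 8 / link_bits_per_second
    return seconds / 3600


if __name__ == "__main__":
    # Assumed example: 100 GB of stored data, 6-of-11 coding, 1 Mbit/s link.
    hours = repair_hours(100e9, 6, 1e6)
    print(f"rebuilding one slice takes about {hours:.0f} hours")
```

During that repair window the data is protected by one slice fewer than usual, which is why slow links matter for long-term reliability and not just for convenience.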
#5
> This solution is not high-availability. You can have lots of temporary failures from PCs that are not running, laptops that do not have Internet connectivity, etc.

Arno, your comments seem to assume that we are designing Dispersed Storage to be hosted on devices like laptops that come and go. This is not the focus of the initial release. Laptops and home PCs can be clients for a Dispersed Storage grid, but they are not the intended type of server. The initial focus for servers in Cleversafe Dispersed Storage grids is hosted servers whose availability would typically be around 99.9%. Hosting a Dispersed Storage grid on this class of servers results in extremely available and reliable storage.

> And, we actually are working on a yet-to-be-released version which will reduce the blowup to ~1.3x.
>
> This sounds too good to be true without some major drawback hidden in it.

In order to realize a blowup of 1.3 (i.e. a storage overhead of 30%), we are using methods like Reed-Solomon coding to get that level of overhead at extremely high levels of reliability. These methods have been around for decades and are widely used in communications.

> Personally I think this is nice to play around with, but only a fool would depend on it. Also, it is unusable for larger amounts of data. If you store larger amounts of data, you get completely unrealistic numbers of users that need to participate in this long-term.

Dispersed Storage is NOT being designed to be hosted on a federation of low-availability devices, like laptops and home PCs. Dispersed Storage IS designed for a hosting model like that of the Internet. The Internet uses an open protocol -- TCP/IP -- but is typically provided as a commercial service by ISPs who use highly available devices -- hosted routers -- to provide an inter-networking service. Some larger organizations also host their own routers to provide internal networking services.

Cleversafe Dispersed Storage is designed for a model like the Internet where a variety of companies like ISPs and hosting companies will offer storage as a service using highly available devices -- storage servers. In addition, some larger organizations will also use Dispersed Storage to host their own storage services.

You can also use the open Dispersed Storage protocol to create a non-commercial Dispersed Storage grid. When building a Dispersed Storage grid, we'd recommend you use servers to build that grid, just like you'd want to use highly available routers if you were building your own Internet.

Perhaps one day you'll see mesh communications networks and mesh storage networks built on very low reliability devices like laptops that come and go a lot, but that is not the initial focus of Cleversafe Dispersed Storage.

Regards,
Chris Gladwin
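For readers unfamiliar with how Reed-Solomon-style coding gets k-of-n recovery at a blowup of only n/k, here is a toy sketch. It works over the prime field GF(257) with plain Lagrange interpolation, which is a simplification: production coders typically use GF(2^8) arithmetic and systematic encodings, and nothing here is taken from Cleversafe's implementation. The names `encode` and `decode` and the 6-of-11 parameters are illustrative assumptions.

```python
P = 257  # small prime field, large enough to hold one byte per symbol


def _eval_poly(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result


def encode(data, n):
    """Treat the k data bytes as coefficients; slice i is the value at x = i + 1."""
    return [(x, _eval_poly(data, x)) for x in range(1, n + 1)]


def decode(slices, k):
    """Recover the k coefficients from any k slices by Lagrange interpolation."""
    points = slices[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis = [1]  # running product of (x - xj) for all j != i
        denom = 1    # running product of (xi - xj) for all j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            grown = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):
                grown[d] = (grown[d] - xj * b) % P
                grown[d + 1] = (grown[d + 1] + b) % P
            basis = grown
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs


if __name__ == "__main__":
    data = list(b"Clever")          # k = 6 data symbols
    slices = encode(data, 11)       # n = 11 slices, raw blowup 11/6
    survivors = slices[5:]          # pretend the first 5 slices are lost
    print(bytes(decode(survivors, 6)))
```

With this construction the blowup is set purely by the choice of n and k. A ratio near 1.3 would need something like 10-of-13 (an assumed example, not a figure from the post), which tolerates fewer lost slices than 6-of-11, so the lower overhead only pays off when each server is highly available.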