#1
Scaling storage performance and capacity horizontally?
Hi
I have a question about storage management, or disk load-balancing, or.. I'm not sure. I am having trouble working out how to scale storage horizontally.

We have a 16x 320GB SATA-II RAID-5 array on our internet download server, which offers a mix of big and small files (e.g. ISO images, software patches, etc.). We often comfortably serve 1000-2000 concurrent clients. However, this is only when the majority of downloads are small files (<= 1 MB). Whenever we release a new ISO image, or even something just 100 MB, the server starts to struggle.. even with only half our usual number of downloaders. I am guessing this is because the slower clients are unable to take advantage of reading the files sequentially, and as such create more of a random IO situation, and we are maxing out IOPS?

I can alleviate the problem right now by just buying another RAID-5 array and PC and doing DNS round robin, but then I am maintaining two copies of our data and it just doesn't feel like the optimal solution.

I am wondering how people usually scale their storage, for both capacity and IOPS, whilst keeping it easy to manage. I guess this requires either some kind of fancy management software, or perhaps something like iSCSI with commodity hardware? I am imagining a horizontally scalable storage architecture to which I can simply add/remove hardware as our client base changes in size. It would have one global namespace where I place a file once, and the storage management system works out how to distribute it across spindles/controllers to achieve the best performance -- if a file is being requested far more than any other, I guess it would be replicated more.

Does something like this exist, or am I going about it the wrong way? Any advice would be greatly appreciated!
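A quick way to sanity-check the "maxing out IOPS" theory before buying more hardware; the per-drive figure below is only a ballpark assumption for 7200 RPM SATA disks, and the commands assume the sysstat tools are installed:

    # Ballpark: a 7200 RPM SATA drive manages roughly 75-100 random reads/s,
    # so 16 spindles give on the order of 1200-1600 random-read IOPS in total.
    # Compare that against what the box is actually doing while it struggles:
    iostat -x 5     # per-device r/s, avgrq-sz, await and %util for the array
    sar -n DEV 5    # rxkB/s + txkB/s per NIC; one GigE link tops out near ~117 MB/s

If r/s stays well below the spindle budget while the NIC sits at line rate, the disks are probably not the real limit.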
#4
Scaling storage performance and capacity horizontally?
Bill Todd wrote:
> wrote:
>> ... We often comfortably serve 1000-2000 concurrent clients. However,
>> this is only when the majority of downloads are small files (<= 1 MB).
>> Whenever we release a new ISO image, or even something just 100 MB, the
>> server starts to struggle.. even with only half our usual number of
>> downloaders. I am guessing this is because the slower clients are unable
>> to take advantage of reading the files sequentially, and as such create
>> more of a random IO situation, and we are maxing out IOPS?
>
> Unlikely, since the hot file is probably getting loaded into the array RAM
> (or the individual on-disk caches) and being served from there, regardless
> of how fast the individual clients can inhale it. Bandwidth (at the array
> interface or in the network) is the far more likely bottleneck.

Ok, this makes perfect sense. But I know our Gigabit Ethernet can push harder, and from what I can tell the SAS array interface should outperform it. When the server starts to struggle, I do see 100% utilization in iostat(1) output for the array, though, with numbers something like this:

    r/s        500
    rsec/s     280000
    rkB/s      140000
    avgrq-sz   500
    avgqu-sz   10-20
    await      100-200 ms
    svctm      2-4 ms
    %util      100

During this time I can still dd(1) a cold file to /dev/null from the array at around 100 MiB/s, so I'm not entirely sure what to make of this. That is why I was thinking it had something to do with the slower clients.

Is it possible that the problem is just system/controller RAM? That is, could the burst of new webserver connections for the popular file be taking cache memory away from the (many, many) other less popular files still being requested, thus resulting in extra traffic to the disk? And is the 500 r/s quoted above near the upper limit of 14x SATA-II 7200 RPM NCQ disks?

> ... However, as noted above it doesn't sound as if you need anything like
> this to solve your current problem:

This is the impression I am getting. I feel like I'm missing something painfully obvious, but I really don't know what it is.
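Reading the iostat sample back, one more check is worth doing. This is only a sketch: the ISO path is a placeholder, and it assumes GNU dd plus the usual procps/sysstat tools:

    # 280000 rsec/s / 500 r/s = ~560 sectors, i.e. roughly 280 KB per read,
    # which lines up with rkB/s 140000 (~137 MiB/s): the array is serving
    # large reads, not a storm of tiny random ones.
    # A cold-read test that bypasses the page cache is a fairer benchmark
    # while the server is busy than a plain dd:
    dd if=/path/to/uncached-file.iso of=/dev/null bs=1M iflag=direct
    # And watch whether the page cache is being churned while the new ISO is hot:
    free -m      # how much RAM is being used as cache
    vmstat 5     # a sustained spike in "bi" means reads really are hitting the disks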
#6
Scaling storage performance and capacity horizontally?
First off, I'm not much of an expert on smallish storage systems. I would feel more comfortable if the question were about a PB-class system connected at many GBytes/s.

In article .com, wrote:
> I have a question about storage management, or disk load-balancing, or..
> I'm not sure.

I think the key lies in the "I'm not sure". Do you know what your bottleneck really is? And even if you know your single smallest bottleneck, do you know what other system components are next in line to become bottlenecks? Until you know that, it's hard to improve the system. The iostat you posted is part of the answer: one component (namely the disk IO) seems to be 100% utilized; that is probably the bottleneck. So we can either give you a much more powerful disk IO subsystem, or make it so the workload doesn't actually hit the disk.

But let's go through a few numbers. You say your server is connected via a single GBit Ethernet. That means that you need to serve ONLY 100 MBytes/s of bandwidth. On a high-powered motherboard with enough CPU power (say, for grins, dual Opterons, each dual-core, meaning an investment of maybe $2000 for motherboard and chips), that should be trivial, even with the CPU load of your web server. Have you actually checked that you are not CPU starved?

> Whenever we release a new ISO image, or even something just 100 MB, the
> server starts to struggle.. even with only half our usual number of
> downloaders.

Then you say that things go to hell when serving big files. Strange. Small files should present a much larger load *** if the outgoing total bandwidth is held constant ***, because the ratio of CPU- and disk-consuming metadata operations and seeks is much larger. But for both small and big files, the metadata and seek problem should be cured by adding cache. Have you tried this: add a dozen or two GByte of memory to the server? That will make sure that nearly all the files that are being constantly served (for example, a few ISO images) stay in cache and don't hit the disk array. This might be particularly important if you are using software RAID, which can consume huge amounts of CPU power for writes, huge amounts of disk IO for writes smaller than a stripe, and huge amounts of memory bandwidth for copying the data into a variety of IO buffers.

Here are two other suggestions. First, you say you are using RAID-5 over 16 disks. Do you actually need the capacity advantage of RAID-5? If you are using software RAID, it might be a good idea to reconfigure your array as RAID-10 (mirroring plus striping across the disks). This might greatly lower your CPU overhead.

Second, your workload should be nearly completely read-only, meaning that there should be no writes to the disks, meaning that things should run like a bat out of hell even with RAID-5, because there are no disk updates (and hence no read-modify-write cycles for sub-stripe updates). So maybe your problem is that you have a lot of disk writes going on? Here is a suggestion: what file system are you using? Could it be that the file system is updating atime (last access time) whenever a client reads a file, and the thing that is killing you is not actually the reads, but the small writes that come from the atime updates? Try this: mount your data read-only, or get your filesystem to disable atime updates. Now, if you are running some HSM or ILM software, the lack of atime updates might break it, so be a little careful.
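A minimal sketch of that atime experiment on Linux; /data, /dev/md0 and ext3 below are placeholders for wherever and however the download tree is actually mounted:

    mount -o remount,noatime /data     # stop atime writes on the live mount
    # or make it permanent by adding noatime to the options column in /etc/fstab:
    #   /dev/md0   /data   ext3   defaults,noatime   0 0
    mount -o remount,ro /data          # or go further and serve the tree read-only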
> I am guessing this is because the slower clients are unable to take
> advantage of reading the files sequentially, and as such create more of a
> random IO situation, and we are maxing out IOPS?

Shouldn't. Any good file system (which file system are you using?) should be prefetching enough data into the cache to make the disk IO sequential enough to get good efficiency. But do you have enough cache?

> I can alleviate the problem right now by just buying another RAID-5 array
> and PC and doing DNS round robin, but then I am maintaining two copies of
> our data and it just doesn't really feel like the optimal solution.

If you want to double your disk capacity, I would rather go for RAID-10 instead and stick to a single server (maybe by upgrading your server to a really big one). The management overhead of having to maintain two copies of the data, and the headaches when the two copies diverge (which they are guaranteed to, unless you are really careful), are probably worse than the hardware cost of throwing more iron at it.

> I am wondering how people usually scale their storage, for both capacity
> and IOPS, whilst keeping it easy to manage. I guess this requires either
> some kind of fancy management software, or perhaps using something like
> iSCSI with commodity hardware?

How do people scale their storage? By throwing money at it. Lots of it. Your problem could be trivially solved by using high-end servers (spend a few M$ on IBM AIX or HP HP-UX hardware), a few M$ on a SAN (more Brocade gear than the police allows), a few M$ on storage servers (I like the high-end Hitachi, although IBM's Shark is also pretty good at feeding high-bandwidth workloads), and a lot of money AND time on management software (EMC ControlCenter seems to be the 400 lb gorilla in that market, for better or for worse). For good measure, throw in a good cluster file system to make sure your data is consistent (see above about having two copies of the data); I'm quite partial towards GPFS, but then I'm highly biased, so take that advice with a grain of salt. By going to Panasas, LeftHand Networks, iBrix, Isilon and such you can get similar performance for somewhat less money than Tier-1 gear. Look at the big Livermore system: a 2 PB single file system, tens of thousands of disks, thousands of hosts, zillions of dollars. It can be done. Your problem is that you want mid-range performance using low-end dollars. This can be done, but it is fraught with pitfalls.

> I am imagining a horizontally scalable storage architecture to which I can
> simply add/remove hardware as our client-base changes in size.

You are dreaming the same dream as most researchers in this field.

> It has one global namespace

A cluster or distributed file system gives you that.

> where I place the file once and then the storage management system works
> out how to distribute it across spindles/controllers to achieve the best
> performance -- if a file is being requested far more than any other, I
> guess it would be replicated more.

The holy grail. Everyone wants it. Some systems are getting somewhat close to it. Panasas and LeftHand are probably the closest commercial approximations to it today (and I'm probably omitting many others that can do similar things). But going for interestingly intelligent self-managing systems is probably total overkill for your situation. That's the kind of solution the likes of Goldman Sachs and Morgan Stanley may be toying with in their data centers, and the kind of thing that Google and Livermore are actually using in production. Your problem is much smaller scale, and can hopefully be solved without having to use heavyweight solutions.
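For the RAID-10 route with Linux software RAID, a minimal sketch; the device names and the ext3 choice are placeholders, and recreating the array destroys what is on it, so the data has to be restored from another copy afterwards:

    mdadm --stop /dev/md0                                        # retire the old RAID-5
    mdadm --create /dev/md0 --level=10 --raid-devices=16 /dev/sd[b-q]
    mkfs.ext3 /dev/md0
    mount -o noatime /dev/md0 /data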
--
Ralph Becker-Szendy