#11
Roadrunner Supercomputer using 12,960 CELL Processors Hits 1 PetaFlop (1000 TeraFlops) of double-precision FP Performance
On Jun 11, 6:10 pm, Robert Myers wrote:
> On Jun 11, 5:22 am, wrote: I'd agree with mr deo. Roadrunner interconnects look like a big step backward from other PR-heavy American supercomputers. http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode... The predicted worst-case latency is about the same as Blue-Gene. Red Storm routing/switching looks like Blue Gene. Columbia uses both Infiniband and Numalink in a fat tree like Roadrunner.

My understanding of p.10 is that the 2-way latency between SPEs on two neighboring triblades is ~8 usec, i.e. about the same as the node-to-node latency in the whole 65K-node machine. I didn't find any estimates for worst-case latency in the 10K-node configuration. My personal uneducated guess: 10 times worse than BG/L in the worst case and 5 times worse in the average loaded case.
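The worst-case figures being compared above depend partly on how many routers a message has to cross. As a rough sketch (not taken from the thread), the worst-case hop count for a 3-D torus and for a folded-Clos fat tree can be computed as below. The 64 x 32 x 32 shape for the 65,536-node BG/L and the fat-tree depths are assumptions for illustration, and hop count ignores per-hop delay and endpoint overhead, which is where the Infiniband switches and the triblade's Opteron-to-Cell path come in.

```python
# Worst-case hop counts for the two interconnect styles discussed in the thread.
# Assumptions (not from the thread): BG/L's 65,536-node torus is 64 x 32 x 32,
# and the fat tree is a folded Clos with a hypothetical number of switch levels.


def torus_diameter(dims):
    """Maximum number of router hops between two nodes of a torus.

    Wraparound links mean no dimension ever needs more than floor(d/2) hops.
    """
    return sum(d // 2 for d in dims)


def fat_tree_worst_case_switches(levels):
    """Switches crossed on the longest path in a folded-Clos fat tree.

    The path climbs through `levels` switches (including the top one),
    then descends through `levels - 1` switches to the destination leaf.
    """
    return 2 * levels - 1


if __name__ == "__main__":
    bgl_dims = (64, 32, 32)  # assumed shape: 64 * 32 * 32 = 65,536 nodes
    print("BG/L torus, worst-case hops:", torus_diameter(bgl_dims))
    for levels in (2, 3):  # hypothetical fat-tree depths
        print(f"{levels}-level fat tree, worst-case switches:",
              fat_tree_worst_case_switches(levels))
```

The torus diameter grows with machine size while a fat tree's switch count stays small; whether that wins in practice depends on per-switch and adapter latency, which is exactly what the posts disagree about.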
#12
On Jun 11, 8:03 pm, wrote:
> On Jun 11, 6:10 pm, Robert Myers wrote:
>> On Jun 11, 5:22 am, wrote: I'd agree with mr deo. Roadrunner interconnects look like a big step backward from other PR-heavy American supercomputers. http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode... The predicted worst-case latency is about the same as Blue-Gene. Red Storm routing/switching looks like Blue Gene. Columbia uses both Infiniband and Numalink in a fat tree like Roadrunner.
> My understanding of p.10 is that the 2-way latency between SPEs on two neighboring triblades is ~8 usec, i.e. about the same as the node-to-node latency in the whole 65K-node machine. I didn't find any estimates for worst-case latency in the 10K-node configuration. My personal uneducated guess: 10 times worse than BG/L in the worst case and 5 times worse in the average loaded case.

I took the "worst case 2-way Infiniband" latency to be the worst case for the mesh fabric. The advertised worst case for one version of Blue Gene was, I think, 5 microseconds.

The latency between the Opteron and the Cell Processor is another matter. In any case, it has nothing to do with the mesh fabric.

BG/L will do fine on some kinds of problems, just not ones requiring significant global communication. That was my beef about Blue Gene and Red Storm.

Robert.
#13
On Jun 12, 3:43 am, Robert Myers wrote:
> On Jun 11, 8:03 pm, wrote:
>> On Jun 11, 6:10 pm, Robert Myers wrote:
>>> On Jun 11, 5:22 am, wrote: I'd agree with mr deo. Roadrunner interconnects look like a big step backward from other PR-heavy American supercomputers. http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode... The predicted worst-case latency is about the same as Blue-Gene. Red Storm routing/switching looks like Blue Gene. Columbia uses both Infiniband and Numalink in a fat tree like Roadrunner.
>> My understanding of p.10 is that the 2-way latency between SPEs on two neighboring triblades is ~8 usec, i.e. about the same as the node-to-node latency in the whole 65K-node machine. I didn't find any estimates for worst-case latency in the 10K-node configuration. My personal uneducated guess: 10 times worse than BG/L in the worst case and 5 times worse in the average loaded case.
> I took the "worst case 2-way Infiniband" latency to be the worst case for the mesh fabric.

I read it as a latency within a triblade that does not include the fabric.

> The advertised worst case for one version of Blue Gene was, I think, 5 microseconds.

The full original version was closer to 9 microseconds. Since the machine is currently almost twice as big as it was back then, I'd guess that today they are at 10 microseconds.

> The latency between the Opteron and the Cell Processor is another matter. In any case, it has nothing to do with the mesh fabric.

That's the point. I saw nothing about the latency/bandwidth characteristics of the IB switches used in Roadrunner.

> BG/L will do fine on some kinds of problems, just not ones requiring significant global communication. That was my beef about Blue Gene and Red Storm.

It depends on the kind of communication. If the communication consists mostly of small bi-directional messages, these machines seem to be much better than anything else in existence. For large bandwidth-bound messages on BG/L, the picture is less rosy.
I didn't see numbers for Cray XTn systems of comparable size (not sure they exist); in theory they should be significantly better than BG/L. However, I do not see why Roadrunner should be any better for bandwidth-bound global communication. If anything, I expect it to do worse, especially in the worst case.

> Robert.
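The distinction between small-message and bandwidth-bound traffic can be sketched with the standard Hockney ("alpha-beta") model, T(n) = alpha + n/beta. This is a textbook approximation rather than anything stated in the thread, and the alpha and beta values in the code are hypothetical placeholders, not figures for BG/L, Red Storm or Roadrunner.

```python
# Hockney ("alpha-beta") point-to-point model: T(n) = alpha + n / beta.
# The parameter values below are hypothetical, chosen only to show the shape
# of the curve; they are not measurements of any machine in the thread.


def transfer_time(n_bytes, alpha_s, beta_bytes_per_s):
    """Time to send one n-byte message: startup latency plus streaming time."""
    return alpha_s + n_bytes / beta_bytes_per_s


def latency_fraction(n_bytes, alpha_s, beta_bytes_per_s):
    """Share of the transfer time spent on the fixed startup latency."""
    return alpha_s / transfer_time(n_bytes, alpha_s, beta_bytes_per_s)


if __name__ == "__main__":
    alpha = 5e-6  # hypothetical 5 microsecond startup latency
    beta = 1e9    # hypothetical 1 GB/s link bandwidth
    for n in (64, 4096, 1 << 20):
        t = transfer_time(n, alpha, beta)
        print(f"{n:>8} B: {t * 1e6:8.2f} us, "
              f"{latency_fraction(n, alpha, beta):5.1%} latency")
```

With these placeholder numbers a 64-byte message spends almost all of its time on startup latency, while a 1 MiB message is dominated by link bandwidth, which is why a machine can look excellent on one kind of communication and poor on the other.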
#14
On Wed, 11 Jun 2008 07:53:31 -0700, Neal wrote:
> Though the CELL may prove more useful, it still has some serious arch issues that a lot of people don't like. Programming the thing is not the easiest thing in the world to do (though parallel programming models for CMPs are somewhat of an open issue). It's nice that they will continue to push performance, but many GPUs will be well above 1TFLOPS SPFP by 2010...

How reasonable is it to complain about the ease of programming the CELL, and in the very next sentence, go on about GPUs?

-- Andrew
#15
On Jun 12, 1:45 am, Andrew Reilly wrote:
> On Wed, 11 Jun 2008 07:53:31 -0700, Neal wrote:
>> Though the CELL may prove more useful, it still has some serious arch issues that a lot of people don't like. Programming the thing is not the easiest thing in the world to do (though parallel programming models for CMPs are somewhat of an open issue). It's nice that they will continue to push performance, but many GPUs will be well above 1TFLOPS SPFP by 2010...
> How reasonable is it to complain about the ease of programming the CELL, and in the very next sentence, go on about GPUs?
> -- Andrew

It's a pretty reasonable assumption that, given the recent and significant architectural changes in GPUs, they will continue becoming more general-purpose and programmer-friendly. I prolly don't need to mention the name Larrabee. I've programmed for CELL, and I've programmed in CUDA... my personal opinion is that GPUs in 2010 will overall be more appealing. That's IMO, and of course opinions can be debated and may one day be found wrong.
#16
On Jun 12, 3:54 am, wrote:
> On Jun 12, 3:43 am, Robert Myers wrote:
>> BG/L will do fine on some kinds of problems, just not ones requiring significant global communication. That was my beef about Blue Gene and Red Storm.
> It depends on the kind of communication. If the communication consists mostly of small bi-directional messages, these machines seem to be much better than anything else in existence. For large bandwidth-bound messages on BG/L, the picture is less rosy. I didn't see numbers for Cray XTn systems of comparable size (not sure they exist); in theory they should be significantly better than BG/L. However, I do not see why Roadrunner should be any better for bandwidth-bound global communication. If anything, I expect it to do worse, especially in the worst case.

Well, actually, it looks like you're right. I calculated the bisection bandwidth of BG/L to be in the tens of millibytes/flop (because of the geometry and large mesh), and it turns out that the bisection bandwidth of Roadrunner could be no better than 10 millibytes/flop. That is because of the number of flops loaded onto a single node by way of the Cell processors (about 100 gigaflops DP per node), connected by an Infiniband interconnect (about 1 gigabyte per second) that is wimpy by comparison. :-(

Using the fat-tree topology addresses the scalability of bisection bandwidth, but the individual links aren't properly scaled to the nodes.

Robert.
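Robert's estimate is a single division of per-node network bandwidth by per-node peak flops. A minimal sketch using the approximate figures from his post (about 1 GB/s of Infiniband and about 100 GFLOP/s DP per node; both are his rough numbers, not measurements):

```python
# Back-of-envelope balance ratio: network bytes per node divided by peak
# floating-point operations per node. Inputs are the rough per-node figures
# from the post above, not measured values.


def bytes_per_flop(link_bytes_per_s, node_flops_per_s):
    """Bytes the network link can move for every floating-point operation."""
    return link_bytes_per_s / node_flops_per_s


if __name__ == "__main__":
    ratio = bytes_per_flop(link_bytes_per_s=1e9, node_flops_per_s=100e9)
    print(f"{ratio:.3f} bytes/flop = {ratio * 1000:.0f} millibytes/flop")
```

Because a node cannot push data into the network faster than its own link allows, a better topology such as a fat tree cannot raise this per-node ratio, which is the point of his last sentence.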
#17
> I wonder what they really do with these computers. I find it unlikely they still don't know how nuclear weapons work, especially considering they're mature technology and they've been around for decades.

There are no nuclear weapons. It's a huge conspiracy. The fact is, we use these computers to model human behavior to learn how to keep the ignorant masses under our control. Have you noticed funny noises lately when you talk on the phone? It's because we are monitoring you. You are close to learning our secret, and we are keeping an eye on you.
#18
On Jun 11, 3:36 pm, Cydrome Leader wrote:
> I wonder what they really do with these computers. I find it unlikely they still don't know how nuclear weapons work, especially considering they're mature technology and they've been around for decades.

Try these Google searches:

~simulation site:lanl.gov
~simulation site:llnl.gov

I get over 30,000 page hits. If you surf around and even ponder how to zero in on the nuclear weapons work, you will have your answer.

Question: We've discovered a warehouse full of Viet Nam-era artillery shells. Should we just ship them to a war zone, or count on them working in case of war? I mean, we *do* know how artillery shells work, don't we?

Robert.
#19
On Jun 11, 7:53 am, Neal wrote:
> On Jun 11, 7:32 am, Air Raid wrote:
>> On Jun 11, 4:32 am, wrote:
>>> On the positive note, successful testing of the Roadrunner means that IBM has the ability to manufacture a new variant of Cell with a fully-pipelined double-precision FPU in production quantity. IBM web site indicates that a new engine is available to mere mortals: http://www-03.ibm.com/systems/bladec...ers/qs22/index...
>> An interesting thing about the IBM PowerXCell 8i Processor is that it offers 4 to 5 (IBM says 5) times the double precision FP performance of the original Cell Processor. Depending on various factors such as having 7 or 8 SPEs active, counting the PPE or not counting it, and clockspeed, the original CELL could manage 218 to 256 to just under 300 GFLOPs of single precision FP. When double precision is needed, performance drops massively, down to around 25 GFLOPs. The IBM PowerXCell 8i is said to be capable of over 100 GFLOPs double precision. That's a huge increase without adding more SPEs or upping clockspeed. PowerXCell 8i cannot be considered a next-generation CELL, only an enhanced first-gen CELL. IBM plans to put 32 SPEs on the next-gen CELL to hit 1 TFLOP (single precision, I would imagine) in a single chip by 2010. There was also an official roadmap that showed a CELL with 64 SPEs on a process smaller than 45nm (be it 32nm or 22nm, I don't know). I posted about both in the past. It's clear that the IBM-Toshiba-Sony CELL is proving to be much more useful beyond PS3 than the Sony-Toshiba 'Emotion Engine' ever was, which really had no use outside of PS2 and cheap, home-made university "supercomputers" such as the one using 60 or 70 PS2s at UIC in IL. Roadrunner is serious stuff, and it's only the beginning. In the next decade we'll see more powerful supercomputers using next-gen CELLs.
> Though the CELL may prove more useful, it still has some serious arch issues that a lot of people don't like. Programming the thing is not the easiest thing in the world to do (though parallel programming models for CMPs are somewhat of an open issue). It's nice that they will continue to push performance, but many GPUs will be well above 1TFLOPS SPFP by 2010... which means that obviously Larrabee will be out then. While I'm glad that CELL came out, as it's enlightened the world and solved several of the multicore integration problems, it simply doesn't seem like the chip of the future right now. I simply haven't heard a whole lot of interest from the HPC community in CELL, but you never know... I could be wrong.

AMD's RV770 GPU coming out this month is already at 1 TFLOP, and the R700 product with two RV770 GPUs on a single card (4870 X2), which is due out this August or September, should be around 2 TFLOP. Of course this is a GPU (or GPGPU) and is not as programmable as CELL. Larrabee should, however, change that. I think Larrabee and anything like it, with a manycore architecture (beyond multicore), is the future. It'll be interesting to see how the next-gen CELL compares to Larrabee.
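The "over 100 GFLOPs double precision" figure, and the title's claim that 12,960 chips reach the petaflop range, can be checked with peak-rate arithmetic. The sketch below counts SPEs only and assumes a 3.2 GHz clock with one fully pipelined 2-wide DP (or 4-wide SP) fused multiply-add per SPE per cycle; those parameters are assumptions not stated in the thread, and the SPE-only SP result will not match the 218 to 300 GFLOPs range quoted above, which reflects other counting choices such as including the PPE.

```python
# Peak-rate arithmetic for a Cell-family chip, counting the SPEs only.
# Assumptions (not stated in the thread): 3.2 GHz clock; each SPE completes one
# 4-wide single-precision or 2-wide double-precision fused multiply-add per
# cycle. The DP rate is the fully pipelined case described for the
# PowerXCell 8i, not the original Cell.

CLOCK_HZ = 3.2e9
SP_FLOPS_PER_CYCLE = 8  # 4 lanes x 2 flops (multiply + add)
DP_FLOPS_PER_CYCLE = 4  # 2 lanes x 2 flops (multiply + add)


def chip_peak_gflops(spes, flops_per_cycle, clock_hz=CLOCK_HZ):
    """Theoretical peak in GFLOP/s for `spes` SPEs at the given issue rate."""
    return spes * flops_per_cycle * clock_hz / 1e9


if __name__ == "__main__":
    sp = chip_peak_gflops(8, SP_FLOPS_PER_CYCLE)
    dp = chip_peak_gflops(8, DP_FLOPS_PER_CYCLE)
    print(f"per chip: {sp:.1f} GFLOPS SP, {dp:.1f} GFLOPS DP")
    # Roadrunner's 12,960 Cell chips, ignoring the Opterons entirely
    print(f"12,960 chips: {12_960 * dp / 1e6:.3f} PFLOPS DP peak")
```

The result is a theoretical peak for the Cell chips alone; a measured petaflop, like the one in the thread title, will sit below such a peak.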
#20
On Thu, 12 Jun 2008 19:57:43 +0100, Zootal wrote:
> There are no nuclear weapons. It's a huge conspiracy. The fact is, we use these computers to model human behavior to learn how to keep the ignorant masses under our control. Have you noticed funny noises lately when you talk on the phone? It's because we are monitoring you. You are close to learning our secret, and we are keeping an eye on you.

Yes, Zootal, you and your co-conspirators are secretly in charge of everything. Or at least that's what *they* want you to think...