Roadrunner Supercomputer using 12,960 CELL Processors Hits 1 PetaFlop (1000 TeraFlops) of double-precision FP Performance

**[email protected]**

On Jun 11, 6:10 pm, Robert Myers wrote:
On Jun 11, 5:22 am, wrote:

I'd agree with mr deo.
Roadrunner interconnects look like a big step backward from other PR-
heavy American supercomputers.

http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode...

The predicted worst-case latency is about the same as Blue-Gene. Red
Storm routing/switching looks like Blue Gene. Columbia uses both
Infiniband and Numalink in a fat tree like Roadrunner.

My understanding of p.10 is that 2-way latency between SPEs on two
neighbor triblades is ~8 usec i.e. about the same as node-to-node
latency in the whole 65K-node machine.
I didn't find any estimates for worst-case latency in the 10K-node
configuration. My personal uneducated guess: 10 times worse than BG/L
in the worst case and 5 times worse in the average loaded case.

**Robert Myers**

On Jun 11, 8:03 pm, wrote:
On Jun 11, 6:10 pm, Robert Myers wrote:

On Jun 11, 5:22 am, wrote:

I'd agree with mr deo.
Roadrunner interconnects look like a big step backward from other PR-
heavy American supercomputers.

http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode...

The predicted worst-case latency is about the same as Blue-Gene. Red
Storm routing/switching looks like Blue Gene. Columbia uses both
Infiniband and Numalink in a fat tree like Roadrunner.

My understanding of p.10 is that 2-way latency between SPEs on two
neighbor triblades is ~8 usec i.e. about the same as node-to-node
latency in the whole 65K-node machine.
I didn't find any estimates for worst-case latency in the 10K-node
configuration. My personal uneducated guess: 10 times worse than BG/L
in the worst case and 5 times worse in the average loaded case.

I took the "worst case 2-way Infiniband" latency to be the worst case
for the mesh fabric. The advertised worst case for one version of
Blue Gene was, I think, 5 microseconds. The latency between the
Opteron and the Cell Processor is another matter. In any case, it has
nothing to do with the mesh fabric.

BG/L will do fine on some kinds of problems. Just not ones requiring
significant global communication. That was my beef about Blue Gene
and Red Storm.

Robert.

**[email protected]**

On Jun 12, 3:43 am, Robert Myers wrote:
On Jun 11, 8:03 pm, wrote:

On Jun 11, 6:10 pm, Robert Myers wrote:

On Jun 11, 5:22 am, wrote:

I'd agree with mr deo.
Roadrunner interconnects look like a big step backward from other PR-
heavy American supercomputers.

http://www.lanl.gov/orgs/hpc/roadrun...0-%20RR%20Mode...

The predicted worst-case latency is about the same as Blue-Gene. Red
Storm routing/switching looks like Blue Gene. Columbia uses both
Infiniband and Numalink in a fat tree like Roadrunner.

My understanding of p.10 is that 2-way latency between SPEs on two
neighbor triblades is ~8 usec i.e. about the same as node-to-node
latency in the whole 65K-node machine.
I didn't find any estimates for worst-case latency in the 10K-node
configuration. My personal uneducated guess: 10 times worse than BG/L
in the worst case and 5 times worse in the average loaded case.

I took the "worst case 2-way Infiniband" latency to be the worst case
for the mesh fabric.

I read it as a latency within triblade that does not include fabric.

The advertised worst case for one version of
Blue Gene was, I think, 5 microseconds.

The full original version was closer to 9 microseconds. Since the
machine is currently almost twice as big as it was back then, I'd
guess that today they are at 10 microseconds.
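For a rough feel of how worst-case latency might grow with machine size, here is a minimal sketch of the worst-case hop count in a torus network. The dimensions are hypothetical, chosen only so that the node count doubles; they are not taken from the thread or from any published machine layout. Real latency also includes fixed endpoint overhead, so it would be expected to grow by a smaller fraction than the hop count does.

```python
from math import prod

def torus_diameter(dims):
    """Worst-case hop count in a torus with wraparound links in every dimension."""
    # With wraparound, the farthest node along one dimension is d // 2 hops away.
    return sum(d // 2 for d in dims)

# Hypothetical layouts: the second has twice as many nodes as the first.
half = (32, 32, 32)
full = (64, 32, 32)

for dims in (half, full):
    print(prod(dims), "nodes:", torus_diameter(dims), "hops worst case")
```

In this sketch, doubling the node count raises the worst-case hop count by a third rather than doubling it, which is at least in the same spirit as guessing a modest increase over 9 microseconds.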

The latency between the
Opteron and the Cell Processor is another matter. In any case, it has
nothing to do with the mesh fabric.

That's the point. I saw nothing about latency/bandwidth
characteristics of the IB switches used in the roadrunner.

BG/L will do fine on some kinds of problems. Just not ones requiring
significant global communication. That was my beef about Blue Gene
and Red Storm.

It depends on the kind of communication.
If the communication consists mostly of small bi-directional messages
these machines seem to be much better than anything else in existence.
For the large bandwidth-bound messages on BG/L the picture is less
rosy. I didn't see numbers for Cray XTn machines of comparable size (not
sure they exist); in theory they should be significantly better than BG/L.
However, I do not see why the roadrunner should be any better for
bandwidth-bound global communication. If anything, I expect it to do
worse, esp. in the worst case.

Robert.

**Andrew Reilly**

On Wed, 11 Jun 2008 07:53:31 -0700, Neal wrote:

Though the CELL may prove more useful, it still has some serious arch
issues that a lot of people don't like. Programming the thing is not the
easiest thing in the world to do (though parallel programming models for
CMPs are somewhat of an open issue). It's nice that they will continue
to push performance, but many GPUs will be well above 1TFLOPS SPFP by
2010..

How reasonable is it to complain about the ease of programming the CELL,
and in the very next sentence, go on about GPUs?

--
Andrew

**Neal**

On Jun 12, 1:45 am, Andrew Reilly wrote:
On Wed, 11 Jun 2008 07:53:31 -0700, Neal wrote:
Though the CELL may prove more useful, it still has some serious arch
issues that a lot of people don't like. Programming the thing is not the
easiest thing in the world to do (though parallel programming models for
CMPs are somewhat of an open issue). It's nice that they will continue
to push performance, but many GPUs will be well above 1TFLOPS SPFP by
2010..

How reasonable is it to complain about the ease of programming the CELL,
and in the very next sentence, go on about GPUs?

--
Andrew

It's a pretty reasonable assumption, given the recent and
significant architectural changes in GPUs, that they will continue
becoming more general purpose and programmer friendly. I prolly don't
need to mention the name Larrabee. I've programmed for CELL, and I've
programmed in CUDA... my personal opinion is that GPUs in 2010 overall
will be more appealing. That's IMO, and of course opinions can be
debated and may one day be found wrong.

**Robert Myers**

On Jun 12, 3:54 am, wrote:
On Jun 12, 3:43 am, Robert Myers wrote:

BG/L will do fine on some kinds of problems. Just not ones requiring
significant global communication. That was my beef about Blue Gene
and Red Storm.

It depends on the kind of communication.
If the communication consists mostly of small bi-directional messages
these machines seem to be much better than anything else in existence.
For the large bandwidth-bound messages on BG/L the picture is less
rosy. I didn't see numbers for Cray XTn machines of comparable size (not
sure they exist); in theory they should be significantly better than BG/L.
However, I do not see why the roadrunner should be any better for
bandwidth-bound global communication. If anything, I expect it to do
worse, esp. in the worst case.

Well, actually, it looks like you're right. I calculated the
bisection bandwidth of BG/L to be in the tens of millibytes/flop (because
of the geometry and large mesh), and it turns out that the bisection
bandwidth of Roadrunner could be no better than 10 millibytes/flop,
because of the number of flops loaded onto a single node by way of the
Cell processors (about 100 gigaflops DP per node) connected by an
InfiniBand interconnect (about 1 gigabyte per second) that is wimpy by
comparison. :-(

Using the fat-tree topology addresses the scalability of bisection
bandwidth, but the individual links aren't properly scaled to the
nodes.
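The 10 millibytes/flop figure can be reproduced with a short back-of-envelope sketch. The link and node numbers are the rough ones quoted above, not measured values; the function name is made up for illustration. The per-node ratio is an upper bound on the bisection figure, on the assumption that even a full-bisection fat tree cannot carry more than each node's own link can inject.

```python
# Back-of-envelope bytes-per-flop for one node's network link.
# Figures are the rough ones quoted in the post, not measured values.

def bytes_per_flop(link_bytes_per_s: float, node_flops_per_s: float) -> float:
    """Network bytes a node can inject per floating-point operation at peak."""
    return link_bytes_per_s / node_flops_per_s

GIGA = 1e9

link_bw = 1 * GIGA       # ~1 gigabyte/s InfiniBand link per node (from the post)
node_peak = 100 * GIGA   # ~100 gigaflop/s double precision per node (from the post)

ratio = bytes_per_flop(link_bw, node_peak)
millibytes_per_flop = ratio * 1000

print(f"{millibytes_per_flop:.1f} millibytes/flop")
```

Doubling the link bandwidth or halving the flops per node would each double the ratio, which is the sense in which the links are not scaled to the nodes.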

Robert.

**Zootal**

I wonder what they really do with these computers.

I find it unlikely they still don't know how nuclear weapons work,
especially considering they're mature technology and they've been around
for decades.

There are no nuclear weapons. It's a huge conspiracy. The fact is, we use
these computers to model human behavior to learn how to keep the ignorant
masses under our control. Have you noticed lately funny noises when you talk
on the phone? It's because we are monitoring you. You are close to learning
our secret, and we are keeping an eye on you.

**Robert Myers**

On Jun 11, 3:36 pm, Cydrome Leader wrote:
..

I wonder what they really do with these computers.

I find it unlikely they still don't know how nuclear weapons work,
especially considering they're mature technology and they've been around
for decades.

Try these Google searches: ~simulation site:lanl.gov
~simulation site:llnl.gov

I get over 30000 page hits. If you surf around and even ponder how to
zero in on the nuclear weapons work, you will have your answer.

Question: We've discovered a warehouse full of Viet Nam-era artillery
shells. Should we just ship them to a war zone, or count on them
working in case of war? I mean, we *do* know how artillery shells
work, don't we?

Robert.

**[email protected]**

On Jun 11, 7:53 am, Neal wrote:
On Jun 11, 7:32 am, Air Raid wrote:

On Jun 11, 4:32 am, wrote:

On the positive note, successful testing of the Roadrunner means that
IBM has the ability to manufacture a new variant of Cell with fully-
pipelined double-precision FPU in production quantity.
IBM web site indicates that a new engine is available to mere mortals: http://www-03.ibm.com/systems/bladec...ers/qs22/index...

Interesting thing about the IBM PowerXCell 8i Processor is that it
offers 4 to 5 (IBM says 5) times the double precision FP performance
of the original Cell Processor.

Depending on various factors such as having 7 or 8 SPEs active,
counting the PPE or not counting it, and clockspeed, the original CELL
could manage 218 to 256 to just under 300 GFLOPs of single precision
FP. When double precision is needed performance drops massively, down
to around 25 GFLOPs.

The IBM PowerXCell 8i is said to be capable of over 100 GFLOPs
double precision. That's a huge increase without adding more SPEs or
upping clockspeed.
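The "over 100 GFLOPs" and "4 to 5 times" figures can be sanity-checked with simple peak-rate arithmetic. This is only a sketch: the 3.2 GHz clock and the flops-per-cycle values (a fused multiply-add counted as 2 flops, 4-wide single precision, 2-wide double precision) are assumptions made for illustration, not numbers from this thread. The 25 GFLOPs original-Cell figure and the 12,960 chip count are taken from the posts and the thread title.

```python
# Peak-rate arithmetic for the Cell SPEs only (PPE ignored).
# Clock and flops-per-cycle values are illustrative assumptions.

def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: units * clock (GHz) * flops per cycle per unit."""
    return units * clock_ghz * flops_per_cycle

SPES = 8
CLOCK_GHZ = 3.2          # assumed clock speed

sp_peak = peak_gflops(SPES, CLOCK_GHZ, 8)     # assumed 4-wide SP FMA per cycle
dp_peak_8i = peak_gflops(SPES, CLOCK_GHZ, 4)  # assumed 2-wide DP FMA per cycle

original_dp = 25.0       # GFLOPS, the rough figure quoted in the post
speedup = dp_peak_8i / original_dp

CHIPS = 12_960           # from the thread title
system_pflops = CHIPS * dp_peak_8i / 1e6      # 1 PFLOPS = 1e6 GFLOPS

print(f"SP peak (SPEs only): {sp_peak:.1f} GFLOPS")
print(f"DP peak (SPEs only): {dp_peak_8i:.1f} GFLOPS")
print(f"DP speedup over original: {speedup:.1f}x")
print(f"System DP peak (SPEs only): {system_pflops:.2f} PFLOPS")
```

Under these assumptions the single-precision number lands a little below the range quoted above because the PPE is left out, while the double-precision number comes out just over 100 GFLOPS per chip and roughly 4 times the quoted original figure.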

PowerXCell 8i cannot be considered a next-generation CELL, only an
enhanced first-gen CELL.

IBM plans to put 32 SPEs on the next-gen CELL to hit 1 TFLOP (single
precision I would imagine) in a single chip by 2010. There was also
an official roadmap that showed a CELL with 64 SPEs on a process
smaller than 45nm (be it 32nm, 22nm, I don't know). I posted about
both in the past.

It's clear that the IBM-Toshiba-Sony CELL is proving to be much more
useful beyond PS3 than the Sony-Toshiba 'Emotion Engine' ever was,
which really had no use outside of PS2 and cheap, home-made university
"supercomputers" such as the one using 60 or 70 PS2s at UIC in IL.

Roadrunner is serious stuff, and it's only the beginning. In the next
decade we'll see more powerful supercomputers using next-gen CELLs.

Though the CELL may prove more useful, it still has some serious arch
issues that a lot of people don't like. Programming the thing is not
the easiest thing in the world to do (though parallel programming
models for CMPs are somewhat of an open issue). It's nice that they
will continue to push performance, but many GPUs will be well above
1TFLOPS SPFP by 2010... which means that obviously Larrabee will be
out then. While I'm glad that CELL came out as it's enlightened the
world and solved several of the multicore integration problems, it
simply doesn't seem like the chip of the future right now. I simply
haven't heard a whole lot of interest from the HPC community on CELL,
but you never know... I could be wrong.

AMD's RV770 GPU coming out this month is already at 1 TFLOP and the
R700 product with two RV770 GPUs on a single card (4870 X2) which is
due out this August or September should be around 2 TFLOP. Of
course this is a GPU (or GPGPUs) and is not as programmable as
CELL.

Larrabee should, however, change that. I think Larrabee and
anything like it, with a manycore architecture (beyond multicore), is
the future. It'll be interesting to see how the next-gen CELL
compares to Larrabee.

**Ken Hagan[_2_]**

On Thu, 12 Jun 2008 19:57:43 +0100, Zootal
wrote:

There are no nuclear weapons. It's a huge conspiracy. The fact is, we use
these computers to model human behavior to learn how to keep the ignorant
masses under our control. Have you noticed lately funny noises when you
talk on the phone? It's because we are monitoring you. You are close to
learning our secret, and we are keeping an eye on you.

Yes, Zootal, you and your co-conspirators are secretly in charge of
everything. Or at least that's what *they* want you to think...
