April 12th 07, 05:52 AM, posted to comp.sys.super,comp.arch,comp.arch.embedded,comp.sys.ibm.pc.hardware.chips,comp.os.linux.advocacy
AirRaid
CELL 2 "Enhanced Cell Broadband Engine" to be revealed soon

http://www.ps3coderz.com/index.php?o...70&Itemi d=31

Cell2: Some Thoughts
Written by nblachford
Thursday, 12 April 2007

Next week we should see the first details of the second generation
Cell processor. Little has been said publicly to date about it, but
knowing what we do we can attempt to figure out what's planned.

What we do know:
Cell 2, or properly "Enhanced Cell Broadband Engine", has full speed
Double Precision floating point units.

16,000 of them are due to be used alongside Opterons in the
forthcoming Roadrunner Supercomputer, the first slated to reach a
PetaFLOP of computing power.

They are due to ship in a blade with up to 16 GigaBytes of RAM,
probably sometime in 2008.

Predicting Performance
No figures have been published as yet but there are ways of working
out expected performance.

One of these is a dead give-away - the recently released SDK 2.1
incorporates a compiler and simulator capable of simulating the
processor. You should be able to get a very good approximation of the
performance from that.

However, the simulator can only simulate; it can't tell you things
like clock speed or other technical specifications of the final
hardware. We can, however, work some things out...

The Roadrunner supercomputer is slated to achieve a PetaFLOP of
computing power (1 million GigaFLOPS). This will be done using the
Cell processors; the Opterons are used for I/O and control.

The use of Opterons for I/O implies the relatively weak PPE is not
getting a significant upgrade. If it were, the Opterons would not be
necessary.

It also tells us each Cell has to get a linpack 1K rating of at least
62.5 Double Precision GigaFLOPS (1 million GigaFLOPS spread across
16,000 chips). While Cell is fast and highly efficient, it is not that
efficient on linpack. Getting a good linpack score is going to be
difficult. Just adding full Double Precision units won't do it.
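
The 62.5 figure is simply the machine's target divided by the number of
Cell chips. A quick check, assuming (as the article does) that the
Opterons contribute nothing to the linpack total; the names below are
chosen for illustration only:

```python
# Per-chip linpack requirement for Roadrunner, using the article's figures.
# Assumption: the Opterons contribute no FLOPs to the total.
TARGET_GFLOPS = 1_000_000   # 1 PetaFLOP expressed in GigaFLOPS
CELL_COUNT = 16_000         # Cell processors planned for the machine


def required_gflops_per_cell(target_gflops, cell_count):
    """Sustained GFLOPS each chip must deliver to reach the target."""
    return target_gflops / cell_count


per_cell = required_gflops_per_cell(TARGET_GFLOPS, CELL_COUNT)
print(f"Each Cell must sustain {per_cell} DP GFLOPS")
```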

There are a number of ways that the good linpack score can be
achieved:

The first is to use the mixed precision technique, which does the bulk
of the work in single precision and uses double precision only where
absolutely necessary. This works for linpack and has achieved very
high rates already [1]. This, however, may be considered "cheating" as
it doesn't really measure double precision performance.
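
A common form of this mixed precision approach is iterative refinement:
solve the system in single precision, then repeatedly compute the
residual in double precision and solve for a correction in single
precision. The sketch below is a toy illustration, not the code behind
[1]. It emulates single precision by rounding through `struct`, and for
brevity it re-runs the elimination for each correction, where a real
implementation would reuse the factorization. All function names are
hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_single(a, b):
    """Gaussian elimination with partial pivoting, rounding every
    intermediate result to single precision to mimic SP hardware."""
    n = len(b)
    # Augmented copy of the system, rounded to single precision.
    m = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = f32(m[r][col] / m[col][col])
            for c in range(col, n + 1):
                m[r][c] = f32(m[r][c] - f32(factor * m[col][c]))
    # Back substitution, still in emulated single precision.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n]
        for c in range(r + 1, n):
            s = f32(s - f32(m[r][c] * x[c]))
        x[r] = f32(s / m[r][r])
    return x


def residual(a, b, x):
    """b - A x, evaluated in full double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, iterations=5):
    """Single precision solve followed by double precision corrections."""
    x = solve_single(a, b)
    for _ in range(iterations):
        r = residual(a, b, x)          # the only double precision work
        d = solve_single(a, r)         # cheap single precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
b = [1.0, 2.0, 3.0]
x_single = solve_single(A, b)
x_refined = refine(A, b)
err_single = max(abs(v) for v in residual(A, b, x_single))
err_refined = max(abs(v) for v in residual(A, b, x_refined))
print("largest residual, single precision only:", err_single)
print("largest residual, after refinement:     ", err_refined)
```

Only the residual needs double precision arithmetic, which is why a chip
with fast single precision units can still post a strong linpack number
this way, and also why some would not count it as a true double
precision result.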

The second method would be to upgrade the hardware. The floating point
hardware is obviously changing but we don't know about any other
changes as yet.

Double Precision requires twice the room for data compared to single
precision, so applications which switch to DP will have problems if
the current local store size is kept. I expect we'll see the local
store doubled in size to 512 KiloBytes.
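
To put numbers on that, the sketch below counts how many values fit in
a local store at each precision, taking the current 256 KB store and
the 512 KB guess. It makes the simplifying assumption that the whole
store holds data; in practice program code shares the same space, so
real capacities are lower.

```python
# How many floating point values fit in an SPE local store.
# Simplifying assumption: the whole store holds data (no code).
KIB = 1024
SP_BYTES = 4   # IEEE 754 single precision
DP_BYTES = 8   # IEEE 754 double precision


def values_that_fit(store_kib, bytes_per_value):
    """Number of values of the given width that fit in the store."""
    return store_kib * KIB // bytes_per_value


for store_kib in (256, 512):
    sp = values_that_fit(store_kib, SP_BYTES)
    dp = values_that_fit(store_kib, DP_BYTES)
    print(f"{store_kib} KiB local store: {sp} SP values, {dp} DP values")
```

A 512 KB store holds as many doubles as the current store holds
singles, which is the reasoning behind the doubling.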

The second area to change is the memory controller. If data takes
twice the room it will also take twice the memory bandwidth. XDR
memory is already available at twice the rate of the interface in the
standard Cell, so this is an option. XDR2 is faster still, so it could
also be used.

Since this chip will be used in higher end machines than the existing
Cell, it can be more expensive, and additional pins can be added if
they want. Adding more pins will allow for more memory controllers or
memory lanes to be added, again increasing bandwidth.

I think a doubling of memory bandwidth is highly likely and a
quadrupling is also a possibility because of the need to increase the
efficiency of the chip. Increasing the number of lanes also means more
memory chips can be connected, which fits with what we already know:
this chip is due in blades with up to 16GB.

A clock frequency rise is a distinct possibility. However, pushing the
clock much higher raises power consumption sharply, so they won't go
too far. That said, I expect they'll be able to go safely above 4GHz
without problems.
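
To see what clock speed buys, here is a rough sketch relating clock
frequency to the linpack efficiency needed to hit 62.5 GigaFLOPS per
chip. The per-cycle throughput is purely an assumption for
illustration, not a published specification: 8 SPEs, each completing
one 2-wide double precision fused multiply-add per cycle (4 FLOPs per
SPE per cycle), with the PPE ignored.

```python
# Rough peak DP throughput versus clock speed.
# Assumptions (hypothetical, not from any published spec): 8 SPEs, each
# completing one 2-wide DP fused multiply-add per cycle, so 4 FLOPs per
# SPE per cycle. Any PPE contribution is ignored.
SPE_COUNT = 8
DP_FLOPS_PER_SPE_PER_CYCLE = 4
REQUIRED_GFLOPS = 62.5  # per-chip linpack figure from the article


def peak_dp_gflops(clock_ghz):
    """Theoretical peak DP GFLOPS at the given clock, under the assumptions."""
    return SPE_COUNT * DP_FLOPS_PER_SPE_PER_CYCLE * clock_ghz


def required_efficiency(clock_ghz):
    """Fraction of peak that linpack must sustain to reach the target."""
    return REQUIRED_GFLOPS / peak_dp_gflops(clock_ghz)


for clock_ghz in (3.2, 4.0):
    peak = peak_dp_gflops(clock_ghz)
    eff = required_efficiency(clock_ghz)
    print(f"{clock_ghz} GHz: peak {peak:.1f} GFLOPS, "
          f"needs {eff:.0%} linpack efficiency")
```

Under these assumptions a higher clock lowers the efficiency the chip
has to sustain, but it does not remove the need for more local store
and memory bandwidth.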

Interconnects
One other possibility for change is in the I/O connections. The
Opterons will utilise HyperTransport 3, a very high speed I/O system.
Having this directly on Cell would allow it to be used with commodity
PC chipsets, which would save a lot of money for anyone interested in
building Cell workstations.

I don't think there will be any other big changes. There will likely
be all sorts of tweaks and certainly there'll be additional work on
lowering the power consumption. The internal bus system (EIB) may be
beefed up a bit if the external memory bandwidth increases.

A lot will have been learned in the development of the first Cell
processor so there is the possibility of all sorts of other changes.

We also can't rule out the possibility of additional SPEs being added.
However, there is nothing to indicate this will happen.

Conclusion
Cell2 is due to be discussed shortly. We already know it's a processor
designed for high performance Double Precision FLOPs, will access a
lot more memory and will appear in a large supercomputer. The
performance required means the new processor needs higher
efficiency. I expect this will mean adding higher capacity
local stores and higher memory bandwidth.

Contrary to common opinion the existing Cell is actually very fast on
DP maths [2][3]. It should be at least in line with most desktop
processors, and well ahead in some cases. Beefing up the hardware in
Cell2 should make the DP Cell an absolute beast. Cell has
already shown performance 10-100 times faster than existing processors
on single precision, this new processor should do the same for double
precision. Cell has already had a lot of interest from the scientific
& HPC community, expect this chip to bring a lot more.