November 13th 03, 08:28 PM
Hans de Vries

"subsystem" wrote in message m...
In the patent, it was preferred that each *APU* have 32 GFLOPs performance.
Not each PE.


OK. I should have used "APU" instead of "PE" in Cell terms.

The 4 PowerPC's are the "PU"s in the patent and the 32 PS2-like SIMD
processors are the "APU"s.


There would be 1 PU/CPU per PE, and 8 APUs - which would give *256 GFLOPs*
per PE.

Then 4 PEs (256 GFLOPs each) are put onto a single chip to form a BroadBand
Engine. That is where the 1 TFLOPs came from.

The BroadBand Engine would be the main CPU of PS3.


The 32 GFlops and 32 GIops are only mentioned once in the patent on
page 52 referring to figure 4:

"Floating point units 412 preferably operate at a speed of 32
billion floating point operations per second (32 GFLOPS), and
integer units 414 preferably operate at a speed of 32 billion
operations per second (32 GOPS)"

Figure 4 shows four Floating Point units and four Integer Units.
The text of the patent may also be interpreted in other ways, for example:
each Floating Point unit operates at 32 GFlops (= 128 GFlops per APU),
or in yet another interpretation: all Floating Point units together
operate at 32 GFlops (= 32 GFlops/chip).
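
To make the three readings concrete, here is a rough Python sketch of the
arithmetic. It assumes the hierarchy discussed in this thread (four floating
point units per APU as in figure 4, 8 APUs per PE, 4 PEs per chip). The
function and constant names are my own for illustration, not terms from the
patent.

```python
# Illustrative only: names are hypothetical, not taken from the patent.
FPUS_PER_APU = 4    # figure 4 shows four floating point units
APUS_PER_PE = 8     # per the thread: 8 APUs per PE
PES_PER_CHIP = 4    # per the thread: 4 PEs per Broadband Engine
PATENT_GFLOPS = 32  # the single figure quoted from page 52


def chip_gflops(reading):
    """Return (per_apu, per_pe, per_chip) GFLOPS for one reading of the text.

    reading is one of:
      "per_apu"  - the 32 GFLOPS applies to each APU
      "per_fpu"  - the 32 GFLOPS applies to each floating point unit
      "per_chip" - the 32 GFLOPS applies to all units on the chip together
    """
    apus_per_chip = APUS_PER_PE * PES_PER_CHIP
    if reading == "per_apu":
        per_apu = PATENT_GFLOPS
    elif reading == "per_fpu":
        per_apu = PATENT_GFLOPS * FPUS_PER_APU
    elif reading == "per_chip":
        per_apu = PATENT_GFLOPS / apus_per_chip
    else:
        raise ValueError("unknown reading: " + reading)
    per_pe = per_apu * APUS_PER_PE
    per_chip = per_pe * PES_PER_CHIP
    return per_apu, per_pe, per_chip


if __name__ == "__main__":
    for reading in ("per_apu", "per_fpu", "per_chip"):
        print(reading, chip_gflops(reading))
```

Only the "per_chip" reading lands on 1 GFlop per APU, which is the figure
in the Sony presentation.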


Now in this new presentation, KK shows 1 PE having performance of 1 GFLOP.
This does not make sense at all. That's less than the Emotion Engine of
PS2, which has 6.2 GFLOPs performance.



Each APU would have 1 GFlops. The entire Chip would have 32 APU's
running at 32 GFlops according to:
http://www.watch.impress.co.jp/game/...921/tgsf15.jpg


The slides are 2-3 years old, that is why. They are the SAME slides that IBM
showed for the Blue Gene project, IIRC.


The patent application is from March 22, 2001 while this presentation
from the President of Sony's Entertainment division (=PlayStation)
is from September 20, 2002.


If one PE (Processor Element) can only achieve 1 GFLOPs, then
Sony-IBM-Toshiba are going BACKWARDS not FORWARDS in performance.

256 GFLOPs in patent down to 1 GFLOPs makes no sense whatsoever.


So it's 1 GFlop per APU, 8 GFlops per PE and 32 GFlops per Chip.
It is still a huge improvement over the PS2. The PS2 runs at 300 MHz
while each of the 32 (virtual) APU's would run at 250 MHz.
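
As a quick check of these numbers, a small Python sketch follows. It assumes
each APU has four 32 bit floating point lanes and retires one floating point
operation per lane per cycle (that is my assumption to make 250 MHz come out
at 1 GFlop). The 6.2 GFLOPs figure for the PS2 is the one quoted earlier in
this thread. All names are illustrative.

```python
# Illustrative roll-up of the presentation figures; names are hypothetical.
APU_CLOCK_MHZ = 250       # effective clock per (virtual) APU
FP_LANES_PER_APU = 4      # assumption: four 32 bit FP units per APU
FLOPS_PER_LANE_CYCLE = 1  # assumption: one FP operation per lane per cycle
APUS_PER_PE = 8
PES_PER_CHIP = 4
PS2_GFLOPS = 6.2          # Emotion Engine figure quoted in the thread


def apu_gflops():
    """GFLOPS of one APU: clock (MHz) x lanes x ops per cycle, scaled to giga."""
    return APU_CLOCK_MHZ * FP_LANES_PER_APU * FLOPS_PER_LANE_CYCLE / 1000.0


def pe_gflops():
    """GFLOPS of one PE made of 8 APUs."""
    return apu_gflops() * APUS_PER_PE


def chip_total_gflops():
    """GFLOPS of the whole chip made of 4 PEs (32 APUs)."""
    return pe_gflops() * PES_PER_CHIP


if __name__ == "__main__":
    print("per APU :", apu_gflops())
    print("per PE  :", pe_gflops())
    print("per chip:", chip_total_gflops())
    print("vs PS2  :", round(chip_total_gflops() / PS2_GFLOPS, 2), "x")
```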


Regards, Hans





"Hans de Vries" wrote in message om...
"subsystem" wrote in message gy.com...
old but otherwise interesting read

http://www.xboxrules.com/yabbse/index.php?threadid=47


The Technology of PS3
Eddie Edwards, April 2003
Foreword



The only practical way to implement 4 PowerPC's and 32 Cell Processors
each with 128 bit (4x32) functional units on a single chip in 2006 with
a 65 nm process and a 100W budget is to use virtual processors. This
would be consistent with future PowerPC processors and IBM's Blue Gene
work.

The 4 PowerPC's could be a single IBM Power6 core running 4 threads
at twice the frequency a Power5 would reach in the same process.
That would be 8 GHz in 65 nm.

The 32 PE's have a combined performance of 32 GFlops or 1 GFlop each
according to this presentation of Sony Entertainment's CEO here.
http://www.watch.impress.co.jp/game/...20921/tgsf.htm

Have a look at this image:
http://www.watch.impress.co.jp/game/...921/tgsf15.jpg

This presentation uses large data centers to get at these 1 TeraFlop
and even 1 PetaFlop marketing numbers. This Sony presentation seems
to be a clarification after the 1 TeraFlop rumor stories: "PS3 will
be more than 100 times more powerful than a Pentium 4"

A single "Altivec" or "PS2" like SIMD unit with four 32 bit Floating
Point units and four 32 bit Integer units running also at 8 GHz in
65 nm could be used to implement 32 virtual PE's working from one 4 MB
local memory.

Each PE would run at an effective 250 MHz with 1 GFlop (as stated in
the presentation). Each PE would be able to fetch, decode and execute a
single SIMD instruction before loading the next one, thereby eliminating
all the branch prediction, out of order and load/store overhead of modern
processors. 80% of such a unit would be Functional units, Floating Point
and Integer, and 20% would be control logic. In modern OOO processors
it is more like the reverse.
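
The time slicing idea can be sketched in Python as a toy model. It assumes
a strict round-robin where the one physical SIMD unit gives each virtual PE
one issue slot in turn. That scheduling policy and all the names below are
my assumptions for illustration; neither the patent nor the presentation
spells out a mechanism.

```python
from collections import Counter

PHYSICAL_CLOCK_MHZ = 8000  # the speculated 8 GHz SIMD unit
NUM_VIRTUAL_PES = 32       # virtual PEs sharing that one unit


def round_robin_issue(total_cycles, num_virtual_pes=NUM_VIRTUAL_PES):
    """Count how many issue slots each virtual PE gets in total_cycles.

    Toy model: on cycle n the physical unit executes one SIMD instruction
    for virtual PE number (n mod num_virtual_pes).
    """
    issued = Counter()
    for cycle in range(total_cycles):
        issued[cycle % num_virtual_pes] += 1
    return issued


def effective_clock_mhz(physical_clock_mhz=PHYSICAL_CLOCK_MHZ,
                        num_virtual_pes=NUM_VIRTUAL_PES):
    """Clock rate seen by each virtual PE under strict round-robin."""
    return physical_clock_mhz / num_virtual_pes


if __name__ == "__main__":
    slots = round_robin_issue(3200)
    print("slots for PE 0 in 3200 cycles:", slots[0])
    print("effective clock per PE (MHz):", effective_clock_mhz())
```

In this model every virtual PE sees exactly 1/32 of the physical issue
slots, which is where the effective 250 MHz per PE comes from.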

The patent application revived the 1 TeraFlop rumors by saying that the
"preferred" performance of each PE would be 32 GFlops and 32 GIops. Sony's
own PS3 presentation however clearly says 1 GFlop per PE for the first
implementation. 1 GFlop per PE suggests that the PE's are implemented as
virtual PE's, possibly in the way described above.

Regards, Hans.