November 13th 03, 01:11 PM
subsystem

In the patent, it was preferred that each *APU* have 32 GFLOPs of performance,
not each PE.

There would be 1 PU/CPU per PE, and 8 APUs - which would give *256 GFLOPs*
per PE.

Then 4 PEs (256 GFLOPs each) are put onto a single chip to form a BroadBand
Engine. That is where the 1 TFLOPs figure came from.
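
The arithmetic behind those figures, as a quick sketch. The numbers are the ones quoted from the patent above; the variable names are just for illustration.

```python
# Figures as quoted from the patent in this post.
GFLOPS_PER_APU = 32
APUS_PER_PE = 8
PES_PER_BROADBAND_ENGINE = 4

# 8 APUs per PE, 32 GFLOPs each.
gflops_per_pe = GFLOPS_PER_APU * APUS_PER_PE

# 4 PEs on one chip form the BroadBand Engine.
gflops_per_chip = gflops_per_pe * PES_PER_BROADBAND_ENGINE

print(gflops_per_pe)    # 256
print(gflops_per_chip)  # 1024, i.e. roughly 1 TFLOPs
```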

The BroadBand Engine would be the main CPU of PS3.



Now in this new presentation, KK shows 1 PE having a performance of 1 GFLOP.
This does not make sense at all. That's less than the Emotion Engine of
the PS2, which has 6.2 GFLOPs performance.

The slides are 2-3 years old, that is why. They are the SAME slides that IBM
showed for the Blue Gene project, IIRC.

If one PE (Processor Element) can only achieve 1 GFLOPs, then
Sony-IBM-Toshiba are going BACKWARDS not FORWARDS in performance.

256 GFLOPs in patent down to 1 GFLOPs makes no sense whatsoever.



"Hans de Vries" wrote in message
om...
"subsystem" wrote in message

gy.com...
old but otherwise interesting read

http://www.xboxrules.com/yabbse/index.php?threadid=47


The Technology of PS3
Eddie Edwards, April 2003
Foreword



The only practical way to implement 4 PowerPCs and 32 Cell Processors
each with 128 bit (4x32) functional units on a single chip in 2006 with
a 65 nm process and a 100W budget is to use virtual processors. This
would be consistent with future PowerPC processors and IBM's Blue Gene
work.

The 4 PowerPCs could be a single IBM Power6 core running 4 threads
at twice the frequency a Power5 would run at in the same process.
That would be 8 GHz in 65 nm.

The 32 PEs have a combined performance of 32 GFlops, or 1 GFlop each,
according to this presentation by Sony Entertainment's CEO:
http://www.watch.impress.co.jp/game/...20921/tgsf.htm

Have a look at this image:
http://www.watch.impress.co.jp/game/...921/tgsf15.jpg

This presentation uses large data centers to arrive at these 1 TeraFlop
and even 1 PetaFlop marketing numbers. This Sony presentation seems
to be a clarification after the 1 TeraFlop rumor stories: "PS3 will
be more than 100 times more powerful than a Pentium 4"

A single "Altivec" or "PS2" like SIMD unit with four 32 bit Floating
Point units and four 32 bit Integer units, also running at 8 GHz in
65 nm, could be used to implement 32 virtual PEs working from one 4 MB
local memory.

Each PE would run at an effective 250 MHz with 1 GFlop (as stated in
the presentation). Each PE would be able to fetch, decode and execute a
single SIMD instruction before loading the next one, thereby eliminating
all the branch prediction, out of order and load/store overhead of modern
processors. 80% of such a unit would be functional units, Floating Point
and Integer, and 20% would be control logic. In modern OOO processors
it is more like the reverse.
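
A rough sketch of how the 250 MHz and 1 GFlop figures fall out of the virtual PE idea. It assumes the 32 virtual PEs share the 8 GHz SIMD unit in equal time slices and that each of the four FP units completes one operation per cycle; both are assumptions made for illustration, not something stated in the presentation.

```python
# Figures from the post: one 8 GHz SIMD unit, 32 virtual PEs, four FP units.
CLOCK_HZ = 8_000_000_000
VIRTUAL_PES = 32
FP_LANES = 4  # assumption: one floating point op per lane per cycle

# Assumption: equal time slices, so each virtual PE gets 1/32 of the cycles.
effective_hz_per_pe = CLOCK_HZ // VIRTUAL_PES

# Each cycle a PE gets, it issues one 4-wide SIMD instruction.
flops_per_pe = effective_hz_per_pe * FP_LANES
total_flops = flops_per_pe * VIRTUAL_PES

print(effective_hz_per_pe)  # 250000000   (250 MHz)
print(flops_per_pe)         # 1000000000  (1 GFlop)
print(total_flops)          # 32000000000 (32 GFlops)
```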

The patent application revived the 1 TeraFlop rumors by saying that the
"preferred" performance of each PE would be 32 GFlops and 32 GIops. Sony's
own PS3 presentation, however, clearly says 1 GFlop per PE for the first
implementation. 1 GFlop per PE suggests that the PEs are implemented as
virtual PEs, possibly in the way described above.

Regards, Hans.