In the patent, it was preferred that each *APU* have 32 GFLOPs performance,
not each PE.
There would be 1 PU (CPU) per PE, and 8 APUs, which would give *256 GFLOPs*
per PE.
Then 4 PEs (256 GFLOPs each) are put onto a single chip to form a BroadBand
Engine. That is where the 1 TFLOPs came from.
The BroadBand Engine would be the main CPU of PS3.
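To make the patent arithmetic explicit, here is a quick sketch in Python. The constant and function names are made up for illustration; only the numbers (32 GFLOPs per APU, 8 APUs per PE, 4 PEs per chip) come from the figures quoted above.

```python
# Figures as quoted from the patent in this post; names are illustrative only.
GFLOPS_PER_APU = 32            # "preferred" performance of one APU
APUS_PER_PE = 8                # APUs in one Processor Element
PES_PER_BROADBAND_ENGINE = 4   # PEs on one Broadband Engine chip


def pe_gflops():
    """Peak GFLOPs of one PE: APUs per PE times GFLOPs per APU."""
    return GFLOPS_PER_APU * APUS_PER_PE


def broadband_engine_gflops():
    """Peak GFLOPs of one Broadband Engine: PEs per chip times GFLOPs per PE."""
    return pe_gflops() * PES_PER_BROADBAND_ENGINE


print(pe_gflops())                 # 256
print(broadband_engine_gflops())   # 1024, i.e. roughly 1 TFLOPs
```

These are peak numbers only; nothing here says anything about sustained performance.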
Now in this new presentation, KK shows 1 PE having performance of 1 GFLOP.
This does not make sense at all. That's less than the Emotion Engine of
PS2, which has 6.2 GFLOPs performance.
The slides are 2-3 years old, that is why. They are the SAME slides that IBM
showed for the Blue Gene project, IIRC.
If one PE (Processor Element) can only achieve 1 GFLOPs, then
Sony-IBM-Toshiba are going BACKWARDS, not FORWARDS, in performance.
256 GFLOPs in the patent down to 1 GFLOPs makes no sense whatsoever.
"Hans de Vries" wrote in message
om...
"subsystem" wrote in message
gy.com...
old but otherwise interesting read
http://www.xboxrules.com/yabbse/index.php?threadid=47
The Technology of PS3
Eddie Edwards, April 2003
Foreword
The only practical way to implement 4 PowerPCs and 32 Cell Processors,
each with 128-bit (4x32) functional units, on a single chip in 2006 with
a 65 nm process and a 100 W budget is to use virtual processors. This
would be consistent with future PowerPC processors and IBM's Blue Gene
work.
The 4 PowerPCs could be a single IBM Power6 core running 4 threads
at twice the frequency a Power5 would reach in the same process.
That would be 8 GHz in 65 nm.
The 32 PEs have a combined performance of 32 GFlops, or 1 GFlop each,
according to this presentation by Sony Entertainment's CEO:
http://www.watch.impress.co.jp/game/...20921/tgsf.htm
Have a look at this image:
http://www.watch.impress.co.jp/game/...921/tgsf15.jpg
This presentation uses large data centers to get at these 1 TeraFlop
and even 1 PetaFlop marketing numbers. This Sony presentation seems
to be a clarification after the 1 TeraFlop rumor stories: "PS3 will
be more than 100 times more powerful than a Pentium 4".
A single "Altivec"- or "PS2"-like SIMD unit with four 32-bit floating-point
units and four 32-bit integer units, also running at 8 GHz in
65 nm, could be used to implement 32 virtual PEs working from one 4 MB
local memory.
Each PE would run at an effective 250 MHz with 1 GFlop (as stated in
the presentation). Each PE would be able to fetch, decode and execute a
single SIMD instruction before loading the next one, thereby eliminating
all the branch prediction, out-of-order and load/store overhead of modern
processors. 80% of such a unit would be functional units, floating point
and integer, and 20% would be control logic. In modern OOO processors
it is more like the reverse.
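The 250 MHz and 1 GFlop figures follow from time-slicing one 8 GHz SIMD unit across 32 virtual PEs. A minimal sketch of that arithmetic in Python, assuming each of the four floating-point lanes completes one operation per cycle (that assumption, and all the names below, are mine for illustration and are not from the presentation):

```python
# Numbers from the scenario described above; names are illustrative only.
PHYSICAL_CLOCK_GHZ = 8.0   # clock of the single shared SIMD unit
VIRTUAL_PES = 32           # virtual PEs time-sliced onto that unit
FP_LANES = 4               # four 32-bit floating-point units per SIMD unit


def effective_clock_mhz(clock_ghz, n_virtual):
    """Clock seen by one virtual PE when the unit is shared round-robin."""
    return clock_ghz * 1000 / n_virtual


def gflops_per_virtual_pe(clock_ghz, n_virtual, lanes):
    """Peak GFlops per virtual PE, assuming one FP op per lane per cycle."""
    return clock_ghz * lanes / n_virtual


print(effective_clock_mhz(PHYSICAL_CLOCK_GHZ, VIRTUAL_PES))               # 250.0
print(gflops_per_virtual_pe(PHYSICAL_CLOCK_GHZ, VIRTUAL_PES, FP_LANES))   # 1.0
```

Under the same assumption the whole unit peaks at 8 GHz x 4 lanes = 32 GFlops, which matches the combined figure for 32 PEs.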
The patent application revived the 1 TeraFlop rumors by saying that the
"preferred" performance of each PE would be 32 GFlops and 32 GIops. Sony's
own PS3 presentation, however, clearly says 1 GFlop per PE for the first
implementation. 1 GFlop per PE suggests that the PEs are implemented as
virtual PEs, possibly in the way described above.
Regards, Hans.