ATI Interview - The Power of the Xbox 360 GPU
http://interviews.teamxbox.com/xbox/...ox-360-GPU/p1/

By: César A. Berardini - "Cesar"
January 24th, 2006

Back in 2003, ATI Technologies and Microsoft announced a technology development agreement under which ATI would develop custom, leading-edge graphics technology for the Xbox successor. This ended Microsoft's previous relationship with nVIDIA and put a new contractor in charge of Xbox graphics. Fast forward to 2005: Microsoft revealed that the Xbox 360 would feature a custom ATI graphics processor clocked at a blistering 500 MHz, with 48 parallel, dynamically scheduled floating-point shader pipelines and 10 MB of embedded RAM. Do all those numbers mean nothing to you? Don't worry. We had the chance to interview Bob Feldstein, Vice President of Engineering at ATI Technologies, Inc., to learn more about the development and power of the Xenos GPU.

What were the goals and challenges that ATI faced in developing the Xbox 360 GPU?

Bob Feldstein: The challenges included creating, on schedule, a platform that can live for five years without enhancement. Microsoft's aggressive performance specifications for the system forced ATI to once again think outside the box - in this case, the PC market. After making the breakthrough we needed by thinking of this product as a console product only, the innovations - Intelligent Memory, Unified Shaders, the Modeling Engine - came more easily. Then the architecture team had to come through in record time to stay ahead of an aggressive implementation team.

Did Microsoft know exactly what it wanted for its GPU, or did it just set the goals while you proposed the architecture and technology?

Bob Feldstein: Microsoft set broad goals for the GPU. They were especially concerned with memory bandwidth and overall system performance.
They wanted a machine that could take advantage of CPU multi-processing through multi-threading, plus a machine that would be conceptually simple to program while providing headroom for developers to stay competitive over the console's lifetime. Microsoft and ATI did the GPU architectural design together, with Microsoft determining the overall performance targets and ATI turning those targets into reality. The Unified Shaders and Intelligent Memory, for example, are direct results of our remarkable collaboration.

Before we continue, we never had the chance to clarify the correct name of the Xbox 360 GPU. Some call it Xenos, others C1. Sometimes it was known as R500, but the rumor was that ATI wanted to avoid that codename because it could make the Xbox 360 GPU look less powerful than ATI's R520 PC part. So, what is the Xbox 360 GPU codename?

Bob Feldstein: The Xbox GPU had nothing whatsoever to do with the PC products. R500 was never an internal name for the Xbox. Internally we called the GPU, interchangeably, C1 and Xenos. C1 was a code name defined before we had the contract; Xenos was the project name after the contract was won - but C1 stuck in everyone's minds. Once we started calling it C1, it was hard to change.

How many ATI engineers worked on the design of the Xenos GPU? Any statistics from this project you'd like to throw in?

Bob Feldstein: ATI had 175 engineers working on the Xenos GPU at the peak. These included architects, logic designers, physical design engineers, verification engineers, manufacturing test engineers, qualification engineers and management. The team was spread out between Orlando, Florida; Marlborough, Massachusetts; and Toronto, Ontario. The team's size varied during the life of the program - in fact, we still have about 20 engineers involved in various cost-down projects.
The Xenos is made of two elements: the parent die, which is basically a shader core and also acts as the Northbridge; and the daughter die, which handles some functions traditionally executed inside a one-die GPU, like FSAA, or alpha and Z logic. What was the reason for these two dies to exist? Was it a physical constraint (difficulty in putting all these transistors on a single die) or an architectural need?

Bob Feldstein: There is no architectural reason for the two parts of the chip to exist on separate dies. Instead, it was an economic decision. The daughter die that handles FSAA, alpha, stencil and Z contains a large array of dynamic memory. We have logic in this memory array, and we call this combination Intelligent Memory. Because the dynamic memory has a higher failure rate at manufacturing, separating it allows us to decouple the fallout from memory failures from the general fallout - and this saves us money overall. I believe that it would make sense in the future to combine the dies in a smaller geometry to save money.

People love numbers. How many transistors does the Xenos have? Can you break that down into the parent and daughter die? Explain some of the numbers that have been mentioned, such as the 2-terabit figure and the 32GB/sec and 22.4GB/sec bandwidths.

Bob Feldstein: 235 million transistors on the parent die, 90 million transistors on the daughter die. Bandwidth for Intelligent Memory is derived from the following events occurring every cycle: 2 quads of samples/cycle * 4 samples * (4 bytes color + 4 bytes Z) * 2 (read and write) * 500 MHz = 256 GB/sec (that is, 2 terabits/sec). The 22.4GB/sec link is the connection to main memory (note, incidentally, that all 512MB of Xbox 360 system memory is in one place, which makes accessing it easier from a developer perspective). The GPU is also directly connected to the L2 cache of the CPU - this is a 24GB/sec link. Memory bandwidth is extremely important, which is why we spent so much time on it.
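As a sanity check on Feldstein's arithmetic: the factors as quoted only multiply out to 256 GB/sec if a "quad" is read as the usual rasterizer unit of four pixels, so that "2 quads * 4 samples" means 8 pixels per cycle, each carrying 4 antialiasing samples. That reading is our assumption, not something the interview states outright; the sketch below just verifies the numbers under it:

```python
# Hypothetical breakdown of the 256 GB/s Intelligent Memory figure.
# Assumption (labeled, not confirmed by the interview): a "quad" is a
# group of 4 pixels, and each pixel carries 4 FSAA samples.
QUADS_PER_CYCLE = 2
PIXELS_PER_QUAD = 4           # assumed meaning of "quad"
SAMPLES_PER_PIXEL = 4         # assumed 4x FSAA samples per pixel
BYTES_PER_SAMPLE = 4 + 4      # 4 bytes color + 4 bytes Z
RW_FACTOR = 2                 # read and write every cycle
CLOCK_HZ = 500_000_000        # 500 MHz

bytes_per_second = (QUADS_PER_CYCLE * PIXELS_PER_QUAD * SAMPLES_PER_PIXEL
                    * BYTES_PER_SAMPLE * RW_FACTOR * CLOCK_HZ)

print(bytes_per_second / 1e9)       # 256.0 GB/s
print(bytes_per_second * 8 / 1e12)  # 2.048 Tb/s, the "2 terabit" figure
```

Note that 256 GB/s of byte traffic is 2.048 terabits/sec, which matches the rounded "2 Terabits/sec" in the answer.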
Fortunately, designing the system from the ground up gave us the freedom to build incredible bandwidth into the box.

The interface to the system's memory is 128-bit. Isn't this a bottleneck, considering the bandwidth-intensive tasks performed in the GPU? Why was a 128-bit bus selected when PC parts already implement 256-bit buses in their high-end editions?

Bob Feldstein: Excellent question, because it gets to the heart of what is right in the system design. We have a great deal of internal memory in the daughter die referred to above. We actually use this memory as our back buffer. In addition, all anti-aliasing resolves, Z-buffering and alpha blending occur within this internal memory. This means our highest-bandwidth clients (Z, alpha and FSAA) operate internally to the chip and don't need to access main memory. This makes the 128-bit interface to system memory, and the ensuing bandwidth, more than enough for our needs, because we are offloading the bandwidth hogs to internal memory.

Let's talk about the unified shader architecture. First, I'd like to know about its performance. I'm pretty sure a unified shader architecture makes things easier for developers, but is a unified shader pipeline as good (performance-wise) as the current architecture seen in PC parts, that is, separate pixel and vertex processing units?

Bob Feldstein: The Unified Shader Architecture actually improves overall performance. To understand why, we need to look at what is unified. In current architectures there are separate shader mechanisms with different instruction sets and different caching mechanisms. What ATI found is that, in real applications, one set of shaders or the other is often idle, because either pixel or vertex processing dominates in a bursty manner. To handle the bursts, you need to build a lot of parallel shader resources - even though they will be idle as processing transitions from pixels to vertices.
The Unified Shaders combine the instruction sets, create the right caching mechanisms and, with a lot of other complication, allow all the shader processors to be used for any problem. Thus, when pixel processing dominates, we can use all 48 shaders for pixel processing. When vertex processing dominates, we can use the 48 shaders for vertices. When the workload is some vertex and some pixel processing, we can split the shader resources between the two programs.

Did you help Microsoft with the first steps of the manufacturing process, such as obtaining the first yields and working through the early stages of production?

Bob Feldstein: ATI helped Microsoft a great deal in picking a fab and working through manufacturing yield issues. This was part of the initial contract and was important to Microsoft. Microsoft had a good team in place to work on these issues as well, but with so much experience in-house at ATI, it was only natural that ATI helped in getting the product to market quickly.

Is the shader architecture unified at both the hardware and software level, or do programmers still write their pixel and vertex shaders in two different syntaxes and then hand them to the GPU, which puts them in a unified shader pipeline?

Bob Feldstein: The programmer can use the superset of instructions for any data running through the shader. This allows the programmer to be creative and not limited to current views of how to manipulate vertices and pixels - or any other data that the programmer decides to run through the shader processors.

When the Xbox 360 GPU features were unveiled, Nvidia expressed doubts about the unified shader architecture, particularly about its performance. Do you think Nvidia's comments are due to no Nvidia part, not even the RSX, having a unified shader architecture yet?

Bob Feldstein: Oh yes. Very much so. In Windows Vista, WGF 2.0 will treat GPUs as having unified vertex and pixel pipelines, even if that's not the case in the actual hardware.
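The load-balancing argument Feldstein makes - one pool of 48 ALUs shared between pixel and vertex work, instead of two fixed-function pools - can be sketched as a toy model. Everything here (the function names, the 32/16 split chosen for the fixed-pool case) is an illustration for comparison, not ATI's actual arbitration logic:

```python
# Toy model of fixed vs. unified shader pools. Not ATI's scheduler -
# just an illustration of why one shared pool of 48 ALUs wastes less
# capacity when the pixel/vertex mix is bursty.

def split_throughput(pixel_work, vertex_work, pixel_alus=32, vertex_alus=16):
    """Fixed pools: each pool can only run its own kind of work."""
    return min(pixel_work, pixel_alus) + min(vertex_work, vertex_alus)

def unified_throughput(pixel_work, vertex_work, total_alus=48):
    """One shared pool: any ALU can take either kind of work."""
    return min(pixel_work + vertex_work, total_alus)

# A pixel-heavy burst: 48 pixel jobs and no vertex jobs this cycle.
print(split_throughput(48, 0))    # fixed pools: vertex ALUs sit idle
print(unified_throughput(48, 0))  # unified: all 48 ALUs busy
```

In the pixel-heavy burst the fixed design completes only 32 jobs while 16 vertex ALUs idle; the unified design completes all 48, which is the efficiency gain the answer describes.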
This suggests that eventually all GPUs will have a unified shader architecture. Does the design of the Xenos GPU give ATI an edge over the competition in the near future, since you have already developed a unified shader architecture?

Bob Feldstein: I have no idea what Nvidia is doing in the future, but ATI has a leg up on the research and development of Unified Shaders. This certainly seems like an advantage.

Is it possible to measure the Xenos performance against a PC part? Or do the different shader architectures, operating systems and system hardware make a comparison too difficult?

Bob Feldstein: It really is difficult to measure the environments against each other. You would tend to write an app to run well in its intended environment, so I would discourage trying to compare them in these ways. That being said, the console has certain advantages - most notably, the controlled environment. This is what allows us to overcome memory bandwidth bottlenecks with Intelligent Memory and make the 48 shader units operate at peak efficiency. A device could have an infinite number of shader ALUs, but without the memory bandwidth to feed the system, all of that hardware would go for naught.

The RSX has a 550 MHz clock speed. Does this 10% clock speed lead over the Xenos GPU necessarily mean that the PlayStation 3 GPU is more powerful than the Xbox 360 GPU? We won't believe it until we see it, but if true, how is it possible that the PlayStation 3 can output two 1080p video streams simultaneously? That makes the RSX sound more powerful than the Xenos...

Bob Feldstein: No! These are inconsequential numbers that don't reflect any reality concerning the system performance. The 1080p streams have no bearing on understanding system performance, and the clock speed means little. Realize that memory bandwidth is the bottleneck of graphics systems.
ATI's Intelligent Memory provides an almost infinite bandwidth path to memory - meaning that the Unified Shaders will never be stifled in getting work to do. The Sony processor is going to come up against memory bandwidth limitations constantly, negating any small clocking differential. The Sony dual 1080p outputs are not an indication of performance - at best, 1080p is an indication that Sony considered this resolution the sweet spot of the market. The dual-1080p use case just shows that the RSX has a PC pedigree and has been cobbled together with the console.

When describing the Xenos GPU, it has been said that its capabilities go beyond Shader Model 3.0, placing a plus symbol after the DirectX nomenclature. Can you give specific numbers (for example, the maximum number of shader instructions executed) that corroborate this Shader Model 3.0+ capability?

Bob Feldstein: Among other impressive aspects, the Xbox 360 can execute 1,000-instruction shader programs, offers higher precision than required, and does shader exports. Also, as a unified shader architecture, both vertex and pixel shaders have access to the full instruction set.

Now that Microsoft has announced that it is developing an external HD DVD drive, we'd like to know if the Xenos GPU has any of the Avivo technology found in the Radeon X1800, such as hardware-accelerated processing of HD video formats like H.264 and VC-1. Or would Microsoft be required to use its three-core PowerPC processor to accelerate HD decoding?

Bob Feldstein: Better directed to Microsoft.

After AGEIA unveiled its PhysX processor, everyone expected that at least one of the next-generation consoles would include a physics processor, but that didn't happen. There have been rumors that the Xenos GPU, particularly its daughter die, could be used to accelerate physics. Is that true? If so, would it be practical, or would using the daughter die for hardware-accelerated physics have an impact on the Xenos' graphics performance?
Bob Feldstein: It is certainly true that physics could be done on our GPU. Realize that the Xbox 360 also has three high-speed processors, each multi-threaded - so there are many ways to achieve interesting A.I. We will seek to enable all paths.

Besides developing hardware, ATI always helps developers by releasing tools, source code samples, etc. For example, we heard about a displaced subdivision surfaces algorithm developed by Vineet Goel of ATI Research Orlando. Are you helping Xbox 360 developers leverage the power of the Xenos GPU, or is that a Microsoft responsibility?

Bob Feldstein: We have teamed with Microsoft to enable developers. We have had members of the developer relations and tools teams work directly at developer events and assist in training Microsoft employees. We are ready to help at any time, but the truth is that we have been quite impressed by how Microsoft is handling developer relationships. We will push ideas like Vineet's (and he has others) through Microsoft. When we say the Xbox has headroom, it is through the ability to enable algorithms on the GPU, including interesting lighting and surface algorithms like subdivision surfaces. It would be impossible to implement these algorithms in real time without our unique features, such as shader export.

Have the Xbox 360 titles released so far surprised ATI in any way, in terms of performance? If so, can you mention which games specifically, and what aspects of them? Is there an office favorite?

Bob Feldstein: We enjoy all the titles, and think that there are many under development and currently on the market that should impress all.

Finally, where do you think real-time graphics are heading in the next few years? Will GPUs accelerate physics, or will a separate processor (a PPU, as AGEIA calls it) be required? Will future GPUs go multicore on a single die, like CPUs are doing, or will Crossfire/SLI-like implementations be the solution to increased performance?
Bob Feldstein: The future of graphics could well include multicore GPUs, although we are looking at many alternatives to enable GPUs to live within the necessary cost and power envelopes. I am not sure about physics processors and whether they offer enough value to become a standard processor in gaming platforms. GPUs can offer much the same benefit and take advantage of volumes that would be hard to achieve in the physics processor market. It seems that physics processors are after the same market as GPUs, but that they are not an essential special-purpose processor. And it is too early to count out the general-purpose CPU as the best implementation of physics in the platform.