ATI Interview - The Power of the Xbox 360 GPU
http://interviews.teamxbox.com/xbox/...ox-360-GPU/p1/

By: César A. Berardini - "Cesar"
January 24th, 2006

Back in 2003, ATI Technologies and Microsoft announced a technology development agreement under which ATI would develop custom, leading-edge graphics technology for the Xbox successor. This ended Microsoft's previous relationship with nVIDIA and put a new contractor in charge of Xbox graphics. Fast forward to 2005: Microsoft revealed that the Xbox 360 would feature a custom ATI graphics processor clocked at a blistering 500 MHz, with 48 parallel, dynamically scheduled floating-point shader pipelines and 10 MB of embedded RAM. Do all those numbers mean nothing to you? Don't worry. We had the chance to interview Bob Feldstein, Vice President of Engineering at ATI Technologies, Inc., to learn more about the development and power of the Xenos GPU.

What were the goals and challenges that ATI faced in developing the Xbox 360 GPU?

Bob Feldstein: The challenges included creating, on schedule, a platform that can live for five years without enhancement. Microsoft's aggressive performance specifications for the system forced ATI to once again think outside the box - in this case, the PC market. After making the breakthrough we needed by thinking of this product as a console product only, the innovations - Intelligent Memory, Unified Shaders, the Modeling Engine - came more easily. Then the architecture team had to come through in record time to stay ahead of an aggressive implementation team.

Did Microsoft know exactly what it wanted for its GPU, or did it just set the goals while you proposed the architecture and technology?

Bob Feldstein: Microsoft set broad goals for the GPU. They were especially concerned with memory bandwidth and overall system performance.
They wanted a machine that could take advantage of CPU multi-processing through multi-threading, plus a machine that would be conceptually simple to program while providing headroom for developers to stay competitive over the console's lifetime. Microsoft and ATI did the GPU architectural design together, with Microsoft determining the overall performance targets and ATI turning those targets into reality. The Unified Shaders and Intelligent Memory, for example, are direct results of our remarkable collaboration.

Before we continue, we never had the chance to clarify the correct name of the Xbox 360 GPU. Some call it Xenos, others C1. Sometimes it was known as R500, but the rumor was that ATI wanted to avoid that codename because it could make the Xbox 360 GPU look less powerful than ATI's R520 PC part. So, what is the Xbox 360 GPU codename?

Bob Feldstein: The Xbox GPU had nothing whatsoever to do with the PC products. R500 was never an internal name for the Xbox. Internally we called the GPU, interchangeably, C1 and Xenos. C1 was a code name defined before we had the contract; Xenos was the project name after the contract was won - but C1 stuck in everyone's minds. Once we started calling it C1, it was hard to change.

How many ATI engineers worked on the design of the Xenos GPU? Any statistics from this project you'd like to throw in?

Bob Feldstein: ATI had 175 engineers working on the Xenos GPU at the peak. These included architects, logic designers, physical design engineers, verification engineers, manufacturing test engineers, qualification engineers and management. The team was spread out between Orlando, Florida; Marlborough, Massachusetts; and Toronto, Ontario. The team's size varied during the life of the program - in fact, we still have about 20 engineers involved in various cost-down projects.
The Xenos is made of two elements: the parent die, which is basically a shader core and also acts as the Northbridge; and the daughter die, which handles some functions traditionally executed inside a one-die GPU, like FSAA, or alpha and Z logic. What was the reason for these two dies to exist? Was it a physical constraint (difficulty in putting all these transistors on a single die) or an architectural need?

Bob Feldstein: There is no architectural reason for the two parts of the chip to exist on separate dies. Instead, it was an economic decision. The daughter die that handles FSAA, alpha, stencil and Z contains a large array of dynamic memory. We have logic in this memory array, and we call this combination Intelligent Memory. Because the dynamic memory has a higher failure rate at manufacturing, separating it allows us to decouple the fallout from memory failures from the general fallout - and this saves us money overall. I believe that it would make sense in the future to combine the dies in a smaller geometry to save money.

People love numbers. How many transistors does the Xenos have? Can you break that down into the parent and daughter die? Explain some of the numbers that have been mentioned, such as the 2-terabit figure and the 32GB/sec and 22.4GB/sec bandwidths.

Bob Feldstein: 235 million transistors on the parent die, 90 million transistors on the daughter die. Bandwidth for Intelligent Memory is derived from the following events occurring every cycle: 2 quads of samples/cycle * 4 samples * (4 bytes color + 4 bytes Z) * 2 (read and write) * 500 MHz = 256 GB/sec (that is, 2 terabits/sec). The 22.4GB/sec link is the connection to main memory (note, incidentally, that all 512MB of Xbox 360 system memory is in one place, which makes accessing it easier from a developer perspective). The GPU is also directly connected to the L2 cache of the CPU - this is a 24GB/sec link. Memory bandwidth is extremely important, which is why we spent so much time on it.
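As a sanity check on Feldstein's arithmetic: the factors as quoted only multiply out to 256 GB/sec if a "quad" is read as the usual rasterizer unit of four pixels, so that "2 quads * 4 samples" means 8 pixels per cycle, each carrying 4 antialiasing samples. That reading is our assumption, not something the interview states outright; the sketch below just verifies the numbers under it:

```python
# Hypothetical breakdown of the 256 GB/s Intelligent Memory figure.
# Assumption (labeled, not confirmed by the interview): a "quad" is a
# group of 4 pixels, and each pixel carries 4 FSAA samples.
QUADS_PER_CYCLE = 2
PIXELS_PER_QUAD = 4           # assumed meaning of "quad"
SAMPLES_PER_PIXEL = 4         # assumed 4x FSAA samples per pixel
BYTES_PER_SAMPLE = 4 + 4      # 4 bytes color + 4 bytes Z
RW_FACTOR = 2                 # read and write every cycle
CLOCK_HZ = 500_000_000        # 500 MHz

bytes_per_second = (QUADS_PER_CYCLE * PIXELS_PER_QUAD * SAMPLES_PER_PIXEL
                    * BYTES_PER_SAMPLE * RW_FACTOR * CLOCK_HZ)

print(bytes_per_second / 1e9)       # 256.0 GB/s
print(bytes_per_second * 8 / 1e12)  # 2.048 Tb/s, the "2 terabit" figure
```

Note that 256 GB/s of byte traffic is 2.048 terabits/sec, which matches the rounded "2 Terabits/sec" in the answer.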
Fortunately, designing the system from the ground up gave us the freedom to build incredible bandwidth into the box.

The interface to the system's memory is 128-bit. Isn't this a bottleneck, considering the bandwidth-intensive tasks performed in the GPU? Why was a 128-bit bus selected when PC parts already implement 256-bit buses in their high-end editions?

Bob Feldstein: Excellent question, because it gets to the heart of what is right in the system design. We have a great deal of internal memory in the daughter die referred to above. We actually use this memory as our back buffer. In addition, all anti-aliasing resolves, Z-buffering and alpha blending occur within this internal memory. This means our highest-bandwidth clients (Z, alpha and FSAA) operate internally to the chip and don't need to access main memory. This makes the 128-bit interface to system memory, and the ensuing bandwidth, more than enough for our needs, because we are offloading the bandwidth hogs to internal memory.

Let's talk about the unified shader architecture. First, I'd like to know about its performance. I'm pretty sure a unified shader architecture makes things easier for developers, but is a unified shader pipeline as good (performance-wise) as the current architecture seen in PC parts, that is, separate pixel and vertex processing units?

Bob Feldstein: The Unified Shader Architecture actually improves overall performance. To understand why, we need to look at what is unified. In current architectures there are separate shader mechanisms with different instruction sets and different caching mechanisms. What ATI found is that, in real applications, one set of shaders or the other is often idle, because either pixel or vertex processing dominates in a bursty manner. To handle the bursts, you need to build a lot of parallel shader resources - even though they will be idle as processing transitions from pixels to vertices.
The Unified Shaders combine the instruction sets, create the right caching mechanisms and, with a lot of other complication, allow all the shader processors to be used for any problem. Thus, when pixel processing dominates, we can use all 48 shaders for pixel processing. When vertex processing dominates, we can use the 48 shaders for vertices. When the workload is some vertex and some pixel processing, we can split the shader resources between the two programs.

Did you help Microsoft with the first steps of the manufacturing process, such as obtaining the first yields and working through the early stages of production?

Bob Feldstein: ATI helped Microsoft a great deal in picking a fab and working through manufacturing yield issues. This was part of the initial contract and was important to Microsoft. Microsoft had a good team in place to work on these issues as well, but with so much experience in-house at ATI, it was only natural that ATI helped in getting the product to market quickly.

Is the shader architecture unified at both the hardware and software level, or do programmers still write their pixel and vertex shaders in two different syntaxes and then hand them to the GPU, which puts them in a unified shader pipeline?

Bob Feldstein: The programmer can use the superset of instructions for any data running through the shader. This allows the programmer to be creative and not limited to current views of how to manipulate vertices and pixels - or any other data that the programmer decides to run through the shader processors.

When the Xbox 360 GPU features were unveiled, Nvidia expressed doubts about the unified shader architecture, particularly about its performance. Do you think Nvidia's comments are due to no Nvidia part, not even the RSX, having a unified shader architecture yet?

Bob Feldstein: Oh yes. Very much so. In Windows Vista, WGF 2.0 will treat GPUs as having unified vertex and pixel pipelines, even if that's not the case in the actual hardware.
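The load-balancing argument Feldstein makes - one pool of 48 ALUs shared between pixel and vertex work, instead of two fixed-function pools - can be sketched as a toy model. Everything here (the function names, the 32/16 split chosen for the fixed-pool case) is an illustration for comparison, not ATI's actual arbitration logic:

```python
# Toy model of fixed vs. unified shader pools. Not ATI's scheduler -
# just an illustration of why one shared pool of 48 ALUs wastes less
# capacity when the pixel/vertex mix is bursty.

def split_throughput(pixel_work, vertex_work, pixel_alus=32, vertex_alus=16):
    """Fixed pools: each pool can only run its own kind of work."""
    return min(pixel_work, pixel_alus) + min(vertex_work, vertex_alus)

def unified_throughput(pixel_work, vertex_work, total_alus=48):
    """One shared pool: any ALU can take either kind of work."""
    return min(pixel_work + vertex_work, total_alus)

# A pixel-heavy burst: 48 pixel jobs and no vertex jobs this cycle.
print(split_throughput(48, 0))    # fixed pools: vertex ALUs sit idle
print(unified_throughput(48, 0))  # unified: all 48 ALUs busy
```

In the pixel-heavy burst the fixed design completes only 32 jobs while 16 vertex ALUs idle; the unified design completes all 48, which is the efficiency gain the answer describes.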
This suggests that eventually all GPUs will have a unified shader architecture. Does the design of the Xenos GPU give ATI an edge over the competition in the near future, since you have already developed a unified shader architecture?

Bob Feldstein: I have no idea what Nvidia is doing in the future, but ATI has a leg up on the research and development of Unified Shaders. This certainly seems like an advantage.

Is it possible to measure the Xenos performance against a PC part? Or do the different shader architectures, operating systems and system hardware make a comparison too difficult?

Bob Feldstein: It really is difficult to measure the environments against each other. You would tend to write an app to run well in its intended environment, so I would discourage trying to compare them in these ways. That being said, the console has certain advantages - most notably, the controlled environment. This is what allows us to overcome memory bandwidth bottlenecks with Intelligent Memory and make the 48 shader units operate at peak efficiency. A device could have an infinite number of shader ALUs, but without the memory bandwidth to feed the system, all of that hardware would go for naught.

The RSX has a 550 MHz clock speed. Does this 10% clock speed lead over the Xenos GPU necessarily mean that the PlayStation 3 GPU is more powerful than the Xbox 360 GPU? We won't believe it until we see it, but if true, how is it possible that the PlayStation 3 can output two 1080p video streams simultaneously? That makes the RSX sound more powerful than the Xenos...

Bob Feldstein: No! These are inconsequential numbers that don't reflect any reality concerning the system performance. The 1080p streams have no bearing on understanding system performance, and the clock speed means little. Realize that memory bandwidth is the bottleneck of graphics systems.
ATI's Intelligent Memory provides an almost infinite bandwidth path to memory - meaning that the Unified Shaders will never be stifled in getting work to do. The Sony processor is going to come up against memory bandwidth limitations constantly, negating any small clocking differential. The Sony dual 1080p outputs are not an indication of performance - at best, 1080p is an indication that Sony considered this resolution the sweet spot of the market. The dual-1080p use case just shows that the RSX has a PC pedigree and has been cobbled together with the console.

When describing the Xenos GPU, it has been said that its capabilities go beyond Shader Model 3.0, placing a plus symbol after the DirectX nomenclature. Can you give specific numbers (for example, the maximum number of shader instructions executed) that corroborate this Shader Model 3.0+ capability?

Bob Feldstein: Among other impressive aspects, the Xbox 360 can execute 1,000-instruction shader programs, offers higher precision than required, and does shader exports. Also, as a unified shader architecture, both vertex and pixel shaders have access to the full instruction set.

Now that Microsoft has announced that it is developing an external HD DVD drive, we'd like to know if the Xenos GPU has any of the Avivo technology found in the Radeon X1800, such as hardware-accelerated processing of HD video formats like H.264 and VC-1. Or would Microsoft be required to use its three-core PowerPC processor to accelerate HD decoding?

Bob Feldstein: Better directed to Microsoft.

After AGEIA unveiled its PhysX processor, everyone expected that at least one of the next-generation consoles would include a physics processor, but that didn't happen. There have been rumors that the Xenos GPU, particularly its daughter die, could be used to accelerate physics. Is that true? If so, would it be practical, or would using the daughter die for hardware-accelerated physics have an impact on the Xenos' graphics performance?
Bob Feldstein: It is certainly true that physics could be done on our GPU. Realize that the Xbox 360 also has three high-speed processors, each multi-threaded - so there are many ways to achieve interesting A.I. We will seek to enable all paths.

Besides developing hardware, ATI always helps developers by releasing tools, source code samples, etc. For example, we heard about a displaced subdivision surfaces algorithm developed by Vineet Goel of ATI Research Orlando. Are you helping Xbox 360 developers leverage the power of the Xenos GPU, or is that a Microsoft responsibility?

Bob Feldstein: We have teamed with Microsoft to enable developers. We have had members of the developer relations and tools teams work directly at developer events and assist in training Microsoft employees. We are ready to help at any time, but the truth is that we have been quite impressed by how Microsoft is handling developer relationships. We will push ideas like Vineet's (and he has others) through Microsoft. When we say the Xbox has headroom, it is through the ability to enable algorithms on the GPU, including interesting lighting and surface algorithms like subdivision surfaces. It would be impossible to implement these algorithms in real time without our unique features, such as shader export.

Have the Xbox 360 titles released so far surprised ATI in any way, in terms of performance? If so, can you mention which games specifically, and what aspects of them? Is there an office favorite?

Bob Feldstein: We enjoy all the titles, and think that there are many under development and currently on the market that should impress all.

Finally, where do you think real-time graphics are heading in the next few years? Will GPUs accelerate physics, or will a separate processor (a PPU, as AGEIA calls it) be required? Will future GPUs go multicore on a single die, like CPUs are doing, or will Crossfire/SLI-like implementations be the solution to increased performance?
Bob Feldstein: The future of graphics could well include multicore GPUs, although we are looking at many alternatives to enable GPUs to live within the necessary cost and power envelopes. I am not sure about physics processors and whether they offer enough value to become a standard processor in gaming platforms. GPUs can offer much the same benefit and take advantage of volumes that would be hard to achieve in the physics processor market. It seems that physics processors are after the same market as GPUs, but that they are not an essential special-purpose processor. And it is too early to count out the general-purpose CPU as the best implementation of physics in the platform.