#1
Additional registers in x86-64
A few weeks ago, AMD published the SPECint2000 score for the FX-53:

http://www.spec.org/cpu2000/results/...628-03181.html
SPECint2000_peak = 1700
SPECint2000_base = 1601

I see that they used Intel's compiler on Windows XP Professional. Please correct me if I am wrong. Windows XP is a 32-bit OS, thus the benchmarks did not use the 8 additional general purpose registers defined in the x86-64 instruction set, right?

I imagine that, even with 8 more registers available, gcc cannot outperform Intel's compiler and Microsoft libraries on integer code?

I also noticed Sun's recent SPECfp2000 submission for the Opteron 150:

http://www.spec.org/cpu2000/results/...712-03241.html
SPECfp2000_peak = 1787
SPECfp2000_base = 1637

Sun did use a 64-bit OS, and it seems they compiled most benchmarks as 64-bit applications. I imagine the compiler (most often PathScale) produced SIMD code to use the XMM registers?

In short, I am wondering how much improvement the 8 additional GPRs and 8 additional media registers bring...

--
Regards, Grumble
#2
On Fri, 13 Aug 2004 11:03:06 +0200, Grumble wrote:

> A few weeks ago, AMD published the SPECint2000 score for the FX-53:
> http://www.spec.org/cpu2000/results/...628-03181.html
> SPECint2000_peak = 1700
> SPECint2000_base = 1601
> I see that they used Intel's compiler on Windows XP Professional. Please correct me if I am wrong. Windows XP is a 32-bit OS, thus the benchmarks did not use the 8 additional general purpose registers defined in the x86-64 instruction set, right?

That is correct.

> I imagine that, even with 8 more registers available, gcc cannot outperform Intel's compiler and Microsoft libraries on integer code?

Correct again. The optimizations in GCC are not as good as those in Intel's compiler, though the difference is generally not huge. Take a look at the results AMD published for their 'A4800' systems. These are a bunch of Opteron 144 (1.8GHz) processors running under a variety of different OSes and using different compilers. The fastest result they achieved was 1095, using Win2K3 (32-bit OS) + Intel's (32-bit) compiler. For comparison, with SuSE 8 for AMD64 (64-bit OS) + GCC 3.3 (64-bit) they managed 1045, and with SuSE 8 for x86 (32-bit OS) + GCC 3.3 for x86 (32-bit compiler) they turned in a score of 960.

So, in the end AMD showed an 8.8% improvement by going from 32-bit to 64-bit code, but they saw a 14% improvement going from Linux + GCC (32-bit) to Windows + Intel C (also 32-bit).

> I also noticed Sun's recent SPECfp2000 submission for the Opteron 150:
> http://www.spec.org/cpu2000/results/...712-03241.html
> SPECfp2000_peak = 1787
> SPECfp2000_base = 1637
> Sun did use a 64-bit OS, and it seems they compiled most benchmarks as 64-bit applications. I imagine the compiler (most often PathScale) produced SIMD code to use the XMM registers?

Presumably yes, it would use SIMD code, the XMM registers and the extra 8 integer registers (even with FP code you still need some integer registers).

> In short, I am wondering how much improvement the 8 additional GPRs and 8 additional media registers bring...

Usually more than enough to make up for the performance loss you would expect with 64-bit code. Normally, if all else is equal, 64-bit code is about 5-10% slower than 32-bit code until you blow your memory limits, at which point 32-bit code just completely breaks down. That's why most bi-arch systems, e.g. Sun's Solaris, still use lots of 32-bit applications if they can.

With AMD64 the extra registers have managed to improve the performance enough that they not only negate this performance loss, but turn it into a 5-10% performance gain on average. Not bad at all for a fairly small cost in die space and virtually no changes to the instruction set.

FWIW the reason why AMD only went to 16 registers (still a pretty low number compared to most modern processors) is that this is the most they could squeeze into the x86 instruction set without making fairly major changes. They did a pretty damn good job of this; obviously they actually put some thought into how to extend x86 to 64 bits as naturally as possible.

-------------
Tony Hill
hilla underscore 20 at yahoo dot ca
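To illustrate the encoding point above: in 64-bit mode a REX prefix byte (0x40 to 0x4F) supplies one extra bit for each 3-bit register field, which is how 8 registers become 16 without changing the ModRM layout. What follows is a minimal Python sketch of that decoding, not a disassembler. The helper names `decode_rex` and `modrm_registers` are made up for this illustration, and the sketch assumes a register-direct ModRM byte (mod = 11) and ignores the SIB byte entirely.

```python
# Names of the 16 general purpose registers in x86-64, indexed by their
# 4-bit register number (the low 8 are the classic x86 registers).
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{i}" for i in range(8, 16)
]


def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (0100WRXB) into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index (unused in this sketch)
        "B": byte & 1,         # extends ModRM.rm
    }


def modrm_registers(rex: int, modrm: int) -> tuple:
    """Return (reg, rm) register names for a register-direct ModRM byte."""
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles register-direct operands")
    bits = decode_rex(rex)
    reg = (bits["R"] << 3) | ((modrm >> 3) & 0b111)  # 3 bits + 1 from REX.R
    rm = (bits["B"] << 3) | (modrm & 0b111)          # 3 bits + 1 from REX.B
    return GPR64[reg], GPR64[rm]


if __name__ == "__main__":
    # Bytes 4D 89 C8 encode "mov r8, r9": opcode 0x89 stores reg into r/m.
    src, dst = modrm_registers(0x4D, 0xC8)
    print(f"mov {dst}, {src}")
    # Same ModRM byte with only REX.W set: "mov rax, rcx".
    src, dst = modrm_registers(0x48, 0xC8)
    print(f"mov {dst}, {src}")
```

The prefix has one spare bit per register field, so each 3-bit field grows to 4 bits and 16 registers is the natural ceiling, which fits the point made in the post.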
#3
Tony Hill wrote:

> Correct again. The optimizations in GCC are not as good as those in Intel's compiler, though the difference is generally not huge. Take a look at the results AMD published for their 'A4800' systems. These are a bunch of Opteron 144 (1.8GHz) processors running under a variety of different OSes and using different compilers. The fastest result they achieved was 1095, using Win2K3 (32-bit OS) + Intel's (32-bit) compiler. For comparison, with SuSE 8 for AMD64 (64-bit OS) + GCC 3.3 (64-bit) they managed 1045, and with SuSE 8 for x86 (32-bit OS) + GCC 3.3 for x86 (32-bit compiler) they turned in a score of 960.
> So, in the end AMD showed an 8.8% improvement by going from 32-bit to 64-bit code, but they saw a 14% improvement going from Linux + GCC (32-bit) to Windows + Intel C (also 32-bit).

Cool, but I wonder why AMD submitted the scores with the Intel 32-bit compiler and a 32-bit OS, rather than a 64-bit OS with the 64-bit Pathscale or PGI compilers? These two companies seem to have built their compilers specifically for AMD64, which I'm certain the Intel compilers were not.

> With AMD64 the extra registers have managed to improve the performance enough that they not only negate this performance loss, but turn it into a 5-10% performance gain on average. Not bad at all for a fairly small cost in die space and virtually no changes to the instruction set.
> FWIW the reason why AMD only went to 16 registers (still a pretty low number compared to most modern processors) is that this is the most they could squeeze into the x86 instruction set without making fairly major changes. They did a pretty damn good job of this; obviously they actually put some thought into how to extend x86 to 64 bits as naturally as possible.

How do we know that the extra performance isn't due to the built-in memory controller and branch prediction?

Yousuf Khan
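The percentages quoted from Tony Hill's post can be rechecked directly from the three A4800 scores given in the thread. The snippet below is only a sketch of that arithmetic; the dictionary keys and the `percent_gain` helper are labels invented for this example, not anything taken from the SPEC reports.

```python
# SPECint2000 scores for the Opteron 144 'A4800' systems, as quoted in the thread.
SCORES = {
    "win2k3_32 + intel_32": 1095,
    "suse8_64 + gcc33_64": 1045,
    "suse8_32 + gcc33_32": 960,
}


def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Use 32-bit Linux + GCC as the common baseline for both comparisons.
baseline = SCORES["suse8_32 + gcc33_32"]
gain_64bit = percent_gain(SCORES["suse8_64 + gcc33_64"], baseline)
gain_intel = percent_gain(SCORES["win2k3_32 + intel_32"], baseline)

if __name__ == "__main__":
    print(f"32-bit GCC -> 64-bit GCC:        {gain_64bit:.2f}%")
    print(f"32-bit GCC -> 32-bit Intel/Win:  {gain_intel:.2f}%")
```

Both results come out close to the quoted figures: just under 9% for the move to 64-bit code and about 14% for the change of compiler and OS.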
#4
Grumble wrote:

> In short, I am wondering how much improvement the 8 additional GPRs and 8 additional media registers bring...

Not too much in Intel's design, I guess:
http://www.anandtech.com/linux/showdoc.aspx?i=2163

Regards.

--
RusH // http://randki.o2.pl/profil.php?id_r=352019
Like ninjas, true hackers are shrouded in secrecy and mystery. You may never know -- UNTIL IT'S TOO LATE.
#5
On Sun, 15 Aug 2004 07:46:17 GMT, "Yousuf Khan" wrote:

> Tony Hill wrote:
>> So, in the end AMD showed an 8.8% improvement by going from 32-bit to 64-bit code, but they saw a 14% improvement going from Linux + GCC (32-bit) to Windows + Intel C (also 32-bit).
> Cool, but I wonder why AMD submitted the scores with the Intel 32-bit compiler and a 32-bit OS, rather than a 64-bit OS with the 64-bit Pathscale or PGI compilers? These two companies seem to have built their compilers specifically for AMD64, which I'm certain the Intel compilers were not.

Both of these compilers are still fairly new, and they are still not as fast as Intel's x86 compilers for integer code. Sun submitted some SPEC CINT results using the Pathscale compiler, and they only managed a score of 1437/1584 (base/peak) with an Opteron 250, while AMD managed a score of 1566/1655 with an Opteron 150 using Intel's compiler. What's more, Sun still had to resort to using GCC for one of their tests, as it was 20% faster on that test than PathCC.

On the floating point side of things, though, it's a different story. Sun's Opteron system turns in a VERY respectable 1637/1787 (base/peak) score using a combination of GCC, PGI and Pathscale's compilers. This puts it just about on par with IBM's Power4 chip, not bad for a processor that sells for about 1/10th the cost.

>> With AMD64 the extra registers have managed to improve the performance enough that they not only negate this performance loss, but turn it into a 5-10% performance gain on average. Not bad at all for a fairly small cost in die space and virtually no changes to the instruction set.
>> FWIW the reason why AMD only went to 16 registers (still a pretty low number compared to most modern processors) is that this is the most they could squeeze into the x86 instruction set without making fairly major changes. They did a pretty damn good job of this; obviously they actually put some thought into how to extend x86 to 64 bits as naturally as possible.
> How do we know that the extra performance isn't due to the built-in memory controller and branch prediction?

Err... it's not like AMD turns those features off in 32-bit mode on their Athlon64 and Opteron chips! They help the 32-bit and 64-bit scores alike, so they can't explain the difference between the two.

-------------
Tony Hill
hilla underscore 20 at yahoo dot ca
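For a rough sense of scale, the SPECint2000 figures quoted in this post can be compared with a few lines of Python. This is a sketch only: the `Submission` class and its field names are invented for the example, and because the two submissions used different hardware configurations, the gap should not be read as a pure compiler-versus-compiler measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One SPECint2000 result as quoted in the thread (base/peak)."""
    label: str
    base: int
    peak: int

    def peak_uplift(self) -> float:
        """How much peak tuning added over the base score, in percent."""
        return (self.peak / self.base - 1.0) * 100.0


sun = Submission("Sun, Opteron 250, mostly PathScale", base=1437, peak=1584)
amd = Submission("AMD, Opteron 150, Intel compiler", base=1566, peak=1655)

# Lead of the Intel-compiled result over the PathScale-compiled one, in percent.
base_lead = (amd.base / sun.base - 1.0) * 100.0
peak_lead = (amd.peak / sun.peak - 1.0) * 100.0

if __name__ == "__main__":
    for s in (sun, amd):
        print(f"{s.label}: {s.base}/{s.peak}, peak uplift {s.peak_uplift():.1f}%")
    print(f"Intel-compiled lead: base {base_lead:.1f}%, peak {peak_lead:.1f}%")
```

On these numbers the Intel-compiled result leads by roughly 9% at base and a bit under 5% at peak, so the gap narrows once per-benchmark tuning is allowed.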