A computer components & hardware forum. HardwareBanter


AMD to leave x86 behind?



 
 
  #31  
Old October 31st 05, 06:04 PM
Jeremy Linton



Tim McCaffrey wrote:
Instructions that load/store/copy memory that control how much
cache pollution is done and/or communicate to the bridges and
I/O devices how much data is being loaded/stored could improve
efficiency on the I/O (PCI) and memory busses.

Ahhh.. Maybe you should look at the MTRRs (Memory Type Range
Registers), the PAT (Page Attribute Table), and the prefetch and
non-temporal instructions already provided. I've seen near-theoretical
throughput numbers both on the memory subsystem and on PCI buses given
properly tuned code. It's not that you can't control such things with
x86; it's just that I haven't seen a compiler generate optimal code.

AMD wrote a very nice document a few years ago about how to get
maximum throughput from memory copy operations, in which they compared
different methods and instructions for doing the copy. If I remember
correctly, in the end they got nearly theoretical bandwidth numbers by
using a simple loop of cache-block-sized pre-reads (actual register
loads instead of prefetch instructions), followed by another loop
actually doing a non-temporal quadword copy. This reduced the read vs.
write bus turnaround times enough to get numbers that were significantly
faster than nearly any other method.

So, it's possible right now, given proper code.






  #32  
Old October 31st 05, 08:19 PM
Terje Mathisen

Jeremy Linton wrote:
AMD wrote a very nice document a few years ago about how to get
maximum throughput from memory copy operations, in which they compared
different methods and instructions for doing the copy. If I remember
correctly, in the end they got nearly theoretical bandwidth numbers by
using a simple loop of cache-block-sized pre-reads (actual register
loads instead of prefetch instructions), followed by another loop
actually doing a non-temporal quadword copy. This reduced the read vs.
write bus turnaround times enough to get numbers that were significantly
faster than nearly any other method.


Afair, that optimization was in regard to doing a simple set of fp
operations on a block of data, where it turned out that the fastest way
was to move everything three times:

First the max speed pre-read loop, then an operate loop, storing to a
fixed half L1 sized buffer, then finally NT stores to move the result
block to the final destination.

So, it's possible right now, given proper code.


Or in this case, quite horribly overcomplicated code. :-(

Terje

--
-
"almost all programming can be viewed as an exercise in caching"
  #33  
Old October 31st 05, 09:23 PM
Scott A Crosby

On Mon, 31 Oct 2005 18:04:01 GMT, Jeremy Linton writes:

Ahhh.. Maybe you should look at the MTRRs (Memory Type Range
Registers), the PAT (Page Attribute Table), and the prefetch and
non-temporal instructions already provided. I've seen near-theoretical
throughput numbers both on the memory subsystem and on PCI buses
given properly tuned code. It's not that you can't control such
things with x86; it's just that I haven't seen a compiler
generate optimal code.

AMD had a very nice document they wrote a few years ago about
how to get max throughput with memory copy operations, where


Would you happen to know the URL? I'd like to read this document.

This reduced the read vs write bus turnaround times enough to
get numbers that were significantly faster than nearly any
other method.


In particular, for this memory turnaround effect you've mentioned?

Scott
  #34  
Old November 1st 05, 06:27 AM
Yousuf Khan

David Kanter wrote:
how about some form of SMT for AMD?


I don't know, that might come too, but it can't be done as easily as
Hyperthreading. Hyperthreading relied on the Pentium 4's inherent
inefficiency to run a lot of threads simultaneously.



If you think that any modern MPU is efficient, you are smoking crack.
They all have plenty of unused cycles left on the table (except when
running linpack).


But the secret is to have enough idle cycles to run both threads at
close to full speed each. I'd say anything that had enough to run both
threads at 80% of full speed was a reasonably successful SMT.
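
As a rough illustration of what that 80% figure implies, the sketch below multiplies the per-thread speed by the number of threads to get aggregate throughput relative to one thread running alone. The function name and the simple linear model are assumptions made for this example; real SMT gains depend heavily on the workload.

```python
def smt_aggregate_throughput(per_thread_fraction, threads=2):
    """Aggregate throughput relative to a single thread running alone.

    per_thread_fraction is each thread's speed as a fraction of what it
    would achieve with the whole core to itself (hypothetical model).
    """
    return per_thread_fraction * threads


# 50% per thread is the break-even point: two threads finish no more
# work per unit time than one thread running alone.
for fraction in (0.5, 0.8, 1.0):
    total = smt_aggregate_throughput(fraction)
    print(f"{fraction:.0%} per thread -> {total:.2f}x one thread alone")
```

Under this model, two threads at 80% each deliver 1.6 times the work of a single thread on the same core.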
  #35  
Old November 1st 05, 06:38 AM
Yousuf Khan

Stephen Fuld wrote:
Is there some technical reason behind the limitation to three HT links or
was it a marketing decision? If the latter, then it doesn't seem like it
would be a big deal, if larger systems seem to be a bigger market, to add
another link (or even two). The HT links must be a pretty small amount of
silicon and a small number of pins. Does that make sense?


I don't think there was any technical or marketing reason behind
limiting it to 3 HTT links per processor. It may have simply been a "we
need to keep the number of HTT links and their pin counts within a
reasonable amount"-type decision. I'm sure they can add even more HT
links in the future.

Yousuf Khan
  #36  
Old November 1st 05, 06:39 AM
Yousuf Khan

Oliver S. wrote:
If it added instructions to explicitly prefetch data from another
processor then it would probably have a gain in performance.



These instructions wouldn't work better than the prefetching-instructions
currently implemented. I think it would be cleverer to copy hw-scouting
from Sun's upcoming CPUs. HW-scouting is simple to implement if you're
going to have a SMT-core anyway.


So what's HW-scouting?

Yousuf Khan
  #37  
Old November 1st 05, 07:10 AM
Yousuf Khan

David Hopwood wrote:
Rob Stow wrote:

In an eight-way system most
are one hop away, while a few are two hops away.


No again. This would be the ideal 8P Opty 8xx scheme:

CPU6-----------------CPU7
 |  \               /  |
 |   \             /   |
 |   CPU4------CPU5    |
 |    |          |     |
 |    |          |     |
 |   CPU2------CPU[3]  |
 |   /             \   |
 |  /               \  |
CPU0                 CPU1
 |                     |
 |                     |
Chipset           Chipset


Hence, there are 11 one-hops, 12 two-hops, and 5 three-hops.



That's not optimal:

CPU6---------------CPU7
 |  \_____   _____/  |
 |        \ /        |
 |         X         |
 |        / \        |
 |    CPU4---CPU5    |
 |     |       |     |
 |     |       |     |
 |    CPU2---CPU3    |
 |   /           \   |
 |  /             \  |
CPU0               CPU1
 |                   |
 |                   |
Chipset         Chipset

11 one-hops, 16 two-hops, and 1 three-hop.


I don't get it; your diagram seems to be only a different permutation of
Rob's diagram. The only difference is that in yours CPU5 connects to
CPU6 and CPU4 to CPU7, whereas in Rob's it was CPU4 to CPU6 and CPU5 to
CPU7. That little "X" you put in between doesn't represent a shortcut;
it represents one line going over the other but not touching.

Listing all of the 3 hop combinations in yours and Rob's, this is what I
get.

Rob:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU7: 2-4-5-7 or 2-3-5-7
5: CPU3-CPU6: 3-2-4-6 or 3-5-4-6

David:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU6: 2-4-5-6 or 2-3-5-6
5: CPU3-CPU7: 3-2-4-7 or 3-5-4-7

Only #4 & #5 are different between your two respective diagrams.

Yousuf Khan
  #38  
Old November 1st 05, 07:18 AM
Yousuf Khan

Oliver S. wrote:
And let's say it'll have 32 FP registers instead of just 16 like SSE
does.



Of course 32 registers would be better than 16, but I think we're well
past a critical point with 16 fp-registers. I think the large register
sets we see today on newer architectures exist because they are easy to
implement in a CPU rather than because they are necessary; in other
words, the benefit of 32 or more registers isn't very high in most
cases, but their cost in terms of chip design is rather low as long as
the register file doesn't become too large.


Can't disagree with that.

Yousuf Khan
  #39  
Old November 1st 05, 07:48 AM
David Brown

Yousuf Khan wrote:
David Hopwood wrote:

Rob Stow wrote:

In an eight-way system most
are one hop away, while a few are two hops away.


No again. This would be the ideal 8P Opty 8xx scheme:

CPU6-----------------CPU7
 |  \               /  |
 |   \             /   |
 |   CPU4------CPU5    |
 |    |          |     |
 |    |          |     |
 |   CPU2------CPU[3]  |
 |   /             \   |
 |  /               \  |
CPU0                 CPU1
 |                     |
 |                     |
Chipset           Chipset


Hence, there are 11 one-hops, 12 two-hops, and 5 three-hops.




That's not optimal:

CPU6---------------CPU7
 |  \_____   _____/  |
 |        \ /        |
 |         X         |
 |        / \        |
 |    CPU4---CPU5    |
 |     |       |     |
 |     |       |     |
 |    CPU2---CPU3    |
 |   /           \   |
 |  /             \  |
CPU0               CPU1
 |                   |
 |                   |
Chipset         Chipset

11 one-hops, 16 two-hops, and 1 three-hop.


I don't get it; your diagram seems to be only a different permutation of
Rob's diagram. The only difference is that in yours CPU5 connects to
CPU6 and CPU4 to CPU7, whereas in Rob's it was CPU4 to CPU6 and CPU5 to
CPU7. That little "X" you put in between doesn't represent a shortcut;
it represents one line going over the other but not touching.

Listing all of the 3 hop combinations in yours and Rob's, this is what I
get.

Rob:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU7: 2-4-5-7 or 2-3-5-7
5: CPU3-CPU6: 3-2-4-6 or 3-5-4-6

David:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU6: 2-4-5-6 or 2-3-5-6
5: CPU3-CPU7: 3-2-4-7 or 3-5-4-7

Only #4 & #5 are different between your two respective diagrams.

Yousuf Khan


The cross-over gives short-cuts for #2 to #5 (#1, CPU0-CPU1, is still a
3 hop):

2: CPU0-CPU5: 0-6-5
3: CPU1-CPU4: 1-7-4
4: CPU2-CPU6: 2-0-6
5: CPU3-CPU7: 3-1-7

mvh.,

David
  #40  
Old November 1st 05, 09:42 AM
Terje Mathisen

Yousuf Khan wrote:

David Hopwood wrote:

That's not optimal:

CPU6---------------CPU7
 |  \_____   _____/  |
 |        \ /        |
 |         X         |
 |        / \        |
 |    CPU4---CPU5    |
 |     |       |     |
 |     |       |     |
 |    CPU2---CPU3    |
 |   /           \   |
 |  /             \  |
CPU0               CPU1
 |                   |
 |                   |
Chipset         Chipset

11 one-hops, 16 two-hops, and 1 three-hop.


I don't get it; your diagram seems to be only a different permutation of
Rob's diagram. The only difference is that in yours CPU5 connects to
CPU6 and CPU4 to CPU7, whereas in Rob's it was CPU4 to CPU6 and CPU5 to
CPU7. That little "X" you put in between doesn't represent a shortcut;
it represents one line going over the other but not touching.

Listing all of the 3 hop combinations in yours and Rob's, this is what I
get.

Rob:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU7: 2-4-5-7 or 2-3-5-7
5: CPU3-CPU6: 3-2-4-6 or 3-5-4-6

David:
1: CPU0-CPU1: 0-2-3-1
2: CPU0-CPU5: 0-2-3-5 or 0-2-4-5
3: CPU1-CPU4: 1-3-2-4 or 1-3-5-4
4: CPU2-CPU6: 2-4-5-6 or 2-3-5-6
5: CPU3-CPU7: 3-2-4-7 or 3-5-4-7

Only #4 & #5 are different between your two respective diagrams.


I think you've missed a key feature of that cross:

2: CPU0-CPU5: 0-6-5
3: CPU1-CPU4: 1-7-4
4: CPU2-CPU6: 2-0-6
5: CPU3-CPU7: 3-1-7

I.e. only the CPU0-CPU1 path has to pass over three hops.
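
The hop counts quoted for the two layouts can be checked mechanically. The sketch below is an illustration, not something posted in the thread: it encodes the CPU-to-CPU links as read off the two ASCII diagrams (the `ROB` and `DAVID` link lists and all function names are assumptions made for this example), runs a breadth-first search from every CPU, and groups CPU pairs by shortest-path length. It also counts the links each CPU uses, including the chipset link on CPU0 and CPU1, to compare against the three-HT-link limit discussed earlier in the thread.

```python
from collections import deque
from itertools import combinations

# CPU-to-CPU links shared by both diagrams (numbers are CPU indices).
COMMON = [(0, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5),
          (0, 6), (1, 7), (6, 7)]
ROB = COMMON + [(4, 6), (5, 7)]    # straight links up to the top pair
DAVID = COMMON + [(5, 6), (4, 7)]  # crossed links up to the top pair


def neighbours(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hops_from(adj, start):
    """Breadth-first search: shortest hop count from start to every CPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hop_histogram(links):
    """Map hop count -> list of CPU pairs at that shortest distance."""
    adj = neighbours(links)
    hist = {}
    for a, b in combinations(sorted(adj), 2):
        hist.setdefault(hops_from(adj, a)[b], []).append((a, b))
    return hist


def links_used(links, chipset_cpus=(0, 1)):
    """Links per CPU, counting one extra for CPUs wired to a chipset."""
    adj = neighbours(links)
    return {cpu: len(adj[cpu]) + (cpu in chipset_cpus) for cpu in sorted(adj)}


for name, links in (("Rob", ROB), ("David", DAVID)):
    hist = hop_histogram(links)
    counts = {d: len(pairs) for d, pairs in sorted(hist.items())}
    print(name, counts, "three-hop pairs:", hist.get(3, []),
          "max links per CPU:", max(links_used(links).values()))
```

Under those assumptions it reports 11 one-hop, 12 two-hop and 5 three-hop pairs for Rob's layout, and 11, 16 and 1 for David's, with CPU0-CPU1 as the only three-hop pair. Both layouts use exactly three links on every CPU.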

Terje
--
-
"almost all programming can be viewed as an exercise in caching"
 





All times are GMT +1.


Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.