#21
AMD to leave x86 behind?
> And who will optimize? What was the name of AMD's compiler again? What, no compiler? ... Great idea. AMD pouring money into the GCC guys would be a different story. That's where AMD is becoming bigger now, in small Linux server boxes.

AFAICT, compiler optimization doesn't matter that much to mainstream servers. It's not as if Apache is compute-bound, for instance. In my part of the AMD-loving server market (HPC), compilers _do_ matter, but there are multiple good compilers available. If all our users had only C/C++ code, we'd consider using nothing but GCC.

> AMD has been demonstrating a penchant for earthy thinking recently. So I'd like to see them do a fused multiply-add using XMM registers.

I don't know that it would make that much difference in real code, but it would certainly help Top500 scores.

> It wouldn't surprise me if the instructions it comes up with are designed for practical housekeeping rather than performance glory, especially in the server realm. Perhaps instructions to explicitly share memory and cache contents between processors/cores?

I'm not quite sure what that means; the address space is inherently shared, so...

> Instructions for optimizing multiple HyperTransport links. This is apparently as a result of requests by Sun Microsystems for future feature directions of Opterons. Of course, none of this would be relevant to stone-age Intel processors -- they don't have anything like this.

Well, if AMD added something like directory-based cache coherence on-chip, it would make a HUGE difference, at least for larger SMPs. It's a bit hard to tell how relevant that would be for the majority of the market, though, since SMPs above, say, 4 sockets are an extremely rarefied market. In a sense, the trend toward multi-core chips fights the need for smarter coherency protocols, since quite fast machines can be built with fewer sockets (and therefore fewer hops during the coherence broadcast). Then again, I've always wondered why AMD didn't just produce, say, an 8-port HT "bridge" that contained a smart crossbar inside.

How about some form of SMT for AMD?
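The fused multiply-add wished for in this post computes a*b + c as one operation with a single rounding, whereas a separate multiply and add round twice. As a hedged illustration only, the Python sketch below emulates that single rounding with exact rational arithmetic from the standard `fractions` module. It says nothing about how AMD or Intel hardware would implement FMA, and the input values are arbitrary picks chosen to expose the difference.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: a*b is rounded to a double, then the sum is rounded again."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: a*b + c is computed exactly, then rounded once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    # Illustrative inputs: the exact product a*b is 1 - 2**-54.
    a = 1.0 + 2.0**-27
    b = 1.0 - 2.0**-27
    c = -1.0
    print(mul_then_add(a, b, c))                 # 0.0
    print(fused_mul_add(a, b, c))                # -5.551115123125783e-17
    print(fused_mul_add(a, b, c) == -2.0**-54)   # True
```

With these inputs the exact product 1 - 2**-54 lies halfway between two doubles and rounds to exactly 1.0, so the two-step version returns 0.0, while the emulated FMA keeps the -2**-54 term.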
#22
AMD to leave x86 behind?
Mark Hahn wrote:
>> It wouldn't surprise me if the instructions it comes up with are designed for practical housekeeping rather than performance glory, especially in the server realm. Perhaps instructions to explicitly share memory and cache contents between processors/cores?
>
> I'm not quite sure what that means; the address space is inherently shared, so...

Not really. Remember, AMD's architecture is in actual fact a NUMA design, even though AMD would like everyone to forget that and treat it like an SMP. So some memory is local to one processor, and some is local to another. If it added instructions to explicitly prefetch data from another processor then it would probably have a gain in performance.

>> Instructions for optimizing multiple HyperTransport links. This is apparently as a result of requests by Sun Microsystems for future feature directions of Opterons. Of course, none of this would be relevant to stone-age Intel processors -- they don't have anything like this.
>
> Well, if AMD added something like directory-based cache coherence on-chip, it would make a HUGE difference, at least for larger SMPs. It's a bit hard to tell how relevant that would be for the majority of the market, though, since SMPs above, say, 4 sockets are an extremely rarefied market. In a sense, the trend toward multi-core chips fights the need for smarter coherency protocols, since quite fast machines can be built with fewer sockets (and therefore fewer hops during the coherence broadcast). Then again, I've always wondered why AMD didn't just produce, say, an 8-port HT "bridge" that contained a smart crossbar inside.

I guess there wasn't as much demand for it. But also remember that adding a crossbar would result in a minimum of two hops for all processor-to-processor interactions. The crossbar itself would be one of the hops. Under present conditions, Opterons can traverse across any processors in a four-way system in one hop. In an eight-way system most are one hop away, while a few are two hops away. In a crossbar eight-way, all processors are two hops away no matter what.

> How about some form of SMT for AMD?

I don't know; that might come too, but it can't be done as easily as Hyper-Threading. Hyper-Threading relied on the Pentium 4's inherent inefficiency to run a lot of threads simultaneously.
#23
AMD to leave x86 behind?
>> Well, if AMD added something like directory-based cache coherence on-chip, it would make a HUGE difference, at least for larger SMPs. It's a bit hard to tell how relevant that would be for the majority of the market, though, since SMPs above, say, 4 sockets are an extremely rarefied market. In a sense, the trend toward multi-core chips fights the need for smarter coherency protocols, since quite fast machines can be built with fewer sockets (and therefore fewer hops during the coherence broadcast). Then again, I've always wondered why AMD didn't just produce, say, an 8-port HT "bridge" that contained a smart crossbar inside.
>
> I guess there wasn't as much demand for it. But also remember that adding a crossbar would result in a minimum of two hops for all processor-to-processor interactions. The crossbar itself would be one of the hops. Under present conditions, Opterons can traverse across any processors in a four-way system in one hop. In an eight-way system most are one hop away, while a few are two hops away. In a crossbar eight-way, all processors are two hops away no matter what.

I agree with you here, shockingly enough. Especially since, if you want more compute power, you can just slap in dual-core MPUs.

>> How about some form of SMT for AMD?
>
> I don't know; that might come too, but it can't be done as easily as Hyper-Threading. Hyper-Threading relied on the Pentium 4's inherent inefficiency to run a lot of threads simultaneously.

If you think that any modern MPU is efficient, you are smoking crack. They all leave plenty of unused cycles on the table (except when running Linpack).

David
#24
AMD to leave x86 behind?
> Under present conditions, Opterons can traverse across any processors in a four-way system in one hop.

No. Each Opty 8xx has three HT links. If all are used for interprocessor communication - which would be required for a one-hop scenario in a 4P box - then none are left over for links to the outside world. A typical 4P Opty scheme is like this (dashes and bars = HT links):

CPU2---CPU3
 |      |
 |      |
 |      |
CPU0---CPU1
 |      |
 |      |
Chipset Chipset

Hence, there are two hops between CPU1 and CPU2 and two hops between CPU0 and CPU3.

> In an eight-way system most are one hop away, while a few are two hops away.

No again. This would be the ideal 8P Opty 8xx scheme:

CPU6-----------------CPU7
 | \                / |
 |  \              /  |
 |   CPU4------CPU5   |
 |    |         |     |
 |    |         |     |
 |   CPU2------CPU3   |
 |  /              \  |
 | /                \ |
CPU0                 CPU1
 |                    |
 |                    |
Chipset           Chipset

Hence, there are 11 one-hops, 12 two-hops, and 5 three-hops.

> In a crossbar eight-way, all processors are two hops away no matter what.

Horus will get you part of the way there. However, Horus only allows 4 CPUs per Horus chip, so in an 8P system the four CPUs on one Horus would be two hops away from each other, but would be three hops away from the four CPUs on the next Horus. Hence with Horus the mix would be 0 one-hops, 12 two-hops, and 16 three-hops. However, even then an 8P dual-core box with Horus is still supposed to be slightly better than an 8P non-Horus box, as claimed here: http://www.aceshardware.com/read_news.jsp?id=80000550

There is supposedly going to be a 16-processor/32-core Horus demo in November at some shindig in Seattle, but I can't remember where I read that.
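The hop counts in this post can be checked mechanically by treating the HT links as an undirected graph and running a breadth-first search from every CPU. The sketch below assumes, as the post does, three HT links per Opteron 8xx, and models a Horus chip or a crossbar as a single switch node that counts as one hop (the convention used in this thread). The node names (`chipset0`, `horus0`, `xbar`, and so on) are made up for illustration.

```python
from collections import Counter, defaultdict, deque

HT_LINKS_PER_CPU = 3  # Opteron 8xx, as stated in the post


def hop_histogram(cpus, links):
    """Map hop distance -> number of unordered CPU pairs at that distance.

    `links` may include non-CPU nodes (a chipset, or a switch standing in for
    Horus or a crossbar); they are traversed, but only CPU pairs are counted.
    """
    adj = defaultdict(list)
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    hist = Counter()
    for i, src in enumerate(cpus):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst in cpus[i + 1:]:
            hist[dist[dst]] += 1
    return dict(sorted(hist.items()))


def links_per_cpu(cpus, links):
    """Number of HT links each CPU would need for a given layout."""
    return {c: sum(c in link for link in links) for c in cpus}


CPUS4 = [0, 1, 2, 3]
# The typical 4P square from the post; CPU0 and CPU1 also link to chipsets.
SQUARE_4P = [(0, 1), (2, 3), (0, 2), (1, 3), (0, "chipset0"), (1, "chipset1")]
# Every CPU pair linked directly (what a one-hop 4P box would need).
FULL_4P = [(a, b) for a in CPUS4 for b in CPUS4 if a < b]

CPUS8 = list(range(8))
# Hypothetical Horus model: two switch nodes, four CPUs each, linked together.
HORUS_8P = ([(c, "horus0") for c in range(4)]
            + [(c, "horus1") for c in range(4, 8)]
            + [("horus0", "horus1")])
# Hypothetical single 8-port crossbar, modelled as one switch node.
CROSSBAR_8P = [(c, "xbar") for c in CPUS8]

if __name__ == "__main__":
    print(hop_histogram(CPUS4, SQUARE_4P))     # {1: 4, 2: 2}
    # A fully connected 4P box uses every link on every CPU.
    used = max(links_per_cpu(CPUS4, FULL_4P).values())
    print(HT_LINKS_PER_CPU - used)             # 0 links left for a chipset
    print(hop_histogram(CPUS8, HORUS_8P))      # {2: 12, 3: 16}
    print(hop_histogram(CPUS8, CROSSBAR_8P))   # {2: 28}
```

Under the switch-counts-as-a-hop convention, the crossbar model gives all 28 CPU pairs at two hops, and the Horus model reproduces the 12/16 split above.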
#25
AMD to leave x86 behind?
Rob Stow wrote:
>> In an eight-way system most are one hop away, while a few are two hops away.
>
> No again. This would be the ideal 8P Opty 8xx scheme:
>
> CPU6-----------------CPU7
>  | \                / |
>  |  \              /  |
>  |   CPU4------CPU5   |
>  |    |         |     |
>  |    |         |     |
>  |   CPU2------CPU[3] |
>  |  /              \  |
>  | /                \ |
> CPU0                 CPU1
>  |                    |
>  |                    |
> Chipset           Chipset
>
> Hence, there are 11 one-hops, 12 two-hops, and 5 three-hops.

That's not optimal:

CPU6--------------CPU7
 | \_____   _____/ |
 |       \ /       |
 |        X        |
 |       / \       |
 |   CPU4---CPU5   |
 |    |       |    |
 |    |       |    |
 |   CPU2---CPU3   |
 |  /           \  |
 | /             \ |
CPU0              CPU1
 |                 |
 |                 |
Chipset        Chipset

11 one-hops, 16 two-hops, and 1 three-hop.

--
David Hopwood
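To compare the two 8P layouts, the CPU-to-CPU links can be read off the ASCII diagrams and fed to a breadth-first search. In this sketch the chipset links on CPU0 and CPU1 are only counted toward each CPU's three-link budget, since a chipset is a dead end for CPU-to-CPU traffic. The edge lists are a transcription of the diagrams (worth checking against the pictures), and the mean hop count is just one convenient summary, not a metric anyone in the thread used.

```python
from collections import Counter, deque

# CPU-to-CPU HT links transcribed from the two 8P diagrams (CPUs 0-7).
STOW_8P = [(6, 7), (0, 6), (4, 6), (1, 7), (5, 7), (4, 5),
           (2, 4), (3, 5), (2, 3), (0, 2), (1, 3)]
HOPWOOD_8P = [(6, 7), (0, 6), (5, 6), (1, 7), (4, 7), (4, 5),
              (2, 4), (3, 5), (2, 3), (0, 2), (1, 3)]
CHIPSET_CPUS = (0, 1)   # each of these also spends one link on a chipset
HT_LINKS_PER_CPU = 3    # Opteron 8xx, per the thread


def hop_histogram(n_cpus, links):
    """Map hop distance -> number of unordered CPU pairs at that distance."""
    adj = [[] for _ in range(n_cpus)]
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    hist = Counter()
    for src in range(n_cpus):
        dist = [None] * n_cpus
        dist[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if dist[nxt] is None:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst in range(src + 1, n_cpus):
            hist[dist[dst]] += 1
    return dict(sorted(hist.items()))


def max_links_used(n_cpus, links):
    """Largest number of HT links any single CPU needs, chipset links included."""
    used = Counter(cpu for link in links for cpu in link)
    used.update(CHIPSET_CPUS)
    return max(used[c] for c in range(n_cpus))


def mean_hops(hist):
    """Average hop distance over all CPU pairs."""
    pairs = sum(hist.values())
    return sum(hops * count for hops, count in hist.items()) / pairs


if __name__ == "__main__":
    for name, links in (("Stow", STOW_8P), ("Hopwood", HOPWOOD_8P)):
        hist = hop_histogram(8, links)
        assert max_links_used(8, links) <= HT_LINKS_PER_CPU
        print(name, hist, round(mean_hops(hist), 2))
    # Stow {1: 11, 2: 12, 3: 5} 1.79
    # Hopwood {1: 11, 2: 16, 3: 1} 1.64
```

Both layouts fit within three links per CPU; the crossed top links in the second one turn four of the three-hop pairs into two-hop pairs.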
#26
AMD to leave x86 behind?
"David Hopwood" wrote:
> Rob Stow wrote:
>>> In an eight-way system most are one hop away, while a few are two hops away.
>>
>> No again. This would be the ideal 8P Opty 8xx scheme:
>> [diagram snipped]
>> Hence, there are 11 one-hops, 12 two-hops, and 5 three-hops.
>
> That's not optimal:
> [diagram snipped]
> 11 one-hops, 16 two-hops, and 1 three-hop.

Is there some technical reason behind the limitation to three HT links, or was it a marketing decision? If the latter, and if larger systems seem to be a bigger market, then it doesn't seem like it would be a big deal to add another link (or even two). The HT links must take a pretty small amount of silicon and a small number of pins. Does that make sense?

--
- Stephen Fuld
#27
AMD to leave x86 behind?
For instance, the x86 interrupt model really needs a rethink...

-v
#28
AMD to leave x86 behind?
> And let's say it'll have 32 FP registers instead of just 16 like SSE does.

Of course 32 registers would be better than 16, but I think we're already well past the critical point with 16 FP registers. I think the large register sets we see on newer architectures exist because they are easy to implement in a CPU rather than because they are necessary. In other words, the benefit of 32 or more registers isn't very high in most cases, but their cost in chip design is rather low as long as the register file doesn't become too large.
#29
AMD to leave x86 behind?
> If it added instructions to explicitly prefetch data from another processor then it would probably have a gain in performance.

These instructions wouldn't work any better than the prefetch instructions already implemented. I think it would be cleverer to copy hardware scouting from Sun's upcoming CPUs. Hardware scouting is simple to implement if you're going to have an SMT core anyway.

> I guess there wasn't as much demand for it.

Of course not, but AMD might get into that market in the future.
#30
AMD to leave x86 behind?
Mark Hahn wrote:
> Then again, I've always wondered why AMD didn't just produce, say, an 8-port HT "bridge" that contained a smart crossbar inside.

I guess they are waiting to see how Horus turns out... http://www.hypertransport.org/docs/t...aper_final.pdf