#211
In comp.sys.ibm.pc.hardware.chips keith wrote:
> I'm not against SMP at all; if it's free I'll take it (and I have predicted multiple-core processors here for at least five years), but to say it's somehow "free" today is *nutz*. Even a short few years ago I stated that two complete systems were better than one dual. I think the line is crossing soon to the dual-processor, but I'd rather have two systems. ...both duals soon. ;-)

I think you're a bit behind the times.

I've been running an Abit BP6 (dual OC Celerons) as my main machine since July 1999. Current uptime: 195 days. IIRC, when I built it the premium for dual was $75. Effectively zero, especially considering the life extension.

But two complete systems are still better for some things (backup, MS-Windows) and always will be.

-- Robert
#212
On Mon, 04 Oct 2004 01:34:39 +0000, Robert Redelmeier wrote:
> In comp.sys.ibm.pc.hardware.chips keith wrote:
>> I'm not against SMP at all; if it's free I'll take it (and I have predicted multiple-core processors here for at least five years), but to say it's somehow "free" today is *nutz*. Even a short few years ago I stated that two complete systems were better than one dual. I think the line is crossing soon to the dual-processor, but I'd rather have two systems. ...both duals soon. ;-)
>
> I think you're a bit behind the times.

Well, I was talking about single-chip SMP. Even at that it was rather obvious (I believe I argued with Fleger over this). What else to do with infinite transistor budgets after caches? Actually *designing* a way of using transistors is exponentially difficult. Doubling caches is more or less linear, as is adding another processor.

> I've been running an Abit BP6 (dual OC Celerons) as my main machine since July 1999. Current uptime: 195 days. IIRC, when I built it the premium for dual was $75. Effectively zero, especially considering the life extension.

When I looked (a few months ago) a decent dual AthlonMP board was around $400, with the processors at rather a premium too. I was *considering* a dual K7 at the time, rather than a single K8. The duals lost because of the cost. It would have been cheaper to upgrade the second system than to go SMP.

> But two complete systems are still better for some things (backup, MS-Windows) and always will be.

...particularly when Linux is on this one. ;-)

-- Keith
#213
In article , keith writes:
| On Sun, 03 Oct 2004 09:24:06 -0700, Eugene Miya wrote:
| Stefan Monnier wrote:
|
| Your second CPU will be mostly idle, of course, but so is the first CPU
| anyway ;-)
|
| Yeah, but that's not bad.
| 2nd CPUs are cheap these days.
|
| You may think the second is "cheap", but I don't. The second CPU and the
| board that goes with it are certainly *not* "cheap".

What board? The cost difference is far more marketing than production. Dual CPU boards are sold as 'servers' and as 'performance workstations', both at a premium. They could equally well be sold with the same margin as the 'economy' boards.

Regards,
Nick Maclaren.
#214
In comp.sys.ibm.pc.hardware.chips keith wrote:
> Well, I was talking about single-chip SMP.

Sorry, I missed that upthread.

> What else to do with infinite transistor budgets after caches?

A very good point. SMT is a fairly simple thing. Orthogonal to other efforts to improve performance.

> Actually *designing* a way of using transistors is exponentially difficult.

True enough. You run out of orthogonalities.

> When I looked (a few months ago) a decent dual AthlonMP board was around $400, with the processors at rather a premium too.

Decent? What do you classify as decent? I see 'em around $200, and surely you don't shy away from fixing painted jumpers? I figure the dual premium is around $200 now.

> ...particularly when Linux is on this one. ;-)

Oh, I see you're still running the K6-3. No reason to stop.

-- Robert
#215
In article , Robert Redelmeier writes:
|
| Well, I was talking about single-chip SMP.
|
| Sorry, I missed that upthread.
|
| What else to do with infinite transistor
| budgets after caches?
|
| A very good point. SMT is a fairly simple thing.
| Orthogonal to other efforts to improve performance.

Boggle. If it were either, let alone both, it would be vastly more effective.

Regards,
Nick Maclaren.
#216
In comp.sys.ibm.pc.hardware.chips Nick Maclaren wrote:
> Robert Redelmeier writes:
> | A very good point. SMT is a fairly simple thing.
> | Orthogonal to other efforts to improve performance.
>
> Boggle. If it were either, let alone both, it would be vastly more effective.

SMT is simple in that "all" that needs be done is create duplicate state machines (register sets) to create "virtual CPUs". Add some (not too much) fairness to the hardware scheduler and thread through the retirement unit. The main execution pipeline (ROB, ports, exec units) remains unchanged.

"Vastly more effective" is a comparative term. What do you expect? SMT won't match SMP under most circumstances. You don't have the ports or exec units! It'll be particularly lame on the P7 because that throwback is short of issue ports.

Code type matters. SMT is best for continuing work during the ~300 clock memory fetch latency. You'd rather the CPU just stall? But most optimized code has already done prefetching and is either bandwidth or compute limited. SMT will help with neither. SMP will only help the latter.

-- Robert
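The latency-hiding argument in the post above can be illustrated with a toy cycle model. This is only a sketch under strong assumptions that are not from the thread: one shared issue slot, each thread alternating between a fixed burst of compute cycles and one memory miss, fixed-priority thread selection, and no cache or bandwidth effects. The function name `utilization` and its parameters are hypothetical.

```python
def utilization(num_threads, compute_cycles, miss_latency, total_cycles=100_000):
    """Fraction of cycles in which the single shared issue slot does work.

    Toy model (assumption, not a real pipeline): every thread issues
    `compute_cycles` instructions, then waits `miss_latency` cycles for memory.
    """
    work_left = [compute_cycles] * num_threads  # instructions until next miss
    ready_at = [0] * num_threads                # cycle at which each thread may issue
    busy = 0
    for cycle in range(total_cycles):
        # Fixed priority: the lowest-numbered ready thread gets the slot.
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:
                    # Burst finished: the thread now stalls on a memory fetch.
                    ready_at[t] = cycle + 1 + miss_latency
                    work_left[t] = compute_cycles
                break
    return busy / total_cycles


if __name__ == "__main__":
    # Memory-latency-bound code: the second thread fills some stall cycles.
    print(utilization(1, 100, 300))  # 0.25
    print(utilization(2, 100, 300))  # 0.5
    # Compute-bound code: the slot is already full, so a second thread adds nothing.
    print(utilization(1, 100, 0))    # 1.0
    print(utilization(2, 100, 0))    # 1.0
```

Within this model a second hardware thread helps only when the first one stalls, which matches the claim that compute-limited code gains nothing. It says nothing about the CMP comparison raised later in the thread.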
#217
In article , Robert Redelmeier writes:
| In comp.sys.ibm.pc.hardware.chips Nick Maclaren wrote:
| Robert Redelmeier writes:
| | A very good point. SMT is a fairly simple thing.
| | Orthogonal to other efforts to improve performance.
|
| Boggle. If it were either, let alone both, it would be
| vastly more effective.
|
| SMT is simple in that "all" that needs be done is create
| duplicate state machines (register sets) to create "virtual
| CPUs". Add some (not too much) fairness to the hardware
| scheduler and thread through the retirement unit. The main
| execution pipeline (ROB, ports, exec units) remains unchanged.

That is wrong, completely so. You DON'T just create duplicate register sets, but have to "dual port" every execution unit - possibly by creating a single set of double the length, and create some new scheduling to manage it. You have to move some privileged registers and state from out of (logically) the execution units to the register sets.

You have to mangle any performance counters and many privileged registers fairly horribly, because their meanings and constraints change. Similarly, you have to add logic for CPU state change synchronisation, because some changes must affect only the current thread and some must affect both. And you have to handle the case of the two threads attempting incompatible operations simultaneously.

Oh, of course, none of this affects the main flow of control, but all forms of real engineering (as distinct from academic demonstrations and marketing) are as much or more about the problem cases as the normal ones.

| "Vastly more effective" is a comparative term. What do you
| expect? SMT won't match SMP under most circumstances. ...

My suspicion is that it wouldn't match CMP, with the same amount of real estate, under most circumstances. But that is pure speculation AS IS THE CLAIM OF THE CONVERSE until and unless someone does some proper analysis.

Regards,
Nick Maclaren.
#218
On Mon, 4 Oct 2004, Robert Redelmeier wrote:
> Code type matters. SMT is best for continuing work during the ~300 clock memory fetch latency.

What is the evidence to back up this claim? Not theories, but _evidence_ of bigger speed up compared to, for example, switch on event multi-threading, or CMP with simpler and smaller processors, but not sharing L1 cache.

Note that I'm not claiming evidence the other way, but as far as I can tell the jury is out on the best organisation for concurrency on chip. I would however claim that functional units are almost free, and that the best organisation will win in the long run, not necessarily the one that best uses a finite number of functional units.

> But most optimized code has already done prefetching and is either bandwidth or compute limited. SMT will help with neither. SMP will only help the latter.

CMP will also help the former.

Peter

> -- Robert
#219
In comp.sys.ibm.pc.hardware.chips Nick Maclaren wrote:
> That is wrong, completely so.

Interesting. Do you have specific specialised knowledge? Or some reference to exactly how SMT has been implemented?

> You DON'T just create duplicate register sets, but have to "dual port" every execution unit - possibly by creating a single set of double the length, and create some new scheduling to manage it.

This is an awful lot of work compared to simply tagging each instruction with a thread number which indicates which register set to operate upon. Then letting everything run through with the extra bits catching dependencies.

> You have to mangle any performance counters and many privileged registers fairly horribly, because their meanings and constraints change. Similarly, you have to add logic for CPU state change synchronisation, because some changes must affect only the current thread and some must affect both.

I wouldn't expect SMT to _always_ run multi-threaded. The name is _Symmetrical_ Multi Threading. The moment the execution environment is driven asymmetrical, I expect failures. Some changes might require an IPI to restart.

> And you have to handle the case of the two threads attempting incompatible operations simultaneously.

Usually this is handled by the OS.

> Oh, of course, none of this affects the main flow of control, but all forms of real engineering (as distinct from academic demonstrations and marketing) are as much or more about the problem cases as the normal ones.

It's still engineering if it works 99% of the time so long as it doesn't fail catastrophically in the other 1%. I see SMT as a simple, cheap way to use fetch wait cycles. It just needs to work in the common case, two+ pmode threads (maybe multiple rings) with different pagemaps. Of course you can probably make it break. Then you deserve what you get.

-- Robert
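The "tag each instruction with a thread number" idea in the post above can be sketched as a tiny interpreter. Everything here is a hypothetical illustration rather than a description of any real SMT design: the `Instr` fields, the two-operation instruction set, and the `TaggedCore` class are assumptions made for the example, and the privileged state and performance counters that Nick raises are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    thread: int  # hardware-thread tag carried along with the instruction
    op: str      # "li" (load immediate) or "add"
    dst: int     # destination register number
    a: int       # immediate for "li", first source register for "add"
    b: int = 0   # second source register for "add"


class TaggedCore:
    """One shared execute step; the thread tag selects the register set."""

    def __init__(self, num_threads=2, num_regs=8):
        self.regs = [[0] * num_regs for _ in range(num_threads)]

    def execute(self, instr):
        bank = self.regs[instr.thread]  # the tag picks the register set
        if instr.op == "li":
            bank[instr.dst] = instr.a
        elif instr.op == "add":
            bank[instr.dst] = bank[instr.a] + bank[instr.b]
        else:
            raise ValueError(f"unknown op: {instr.op}")


# Two threads interleaved in one stream, both using r1, r2 and r3.
PROGRAM = [
    Instr(0, "li", 1, 1), Instr(1, "li", 1, 10),
    Instr(0, "li", 2, 2), Instr(1, "li", 2, 20),
    Instr(0, "add", 3, 1, 2), Instr(1, "add", 3, 1, 2),
]


def run(program):
    core = TaggedCore()
    for instr in program:
        core.execute(instr)
    return core


if __name__ == "__main__":
    core = run(PROGRAM)
    print(core.regs[0][3], core.regs[1][3])  # 3 30
```

The sketch covers only the easy, user-visible register case. It does not settle the disagreement over shared privileged state, which is exactly the part the tag alone does not address.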
#220
"Nick Maclaren" wrote in message
...
> In article , Robert Redelmeier writes:
> | In comp.sys.ibm.pc.hardware.chips Nick Maclaren wrote:
> | Robert Redelmeier writes:
> | | A very good point. SMT is a fairly simple thing.
> | | Orthogonal to other efforts to improve performance.
> |
> | Boggle. If it were either, let alone both, it would be
> | vastly more effective.
> |
> | SMT is simple in that "all" that needs be done is create
> | duplicate state machines (register sets) to create "virtual
> | CPUs". Add some (not too much) fairness to the hardware
> | scheduler and thread through the retirement unit. The main
> | execution pipeline (ROB, ports, exec units) remains unchanged.
>
> That is wrong, completely so. You DON'T just create duplicate register sets, but have to "dual port" every execution unit - possibly by creating a single set of double the length, and create some new scheduling to manage it. You have to move some privileged registers and state from out of (logically) the execution units to the register sets.

I think that Nick is muddled on this one. If the base implementation is already OoO then there will normally be many more physical registers than architected ones. To go two-way SMT may not involve adding any physical registers, but rather involve changes to renaming. "Dual port" every execution unit doesn't make much sense to me. Access to execution units from either virtual processor is essentially free - they are after all virtual processors, not real. What is required is that every bit of *architected* processor state be renamed or duplicated; perhaps that's what Nick is getting at?

> You have to mangle any performance counters and many privileged registers fairly horribly, because their meanings and constraints change. Similarly, you have to add logic for CPU state change synchronisation, because some changes must affect only the current thread and some must affect both. And you have to handle the case of the two threads attempting incompatible operations simultaneously.

What operations are incompatible? SMT as implemented in the Pentium 4, say, allows either virtual processor to do what it likes. One can transition from user to kernel and back while the other services interrupts or exceptions or whatever. The only coordination needed for proper operation is what is needed for two processors - of course the performance may suffer though.

> Oh, of course, none of this affects the main flow of control, but all forms of real engineering (as distinct from academic demonstrations and marketing) are as much or more about the problem cases as the normal ones.
>
> | "Vastly more effective" is a comparative term. What do you
> | expect? SMT won't match SMP under most circumstances. ...
>
> My suspicion is that it wouldn't match CMP, with the same amount of real estate, under most circumstances. But that is pure speculation AS IS THE CLAIM OF THE CONVERSE until and unless someone does some proper analysis.

Yes, lots of speculation. The difference here is that two CMP processors take about twice the silicon of one, while with SMT you have the option to use 1.5 cores' worth of silicon. Perhaps once dual cores are cheap and easy SMT will die because it's more effort than it's worth, but my bet is that chips will go both routes, with SMT and CMP. Just one more little problem for the OS developers to deal with.

> Regards,
> Nick Maclaren.

Peter
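Peter's renaming point, that an out-of-order core already has more physical than architected registers, so two-way SMT may need a second rename map rather than more registers, can be sketched as follows. The `Renamer` class and the register counts (8 architected per thread, 40 physical) are illustrative assumptions, not figures for the Pentium 4 or any other real chip.

```python
from collections import deque


class Renamer:
    """Per-thread rename maps drawing on one shared physical register pool."""

    def __init__(self, num_threads, num_arch, num_phys):
        if num_phys < num_threads * num_arch:
            raise ValueError("not enough physical registers for architected state")
        self.free = deque(range(num_phys))
        # Each thread's architected registers start out mapped to distinct
        # physical registers; the rest of the pool stays free for renaming.
        self.maps = [
            [self.free.popleft() for _ in range(num_arch)]
            for _ in range(num_threads)
        ]

    def lookup(self, thread, arch_reg):
        """Physical register currently holding this thread's architected register."""
        return self.maps[thread][arch_reg]

    def rename_dest(self, thread, arch_reg):
        """Allocate a fresh physical register for a write; return (new, old)."""
        if not self.free:
            raise RuntimeError("rename stall: no free physical registers")
        new = self.free.popleft()
        old = self.maps[thread][arch_reg]
        self.maps[thread][arch_reg] = new
        return new, old

    def release(self, phys_reg):
        """Return a physical register to the pool once its old value is dead."""
        self.free.append(phys_reg)


if __name__ == "__main__":
    r = Renamer(num_threads=2, num_arch=8, num_phys=40)
    # Both threads write their own "r1"; they get different physical registers.
    print(r.rename_dest(0, 1))  # (16, 1)
    print(r.rename_dest(1, 1))  # (17, 9)
```

In this sketch the only per-thread structure is the map itself. The cost is that the two threads compete for the same free list, so each sees a smaller effective renaming window than a single thread would.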