#241
On Mon, 04 Oct 2004 21:23:55 -0400, keith wrote:
On Mon, 04 Oct 2004 20:08:05 +0000, Felger Carbon wrote: One of the stories I got about dual-core (*initially* dual) cpus is that they were to solve the heat problem. So we just had IDF where 200 watt heat-sinks were on display for dual-core CPUs. What?? Umm, did you catch the link here earlier today, comparing the 90nm A64, 130nm A64, and 90nm P4? A P4 at 230W! Yeow! I passed that one around the office. ;-) But those were system figures, not merely processors no? The last time I looked, the average Opteron/A64 topped out at around 60W. So a good 100W of those figures are likely from the other components. This would make the P4 burn around 130~140W. So those 200W heatsinks wouldn't quite be needed just yet right? :PppP -- L.Angel: I'm looking for web design work. If you need basic to med complexity webpages at affordable rates, email me Standard HTML, SHTML, MySQL + PHP or ASP, Javascript. If you really want, FrontPage & DreamWeaver too. But keep in mind you pay extra bandwidth for their bloated code |
#242
"Nick Maclaren" wrote in message ...
> In article , Stephen Fuld wrote:
>> "Nick Maclaren" wrote in message ...
>>> I am referring to the fair comparison between a 2-way SMT and a
>>> dual-core CMP using the same amount of silicon, power etc. THAT is
>>> what should have been compared - but I can find no evidence that it
>>> was (though it probably was).
>>
>> Probably because it can't be done. I think virtually everyone here
>> believes that the extra silicon area for a two-way SMP is much less
>> than 100% of the die area of the core. Thus a two-way SMP will use
>> less die area, power, etc. than a two-way CMP, and the comparison
>> that you specify can't be done.
>>
>> Let me repeat, I am not an SMP bigot. It seems to me that it is a
>> useful tool, along with others, including CMP, in the designer's tool
>> box. As someone else has said, I expect the future to be combinations
>> of both, along with multiple chips per PCB and multiple PCBs per
>> system.
>
> In the above, you mean SMT, I assume.

Yes, sorry. :-(

> It's been possible for at least 5 years, probably 10. Yes, the cores
> of a CMP system would necessarily be simpler, but it becomes possible
> as soon as the transistor count of the latest and greatest model in
> the range exceeds double that of the simplest. Well, roughly, and
> allowing for the difference between code and data transistors.

But if you compare different cores - the more complex one for the SMT
(excluding the extra complexity of the SMT itself) versus a simpler one
for the CMP - then you complicate the comparison by not comparing apples
to apples. How much of the difference is SMT vs CMP, and how much is the
difference in cores? One presumes the more complex core performs better
than the simpler one (or why do the complex one?). Besides, if the SMT
die-area penalty is in the 10% range that many have been quoting, can you
do the "simpler" core in almost exactly 55% of the die area of the
complex one?

Once you change the core, you change the comparison, so I maintain that
it isn't the same comparison any more and my original comment holds. Yes,
a comparison with different cores could be done, but I can see why no one
is very interested in doing it.

--
- Stephen Fuld
e-mail address disguised to prevent spam
#243
On Mon, 4 Oct 2004, Robert Redelmeier wrote:
> Logic. When else can SMT really do net increased work? If you want to
> test, run some pointer-chasers.

Ah, I was objecting to what I read as your claim that SMT is the best way
to deal with 300-cycle latency. Switch-on-event multithreading may help
equally well, and chip multiprocessing may help more. From your point
above I suspect I misread, and that you are merely pointing out that
latency tolerance is the best use for SMT. This is getting more and more
true as caches grow, but only from an areal perspective - a multiplier
still sucks back a huge amount of power and tosses it out as heat. There
is also the Piranha concept of making the multiple cores simpler so they
don't suck so much heat. I think the jury is out on CMP vs SMT.

BTW, anyone see the Broadcom BCM1480 announcement? Four 1.2GHz quad-issue
in-order MIPS cores with 3 HT ports in 0.09um, drawing only 23W.

> CMP will also help the former

Nope, not without a second memory bus and all those pins. I misread your
original post - now that I've parsed it correctly, I agree completely on
this point!

Sorry for jumping in too hastily,
Peter

--
Peter Boyle
#244
> You haven't allowed for the problem of access.

Access to what, please?

> Look at the performance counters, think of floating-point modes (in
> SMT, they may need to change for each operation),

All thread-specific state information flows together with the instruction
it belongs to through the pipeline. Yes, the amount of information you
are sending along has increased - however, access to global state (e.g.,
FP mode flags) is costly as well, and I believe there have been
implementations that have taken the route sketched above for performance
reasons even without SMT.

> think of quiescing the other CPU (needed for single- to dual-thread
> switching), think of interrupts (machine check needs one logic, and
> underflow another). In ALL cases, on two CPUs, each can operate
> independently, but SMT threads can't.

So you make the SMT a little asymmetric: you stop decoding/issuing
instructions for all threads but one, and when they have drained the
pipeline, you are back to the single-thread situation and continue from
there. This is at least correct behaviour, and if its performance impact
is too great, you look at those subsets of situations where you can relax
the constraints this imposes.

Jan
#245
>> code could be autoparallelized by an autoparallelizing compiler.
>
> Yeah? Like whose?

Cray, Sun, IBM, DEC, ... Oh, you mean performance is worse than for your
hand-tuned MPI program? Yeah, but that _is_ the state of the art.

Jan
#246
> Between the register file and the execution units, and between
> execution units. The point is the days when 'wiring' was cheap are no
> more - at least according to every source I have heard!

While the latter is true, with the former you are comparing apples and
oranges - the starting point is adding SMT-like capability to a processor
with a given set of resources (FUs, registers, ...). The wiring mentioned
above does not change substantially - in the minimal case of 2-thread
SMT, all it must carry is one additional bit/wire to distinguish the two
threads.

> No, they don't. Take performance counters. [...] The Pentium 4 kludges
> this horribly.

Yeah, it seems they didn't completely think this through on the first
round. So one imperfect implementation damns the concept? Methinks not.

Jan
#247
> "Make" is run on workstations. It is not a legacy application for
> personal computers.

Ah, blech. I'm mostly in the PC category - mail, editing, Excel & Co. But
fairly regularly I run a compute-intensive program - it might be an
applet running in my browser - and Winwoes' broken scheduler gets me.
Same when paging is occurring (another broken piece of software, the
pager/swapper in Winwoes). In these cases, a second processor would help
my productivity a lot. Less often, I even use make (in the form of
pressing the Build button in an MSDS project, for instance). And the guys
in our software development team use the same type of system as I do -
does that turn their machines from PCs into workstations? I think that
distinction is dead nowadays.

Jan
#248
Peter Boyle wrote:
> BTW, anyone see the Broadcom BCM1480 announcement? Four 1.2GHz
> quad-issue in-order MIPS cores with 3 HT ports in 0.09um, drawing only
> 23W.

Or the recent Freescale MPC8641D announcement? Also 90nm, 15W(?),
dual-core 1.5GHz PPC G4 (4-issue (3+branch)), with dual 64-bit DDR2
memory interfaces on chip, 1MB L2 cache per core, and RapidIO and GigE
ports for fabric. I know that Altivec doesn't excite the
double-precision-only guys in comp.arch (followups set - sorry
ibm-pc.h.c folk), but the processor density at which you could build an
array of these things would be pretty wicked (tessellation of chip+DRAM,
basically). And they do do double precision at some speed.

Cheers,
--
Andrew
#249
>> "Make" is run on workstations. It is not a legacy application for
>> personal computers.
>
> Ah, blech. I'm mostly in the PC category - mail, editing, Excel & Co.
> [...] I think that distinction is dead nowadays.
>
> Jan

Legacy performance improvements mean LITTLE when considering new CPUs.
Think: does your Word need better performance, or Excel, or PowerPoint?
Where do you need the performance most over the current CPUs?

A) Games.
B) Video/GFX editing.
C) Compilation. [Not for Joe MS-user.]
D) Running more tasks simultaneously.

A has quite some potential for coarse-grain parallelism: running separate
threads for AI, physics, networking and graphics, plus the OS and video
drivers etc. B) Parallelization is already done in many software
products. C) Gcc builds are already parallelized. D) is interesting at
the moment: Windows users have plenty of tasks - firewall, MP3 player,
P2P application, virus scanner - running in the background, and having
one CPU for the foreground and another for ALL background tasks does
speed up the foreground task and gets rid of annoying pauses.

Also remember that plenty of transistors are available, and the hunt for
ILP and frequency is at the point where doubling the transistor budget
won't give much any more in that direction for x86. Remember, the P4 was
double the size of the P3 when it first came out; at that point, taking
the P4's bus and putting two P3s on one die as a CMP would have been
equal in die area. So that's the reason why CMP is the way to go: hunting
higher frequencies and more ILP isn't going to work like it used to, so
they need to find another use for the additional transistors, and putting
in a 2nd core is the obvious choice.

Jouni Osmala
#250
Felger Carbon wrote:
> We have long had desktop SMP available. Question: what legacy software
> runs faster on two cores (whether on one or two chips) than on one?
> Answer: none.

What about IDE? That seems to be rather CPU-intensive when you're doing a
lot of I/O. The economics are a little strange, though: while a SCSI
processor is simpler and would offload the I/O processing somewhat, it's
more expensive than a second CPU core.

Joe Seigh