#241
On Mon, 04 Oct 2004 21:23:55 -0400, keith wrote:
On Mon, 04 Oct 2004 20:08:05 +0000, Felger Carbon wrote: One of the stories I got about dual-core (*initially* dual) cpus is that they were to solve the heat problem. So we just had IDF where 200 watt heat-sinks were on display for dual-core CPUs. What?? Umm, did you catch the link here earlier today, comparing the 90nm A64, 130nm A64, and 90nm P4? A P4 at 230W! Yeow! I passed that one around the office. ;-) But those were system figures, not merely processors no? The last time I looked, the average Opteron/A64 topped out at around 60W. So a good 100W of those figures are likely from the other components. This would make the P4 burn around 130~140W. So those 200W heatsinks wouldn't quite be needed just yet right? :PppP -- L.Angel: I'm looking for web design work. If you need basic to med complexity webpages at affordable rates, email me Standard HTML, SHTML, MySQL + PHP or ASP, Javascript. If you really want, FrontPage & DreamWeaver too. But keep in mind you pay extra bandwidth for their bloated code |
#242
"Nick Maclaren" wrote in message ...
> In article , Stephen Fuld wrote:
>> "Nick Maclaren" wrote in message ...
>>> I am referring to the fair comparison between a 2-way SMT and a
>>> dual-core CMP using the same amount of silicon, power etc. THAT is
>>> what should have been compared - but I can find no evidence that it
>>> was (though it probably was).
>>
>> Probably because it can't be done. I think virtually everyone here
>> believes that the extra silicon area for a two-way SMP is much less
>> than 100% of the die area of the core. Thus a two-way SMP will use
>> less die area, power, etc. than a two-way CMP, and the comparison
>> that you specify can't be done.
>>
>> Let me repeat, I am not an SMP bigot. It seems to me that it is a
>> useful tool, along with others, including CMP, in the designer's tool
>> box. As someone else has said, I expect the future to be combinations
>> of both, along with multiple chips per PCB and multiple PCBs per
>> system.
>
> In the above, you mean SMT, I assume.

Yes, sorry. :-(

> It's been possible for at least 5 years, probably 10. Yes, the cores
> of a CMP system would necessarily be simpler, but it becomes possible
> as soon as the transistor count of the latest and greatest model in
> the range exceeds double that of the simplest. Well, roughly, and
> allowing for the difference between code and data transistors.

But if you compare different cores - the more complex one for the SMT
(excluding the extra complexity of the SMT itself) versus a simpler one
for the CMP - then you complicate the comparison by not comparing apples
to apples. How much of the difference is SMT vs CMP, and how much is the
difference in cores? One presumes the more complex core performs better
than the simpler one (or why do the complex one?). Besides, if the SMT
die-area penalty is in the 10% range that many have been quoting, can you
do the "simpler" core in almost exactly 55% of the die area of the
complex one?

Once you change the core, you change the comparison, so I maintain that
it isn't the same comparison any more and my original comment holds. Yes,
a comparison with different cores could be done, but I can see why no one
is very interested in doing it.

--
- Stephen Fuld
e-mail address disguised to prevent spam
#243
On Mon, 4 Oct 2004, Robert Redelmeier wrote:
> Logic. When else can SMT really do net increased work? If you want to
> test, run some pointer-chasers.

Ah, I was objecting to what I read as your claim that SMT is the best way
to deal with 300-cycle latency. Switch-on-event multithreading may help
equally well, and chip multiprocessing may help more. From your point
above I suspect I misread, and that you are merely pointing out that
latency tolerance is the best use for SMT. This is getting more and more
true as caches grow, but only from an areal perspective - a multiplier
still sucks back a huge amount of power and tosses it out as heat. There
is also the Piranha concept of making the multiple cores simpler so they
don't suck so much heat. I think the jury is out on CMP vs SMT.

BTW, anyone see the Broadcom BCM1480 announcement? Four 1.2GHz quad-issue
in-order MIPS cores with 3 HT ports in 0.09um, drawing only 23W.

> CMP will also help the former

Nope, not without a second memory bus and all those pins. I misread your
original post - now that I've parsed it correctly, I agree completely on
this point!

Sorry for jumping in too hastily,
Peter

--
Peter Boyle
#244
> You haven't allowed for the problem of access.

Access to what, please?

> Look at the performance counters, think of floating-point modes (in
> SMT, they may need to change for each operation),

All thread-specific state information flows together with the instruction
it belongs to through the pipeline. Yes, the amount of information you
are sending along has increased - however, access to global state (e.g.,
FP mode flags) is costly as well, and I believe there have been
implementations that have taken the route sketched above for performance
reasons even without SMT.

> think of quiescing the other CPU (needed for single- to dual-thread
> switching), think of interrupts (machine check needs one logic, and
> underflow another). In ALL cases, on two CPUs, each can operate
> independently, but SMT threads can't.

So you make the SMT a little asymmetric: you stop decoding/issuing
instructions for all threads but one, and when they have drained the
pipeline, you are back to the single-thread situation and continue from
there. This is at least correct behaviour, and if its performance impact
is too great, you look at those subsets of situations where you can relax
the constraints this imposes.

Jan
#245
>> code could be autoparallelized by an autoparallelizing compiler.
>
> Yeah? Like whose?

Cray, Sun, IBM, DEC, ... Oh, you mean performance is worse than for your
hand-tuned MPI program? Yeah, but that _is_ the state of the art.

Jan
#246
> Between the register file and the execution units, and between
> execution units. The point is the days when 'wiring' was cheap are no
> more - at least according to every source I have heard!

While the latter is true, with the former you are comparing apples and
oranges - the starting point is adding SMT-like capability to a processor
with a given set of resources (FUs, registers, ...). The wiring mentioned
above does not change substantially - in the minimal case of 2-thread
SMT, all it must carry is one additional bit/wire to distinguish the two
threads.

> No, they don't. Take performance counters. [...] The Pentium 4 kludges
> this horribly.

Yeah, it seems they didn't completely think this through on the first
round. So one imperfect implementation damns the concept? Methinks not.

Jan
#247
> "Make" is run on workstations. It is not a legacy application for
> personal computers.

Ah, blech. I'm mostly in the PC category - mail, editing, Excel & Co. But
fairly regularly I run a compute-intensive program - it might be an
applet running in my browser - and Winwoes' broken scheduler gets me.
Same when paging is occurring (another broken piece of software, the
pager/swapper in Winwoes). In these cases, a second processor would help
my productivity a lot. Less often, I even use make (in the form of
pressing the Build button in an MSDS project, for instance). And the guys
in our software development team use the same type of system as I do -
does that turn their machines from PCs into workstations? I think that
distinction is dead nowadays.

Jan
#248
Peter Boyle wrote:
> BTW, anyone see the Broadcom BCM1480 announcement? Four 1.2GHz
> quad-issue in-order MIPS cores with 3 HT ports in 0.09um, drawing only
> 23W.

Or the recent Freescale MPC8641D announcement? Also 90nm, 15W(?),
dual-core 1.5GHz PPC G4 (4-issue (3+branch)), with dual 64-bit DDR2
memory interfaces on chip, 1MB L2 cache per core, and RapidIO and GigE
ports for fabric. I know that Altivec doesn't excite the
double-precision-only guys in comp.arch (followups set - sorry
ibm-pc.h.c folk), but the processor density at which you could build an
array of these things would be pretty wicked (tessellation of chip+DRAM,
basically). And they do do double precision at some speed.

Cheers,
--
Andrew
#249
>> "Make" is run on workstations. It is not a legacy application for
>> personal computers.
>
> Ah, blech. I'm mostly in the PC category - mail, editing, Excel & Co.
> [...] I think that distinction is dead nowadays.
>
> Jan

Legacy performance improvements mean LITTLE when considering new CPUs.
Think: does your Word need better performance, or Excel, or PowerPoint?
Where do you need the performance most over the current CPUs?

A) Games.
B) Video/GFX editing.
C) Compilation. [Not for Joe MS-user.]
D) Running more tasks simultaneously.

A has quite some potential for coarse-grain parallelism: running separate
threads for AI, physics, networking and graphics, plus the OS and video
drivers etc. B) Parallelization is already done in many software
products. C) Gcc builds are already parallelized. D) is interesting at
the moment: Windows users have plenty of tasks - firewall, MP3 player,
P2P application, virus scanner - running in the background, and having
one CPU for the foreground and another for ALL background tasks does
speed up the foreground task and gets rid of annoying pauses.

Also remember that plenty of transistors are available, and the hunt for
ILP and frequency is at the point where doubling the transistor budget
won't give much any more in that direction for x86. Remember, the P4 was
double the size of the P3 when it first came out; at that point, taking
the P4's bus and putting two P3s on one die as a CMP would have been
equal in die area. So that's the reason why CMP is the way to go: hunting
higher frequencies and more ILP isn't going to work like it used to, so
they need to find another use for the additional transistors, and putting
in a 2nd core is the obvious choice.

Jouni Osmala
#250
Felger Carbon wrote:
> We have long had desktop SMP available. Question: what legacy software
> runs faster on two cores (whether on one or two chips) than on one?
> Answer: none.

What about IDE? That seems to be rather CPU-intensive when you're doing a
lot of I/O. The economics are a little strange, though: while a SCSI
processor is simpler and would offload the I/O processing somewhat, it's
more expensive than a second CPU core.

Joe Seigh