A computer components & hardware forum. HardwareBanter



Interesting read about upcoming K9 processors



 
 
#221 - August 6th 04, 05:11 PM - Annie Nonimus

Ken Hagan wrote:

Stephen Sprunk wrote:

M$ spent a lot of effort moving from 16-bit to 32-bit code, and it's
possible at the time they expected their code to be obsolete before
64-bit systems took over -- 4GB is enough for anybody, right? I'm
sure they didn't expect their newly-ported 32-bit code to require
porting again in less than a decade, given 16-bit x86 code had been
around for two decades.



It's not *that* surprising that 64-bit systems are now arriving on
desktops. (But then again, by now it's been more than a decade.)

Back in, ooo, 1990 or thereabouts, I can remember plotting a graph of
memory size in the average bargain bucket PC versus date. OK, they were
my guesses and taken only over a decade, not proper data, but the graph
cut through the 4GB line around 2005. I predicted then that my OS would
be 64-bit, even if none of my applications were. (smug)


Sure hope you don't break your arm patting yourself on the back!


#222 - August 6th 04, 05:24 PM - Stephen Sprunk

"Yousuf Khan" wrote in message...
Stephen Sprunk wrote:
Does an OS really need to be aware of the difference between two
cores on the same chip?

Linux has a concept of a NUMA "node", where all of the processors in
a node are considered equivalent. It'll still try to schedule
threads on the same CPU they last ran on, but next it will try other
CPUs in the same node before giving up and sending it to any
available node.

IIRC, the code already understands two-CPU nodes, because that is how
Intel's SMT chips are handled. Treating K8 CMP the same way sounds
correct, once AMD releases specs on how to recognize dual-core chips.


I'm sure as a first cut, not treating them specially is the right way to go.
But eventually everybody tries to optimize down to the bone.


From a NUMA-aware scheduler's perspective, what's different between 2-way
SMT and 2-way CMP?

AMD is even suggesting that treating HyperTransport not as NUMA but as
simple SMP is quite acceptable, and this suggestion is likely to hold for
dual-cores too (probably even more so).


Once the bugs were worked out, Linux showed much better performance (for 2+
way Opterons) when it was made NUMA-aware. Treating the system as simple
SMP is acceptable, but demonstrably not optimal.

S

--
Stephen Sprunk "Those people who think they know everything
CCIE #3723 are a great annoyance to those of us who do."
K5SSS --Isaac Asimov

#223 - August 6th 04, 05:46 PM - Yousuf Khan

Stephen Sprunk wrote:
M$ spent a lot of effort moving from 16-bit to 32-bit code, and it's
possible at the time they expected their code to be obsolete before
64-bit systems took over -- 4GB is enough for anybody, right? I'm
sure they didn't expect their newly-ported 32-bit code to require
porting again in less than a decade, given 16-bit x86 code had been
around for two decades. And, given that most programmers don't stay
in the same job for a decade, why would one expect them to plan for
the future (other than style)?


The move from 16-bit to 32-bit Windows was doubly difficult, because of the
need to replace segment-based addressing with linear addressing. That is not
an issue with the 32-bit to 64-bit port. I think people are justifiably
disappointed in MS, because this one should've been smooth as silk. Their
kernel was ported quickly, but nothing else is being ported quickly.

I don't think it's entirely a result of spaghetti code. I think it's also a
result of some dufusy arbitrary decisions that MS made. For example, they
decided not to support 16-bit protected-mode apps in Win64, even though the
AMD64 architecture has no problem running either 16-bit or 32-bit
protected-mode apps in compatibility mode. They've also decided not to allow
use of the x87 FPU in 64-bit mode, relying solely on SSE, even though AMD64
is perfectly fine with either or both; now I don't know if this will
actually be a problem for developers, but it does show that MS is making
arbitrary decisions. And they haven't created a 32-bit to 64-bit device
driver thunking layer, which would have given device driver manufacturers
additional time to port full 64-bit drivers to Windows.

Yousuf Khan


#224 - August 6th 04, 05:46 PM - Yousuf Khan

Dean Kent wrote:
As usual the Kentster's way of impolitely calling someone a liar...
and not only me. The roadmaps *did* exist! Were they official
roadmaps like those issued to the i-Stooges in your quixotic,
privileged position?... nope!
Were they published in magazines and Web sites?... yup! The
evidence has vanished along with bubble memory cheers and i860
effervescence - seems
like you were not paying attention.


We've now even dug up some old historical webpages (possibly written
on parchment or papyrus or something) from the early days of the
commercial Internet which state exactly why we thought Intel's
plans were to go towards IA-64. Yet, he still needs to argue. Some
people are just beyond quixotic!


1) The usual tactics of .chips denizens who use character
assassination, innuendo and fallacy to present an argument instead of
actually using facts and evidence. Add to that the cries of "I been
wronged" while using uncomplimentary names and implied accusations,
and the pattern is complete.


big snip

Please do keep describing yourself perfectly, Dean. And please do keep on
tilting at those windmills. Bye.

Yousuf Khan


#225 - August 6th 04, 06:38 PM - Seongbae Park

Yousuf Khan wrote:
....
I think it's also as a result of some dufusy arbitrary decisions that MS made.
For example,

....
They've also decided not to allow
the use of x87 FPU under 64-bit mode, relying solely on SSE, even though
AMD64 is perfectly fine with either or both; now I don't know if this is
actually going to be a problem to developers, but it does show that MS is
taking arbitrary decisions.

...

I think this particular decision is not without merit,
so calling it arbitrary is too harsh.

It would make some people's lives easier immediately
(e.g. compiler, math library, etc.),
although I'll admit it is debatable how much easier.
In the longer term, they can drop the support from the ISA
to recover opcode space and avoid the support cost in the chip.
It will take a long time to actually gain from this decision,
but dropping them now, when transitioning to a 64-bit ABI,
is much easier than trying to do it later.

For example, the SPARC ABI should have outright *banned* the use of the
y register (sdiv, udiv, rdy, etc.) in the new 64-bit (v9) or extended
32-bit (v8plus) ABIs, but they just kept it for compatibility's sake.
The instructions were declared "deprecated", but the new ABI didn't ban
their use. Since sdiv/udiv performed better than sdivx/udivx and were
allowed by the ABI, compilers continued to use sdiv/udiv even in
v8plus/v9. And now it's much more difficult to remove them.

All in all, I'm always in favor of dropping unnecessary instructions
from the ABI, so that they can be dropped from the ISA later.
It's next to impossible to do this
without introducing major pain in an existing ABI,
so this 64-bit transition of x86 is a good opportunity to drop
some mistakes of the past.
--
#pragma ident "Seongbae Park, compiler, http://blogs.sun.com/seongbae/"
#226 - August 6th 04, 06:42 PM - Yousuf Khan

Stephen Sprunk wrote:
From a NUMA-aware scheduler's perspective, what's different between
2-way SMT and 2-way CMP?


It's not really the 2-ways that we're talking about here. We're talking
about the enterprise and datacentre classes, which would range from 4-way to
64-way or more. I think any old memory-organization model would do for 2-way
systems.

AMD is even suggesting that treating HyperTransport not as NUMA but as
simple SMP is quite acceptable, and this suggestion is likely to
hold for dual-cores too (probably even more so).


Once the bugs were worked out, Linux showed much better performance
(for 2+ way Opterons) when it was made NUMA-aware. Treating the
system as simple SMP is acceptable, but demonstrably not optimal.


True, but Linux has been around right from the beginning, and they are well
beyond the first-cut stage of their kernel development. They're at the
optimize-right-down-to-the-bone stage now.

Yousuf Khan


#228 - August 6th 04, 09:20 PM - Nick Maclaren

In article ,
Stephen Sprunk wrote:

From a NUMA-aware scheduler's perspective, what's different between 2-way
SMT and 2-way CMP?


About ten times as much as the difference between two-way CMP and
dual CPUs on a board?

The point about CMP is that it is merely the latest stage in the
integration of multiple SMP CPUs. Many of us remember when each
such CPU was in a separate box and built out of multiple boards!
There is, however, little logical difference between them and two
CPUs on a die - the only question is the level at which they share
cache (including TLBs).

Eggers-style SMT is entirely different, and the two cores CAN'T be
scheduled as if they were separate CPUs. The timing issues are a
problem (think NTP), and the Pentium 4 has the problem of a single set
of performance registers, but those pale into insignificance compared
with the problem of the mode of one CPU affecting another. I don't
know how serious this is for the Pentium 4, as the documents are not
trivially accessible, but there are a lot of places in the public
documents where I mentally raised a flag.

For example, consider the facility to switch between using both cores
and using a single double-speed one. This is not SOLELY decided by
the BIOS, and the first-level interrupt handler has to be aware of
the state and perhaps even control it. That, in turn, affects the
scheduler, because each core may generate an interrupt at any time.


Regards,
Nick Maclaren.
#229 - August 6th 04, 09:22 PM - Nick Maclaren

In article ,
Paul Repacholi wrote:
(Nick Maclaren) writes:

And then think about how customers would react if told that the ONLY
future platforms for HP-UX, VMS and NonStop had just been cancelled.
Overjoyed isn't the word I would use ....


I suppose they could drag out the dead and try to revive the Alpha, or
ask someone else to for VMS. NonStop has been advanced so as to
*not require a lock-step CPU any more*. PHUX
can crawl away and smell in a corner; it has nothing to offer over any
other Unix. Less in many cases, in fact.


Case (a): Quite.

Case (b): It would be interesting to know if any sacrifice of the
reliability was needed.

Case (c): If true, that is sad. I last used it with HP-UX 9, and
that was a very nice Unix.


Regards,
Nick Maclaren.
#230 - August 6th 04, 10:34 PM - Andi Kleen

"Stephen Sprunk" writes:

"Yousuf Khan" wrote in message...
Stephen Sprunk wrote:
Does an OS really need to be aware of the difference between two
cores on the same chip?

Linux has a concept of a NUMA "node", where all of the processors in
a node are considered equivalent. It'll still try to schedule
threads on the same CPU they last ran on, but next it will try other
CPUs in the same node before giving up and sending it to any
available node.

IIRC, the code already understands two-CPU nodes, because that is how
Intel's SMT chips are handled. Treating K8 CMP the same way sounds
correct, once AMD releases specs on how to recognize dual-core chips.


I'm sure as a first cut, not treating them specially is the right way to go.
But eventually everybody tries to optimize down to the bone.


From a NUMA-aware scheduler's perspective, what's different between 2-way
SMT and 2-way CMP?


The two cores in CMP are pretty independent. They have a shared
connection to the memory, which causes some dependence, but that can be
ignored for scheduling because it is equivalent to the shared FSB in
most SMP systems, and that's fair enough.

The threads in SMT slow each other down, which the scheduler may
want to take into account. Otherwise you get into situations where
low priority processes can slow down high priority processes.

Some SMT implementations like IBM's support setting priorities in
the hardware, but Intel's doesn't support that, so software has
to handle it.


AMD is even suggesting that treating HyperTransport not as NUMA but as
simple SMP is quite acceptable, and this suggestion is likely to hold for
dual-cores too (probably even more so).


Once the bugs were worked out, Linux showed much better performance (for 2+
way Opterons) when it was made NUMA-aware. Treating the system as simple
SMP is acceptable, but demonstrably not optimal.


No, that's wrong. In fact, we turned off the old NUMA scheduler support on
Opteron because it actually made things worse. It was aimed at big NUMA
machines with slow interconnects, where low node-balance rates
are good. But on an Opteron you want a fast NUMA balance rate;
it is best to treat it nearly like an SMP.

The current Linux 2.6 kernel uses the new NUMA scheduler, but
it is still subject to more tuning and may get turned off again.

-Andi
 



