Intel COO signals willingness to go with AMD64!!

#**121** February 15th 04, 10:42 PM

"Keith R. Williams" wrote in message
. ..

I don't understand what the author of that tidbit is up to, but I
suspect his understanding of OoO is missing something. OoO doesn't
mean the processor reorders instructions at will. OoO processors (at
least the ones I'm familiar with) still dispatch and complete
instructions in order, but do what they please in between (execution).
Thus, to the external observer, the program executes as if it were run
on an in-order processor. Perhaps there is a PPro bug that prevents
this? If so it's a *huge* bug.

It depends what you mean by "external observer". An external observer
looking at the memory bus could definitely see fetches being done out of
order.

If the author is referring to the moaning that it's impossible to
predict the execution time of a random hunk of code, I agree with him,
TDB. One shouldn't be writing critical timing-dependent code on such
processors. There are simply too many variables and it is subject to
change across sibling processors, to say nothing of third and fourth
cousins.

No, he's talking about a specific problem with SMP PPro systems that
requires you to use a LOCK prefix to make your unlock instructions atomic,
even though 32-bit aligned writes are supposed to be atomic anyway.

The author brings up PPC's EIEIO instruction. This instruction,
"Enforce In-order Execution of IO", doesn't tell the execution unit to
go in-order; rather, it tells the bus unit to enforce in-order I/O. The
PPC will try to give priority to reads over writes and reads under
cache "misses" (or misses under misses, or...). If the "memory" in
question is really an I/O device things can get all bollixed up. The
EIEIO instruction is intended to enforce in-order I/O operation to
avoid this problem. The actual instructions still execute OoO. The
sync and isync instructions will force in-order execution (of differing
levels), but are rarely needed by a user (needed for processor state
altering sorts of things).

They are needed all the time by users who write synchronization code.
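To make that concrete, here is a minimal sketch in portable C11 rather than PPC assembly. The names `payload`, `ready`, `publish` and `try_consume` are made up for illustration. The fences are the portable way of asking for the ordering; which barrier instruction a PPC compiler actually emits for them is an assumption on my part and depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: ordinary data guarded by an atomic flag. */
static int payload;
static atomic_bool ready; /* static storage, so it starts out false */

/* Writer side: store the data, then raise the flag.
 * The release fence keeps the payload store from being
 * reordered after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: returns false if nothing has been published yet.
 * The acquire fence keeps the payload load from being
 * reordered before the flag load. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences, a weakly ordered machine is free to let the reader see the flag set while still reading a stale payload, which is exactly the kind of thing synchronization code cannot tolerate.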

Most programmers, including myself, ignore the PPro erratum and simply
state that they no longer support SMP PPro systems. This is because the cost
of a LOCK prefix on a P4 machine is just too high, and it's more logical to
assume that 32-bit aligned writes will occur atomically.

I've forgotten the specifics of the PPro errata, but the net effect is
that to release a spinlock, you have to use an exchange or locked move
instruction rather than a regular move.
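Roughly, the two ways of releasing the lock look like this in C11. This is only a sketch: the type and function names are hypothetical, and the comments about which x86 instruction you get describe what compilers typically generate, not a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock: held == 0 means free, 1 means taken. */
typedef struct { atomic_int held; } demo_spinlock;

/* One acquisition attempt. The exchange is an atomic read-modify-write;
 * on x86 it is typically an XCHG, which is implicitly locked. */
bool demo_trylock(demo_spinlock *l)
{
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Spin until the lock is obtained. */
void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l)) {
        /* busy-wait */
    }
}

/* Release with an ordinary aligned store. On x86 this is typically a
 * plain MOV, which is the form that is cheap on a P4. */
void demo_unlock_store(demo_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Release with an exchange instead of a plain store, which is the
 * style of unlock the PPro workaround calls for. */
void demo_unlock_xchg(demo_spinlock *l)
{
    int prev = atomic_exchange_explicit(&l->held, 0, memory_order_release);
    assert(prev == 1); /* releasing a lock that was not held is a bug */
}
```

Both unlock functions leave the lock in the same state. The difference is only in the instruction used and therefore in the cost, which is the trade-off described above.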

DS

#**122** February 16th 04, 04:46 PM

David Schwartz wrote:

I've forgotten the specifics of the PPro errata, but the net effect
is that to release a spinlock, you have to use an exchange or
locked move instruction rather than a regular move.

Pentium Pro Processor Specification Update
http://developer.intel.com/design/pr...t/24268935.pdf

#**123** February 16th 04, 10:31 PM

"Grumble" wrote in message
...
David Schwartz wrote:

I've forgotten the specifics of the PPro errata, but the net effect
is that to release a spinlock, you have to use an exchange or
locked move instruction rather than a regular move.

Pentium Pro Processor Specification Update
http://developer.intel.com/design/pr...t/24268935.pdf

This is the relevant section. It's a bug in the cache coherency logic
that can result in two processors each having modified cache lines for the
same memory area!

There exists a narrow timing window when, if P0 wins the external bus
invalidation race and gains ownership rights to line A due to the sequence
of bus invalidation traffic, P1 may not have completed the pending
invalidation of its own, currently valid and shared copy of line A. During
this window, it is possible for a P1 internal opportunistic write to a
portion of line A (while awaiting ownership rights) to occur with the
original shared copy of line A still resident in P1's L2 cache. Such
internal modification is permissible subject to delaying the broadcast of
such changes until line ownership has actually been gained. However, the
processor must ensure that any internal re-read by P1 of line A returns
with data in the order actually written; in this case, this should be the
data written by P0. In the case of this erratum, the internal re-read uses
the data which was written by P1.

DS
