#121
"Keith R. Williams" wrote in message ...

> I don't understand what the author of that tidbit is up to, but I suspect there is something missing about the understanding of OoO here. OoO doesn't mean the processor reorders instructions at will. They (at least the ones I'm familiar with) still dispatch and complete instructions in-order, but do what they please in between (execution). Thus, to the external observer, the program executes as if it were run on an in-order processor. Perhaps there is a PPro bug that prevents this? If so it's a *huge* bug.

It depends what you mean by "external observer". An external observer looking at the memory bus could definitely see fetches being done out of order.

> If the author is referring to the moaning that it's impossible to predict the execution time of a random hunk of code, I agree with him, TDB. One shouldn't be writing critical timing-dependent code on such processors. There are simply too many variables and it is subject to change across sibling processors, to say nothing of third and fourth cousins.

No, he's talking about a specific problem with SMP PPro systems that requires you to use a LOCK prefix to make your unlock instructions atomic even though 32-bit aligned writes are supposed to be atomic anyway.

> The author brings up PPC's EIEIO instruction. This instruction, "Enforce In-order Execution of I/O", doesn't tell the execution unit to go in-order; rather, it tells the bus unit to enforce in-order I/O. The PPC will try to give priority to reads over writes and reads under cache "misses" (or misses under misses, or...). If the "memory" in question is really an I/O device things can get all bollixed up. The EIEIO instruction is intended to enforce in-order I/O operation to avoid this problem. The actual instructions still execute OoO. The sync and isync instructions will force in-order execution (of differing levels), but are rarely needed by a user (needed for processor state altering sorts of things).

They are needed all the time by users who write synchronization code.

Most programmers, including myself, ignore the PPro erratum and simply state that they no longer support SMP PPro systems. This is because the cost of a LOCK prefix on a P4 machine is just too high, and it's more logical to assume that 32-bit aligned writes will occur atomically.

I've forgotten the specifics of the PPro errata, but the net effect is that to release a spinlock, you have to use an exchange or locked move instruction rather than a regular move.

DS
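To make the two release styles concrete, here is a minimal sketch in C11 atomics. It is not code from anyone in this thread; the names (`demo_spinlock`, `demo_release_store`, `demo_release_xchg`, and so on) are hypothetical. On x86 a compiler will typically turn the release store into a plain MOV and the exchange into XCHG (which is implicitly locked), but that mapping is an assumption about the compiler, so check the generated assembly if it matters.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; 0 = free, 1 = held. */
typedef struct { atomic_int locked; } demo_spinlock;

/* Try once to take the lock; returns true if we got it. */
static bool demo_try_acquire(demo_spinlock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Spin until the lock is taken. */
static void demo_acquire(demo_spinlock *l) {
    while (!demo_try_acquire(l)) {
        /* busy-wait */
    }
}

/* Release with an ordinary aligned store (assumed: plain MOV on x86).
 * This is the cheap form that relies on aligned 32-bit writes being atomic. */
static void demo_release_store(demo_spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Release with an atomic exchange (assumed: XCHG on x86).
 * This is the more expensive form the PPro workaround calls for. */
static void demo_release_xchg(demo_spinlock *l) {
    (void)atomic_exchange_explicit(&l->locked, 0, memory_order_release);
}

/* Single-threaded check of the lock state transitions; returns 1 on success. */
static int demo_selfcheck(void) {
    demo_spinlock l = { 0 };
    demo_acquire(&l);
    int held = !demo_try_acquire(&l);   /* second attempt must fail */
    demo_release_store(&l);
    int free1 = demo_try_acquire(&l);   /* free again after store release */
    demo_release_xchg(&l);
    int free2 = demo_try_acquire(&l);   /* free again after xchg release */
    return held && free1 && free2;
}
```

Both release functions leave the lock in the same state; the only difference is the instruction used to write the zero, which is exactly the cost trade-off described above.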
#122
David Schwartz wrote:

> I've forgotten the specifics of the PPro errata, but the net effect is that to release a spinlock, you have to use an exchange or locked move instruction rather than a regular move.

Pentium Pro Processor Specification Update
http://developer.intel.com/design/pr...t/24268935.pdf
#123
"Grumble" wrote in message ...

> David Schwartz wrote:
>
>> I've forgotten the specifics of the PPro errata, but the net effect is that to release a spinlock, you have to use an exchange or locked move instruction rather than a regular move.
>
> Pentium Pro Processor Specification Update
> http://developer.intel.com/design/pr...t/24268935.pdf

This is the relevant section. It's a bug in the cache coherency logic that can result in two processors each having modified cache lines for the same memory area!

"There exists a narrow timing window when, if P0 wins the external bus invalidation race and gains ownership rights to line A due to the sequence of bus invalidation traffic, P1 may not have completed the pending invalidation of its own, currently valid and shared copy of line A. During this window, it is possible for a P1 internal opportunistic write to a portion of line A (while awaiting ownership rights) to occur with the original shared copy of line A still resident in P1's L2 cache. Such internal modification is permissible subject to delaying the broadcast of such changes until line ownership has actually been gained. However, the processor must ensure that any internal re-read by P1 of line A returns with data in the order actually written; in this case, this should be the data written by P0. In the case of this erratum, the internal re-read uses the data which was written by P1."

DS