If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#1
|
|||
|
|||
A question about optimization
I've got a small fragment of code I need to optimize. It adds an
integer to another integer, modulo a third integer. In C, this operation is: int i, j, k, l; i = (j + k) % l; I know that in the past integer division was a very expensive operation, so the following fragment might run faster: i = j + k; if (i = l) i -= l; This avoids the divide operation, but it introduces a compare and branch operation, which may be worse. This code will be used on three different chips: Pentium/MMX, Pentium-II or Pentium-III. Using GCC with optimizations, the first example compiles to: i = (j + k) % l; lea (%esi,%ecx,1),%edi mov %edi,%eax cltd idivl 0xc(%ebp) The version with the if statement compiles to: i = j + k; if (i = l) i -= l; lea (%esi,%ecx,1),%edx cmp %ebx,%ecx jl 0000001a foo+0x1a sub %ebx,%ecx Off the top of my head, I would assume that four instructions with a divide will be slower than four instructions with a branch, but I don't know enough about the x86 architecture on the chips involved (Pentium, P-II and P-III) to know for sure. Does anyone here know? -- David |
#2
|
|||
|
|||
On Mon, 30 Jun 2003 12:48:56 -0400, David C. wrote:
Does anyone here know? Time it using the RDTSC instruction. |
#3
|
|||
|
|||
"David C." wrote in message ... In C, this operation is: int i, j, k, l; i = (j + k) % l; I know that in the past integer division was a very expensive operation, so the following fragment might run faster: i = j + k; if (i = l) i -= l; This avoids the divide operation, but it introduces a compare and branch operation, which may be worse. Try i=j+k; i-=(i=l) ? l : 0; See if your compiler is smart enough to optimize it into branchless code (using the SETLE opcode). DS |
#4
|
|||
|
|||
"David Schwartz" wrote in message ... See if your compiler is smart enough to optimize it into branchless code (using the SETLE opcode). GCC version 3.3 makes it branchless with CMOVG: addl %ecx, %eax movl %eax, %ecx subl %edx, %ecx cmpl %edx, %eax cmovg %ecx, %eax DS |
#5
|
|||
|
|||
"David Schwartz" wrote in message ... GCC version 3.3 makes it branchless with CMOVG: addl %ecx, %eax movl %eax, %ecx subl %edx, %ecx cmpl %edx, %eax cmovg %ecx, %eax Sorry, these are P2 and later optimizations. If you need to support earlier processors, you can't use them (or need to implement two ways and detect the CPU at run time). DS |
#6
|
|||
|
|||
It could also be done with no branches using SETLE, but I doubt that'd be better. DS |
#7
|
|||
|
|||
"Stephen Lee -- post replies please" wrote in message ... According to David C. : int i, j, k, l; i = (j + k) % l; if (i = l) i -= l; Hmm... Don't you mean "if (i l) i -= l;" ? Close, he means: if (i=l) i-=l; The code can be made branchless and the division reduced to a multiplication by using 'SETGE' (which is available on a Pentium), but it's not clear that this is any faster. Basically, you compare 'i' to 'l', use SETGE to set a register to '1' if and only if 'i=l'. Then multiply that register by 'l' and (giving you zero if 'il' and 'l' if 'i=l' and then subtract that from 'i'. This trades a divide for a multiply, a SETGE, and two subtractions (one to compare and one at the end). I doubt that would be a win overall. DS |
#8
|
|||
|
|||
David Schwartz wrote:
"Stephen Lee -- post replies please" wrote in message ... According to David C. : int i, j, k, l; i = (j + k) % l; if (i = l) i -= l; Hmm... Don't you mean "if (i l) i -= l;" ? Close, he means: if (i=l) i-=l; The code can be made branchless and the division reduced to a multiplication by using 'SETGE' (which is available on a Pentium), but it's not clear that this is any faster. Basically, you compare 'i' to 'l', use SETGE to set a register to '1' if and only if 'i=l'. Then multiply that register by 'l' and (giving you zero if 'il' and 'l' if 'i=l' and then subtract that from 'i'. This trades a divide for a multiply, a SETGE, and two subtractions (one to compare and one at the end). I doubt that would be a win overall. SETxx writes only 8 bits, so the upper 24 bits must be written some other way. You can set the entire register to zero before SETxx, but then using the 32-bit register after SETxx incurs a stall (about 7 to 10 cycles) while the write merge occurs in the register file. So use MOVZBL after SETxx. After that, replace the multiply with NEG followed by AND, which saves about 9 cycles. As pointed out in another post, using "SBC r,r" after unsigned compare is the fastest and smallest way, and also works on any x86. |
#9
|
|||
|
|||
"John Reiser" wrote in message ... SETxx writes only 8 bits, so the upper 24 bits must be written some other way. You can set the entire register to zero before SETxx, but then using the 32-bit register after SETxx incurs a stall (about 7 to 10 cycles) while the write merge occurs in the register file. You can avoid this stall by properly zeroing the register early in the instruction stream, but this would mean yet another instruction. So use MOVZBL after SETxx. After that, replace the multiply with NEG followed by AND, which saves about 9 cycles. That's a pretty good idea. As pointed out in another post, using "SBC r,r" after unsigned compare is the fastest and smallest way, and also works on any x86. That's the best idea, assuming you're not willing to use P2-specific code with CMOV. DS |
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
A question about dual monitors | Woodsy | General | 1 | March 11th 05 06:17 AM |
couple of Dimension XPS Gen4 question | Matt | Dell Computers | 3 | March 4th 05 02:20 AM |
Pentium question | Robert Sligh | Overclocking AMD Processors | 4 | April 27th 04 01:13 PM |
Question - Printing 2-up on 11x17 | Don | Printers | 1 | August 24th 03 04:27 AM |
Home PC DVD burner Question. | Ruben | General Hardware | 2 | August 18th 03 05:29 PM |