An idea how to speed up computer programs and avoid waiting.("event driven memory system")
wolfgang kern wrote:
Bernhard Schornak wrote:
...
    L00:  mov  ecx,[esp]        # ECX = 0x00[ESP]
          mov  ebx,esi          # EBX = ESI
          shl  ebx,cl           # EBX = EBX << CL
          xor  edx,edx          # EDX = 0
          mov  ecx,[esp+$04]    # ECX = 0x04[ESP]
          dec  ecx              # ECX - 1
          test ecx,ecx          # redundant
          jb   L02              # outer loop if sign
          inc  ecx              # ECX + 1

DEC won't alter the carry flag, so "jb" (aka "jc") should be replaced by "jng" or "js".

Hi!

Thanks for refreshing my knowledge base ... I haven't seen much more than the steering wheel of an Atego 1222 and thousands of waybills for about nine months now.

...

@ Wolfgang: Both loops do work properly. In the worst case (value is zero), these loops count down the full 32-bit range.

OTOH, what I see is:

    dec ecx
    jng ...

actually checks if ECX was zero or negative before the DEC, so I'd just have used

    test ecx,ecx
    jng ...       ;jumps on zero, sign or overflow flag

as this will imply a zero detection.

Right. Hence, the dec/inc pairs are redundant for checking the range of EDX and ECX. Freeing EBP as a GPR allows replacing the outer loop counter [ESP+0x08] with a register. Just these three cosmetic changes saved 5 * 4,000 = 20,000 clocks...

...not a real improvement for processing a 130 MB array, randomly accessed 320,000,000 times...

My suggestion to expand the 8,000 to 8,192 dwords could reduce all range checks to

    and ecx,0x1FFF
    je  ...

which leaves a valid index in ECX and skips processing if ECX = 0. Same with EDX (ANDed with 0x0FFF).

And for a biased range, i.e.:

    cmp ecx,3
    jng ...       ;jumps if ECX = 3 or less (signed)

A sometimes required operation. In most cases, it is better to define a valid range and "transpose" it to something counted up or down to zero, using appropriate offsets to "compensate" for the transposed index. Unfortunately, there seem to be some addresses in the first elements of each block, so the properly coded loop had to check the lower limit as well (the real array starts at offset 0x18). This slows down the code with two additional branches.

Looks like HeLL, smells like HeLL, only the design isn't quite so bright...
Greetings from Augsburg

Bernhard Schornak
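The range-check ideas discussed in this thread (a power-of-two block size so that one AND replaces the bounds check, and "transposing" a biased range so it is counted up or down to zero) can be sketched in C. This is a minimal sketch under stated assumptions, not code from the thread: the names `masked_index`, `in_range` and `sum_range` are hypothetical, the 8,192-dword block size follows the suggestion above, and folding the lower and upper limit into one unsigned compare in `in_range` is a common trick swapped in here for the two extra branches mentioned, not something either poster wrote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block size: 8,000 dwords rounded up to 8,192 (0x2000). */
#define BLOCK_DWORDS 8192u
#define INDEX_MASK   (BLOCK_DWORDS - 1u)   /* 0x1FFF */

/* C counterpart of "and ecx,0x1FFF / je ...": the mask always yields an
   index in 0..8191, so no upper-bound compare is needed.
   Returns false when the masked index is 0, i.e. the caller should skip. */
static bool masked_index(uint32_t raw, uint32_t *idx)
{
    *idx = raw & INDEX_MASK;
    return *idx != 0;
}

/* Transposed range check for [lo, hi): subtracting lo shifts the range to
   [0, hi - lo). Any i < lo wraps around to a huge unsigned value, so one
   unsigned compare rejects values below AND above the range. */
static bool in_range(uint32_t i, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return (uint32_t)(i - lo) < (uint32_t)(hi - lo);
}

/* Counting "up to zero": index the elements a[lo..hi-1] with a negative
   offset from one past the end, so the loop test is just k != 0
   (the C analogue of an inc/jnz loop). */
static uint64_t sum_range(const uint32_t *a, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    const uint32_t *end = a + hi;           /* one past the last element */
    int64_t k = -(int64_t)(hi - lo);        /* negative offset from end */
    uint64_t sum = 0;
    for (; k != 0; ++k)
        sum += end[k];
    return sum;
}
```

For example, if the 0x18 mentioned above is a byte offset into a dword array (an assumption; the thread does not say), the lower limit would be `lo = 0x18 / 4`, and `in_range(i, lo, BLOCK_DWORDS)` rejects both the leading address slots and anything past the block with a single comparison.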