A computer components & hardware forum. HardwareBanter

An idea how to speed up computer programs and avoid waiting. ("event driven memory system")



 
 
August 12th 11, 10:37 PM, posted to alt.comp.lang.borland-delphi, alt.comp.periphs.videocards.nvidia, alt.lang.asm, comp.arch, rec.games.corewar
Bernhard Schornak
external usenet poster
 

wolfgang kern wrote:


Bernhard Schornak wrote:
...
L00:
mov ecx,[esp] # ECX = 0x00[ESP]
mov ebx,esi # EBX = ESI
shl ebx,cl # EBX = EBX << CL
xor edx,edx # EDX = 0
mov ecx,[esp+$04] # ECX = 0x04[ESP]
dec ecx # ECX - 1
test ecx,ecx # redundant
jb L02 # outer loop if sign
inc ecx # ECX + 1


DEC won't alter carry, so "jb" aka "jc" should be replaced by "jng" or "js".
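
The point about the carry flag can be checked with a small model of the flags involved. This is a sketch in C rather than assembly, and the `Flags` struct and all function names are hypothetical helpers, not part of the code quoted above. It models only CF, ZF and SF: DEC leaves CF untouched, while TEST always clears it, so a `jb`/`jc` placed after `test ecx,ecx` can never be taken, whatever ECX holds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three x86 flags discussed here. */
typedef struct {
    bool cf; /* carry */
    bool zf; /* zero  */
    bool sf; /* sign  */
} Flags;

/* DEC r32 (in this model): sets ZF and SF from the result, keeps CF. */
uint32_t model_dec(uint32_t x, Flags *f) {
    uint32_t r = x - 1u;          /* wraps modulo 2^32 like the CPU */
    f->zf = (r == 0u);
    f->sf = (r >> 31) != 0u;
    return r;
}

/* TEST r32,r32 (in this model): clears CF, sets ZF and SF from x. */
void model_test(uint32_t x, Flags *f) {
    f->cf = false;
    f->zf = (x == 0u);
    f->sf = (x >> 31) != 0u;
}

/* JB/JC is taken only when CF is set; JS only when SF is set. */
bool jb_taken(const Flags *f) { return f->cf; }
bool js_taken(const Flags *f) { return f->sf; }

/* "dec ecx / test ecx,ecx / jb": would the jump be taken? */
bool dec_test_jb(uint32_t ecx) {
    Flags f = { true, false, false };   /* worst case: CF set on entry */
    ecx = model_dec(ecx, &f);
    model_test(ecx, &f);
    return jb_taken(&f);
}

/* "dec ecx / test ecx,ecx / js": would the jump be taken? */
bool dec_test_js(uint32_t ecx) {
    Flags f = { false, false, false };
    ecx = model_dec(ecx, &f);
    model_test(ecx, &f);
    return js_taken(&f);
}
```

In this model `dec_test_jb` returns false for every input, while `dec_test_js` returns true exactly when the decremented value has its top bit set, for instance when ECX started at zero.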



Hi! Thanks for refreshing my knowledge base ... I
haven't seen much more than the steering wheel of
an Atego 1222 and thousands of waybills for about
nine months, now.


...
@ Wolfgang: Both loops do work properly. In the worst
case (value is zero), these loops count down the full
32 bit range.


OTOH what I see is:

dec ecx
jng ...

actually checks whether ecx was zero or negative before the DEC,
so I'd have just

test ecx,ecx
jng ... ;jumps if ZF=1 or SF<>OF (TEST clears OF)

as this will imply a zero detection.
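
To compare the candidate sequences, their branch conditions can be written out as plain comparisons. This is a C sketch with hypothetical function names; it models only whether the jump is taken for a given starting value of ECX. One detail worth noting: `jng` placed directly after `dec` is taken for starting values up to and including 1, whereas `js` after `dec` and `jng` after `test` agree on "zero or negative", apart from the single wrap-around value 0x80000000.

```c
#include <stdbool.h>
#include <stdint.h>

/* "dec ecx / js": taken when the decremented value has its sign bit set. */
bool dec_js_taken(int32_t ecx) {
    uint32_t r = (uint32_t)ecx - 1u;    /* wrap like the hardware does */
    return (r >> 31) != 0u;
}

/* "dec ecx / jng": JNG (= JLE) tests ZF=1 or SF!=OF, which after DEC
   means the original value was <= 1 in signed terms. */
bool dec_jng_taken(int32_t ecx) {
    return ecx <= 1;
}

/* "test ecx,ecx / jng": TEST clears OF, so the jump is taken when the
   value is zero or negative. */
bool test_jng_taken(int32_t ecx) {
    return ecx <= 0;
}
```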



Right. Hence, the dec/inc pairs are redundant for
checking the range of EDX and ECX. Freeing EBP as
a GPR allows replacing the outer loop counter
[ESP+0x08] with a register. Just these three
cosmetic changes saved 5 * 4,000 = 20,000 clocks...

...not a real improvement for processing a 130 MB
array, randomly accessed 320,000,000 times...

Expanding the 8,000 to 8,192 dwords, as I suggested,
could reduce all range checks to

and ecx,0x1FFF
je ...

which leaves a valid index in ECX and skips processing
if ECX = 0. Same with EDX (ANDed with 0x0FFF).
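
A minimal C sketch of the masking idea, assuming the table is padded to 8,192 dwords. `TABLE_DWORDS`, `INDEX_MASK`, `clamp_index` and `sum_lookup` are hypothetical names chosen for illustration. Because 8,192 is a power of two, ANDing with 0x1FFF always yields an index inside the table, and a result of zero doubles as the "skip" signal.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_DWORDS 8192u                 /* 0x2000, a power of two */
#define INDEX_MASK   (TABLE_DWORDS - 1u)   /* 0x1FFF */

/* Equivalent of "and ecx,0x1FFF": the result is always 0..8191. */
uint32_t clamp_index(uint32_t raw) {
    return raw & INDEX_MASK;
}

/* Adds up the table entries selected by raw indices. An index that
   masks to zero is skipped, mirroring the "je" after the AND. */
uint32_t sum_lookup(const uint32_t *table, const uint32_t *raw, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t idx = clamp_index(raw[i]);
        if (idx == 0u) {
            continue;
        }
        sum += table[idx];
    }
    return sum;
}
```

Note that masking does not reject an out-of-range value; it wraps it back into the table (8,193 becomes 1), so the approach only fits when that behaviour is acceptable.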


And for a biased range, i.e.:

cmp ecx,3
jng ... ;jumps if ecx <= 3 (signed)



A sometimes required operation. In most cases, it
is better to define a valid range and "transpose"
it to something counted up or down to zero, using
appropriate offsets "compensating" the transposed
index.
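
One way to read the "transpose" advice, sketched in C with hypothetical names (`sum_range`, `lo`, `hi`): instead of comparing the index against both limits on every pass, the loop counts the number of remaining elements down to zero, and a fixed base offset compensates for the shifted index. In assembly, the count-down form lets the decrement itself set the flags for the loop branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums data[lo..hi] inclusive. The caller guarantees lo <= hi and that
   both lie inside the array (an assumption of this sketch). */
uint32_t sum_range(const uint32_t *data, size_t lo, size_t hi) {
    const uint32_t *base = data + lo;   /* offset compensating the shift */
    size_t remaining = hi - lo + 1;     /* transposed range: count..1 */
    uint32_t sum = 0;

    while (remaining != 0) {            /* single test against zero */
        --remaining;
        sum += base[remaining];         /* visits hi first, lo last */
    }
    return sum;
}
```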

Unfortunately, there seem to be some addresses in
the first elements of each block, so a properly
coded loop would have to check for the lower limit
as well - the real array starts at offset 0x18.
That slows down the code with two additional
branches. Looks like HeLL, smells like HeLL, only
the design isn't all that bright (German "hell")...
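
For illustration, here is what a check against both limits looks like in C, next to a common alternative that folds the two comparisons into one by subtracting the lower limit first. The folded form is a general technique, not something proposed in this thread, and the function names are hypothetical. The limits are passed as parameters because the post does not say whether 0x18 counts bytes or elements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Straightforward form: one comparison (and branch) per limit. */
bool in_range_two_checks(uint32_t idx, uint32_t first, uint32_t last) {
    if (idx < first) {
        return false;
    }
    if (idx > last) {
        return false;
    }
    return true;
}

/* Folded form: subtracting the lower limit makes every value below it
   wrap to a large unsigned number, so one comparison covers both ends.
   Requires first <= last. */
bool in_range_one_check(uint32_t idx, uint32_t first, uint32_t last) {
    return (idx - first) <= (last - first);
}
```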


Greetings from Augsburg

Bernhard Schornak
 



