December 1st 04, 01:52 AM
Paul

In article , "Johnny"
wrote:

Paul wrote:
I don't normally top post, but don't want to try to trim the
rest of this down.

Some random observations:

1) Could this be a Hyperthreading problem ? Is Hyperthreading
disabled in the BIOS ? I don't know how your OS schedules work
across the two virtual processors, but if you were quitting
Passmark between runs, the program may be running on a different
virtual processor each time, and one virtual processor may have
more load than the other. If you disable Hyperthreading in the
BIOS, the perf difference might stop.

In any case, Hyperthreading is not all it is cracked up to
be. In some cases, it is a clear win, but in other cases it
can trash the performance of the memory subsystem, and actually
run slower than without it.

WOW!!! Before altering any voltages or settings, just running the standard
[auto] jumperless detection settings and simply setting CPU hyperthreading
[disabled] option, the results are now, well, somewhat different!!
How thorough or accurate passmark is I know not but for purposes of
comparison it's useful. It's difficult to present the results in here but
the scores for example of the CPU suite of tests are as follows in my
attempt at a table (hope it comes out ok).

cpu test          hyperthreading [enabled]   hyperthreading [disabled]

integer math      170/246 varies             257 solid
floating p math   230                        291
mmx               181                        278
sse               131                        164
compression       1319                       1868
encryption        6.8                        10.9
image rotation    113                        195.9
string sorting    665                        810

CPU passmark      322                        467

I haven't managed to get anything other than very close to the numbers
above with hyperthreading [disabled]; it is solid. Disabling
hyperthreading has also affected the memory test benchmark speeds,
presumably due to the increased CPU performance.

all this before altering any voltages or any other settings, blimey!


Does the memtest86 memory bandwidth indicator change as a function
of the BIOS Hyperthreading setting ? It shouldn't. In any case, one
thing that strikes me, is how negative an effect hyperthreading is
having on your results.
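
To put a number on that, here is a rough Python sketch that works out the
percentage gain from disabling Hyperthreading for each Passmark score in
the table above. The function name is my own, and for integer math I have
used the better of the two Hyperthreading-enabled readings (246), so the
real gap for that test may be larger.

```python
# Passmark CPU scores quoted above: (HT enabled, HT disabled).
# Integer math varied between 170 and 246 with HT on; 246 is used here.
scores = {
    "integer math": (246, 257),
    "floating p math": (230, 291),
    "mmx": (181, 278),
    "sse": (131, 164),
    "compression": (1319, 1868),
    "encryption": (6.8, 10.9),
    "image rotation": (113, 195.9),
    "string sorting": (665, 810),
    "CPU passmark": (322, 467),
}


def gain_percent(ht_on, ht_off):
    """Percentage score gain from disabling HT (higher score is better)."""
    return (ht_off - ht_on) / ht_on * 100.0


for name, (ht_on, ht_off) in scores.items():
    print(f"{name:16s} {gain_percent(ht_on, ht_off):6.1f}%")
```

The overall CPU passmark comes out roughly 45% higher with Hyperthreading
disabled, which is far more than I would expect.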


2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
start with, and you may find bumping the memory voltage up
a couple notches stops the errors. If the memory passes memtest86
in an overnight test without errors, use Prime95 torture test
in mixed mode, and see if it runs error free as well. I've had
memory pass memtest86 and fail Prime95.

3) Look up your Corsair memory here:

http://corsairmicro.com/corsair/xms.html

Click the link and download the datasheet. For example, 3200XL
is rated for 2.75V and you could try that. The datasheet for
3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
start at 2.5-2-2 on its own. If this is some other memory,
you may need to post in this forum, and get some help with
your product - or search for someone having the same system
as you've got:

The product is CMX512-3200XLPT listed on their site under CMX512-3200XL and
it clearly states 2.75V. Changing the voltage to 2.75V has stopped the
blackouts.

For interest, here are the passmark memory results before (but with
hyperthreading disabled) and after the voltage change. The "configure DRAM
timing by speed" option is [enabled] in the BIOS.

test                   [auto]    2.75V [auto]   2.75V / 2.0-2-2-5

allocate small block   1162.8    1163           1164.8
read cached            1390      1389.7         1389.9
read uncached          1326.6    1328.3         1328.8
write                  809.4     809.7          809.4


As the auto and manual setting seem to be doing the same thing, I think
you can conclude that the SPD on the 3200XL is 2-2-2. You can play
with the 5 number manually, as by calculation, the 5 number is supposed
to be the sum of two of the other parameters plus 2 (four beats of
DDR data taking 2 cycles). On an AMD system, raising that number to
10 is best, while on the P4, a lower value is better, but play with it
a bit, and see what happens.
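
As a sketch of that calculation: I am assuming the "two other parameters"
are the CAS latency (Tcl) and Trcd, which is the usual form of this rule
of thumb. The function name is made up for illustration.

```python
def rule_of_thumb_tras(tcl, trcd, burst_cycles=2):
    """Estimate the Active to Precharge delay in DRAM clocks.

    Assumption: the rule is Tcl + Trcd + 2, where the 2 covers the two
    clock cycles needed for four beats of DDR data.
    """
    return tcl + trcd + burst_cycles


# For 2-2-2 memory the rule gives 6, one clock more than the 5 in the SPD.
print(rule_of_thumb_tras(2, 2))
```

So the rule is only a starting point. The datasheet value of 5 is already
a clock tighter than the rule suggests, which is why it is worth trying
values on either side.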

In terms of memory bandwidth, your CTIAW and memtest86 bandwidth
indicators are in the same ballpark as mine, so I don't think you
are far off from optimal. Certainly, overclocking the memory will
be the single biggest determinant of memory bandwidth, and the
nice thing about the 3200XL, is you can play with it a bit. I think
it can be pushed up to DDR500, at the expense of relaxing the timing
numbers a bit. My Ballistix doesn't like that quite as much.

These two documents describe some of the things you can do to
optimize memory bandwidth. But with the Asus hack to enable PAT,
the rules might be more like an 875 than an 865. The chips, after
all, are the same die, but with different signals pinned out.

ftp://download.intel.com/design/chip...s/25273001.pdf (875P)
ftp://download.intel.com/design/chip...s/25303601.pdf (865PE)

Altering the DRAM burst timing between 4 and 8 clocks appeared to make no
difference in these tests. Having memory acceleration enabled gave the
following: 1165.4, 1389.3, 1340.2, 810, so only read uncached improved,
slightly but consistently.


When the cache is enabled for a certain area of memory, the memory
controller likes to fetch cache-line-sized chunks. That might be why
normally, the 4 versus 8 setting doesn't make a difference. Perhaps
the memory used by PCI cards for I/O is uncached ? I've left mine
set at 4. (I think the cache line size is 64 bytes, and with dual
channel memory, 16 bytes are transferred per beat, so the 4 setting
would be right for it. If you were in single channel mode, perhaps
8 would be the right setting, times 8 bytes per beat.)
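
The arithmetic in that paragraph can be checked with a few lines of
Python. The 64 byte cache line and the 16 or 8 bytes per beat are the
figures I assumed above; the function name is just for illustration.

```python
def burst_length(cache_line_bytes, bytes_per_beat):
    """Number of data beats needed to fill one cache line."""
    beats, remainder = divmod(cache_line_bytes, bytes_per_beat)
    if remainder:
        raise ValueError("cache line is not a whole number of beats")
    return beats


print("dual channel:  ", burst_length(64, 16))  # 16 bytes per beat
print("single channel:", burst_length(64, 8))   # 8 bytes per beat
```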


**** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
Kernel Driver: WinNT DIRECTNT.SYS V01.09
Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
Bus Speed: max=200MHz, ratio=15 = 200 MHz
Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
----------------------------------------------------------------
Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
System Frequency : FSB533/133 MHz
Memory Frequency : DDR266/133 MHz (1:1)
IOQ Depth : 12 deep
Top of usable Memory : 1024.0 MByte
Extended SMRAM (Tseg) : disabled
Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
Memory Delays Base Address : FECF0000 not prefetchable
CPU Parking : disabled
Memory : row0: 512 MByte/16 KB Pages
: row1: 512 MByte/16 KB Pages
DRAM-Channels : Dual Channel Linear, DDR
ECC & Refresh : Non-ECC, Refresh=7.8 µs
PAT-mode : (1) fully enabled
Active to Precharge Delay : 5 clocks .. 70 µs
Tcl - Trcd -Trp : 2-2-2 T (DRAM Clocks)

Memory Read Bandwidth : ca. 5780.5 MBytes/s, Cacheline size= 64
go on with CR
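
For a sense of how close that read bandwidth figure is to the ceiling,
here is a rough calculation. It assumes the memory is really running at
DDR400 in dual channel mode (not the DDR266 the tool reports) and that
each channel is 8 bytes wide. The function name is my own.

```python
def peak_bandwidth_mb_s(transfers_per_us, bytes_per_transfer, channels):
    """Theoretical peak in MB/s: million transfers/s x bytes x channels."""
    return transfers_per_us * bytes_per_transfer * channels


# Assumed: DDR400 (400 million transfers/s), 8 bytes wide, 2 channels.
peak = peak_bandwidth_mb_s(400, 8, 2)
measured = 5780.5  # "Memory Read Bandwidth" from the output above

print(f"peak     : {peak} MB/s")
print(f"measured : {measured} MB/s")
print(f"fraction : {measured / peak:.1%}")
```

On those assumptions the measured figure is about 90% of the theoretical
peak, which fits with the settings not being far off optimal.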




http://www.houseofhelp.com/forums/fo...hp?forumid=128

4) CTIAW and memtest86 disagree on your PAT setting. I don't know
what to make of that.

5) There is a possible reason for CTIAW mis-reporting the bus
speed. An 865PE Northbridge is not supposed to have PAT, but
Asus and others use a trick to enable it. The processor has
two signals called BSEL, and they indicate the bus speed rating
of the processor (400, 533, 800 etc). The BSEL signals are
normally routed from the processor to the Northbridge and to
the clockgen. What Asus did, is they disconnected that link.
Asus sends a fake value of BSEL to the Northbridge - I think
if the FSB is set to 533, PAT is enabled, so by sending the
533 bit pattern to the Northbridge, but setting the clockgen
to 800, PAT is enabled, and the memory can run at DDR400, just
like on an 875P Northbridge. I think what CTIAW could be doing,
is reading the Northbridge register, instead of checking the
clockgen. This trick is great for fooling the hardware, but
software authors have to be aware of the trick too, to get
the info right.
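
One way to see the mismatch is to cross-check the numbers in the CTIAW
output against each other. This is only a sketch using the values printed
above (2999 MHz under load, ratio 15, and the reported FSB533/133 MHz).
The function name and the 5 MHz tolerance are my own choices.

```python
def implied_bus_clock_mhz(cpu_mhz, multiplier):
    """Bus clock implied by the measured CPU clock and its multiplier."""
    return cpu_mhz / multiplier


cpu_mhz = 2999             # "(load)" figure from the CTIAW output
multiplier = 15            # "ratio=15"
northbridge_says_mhz = 133 # "System Frequency : FSB533/133 MHz"

implied = implied_bus_clock_mhz(cpu_mhz, multiplier)
print(f"implied bus clock : {implied:.1f} MHz")
print(f"Northbridge value : {northbridge_says_mhz} MHz")

# A large gap suggests the register value is not the real clock.
if abs(implied - northbridge_says_mhz) > 5:
    print("mismatch: the Northbridge setting does not match the real clock")
```

A 133 MHz bus with a multiplier of 15 would give a CPU clock of about
2 GHz, not 3 GHz, so the FSB533 line cannot be the real bus speed.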

6) I dug up some benchmarks you can try. Maybe these will be
reproducible from run to run.

http://www.super-computing.org/
ftp://pi.super-computing.org/windows/super_pi.zip

Super_pi computes PI, and you select the number of digits from
the menu. You double click the .exe, to run a Windows dialog.
Select the number of digits to calculate and then run it.
I just ran 1 million digits, and it takes 48 seconds
on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
they had exactly the same test time. A file is created in the
install directory with the results of the calculation.
The test time and the amount of memory used increase
with the digits setting. Some people use the 32M setting
as a stability test for new motherboards.

44 seconds with hyper threading [disabled]
53 seconds with hyper threading [enabled]

as you say this test is consistent


I just don't understand why your results are being hammered
so badly by Hyperthreading. The OS cannot be taking up that much
memory bandwidth in the background. And, since your processor
has a 1MB cache, it shouldn't be measurably thrashing the cache
either. I wonder if Windows is actually using the whole
cache ? I remember reading a while back, about a situation where
Windows needed to be manually adjusted to use the whole cache
(back in the P3 era). Something still isn't right here.


Here is a second test:

This is some kind of finite element analysis. It was
posted by the author a while back. It uses a good chunk
of memory, and judging by the CPU heating, is not memory
bound, but does a fair amount of computing. To use it,
unzip the file, fire up a MSDOS window, cd to the unzipped
directory, then type "now" into the MSDOS window, to execute
now.bat . After it reaches "step 992", it will finish, and
print the number of "MUPs", which are millions of operations
per second. My computer takes 202 or 203 seconds to run the
benchmark, and achieves a rating of 12.27 MUPs (the number
is printed in scientific notation, so shift the decimal
point as appropriate).

with hyperthreading [enabled]

242 - 244 seconds, 10.16 - 10.24 MUPs +/- 0.04% (I assume)

with hyperthreading [disabled]

203 seconds, 12.21 MUPs +/- 0.06% consistently.


The Hyperthreading penalty seems to be about the same here as in
Super_PI. It seems strange that they would be the same, as
these programs won't have the same memory access pattern.
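
Working the two penalties out from the run times you posted (44 versus 53
seconds for Super_PI, 203 versus 242 to 244 seconds for 3d0). For 3d0 I
have taken the middle of the range, 243 seconds, which is my own
simplification. The function name is made up.

```python
def slowdown_percent(ht_off_seconds, ht_on_seconds):
    """Extra run time with HT enabled, as a percentage of the HT-off time."""
    return (ht_on_seconds - ht_off_seconds) / ht_off_seconds * 100.0


super_pi = slowdown_percent(44, 53)
three_d0 = slowdown_percent(203, 243)  # 243 = midpoint of 242 - 244

print(f"Super_PI : {super_pi:.1f}% slower with HT enabled")
print(f"3d0      : {three_d0:.1f}% slower with HT enabled")
```

Both come out at roughly 20%, which is the coincidence I find odd.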


http://users.viawest.net/~hwstock/bench/3d0/3d0.zip

Instructions and some background info are here:
http://www.abxzone.com/forums/showthread.php?t=70142

Those two tests are reproducible for me. Give them a
try, with and without Hyperthreading turned on in the
BIOS.

Note: The 3d0 program is a bit unhygienic, and leaves
a bunch of files in its directory. You may want to
dump all but the original files, when the directory
fills up.

Be interested to hear what you make of that lot. Obviously hyperthreading is
doing the bulk of the damage but the memory scores seem a little low also.
I'll run the memtest and mess with some other BIOS settings later but I have
to go make some money now.

many thanks,
J


snip

All I can say, is Hyperthreading is doing way more damage than
it should be. Try memtest86 again, with Hyperthreading enabled
and then with it disabled. There should be no change in the
bandwidth readout. If there is, there is some other serious
problem there.

In my registry, I see an entry called SecondLevelDataCache, but
it is set to zero. That implies the cache size is detected
automatically; if L2 were actually disabled, you would see the
performance plummet.

HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SESSION MANAGER\MEMORY MANAGEMENT

According to this, changing it shouldn't help:
http://www.winguides.com/registry/display.php/116/
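
If you would rather not dig through regedit, something like this should
read the value. It is a sketch using Python's standard winreg module, so
it only does anything useful on Windows. The describe() helper and its
wording are mine, and the assumption that a non-zero value is a size in
KB is my understanding rather than something I have verified here.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "SecondLevelDataCache"


def read_second_level_data_cache():
    """Return the registry value, or None if unavailable (e.g. not Windows)."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except OSError:
        return None


def describe(value):
    """Turn the raw value into a short explanation (wording is my own)."""
    if value is None:
        return "value not available"
    if value == 0:
        return "0: no size forced, Windows detects the L2 cache itself"
    # Assumption: non-zero values are a cache size in KB.
    return f"L2 size forced to {value} KB"


print(describe(read_second_level_data_cache()))
```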

You might try downloading Sandra Lite 2005 and run the
"Cache and Memory" benchmark. The 2002 version I've got
has that benchmark, and the "bumps" in the curve tell
you where the cache breakpoints are. A Prescott, with
its 1MB cache, should have a breakpoint at the 1MB mark
if the cache is working.

http://www.sisoftware.co.uk/index.ht...&langx=e n&a=

I think if I try to install it, it will remove the older software,
so I cannot do this right now. I hope the Lite version still has
that benchmark...

HTH,
Paul