From: Paul
Newsgroups: alt.comp.hardware.pc-homebuilt
Subject: CPU temp hits 80C then cools back to 55/60 [memtest]
Date: March 9th 20, 04:44 PM

wrote:
> I was running memtest on an LGA2011 system that seemed to have memory issues.
> I noticed that when memtest starts, the core temperature is up at 80 Celsius.
> Then it drifts back to 60 (with the full 8 DIMMs). Testing only 2 DIMMs at a
> time, it goes lower, to 55 Celsius. This is a single-threaded test. So what is
> memtest reporting? Just the active core, rather than an average of the die?
> This PC has a water cooling system. There is no model number on it, so I
> can't determine what the thermal capacity of this cooler is. The radiator
> fans seem to be running at a fixed speed.
> CPU is an i7-3820. The RAM speed is 1600 MHz with CAS 10. No overclocking is
> set in the BIOS.


It's hard to say which core it is.

The code does have the notion of a "boot processor", which might
be CPU 0, but I can't find the selection logic, i.e. what happens
when this is a multicore run. I can't imagine it runs multiple
instances of the code, but maybe it does.

The following code "just runs", in a sense. Normally, on a P4 without HT,
it would be obvious what core it runs on: the boot processor
core. But I haven't located what happens here when it is running SMP.
It "smells" like it runs an instance on each core, but then each
running instance must be assigned its own test address range (the
instances presumably can't afford to overlap in memory).
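
As an illustration of that non-overlap idea, here is a hypothetical
sketch (not taken from memtest86+; the names and the address range
are made up) of splitting one test window into per-CPU ranges so the
instances never touch the same addresses:

#include <stdio.h>

int main(void)
{
    unsigned long start = 0x00100000UL;   /* 1 MB, illustrative    */
    unsigned long end   = 0x08100000UL;   /* 129 MB, illustrative  */
    unsigned int  ncpus = 4;

    unsigned long chunk = (end - start) / ncpus;

    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        unsigned long lo = start + (unsigned long)cpu * chunk;
        /* last CPU absorbs any remainder so the whole window is covered */
        unsigned long hi = (cpu == ncpus - 1) ? end : lo + chunk;
        printf("CPU %u tests 0x%08lx - 0x%08lx\n", cpu, lo, hi);
    }
    return 0;
}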

There's no OS running and no scheduler. The code should run at
100% CPU utilization, 100% of the time. The code talks of a "barrier",
which means that if it runs SMP, one copy of the code is the master,
and it does something to start the slaves (on the other cores). When a
slave is finished, presumably the master copy then advances
to the next test or test step. There would be an assumption that
the cores are all "equally productive", so that the master
doesn't wait an extraordinarily long time for a slow slave to finish.
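
To make the barrier idea concrete, here is a minimal hosted-OS
sketch using POSIX threads and pthread_barrier_wait(). memtest86+
itself is bare-metal and rolls its own spin barrier, so this only
illustrates the master/slave synchronization pattern; NCPUS and the
pass count are arbitrary.

#include <pthread.h>
#include <stdio.h>

#define NCPUS  4
#define PASSES 3

static pthread_barrier_t barrier;

static void *worker(void *arg)
{
    int cpu = *(int *)arg;

    for (int pass = 0; pass < PASSES; pass++) {
        /* ... each "core" would run its share of the test pass here ... */
        printf("cpu %d finished pass %d\n", cpu, pass);

        /* Nobody advances to the next pass until everyone arrives. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NCPUS];
    int id[NCPUS];

    pthread_barrier_init(&barrier, NULL, NCPUS);

    for (int i = 0; i < NCPUS; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NCPUS; i++)
        pthread_join(tid[i], NULL);

    pthread_barrier_destroy(&barrier);
    return 0;            /* build with: gcc -pthread barrier_demo.c */
}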

A comment in the code mentions "we don't have a timer", which
means delays are done by brute force. And if a delay loop were
running, that would likely make a core hot. When actually
testing memory, the core should cool off, because the memory
cannot keep up with the CPU (the core stalls until
the memory fetch comes back; there's no delay loop or anything).
Many steps in the memory test are just waiting for the memory
subsystem to come back. Running multiple cores is not likely
necessary to get full performance from the memory subsystem.
Normal program code relies on cache hits for performance,
not on main memory being available instantly. Since
memtest is a "cache buster", and caching is also undesired
for the test, the CPU cools its heels waiting for each memory
fetch to come back. The temperature should drop to some extent
when this happens.
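
A brute-force, timer-less delay is essentially an empty loop that
the compiler is not allowed to optimize away, something along these
lines (this is only a sketch, not memtest86+'s actual delay code,
and the iteration count is arbitrary and uncalibrated):

#include <stdio.h>

/* The volatile qualifier forces every decrement to really happen,
 * so the optimizer cannot delete the empty loop. While spinning,
 * the core's execution units stay fully busy, which is why a delay
 * like this runs hot compared with a cache-missing memory read
 * that mostly stalls the core. */
static void brute_force_delay(volatile unsigned long loops)
{
    while (loops--)
        ;                           /* burn cycles */
}

int main(void)
{
    brute_force_delay(500000000UL); /* arbitrary count */
    puts("delay done");
    return 0;
}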

If for any reason an SMP slave thread were slow to complete, the
master thread would perhaps busy-wait at the barrier and the
temperature might go up. Just a guess.

*******

memtest.org version 5.01

http://memtest.org/download/5.01/memtest86+-5.01.tar.gz

void coretemp(void)
{
    unsigned int msrl, msrh;
    unsigned int tjunc, tabs, tnow;
    unsigned long rtcr;
    double amd_raw_temp;

    // Only enable coretemp if IMC is known
    if (imc_type == 0) {
        return;
    }

    tnow = 0;

    // Intel CPU
    if (cpu_id.vend_id.char_array[0] == 'G' && cpu_id.max_cpuid >= 6) {
        if (cpu_id.dts_pmp & 1) {
            rdmsr(MSR_IA32_THERM_STATUS, msrl, msrh);       // === Core #, running this privileged instruction
            tabs = ((msrl >> 16) & 0x7F);                   //     DTS readout: degrees below Tjmax
            rdmsr(MSR_IA32_TEMPERATURE_TARGET, msrl, msrh); // === Tjmax value of newer processors (May 2010)
            tjunc = ((msrl >> 16) & 0x7F);
            if (tjunc < 50 || tjunc > 125) {
                tjunc = 90;         // assume Tjunc = 90°C if bogus value received
            }
            tnow = tjunc - tabs;
            dprint(LINE_CPU + 1, 30, v->check_temp, 3, 0);
            v->check_temp = tnow;
        }
        return;
    }

    // AMD CPU
    if (cpu_id.vend_id.char_array[0] == 'A' && cpu_id.vers.bits.extendedFamily > 0) {
        pci_conf_read(0, 24, 3, 0xA4, 4, &rtcr);            // northbridge Reported Temperature Control register
        amd_raw_temp = ((rtcr >> 21) & 0x7FF);              // CurTmp in bits 31:21, units of 0.125°C
        v->check_temp = (int)(amd_raw_temp / 8);
        dprint(LINE_CPU + 1, 30, v->check_temp, 3, 0);
    }
}
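
To show the arithmetic of the Intel branch with concrete numbers:
the DTS field in IA32_THERM_STATUS reports degrees *below* Tjmax,
so the displayed temperature is Tjmax minus that readout. The raw
MSR values below are made up for illustration.

#include <stdio.h>

int main(void)
{
    unsigned int therm_status = 0x881F0000; /* hypothetical raw IA32_THERM_STATUS      */
    unsigned int temp_target  = 0x005A0000; /* hypothetical IA32_TEMPERATURE_TARGET    */

    unsigned int tabs  = (therm_status >> 16) & 0x7F;  /* 31 degrees below Tjmax */
    unsigned int tjunc = (temp_target  >> 16) & 0x7F;  /* Tjmax = 0x5A = 90      */

    if (tjunc < 50 || tjunc > 125)
        tjunc = 90;             /* same sanity fallback as the code above */

    printf("core temp = %u C\n", tjunc - tabs);         /* prints 59 */
    return 0;
}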

*******

Paul