"TLB parity error in virtual array; TLB error 'instruction"?

**Ant[_3_]** · March 8th 10, 04:45 PM posted to comp.sys.ibm.pc.hardware.chips

Hello.

Lately, I have been getting random and rare kernel panics on my old
Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I
couldn't figure out what it was until I discovered mcelog a couple of days
ago, and it revealed some interesting, scary data in my dmesg/messages and
syslog:

# cat /var/log/messages
....
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but what
part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had
it and its motherboard since 12/24/2006, so it is not that old yet. I
have the full details on my secondary machine at
http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Although, this might be related to the PSU's death back in early
December 2009. My friend and I believe it also took out my EVGA GeForce
8800 GT video card and damaged a 512 MB stick of RAM (we tested all 3 GB
together and then each piece with memtest86+ v4.00 to narrow it down).
http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the
details of my systems. I did run memtest86+ again a couple of weeks ago and
this morning for 5-6 hours, but got no errors after five full passes.
I also do not overclock/OC.

Thank you in advance.

--
"Above ground I shall be food for kites; below I shall be food for
mole-crickets and ants. Why rob one to feed the other?" --Juang-zu (4th
Century B.C.)
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or
Ant is currently not listening to any songs on his home computer.

**Yousuf Khan** · March 9th 10, 03:01 AM posted to comp.sys.ibm.pc.hardware.chips

Ant wrote:
Hello.

Lately, I have been getting random and rare kernel panics on my old
Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I
couldn't figure out what it was until I discovered mcelog a couple of days
ago, and it revealed some interesting, scary data in my dmesg/messages and
syslog:

# cat /var/log/messages
...
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but what
part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had
it and its motherboard since 12/24/2006, so it is not that old yet. I
have the full details on my secondary machine at
http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Yeah, TLB stands for Translation Lookaside Buffer; it's the part of
the processor that caches virtual-to-physical address translations for
memory pages. I'm not sure what they are referring to when they talk
about "virtual array", unless it has something to do with OS
virtualization. In any case, if your TLB is damaged, then various
programs will fail if their memory pages get tracked by the bad TLB entry.
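
For anyone who wants to pick apart the STATUS value from the log by hand, here is a minimal Python sketch. It assumes the commonly documented x86 machine-check status layout (flag bits 63 down to 57, and a TLB error code of the form 0000 0000 0001 TTLL in the low 16 bits). The function and field names are made up for illustration, and the model-specific bits are returned raw rather than interpreted.

```python
# Sketch: decode the architectural parts of an x86 machine-check STATUS word.
# All names here are illustrative; model-specific fields are left undecoded.

FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # MISC register holds extra information
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}

TRANSACTION = ["instruction", "data", "generic", "reserved"]
LEVEL = ["level 0", "level 1", "level 2", "generic"]


def decode_status(status):
    """Return a dict describing the flags and error code of a status word."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "error_code": code,
        "model_specific": (status >> 16) & 0xFFFF,  # raw, not interpreted
    }
    # TLB errors use the bit pattern 0000 0000 0001 TTLL.
    if code >> 4 == 0x001:
        result["kind"] = "TLB"
        result["transaction"] = TRANSACTION[(code >> 2) & 0x3]
        result["level"] = LEVEL[code & 0x3]
    else:
        result["kind"] = "other"
    return result


if __name__ == "__main__":
    info = decode_status(0x9400000000010011)
    for key, value in info.items():
        print(key, "=", value)
```

For the value in the log, this reports VAL, EN and ADDRV set, UC and PCC clear, and a level 1 instruction TLB error, which agrees with the text mcelog printed.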

Although, this might be related to the PSU's death back in early
December 2009. My friend and I believe it also took out my EVGA GeForce
8800 GT video card and damaged a 512 MB stick of RAM (we tested all 3 GB
together and then each piece with memtest86+ v4.00 to narrow it down).
http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the
details of my systems. I did run memtest86+ again a couple of weeks ago and
this morning for 5-6 hours, but got no errors after five full passes.
I also do not overclock/OC.

Thank you in advance.

If that PSU failure took out so much other hardware, then it's likely it
took out your processor too, and it took longer for it to finally fail.
CPU chips tend to be more robust than memory chips and GPU chips, a lot
more redundancy, so they may show the signs of the failure much later.

Memtest86+ won't find faults inside the CPU, it only tests for faults in
the RAM.

Yousuf Khan

**Ant[_3_]** · March 9th 10, 06:37 AM posted to comp.sys.ibm.pc.hardware.chips

On 3/8/2010 7:01 PM PT, Yousuf Khan typed:

If that PSU failure took out so much other hardware, then it's likely it
took out your processor too, and it took longer for it to finally fail.
CPU chips tend to be more robust than memory chips and GPU chips, a lot
more redundancy, so they may show the signs of the failure much later.

Ah, that could be it. So far, a 512 MB stick of RAM and the video card went
bust with the PSU. Too bad my friend and I did not see any physical evidence
of busted caps, discoloration, etc.

Memtest86+ won't find faults inside the CPU, it only tests for faults in
the RAM.

What's a good way to test the CPU? I tried sys_basher, unraring 10 GB of
data, memtest86+ v4.00 (you said it is only for RAM), etc. None of them
caused kernel panics. The crashes seem to happen during idle time. I do
not use AMD's Cool'n'Quiet and PowerNow-K8.
--
"To conquer the world, we must be as meticulous and calculating as a
colony of ants on the march." --Julius Caesar
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or
Ant is currently not listening to any songs on his home computer.

**Yousuf Khan[_2_]** · March 9th 10, 04:59 PM posted to comp.sys.ibm.pc.hardware.chips

Ant wrote:
On 3/8/2010 7:01 PM PT, Yousuf Khan typed:

If that PSU failure took out so much other hardware, then it's likely it
took out your processor too, and it took longer for it to finally fail.
CPU chips tend to be more robust than memory chips and GPU chips, a lot
more redundancy, so they may show the signs of the failure much later.

Ah, that could be it. So far, a 512 MB stick of RAM and the video card went
bust with the PSU. Too bad my friend and I did not see any physical evidence
of busted caps, discoloration, etc.

Those may yet come, after much time. But in reality, caps can be much
more robust than any of the electronic components. The CPU and RAM may
run anywhere between 1.0 and 2.0 volts, so a spike of even 0.1 V is
significant to them. A capacitor is just a very simple electrical
component, and a small spike won't kill it. A damaged capacitor might
still continue to work in diminished capacity for a long time.

In actual fact, the motherboard capacitors are there to protect against
voltage spikes to some extent. So the fact that they didn't really protect
these components might be an indication that they may already be
damaged and just working at diminished capacity right now.

Your original PSU problem, what caused it? Lightning? Or did it just go
on its own for some unknown reason? If it went on its own, then it's
likely it caused this level of damage to your entire system. The PSU
also has capacitors in it, designed to protect against voltage spikes. A
surge suppressing power bar also helps protect along the way, with
capacitors. Each one acts like a flood dike. A lightning strike may
overwhelm the surge suppressor, and then it will overwhelm the PSU, but
the PSU has fuses that will sacrifice themselves and thus protect the
motherboard and internal components. If the PSU didn't do that fast
enough, then it may have let over-voltage through.

Or possibly, the PSU itself was the cause of the overvoltage. Was it an
old PSU that failed? Certain PSU size calculator sites make a provision
for systems that are left on 24 hours a day for years on end. They reduce
its capacity rating by up to 40% for such a situation!

Yousuf Khan

**[email protected]** · March 10th 10, 01:21 AM posted to comp.sys.ibm.pc.hardware.chips

Your original PSU problem, what caused it? Lightning? Or did it just go
on its own for some unknown reason? If it went on its own, then it's
likely it caused this level of damage to your entire system. The PSU
also has capacitors in it, designed to protect against voltage spikes. A
surge suppressing power bar also helps protect along the way, with
capacitors. Each one acts like a flood dike. A lightning strike may
overwhelm the surge suppressor, and then it will overwhelm the PSU, but
the PSU has fuses that will sacrifice themselves and thus protect the
motherboard and internal components. If the PSU didn't do that fast
enough, then it may have let over-voltage through.

Here is what I remember before the PSU went dead.

1. A few days before it, I smelled something burning but couldn't figure
out what.

2. A few days later, the computer went dead. It didn't want to boot up.
The drive light blinked like crazy when the computer was on.

3. My friend and I investigated and narrowed it down to a dead PSU. However,
the computer still wouldn't boot up. We tried another motherboard of the
SAME model. Same thing. We tried an older motherboard with an Athlon 754
single core CPU. No problems!

4. After more testing, we found that the EVGA GeForce 8800 GT was the
problem preventing the motherboard from booting up. That explains why the
motherboard beeped a few times without it. With it, nothing. :/ We RMA'ed
it and got a fixed one.

5. Got everything back. Then, kernel panics once in a while (it usually
takes 5-8 days to reproduce, and usually during idle times from what I
noticed)! Note that I never had them before getting things back
together. I assume it was the PSU incident that started it.

6. We ran memtest86+ v4.00 and it found errors. My friend and I narrowed
it down to a 512 MB stick and removed it. Retested the rest and got no
errors.

7. We assumed things were fine now after finding the bad RAM. NOPE,
after a week or so, more kernel panics! Reran memtest86+ overnight twice
and got no errors. Great, something else is wrong then.

Or possibly, the PSU itself was the cause of the overvoltage. Was it an
old PSU that failed? Certain PSU size calculator sites make a provision
for systems that are left on 24 hours a day for years on end. They reduce
its capacity rating by up to 40% for such a situation!

The new Antec PSU? How can I check for that? I think I already shared my
machine specifications:
http://alpha.zimage.com/~ant/antfarm.../computers.txt (secondary). If
not, then see that link.

Also note that I have a UPS behind the computer (and another desktop).
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )

**Yousuf Khan[_2_]** · March 10th 10, 07:18 AM posted to comp.sys.ibm.pc.hardware.chips

wrote:
Or possibly, the PSU itself was the cause of the overvoltage. Was it an
old PSU that failed? Certain PSU size calculator sites make a provision
for systems that are left on 24 hours a day for years on end. They reduce
its capacity rating by up to 40% for such a situation!

The new Antec PSU? How can I check for that? I think I already shared my
machine specifications:
http://alpha.zimage.com/~ant/antfarm.../computers.txt (secondary). If
not, then see that link.

No, not the new PSU, the old one that created all of the commotion in
the first place. Just wondering what the cause of that original failure
was. Since you got a new PSU, it's not likely the cause of any of the
current problems. The current CPU problem is probably a leftover from
that original failure.

Yousuf Khan

**Ant[_2_]** · April 25th 10, 07:11 AM posted to comp.sys.ibm.pc.hardware.chips

The crashes seem to happen during idle time. I do
not use AMD's Cool'n'Quiet and PowerNow-K8.

FYI. For the first time, I got a kernel panic while I was using my computer.
Mostly, I was surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not
tied to idle time then.

**Ant[_3_]** · April 25th 10, 01:37 PM posted to comp.sys.ibm.pc.hardware.chips

On 4/24/2010 11:11 PM PT, Ant typed:

The crashes seem to happen during idle time. I do
not use AMD's Cool'n'Quiet and PowerNow-K8.

FYI. For the first time, I got a kernel panic while I was using my computer.
Mostly, I was surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not
tied to idle time then.

And another. Grr.
--
"Have I told you how much I like ants, huh? Especially fried in a subtle
blend of mech fluid and grated gears?" --Rampage to Inferno,
"Transmutate" in Transformers (Beast Wars)
/\___/\ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
/ /\ /\ \ Ant's Quality Foraged Links: http://aqfl.net
| |o o| |
\ _ / If crediting, then use Ant nickname and AQFL URL/link.
( ) If e-mailing, then axe ANT from its address if needed.
Ant is currently not listening to any songs on this computer.

**Yousuf Khan[_2_]** · April 26th 10, 04:10 PM posted to comp.sys.ibm.pc.hardware.chips

Ant wrote:
On 4/24/2010 11:11 PM PT, Ant typed:

The crashes seem to happen during idle time. I do
not use AMD's Cool'n'Quiet and PowerNow-K8.

FYI. For the first time, I got a kernel panic while I was using my computer.
Mostly, I was surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not
tied to idle time then.

And another. Grr.

It's probably getting worse. Might be time to think about replacement.

Yousuf Khan

**Robert Redelmeier** · March 9th 10, 04:49 AM posted to comp.sys.ibm.pc.hardware.chips

Ant wrote in part:
part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? [snip]

Although, this might be related to the PSU's death back in early
December 2009. My friend and I believe it also took out my EVGA
GeForce 8800 GT video card and damaged a 512 MB stick of RAM (we tested
all 3 GB together and then each piece with memtest86+ v4.00 to narrow it down).

As Yousuf has mentioned, any PSU failure serious enough to
damage RAM could easily damage the CPU. Especially AMD, with
the RAM controller and buses inside the CPU.

http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of
the details of my systems. I did run memtest86+ again a couple of
weeks ago and this morning for 5-6 hours, but got no errors
after five full passes. I also do not overclock/OC.

memtest86 is a good program, but it is more extensive than intensive.
It tests all memory, but not especially hard. If you want to
diagnose further, you could try running a few dozen copies of my
`burnMMX P`. It is a bit old and not quite as high bandwidth as
possible on newer processors. If there is no error, the copies should
stay running indefinitely. Watch for terminations and/or dmesg.
Running them under `nice -19` should increase TLB transitions.
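
The "run a few dozen copies and watch for terminations" routine can be sketched in Python along these lines. This is only an illustration under stated assumptions: the `./burnMMX` path, the default of 24 copies, and every function name are hypothetical, and `nice -n 19` is used on the assumption that it is the current spelling of the `nice -19` invocation mentioned above.

```python
import subprocess
import time


def launch_copies(command, copies):
    """Start `copies` instances of `command` (a list of arguments)."""
    return [subprocess.Popen(command) for _ in range(copies)]


def terminated(procs):
    """Return (index, exit_code) for every instance that has exited."""
    return [(i, p.returncode) for i, p in enumerate(procs)
            if p.poll() is not None]


def watch(procs, interval=60.0):
    """Poll until at least one instance exits, then report which ones did."""
    while True:
        dead = terminated(procs)
        if dead:
            return dead
        time.sleep(interval)


def run_burn_test(binary="./burnMMX", copies=24):
    """Hypothetical driver: the binary path and copy count are assumptions."""
    command = ["nice", "-n", "19", binary, "P"]
    workers = launch_copies(command, copies)
    try:
        for index, code in watch(workers):
            print(f"copy {index} exited with code {code}; check dmesg")
    finally:
        # Stop whatever is still running once a failure has been seen.
        for p in workers:
            if p.poll() is None:
                p.terminate()
```

Calling `run_burn_test()` blocks for as long as every copy keeps running, which is the healthy case; it returns only after some copy exits, at which point dmesg is the next place to look.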

-- Robert author `cpuburn` http://pages.sbcglobal.net/redelm
