#1
"TLB parity error in virtual array; TLB error 'instruction"?
Hello.
Lately, I have been getting random and rare kernel panics on my old Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I couldn't figure out what it was until I discovered mcelog a couple of days ago, and it revealed interesting, scary data in my dmesg/messages and syslog:

# cat /var/log/messages
....
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but which part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had it and its motherboard since 12/24/2006, so it is not that old yet. I have the full details on my secondary machine at http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB stick of RAM (tested all 3 GB together and each piece separately with memtest86+ v4.00 to narrow it down). http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems.

I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, but got no errors after five full tests (passed). I also do not overclock/OC.

Thank you in advance.
--
"Above ground I shall be food for kites; below I shall be food for mole-crickets and ants. Why rob one to feed the other?" --Juang-zu (4th Century B.C.)
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
Ant is currently not listening to any songs on his home computer.
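For anyone reading along, the STATUS value in the log can be unpacked by hand. Below is a minimal sketch in Python, assuming the usual x86 machine-check status layout (valid flag in bit 63, uncorrected flag in bit 61, address-valid flag in bit 58, the low 16 bits as the error code, and TLB errors encoded as 0000 0000 0001 TTLL). The helper names `bit` and `decode_status` and the dictionary keys are made up for illustration; they are not part of mcelog, and treating bits 16-31 as one model-specific field is a simplification.

```python
# Hypothetical helper for illustration only; not part of mcelog.
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status):
    """Split a 64-bit machine-check status word into named fields."""
    code = status & 0xFFFF  # low 16 bits: architectural error code
    fields = {
        "valid": bit(status, 63),
        "overflow": bit(status, 62),
        "uncorrected": bit(status, 61),
        "enabled": bit(status, 60),
        "misc_valid": bit(status, 59),
        "addr_valid": bit(status, 58),
        "context_corrupt": bit(status, 57),
        "mca_code": code,
        # Assumption: bits 16-31 treated as one model-specific field.
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    # TLB errors follow the pattern 0000 0000 0001 TTLL.
    if (code & 0xFFF0) == 0x0010:
        fields["kind"] = "TLB"
        fields["transaction"] = TRANSACTION.get((code >> 2) & 3, "reserved")
        fields["level"] = LEVEL[code & 3]
    else:
        fields["kind"] = "other"
    return fields


# STATUS value copied from the log in this post.
for name, value in decode_status(0x9400000000010011).items():
    print(name, value)
```

With the value from the log, the low bits decode to a TLB error with an instruction transaction at level 1, which agrees with the text mcelog printed. The valid, enabled and address-valid flags are set and the uncorrected flag is clear in this particular record.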
#2
Ant wrote:
> Hello.
>
> Lately, I have been getting random and rare kernel panics on my old Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I couldn't figure out what it was until I discovered mcelog a couple of days ago, and it revealed interesting, scary data in my dmesg/messages and syslog:
>
> # cat /var/log/messages
> ...
> Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
> Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
> Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
> Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
> Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
> Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
> Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
> Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
> Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
> Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
> Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
> Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43
>
> I am not familiar with hardware, so I assume this is very bad, but which part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had it and its motherboard since 12/24/2006, so it is not that old yet. I have the full details on my secondary machine at http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Yeah, TLB stands for Translation Lookaside Buffer; it's the part of the processor that caches the translations from virtual memory pages to physical ones. I'm not sure what they are referring to when they talk about "virtual array", unless it has something to do with OS virtualization. In any case, if your TLB is damaged, then various programs will fail if their memory pages get tracked by that TLB entry.

> Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB stick of RAM (tested all 3 GB together and each piece separately with memtest86+ v4.00 to narrow it down). http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems.
>
> I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, but got no errors after five full tests (passed). I also do not overclock/OC.
>
> Thank you in advance.

If that PSU failure took out so much other hardware, then it's likely it took out your processor too, and it just took longer for it to finally fail. CPU chips tend to be more robust than memory chips and GPU chips, with a lot more redundancy, so they may show the signs of failure much later.

Memtest86+ won't find faults inside the CPU; it only tests for faults in the RAM.

Yousuf Khan
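As background on the "parity error" wording in the log: a parity check stores one extra bit per entry recording whether the entry holds an odd or even number of 1 bits, so a single flipped bit shows up as a mismatch when the entry is read back. Here is a toy sketch in Python. The entry value, the function names and the idea of treating an entry as a plain integer are all assumptions for illustration; a real TLB array is laid out differently in hardware.

```python
def parity(value):
    """Return 1 if value has an odd number of 1 bits, else 0."""
    return bin(value).count("1") % 2


def store(entry):
    """Pair an entry with the parity bit computed when it is written."""
    return entry, parity(entry)


def check(entry, stored_parity):
    """Return True if the entry still matches its stored parity bit."""
    return parity(entry) == stored_parity


entry, p = store(0xC11B6)            # hypothetical page number
print(check(entry, p))               # intact entry passes the check
print(check(entry ^ (1 << 7), p))    # one flipped bit is detected
print(check(entry ^ 0b11, p))        # two flipped bits cancel out
```

The last line shows the known limit of a single parity bit: it catches an odd number of flipped bits but not an even number.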
#3
On 3/8/2010 7:01 PM PT, Yousuf Khan typed:
> If that PSU failure took out so much other hardware, then it's likely it took out your processor too, and it just took longer for it to finally fail. CPU chips tend to be more robust than memory chips and GPU chips, with a lot more redundancy, so they may show the signs of failure much later.

Ah, that could be it. So far, a 512 MB stick of RAM and the video card went bust with the PSU. Too bad my friend and I did not see physical evidence of busted caps, discoloration, etc.

> Memtest86+ won't find faults inside the CPU; it only tests for faults in the RAM.

What's a good way to test the CPU? I tried sys_basher, unraring 10 GB of data, memtest86+ v4.00 (you said it is only for RAM), etc. None of them caused kernel panics. The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8.

--
"To conquer the world, we must be as meticulous and calculating as a colony of ants on the march." --Julius Caesar
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
Ant is currently not listening to any songs on his home computer.
#4
Ant wrote:
> On 3/8/2010 7:01 PM PT, Yousuf Khan typed:
>> If that PSU failure took out so much other hardware, then it's likely it took out your processor too, and it just took longer for it to finally fail. CPU chips tend to be more robust than memory chips and GPU chips, with a lot more redundancy, so they may show the signs of failure much later.
>
> Ah, that could be it. So far, a 512 MB stick of RAM and the video card went bust with the PSU. Too bad my friend and I did not see physical evidence of busted caps, discoloration, etc.

Those may yet come, after much time. But in reality, caps can be much more robust than any of the electronic components. The CPU and RAM may run anywhere between 1.0 and 2.0 volts, so a spike of even 0.1 V is significant to them. A capacitor is just a very simple electrical component, and a small spike won't kill it. A damaged capacitor might still continue to work in diminished capacity for a long time.

In fact, the motherboard capacitors are there to protect against voltage spikes to some extent. So the fact that they didn't really protect these components might be an indication that they are already damaged and just working in diminished capacity right now.

Your original PSU problem, what caused it? Lightning? Or did it just go on its own for some unknown reason? If it went on its own, then it's likely it caused this level of damage to your entire system. The PSU also has capacitors in it, designed to protect against voltage spikes. A surge-suppressing power bar also helps protect along the way, with capacitors. Each one acts like a flood dike. A lightning strike may overwhelm the surge suppressor, and then it will overwhelm the PSU, but the PSU has fuses that will sacrifice themselves and thus protect the motherboard and internal components. If the PSU didn't do that fast enough, then it may have let over-voltage through.

Or possibly, the PSU itself was the cause of the overvoltage. Was it an old PSU that failed? Certain PSU size calculator sites make a provision for systems that are left on 24 hours a day for years on end. They reduce the PSU's capacity rating by up to 40% for such a situation!

Yousuf Khan
#5
> Your original PSU problem, what caused it? Lightning? Or did it just go on its own for some unknown reason? If it went on its own, then it's likely it caused this level of damage to your entire system. The PSU also has capacitors in it, designed to protect against voltage spikes. A surge-suppressing power bar also helps protect along the way, with capacitors. Each one acts like a flood dike. A lightning strike may overwhelm the surge suppressor, and then it will overwhelm the PSU, but the PSU has fuses that will sacrifice themselves and thus protect the motherboard and internal components. If the PSU didn't do that fast enough, then it may have let over-voltage through.

Here is what I remember before the PSU went dead.

1. A few days before it, I smelled something burning but couldn't figure out what.
2. A few days later, the computer went dead. It didn't want to boot up. The drive light blinked like crazy when the computer was on.
3. My friend and I investigated and narrowed it down to a dead PSU. However, the computer still wouldn't boot up. We tried another motherboard of the SAME model. Same thing. We tried an older motherboard with an Athlon 754 single core CPU. No problems!
4. After more testing, we found that the EVGA GeForce 8800 GT was what prevented the motherboard from booting up. That explains why the motherboard beeped a few times without it. With it, nothing. :/ We RMA'ed it and got a fixed one.
5. Got everything back. Then, kernel panics once in a while (usually takes 5-8 days to reproduce and usually during idle times from what I noticed)! Note that I never had them before getting things back together. I assume it was the PSU incident that started it.
6. We ran memtest86+ v4.00 and it found errors. My friend and I narrowed it down to a 512 MB piece and removed it. Tested all of the rest and no errors again.
7. We assumed things were fine now after finding the bad RAM. NOPE, after a week or so, more kernel panics! Reran memtest86 overnight twice and no errors. Great, something else is wrong then.

> Or possibly, the PSU itself was the cause of the overvoltage. Was it an old PSU that failed? Certain PSU size calculator sites make a provision for systems that are left on 24 hours a day for years on end. They reduce the PSU's capacity rating by up to 40% for such a situation!

The new Antec PSU? How can I check for that? I think I already shared my machine specifications: http://alpha.zimage.com/~ant/antfarm.../computers.txt (secondary). If not, then see that link. Also note that I have a UPS behind the computer (and another desktop).

--
"We are anthill men upon an anthill world." --Ray Bradbury
Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links (AQFL): http://aqfl.net
Please remove ANT if replying by e-mail.
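Since the panics only show up every several days, one way to keep track of them is to pull the machine-check records out of the syslog and line up their timestamps. The following is a rough sketch in Python, assuming the log lines look like the mcelog excerpt from the first post. The function name `mce_times` and the `SAMPLE` text are illustrative only; on a real system the text would be read from the log file instead.

```python
import re
from datetime import datetime, timezone

# Matches the epoch value on lines such as "mcelog: TIME 1267979124 ...".
TIME_RE = re.compile(r"mcelog: TIME (\d+)")


def mce_times(log_text):
    """Return the UTC datetime of each mcelog TIME line in log_text."""
    return [
        datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        for m in TIME_RE.finditer(log_text)
    ]


# Lines copied from the log in the first post.
SAMPLE = """\
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
"""

for when in mce_times(SAMPLE):
    print(when.isoformat())
```

The TIME value in the sample converts to 16:25:24 UTC on 2010-03-07, which lines up with the 08:25:24 local stamp if the machine's clock was eight hours behind UTC. Collecting these over a few weeks would show whether the events cluster around anything in particular.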
#7
> The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8.

FYI. For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.
#8
On 4/24/2010 11:11 PM PT, Ant typed:
>> The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8.
>
> FYI. For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.

And another. Grr.

--
"Have I told you how much I like ants, huh? Especially fried in a subtle blend of mech fluid and grated gears?" --Rampage to Inferno, "Transmutate" in Transformers (Beast Wars)
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
If crediting, then use Ant nickname and AQFL URL/link.
If e-mailing, then axe ANT from its address if needed.
Ant is currently not listening to any songs on this computer.
#9
Ant wrote:
> On 4/24/2010 11:11 PM PT, Ant typed:
>>> The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8.
>>
>> FYI. For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.
>
> And another. Grr.

It's probably getting worse. Might be time to think about replacement.

Yousuf Khan
#10
Ant wrote in part:
> part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged?
[snip]
> Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB stick of RAM (tested all 3 GB together and each piece separately with memtest86+ v4.00 to narrow it down).

As Yousuf has mentioned, any PSU failure serious enough to damage RAM could easily damage the CPU, especially an AMD with the RAM controller and buses inside the CPU.

> http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems. I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, but got no errors after five full tests (passed). I also do not overclock/OC.

memtest86 is a good program, but it is more extensive than intensive. It tests all memory, but not especially hard.

If you want to diagnose further, you could try running a few dozen copies of my `burnMMX P`. It is a bit old and not quite as high-bandwidth as possible on newer processors. If there is no error, they should stay running indefinitely. Watch for terminations and/or dmesg. Running them under `nice -19` should increase TLB transitions.

-- Robert
author `cpuburn` http://pages.sbcglobal.net/redelm
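The "run a few dozen copies and watch for terminations" approach can be scripted. Here is a minimal sketch in Python; the function names, the polling interval and the count of 24 copies in the commented example are assumptions made for illustration, and it presumes `burnMMX` is installed and on the PATH. `nice -n 19` is used as the POSIX spelling of the `nice -19` mentioned above.

```python
import subprocess
import time


def launch(cmd, copies):
    """Start `copies` instances of cmd (a list of argv strings)."""
    return [subprocess.Popen(cmd) for _ in range(copies)]


def poll_terminated(procs):
    """Return (index, returncode) for every process that has exited."""
    return [
        (i, p.returncode)
        for i, p in enumerate(procs)
        if p.poll() is not None
    ]


def watch(procs, interval=60.0, max_checks=None):
    """Poll until some process exits or max_checks polls have passed."""
    checks = 0
    while max_checks is None or checks < max_checks:
        done = poll_terminated(procs)
        if done:
            return done
        time.sleep(interval)
        checks += 1
    return []


# Example use (left commented out because it needs burnMMX installed):
# procs = launch(["nice", "-n", "19", "burnMMX", "P"], copies=24)
# print(watch(procs))
```

Any entry returned by `watch` means a copy stopped on its own, which is the signal to go and check dmesg for a matching machine-check record.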