#1
"TLB parity error in virtual array; TLB error 'instruction"?
Hello.
Lately, I have been getting rare, random kernel panics on my old Debian Linux box (I tried both kernel versions 2.6.30 and 2.6.32). I couldn't figure out what it was until I discovered mcelog a couple of days ago, and it revealed some interesting, scary data in my dmesg/messages and syslog:

# cat /var/log/messages
....
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but which part(s) are bad? Is my old Athlon 64 X2 CPU dying or damaged? I have had it and its motherboard since 12/24/2006, so it is not that old yet. I have the full details on my secondary machine at http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB piece of RAM (we tested the full 3 GB and each piece with memtest86+ v4.00 to narrow it down). http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems.

I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, and got no errors after five full tests (passed). I also do not overclock.

Thank you in advance.
--
"Above ground I shall be food for kites; below I shall be food for mole-crickets and ants. Why rob one to feed the other?" --Juang-zu (4th Century B.C.)
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
Ant is currently not listening to any songs on his home computer.
#2
Ant wrote:
> Hello.
>
> Lately, I have been getting rare, random kernel panics on my old Debian Linux box (I tried both kernel versions 2.6.30 and 2.6.32). I couldn't figure out what it was until I discovered mcelog a couple of days ago, and it revealed some interesting, scary data in my dmesg/messages and syslog:
>
> # cat /var/log/messages
> ...
> Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
> Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
> Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
> Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
> Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
> Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
> Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
> Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
> Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
> Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
> Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
> Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43
>
> I am not familiar with hardware, so I assume this is very bad, but which part(s) are bad? Is my old Athlon 64 X2 CPU dying or damaged? I have had it and its motherboard since 12/24/2006, so it is not that old yet. I have the full details on my secondary machine at http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

Yeah, TLB stands for Translation Lookaside Buffer; it's the part of the processor that caches virtual-to-physical address translations for memory pages. I'm not sure what they are referring to when they talk about "virtual array", unless it has something to do with OS virtualization. In any case, if your TLB is damaged, then various programs will fail if their memory pages get tracked by that TLB entry.

> Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB piece of RAM (we tested the full 3 GB and each piece with memtest86+ v4.00 to narrow it down). http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems. I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, and got no errors after five full tests (passed). I also do not overclock. Thank you in advance.

If that PSU failure took out so much other hardware, then it's likely it took out your processor too, and it just took longer for it to finally fail. CPU chips tend to be more robust than memory chips and GPU chips, with a lot more redundancy, so they may show signs of failure much later.

Memtest86+ won't find faults inside the CPU; it only tests for faults in the RAM.

Yousuf Khan
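The STATUS value in the quoted mcelog output can also be picked apart by hand. The following is a minimal sketch, not anything from mcelog itself: it assumes the standard x86 machine-check layout for the top bits of a bank status register (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57) and the architectural TLB error code pattern `0000 0000 0001 TTLL` in the low 16 bits. The function and table names are made up for illustration, and the model-specific bits in between are deliberately left undecoded.

```python
# Flag bits at the top of an x86 machine-check bank status register
# (layout assumed from the architectural machine-check definition).
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier record was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the bank's MISC register holds extra data
    "ADDRV": 58,  # the bank's ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}

TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def decode_status(status):
    """Split a 64-bit status value into flag bits and the low error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    # TLB errors use the bit pattern 0000 0000 0001 TTLL.
    if (code & 0xFFF0) == 0x0010:
        result["kind"] = "TLB"
        result["transaction"] = TRANSACTION.get((code >> 2) & 3, "reserved")
        result["level"] = LEVEL[code & 3]
    return result


decoded = decode_status(0x9400000000010011)  # STATUS from the log above
for name, is_set in decoded["flags"].items():
    print(name, "set" if is_set else "clear")
print(decoded.get("kind"), decoded.get("transaction"), decoded.get("level"))
```

For the value in the log, VAL, EN and ADDRV come out set while OVER, UC, MISCV and PCC are clear, and the low 16 bits decode to an instruction TLB error at level 1, which agrees with the "instruction transaction, level 1" line and with the presence of an ADDR line in the mcelog output.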
#3
Ant wrote in part:
> part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged?

[snip]

> Although, this might be related to the PSU's death back in early December 2009. My friend and I believe it also took out my EVGA GeForce 8800 GT video card and damaged a 512 MB piece of RAM (we tested the full 3 GB and each piece with memtest86+ v4.00 to narrow it down).

As Yousuf has mentioned, any PSU failure serious enough to damage RAM could easily damage the CPU, especially on AMD, where the RAM controller and busses are inside the CPU.

> http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the details of my systems. I did run memtest86+ again a couple of weeks ago and this morning for 5-6 hours, and got no errors after five full tests (passed). I also do not overclock.

memtest86 is a good program, but it is more extensive than intensive. It tests all memory, but not especially hard.

If you want to diagnose further, you could try running a few dozen copies of my `burnMMX P`. It is a bit old and does not reach as high a bandwidth as is possible on newer processors.

If there is no error, they should stay running indefinitely. Watch for terminations and/or messages in dmesg. Running them under `nice -19` should increase TLB transitions.

-- Robert author `cpuburn` http://pages.sbcglobal.net/redelm
#4
On 3/8/2010 7:01 PM PT, Yousuf Khan typed:
> If that PSU failure took out so much other hardware, then it's likely it took out your processor too, and it took longer for it to finally fail. CPU chips tend to be more robust than memory chips and GPU chips, a lot more redundancy, so they may show the signs of the failure much later.

Ah, that could be it. So far, a 512 MB piece of RAM and the video card went bust with the PSU. Too bad my friend and I did not see any physical evidence such as busted caps, discoloration, etc.

> Memtest86+ won't find faults inside the CPU, it only tests for faults in the RAM.

What's a good way to test the CPU? I tried sys_basher, unraring 10 GB of data, memtest86+ v4.00 (which you said is only for RAM), etc. None of them caused kernel panics. The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet or PowerNow-K8.

--
"To conquer the world, we must be as meticulous and calculating as a colony of ants on the march." --Julius Caesar
#5
On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
> As Yousuf has mentioned, any PSU failure serious enough to damage RAM could easily damage the CPU. Especially AMD with the RAM controller and busses inside the CPU.

Damn. Do Intel CPUs do better with this?

> memtest86 is a good pgm, but it is more extensive than intensive. It tests all memory, but not especially hard.
>
> If you want to diagnose further, you could try running a few dozen copies of my `burnMMX P`. It is a bit old and not quite as high bandwidth as possible on newer processors.
>
> If there is no error, they should stay running indefinitely. Watch for terminations and/or dmesg. Run by `nice -19` should increase TLB transitions.
>
> -- Robert author `cpuburn` http://pages.sbcglobal.net/redelm

Thanks. I will try it when I don't need to use the box. You should update your program to support the newer processors.

--
"Is this stuff any good for ants?" "No, it kills them." --unknown
#6
On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
> If you want to diagnose further, you could try running a few dozen copies of my `burnMMX P`. It is a bit old and not quite as high bandwidth as possible on newer processors.
>
> If there is no error, they should stay running indefinitely. Watch for terminations and/or dmesg.
>
> -- Robert author `cpuburn` http://pages.sbcglobal.net/redelm

I ran two "burnMMX P" processes (without nice -19), since I have a dual-core Socket 939 CPU, and saw no crashes or errors after 45.25 minutes. Before I aborted both, I checked the temperatures, which look OK for a 75°F room:

$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1: +71.2°F (crit = +206.2°F)

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +129.2°F
Core1 Temp: +104.0°F

--
"It is said that the lonely eagle flies to the mountain peaks while the lowly ant crawls the ground, but cannot the soul of the ant soar as high as the eagle?" --unknown
#7
On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
> `burnMMX P`. It is a bit old and not quite as high bandwidth as possible on newer processors.
>
> If there is no error, they should stay running indefinitely. Watch for terminations and/or dmesg. Run by `nice -19` should increase TLB transitions.

I hope I am doing this correctly for nice -19. I ran one "nice -19 ../burnMMX p" command for over 25.5 minutes with no problems or errors.

$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1: +71.2°F (crit = +206.2°F)

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +122.0°F
Core1 Temp: +98.6°F

Or was I supposed to run two of them?

--
"I like ants, in chocolate. Crunch, hummmm." --unknown
#8
Ant wrote in part:
> On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
>
> I ran two "burnMMX P" (did not use nice -19) processes since I have a dual core 939 CPU and no crashes and errors after 45.25 minutes. Before I aborted both, I checked the temperatures which look OK for a 75 degrees(F) room:
>
> $ sensors -f
> acpitz-virtual-0
> Adapter: Virtual device
> temp1: +71.2°F (crit = +206.2°F)
>
> k8temp-pci-00c3
> Adapter: PCI adapter
> Core0 Temp: +129.2°F
> Core1 Temp: +104.0°F

You must have good cooling. `burnMMX P` only exercises 64 MB of RAM. You should run about _forty_ (40) of them. That also uses up more TLB entries and, with nice -19, causes more task switching.

It won't run any hotter (maybe slightly cooler with all the task switching), but it will exercise more of the TLB. At the very least, run 3 or 5 copies.

-- Robert author `cpuburn` http://pages.sbcglobal.net/redelm
#9
Ant wrote in part:
> On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
>> As Yousef has mentioned, any PSU failure serious enough to damage RAM could easily damage the CPU. Especially AMD with the RAM controller and busses inside the CPU.
>
> Damn. Intel CPUs does better with this?

Not really. With an Intel CPU, the same spike would fry the northbridge along with the RAM. It depends on whether you would rather fry your mobo or your CPU. The CPU is easier to replace, but it also often costs more.

> Thanks. I will try it when I don't need to use the box. You should update your program to support the newer processors.

Probably. But I have neither the time nor the need. I'm still running on the same box as in 1999.

-- Robert author `cpuburn` http://pages.sbcglobal.net/redelm
#10
On 3/9/2010 6:01 AM PT, Robert Redelmeier typed:
> Ant wrote in part:
>> On 3/8/2010 8:49 PM PT, Robert Redelmeier typed:
>>
>> I ran two "burnMMX P" (did not use nice -19) processes since I have a dual core 939 CPU and no crashes and errors after 45.25 minutes. Before I aborted both, I checked the temperatures which look OK for a 75 degrees(F) room:
>>
>> $ sensors -f
>> acpitz-virtual-0
>> Adapter: Virtual device
>> temp1: +71.2°F (crit = +206.2°F)
>>
>> k8temp-pci-00c3
>> Adapter: PCI adapter
>> Core0 Temp: +129.2°F
>> Core1 Temp: +104.0°F
>
> You must have good cooling. `burnMMX P` only exercises 64 MB of RAM. You should run about _forty_ (40) of them. This also eats up more TLB entries and more switching with nice -19 .

40 copies?! Is there an easy way to run them all with one command or something? I had to start the two copies manually earlier.

> It won't run any hotter, (maybe slightly cooler with all the task switching) but will exercise more of the TLB. At the very least run 3 or 5 copies.

OK, I am fine with 40, but I don't want to manually copy and paste 40 times.
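One way to start that many copies without pasting the command forty times is a short launcher script. This is only a sketch under stated assumptions: the helper names `launch_copies` and `exited` are hypothetical, the binary is assumed to sit at `./burnMMX` in the current directory, and raising the niceness by 19 in the child is meant to mirror the `nice -19` suggestion from earlier in the thread.

```python
import os
import subprocess


def launch_copies(command, copies=40, nice_increment=19):
    """Start `copies` background instances of `command` at lowered priority."""
    def lower_priority():
        # Runs in the child just before exec; raises its niceness by
        # nice_increment, much like prefixing the command with `nice -n 19`.
        os.nice(nice_increment)

    return [subprocess.Popen(command, preexec_fn=lower_priority)
            for _ in range(copies)]


def exited(procs):
    """Return (pid, returncode) for every instance that has stopped."""
    return [(p.pid, p.returncode) for p in procs if p.poll() is not None]
```

A session might look like `procs = launch_copies(["./burnMMX", "P"])`, followed by an occasional `exited(procs)` call while keeping an eye on dmesg. Since the copies are expected to keep running indefinitely when nothing is wrong, any entry in that list is worth a look. Calling `p.terminate()` on each process ends the run.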