A computer components & hardware forum. HardwareBanter


"TLB parity error in virtual array; TLB error 'instruction"?



 
 
  #71  
Old March 15th 10, 08:27 PM posted to comp.sys.ibm.pc.hardware.chips
Jerry Peters
external usenet poster
 
Posts: 71
Default "TLB parity error in virtual array; TLB error 'instruction"?

Robert Redelmeier wrote:
Ant wrote in part:
I remember reconfiguring the kernel during my Red Hat days. Oh my
goodness, that was such a pain to reconfigure since I had NO
idea what each part was for! So I never touched it again.


As I said, a bit complex. It helps if you know _exactly_ what
hardware you have.

I assume that's done via the grub loader. Does ACPI only do power
management or are there other things?


Yes, holding [TAB] or some other key during boot should
bring up a command line.

ACPI only does power, but that has tentacles into many
hardware devices.

-- Robert


No, ACPI is also involved with hardware configuration: Advanced
*Configuration* & Power Interface.

Jerry
  #72  
Old March 16th 10, 12:10 AM posted to comp.sys.ibm.pc.hardware.chips
Ant
external usenet poster
 
Posts: 191
Default "TLB parity error in virtual array; TLB error 'instruction"?

I remember reconfiguring the kernel during my Red Hat days. Oh my
goodness, that was such a pain to reconfigure since I had NO
idea what each part was for! So I never touched it again.


As I said, a bit complex. It helps if you know _exactly_ what
hardware you have.


Yeah. I knew some hardware basics, but not the details like chipsets.


I assume that's done via the grub loader. Does ACPI only do power
management or are there other things?


Yes, holding [TAB] or some other key during boot should
bring up a command line.

ACPI only does power, but that has tentacles into many
hardware devices.


Sheesh, so complex.
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
  #73  
Old March 16th 10, 03:04 PM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Jerry Peters wrote in part:
No, ACPI is also involved with hardware configuration:
Advanced *Configuration* & Power Interface.


That was the intent, a replacement for PnP, however AFAIK Linux
_only_ implements the power features, and even has trouble with
that. Linus has been known to rail against ACPI.

-- Robert

  #74  
Old March 16th 10, 08:02 PM posted to comp.sys.ibm.pc.hardware.chips,comp.os.linux.hardware
Ant
external usenet poster
 
Posts: 191
Default "TLB parity error in virtual array; TLB error 'instruction"?

Having a better look through your logs, I see this addr is
very common (almost all errs are at this addr). Aren't
you curious about the instruction that produced the errors?
/boot/System.map should contain the addr of all kernel fns,
and there should be some way to lookup modules.


I did a "cat /var/log/messages |grep ADDR" and found these addresses:
c104e3f0
c106e8c0
c11b6ff0 (most common)

But none of them matched an entry in /boot/System.map-2.6.32-trunk-686.
Here are the closest addresses around each one:

c104e2f9 T tick_handle_periodic
c104e360 T tick_get_broadcast_device

c1063e1b t stop_cpu
c1063ec6 T stop_machine_destroy

c11b6fb8 T acpi_pm_read_verified
c11b6ffc t acpi_pm_read


After a kernel upgrade (2.6.32-3 from -2 trunk) yesterday morning,
I noticed a new address in my /var/log/messages (only one so far):
Mar 16 05:41:16 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 16 05:41:16 foobar mcelog: Please contact your hardware vendor
Mar 16 05:41:16 foobar mcelog: MCE 0
Mar 16 05:41:16 foobar mcelog: CPU 1 1 instruction cache
Mar 16 05:41:16 foobar mcelog: ADDR c104e570
Mar 16 05:41:16 foobar mcelog: TIME 1268743276 Tue Mar 16 05:41:16 2010
Mar 16 05:41:16 foobar mcelog: TLB parity error in virtual array
Mar 16 05:41:16 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 16 05:41:16 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 16 05:41:16 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 16 05:41:16 foobar mcelog: CPUID Vendor AMD Family 15 Model 43
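
For readers who want to see where mcelog's text comes from, the STATUS value in the log above can be picked apart by hand. The following is a minimal sketch, not mcelog's own code: it assumes the usual x86 machine check layout, in which bits 63 to 57 of the status register are flags and the low 16 bits hold an error code, with TLB errors encoded as 0000 0000 0001 TTLL. The names FLAGS, TRANSACTION, LEVEL and decode_status are made up for this illustration, and the model-specific bits 16 to 31 are deliberately left undecoded.

```python
# Flag bits in the upper part of a machine check status value
# (assumed standard x86 layout; names are illustrative only).
FLAGS = {
    63: "valid",
    62: "overflow (more than one error)",
    61: "uncorrected",
    60: "reporting enabled",
    59: "misc register valid",
    58: "address valid",
    57: "processor context corrupt",
}
TRANSACTION = ["instruction", "data", "generic", "reserved"]
LEVEL = ["level 0", "level 1", "level 2", "generic level"]


def decode_status(status_hex):
    """Return (list of set flag names, error description) for a hex STATUS string."""
    status = int(status_hex, 16)
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    # TLB errors use the bit pattern 0000 0000 0001 TTLL.
    if (code & 0xFFF0) == 0x0010:
        tt = (code >> 2) & 0x3   # transaction type
        ll = code & 0x3          # cache/TLB level
        desc = f"TLB error: {TRANSACTION[tt]} transaction, {LEVEL[ll]}"
    else:
        desc = f"other error code 0x{code:04x}"
    return flags, desc


flags, desc = decode_status("9400000000010011")
print(flags)  # ['valid', 'reporting enabled', 'address valid']
print(desc)   # TLB error: instruction transaction, level 1
```

Under these assumptions the "uncorrected" flag is clear for this particular record, and the "address valid" flag is what makes the ADDR line meaningful.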

# ls -all /boot/System.map-2.6.32-3-686
-rw-r--r-- 1 root root 1259340 2010-02-25 01:00 /boot/System.map-2.6.32-3-686

I am going to assume the contents changed in both the kernel and System.map. I did a lookup to match that c104e570 address. The closest addresses were:
# cat /boot/System.map-2.6.32-3-686 |grep c104e
c104e07d t tick_notify
c104e374 t tick_periodic
c104e3dd T tick_handle_periodic
c104e444 T tick_get_broadcast_device
c104e44a T tick_get_broadcast_mask
c104e450 T tick_is_broadcast_device
c104e464 T tick_set_periodic_handler
c104e477 T tick_get_broadcast_oneshot_mask
c104e47d T tick_broadcast_oneshot_active
c104e48a T tick_shutdown_broadcast_oneshot
c104e4ac T tick_check_oneshot_broadcast
c104e4d5 T tick_resume_broadcast_oneshot
c104e4e2 T tick_broadcast_setup_oneshot
c104e5ae T tick_broadcast_switch_to_oneshot
c104e5e0 t tick_do_broadcast
c104e634 t tick_handle_oneshot_broadcast
c104e71d t tick_do_periodic_broadcast
c104e74a T tick_broadcast_oneshot_control
c104e82c T tick_resume_broadcast
c104e8a3 T tick_device_uses_broadcast
c104e91b T tick_suspend_broadcast
c104e943 T tick_shutdown_broadcast
c104e989 t tick_handle_periodic_broadcast
c104e9ce T tick_broadcast_on_off
c104eb0e T tick_check_broadcast_device
c104eb60 T tick_oneshot_mode_active
c104eb96 T tick_switch_to_oneshot
c104ec1e T tick_init_highres
c104ec28 T tick_dev_program_event
c104eca9 T tick_setup_oneshot
c104ecd9 T tick_program_event
c104ecfc T tick_resume_oneshot
c104ed24 T tick_get_tick_sched
c104ed33 T tick_nohz_get_sleep_length
c104ed4c T tick_oneshot_notify
c104ed63 t tick_init_jiffy_update
c104edae T tick_check_oneshot_change
c104eea1 t tick_do_update_jiffies64
c104ef87 t tick_nohz_handler
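
An address reported by mcelog will rarely equal a line in System.map, because System.map lists only where each symbol starts and the faulting address normally falls somewhere inside a function. A minimal sketch of the lookup, assuming the address belongs to the core kernel image (module code is not listed in System.map) and using a few lines copied from the listing above; load_symbols and nearest_symbol are hypothetical helper names:

```python
import bisect

# A few lines copied from the System.map listing above.
SYSTEM_MAP = """\
c104e4d5 T tick_resume_broadcast_oneshot
c104e4e2 T tick_broadcast_setup_oneshot
c104e5ae T tick_broadcast_switch_to_oneshot
c104e5e0 t tick_do_broadcast
"""


def load_symbols(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    return sorted(symbols)


def nearest_symbol(symbols, addr):
    """Return (name, offset) for the last symbol starting at or before addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr lies below every known symbol
    start, name = symbols[i]
    return name, addr - start


symbols = load_symbols(SYSTEM_MAP)
name, offset = nearest_symbol(symbols, 0xC104E570)
print(f"{name}+0x{offset:x}")  # tick_broadcast_setup_oneshot+0x8e
```

With the whole file loaded instead of the excerpt, the same function could be applied to every ADDR value found in /var/log/messages.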

A quick Google search
(http://www.google.com/search?q=linux...tick+broadcast) seems to
show this is related to the APIC? Does anyone know what these ticks do
that could cause these rare and random machine errors and kernel panics?
The address seems to hang out in the broadcast area. Again, I am not
familiar with hardware.
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
  #75  
Old March 16th 10, 08:41 PM posted to comp.sys.ibm.pc.hardware.chips
Jerry Peters
external usenet poster
 
Posts: 71
Default "TLB parity error in virtual array; TLB error 'instruction"?

Robert Redelmeier wrote:
Jerry Peters wrote in part:
No, ACPI is also involved with hardware configuration:
Advanced *Configuration* & Power Interface.


That was the intent, a replacement for PnP, however AFAIK Linux
_only_ implements the power features, and even has trouble with
that. Linus has been known to rail against ACPI.

-- Robert

Wrong, Linux implements the configuration features also. Some
machines, probably newer laptops, can't be configured without ACPI.
And I'd expect that desktop machines will be getting to that point
also.
Linus hates the ACPI design, the AML language that invokes unknown and
probably buggy firmware routines. It's another "everything including
the kitchen sink" design.
I'd doubt that the OP's problem is caused by ACPI though. The TLB on
x86 is mostly hardware maintained, the OS's sole responsibility is to
purge the TLB when it changes the page tables. He's getting a parity
error in the associative array, that's a hardware problem.
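
To illustrate why a parity error points at the hardware rather than at software, here is a toy model of a parity-protected entry. It is only a sketch: the real TLB layout and parity scheme of this CPU are not described in this thread, and ToyTlbEntry and parity are invented names. The idea is that a parity bit is stored when the entry is filled and re-checked when it is read, so a mismatch means a stored bit changed on its own.

```python
def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


class ToyTlbEntry:
    """One virtual-to-physical mapping protected by a single parity bit."""

    def __init__(self, virtual_page, physical_page):
        self.virtual_page = virtual_page
        self.physical_page = physical_page
        self.parity = parity(virtual_page)  # stored when the entry is filled

    def check(self):
        """True if the stored tag still matches its stored parity bit."""
        return parity(self.virtual_page) == self.parity


entry = ToyTlbEntry(0xC104E, 0x0104E)
print(entry.check())           # True: entry is intact
entry.virtual_page ^= 1 << 7   # simulate one bit flipping in the array
print(entry.check())           # False: parity error detected
```

Nothing the OS writes to the page tables can produce the second result in this model; only a change inside the stored entry can.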

Jerry
  #76  
Old March 16th 10, 09:34 PM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Jerry Peters wrote in part:
Wrong, Linux implements the configuration features also. Some
machines, probably newer laptops, can't be configured without ACPI.


While I cannot say that _none_ of the 1000s of device modules use ACPI,
I can say that most do not need it. Not to say BIOS didn't use it.
I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did
it help when I couldn't get a device working -- something fairly
frequent under Linux, especially for wireless. Very frustrating when
`lspci` shows it. I presume some sort of device code IPL is required.

I have no problem squirting arbitrary bytes at known PCI addr[s], nor
do I imagine Linus does either, although Stallman might. But giving
execution over to foreign code in ring0 is a recipe for insecurity.
You wanna get Theo de Raadt even hotter under the collar?


I'd doubt that the OP's problem is caused by ACPI though. The TLB on
x86 is mostly hardware maintained, the OS's sole responsibility is to
purge the TLB when it changes the page tables. He's getting a parity
error in the associative array, that's a hardware problem.


Agreed it looks like a hardware problem. But the fact it arises
almost exclusively at one code address is very suspicious. Some code
there seems to be triggering some hardware "sensitivity". Especially
since the OP did not have this problem prior to a known PSU fry-fest.

There have been recent changes to the kernel in this area --
perhaps a roll-back to an earlier kernel (that gave good service
on the hardware) would be a good test. Newer is not always better.

-- Robert






  #78  
Old March 17th 10, 12:13 AM posted to comp.sys.ibm.pc.hardware.chips
Ant
external usenet poster
 
Posts: 191
Default "TLB parity error in virtual array; TLB error 'instruction"?

Wrong, Linux implements the configuration features also. Some
machines, probably newer laptops, can't be configured without ACPI.

Very frustrating when `lspci` shows it. I presume some sort of device code IPL is required.


FYI, in case it is related to my issues:
$ lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:09.0 Ethernet controller: Intel Corporation 82559 InBusiness 10/100 (rev 08)
05:00.0 VGA compatible controller: nVidia Corporation G92 [GeForce 8800 GT] (rev a2)


There have been recent changes to the kernel in this area --
perhaps a roll-back to an earlier kernel (that gave good service
on the hardware) would be a good test. Newer is not always better.


I was using the same kernel, 2.6.30, before and after the PSU incident. I
never had problems before, but started having them afterwards, unless
something else, such as related kernel updates (modules or whatever),
triggered them.

--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
  #79  
Old March 17th 10, 12:19 AM posted to comp.os.linux.hardware,comp.sys.ibm.pc.hardware.chips
Ant
external usenet poster
 
Posts: 191
Default "TLB parity error in virtual array; TLB error 'instruction"?

Does anyone know what these ticks do to cause
these rare and random machine errors and kernel panics?


No but everything about those errors looks hardware related so I'd be looking at
replacing the cpu at the very least. That looks like the most likely component
but it's not necessarily the right one - other bits that spring to mind are
motherboard, PSU and RAM.


Yeah, it is probably my CPU, since my PSU and video card went dead and a
512 MB RAM stick showed memory errors in memtest86+ v4.00 before these
problems appeared. After I replaced all of them, memtest86+ v4.00 passed
a few times, over several hours and a few days of testing (including its
test #9).
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
 



