A computer components & hardware forum. HardwareBanter


"TLB parity error in virtual array; TLB error 'instruction"?



 
 
  #61  
Old March 14th 10, 05:28 AM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Robert Redelmeier wrote in part:
Bah, the error came back again after my tests:

dmesg:
[32399.988020] Machine check events logged

From /var/log/messages:
Mar 12 14:45:16 foobar kernel: [32399.988020] Machine check events logged
Mar 12 14:45:16 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 12 14:45:16 foobar mcelog: Please contact your hardware vendor
Mar 12 14:45:16 foobar mcelog: MCE 0
Mar 12 14:45:16 foobar mcelog: CPU 1 1 instruction cache
Mar 12 14:45:16 foobar mcelog: ADDR c11b6ff0
Mar 12 14:45:16 foobar mcelog: TIME 1268433916 Fri Mar 12 14:45:16 2010
Mar 12 14:45:16 foobar mcelog: TLB parity error in virtual array
Mar 12 14:45:16 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 12 14:45:16 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 12 14:45:16 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 12 14:45:16 foobar mcelog: CPUID Vendor AMD Family 15 Model 43

Noting the addr is in kernel space and the instruction cache,
this is going to take much ingenuity to replicate


You just gave me an idea:
# cat /var/log/messages |grep MCGSTATUS
Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0


[snip] no, this is the status word where the bits have meanings.
The ADDR line tells you where the error occurred. Addresses from
0xC0000000 up are kernel space on most 32-bit x86 kernels.
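
For anyone who wants to see what "the bits have meanings" amounts to, here is a rough Python sketch that splits the logged STATUS word into fields. It assumes the common x86 machine-check layout (flag bits 63 down to 57, a model-specific code in bits 31:16, and an error code in bits 15:0 where TLB errors follow the pattern 0000 0000 0001 TTLL). The name decode_status is made up for illustration and is not part of mcelog.

```python
# Architectural flag bits in the top byte of a machine-check status word.
STATUS_FLAGS = [(63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
                (59, "MISCV"), (58, "ADDRV"), (57, "PCC")]
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def decode_status(status):
    """Split a 64-bit machine-check status word into named fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    mca_code = status & 0xFFFF             # bits 15:0, error code
    model_code = (status >> 16) & 0xFFFF   # bits 31:16, model-specific code
    decoded = {"flags": flags, "mca_code": mca_code, "model_code": model_code}
    # TLB errors use the pattern 0000 0000 0001 TTLL.
    if (mca_code & 0xFFF0) == 0x0010:
        decoded["tlb"] = (TRANSACTION.get((mca_code >> 2) & 3, "reserved"),
                          LEVEL[mca_code & 3])
    return decoded


# The STATUS value from the log above.
print(decode_status(0x9400000000010011))
```

For 9400000000010011 this reports VAL, EN and ADDRV set and UC clear, which would mean a corrected error with a valid address, and the TLB field agrees with mcelog's own "instruction transaction, level 1" line.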




Having a better look through your logs, I see this addr is
very common (almost all errs are at this addr). Aren't
you curious about the instruction that produced the errors?
/boot/System.map should contain the addr of all kernel fns,
and there should be some way to look up modules.
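
To put a number on "very common", a small Python sketch along these lines could tally the ADDR values. It assumes the syslog lines look like the mcelog lines quoted in this thread; count_addresses is a name invented here, and the sample list just reuses lines already posted.

```python
import re
from collections import Counter

# Matches lines such as "... mcelog: ADDR c11b6ff0".
ADDR_RE = re.compile(r"mcelog: ADDR ([0-9a-fA-F]+)")


def count_addresses(lines):
    """Count how often each ADDR value appears in mcelog syslog lines."""
    counts = Counter()
    for line in lines:
        match = ADDR_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


sample = [
    "Mar 12 14:45:16 foobar mcelog: ADDR c11b6ff0",
    "Mar 12 14:45:16 foobar mcelog: TLB parity error in virtual array",
    "Mar 14 14:19:23 foobar mcelog: ADDR c11b6ff0",
]
for addr, n in count_addresses(sample).most_common():
    print(addr, n)
```

On a real system the lines would come from open("/var/log/messages") instead of the sample list.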

-- Robert

  #62  
Old March 14th 10, 08:40 AM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/13/2010 9:28 PM PT, Robert Redelmeier typed:

Having a better look through your logs, I see this addr is
very common (almost all errs are at this addr). Aren't
you curious about the instruction that produced the errors?
/boot/System.map should contain the addr of all kernel fns,
and there should be some way to look up modules.


I did a "cat /var/log/messages |grep ADDR" and found these addresses:
c104e3f0
c106e8c0
c11b6ff0 (most common)

But none of them exactly matched an entry in
/boot/System.map-2.6.32-trunk-686. Here are the nearby entries for each one:

c104e2f9 T tick_handle_periodic
c104e360 T tick_get_broadcast_device

c1063e1b t stop_cpu
c1063ec6 T stop_machine_destroy

c11b6fb8 T acpi_pm_read_verified
c11b6ffc t acpi_pm_read


For the common one, it is ACPI. Hmm!
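
An exact match is not expected here, since System.map only lists where each symbol starts. The usual approach is to take the highest symbol address at or below the error address. A minimal Python sketch of that lookup, using the two System.map lines quoted above as sample data (load_symbols and enclosing_symbol are names made up for this example):

```python
import bisect


def load_symbols(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def enclosing_symbol(symbols, addr):
    """Return (name, offset) of the last symbol starting at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None  # addr lies below the first known symbol
    start, name = symbols[i]
    return name, addr - start


sample_map = [
    "c11b6fb8 T acpi_pm_read_verified",
    "c11b6ffc t acpi_pm_read",
]
print(enclosing_symbol(load_symbols(sample_map), 0xc11b6ff0))
```

With those two lines, c11b6ff0 lands 0x38 bytes into acpi_pm_read_verified. System.map does not cover loadable modules; on a running system /proc/kallsyms lists module symbols as well and has the same three leading columns.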
# locate acpi_pm
/usr/src/linux-headers-2.6.30-2-common/include/linux/acpi_pmtmr.h
/usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h
# more /usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h
#ifndef _ACPI_PMTMR_H_
#define _ACPI_PMTMR_H_

#include <linux/clocksource.h>

/* Number of PMTMR ticks expected during calibration run */
#define PMTMR_TICKS_PER_SEC 3579545

/* limit it to 24 bits */
#define ACPI_PM_MASK CLOCKSOURCE_MASK(24)

/* Overrun value */
#define ACPI_PM_OVRRUN (1<<24)

#ifdef CONFIG_X86_PM_TIMER

extern u32 acpi_pm_read_verified(void);
extern u32 pmtmr_ioport;

static inline u32 acpi_pm_read_early(void)
{
	if (!pmtmr_ioport)
		return 0;
	/* mask the output to 24 bits */
	return acpi_pm_read_verified() & ACPI_PM_MASK;
}

extern void pmtimer_wait(unsigned);

#else

static inline u32 acpi_pm_read_early(void)
{
	return 0;
}

#endif

#endif


Hmm, what is using ACPI then?
# lsof |grep acpi
kacpid 22 root cwd DIR 3,1 1024 2 /
kacpid 22 root rtd DIR 3,1 1024 2 /
kacpid 22 root txt unknown /proc/22/exe
kacpi_not 23 root cwd DIR 3,1 1024 2 /
kacpi_not 23 root rtd DIR 3,1 1024 2 /
kacpi_not 23 root txt unknown /proc/23/exe
kacpi_hot 24 root cwd DIR 3,1 1024 2 /
kacpi_hot 24 root rtd DIR 3,1 1024 2 /
kacpi_hot 24 root txt unknown /proc/24/exe
acpid 1986 root cwd DIR 3,1 1024 2 /
acpid 1986 root rtd DIR 3,1 1024 2 /
acpid 1986 root txt REG 3,6 34684 353719 /usr/sbin/acpid
acpid 1986 root mem REG 3,1 1331496 14245 /lib/libc-2.10.2.so
acpid 1986 root mem REG 3,1 117416 14243 /lib/ld-2.10.2.so
acpid 1986 root 0u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 1u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 2u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 3r CHR 13,64 0t0 4005 /dev/input/event0
acpid 1986 root 4r CHR 13,65 0t0 4012 /dev/input/event1
acpid 1986 root 5r CHR 13,66 0t0 4016 /dev/input/event2
acpid 1986 root 6r DIR 0,10 0 1 inotify
acpid 1986 root 7u sock 0,6 0t0 5680 can't identify protocol
acpid 1986 root 8u unix 0xf5749c00 0t0 5681 /var/run/acpid.socket
acpid 1986 root 9u unix 0xf52ad400 0t0 7044 /var/run/acpid.socket
acpid 1986 root 10u unix 0xf6fef800 0t0 5683 socket
acpid 1986 root 11u unix 0xf5eb1200 0t0 1585927 /var/run/acpid.socket
acpid 1986 root 12u unix 0xf543a000 0t0 1585931 /var/run/acpid.socket
hald-addo 2632 haldaemon txt REG 3,6 11604 401855 /usr/lib/hal/hald-addon-acpi

I looked around my Debian installation and found an acpid package,
so I uninstalled it to see what happens. FYI:
# apt-get remove acpi
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package acpi is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 126 not upgraded.
foobar:/home/ant/download# apt-cache show acpid
Package: acpid
Priority: optional
Section: admin
Installed-Size: 196
Maintainer: Debian Acpi Team
Architecture: i386
Version: 1:2.0.2-1
Depends: libc6 (>= 2.4), lsb-base (>= 3.2-14), module-init-tools (>> 3.1-rel-2)
Recommends: acpi-support-base (>= 0.114-1)
Filename: pool/main/a/acpid/acpid_2.0.2-1_i386.deb
Size: 48204
MD5sum: f7a607fe746c5503f364ef82cd47cbd8
SHA1: 7fac7cedade5d17f6644da1cff1bdafc10d798b3
SHA256: 852fe7a6ac15d4c11a0d9df2739b34dab3307a3b96ffb9a96029a1b0e23cca81
Description: Advanced Configuration and Power Interface event daemon
 Modern computers support the Advanced Configuration and Power Interface (ACPI)
 to allow intelligent power management on your system and to query battery and
 configuration status.
  #63  
Old March 14th 10, 08:58 AM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/13/2010 9:18 PM PT, Robert Redelmeier typed:

Hmm, lsmod |grep acpid showed nothing.


It won't -- acpid is a daemon, not a module.
It shows on the process task list (ps/top), not lsmod.

But it is unlikely to be the cause if cpufreq modules
aren't loaded.


Ah OK. Here you go after uninstalling acpid:

# ps aux |grep acpi
root 22 0.0 0.0 0 0 ? S Mar12 0:00 [kacpid]
root 23 0.0 0.0 0 0 ? S Mar12 0:00 [kacpi_notify]
root 24 0.0 0.0 0 0 ? S Mar12 0:00 [kacpi_hotplug]
108 2632 0.0 0.0 3240 1084 ? S Mar12 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket

Don't know where kacpi is coming from. Kernel?
--
"Don't step on ants... they're people too." --a quote from ANTZ movie.
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or

Ant is currently not listening to any songs on his home computer.
  #64  
Old March 14th 10, 03:19 PM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Ant wrote in part:
Don't know where kacpi is coming from. Kernel?


Yes. rmmod anything that looks like acpi.
You can Google AMD MCE ACPI ERRATA.

-- Robert

  #65  
Old March 14th 10, 04:17 PM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:

Don't know where kacpi is coming from. Kernel?


Yes. rmmod anything that looks like acpi.
You can Google AMD MCE ACPI ERRATA.


OK, if I still need to dig deeper. It has been almost seven hours
(including losing an hour from the PST to PDT change) with no new errors
in the logs. Maybe we hit the jackpot. Crossing my antennae.

Maybe I should reboot or re-add these modules later too, since I removed
them with rmmod earlier:
rmmod cpufreq_powersave
rmmod cpufreq_userspace
rmmod cpufreq_stats
rmmod cpufreq_conservative
--
"... [Let us inquire] what glory there was in an omnipotent being
torturing forever a puny little creature who could in no way defend
himself? Would it be to the glory of a man to fry ants?" --Charlotte
Perkins Gilman
  #66  
Old March 14th 10, 11:10 PM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/14/2010 9:17 AM PT, Ant typed:

OK, if I still need to dig deeper. It has been almost seven hours
(including losing an hour from the PST to PDT change) with no new errors
in the logs. Maybe we hit the jackpot. Crossing my antennae.


Nope, my luck failed (dangit!):

[134549.988029] Machine check events logged

Mar 14 14:19:23 foobar kernel: [134549.988029] Machine check events logged
Mar 14 14:19:23 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 14 14:19:23 foobar mcelog: Please contact your hardware vendor
Mar 14 14:19:23 foobar mcelog: MCE 0
Mar 14 14:19:23 foobar mcelog: CPU 1 1 instruction cache
Mar 14 14:19:23 foobar mcelog: ADDR c11b6ff0
Mar 14 14:19:23 foobar mcelog: TIME 1268601563 Sun Mar 14 14:19:23 2010
Mar 14 14:19:23 foobar mcelog: TLB parity error in virtual array
Mar 14 14:19:23 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 14 14:19:23 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 14 14:19:23 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 14 14:19:23 foobar mcelog: CPUID Vendor AMD Family 15 Model 43
--
"You know what you are Earl? You're a little, tiny, busy ant. You too,
Mike. Both you guys, with your mortgages and your term life insurance
and your webber kettles(??). Ant. Ant. All of you, you're all a bunch of
little, busy, blind ants. All you all. Saving up for your rainy days.
Scratching up your acorns for the winter. You look at me and you think,
"What a piece of pathetic trash out there in that leaky trailer." No
spoon, no fork, no prospects. But, you know why? Cause I'm a
grasshopper. Ant. Grasshopper. Ant. Grasshopper. Ant. Grasshopper. Ant.
Grasshopper. Ant!" --Chris in the bar, before being thrown out in "Jaws
of Life"."
  #67  
Old March 14th 10, 11:19 PM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:

Don't know where kacpi is coming from. Kernel?


Yes. rmmod anything that looks like acpi.
You can Google AMD MCE ACPI ERRATA.


I am not familiar with hardware and don't know which other modules to
remove:
$ lsmod
Module Size Used by
binfmt_misc 4875 1
ppdev 4058 0
lp 5570 0
parport 22554 2 ppdev,lp
vboxnetadp 5154 0
vboxnetflt 10202 0
vboxdrv 114333 2 vboxnetadp,vboxnetflt
xt_tcpudp 1743 92
xt_limit 1088 2
nf_conntrack_ipv4 7597 59
nf_defrag_ipv4 779 1 nf_conntrack_ipv4
xt_state 927 59
ipt_LOG 3570 2
ipt_REJECT 1517 2
nf_conntrack_irc 2499 0
nf_conntrack_ftp 4260 0
nf_conntrack 37775 4 nf_conntrack_ipv4,xt_state,nf_conntrack_irc,nf_conntrack_ftp
iptable_filter 1790 1
ip_tables 7690 1 iptable_filter
x_tables 8335 6 xt_tcpudp,xt_limit,xt_state,ipt_LOG,ipt_REJECT,ip_tables
dm_snapshot 17953 0
dm_mirror 9639 0
dm_region_hash 5612 1 dm_mirror
dm_log 6369 2 dm_mirror,dm_region_hash
dm_mod 45854 3 dm_snapshot,dm_mirror,dm_log
hwmon_vid 1528 0
fuse 43554 1
loop 9721 0
snd_intel8x0 19523 1
snd_ac97_codec 79136 1 snd_intel8x0
ac97_bus 710 1 snd_ac97_codec
snd_pcm_oss 28479 0
snd_mixer_oss 10461 1 snd_pcm_oss
snd_pcm 47350 3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_seq_midi 3480 0
snd_rawmidi 12313 1 snd_seq_midi
snd_seq_midi_event 3684 1 snd_seq_midi
snd_seq 35303 2 snd_seq_midi,snd_seq_midi_event
snd_timer 12258 2 snd_pcm,snd_seq
snd_seq_device 3673 3 snd_seq_midi,snd_rawmidi,snd_seq
snd 33551 11 snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
soundcore 3450 1 snd
snd_page_alloc 4977 2 snd_intel8x0,snd_pcm
evdev 5609 7
pcspkr 1207 0
nvidia 9712528 38
agpgart 19516 1 nvidia
serio_raw 2916 0
psmouse 44409 0
i2c_nforce2 4464 0
k8temp 2411 0
i2c_core 12612 2 nvidia,i2c_nforce2
processor 25803 0
ext3 93828 9
jbd 31965 1 ext3
mbcache 3762 1 ext3
usbhid 26784 2
hid 50545 1 usbhid
ide_cd_mod 21044 0
cdrom 26487 1 ide_cd_mod
ide_gd_mod 17103 12
ohci_hcd 16804 0
ide_pci_generic 1924 0
ata_generic 2015 0
amd74xx 3552 11
ehci_hcd 27230 0
e100 22217 0
floppy 40923 0
mii 2714 1 e100
sata_nv 15386 0
button 3598 0
libata 113728 2 ata_generic,sata_nv
ide_core 63850 4 ide_cd_mod,ide_gd_mod,ide_pci_generic,amd74xx
scsi_mod 101073 1 libata
forcedeth 40709 0
usbcore 97930 6 usbhid,ohci_hcd,ehci_hcd
nls_base 4541 1 usbcore
thermal 9206 0
fan 2586 0
thermal_sys 9378 3 processor,thermal,fan
--
"Applied mathematics will always need pure mathematics, just as
anteaters will always need ants." --Paul Halmos
  #68  
Old March 15th 10, 03:22 PM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Ant wrote in part:
On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:

Don't know where kacpi is coming from. Kernel?


Yes. rmmod anything that looks like acpi.
You can Google AMD MCE ACPI ERRATA.


I am not familiar with hardware and don't know which other modules to
remove:
$ lsmod


None of these really look like ACPI . It might well be compiled
into the kernel. Recompiling a kernel is actually very easy,
but reconfiguring the kernel is a bit difficult.
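
One way to tell "compiled in" from "module" without rebuilding anything is to read the kernel's build config, which many distros install as /boot/config-<version> (an assumption about this system, not something shown in the thread). A rough Python sketch; config_state is a made-up helper and the sample text is hypothetical, not taken from Ant's machine:

```python
def config_state(config_text, option):
    """Report whether a kernel config option is built-in, a module, or off."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return "disabled"
        if line.startswith(option + "="):
            value = line.split("=", 1)[1]
            return {"y": "built-in", "m": "module"}.get(value, value)
    return "absent"


# Hypothetical excerpt in the usual kernel .config format.
sample = """\
CONFIG_ACPI=y
CONFIG_ACPI_FAN=m
# CONFIG_ACPI_DEBUG is not set
"""
for option in ("CONFIG_ACPI", "CONFIG_ACPI_FAN", "CONFIG_ACPI_DEBUG"):
    print(option, config_state(sample, option))
```

An option reported as built-in cannot be removed with rmmod, which would leave a boot parameter or a rebuild as the remaining choices.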

You may be able to disable acpi with a kernel boot parameter
(given during boot) like `acpi=off`.

-- Robert

  #69  
Old March 15th 10, 03:57 PM posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/15/2010 8:22 AM PT, Robert Redelmeier typed:

Ant wrote in part:
On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:

Don't know where kacpi is coming from. Kernel?

Yes. rmmod anything that looks like acpi.
You can Google AMD MCE ACPI ERRATA.


I am not familiar with hardware and don't know which other modules to
remove:
$ lsmod


None of these really look like ACPI . It might well be compiled
into the kernel. Recompiling a kernel is actually very easy,
but reconfiguring the kernel is a bit difficult.


I remember reconfiguring the kernel during my Red Hat days. Oh my goodness,
that was such a pain to reconfigure since I had NO idea what each part
was for! So I never touched it again.


You may be able to disable acpi with a kernel boot parameter
(given during boot) like `acpi=off`.


I assume that's done via the grub loader. Does ACPI only do power
management, or are there other things?
--
"Oh, good morning, my little worker ants! That's just a figure of
speech; I would NEVER compare you to insects. At least not after that
sensitivity training seminar those maggots at the network forced me to
attend!" --Kay, Murphy Brown
  #70  
Old March 15th 10, 06:29 PM posted to comp.sys.ibm.pc.hardware.chips
Robert Redelmeier
external usenet poster
 
Posts: 316
Default "TLB parity error in virtual array; TLB error 'instruction"?

Ant wrote in part:
I remember reconfiguring the kernel during my Red Hat days. Oh my
goodness, that was such a pain to reconfigure since I had NO
idea what each part was for! So I never touched it again.


As I said, a bit complex. It helps if you know _exactly_ what
hardware you have.

I assume that's done via the grub loader. Does ACPI only do power
management, or are there other things?


Yes, holding [TAB] or some other key during boot should
bring up a command line.

ACPI mostly does power management, plus some hardware
configuration, and that has tentacles into many hardware devices.

-- Robert
 



