#61
"TLB parity error in virtual array; TLB error 'instruction"?
Robert Redelmeier wrote in part:
> Bah, the error came back again after my tests:
>
> dmesg:
> [32399.988020] Machine check events logged
>
> From /var/log/messages:
> Mar 12 14:45:16 foobar kernel: [32399.988020] Machine check events logged
> Mar 12 14:45:16 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
> Mar 12 14:45:16 foobar mcelog: Please contact your hardware vendor
> Mar 12 14:45:16 foobar mcelog: MCE 0
> Mar 12 14:45:16 foobar mcelog: CPU 1 1 instruction cache
> Mar 12 14:45:16 foobar mcelog: ADDR c11b6ff0
> Mar 12 14:45:16 foobar mcelog: TIME 1268433916 Fri Mar 12 14:45:16 2010
> Mar 12 14:45:16 foobar mcelog: TLB parity error in virtual array
> Mar 12 14:45:16 foobar mcelog: TLB error 'instruction transaction, level 1'
> Mar 12 14:45:16 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
> Mar 12 14:45:16 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
> Mar 12 14:45:16 foobar mcelog: CPUID Vendor AMD Family 15 Model 43
>
>> Noting the addr is in kernel space and the instruction cache,
>> this is going to take much ingenuity to replicate
>
> You just gave me an idea:
> # cat /var/log/messages |grep MCGSTATUS
> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
[snip]

No, this is the status word, where the bits have meanings. The ADDR line tells you where the error occurred; 0xC+ is kernel space on most kernels.

Having a better look through your logs, I see this addr is very common (almost all errs are at this addr). Aren't you curious about the instruction that produced the errors? /boot/System.map should contain the addresses of all kernel functions, and there should be some way to look up modules.

-- Robert
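Robert's suggestion amounts to a "nearest symbol at or below the address" search over System.map. Below is a minimal Python sketch of that lookup, assuming the usual System.map line format `<hex address> <type> <name>` and keeping only code symbols (types `t`/`T`). The sample entries are the ones Ant quotes from his System.map-2.6.32-trunk-686 in the next post; `parse_system_map` and `lookup` are hypothetical helper names, not an existing tool. It only covers the core kernel image: addresses inside loadable modules are not listed in System.map.

```python
import bisect

# Excerpt of System.map entries quoted in this thread (address, type, name).
SAMPLE_MAP = """\
c104e2f9 T tick_handle_periodic
c104e360 T tick_get_broadcast_device
c1063e1b t stop_cpu
c1063ec6 T stop_machine_destroy
c11b6fb8 T acpi_pm_read_verified
c11b6ffc t acpi_pm_read
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) for code symbols in System.map text."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address, sym_type, name = fields[0], fields[1], fields[2]
        if sym_type in ("t", "T"):  # 't'/'T' mark symbols in the text (code) section
            symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, address):
    """Return 'name+0xoffset' for the closest symbol at or below address, else None."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return f"{name}+{address - start:#x}"


if __name__ == "__main__":
    table = parse_system_map(SAMPLE_MAP)
    for addr in (0xC104E3F0, 0xC106E8C0, 0xC11B6FF0):
        print(f"{addr:08x} -> {lookup(table, addr)}")
```

With this excerpt, c11b6ff0 resolves to acpi_pm_read_verified+0x38 and c104e3f0 to tick_get_broadcast_device+0x90. An offset of tens of kilobytes, as c106e8c0 gets here (stop_machine_destroy+0xa9fa), most likely means the real enclosing function is simply missing from the excerpt.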
#62
On 3/13/2010 9:28 PM PT, Robert Redelmeier typed:
> Having a better look through your logs, I see this addr is very common
> (almost all errs are at this addr). Aren't you curious about the
> instruction that produced the errors? /boot/System.map should contain
> the addresses of all kernel functions, and there should be some way to
> look up modules.

I did a "cat /var/log/messages |grep ADDR" and found these addresses:

c104e3f0
c106e8c0
c11b6ff0 (most common)

But none of them matched an entry in /boot/System.map-2.6.32-trunk-686 exactly. Here are the nearest addresses around each one:

c104e2f9 T tick_handle_periodic
c104e360 T tick_get_broadcast_device

c1063e1b t stop_cpu
c1063ec6 T stop_machine_destroy

c11b6fb8 T acpi_pm_read_verified
c11b6ffc t acpi_pm_read

So the common one is in ACPI code. Hmm!

# locate acpi_pm
/usr/src/linux-headers-2.6.30-2-common/include/linux/acpi_pmtmr.h
/usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h

# more /usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h
#ifndef _ACPI_PMTMR_H_
#define _ACPI_PMTMR_H_

#include <linux/clocksource.h>

/* Number of PMTMR ticks expected during calibration run */
#define PMTMR_TICKS_PER_SEC 3579545

/* limit it to 24 bits */
#define ACPI_PM_MASK CLOCKSOURCE_MASK(24)

/* Overrun value */
#define ACPI_PM_OVRRUN	(1<<24)

#ifdef CONFIG_X86_PM_TIMER

extern u32 acpi_pm_read_verified(void);
extern u32 pmtmr_ioport;

static inline u32 acpi_pm_read_early(void)
{
	if (!pmtmr_ioport)
		return 0;
	/* mask the output to 24 bits */
	return acpi_pm_read_verified() & ACPI_PM_MASK;
}

extern void pmtimer_wait(unsigned);

#else

static inline u32 acpi_pm_read_early(void)
{
	return 0;
}

#endif

#endif

Hmm, what is using ACPI then?
# lsof |grep acpi
kacpid 22 root cwd DIR 3,1 1024 2 /
kacpid 22 root rtd DIR 3,1 1024 2 /
kacpid 22 root txt unknown /proc/22/exe
kacpi_not 23 root cwd DIR 3,1 1024 2 /
kacpi_not 23 root rtd DIR 3,1 1024 2 /
kacpi_not 23 root txt unknown /proc/23/exe
kacpi_hot 24 root cwd DIR 3,1 1024 2 /
kacpi_hot 24 root rtd DIR 3,1 1024 2 /
kacpi_hot 24 root txt unknown /proc/24/exe
acpid 1986 root cwd DIR 3,1 1024 2 /
acpid 1986 root rtd DIR 3,1 1024 2 /
acpid 1986 root txt REG 3,6 34684 353719 /usr/sbin/acpid
acpid 1986 root mem REG 3,1 1331496 14245 /lib/libc-2.10.2.so
acpid 1986 root mem REG 3,1 117416 14243 /lib/ld-2.10.2.so
acpid 1986 root 0u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 1u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 2u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 3r CHR 13,64 0t0 4005 /dev/input/event0
acpid 1986 root 4r CHR 13,65 0t0 4012 /dev/input/event1
acpid 1986 root 5r CHR 13,66 0t0 4016 /dev/input/event2
acpid 1986 root 6r DIR 0,10 0 1 inotify
acpid 1986 root 7u sock 0,6 0t0 5680 can't identify protocol
acpid 1986 root 8u unix 0xf5749c00 0t0 5681 /var/run/acpid.socket
acpid 1986 root 9u unix 0xf52ad400 0t0 7044 /var/run/acpid.socket
acpid 1986 root 10u unix 0xf6fef800 0t0 5683 socket
acpid 1986 root 11u unix 0xf5eb1200 0t0 1585927 /var/run/acpid.socket
acpid 1986 root 12u unix 0xf543a000 0t0 1585931 /var/run/acpid.socket
hald-addo 2632 haldaemon txt REG 3,6 11604 401855 /usr/lib/hal/hald-addon-acpi

I looked around on my Debian installation and found an acpid package, so I uninstalled it to see what happens... FYI:

# apt-get remove acpi
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package acpi is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 126 not upgraded.
foobar:/home/ant/download# apt-cache show acpid
Package: acpid
Priority: optional
Section: admin
Installed-Size: 196
Maintainer: Debian Acpi Team
Architecture: i386
Version: 1:2.0.2-1
Depends: libc6 (>= 2.4), lsb-base (>= 3.2-14), module-init-tools (>> 3.1-rel-2)
Recommends: acpi-support-base (>= 0.114-1)
Filename: pool/main/a/acpid/acpid_2.0.2-1_i386.deb
Size: 48204
MD5sum: f7a607fe746c5503f364ef82cd47cbd8
SHA1: 7fac7cedade5d17f6644da1cff1bdafc10d798b3
SHA256: 852fe7a6ac15d4c11a0d9df2739b34dab3307a3b96ffb9a96029a1b0e23cca81
Description: Advanced Configuration and Power Interface event daemon
 Modern computers support the Advanced Configuration and Power Interface
 (ACPI) to allow intelligent power management on your system and to query
 battery and configuration status.
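The header Ant pasted says the ACPI PM timer ticks at PMTMR_TICKS_PER_SEC = 3579545 Hz and is masked to 24 bits (ACPI_PM_MASK). A minimal Python sketch of what those two constants imply, assuming only the values in that header: the counter wraps roughly every 4.69 seconds, and the elapsed tick count between two raw reads has to be taken modulo 2^24 so that one wraparound is handled. The function names are illustrative, not kernel APIs.

```python
# Constants copied from include/linux/acpi_pmtmr.h as quoted above.
PMTMR_TICKS_PER_SEC = 3579545
ACPI_PM_MASK = (1 << 24) - 1  # CLOCKSOURCE_MASK(24): the low 24 bits


def pm_ticks_elapsed(earlier, later):
    """Ticks between two raw 24-bit PM timer reads; correct across at most one wrap."""
    return (later - earlier) & ACPI_PM_MASK


def ticks_to_seconds(ticks):
    """Convert a PM timer tick count to seconds."""
    return ticks / PMTMR_TICKS_PER_SEC


# Time for the 24-bit counter to run through all of its values once.
WRAP_PERIOD_S = ticks_to_seconds(ACPI_PM_MASK + 1)

if __name__ == "__main__":
    print(f"wrap period: {WRAP_PERIOD_S:.3f} s")  # about 4.687 s
    # Reads of 0xFFFFF0 then 0x000010 are 32 ticks apart, not a negative interval.
    print(pm_ticks_elapsed(0xFFFFF0, 0x000010))
```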
#63
On 3/13/2010 9:18 PM PT, Robert Redelmeier typed:
>> Hmm, lsmod |grep acpid showed nothing.
>
> It won't -- acpid is a daemon, not a module. It shows on the process
> task list (ps/top), not lsmod. But it is unlikely to be the cause if
> cpufreq modules aren't loaded.

Ah OK. Here you go after uninstalling acpid:

# ps aux |grep acpi
root        22  0.0  0.0      0     0 ?        S    Mar12   0:00 [kacpid]
root        23  0.0  0.0      0     0 ?        S    Mar12   0:00 [kacpi_notify]
root        24  0.0  0.0      0     0 ?        S    Mar12   0:00 [kacpi_hotplug]
108       2632  0.0  0.0   3240  1084 ?        S    Mar12   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket

Don't know where kacpi is coming from. Kernel?

--
"Don't step on ants... they're people too." --a quote from ANTZ movie.
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
Nuke ANT from e-mail address: NT
or Ant is currently not listening to any songs on his home computer.
#64
Ant wrote in part:
> Don't know where kacpi is coming from. Kernel?

Yes. rmmod anything that looks like acpi. You can Google AMD MCE ACPI ERRATA.

-- Robert
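Robert's "Yes" can be checked from userspace: on Linux, kernel threads such as kacpid have an empty /proc/<pid>/cmdline, which is why ps shows them in square brackets. A minimal Python sketch, assuming a Linux /proc layout and reading the process name from the `Name:` line of /proc/<pid>/status; the helper names are hypothetical, and because zombie processes also have an empty cmdline, the result is a heuristic rather than a definitive test.

```python
import os


def is_kernel_thread(cmdline):
    """Heuristic: kernel threads (and zombies) have an empty /proc/<pid>/cmdline."""
    return len(cmdline) == 0


def process_name(status_text):
    """Extract the value of the 'Name:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return None


def kernel_threads(proc_root="/proc"):
    """Return sorted (pid, name) pairs for processes that look like kernel threads."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                cmdline = f.read()
            with open(os.path.join(base, "status")) as f:
                name = process_name(f.read())
        except OSError:
            continue  # the process exited or is not readable
        if is_kernel_thread(cmdline):
            found.append((int(entry), name))
    return sorted(found)


if __name__ == "__main__":
    for pid, name in kernel_threads():
        if name and "acpi" in name:
            print(pid, name)
```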
#65
On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:
>> Don't know where kacpi is coming from. Kernel?
>
> Yes. rmmod anything that looks like acpi. You can Google AMD MCE ACPI ERRATA.

OK, if I still need to dig deeper. It has been almost seven hours (including losing an hour to the PST to PDT change) and no new errors in the logs. Maybe we hit the jackpot. Crossing my antennae.

Maybe I should reboot/re-add these modules later too, because of the rmmod earlier:

rmmod cpufreq_powersave
rmmod cpufreq_userspace
rmmod cpufreq_stats
rmmod cpufreq_conservative

--
"... [Let us inquire] what glory there was in an omnipotent being torturing forever a puny little creature who could in no way defend himself? Would it be to the glory of a man to fry ants?" --Charlotte Perkins Gilman
#66
On 3/14/2010 9:17 AM PT, Ant typed:
> OK, if I still need to dig deeper. It has been almost seven hours
> (including losing an hour to the PST to PDT change) and no new errors in
> the logs. Maybe we hit the jackpot. Crossing my antennae.

Nope, my luck failed (dang it!):

[134549.988029] Machine check events logged
Mar 14 14:19:23 foobar kernel: [134549.988029] Machine check events logged
Mar 14 14:19:23 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 14 14:19:23 foobar mcelog: Please contact your hardware vendor
Mar 14 14:19:23 foobar mcelog: MCE 0
Mar 14 14:19:23 foobar mcelog: CPU 1 1 instruction cache
Mar 14 14:19:23 foobar mcelog: ADDR c11b6ff0
Mar 14 14:19:23 foobar mcelog: TIME 1268601563 Sun Mar 14 14:19:23 2010
Mar 14 14:19:23 foobar mcelog: TLB parity error in virtual array
Mar 14 14:19:23 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 14 14:19:23 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 14 14:19:23 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 14 14:19:23 foobar mcelog: CPUID Vendor AMD Family 15 Model 43

--
"You know what you are Earl? You're a little, tiny, busy ant. You too, Mike. Both you guys, with your mortgages and your term life insurance and your webber kettles(??). Ant. Ant. All of you, you're all a bunch of little, busy, blind ants. All you all. Saving up for your rainy days. Scratching up your acorns for the winter. You look at me and you think, "What a piece of pathetic trash out there in that leaky trailer." No spoon, no fork, no prospects. But, you know why? Cause I'm a grasshopper. Ant. Grasshopper. Ant. Grasshopper. Ant. Grasshopper. Ant. Grasshopper. Ant!" --Chris in the bar, before being thrown out in "Jaws of Life".
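Robert's remark earlier in the thread that STATUS is a word "where the bits have meanings" can be made concrete. A minimal Python sketch, assuming the architectural MCi_STATUS layout described in the AMD and Intel manuals (flag bits 63 to 57, model-specific bits 31 to 16, MCA error code in bits 15 to 0) and decoding only the TLB error-code form `0000 0000 0001 TTLL`. It is not mcelog's decoder, and `decode_status` is a hypothetical helper; the model-specific field is reported raw rather than interpreted.

```python
# Flag bits in the upper byte of MCi_STATUS (architectural layout).
STATUS_FLAGS = [
    (63, "VAL"),    # the register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # reporting of this error was enabled
    (59, "MISCV"),  # MCi_MISC holds additional information
    (58, "ADDRV"),  # MCi_ADDR holds the address (mcelog's ADDR line)
    (57, "PCC"),    # processor context corrupt
]

TRANSACTION_TYPE = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
CACHE_LEVEL = {0b00: "level 0", 0b01: "level 1", 0b10: "level 2", 0b11: "generic"}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags, error kind and model-specific bits."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    error_code = status & 0xFFFF
    model_specific = (status >> 16) & 0xFFFF
    if (error_code & 0xFFF0) == 0x0010:  # TLB error: 0000 0000 0001 TTLL
        tt = (error_code >> 2) & 0b11
        ll = error_code & 0b11
        kind = (f"TLB error, {TRANSACTION_TYPE.get(tt, 'reserved')} transaction, "
                f"{CACHE_LEVEL[ll]}")
    else:
        kind = f"error code {error_code:#06x} (not decoded here)"
    return {"flags": flags, "kind": kind, "model_specific": model_specific}


if __name__ == "__main__":
    print(decode_status(0x9400000000010011))
```

For the logged value 9400000000010011 this gives VAL, EN and ADDRV set with UC clear (so the CPU logged it as a corrected error with a valid address), and the error code 0x0011 decodes to an instruction-transaction, level 1 TLB error, which matches mcelog's own text.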
#67
On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:
>> Don't know where kacpi is coming from. Kernel?
>
> Yes. rmmod anything that looks like acpi. You can Google AMD MCE ACPI ERRATA.

I am not familiar with hardware and don't know what other modules to remove:

$ lsmod
Module                  Size  Used by
binfmt_misc             4875  1
ppdev                   4058  0
lp                      5570  0
parport                22554  2 ppdev,lp
vboxnetadp              5154  0
vboxnetflt             10202  0
vboxdrv               114333  2 vboxnetadp,vboxnetflt
xt_tcpudp               1743  92
xt_limit                1088  2
nf_conntrack_ipv4       7597  59
nf_defrag_ipv4           779  1 nf_conntrack_ipv4
xt_state                 927  59
ipt_LOG                 3570  2
ipt_REJECT              1517  2
nf_conntrack_irc        2499  0
nf_conntrack_ftp        4260  0
nf_conntrack           37775  4 nf_conntrack_ipv4,xt_state,nf_conntrack_irc,nf_conntrack_ftp
iptable_filter          1790  1
ip_tables               7690  1 iptable_filter
x_tables                8335  6 xt_tcpudp,xt_limit,xt_state,ipt_LOG,ipt_REJECT,ip_tables
dm_snapshot            17953  0
dm_mirror               9639  0
dm_region_hash          5612  1 dm_mirror
dm_log                  6369  2 dm_mirror,dm_region_hash
dm_mod                 45854  3 dm_snapshot,dm_mirror,dm_log
hwmon_vid               1528  0
fuse                   43554  1
loop                    9721  0
snd_intel8x0           19523  1
snd_ac97_codec         79136  1 snd_intel8x0
ac97_bus                 710  1 snd_ac97_codec
snd_pcm_oss            28479  0
snd_mixer_oss          10461  1 snd_pcm_oss
snd_pcm                47350  3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_seq_midi            3480  0
snd_rawmidi            12313  1 snd_seq_midi
snd_seq_midi_event      3684  1 snd_seq_midi
snd_seq                35303  2 snd_seq_midi,snd_seq_midi_event
snd_timer              12258  2 snd_pcm,snd_seq
snd_seq_device          3673  3 snd_seq_midi,snd_rawmidi,snd_seq
snd                    33551  11 snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
soundcore               3450  1 snd
snd_page_alloc          4977  2 snd_intel8x0,snd_pcm
evdev                   5609  7
pcspkr                  1207  0
nvidia               9712528  38
agpgart                19516  1 nvidia
serio_raw               2916  0
psmouse                44409  0
i2c_nforce2             4464  0
k8temp                  2411  0
i2c_core               12612  2 nvidia,i2c_nforce2
processor              25803  0
ext3                   93828  9
jbd                    31965  1 ext3
mbcache                 3762  1 ext3
usbhid                 26784  2
hid                    50545  1 usbhid
ide_cd_mod             21044  0
cdrom                  26487  1 ide_cd_mod
ide_gd_mod             17103  12
ohci_hcd               16804  0
ide_pci_generic         1924  0
ata_generic             2015  0
amd74xx                 3552  11
ehci_hcd               27230  0
e100                   22217  0
floppy                 40923  0
mii                     2714  1 e100
sata_nv                15386  0
button                  3598  0
libata                113728  2 ata_generic,sata_nv
ide_core               63850  4 ide_cd_mod,ide_gd_mod,ide_pci_generic,amd74xx
scsi_mod              101073  1 libata
forcedeth              40709  0
usbcore                97930  6 usbhid,ohci_hcd,ehci_hcd
nls_base                4541  1 usbcore
thermal                 9206  0
fan                     2586  0
thermal_sys             9378  3 processor,thermal,fan

--
"Applied mathematics will always need pure mathematics, just as anteaters will always need ants." --Paul Halmos
#68
Ant wrote in part:
> On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:
>>> Don't know where kacpi is coming from. Kernel?
>>
>> Yes. rmmod anything that looks like acpi. You can Google AMD MCE ACPI ERRATA.
>
> I am not familiar with hardware and don't know what other modules to remove:
> $ lsmod

None of these really look like ACPI. It might well be compiled into the kernel. Recompiling a kernel is actually very easy, but reconfiguring the kernel is a bit difficult.

You may be able to disable ACPI with a kernel boot parameter (given during boot) like `acpi=off`.

-- Robert
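"Anything that looks like acpi" is, in effect, a substring match over module names. A minimal Python sketch of that check against lsmod output, assuming the standard layout (Module, Size, Used by, with a header row); the sample rows are a few lines from Ant's listing, and `module_names` and `matching` are hypothetical helpers. Name matching is only a heuristic: as far as I know, on kernels of this era the ACPI processor, thermal, fan and button drivers are built as modules named plainly `processor`, `thermal`, `fan` and `button`, so a module can be ACPI-related without saying so.

```python
# A few rows from the lsmod output above (header plus a subset of modules).
SAMPLE_LSMOD = """\
Module                  Size  Used by
k8temp                  2411  0
processor              25803  0
button                  3598  0
thermal                 9206  0
fan                     2586  0
thermal_sys             9378  3 processor,thermal,fan
"""


def module_names(lsmod_text):
    """Return the module names from lsmod output, skipping the header row."""
    names = []
    for line in lsmod_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def matching(names, keywords):
    """Return the names that contain any keyword, ignoring case."""
    return [n for n in names if any(k.lower() in n.lower() for k in keywords)]


if __name__ == "__main__":
    names = module_names(SAMPLE_LSMOD)
    print(matching(names, ["acpi", "cpufreq"]))  # [] for these rows
    print(matching(names, ["therm"]))            # ['thermal', 'thermal_sys']
```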
#69
On 3/15/2010 8:22 AM PT, Robert Redelmeier typed:
> Ant wrote in part:
>> On 3/14/2010 8:19 AM PT, Robert Redelmeier typed:
>>>> Don't know where kacpi is coming from. Kernel?
>>>
>>> Yes. rmmod anything that looks like acpi. You can Google AMD MCE ACPI ERRATA.
>>
>> I am not familiar with hardware and don't know what other modules to remove:
>> $ lsmod
>
> None of these really look like ACPI. It might well be compiled into the
> kernel. Recompiling a kernel is actually very easy, but reconfiguring the
> kernel is a bit difficult.

I remember reconfiguring the kernel during my Red Hat days. Oh my goodness, that was such a pain, since I had NO idea what each part was for! So I never touched it again.

> You may be able to disable ACPI with a kernel boot parameter (given
> during boot) like `acpi=off`.

I assume that's done via the GRUB loader. Does ACPI only do power management, or are there other things?

--
"Oh, good morning, my little worker ants! That's just a figure of speech; I would NEVER compare you to insects. At least not after that sensitivity training seminar those maggots at the network forced me to attend!" --Kay, Murphy Brown
#70
Ant wrote in part:
> I remember reconfiguring the kernel during my Red Hat days. Oh my goodness,
> that was such a pain, since I had NO idea what each part was for! So I
> never touched it again.

As I said, a bit complex. It helps if you know _exactly_ what hardware you have.

> I assume that's done via the GRUB loader. Does ACPI only do power
> management, or are there other things?

Yes, holding [TAB] or some other key during boot should bring up a command line. ACPI only does power, but that has tentacles into many hardware devices.

-- Robert