A computer components & hardware forum. HardwareBanter


"TLB parity error in virtual array; TLB error 'instruction"?



 
 
  #81  
March 17th 10, 09:10 PM, posted to comp.sys.ibm.pc.hardware.chips
Jerry Peters
external usenet poster
Posts: 71

Robert Redelmeier wrote:
Jerry Peters wrote in part:
Wrong, Linux implements the configuration features also. Some
machines, probably newer laptops, can't be configured without ACPI.


While I cannot say that _none_ of the 1000s of device modules use ACPI,
I can say that most do not need it. Not to say BIOS didn't use it.
I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did
it help when I couldn't get a device working -- something fairly
frequent under Linux, especially for wireless. Very frustrating when
`lspci` shows it. I presume some sort of device code IPL is required.

I have no problem squirting arbitrary bytes at known PCI addr[s], nor
do I imagine Linus does either, although Stallman might. But giving
execution over to foreign code in ring0 is a recipe for insecurity.
You wanna get Theo de Raadt even hotter under the collar?


IIRC some of the newer systems need it to enumerate multiple CPUs. On
some laptops ACPI is needed to control screen brightness (then there
are the laptops that report ACPI events for screen brightness *and*
change it via firmware).

Yeah, it's a really crappy design, APM was much simpler.
What about SMI? That's even scarier. Or the trusted computing stuff.
ACPI is typical over-design engaged in by large companies. IBM used to
be famous for it.



I doubt that the OP's problem is caused by ACPI, though. The TLB on
x86 is mostly hardware maintained; the OS's sole responsibility is to
purge the TLB when it changes the page tables. He's getting a parity
error in the associative array, and that's a hardware problem.
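
For readers who want to see how a "TLB error 'instruction'" report like the one in the thread title is derived, here is a minimal Python sketch of decoding the architectural fields of an x86 machine-check status value. It assumes the standard MCi_STATUS layout (valid bit 63, overflow bit 62, uncorrected bit 61, context-corrupt bit 57) and the compound TLB error code pattern 0000 0000 0001 TTLL. The function name `decode_tlb_error` and the example status value are hypothetical, and model-specific bits (the ones that would identify a particular array) are not decoded.

```python
# Bit positions in an x86 machine-check MCi_STATUS register (architectural).
MCI_STATUS_VAL = 1 << 63   # register contains a valid error
MCI_STATUS_OVER = 1 << 62  # an earlier error was overwritten
MCI_STATUS_UC = 1 << 61    # error was not corrected by hardware
MCI_STATUS_PCC = 1 << 57   # processor context may be corrupt

TRANSACTION_TYPES = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
CACHE_LEVELS = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "generic"}


def decode_tlb_error(status):
    """Return a dict describing a TLB-class machine check, or None."""
    if not status & MCI_STATUS_VAL:
        return None
    code = status & 0xFFFF
    # TLB errors use the compound error code 0000 0000 0001 TTLL.
    if code & 0xFFF0 != 0x0010:
        return None
    return {
        "transaction": TRANSACTION_TYPES.get((code >> 2) & 0b11, "reserved"),
        "level": CACHE_LEVELS[code & 0b11],
        "uncorrected": bool(status & MCI_STATUS_UC),
        "overflow": bool(status & MCI_STATUS_OVER),
        "context_corrupt": bool(status & MCI_STATUS_PCC),
    }


# Hypothetical status value, not taken from the OP's logs.
example = MCI_STATUS_VAL | MCI_STATUS_UC | 0x0010
print(decode_tlb_error(example))
```

None of these fields are filled in by the OS; the CPU latches them when its own checking logic trips, which is consistent with reading the report as a hardware fault.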


Agreed it looks like a hardware problem. But the fact it arises
almost exclusively at one code address is very suspicious. Some code
there seems to be triggering some hardware "sensitivity". Especially
since the OP did not have this problem prior to a known PSU fry-fest.

There have been recent changes to the kernel in this area --
perhaps a roll-back to an earlier kernel (that gave good service
on the hardware) would be a good test. Newer is not always better.


If I had to guess, it might be that the kernel uses large page
mappings for itself rather than the standard 4k page size.
Another possibility is that only a particular bit pattern triggers the MC.
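
The large-page guess can be checked roughly on a running system. The sketch below assumes an x86 Linux kernel that prints `DirectMap4k` / `DirectMap2M` style lines in /proc/meminfo (whether the OP's kernel does so is an assumption), and the helper name `direct_map_sizes` and the sample text are hypothetical.

```python
def direct_map_sizes(meminfo_text):
    """Map page-size label (e.g. '4k', '2M') to kB of direct-mapped memory."""
    sizes = {}
    for line in meminfo_text.splitlines():
        if line.startswith("DirectMap"):
            name, value = line.split(":", 1)
            # Strip the 'DirectMap' prefix and keep the numeric kB figure.
            sizes[name[len("DirectMap"):]] = int(value.split()[0])
    return sizes


# Hypothetical sample; on a real machine pass open("/proc/meminfo").read().
SAMPLE_MEMINFO = """MemTotal:        2059368 kB
DirectMap4k:        8192 kB
DirectMap2M:     1040384 kB
"""
print(direct_map_sizes(SAMPLE_MEMINFO))
```

A large figure on the 2M line would mean most of the kernel's linear mapping is served by large-page TLB entries, which is the case being speculated about here.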

No one seems to be complaining on LKML about machine checks in the TLB
with recent kernels, and the fact that other hardware was damaged,
probably by overvoltage, would cause me to think it's hardware.

Jerry
  #82  
March 18th 10, 04:06 AM, posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
Posts: 756

On 3/17/2010 1:08 PM PT, Robert Redelmeier typed:

wrote in part:
I was using the same Kernel 2.6.30 before and after the PSU
incident. I never had problems before, but started having
problems after. Unless something else like related kernel
updates (modules or whatever) started them.


This really points towards a hardware failure. As a general
rule, the modules are only updated when the kernel changes.
I suppose someone could try the MS approach of "device drivers"
on a more-or-less static kernel, but that historically has not been
the Linux approach. New kernels come out relatively frequently, so
it is not a big deal to wait and upgrade everything. Note this does
not apply for foreign modules (like nvidia), but you did not mention
upgrading -- or did you do something when you changed vidcard?


Interesting. I am still having difficulties reproducing the errors and
kernel panics outside of my Debian. I tried memtest86+ v4.00 three times
and KNOPPIX v6.2.1 CD so far, and nothing. I am going to try Ubuntu v9.1
i386 CD next, maybe this weekend when I don't need to use this box much.
--
"An ant hole may collapse an embankment." --Japanese
/\___/\
/ /\ /\ \ Phil./Ant @
http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or

Ant is currently not listening to any songs on his home computer.
  #83  
March 18th 10, 11:39 AM, posted to comp.sys.ibm.pc.hardware.chips
Ant
external usenet poster
Posts: 858

I was using the same Kernel 2.6.30 before and after the PSU
incident. I never had problems before, but started having
problems after. Unless something else like related kernel
updates (modules or whatever) started them.


This really points towards a hardware failure. As a general
rule, the modules are only updated when the kernel changes.
I suppose someone could try the MS approach of "device drivers"
on a more-or-less static kernel, but that historically has not been
the Linux approach. New kernels come out relatively frequently, so
it is not a big deal to wait and upgrade everything. Note this does
not apply for foreign modules (like nvidia), but you did not mention
upgrading -- or did you do something when you changed vidcard?


Oops, I didn't answer your other question for foreign modules. I always
use the latest NVIDIA drivers (beta and stable) from NVIDIA.com. I
compile them. But I never had problems with them before issues came up.
I already saw kernel panics and errors without running X as well.

There's something else I noticed the last few days (this week so far)
that might be related?
Mar 14 21:11:53
Mar 16 05:41:16

/var/log/messages showed only two machine errors for this week so far.
Also, I haven't had kernel panics for a while either, but then it is
probably because I manually rebooted a lot. I currently only have almost
three days of uptime and they usually come when I have about a week or
so.

The only thing different is the weather: temperatures are much
higher. My room has been about 80 degrees F lately (yeah, too warm)
with the windows closed and the fan off.

Before this week since the issues started, it was much cooler (mid
60-70F degrees in my room). Remember how I said my issues usually come
up during idle times and not during stress times? I wonder if there is a
relationship with temperatures. I checked weather.com's calendar showing
past temperatures for my city, and they seem to match. It doesn't seem
like weather will be cold again for a while too since spring is here. I
am going to keep watching this pattern.
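
A first step toward testing the temperature hunch is to get the machine-check times into a form that can be lined up against temperature readings. A minimal Python sketch, assuming syslog-style timestamps like the two quoted above and assuming the year 2010 (syslog lines carry no year); `event_gaps` is a hypothetical helper name, and month names are parsed in an English locale:

```python
from datetime import datetime


def event_gaps(stamps, year):
    """Return seconds between consecutive syslog-style timestamps."""
    times = [
        datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps
    ]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The two machine-check times reported in this post.
print(event_gaps(["Mar 14 21:11:53", "Mar 16 05:41:16"], 2010))
```

With only two events this week there is not enough data to call it a correlation, but the same list can be extended as more errors show up and compared against logged room or CPU temperatures.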
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
  #84  
March 18th 10, 12:33 PM, posted to comp.sys.ibm.pc.hardware.chips,comp.os.linux.hardware
Ant[_3_]
external usenet poster
Posts: 756

Weird. I just noticed this in my dmesg and have no idea if this is bad
or not:

[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
I checked previous logs, and none of them have it so it might have been a
hiccup?

# cat /var/log/messages* |grep clocksource (all the way to 2/28/2010 6:47:02 AM PST)
Mar 5 06:41:19 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 5 06:41:19 foobar kernel: [ 0.281777] Switching to clocksource acpi_pm
Mar 5 21:05:19 foobar kernel: [ 0.241193] Switching to clocksource jiffies
Mar 5 21:05:19 foobar kernel: [ 0.281790] Switching to clocksource acpi_pm
Mar 7 07:30:45 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 7 07:30:45 foobar kernel: [ 0.281778] Switching to clocksource acpi_pm
Mar 8 07:43:15 foobar kernel: [ 0.241194] Switching to clocksource jiffies
Mar 8 07:43:15 foobar kernel: [ 0.281782] Switching to clocksource acpi_pm
Mar 11 00:29:19 foobar kernel: [ 0.240922] Switching to clocksource jiffies
Mar 11 00:29:19 foobar kernel: [ 0.281516] Switching to clocksource acpi_pm
Mar 12 05:45:36 foobar kernel: [ 0.237194] Switching to clocksource jiffies
Mar 12 05:45:36 foobar kernel: [ 0.277790] Switching to clocksource acpi_pm
Mar 12 23:57:13 foobar kernel: [ 0.241187] Switching to clocksource jiffies
Mar 12 23:57:13 foobar kernel: [ 0.281779] Switching to clocksource acpi_pm
Mar 15 00:32:48 foobar kernel: [ 0.237192] Switching to clocksource jiffies
Mar 15 00:32:48 foobar kernel: [ 0.277782] Switching to clocksource acpi_pm
Mar 15 01:16:00 foobar kernel: [ 0.237290] Switching to clocksource jiffies
Mar 15 01:16:00 foobar kernel: [ 0.277886] Switching to clocksource acpi_pm
Mar 15 08:25:09 foobar kernel: [ 0.242800] Switching to clocksource jiffies
Mar 15 08:25:09 foobar kernel: [ 0.283406] Switching to clocksource acpi_pm
Mar 15 08:31:58 foobar kernel: [ 0.242802] Switching to clocksource jiffies
Mar 15 08:31:58 foobar kernel: [ 0.283405] Switching to clocksource acpi_pm

I did a quick Google search and found
https://lists.ubuntu.com/archives/ub...ry/175828.html
with commands:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm

I don't know if this is something to worry about or a new clue.
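
One caveat about the log search in this post: `grep clocksource` is case-sensitive, so it cannot match the capitalized "Clocksource tsc unstable" line at all, and its absence from that output says nothing about whether the message appeared in earlier logs. A small Python sketch of the difference, using two lines quoted in this post; `matching_lines` is a hypothetical helper that mimics `grep` and `grep -i`:

```python
def matching_lines(lines, needle, ignore_case=False):
    """Return the lines containing needle, like grep (or grep -i)."""
    if ignore_case:
        needle = needle.lower()
        return [ln for ln in lines if needle in ln.lower()]
    return [ln for ln in lines if needle in ln]


log = [
    "[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)",
    "[ 0.281777] Switching to clocksource acpi_pm",
]

# Case-sensitive search misses the "Clocksource tsc unstable" line.
print(len(matching_lines(log, "clocksource")))
# Case-insensitive search finds both lines.
print(len(matching_lines(log, "clocksource", ignore_case=True)))
```

Re-running the original search with `grep -i` would show whether the TSC had been marked unstable on earlier boots too.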
--
"An anthill increases by accumulation. / Medicine is consumed by
distribution. / That which is feared lessens by association. / This is
the thing to understand." --Siddha Nagarjuna
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or

Ant is currently not listening to any songs on his home computer.
  #85  
March 18th 10, 04:03 PM, posted to comp.sys.ibm.pc.hardware.chips
Yousuf Khan[_2_]
external usenet poster
Posts: 1,296

Robert Redelmeier wrote:
Jerry Peters wrote in part:
Wrong, Linux implements the configuration features also. Some
machines, probably newer laptops, can't be configured without ACPI.


While I cannot say that _none_ of the 1000s of device modules use ACPI,
I can say that most do not need it. Not to say BIOS didn't use it.
I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did
it help when I couldn't get a device working -- something fairly
frequent under Linux, especially for wireless. Very frustrating when
`lspci` shows it. I presume some sort of device code IPL is required.

I have no problem squirting arbitrary bytes at known PCI addr[s], nor
do I imagine Linus does either, although Stallman might. But giving
execution over to foreign code in ring0 is a recipe for insecurity.
You wanna get Theo de Raadt even hotter under the collar?



I've found that ACPI has its tentacles into nearly everything these
days, not just power management. It's responsible for assigning IRQ's,
for example. Modern PC's have more than the traditional 15 IRQ's of the
older PC-AT's, and those extra IRQ's are only available to you if you
use the ACPI API. There are actually 100's of IRQ channels these days,
so you should never need to share IRQ's.

In fact, it was OS stupidity about ACPI that hastened my
exit from XP: I was suffering from constant system panics under XP,
because it was sharing IRQ's between devices that had nothing to do with
each other. For example, it was sharing the same IRQ channel between my
1Gbps Ethernet, and my video card, as well as five other minor system
board functions. The same machine has had a dual-boot to Ubuntu Linux
for a long time, and I could see how Linux was able to properly assign
several dozen IRQ's using its implementation of ACPI, with barely any
sharing at all. Similarly, after I upgraded to Windows 7, all of those
panics went away, and you can see that it's using several dozen IRQ's
just like Linux does. So it looks like Windows XP's ACPI implementation
is fundamentally flawed, at least when it comes to IRQ assignments.

So I found that Linux was much better at using ACPI than Windows XP. So
I don't think Linux is necessarily having any problems with ACPI here.
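
On Linux, the kind of IRQ sharing described above can be spotted in /proc/interrupts, where devices sharing a line are listed comma-separated. A rough Python sketch follows. The column layout varies between kernel versions, so the parsing rule (the device list starts at the token containing the first comma) is an assumption, and the sample text and the name `shared_irqs` are hypothetical.

```python
import re


def shared_irqs(interrupts_text):
    """Map IRQ number to its device list for lines naming several devices."""
    shared = {}
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m or "," not in m.group(2):
            continue
        rest = m.group(2)
        # Assume the device list starts at the token holding the first comma.
        start = rest.rfind(" ", 0, rest.index(",")) + 1
        shared[int(m.group(1))] = [d.strip() for d in rest[start:].split(",")]
    return shared


# Hypothetical sample; on a real machine pass open("/proc/interrupts").read().
SAMPLE = """           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
 16:      90210      88123   IO-APIC-fasteoi   eth0, nvidia
"""
print(shared_irqs(SAMPLE))
```

An empty result would match the "barely any sharing at all" observation; a busy NIC and a video card on the same line would be the pattern blamed for the XP panics.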

Yousuf Khan
  #86  
March 19th 10, 11:36 PM, posted to comp.sys.ibm.pc.hardware.chips,comp.os.linux.hardware
Darren Salt
external usenet poster
Posts: 9

I demand that Ant may or may not have written...

Weird. I just noticed this in my dmesg and have no idea if this is bad
or not:


[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
I checked previous logs, and none of them have it so it might have been a
hiccup?


That's harmless.

--
| Darren Salt | linux at youmustbejoking | nr. Ashington, | Doon
| using Debian GNU/Linux | or ds ,demon,co,uk | Northumberland | Army
| + http://www.youmustbejoking.demon.co.uk/ & http://tartarus.org/ds/

Locutus 1-2-3 - a Borg spreadsheet program.
  #87  
March 20th 10, 12:28 AM, posted to comp.sys.ibm.pc.hardware.chips,comp.os.linux.hardware
Ant
external usenet poster
Posts: 858

Weird. I just noticed this in my dmesg and have no idea if this is bad
or not:


[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
I checked previous logs, and none of them have it so it might have been a
hiccup?


That's harmless.


OK. Weird, still no new machine errors for the last two days and no
kernel panics. I am not going to try other tests (e.g., Ubuntu liveCD)
unless kernel panics occur again. It seems like temperature related now?
:/
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
  #88  
April 25th 10, 07:11 AM, posted to comp.sys.ibm.pc.hardware.chips
Ant[_2_]
external usenet poster
Posts: 16

The crashes seem to happen during idled time. I do
not use AMD's Cool'n' Quiet and PowerNow-K8.


FYI. For the first time, I got a kernel panic while I was using my
computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it
is not tied to idle times after all.
  #89  
April 25th 10, 01:37 PM, posted to comp.sys.ibm.pc.hardware.chips
Ant[_3_]
external usenet poster
Posts: 756

On 4/24/2010 11:11 PM PT, Ant typed:

The crashes seem to happen during idled time. I do
not use AMD's Cool'n' Quiet and PowerNow-K8.


FYI. For the first time, I got a kernel panic while I was using my
computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it
is not tied to idle times after all.


And another. Grr.
--
"Have I told you how much I like ants, huh? Especially fried in a subtle
blend of mech fluid and grated gears?" --Rampage to Inferno,
"Transmutate" in Transformers (Beast Wars)
/\___/\ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
/ /\ /\ \ Ant's Quality Foraged Links: http://aqfl.net
| |o o| |
\ _ / If crediting, then use Ant nickname and AQFL URL/link.
( ) If e-mailing, then axe ANT from its address if needed.
Ant is currently not listening to any songs on this computer.
  #90  
April 26th 10, 04:10 PM, posted to comp.sys.ibm.pc.hardware.chips
Yousuf Khan[_2_]
external usenet poster
Posts: 1,296

Ant wrote:
On 4/24/2010 11:11 PM PT, Ant typed:

The crashes seem to happen during idled time. I do
not use AMD's Cool'n' Quiet and PowerNow-K8.


FYI. For the first time, I got a kernel panic while I was using my
computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it
is not tied to idle times after all.


And another. Grr.


It's probably getting worse. Might be time to think about replacement.

Yousuf Khan
 






All times are GMT +1.

