#81
"TLB parity error in virtual array; TLB error 'instruction"?
Robert Redelmeier wrote:
Jerry Peters wrote in part:

Wrong, Linux implements the configuration features also. Some machines, probably newer laptops, can't be configured without ACPI.

While I cannot say that _none_ of the 1000s of device modules use ACPI, I can say that most do not need it. Not to say BIOS didn't use it. I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did it help when I couldn't get a device working -- something fairly frequent under Linux, especially for wireless. Very frustrating when `lspci` shows it. I presume some sort of device code IPL is required.

I have no problem squirting arbitrary bytes at known PCI addr[s], nor do I imagine Linus does either, although Stallman might. But giving execution over to foreign code in ring0 is a recipe for insecurity. You wanna get Theo de Raadt even hotter under the collar?

IIRC some of the newer systems need it to enumerate multiple CPUs. On some laptops ACPI is needed to control screen brightness (then there are the laptops that report ACPI events for screen brightness *and* change it via firmware). Yeah, it's a really crappy design; APM was much simpler. What about SMI? That's even scarier. Or the trusted computing stuff. ACPI is typical of the over-design engaged in by large companies. IBM used to be famous for it.

I'd doubt that the OP's problem is caused by ACPI though. The TLB on x86 is mostly hardware maintained; the OS's sole responsibility is to purge the TLB when it changes the page tables. He's getting a parity error in the associative array; that's a hardware problem.

Agreed, it looks like a hardware problem. But the fact that it arises almost exclusively at one code address is very suspicious. Some code there seems to be triggering some hardware "sensitivity". Especially since the OP did not have this problem prior to a known PSU fry-fest. There have been recent changes to the kernel in this area -- perhaps a roll-back to an earlier kernel (that gave good service on the hardware) would be a good test. Newer is not always better.

If I had to guess, it might be that the kernel uses large page mappings for itself rather than the standard 4k page size. Another possibility is that only a particular bit pattern triggers the MC. No one seems to be complaining on LKML about machine checks in the TLB with recent kernels, and the fact that other hardware was damaged, probably by overvoltage, would cause me to think it's hardware.

Jerry
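For anyone wanting to check the large-page guess above, here is a minimal sketch. It assumes the running x86 kernel exposes the `DirectMap4k` / `DirectMap2M` lines in /proc/meminfo (not every kernel build does), and the helper name `direct_map_kib` is made up for illustration. It only reports how much of the kernel's direct mapping uses each page size; it says nothing about which mapping a failing address belongs to.

```python
def direct_map_kib(meminfo_text):
    """Return {page size label: KiB mapped} from the DirectMap* lines of /proc/meminfo text."""
    sizes = {}
    for line in meminfo_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.startswith("DirectMap"):
            continue
        fields = value.split()
        # The value column looks like "8192 kB"; keep only the number.
        if fields and fields[0].isdigit():
            sizes[name[len("DirectMap"):]] = int(fields[0])
    return sizes
```

On a live system the text would come from reading /proc/meminfo; an empty result simply means the kernel does not report these lines.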
#83
I was using the same Kernel 2.6.30 before and after the PSU incident. I never had problems before, but started having problems after. Unless something else like related kernel updates (modules or whatever) started them.

This really points towards a hardware failure. As a general rule, the modules are only updated when the kernel changes. I suppose someone could try the MS approach of "device drivers" on a more-or-less static kernel, but that historically has not been the Linux approach. New kernels come out relatively frequently, so it is not a big deal to wait and upgrade everything. Note this does not apply to foreign modules (like nvidia), but you did not mention upgrading -- or did you do something when you changed vidcard?

Oops, I didn't answer your other question about foreign modules. I always use the latest NVIDIA drivers (beta and stable) from NVIDIA.com. I compile them. But I never had problems with them before the issues came up. I already saw kernel panics and errors without running X as well.

There's something else I noticed the last few days (this week so far) that might be related. /var/log/messages showed only two machine errors for this week so far:

Mar 14 21:11:53
Mar 16 05:41:16

Also, I haven't had kernel panics for a while, but that is probably because I manually rebooted a lot. I currently have almost three days of uptime, and the panics usually come when I have about a week or so.

The only thing different is the weather: temperatures are much higher. My room has been about 80F degrees lately (yeah, too warm) without the windows open or the fan on. Before this week, ever since the issues started, it was much cooler (mid 60-70F degrees in my room). Remember how I said my issues usually come up during idle times and not during stress times? I wonder if there is a relationship with temperatures. I checked weather.com's calendar showing past temperatures for my city, and they seem to match. It doesn't seem like the weather will be cold again for a while either, since spring is here.

I am going to keep watching this pattern.

--
"We are anthill men upon an anthill world." --Ray Bradbury
Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links (AQFL): http://aqfl.net
Please remove ANT if replying by e-mail.
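To watch that pattern without eyeballing the log, one option is to tally matching syslog lines per day and set the counts next to the daily temperatures by hand. The sketch below is only that: the function name `events_per_day` is made up, the search string is whatever your kernel actually prints for these errors (the "Machine check" needle in the note below is an assumption), and the year has to be supplied because classic syslog timestamps omit it.

```python
from collections import Counter
from datetime import datetime


def events_per_day(lines, year, needle):
    """Count log lines containing `needle`, grouped by ISO date (YYYY-MM-DD)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        parts = line.split(None, 3)  # month, day, time, rest of line
        if len(parts) < 3:
            continue
        month, day, clock = parts[:3]
        try:
            stamp = datetime.strptime(
                f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a syslog-style timestamp
        counts[stamp.date().isoformat()] += 1
    return dict(counts)
```

Calling it on the open log file, for example `events_per_day(open("/var/log/messages"), 2010, "Machine check")`, gives a date-to-count mapping. Two events in a week is far too little data to prove a temperature link either way, so treat the output as a notebook aid rather than evidence.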
#84
Weird. I just noticed this in my dmesg and have no idea if this is bad or not:

[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)

I checked previous logs, and none of them have it, so it might have been a hiccup?

# cat /var/log/messages* |grep clocksource
(all the way to 2/28/2010 6:47:02 AM PST)
Mar 5 06:41:19 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 5 06:41:19 foobar kernel: [ 0.281777] Switching to clocksource acpi_pm
Mar 5 21:05:19 foobar kernel: [ 0.241193] Switching to clocksource jiffies
Mar 5 21:05:19 foobar kernel: [ 0.281790] Switching to clocksource acpi_pm
Mar 7 07:30:45 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 7 07:30:45 foobar kernel: [ 0.281778] Switching to clocksource acpi_pm
Mar 8 07:43:15 foobar kernel: [ 0.241194] Switching to clocksource jiffies
Mar 8 07:43:15 foobar kernel: [ 0.281782] Switching to clocksource acpi_pm
Mar 11 00:29:19 foobar kernel: [ 0.240922] Switching to clocksource jiffies
Mar 11 00:29:19 foobar kernel: [ 0.281516] Switching to clocksource acpi_pm
Mar 12 05:45:36 foobar kernel: [ 0.237194] Switching to clocksource jiffies
Mar 12 05:45:36 foobar kernel: [ 0.277790] Switching to clocksource acpi_pm
Mar 12 23:57:13 foobar kernel: [ 0.241187] Switching to clocksource jiffies
Mar 12 23:57:13 foobar kernel: [ 0.281779] Switching to clocksource acpi_pm
Mar 15 00:32:48 foobar kernel: [ 0.237192] Switching to clocksource jiffies
Mar 15 00:32:48 foobar kernel: [ 0.277782] Switching to clocksource acpi_pm
Mar 15 01:16:00 foobar kernel: [ 0.237290] Switching to clocksource jiffies
Mar 15 01:16:00 foobar kernel: [ 0.277886] Switching to clocksource acpi_pm
Mar 15 08:25:09 foobar kernel: [ 0.242800] Switching to clocksource jiffies
Mar 15 08:25:09 foobar kernel: [ 0.283406] Switching to clocksource acpi_pm
Mar 15 08:31:58 foobar kernel: [ 0.242802] Switching to clocksource jiffies
Mar 15 08:31:58 foobar kernel: [ 0.283405] Switching to clocksource acpi_pm

I did a quick Google search and found https://lists.ubuntu.com/archives/ub...ry/175828.html with these commands:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm

I don't know if this is something to worry about or a new clue.

--
"An anthill increases by accumulation. / Medicine is consumed by distribution. / That which is feared lessens by association. / This is the thing to understand." --Siddha Nagarjuna
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
Nuke ANT from e-mail address.
Ant is currently not listening to any songs on his home computer.
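The same two sysfs files can be read from a script, which is handy if you want to log the clocksource alongside other diagnostics. This is a small sketch, not a diagnosis tool: the function name `read_clocksources` is made up, and the `base` parameter exists only so the code can be pointed at a test directory instead of the real sysfs path quoted in the post.

```python
from pathlib import Path

# Directory quoted in the post above.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksources read from sysfs-style files."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource is a single space-separated line.
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

On the machine in this thread it would return `("acpi_pm", ["acpi_pm"])`, matching the `cat` output above.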
#85
Robert Redelmeier wrote:
Jerry Peters wrote in part:

Wrong, Linux implements the configuration features also. Some machines, probably newer laptops, can't be configured without ACPI.

While I cannot say that _none_ of the 1000s of device modules use ACPI, I can say that most do not need it. Not to say BIOS didn't use it. I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did it help when I couldn't get a device working -- something fairly frequent under Linux, especially for wireless. Very frustrating when `lspci` shows it. I presume some sort of device code IPL is required.

I have no problem squirting arbitrary bytes at known PCI addr[s], nor do I imagine Linus does either, although Stallman might. But giving execution over to foreign code in ring0 is a recipe for insecurity. You wanna get Theo de Raadt even hotter under the collar?

I've found that ACPI has its tentacles into nearly everything these days, not just power management. It's responsible for assigning IRQs, for example. Modern PCs have more than the traditional 15 IRQs of the older PC-ATs, and those extra IRQs are only available to you if you use the ACPI API. There are actually hundreds of IRQ channels these days, so you should never need to share IRQs.

In fact, it was OS stupidity about ACPI that hastened my exit from XP: I was suffering from constant system panics under XP, because it was sharing IRQs between devices that had nothing to do with each other. For example, it was sharing the same IRQ channel between my 1Gbps Ethernet and my video card, as well as five other minor system board functions. The same machine has had a dual-boot to Ubuntu Linux for a long time, and I could see how Linux was able to properly assign several dozen IRQs using its implementation of ACPI, with barely any sharing at all. Similarly, after I upgraded to Windows 7, all of those panics went away, and you can see that it's using several dozen IRQs just like Linux does.

So it looks like Windows XP's ACPI implementation is fundamentally flawed, at least when it comes to IRQ assignments, and I found that Linux was much better at using ACPI than Windows XP. So I don't think Linux is necessarily having any problems with ACPI here.

Yousuf Khan
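On Linux, how much IRQ sharing is actually going on can be read off /proc/interrupts, where devices sharing a line appear comma-separated at the end of the row. The sketch below assumes that common layout (IRQ number, per-CPU counters, controller type, device list) and that device names contain no spaces; the function name `shared_irqs` is made up, and the exact columns vary between kernel versions.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> device names, for rows that list more than one device."""
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # header row and named rows such as NMI or LOC
        fields = rest.split()
        # Drop the leading per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            fields.pop(0)
        description = " ".join(fields)
        if "," not in description:
            continue  # at most one device on this line
        first, *others = [part.strip() for part in description.split(",")]
        if not first:
            continue
        # The first chunk still carries the controller type; keep its last word.
        devices = [first.split()[-1]] + [name for name in others if name]
        shared[int(label)] = devices
    return shared
```

Feeding it the contents of /proc/interrupts returns only the shared lines, so an empty dictionary means no sharing was detected under these assumptions.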
#86
I demand that Ant may or may not have written...
Weird. I just noticed this in my dmesg and have no idea if this is bad or not:

[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)

I checked previous logs, and none of them have it, so it might have been a hiccup?

That's harmless.

--
| Darren Salt | linux at youmustbejoking | nr. Ashington, | Doon
| using Debian GNU/Linux | or ds ,demon,co,uk | Northumberland | Army
| + http://www.youmustbejoking.demon.co.uk/ & http://tartarus.org/ds/

Locutus 1-2-3 - a Borg spreadsheet program.
#87
Weird. I just noticed this in my dmesg and have no idea if this is bad or not:

[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)

I checked previous logs, and none of them have it, so it might have been a hiccup?

That's harmless.

OK. Weird, still no new machine errors for the last two days and no kernel panics. I am not going to try other tests (e.g., Ubuntu liveCD) unless kernel panics occur again. It seems like it is temperature related now? :/

--
"We are anthill men upon an anthill world." --Ray Bradbury
Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links (AQFL): http://aqfl.net
Please remove ANT if replying by e-mail.
#88
The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8. FYI.

For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.
#89
On 4/24/2010 11:11 PM PT, Ant typed:
The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8. FYI.

For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.

And another. Grr.

--
"Have I told you how much I like ants, huh? Especially fried in a subtle blend of mech fluid and grated gears?" --Rampage to Inferno, "Transmutate" in Transformers (Beast Wars)
Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
Ant's Quality Foraged Links: http://aqfl.net
If crediting, then use Ant nickname and AQFL URL/link.
If e-mailing, then axe ANT from its address if needed.
Ant is currently not listening to any songs on this computer.
#90
Ant wrote:
On 4/24/2010 11:11 PM PT, Ant typed:

The crashes seem to happen during idle time. I do not use AMD's Cool'n'Quiet and PowerNow-K8. FYI.

For the first time, I got a kernel panic while I was using my computer, mostly surfing the Web in Mozilla's SeaMonkey v2.0.4. So, it is not tied to idle times then.

And another. Grr.

It's probably getting worse. Might be time to think about replacement.

Yousuf Khan