
I need help with mcelog - Machine Check Exception


I'm getting a few Machine Check Exceptions, and I have no idea what causes them. Could anyone help me and shed some light on this subject?

My system is an Asus M51Sn notebook, which has an Intel Core 2 Duo T5550 CPU, 3GB of RAM and a GeForce 9500M GS video card. It runs Gentoo Linux amd64 (x86_64).

Over time, I've noticed that a few Machine Check Exceptions have been recorded by the system. Unfortunately, I have no idea what they mean or what caused them. What's more, they happen about once a month, but I never notice any side effects. This scares me more: if I see something broken, I can fix it as soon as possible, but if everything seems to work, I can never be sure whether something is broken under the hood and will get worse as time goes on.

Here is the full /var/log/mcelog (you need to install app-admin/mcelog in order to log MCE to that file):

2009-06-05 03:10:15 BRT
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 128 TSC 2d7ca16e5a7a 
STATUS 880901c0 MCGSTATUS 0
2009-06-22 03:10:10 BRT
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 128 TSC 1d61709dba6e 
STATUS 880901c0 MCGSTATUS 0
2009-07-20 22:40:09 BRT
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 128 TSC 1e03f8a51387 
STATUS 880b0100 MCGSTATUS 0
2009-08-13 03:10:18 BRT
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 128 TSC 2cc70c476bc5 
STATUS 880c0100 MCGSTATUS 0

Note, however, that MCEs are logged by a cron job that runs daily (I've just changed it to run hourly). This means there might have been MCEs that weren't logged, and the date/time written to the log is the time when the cron job ran, not exactly the time when the MCE happened.

The last MCE logged in /var/log/mcelog is from the last day. I can't know exactly when it happened, but I know that I updated my Gentoo system last night. I went to bed leaving the notebook running python-updater, which in turn re-emerged (and thus re-compiled) lots of packages. I know the CPU usage went to maximum and the temperature got very high, because the fan noise was pretty loud. Then, today I found this in dmesg:

[114637.131021] CPU1: Temperature/speed normal
[114900.326042] Machine check events logged

So, my guess is that the Machine Check Exception I got was about the CPU overheating. I have no idea if this is true; it is just a guess.
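One possible way to check this guess is sketched below. It rests on assumptions that are not confirmed anywhere in this post: that the Linux kernel of this era uses bank 128 as a pseudo-bank for thermal events (not a real machine-check bank), and that in such records STATUS holds the raw value of the CPU's IA32_THERM_STATUS register. If that holds, the STATUS values can be decoded with the bit layout from Intel's manuals. The function names and the SAMPLE text are made up for illustration; the sample only repeats the CPU and STATUS lines from the log above.

```python
import re

# Assumption: the kernel logs thermal events under pseudo-bank 128.
THERMAL_BANK = 128

SAMPLE = """\
MCE 0
CPU 1 BANK 128 TSC 2d7ca16e5a7a
STATUS 880901c0 MCGSTATUS 0
MCE 0
CPU 1 BANK 128 TSC 1d61709dba6e
STATUS 880901c0 MCGSTATUS 0
MCE 0
CPU 1 BANK 128 TSC 1e03f8a51387
STATUS 880b0100 MCGSTATUS 0
MCE 0
CPU 1 BANK 128 TSC 2cc70c476bc5
STATUS 880c0100 MCGSTATUS 0
"""


def parse_mcelog(text):
    """Yield (cpu, bank, status) for each record in mcelog text output."""
    cpu = bank = None
    for line in text.splitlines():
        m = re.match(r"CPU (\d+) BANK (\d+)", line)
        if m:
            cpu, bank = int(m.group(1)), int(m.group(2))
            continue
        m = re.match(r"STATUS ([0-9a-fA-F]+)", line)
        if m and cpu is not None:
            yield cpu, bank, int(m.group(1), 16)
            cpu = bank = None


def decode_therm_status(status):
    """Decode a value as IA32_THERM_STATUS (assumed meaning of STATUS)."""
    return {
        "thermal_status": bool(status & (1 << 0)),      # sensor tripped right now
        "thermal_status_log": bool(status & (1 << 1)),  # sticky copy of bit 0
        "threshold1_status": bool(status & (1 << 6)),
        "threshold2_status": bool(status & (1 << 8)),
        "reading_valid": bool(status & (1 << 31)),
        # Digital readout, bits 22:16: degrees C below the maximum temperature.
        "degrees_below_tjmax": (status >> 16) & 0x7F,
    }


if __name__ == "__main__":
    for cpu, bank, status in parse_mcelog(SAMPLE):
        if bank == THERMAL_BANK:
            print(f"CPU {cpu} status {status:#x}:", decode_therm_status(status))
        else:
            print(f"CPU {cpu} bank {bank}: not a thermal record, not decoded")
```

Under those assumptions, the four records decode to valid readings of 9, 9, 11 and 12 degrees C below the CPU's maximum junction temperature, with the thermal status bit clear. That would be consistent with a temperature-related event, but it is not proof.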

By the way, my desktop machine (an AMD 64 Sempron LE-1200) has no entries in /var/log/mcelog, so I assume there have been no MCEs on my desktop (at least not yet).

If you have relevant info about MCEs, please post it below in the comments! I would like to know what causes MCEs on this machine, what effects an MCE has on the whole system and, if possible, how to avoid them.


Comments

anzah 17. August 2009, 01:01

Looks quite cryptic. Another way to reproduce the problem might be to run some memory and CPU tests. Too bad they are unlikely to pinpoint exactly where the problem is, as the CPU, memory, motherboard and power supply work quite closely together. There are some tools listed at http://ultimatebootcd.com/

CrazyTerabyte 17. August 2009, 01:23

I've already run memtest86+ at least once, and no error was found.

anzah 17. August 2009, 01:38

Sometimes it can take a few memtest runs before errors show up. The error is complaining about the CPU, though. Still, it never hurts to make sure that all components work as they should.

Mersenne Prime could be good for CPU testing, though it wasn't the simplest thing to get running.

It should be possible to compute a list of known primes and check whether the results keep coming out correct. That should also heat up the CPU.

If it's caused by overheating, some motherboards have an option to raise an alarm when the temperature goes above a certain level. Most likely it's on by default. In that case you would have noticed if there was an overheating problem.

I noticed that such a feature existed on a computer which had all fans removed except the one in the power supply.
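The suggestion above to recompute known primes and check the results could look roughly like the sketch below. It is only an illustration, not a replacement for a dedicated stress tester: it repeatedly sieves all primes below 1,000,000 and compares the count against the known value 78498. The names count_primes and stress are made up here, and a single Python process only loads one core.

```python
import time

LIMIT = 1_000_000
KNOWN_PRIME_COUNT = 78498  # number of primes below 1,000,000


def count_primes(limit):
    """Count primes below `limit` with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Clear every multiple of n starting at n*n.
            sieve[n * n::n] = bytes(len(range(n * n, limit, n)))
    return sum(sieve)


def stress(seconds):
    """Recompute the prime count until the deadline.

    Returns (completed_runs, all_results_correct).
    """
    deadline = time.monotonic() + seconds
    runs = 0
    while time.monotonic() < deadline:
        if count_primes(LIMIT) != KNOWN_PRIME_COUNT:
            return runs, False  # a wrong result hints at a hardware fault
        runs += 1
    return runs, True


if __name__ == "__main__":
    # Short run for demonstration; a real test would run much longer.
    runs, ok = stress(2.0)
    print(f"{runs} runs,", "all correct" if ok else "MISMATCH detected")
```

To load both cores of a dual-core CPU, one copy of the script could be started per core, with a much longer duration passed to stress.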
