SIG11 Problem

I was experiencing random segmentation faults while running programs under Linux. I was encoding some OGG files, and oggenc kept crashing on them. Suspicious, I tried compiling the kernel source, and sure enough: another segmentation fault.

Also known as signal 11, SIGSEGV, or segfault, this signal from the kernel means that a program used a pointer to reference memory it is not allowed to access. Usually it points to a buggy program, but when it starts happening system-wide, a hardware problem is more likely. Most programs will run without trouble, but code that does heavy computation and lots of pointer work (like encoders and compilers) is bound to get the signal quickly.
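To see what signal 11 looks like from the outside, here is a minimal sketch in Python, assuming Linux and a CPython interpreter. It starts a child interpreter that deliberately reads through a NULL pointer with `ctypes.string_at(0)`; the kernel kills the child with SIGSEGV, which `subprocess` reports as a negative return code. The helper name `run_bad_pointer_child` is purely illustrative.

```python
import signal
import subprocess
import sys

# The child reads a C string starting at address 0 (a NULL pointer).
# Address 0 is normally unmapped for user programs on Linux, so the
# kernel sends the child SIGSEGV instead of returning any data.
CHILD_SOURCE = "import ctypes; ctypes.string_at(0)"


def run_bad_pointer_child():
    """Run the crashing child interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", CHILD_SOURCE])
    return result.returncode


if __name__ == "__main__":
    code = run_bad_pointer_child()
    # subprocess reports "killed by signal N" as a return code of -N.
    print("return code:", code)
    if code < 0:
        print("killed by:", signal.Signals(-code).name)
```

On a typical Linux system this should report a return code of -11 (SIGSEGV). Here the crash is deliberate and happens every time; a hardware fault instead makes correct programs crash at random.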

I immediately blamed my memory, a single 512 MB SDRAM module. There are a couple of specialized tools for dealing with bad memory on Linux. First, Memtest86 writes test patterns across all of memory and reports any addresses that fail. Next, the BadRAM patch for the Linux kernel reserves the bad memory addresses at boot time, making sure they are never used by running code.
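Memtest86's real test suite is far more thorough (it boots on its own and works on physical memory directly), but the basic idea of a pattern test can be sketched in a few lines of Python. This is a simplified illustration, not Memtest86's algorithm: it fills a buffer with each pattern, reads it back, and records the offsets that don't match. Because an ordinary user-space buffer can't target specific physical addresses, the hypothetical `StuckBitMemory` class simulates one defective byte.

```python
# Byte patterns in the spirit of memory testers: all zeros, all ones,
# alternating bits, and "walking ones" (a single set bit in each position).
DEFAULT_PATTERNS = [0x00, 0xFF, 0x55, 0xAA] + [1 << bit for bit in range(8)]


def pattern_test(memory, patterns=DEFAULT_PATTERNS):
    """Write each pattern to every byte, read it back, and return the sorted
    offsets that ever failed. `memory` can be any bytearray-like object."""
    bad_offsets = set()
    for pattern in patterns:
        for offset in range(len(memory)):
            memory[offset] = pattern
        for offset in range(len(memory)):
            if memory[offset] != pattern:
                bad_offsets.add(offset)
    return sorted(bad_offsets)


class StuckBitMemory:
    """Hypothetical faulty 'memory' for the demo: bit 0 of one byte is stuck at 0."""

    def __init__(self, size, stuck_offset):
        self._cells = bytearray(size)
        self._stuck_offset = stuck_offset

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, offset):
        return self._cells[offset]

    def __setitem__(self, offset, value):
        if offset == self._stuck_offset:
            value &= 0xFE  # the defective cell can never store a 1 in bit 0
        self._cells[offset] = value


if __name__ == "__main__":
    print(pattern_test(bytearray(64)))           # healthy buffer
    print(pattern_test(StuckBitMemory(64, 17)))  # simulated stuck bit
```

The healthy `bytearray(64)` comes back clean (`[]`), while the simulated module is flagged at offset 17. Several patterns are used because a bit stuck at 0 only shows up when some pattern tries to set it (0xFF, 0x55, and the walking one 0x01 all catch this defect).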

But after I ran Memtest86 for hours, it didn’t find anything wrong with memory. Then I tried restricting Linux to only 64 MB (using the “mem=64M” kernel parameter), but I still got SIG11. Bad memory within the first 64 MB seemed unlikely, so I started looking around for another source of the problem.
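Note that `mem=` expects a unit suffix: the kernel parses it with its `memparse()` helper, which treats a bare number as bytes and scales the K, M, and G suffixes by powers of 1024. The Python function below is a simplified re-implementation of that rule for illustration only (the kernel's version also handles a few more forms, such as hexadecimal numbers); `parse_mem` is a hypothetical name.

```python
import re

# Bits to shift left for each suffix: each step is another factor of 1024.
# Simplified: only K, M and G (in either case) are handled here.
_SUFFIX_SHIFT = {"": 0, "k": 10, "m": 20, "g": 30}


def parse_mem(value):
    """Convert a size such as '64M' to bytes; a bare number means bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    digits, suffix = match.groups()
    return int(digits) << _SUFFIX_SHIFT[suffix.lower()]


if __name__ == "__main__":
    for arg in ("64", "64M", "512M"):
        print(f"mem={arg} -> {parse_mem(arg)} bytes")
```

So `parse_mem("64M")` gives 67108864 bytes (64 × 1024 × 1024), whereas `parse_mem("64")` gives just 64 bytes.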

On the PC Health page in the BIOS, I noticed my processor temperature was running at 105 degrees Celsius! Even an AMD Athlon shouldn’t run that hot. When I examined the processor on the motherboard, everything looked okay, but the fan seemed a little too dusty, so I cleaned it out. I pulled up the BIOS page again, and the processor temperature had evened out at 73 degrees Celsius. I booted up Linux, ran some heavy processing, and SIG11 had disappeared! I’ll never underestimate the power of dust again.

2 Responses to “SIG11 Problem”

  1. Rob Stevenson Says:

    What did you use to clean out your computer? I’ve always heard that using a regular vacuum cleaner’s attachments was bad because of the static electricity that builds up on the ends of the hose and attachments. Do you have one of those mini-vacuums for keyboards and the like? Or a can of compressed air? Those always make me feel like I’ve just blown the dust that was right in front of me over everything around me.

    I think my PC is in desperate need of a cleaning … I’m thinking I could get into the Top 10 of Supercomputers if I ever cleaned out the inside of my box! :-)

  2. Eric Says:

    I just took the fan off and blew through it, which sent dust flying everywhere, as you said. I probably should have applied new thermal grease when I reattached the fan, but it hasn’t overheated yet.

    I must be hard on my equipment, because something is always failing or breaking. Heat is the biggest culprit. After I lost yet another hard drive, I finally bought removable hard drive kits for my drives that are aluminum and have built-in fans.
