on Dec 18, 2012
It was the strangest thing. My computer would just lock up. Now, locking up isn’t an exotic problem by any stretch, but I’ve never had a machine lock up in this way. It begins with me alt-tabbing over to a window I haven’t used in a while, and the window just flat-out refuses to wake back up. Other than the one zonked window, everything seems fine, but the machine is actually in a death spiral now and there’s nothing I can do to save it. I can wait, or I can click on another window, or hit alt-Tab again. It doesn’t matter. In about ten seconds the mouse cursor will stop moving, and a few seconds later the sound will begin stuttering and the machine will be borked.
I can use the computer for hours without problems. I can even play games, which taxes basically every part of the machine. No problem. But if I walk away for twenty minutes it will be dead when I get back. Note that this is the inverse of how problems usually manifest. Usually the machine will fail when it’s being pushed, not when it’s idle.
Once the machine dies, it seems to recover incrementally. The first boot attempt will stall while the BIOS is still getting things going. Then I reboot again and I’ll get to the boot loader where I can pick which operating system to use. (Windows 7 or Ubuntu, the latter being installed mostly as a novelty.) If I boot into Win 7, then it will stall on the black logo screen. I’ll reboot again and I’ll get all the way to the blue logo screen, or perhaps get a brief glimpse of the desktop before it dies again. But after N attempts, it boots fine and the machine seems normal again. Once I successfully boot, the machine acts like nothing was wrong.
This problem leads to weeks of bafflement and confusion. It’s a problem with Win 7! My graphics card is overheating! The power supply is dying! The memory is going bad! I’ve seen a lot of sick machines in my day, but I’ve never seen one exhibit these symptoms in this pattern. I scan the hard drive, I test the memory, I re-install drivers. Everything seems fine. I’d blame Windows 7 (just because Ubuntu seems to work and I’m out of hardware to blame) but this doesn’t feel like a software problem.
There really is no upper limit on the number of random tests and guesses you can make, so usually I noodle around until I get bored with the problem and go back to ignoring it. At some point I begin to suspect the machine is simply haunted.
The Cause Revealed
On Saturday morning I get up to find the machine won’t start at all. According to Win 7, files are “missing”. Booting into Ubuntu, I see a couple of hard drives are gone.
Ah! Now I get it.
Okay, this machine is a Frankenstein’s monster of new and old parts. When I got the machine, I stuck an old hard drive into it. The machine began with a 2TB SATA HD, but I dropped in an IDE drive from the previous machine. The IDE drive is naturally slower and smaller, but more space is always good and putting the drive into the machine was easier than trying to copy the 500GB of data it contained.
So, my machine has 3 drives:
HD#1: The big fast drive that came with the machine
HD#2: A modestly-sized IDE drive
HD#3: DVD drive
However, BOTH hard drives are split into two different logical drives, purely for organizational purposes. When I installed Windows 7, I put it onto HD#2. It then proceeded to re-letter all the drives so that it would still appear on C:. So my drive list was something like:
C: First partition of HD#2. This was JUST Windows 7.
D: First partition of HD#1. This was where Ubuntu and the Win7 boot loader lived. This is where I kept all my projects. (Coding, comics, stuff for the website. Anything that might be called work with a straight face.)
E: Second partition of HD#1. My Games drive, which is basically “Steam”, plus a few other things.
F: Second partition of HD#2. My archives.
G: The DVD drive.
I forgot all about this, to the point where I couldn’t remember how many physical drives I had in the machine, or how they were divided up. I’d totally forgotten that Windows 7 had shuffled all the drive letters around like this.
You might think it’s sort of daft to forget how my machine is mapped out, but keep in mind I’ve been at this for a long time. I’ve rebuilt, bought, and re-purposed so many machines over the years that after a while it all starts to blur together.
Now that the thing has failed, I’ve gone over the symptoms in my head and this is what I’ve come up with: The drive was going to sleep, and then not waking up again. Drives spin down after some period of inactivity, and this one was spinning down and staying down. This explains the strange behavior where the problem only manifested when the machine was idle. While I was playing (say) Borderlands 2, the game was auto-saving every couple of minutes or so. This kept the drive awake. Then I’d walk away for N minutes, and the drive would quit. If I was reading a webpage, this problem might not manifest immediately. I might come back and continue reading, and the dead drive wouldn’t become a problem until I tried to access something that had been paged out of memory. At that point the OS itself would die, thus preventing me from seeing that a drive had vanished.
The drive that was disappearing was the one where Windows 7 lived, so the OS itself would die without even managing an “OH MY GOODNESS THE C: DRIVE IS GONE AND THE WORLD IS ENDING HEEEEELP!” popup to let me know what went wrong.
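A minimal sketch of how a vanishing drive like this might be caught in the act, assuming Python is available and that the log is written to a drive other than the suspect one. The function names, the timeout, and the log location are all hypothetical, not anything from the original setup. The probe runs in a background thread because a drive that has gone to sleep for good may never answer, and the check should give up rather than hang along with it. One caveat: a directory listing can sometimes be answered from the OS cache, so a "fine" result is weaker evidence than a "MISSING" one.

```python
import os
import threading
import time


def drive_responds(root, timeout=5.0):
    """Return True if listing `root` completes within `timeout` seconds."""
    result = {}

    def attempt():
        try:
            os.listdir(root)
            result["ok"] = True
        except OSError:
            # Path is gone, unreadable, or the device reported an error.
            result["ok"] = False

    # Daemon thread: if the drive never answers, the stuck probe
    # will not keep the script from exiting.
    worker = threading.Thread(target=attempt, daemon=True)
    worker.start()
    worker.join(timeout)
    # No entry in `result` means the probe is still stuck: treat as missing.
    return result.get("ok", False)


def status_line(roots, timeout=5.0):
    """Build one timestamped log line covering every drive root."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    states = [
        f"{root}={'ok' if drive_responds(root, timeout) else 'MISSING'}"
        for root in roots
    ]
    return stamp + " " + " ".join(states)


def watch(roots, log_path, interval=60, rounds=None):
    """Append a status line to `log_path` every `interval` seconds.

    `rounds=None` means run until interrupted.
    """
    done = 0
    while rounds is None or done < rounds:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(status_line(roots) + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

With the drive layout described above, something like `watch(["C:\\", "F:\\"], "D:\\drive_watch.log")` would check both partitions of the IDE drive once a minute and leave the evidence on the drive that survived. The catch is that the script itself has to stay alive on a machine whose OS lives on the failing drive, so running it from the Ubuntu side would be the safer bet.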
I didn’t suspect this was a hard drive problem because I’ve had HD problems in the past, and they never looked like this. Normally my HD problems take one of two forms:
- Gradually accumulating read/write errors. Bad sectors and data corruption alert you to the problem before the drive fails.
- Instant and total hardware death.
The intermittent drive availability and the screwy boot-up problems (still not sure of the mechanics of how that worked) confused me and prevented me from seeing the root cause. All I could see were the triggers, and I couldn’t make sense of them.
So it’s Saturday. My plan was to spend the day playing Far Cry 3. Now the machine is dead. Hm. What did I lose? Let’s go over the important stuff:
- My novel should be fine. I’m pretty sure it’s saved to Google drive. Also, I sent a copy to my wife the last time I worked on it.
- I lost my Windows 7 install. Big deal. Easily replaced. Installing the OS to its own drive is an excellent policy and one I probably should have adopted ages ago.
- I lost all of my save game data, which ends up saved in /Documents and Settings/Username/User’s Data Ghetto/, after which the files could be in any one of a thousand places because it’s chaos down in there. Some games are stored under /Application Data/Gamename. Others are in /Games/Gamename, or /Publisher/Gamename, or /Savegames/Gamename, or /GFWL/Gamename, or just /Gamename. I don’t bother backing that stuff up because it’s such an incomprehensible mess. It’s annoying to lose this stuff, but it’s not a huge deal and I’d rather lose my save games every couple of years than waste my life trying to make and restore proper backups in this sea of disorganized data. I’d rather alphabetize the Windows registry. Some of this stuff might end up in the Steam cloud, and I’m not too worried about the rest.
Moreover, I was between games. I’d just finished Borderlands 2 and was about to begin Far Cry 3. I also lost my Minecraft worlds, but since I play on hardcore I’m used to throwing worlds away and starting new ones all the time.
- The projects drive is fine. Moreover, I do try to keep that thing backed up. In any case, anything really valuable is under source control and stored remotely.
- I lost my archives. Hm.
That last one is… kind of a relief. See, my archives were OLD. The contents of that directory go back more than a decade. Desktop wallpapers, MP3 files, old game footage, the source files for Windows Movie Maker projects, and rare game mods and patches that used to be hard to find but are now trivial. (For example, the intro movie to System Shock. Back in the day it took AGES to find that sucker and even longer to download it. Now I can find it on YouTube faster than I can find it on my own hard drive.)
I don’t even know what else was in the archives. I know they were too big to reasonably back up to DVDs. The thing is, I never used any of it. Ever. I saved stuff in there all the time, but I never went back for it. When I wanted a new desktop wallpaper, I didn’t look in the archives. I searched the net and then dropped the new file into the archives. (A lot of the wallpapers were 1024×768, relics from the days when I used a 4:3 monitor.) I’ve been dragging those files around for years, simply because it was too much trouble to dig through them and see what was still relevant.
I guess I lost some of my MP3s. Most of them were ripped from CDs I own, and I can just rip them again. The rest I bought from eMusic. Eh. I probably should feel bad about losing music I bought, but… I haven’t listened to any of that stuff in over two years, and in fact I can’t even think of a single track that I’ll miss. I was mostly using eMusic as a way to discover new music, and I have Pandora, Spotify, and Grooveshark for that now.
So, this hard drive failure didn’t really destroy anything valuable. It only managed to delete a bunch of files I didn’t need but didn’t have the guts to erase. The archives were actually kind of a burden. At hundreds of gigabytes, they were too big to back up in a meaningful way. They were just this giant glob of useless crap that I had to shuffle around and worry about from time to time.
I didn’t lose any important data. However, I did lose a drive. I don’t want to put Windows 7 on the same drive as my games, and I don’t have enough room on the projects drive. If I want to fix this I’ll need to boot into Ubuntu and spend a few hours clearing space and moving files so that the ginormous Windows 7 can fit.
I’ve been really happy with Windows 7 so far. It really is a great OS. Aside from the way they deprecated / hid the quicklaunch bar, it’s basically everything I loved about XP and nothing I hated about Vista. But it is going to be a chore to get it rolling again.
For some reason, I decide to install Linux. My wife just jumped from Ubuntu to Mint, and my 11-year-old son just moved from Vista to Linux Mint, and both of them are really happy with it. I’m going to give it a try. Installing Linux only takes a couple of minutes and if it doesn’t work out I can always re-install Win 7.
I’ll let you know how the Linux adventure is going later in the week.
Shamus Young is an old-school OpenGL programmer, author, and composer. He runs this site and if anything is broken you should probably blame him.