on Apr 18, 2007
A bit of code-blogging.
I spent some of yesterday afternoon debugging a very mundane bit of code. Nothing special. A bunch of memory space was getting trashed, as poorly managed memory is wont to do. I picked through the code and found some appalling bugs. (Which I had written.) The full code was a couple of pages long, but the thing boiled down to this:
void DoSomething (struct thing *thing_in)   //struct thing stands in for the real type
{
    struct thing *thing1, *thing2;

    thing1 = malloc (sizeof (*thing_in));
    //Do some stuff with thing_in and thing1.
    thing2 = realloc (thing1, some_new_size);   //thing1 may now point at freed memory
    //Do some other, unrelated stuff. Then...
    memcpy (thing2, thing1, sizeof (*thing1));
    //now that I just got done copying a free'd block of memory to thing2,
    //I'll do some more stuff with thing2.
}
Now, the actual code had all sorts of crazy stuff going on between these steps that obfuscated the mistake, but at the heart of it this is what I was doing: I was free()ing blocks of memory more than once, and copying stuff from recently unallocated blocks of memory. Now, this in and of itself is not very remarkable. If you lose your head you can easily make a blunder like this. The thing is, it should result in a crash. What I'm doing above is programmatic seppuku.
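For contrast, here is a minimal sketch of how the realloc() step could be written safely, assuming the data is just a byte buffer. grow_buffer() is a hypothetical helper made up for illustration, not code from the program above. The key points are that realloc() already copies the old contents (so no memcpy() is needed), that the old pointer must not be read or freed once realloc() succeeds, and that a failed realloc() leaves the original block allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow a byte buffer from old_size to new_size.
 * Returns the grown buffer, or NULL if the allocation failed.
 *
 * On success *buf is cleared, because realloc() may have freed that
 * block; reading from it or free()ing it later would be exactly the
 * use-after-free / double-free bug.  On failure realloc() leaves the
 * original block allocated, so *buf is left alone and still owned. */
char *grow_buffer (char **buf, size_t old_size, size_t new_size)
{
    /* new_size > 0 avoids realloc(p, 0), whose behavior is
     * implementation-defined. */
    assert (new_size > 0 && new_size >= old_size);

    char *grown = realloc (*buf, new_size);
    if (grown == NULL)
        return NULL;                  /* *buf is still valid */

    /* realloc() has already copied the first old_size bytes; only the
     * new tail is uninitialized, so zero it rather than copy anything. */
    memset (grown + old_size, 0, new_size - old_size);

    *buf = NULL;                      /* the old address is dead now */
    return grown;
}
```

Applied to DoSomething, this means the memcpy() goes away entirely: realloc() has already moved the contents, and thing1 is never touched again after the call succeeds.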
But in this case the program did not crash. At least, not every time and certainly not right away. Not only did it not instantly die when it executed this code, but it more or less did what it was supposed to, or failed in innocuous ways that weren’t readily apparent. It called this function several dozen times a day, and yet the program ran on like this for a span of nine months without anyone noticing something was seriously amiss.
There was another program that would restart this one in the event that it crashed. I don’t know how often that happened. I’m afraid to look. I’m terrified that I’ll discover my program ran like this for all that time, like a man going on and living his life without noticing that a couple of his major internal organs have been removed. This is not supposed to be possible.
The whole thing gave me the willies, really.
LATER: Edited the example to better show what the problem was.