Coding Style Part 2

By Shamus Posted Friday Jan 18, 2013

Filed under: Programming 146 comments


As I mentioned in the last post, I read the Office document that describes the internal coding conventions of id Software, and I thought I’d go over it. This will be pretty familiar stuff for coders, but if you don’t program this might give you a glimpse of how strange and fussy this discipline can get.

I’m going to go through the standards guide and offer my own comments / explanations on why I think they’re interesting or important. The stuff in bold is from id, the rest is from me.

“Use real tabs that equal 4 spaces.”

Ah. The old “tabs vs. spaces” holy war. This one is probably as old as C itself, and may even pre-date it by reaching back into older languages. In C++, you’re expected to indent your code:

if (a <= b) {
    a = a + 2;
    if (b < a) {
        b++;
        a = 0;
    }
}

When you indent code, does hitting the TAB key insert a single tab character that moves the cursor to the next tab stop, or does it insert the number of spaces required to reach the next tab stop? You can set it to work either way, but you had better make sure you’re on the same page as the other coders. Let’s say you’ve got tab stops set to 4 spaces. If you’re using actual spaces, then internally your file looks like:

if (a <= b) {
....a = a + 2;
....if (b < a) {
........b++;
........a = 0;
....}
}

While tabs produce a file like this:

1
2
3
4
5
6
7
if (a <= b) {
.a = a + 2;
.if (b < a) {
..b++;
..aa = 0;
.}
}

See, compilers are primitive command-line programs. Sure, you might be writing code in a fancy windowed environment, but when you hit compile your source code is handed off to a text parser that’s blind to your decadent GUI interface. It has no idea what your tab stops are set to and it doesn’t care. It just counts whitespace characters, and as far as it’s concerned spaces, tabs, and all other non-printing characters look the same. So when it reports that it sees an error on line 5, column 3, (Variable ‘aa’ is undefined.) it doesn’t realize that for YOU the problem appears to be in column 9. I’ve never personally run into this problem, but I’ve read people complaining about it. I’m sure it all depends on what compiler / editor combination you’re using.
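To make the mismatch concrete, here is a minimal sketch of the arithmetic an editor does when it turns a raw character position into an on-screen column. The name `visualColumn` and its parameters are made up for illustration, and the sketch assumes the compiler counts every tab as a single character and reports 1-based columns, which not every compiler does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert a 1-based character column (as reported by a
// compiler that treats each tab as one character) into the 1-based column an
// editor would display when tab stops fall every `tabWidth` columns.
std::size_t visualColumn(const std::string& line,
                         std::size_t charColumn,
                         std::size_t tabWidth) {
    assert(tabWidth > 0 && charColumn >= 1);
    std::size_t visual = 0;  // 0-based on-screen offset so far
    // Walk over every character that sits before the reported column.
    for (std::size_t i = 0; i + 1 < charColumn && i < line.size(); ++i) {
        if (line[i] == '\t') {
            // A tab jumps forward to the next tab stop.
            visual += tabWidth - (visual % tabWidth);
        } else {
            ++visual;
        }
    }
    return visual + 1;  // back to 1-based
}
```

Under those assumptions, `visualColumn("\t\taa = 0;", 3, 4)` gives 9 and the same call with a tab width of 8 gives 17, while the compiler keeps saying column 3 either way.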

So, spaces must be the way to go, right? Except…

Spaces are fixed. If we use tabs then different coders can adjust things to suit their own preferences. I can set my tab stops to a sprawling eight. It will eat up a ton of horizontal space, but it will make the formatting very clear. Perhaps you will set your tab stops to a miserly two. You’ll have lots of information on screen at once, but you might need to squint a bit to follow the code when things get complicated. The point is, if we use tabs instead of spaces then we can each see the code the way we want to see it. You can even change how it’s displayed while you’re looking at it if you need more room or clarity.

Then again, if you’ve formatted something specifically using a given tabstop arrangement, it might fly apart under a different one. These diagrams:

/*-----------------------------------------------------------------------------
                          North                 N    
    *-------*           *---+---*           *---*---*     *---+---*
    |\      |           |\     /|           |\Nl|Nr/|     |   |   |
    | \ Sup |           | \   / |           | \ | / |     | A | B |
    |  \    |           |  \ /  |           |Wr\|/El|     |   |   |
    |   \   |       West+   *   +East      W*---*---*E    *---+---*   
    |    \  |           |  / \  |           |Wl/|\Er|     |   |   |
    | Inf \ |           | /   \ |           | / | \ |     | C | D |
    |      \|           |/     \|           |/Sr|Sl\|     |   |   |
    *-------*           *---+---*           *---*---*     *---*---*
                          South                 S      
-----------------------------------------------------------------------------*/

…would collapse into a soup of random characters if they were built with tabs and someone viewed them with the wrong settings. In a less outlandish example, doing non-leading formatting like this:

int foo               =10;
float foobar          =10.0f;
const char *foostring ="10";

…with tabs will lead to tears and confusion when someone looks at it with different-sized tabs.
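Here is a rough sketch of why that happens. The helpers `expandTabs` and `columnOfEquals` are hypothetical names invented for this example, and the two sample lines are made up to line up at a tab width of 4. The sketch expands tabs the way an editor would and reports where the `=` lands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace each tab with enough spaces to reach the
// next tab stop, mimicking how an editor renders the line.
std::string expandTabs(const std::string& line, std::size_t tabWidth) {
    assert(tabWidth > 0);
    std::string out;
    for (char c : line) {
        if (c == '\t') {
            out.append(tabWidth - (out.size() % tabWidth), ' ');
        } else {
            out.push_back(c);
        }
    }
    return out;
}

// 0-based on-screen offset of the first '=' once tabs are expanded.
std::size_t columnOfEquals(const std::string& line, std::size_t tabWidth) {
    return expandTabs(line, tabWidth).find('=');
}
```

With a tab width of 4, both `"int foo\t\t\t=10;"` and `"float foobar\t=10.0f;"` put the `=` at offset 16. Switch to a width of 8 and the first line's `=` moves to offset 24 while the second stays at 16, so the column falls apart.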

Then again, you could always use tabs when formatting code and spaces when drawing ASCII diagrams or arranging stuff into columns. Then again AGAIN, mixing tabs and spaces is a great way to drive someone mad. When you traverse over whitespace using the arrow keys, you do not want to be in a situation where you don’t know if you’re passing over spaces or tabs. Inserting or deleting spaces mixed with tabs can feel jumpy and random in a way that leads to typos and swearing.

You’re free to argue the merits of tabs over spaces, but there’s no disputing that the worst thing to do is mix them together.


Oddly enough, I always use spaces even though I think tabs are better. I spent a decade working in a Thou Shalt Not Tabbify environment, and now when I traverse tabbed code the cursor-jumps make my eye twitch. Like learning to play an FPS with mouse inverted, it would have been better if I’d learned the other way but changing now would be prohibitively difficult. I’d stumble through tabs if it was part of earning a paycheck, but if I’m working alone on my own projects I’d just as soon be comfortable.

A non-programmer might ask, innocently enough, “Why don’t you just convert the tabs to spaces? One person uses spaces, then you can change them to tabs before you use the file. It should be easy to write a program to do it for you.”

Indeed this is easy. In fact, I think most environments have this sort of thing built in. In some environments it may even have a handy keyboard shortcut. Just select all and click “auto-format” or “tabbify” or whatever. The problem here is that we use revision control (or whatever the kids are calling it these days) to manage changes to all of these multi-author text files. To make a change, you “check out” a file, similar to checking a book out of the library. You make changes to the file, and then you check it back in*. The system will then offer the other coders a nice summary of what changed. They can see, line-for-line, what was added, removed, or altered. However, if you re-tab an entire document, then every indented line of the file will be different, which will make it appear as though you re-wrote the whole thing from scratch. The resulting chaos would be worse than the problem you were trying to solve.

* Note to coders: Do NOT nitpick me with merged changes, forks, branches, and other complexities. I’m just trying to throw a life preserver to the non-coders, and I don’t need you weighing it down with a cinderblock of source management theory.
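For a sense of scale, here is a small sketch that converts leading spaces to tabs and counts how many lines a naive line-by-line comparison would flag as changed. Both `tabifyLeading` and `countChangedLines` are hypothetical names, and real revision control tools compare files in more sophisticated ways (some can be told to ignore whitespace), so treat this as an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn every full group of `width` leading spaces
// into one tab, leaving the rest of the line alone.
std::string tabifyLeading(const std::string& line, std::size_t width) {
    assert(width > 0);
    std::size_t spaces = 0;
    while (spaces < line.size() && line[spaces] == ' ') {
        ++spaces;
    }
    const std::size_t tabs = spaces / width;
    return std::string(tabs, '\t') + line.substr(tabs * width);
}

// Count the lines that would differ byte-for-byte after re-tabbing.
std::size_t countChangedLines(const std::vector<std::string>& file,
                              std::size_t width) {
    std::size_t changed = 0;
    for (const std::string& line : file) {
        if (tabifyLeading(line, width) != line) {
            ++changed;
        }
    }
    return changed;
}
```

Run against the seven-line example from earlier in the post, this flags five lines, which is every line that had any indentation at all. Nobody touched the logic, yet most of the file shows up as modified.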

“The else statement starts on the same line as the last closing brace.”

The guide is talking about doing this:

if (a < b) {
    biggest = b;
} else {
    biggest = a;
}

Instead of this:

if (a < b) {
    biggest = b;
} 
else 
{
    biggest = a;
}

I have no idea what madman decided the second was a good idea. In a complex block of code, this can throw away a ton of screenspace, and I don’t think it improves readability at all. This seems to be a page from the “things are less confusing if there’s less information on the screen” school of thought. This is an understandable sentiment in certain situations where you might find yourself daunted by walls of unbroken code, but let’s not build a coding convention to make all else statements take three whole lines of code for no reason. Spacing code out too much means you can’t see very much at any given time, which results in tunnel vision.

I’ve always suspected the three-line else (like other screen-devouring conventions) was invented by people who got paid by the line of code. According to programming lore, there was a time when managers would measure programmer output by how many lines of code they’d written. This was ostensibly a real thing done by human beings who were at least smart enough to operate a necktie. The quote “Measuring programming progress by lines of code is like measuring aircraft building progress by weight,” is attributed to Bill Gates. This was back in the days when Microsoft was a smallish company and they were making software for IBM, who purportedly liked to measure progress in this way.

“Pad parenthesized expressions with spaces”

We’re talking about this:

if ( a < b ) {
  DoSomething ( a );
}

Instead of this:

if (a < b) {
  DoSomething (a);
}

This is the first one that I’m really not crazy about, and given the type of work they do I’m kind of surprised id Software went this way. When you do a lot of 3D programming, you end up with a LOT of stuff in complex nested parenthetical expressions, like x = (x-(y-(z*2))). Parentheses usually get a space before the opening and after the ending, since you’re trying to isolate the stuff IN the parens from the stuff OUT of them. By adding another space on the inside, you’re basically forcing every open paren to take 3 characters and every close paren to take another 3. That’s a lot of horizontal space to spend, and I don’t see how it improves readability.

I admit I’m straying into really wishy-washy subjective stuff here, but for me I think of parens as little capsules that enclose and isolate their contents. To take an extreme case:

This is dense, and it’s probably hard to follow the parenthesis pairing…

num = (x-(y-(z*2)+2)*foo(z));

This makes the grouping a little easier to see…

num = (x- (y- (z*2) +2) * foo(z) );

And this is just as visually confusing as the first example, except wider…

num = ( x - ( y - ( z * 2 ) + 2 ) * foo ( z ) );

Granted: When you’re nesting stuff three and four levels deep, it might be time to consider breaking the expression up onto multiple lines. Still, I’m using this extreme example to help illustrate what I’m talking about.
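As a sketch of what breaking it up might look like, here is the same expression split across named intermediate values. The variable types, the function names, and the body of `foo` are all assumptions made for the example, since none of them are specified here.

```cpp
// Hypothetical stand-in for foo(); its real behavior is unknown.
double foo(double z) {
    return z + 1.0;
}

// The dense one-liner, kept as-is for comparison.
double denseForm(double x, double y, double z) {
    return (x-(y-(z*2)+2)*foo(z));
}

// The same arithmetic, one step per line. Multiplication binds tighter
// than subtraction, so the product is computed before it is taken from x.
double brokenUpForm(double x, double y, double z) {
    const double doubledZ = z * 2;
    const double adjusted = y - doubledZ + 2;
    const double scaled = adjusted * foo(z);
    return x - scaled;
}
```

This version takes more vertical space, but there are no parentheses left to pair up by eye, and each intermediate value gets a name you can inspect in a debugger.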

So yes, coders will argue about this stuff. Serious people with fancy degrees will sit around and argue – at length – about how they can optimally arrange blank spaces.

To be continued…

 



146 thoughts on “Coding Style Part 2”

  1. TehShrike says:

    Indent with tabs, align other things with spaces! I like to decide indentation-width per-computer! :-)

    …I can understand arrowing across tabbed indentation might seem weird after so many years, but I think most IDEs let you hit “home” multiple times to switch between the beginning of the line, and the first non-whitespace character. That’s probably the 1337 way to be navigating around