Programming Vexations Part 6: The Compiler

By Shamus Posted Thursday Oct 3, 2019

Filed under: Programming 82 comments

In the last entry, I said that waiting for a program to compile is one of the vexations of programming. I’ve spent a lot of time writing about code over the years. I’ve (hopefully) explained things in easy-to-understand terms. With any luck, you’ve learned something along the way. 

The time has come to admit that I’ve been lying to you. Or at least, I haven’t been giving you the full story. After all this time I figure you’re finally old enough to hear the truth.

Vexation #2: Compiling is Not Programming

In the past, I’ve described the process of compiling your program like this:

It really is that simple!

You just shove your source code through the magical compile-o-matic and out pops a video game!  

The reality is that it’s not nearly so simple. In the world of C++, the “compiler” isn’t even a single program! The reality is closer to something like this:

Okay, it's not that simple at all. It's actually really annoyingly complex.

This is still a pretty big simplification, but this is enough for the discussion we’re having today. The process of taking source code and turning it into an executable program involves at least three different programs and there are a lot of ways for things to go wrong.

IDE

I set Google to translate subtext and this was the page it gave me.

In our modern jet-setting future, we have lovely programming environments called IDEs[1] that handle all of your text-editing duties. For the vast majority of people programming on Windows, this means using Microsoft Visual Studio. I’m somewhat notorious for bashing Microsoft for their many many egregious shortcomings. But while their operating system is a griefing engine and their gaming software is an affront to sanity, I’ll admit their programming tools are actually pretty dang good. It’s even free these days. Good and free? That’s pretty un-Microsoft of them.

Your IDE will recognize what programming language you’re using and can color the code accordingly. A good one will understand the code well enough to help you find definitions of things, set bookmarks, highlight undefined variables[2], search the code, and auto-complete things for you. It’s pretty great.

At some point you’re done typing code and you wrongly assume you’re ready to compile. You’ll hit a keyboard shortcut and your IDE will invoke…

The Compiler

Like I said at the start of this series, C++ is descended from C, and thus it inherited a lot of things that were necessary in 1972 but made no damn sense in the 1990s and are farcical today.

Back in the old days, memory was precious. You only had a few kilobytes to work with. There was no way the compiler could parse the dozens of files in your project in one go and hold all their information in memory. So instead the compiler was designed so that it could only compile one file at a time. It runs once on aircraft.c, and turns that file into binary data. Then it runs again on bullets.c, then cars.c, then guns.c, then helicopter.c, and so on[3]. Each file can only be considered in isolation.

This one-file-at-a-time approach remains today, and I don’t think you could change it without making huge disruptive changes to how the language works. The good news is that modern computers can invoke more than one instance of the compiler at a time. You can have a compiler working on bullets.c and another instance working on cars.c. The two instances can’t cooperate and they end up doing quite a bit of redundant work, but at least you don’t need to wait for bullets.c to finish before it starts on cars.c.

There are a lot of steps to reading in the text. Here are a few notable ones:

1. Preprocessor

As the name suggests, this is something done before the text is processed. Probably the simplest and most common use of this is for defined symbols. Perhaps the programmer has a specific number that gets used a lot in code:

#define GRAVITY 9.8

Then elsewhere in the code, you’ll have something like this:

void SpaceMarine::Falling ()
{
  //Shouldn't this be multiplied by some sort of timescale? Do our physics run at 1FPS? -SY 
  fall_speed = fall_speed + GRAVITY;
}

Everywhere in the code where we need to deal with acceleration due to gravity, we can use the #defined symbol GRAVITY. Before attempting to compile your program, the preprocessor will go through and replace every instance of the word GRAVITY with the value 9.8. This has two advantages. One, another programmer might not be able to understand what 9.8 means, but if they see the word GRAVITY then they’ll get what the code is doing. Secondly, this allows you to change the value in a single location. If you later realize this game is supposed to take place on the moon, you can replace 9.8 with 1.625. You only need to change a single line of code, rather than manually tracking down all gravity-related code and changing all instances of 9.8 individually.

The other major thing the preprocessor does for us is it allows us to use different code for different builds of the game. Maybe we have a function called “DeSync” that gets called in the event of some serious error. In the code below, we have two different versions of that code:

#ifdef ALPHA
 
void DeSync ()
{
  //This is the public alpha build of the game, so here we would have some code to pop up a dialog box.
  //We could explain the desync situation and ask the user to provide additional details about what they were 
  //doing before the error occurred. The game would send that back to the master server where it would be
  //stored for later review by the developers.
}
 
#else 
 
void DeSync ()
{
  //This is the release build of the game. Just issue a generic "Lost connection to server" message
  //and dump the user back to the main menu.
}
 
#endif

Normally you can’t have two functions with the same name like this[4]. But the preprocessor looks at what version of the game you’re compiling. It keeps the DeSync you want and ignores the other.

This is a simple example, but these options can get really complex and it might take you a minute to figure out which of the six different versions[5] of DeSync() is actually active. In a modern setup, your IDE can usually highlight whichever version is active and leave the others greyed out. We didn’t have this back in the 90s, and I remember having confusing moments where I was trying to fix a bug and wondering why my fixes weren’t doing anything, only to discover later that I was editing the wrong version of DeSync()[6]. Eventually I learned to double-check which version was really being compiled. I’d type some lazy garbage into the function like “dsklja;gf”, and hit compile. If the compiler didn’t complain, then I’d messed up and was working on a disabled bit of code.

2. Lexical / Syntax Analysis

This is actually two different steps, but we can treat them as one for the purposes of this discussion. The compiler has to read in the text and figure out what the programmer was trying to express. In C++, this job is apparently very difficult for the people writing the compiler, but for the average game developer it’s as easy as slapping F7[7].

If the code doesn’t make sense because of typos / stray characters / misspellings / missing characters then the compiler will issue an error. If you’re having a good day, then the error will be simple, understandable, and point you to the source of the problem. If you’re having a normal day, then the compiler will report the problem in a confusing / misleading way[8].

3. Object Code



If aircraft.c is valid – if it compiles with no errors – then the compiler will spit out aircraft.obj. This object file is something like an intermediate point between your text-based code and your final executable. The details are pretty hairy and generally not something most programmers bother worrying about, although we’re often plagued by a vague sense of guilt that we ought to learn about this stuff someday. 

Your IDE will run the compiler on all the files, one at a time: aircraft.c, bullets.c, cars.c, and so on. Each successful compile will result in another object file. 

I should point out that these file names are perfectly arbitrary and the compiler doesn’t care what the files are called. In the much-newer Java programming language[9], filenames must match the code contained within them. Your car code must go in a file called car.java and your soldier code must go in soldier.java. In C and C++, you’re free to organize your code however you like, which means you’re also free to not organize it at all. You could make a scavenger hunt for the rest of the team by getting rid of car.c and spreading the contents of the file throughout the rest of your project[10]. It’ll still compile, but now everyone else has to hunt for the scattered pieces.

This sounds crazy (and to be clear, it would be crazy to do this) but this freedom actually has some utility. If you’re doing design-by-programming, then you end up writing a lot of rough prototype code. For example:

In the last meeting someone tossed out the idea of adding mounted guns to vehicles. So now I want some cars to have mounted guns. I could make the mounted gun part of the car code, or I can make it an independent game object, or I could make it a weapon that gets added to the player’s inventory like all other weapons, but is removed if they exit the vehicle[11].

Hm. Which way is best? For now, I’ll just add the gun to the car. I’ll put all the mounted gun code here in car.c. 

Then later you decide you’re happy with this design, so you create turret.c and move all of the mounted gun code into that file. You generalize it so that other vehicles like helicopters and aircraft can also have mounted guns.



In a language like Java, you’d be forced to put the new code into a new file right from the start, but the C language allows you to throw things together and figure out the overall file structure later, once you know what you’re doing. It seems trivial in this case, but in cases with complicated systems it’s often nice to be able to design first, and THEN decide how the code should be organized.

In some programming domains, programmers are working from a rigorous spec and this “freedom” just seems like an invitation for chaos. In those domains, you know exactly what you’re going to build before you write your first line of code. Why allow the user to build something incorrectly?

These programmers will complain that the freedom of C allows people to slap together disorganized, half-assed code. And it does. We come back to the question about how much we should restrict the programmer in the name of helping them. It’s the old “this language was made for good programmers” argument again. As it happens, that comes up a lot when you’re talking about language design.

In any case, if all of the files compiled successfully then your IDE will move on to…

4. The Linker

I tried to find a stock photo of the abstract process of linking, but this is the best I could do.

It’s called the linker because it takes all those disparate object files and links them together to make the final executable. The linker is also responsible for pulling in any external libraries you might be using. Maybe your program uses OpenGL for the rendering and the Windows SDK for window creation and managing user input. You don’t normally have the source code for these things. Instead, they come as pre-built files and their functionality is added to the program by the linker.

Here is where you can run into the final stage of build problems. Maybe in your code you specified a function for handling car crashes. The file car.c promises this function exists somewhere, but since the code isn’t obligated to put car-related code into the car-specific file, the early stages of compilation have no way of knowing that you messed up. It’s like the end of a multi-player scavenger hunt, where you take an inventory of everything you’ve collected. It’s not until the very end that you discover an item is missing. The compiler didn’t know the car crash code was missing until it tried to find it among the scattered object files and came up empty.

5. Why Does This Take So Long?

C has always been a bit slow to compile, and C++ even more so. I’ve always thought these compile times to be unreasonably long. In my days of C++ programming, I’d smack the F7 in Visual Studio and watch the filenames scroll by as each one was processed. After about thirty seconds I’d get restless and start thinking, “Hang on. Why is this taking so long? I was using this same codebase ten years ago and I swear it took about this long to compile. This computer is over ten times faster than the one I had back then. The code base is bigger, but it’s not ten times bigger! What is the compiler doing with all the CPU cycles?” 

I’ll admit I don’t have hard numbers on any of this. I didn’t meticulously record compile times and computer specs back in the 90s so I’d have them available for comparison today, so I can’t do a proper comparison between the compile times we had then and the compile times we have now. I’m sure it’s faster, but it’s not that much faster. It’s nowhere near 100x faster than it was 25 years ago, even though the increase in processing speed is in that ballpark. I doubt it’s even 10 times faster.

So that’s the compiler. It’s slow, but your IDE will usually offer features to help speed things up. It’s complicated, but your IDE will usually have tools to hide or mitigate that complexity. It’s obtuse, but the IDE is designed to handle all the complicated / obscure options for you. 

Next week I’ll talk more about the compiler process and why compile times are more important than they seem.

 

Footnotes:

[1] Integrated Development Environment.

[2] It’s basically a spell checker, but for code.

[3] I’m not sure why you’re trying to develop Arma in 1972. Did you check the target hardware? I don’t think you have enough memory for all of this.

[4] Yes, I know you can if they have different signatures, but I’m not going to stop and explain signatures right now.

[5] Maybe this game is being developed for the PC, PlayStation, and Xbox, and each platform needs different behavior for the shipping version of the game vs. the pre-release builds for reviewers.

[6] Or whatever the function was called. I’ve long since forgotten.

[7] Or whatever the hotkey is in your IDE.

[8] Actually, error messages are pretty good these days. They’re not particularly newbie-friendly, but then C itself isn’t particularly newbie friendly.

[9] Java was introduced in 1995.

[10] If you get caught doing this, don’t claim I suggested it. I just said it was POSSIBLE.

[11] I know I heard about a game that did this, but I can’t remember what it was.




82 thoughts on “Programming Vexations Part 6: The Compiler

  1. Baron Tanks says:

    Love these posts as someone that once did C++ in uni close to a decade ago and since only ever ‘programmed’ a bunch of macros in Excel/VBA. You hit just the right level of detail and abstraction where it’s informative and I’m still learning new stuff, without leaving me lost in the reeds having no idea what’s going on. Also quick typolice:

    Insert 11 reads: I know I head about a game that did this, but I can’t remember what it was.

    Think you missed the R in heard chief.

  2. Olivier FAURE says:

    Re: compile times, some possibilities:

    – IO time stayed the same.
    – compilers are just bad at parallelism, which represents most of the recent improvements in computing speed
    – compilers have bad cache locality, which means they’re bad at leveraging processor speeds.

    1. lethal_guitar says:

      Plus, compilers are much more sophisticated at optimization nowadays, which can involve a lot of work. This presentation gives a pretty good idea of the elaborate transformations and analysis done by modern compilers: https://youtu.be/bSkpMdDe4g4

    2. Bloodsquirrel says:

      It might be that compilers are spending more time using more sophisticated optimization methods.

    3. methermeneus says:

      A lot of it is optimization, but not enough, since you can often turn optimizations off and barely change the compile time. I’m not exactly an expert, but from what I’ve heard, it’s mostly a combination of C[++] being badly designed for parallel compilation (which, as Crokus Younghand points out, can be mitigated with a unity build, although that has trade-offs given the many uses of DLLs) and the C[++] lex/parse cycle generating cache misses out the wazzoo. One of Jon Blow’s major goals is a faster compile time, and he seems to be doing pretty well, probably just because he’s starting from scratch instead of trying to keep C’s old compilation model.

      1. Leeward says:

        This is as good a place to quash this as any: C is not slow to compile. C is very fast to compile. C++, because of its ridiculous grammar and huge compile-time workloads, is very slow to compile. C was a language designed for easy compilation, and it succeeded.

        I/O time matters the most for small files. A single C compilation unit with a normal number of includes will spend the majority of its time opening files.

        C’s compilation model is extremely parallelizable. Since each compilation unit is independent, the compiler can spawn a different process for each .c (or .cpp, since C++ uses the same model) file. Unfortunately, when you’re I/O bound this doesn’t help unless you have parallelism in your disk. This is possible for purpose built machines with RAID arrays, but chances are that even a moderately powerful CPU is spending most of its time waiting for I/O when compiling a big project.

        Linking, on the other hand, is commonly single-threaded in C. I/O parallelism doesn’t matter, but the linker has to go through the whole program and fill in missing symbols. It’s not unparallelizable, but it’s not trivial like compilation. Then there’s the problem that modern linkers have modern features like link-time optimization. That’s harder to parallelize and it takes more computing power.

    4. Aanok says:

      Another significant factor is probably the fact modern C++ is a much, much bigger and more complicated language than it was in the 90s. The amount of constructs it has now, from all the extensions to templates, to lambdas, to rvalue references to whatever have you, is a lot higher and has a lot of weird interactions that compilers need to account for. I don’t know about the details of how this is implemented, but I think it stands to reason that it must have a cost at compile time.

      1. paercebal says:

        In my C++ experience, the difference comes mostly from templates, and sometimes a creeping desire to put everything in headers.

First, inlining: Having code in the header “helps” the compiler inline it (e.g. across DLLs/shared libraries), and thus, optimize it away. Also, you don’t need to deal with exporting symbols, or even linking: If you can put on github a header-only library, then you don’t even need to provide a Makefile.

        The problem is that the code is in the header, and that header could be compiled multiple times. Thus, you compile the same code again, and again (this is where precompiled headers are needed). The best part? As the code is in the header, you move dependencies (i.e. other included headers) from sources to headers, where they exponentially compound (again, precompiled headers help). And this is a major problem.

        C++ templates are awesome in a way that just can’t be matched by C# generics (I’m not even talking about Java’s ridiculous implementation of the same, or the laughable C attempts at reproducing them using macros). But this comes at a cost:

        First, templates are usually put in headers, which means the problems I explained above also apply to templates. Even if the code is not inlined, it is generated, sometimes multiple times, and then the redundant code is removed (by the linker, IIRC).

Then, templates are **powerful**. You can do a lot of tricks with templates, and if some of them are questionable, others are just too useful to avoid. For example, templates help a lot with type safety, moving a lot of runtime errors (sometimes, silent errors) to compilation errors. So, yes, you wait a lot of time for the compilation, but that is time that will not be spent testing and debugging (in my experience, if a low level C++ library is correctly written, if it compiles, then it works). Another example is actually executing code during the compilation (see also: constexpr). The idea is that some processing has already all its data known at compile time, so instead of just doing it at runtime, the code is “executed” (via the templates, or constexpr), and the result is written directly in the binary, while the original code, if not useful anyway, is removed altogether. I can write something on Godbolt if someone wants a demo.

        It doesn’t help that most of the C++ standard library, as well as boost, is header-only, so, if your C++ code uses them (as it should, unless exception), then you’ll pay the prices mentioned above.

In my experience, modularizing your code (putting as much code as possible in source files instead of headers, and using pre-compiled headers for headers coming from modules your code depends on) helps the compilation speed dramatically.

        Still, it remains awfully slow when compared to, say, C# (I was working with C# just yesterday). We’ll see what C++ modules (coming in C++20) will bring.

    5. Hector says:

I have a stupid question about compiling today: will the Compiler always redo every source package, or will it skip unchanged items? That is, if you update Cars.DAT but not Turrets.STUFF, will it recognize the date or checksum and skip it?

      1. Richard says:

        All the common toolchains keep the OBJ (or equivalent) files and linker intermediate representations around and will only touch them if it can’t prove that the corresponding source files have not changed.

        Yes – that is a double negative. If the toolchain isn’t sure it’s the same source code then it’ll recompile and relink.
        Even if all you did was fix the spelling in a comment that the actual compiler never sees (because the preprocessor step removes them)

        The sad part is the linker. Linking is fundamentally a serial thing, as every function must be implemented exactly once and that same implementation called by everywhere that uses it.

        C# gets around this by not bothering with compilation or optimisation until the program is actually run.
        It’s parsed into bytecode – this is essentially a preprocessor and a very basic linker.
        The functions don’t get compiled into actual machine code until the program is run and they’re called, and if they get hit a “lot”, an optimiser is run and they get optimised.

        In effect, C# (and similar) simply throw the problem over the fence and get the users to do it. Whether that’s a good use of the end user’s CPU time is a different matter.

There are distinct advantages to not bothering to compile until it’s run – a function that’s heavily used can have the most aggressive optimisations applied automatically, another function that’s only called if the user does something particularly special might never be “compiled” at all, because it’s never called.

A language like C++ applies optimisations to pretty much the whole program, unless the developer takes the time to explicitly flag things as “really important” or “probably never happens”

  3. Mr. Wolf says:

    I spent almost this entire article distracted. I was trying to imagine Arma developed in 1972. I concluded it looks a lot like an actual military telling you to stop using their ballistic simulation machines to play games.

    That said, my imagined 1982 version looks a hell of a lot like Battlezone, and that’s always fun.

    1. PeteTimesSix says:

      A strange game. The only winning move is not to play.

      How about a nice game of chess?

  4. Philadelphus says:

    Wow, I don’t think I could handle programming in C/C++. As a Python programmer by trade, I get antsy if my code takes longer than two seconds to compile.

    1. Tuck says:

      Interpreted vs compiled, in a nutshell!

      1. tmtvl says:

        Downside being of course that A) interpreted languages depend on the user having the interpreter installed and B) compiled languages generally being an order of magnitude faster in execution of well-written code.

        1. pseudonym says:

          Yes, indeed, and there are other factors at play as well:
A) compiled binaries only run on the architecture they are compiled for, and only if all instruction sets are supported. That’s why pre-compiled binaries are usually compiled for old cpu designs, so they can run on any newer machine. Distributing binaries is a pain, i386, x86-64, armel, armhf, arm64 etc. And you need to take Linux, Mac and Windows into account. It is a lot of binaries. With an interpreted language, you just give the script and it runs.
          And B) interpreted languages are generally faster in development time. You can run your tests directly since you do not have to compile. This allows you to iterate very fast while debugging. If your developers are more expensive than your cpu cycles this does matter.

          All these trade offs make it very interesting to choose a programming language.

          All these trade

          1. tmtvl says:

            pseudonym? You cut out there. pseudonym? PSEUDONYM!

            *Fission Mailed*

            1. pseudonym says:

              Hahaha, that will teach me to reply on a phone ;-).
              These last three words should not have been there.

      2. Leeward says:

        Python is byte-compiled, so there’s still a compilation step; it’s just super lightweight. Oh, and all the stuff the linker does (symbol resolution, etc.) gets done at runtime.

        1. paercebal says:

… and this is where, to me, the real downside of interpreted languages appears: A lot of compilation errors (in strongly typed languages) become runtime errors, because somehow the variable type is wrong, or you didn’t provide the right parameters, or even the function doesn’t exist, etc. IDEs can sometimes mitigate that, though.

          1. pseudonym says:

            That is a very good point!
            I try to prevent this by lots of unit testing and getting to near 100% coverage, so I know almost each line of code runs at least once in the test suite. This way a lot of these mistakes can be caught. At the cost of some extra development time.

            The huge benefit of this testing approach is, that when you have a full test suite you can more confidently make a change, while worrying less about regressions.

            1. Leeward says:

              Seems like we ought to have 100% code coverage for our tests in any language. Compilers don’t prevent anyone from making functions like this:


              int add_one(int num)
              {
                return num + 2;
              }

          2. Leeward says:

            It’s true that you can shift stupid stuff that would have been caught by a compiler into runtime errors. You can also use a linter, which will catch that same class of error.

            Erlang comes with a tool called Dialyzer, which only throws errors when it can prove you did something wrong. This is not the kind of thing you can get in static languages where, for example:


            int foo(int num)
            {
              i = 7;
              return i + num;
            }

            is not valid C (i is not declared) but is completely unambiguous. Of course, some languages do nifty things with type inference, but if you’ve ever spent 10 minutes just trying to get a piece of Haskell code to compile you’ll know that’s not a panacea.

            I think what you’re getting at with IDEs mitigating it is those static analysis tools. You don’t need an IDE to run a static analyzer for you any more than you need one to run a compiler, but sure.

            Anyway, it’s all about tradeoffs. Dynamic languages get you something faster, but with less assurance that it’s correct. Static languages keep you from making stupid mistakes at the cost of more time spent fixing stuff that’s irrelevant or not actually wrong.

  5. Yerushalmi says:

    I could make the mounted gun part of the car code, or I can make it an independent game object, or I could make it a weapon that gets added to the player’s inventory like all other weapons, but is removed if they exit the vehicle[11]I know I heard about a game that did this, but I can’t remember what it was.

    IIRC, the tank in Goldeneye worked this way.

  6. Crokus Younghand says:

Regarding compilation speeds, modern compilers and linkers have a lot of unnecessary overhead. You can get rid of most of it by using unity builds; my pretty large engine compiles in less than 5 seconds by making sure that only one translation unit gets compiled.

  7. GargamelLenoir says:

    In Java we would totally code the machine gun in the Car class and then transfer the code to its own class/file if needed. It’s really no problem.

    1. methermeneus says:

      Given that the whole article is meant to be one big oversimplification, I’m pretty sure that was shorthand for “if every entity is a class, then you’d have to create a new turret.java file for the turret class instead of prototyping the turret class inside of the car.c file.”

      1. John says:

        That’s still not true. Or still a simplification. Or whichever. The rule in Java is that public classes must appear in files with corresponding names. If Turret is a public class then, yes, it has to be in a file called Turret.java. If Turret is not a public class, however, the rules are much, much looser. The only real constraint is that it has to be in the same package as any classes which reference it directly. Let’s suppose that Turret only ever appears as a member of the Vehicle class (or of classes which extend Vehicle, like Car or Helicopter). In that case, there’s no reason for the main program to ever need to access Turret directly, and the programmer can:

        (a) keep the source code for Turret in the same file as the source code for Vehicle,

        (b) keep the source code for Turret in its own file, which does not need to be named Turret.java but which probably should be as a matter of good programming practice, or

        (c) keep the source code for Turret in some other file with the source code for other classes (of which no more than one can be public).

        Most of the time, Java programmers put classes in their own files for the same reasons that programmers in other languages do. It’s a matter of navigation, readability, and pure programmer convenience. To be perfectly honest, when I’m working on small Java projects, I sometimes keep the source for all the classes I write in a single file.

        1. Decius says:

          So now the bunker-mounted machinegun that should act the same as the vehicle-mounted one has to be a vehicle?

          1. John says:

            How you structure your program and your code is up to you. If it’s more convenient to have Turret as a public class then by all means make it one. All I’m saying is that it doesn’t have to be one and that the source doesn’t necessarily have to be in its own file.

            Although, now that you mention it, yes, Bunker could very well be an extension of Vehicle that just never moves. Heh.

            1. paercebal says:

              > How you structure your program and your code is up to you.

              Actually, that’s the point of the discussion. In Java, this is **not** “up to you”. Java imposes **artificial** limitations that influence how you write the code. One could say “it’s for your own sake”, that the “Java language designers know better than you”, but it is still an **artificial** limitation.

              It is telling that C# (which is mostly a Java clone, done much better IMHO) didn’t clone that specific feature of Java (among others).

              1. John says:

                I’ve never said that there aren’t restrictions. One public class per file is obviously a restriction. My point is that it’s the only one and, what’s more, it’s not terribly severe either. I really don’t understand why people get all bent out of shape over it. Mountains, mole-hills, etc.

  8. Steve C says:

    Typo: “we ought to lean about this stuff”

  9. King Marth says:

    I need to plug the Obfuscated C competition winner (Jason, 2001) which implemented a text adventure game played exclusively through preprocessor #define error messages. When invoking a compiler on the command line, you can define a symbol by adding an argument like -DALPHA, which sets ALPHA in the given example. The game plays with this: you input commands of the form -DRINK -DAQUIRI, defining the constants RINK and AQUIRI, which change the compile error message.

    1. Alan says:

      2001’s jason is a text adventure, but the preprocessor one is 1994’s westley.c

  10. Mephane says:

    #define GRAVITY 9.8

    I know it is just meant to be an easy example for non-programmers, but I hate when actual professional programmers do it. The proper way is

    static constexpr double GRAVITY=9.8;

    The net result is essentially the same, but it prevents certain errors where the preprocessor would replace something you did not want it to, and it also ensures that the constant has the right data type, so it’s handled in expected ways when passed to overloaded functions, etc.

    1. Echo Tango says:

      Preprocessor stuff like this can be useful, but I’ve never had a reason to choose it over other options. Constants, like you point out, are better defined in normal code, and things like Linux vs Windows can be chosen with your build tools (makefile, etc.).

    2. Erik says:

      But that “fix” removes the *entire point* of that section – once you do that, you are now defining the symbol in the code itself, NOT in the preprocessor.

      You are correct that using core language instead of the preprocessor certainly prevents the preprocessor error bugs, and helps with type validation, and all of the good reasons for using this style. In fact, that’s why C++ created that syntax. (Some of which was back-ported to C, in C99 IIRC. It’s invalid syntax in older dialects, like K&R C.) There are excellent reasons why C++ tried hard to remove the need for the preprocessor overall, and especially in those cases.

      But in the end, using that syntax completely invalidates the example, so…. ¯\_(^^)_/¯

  11. Retsam says:

    In a language like Java, you’d be forced to put the new code into a new file right from the start…

    Is any other language “like Java” in this regard? As far as I know, this restriction is just a Java quirk. Even C#, which was basically designed to be “Java with the serial numbers filed off” doesn’t enforce this AFAICT.

    This section makes the case that “organized languages” like Java are good for domains with rigorous specs, while “freedom languages” like C++ are good for other domains.

    But in my experience this isn’t a real divide among programming languages. If we define “organized language” as “enforces one class per file”, then it’s basically “Java vs. the World”.

    Arguably the bigger difference in code organization between C++ and Java is the presence of a module system in Java, which does add some ceremony between where you define your code and where you use it. But, if we define “organized language” as “has a module system”, then it’s basically “C/C++ vs. the World”[1].

    I do agree that the Java filename restriction is an example of the “bad programmers” language design; but the article makes it sound like two competing schools of thought rather than a specific design decision made by Java, and I’m not sure I see the connection to the topic of compilers.

    [1] Though C++20, I believe, is slated to include modules.

    1. John says:

      Java does not enforce one class per file. Not even the page Shamus linked to says that. Java enforces one public class per file. (A public class is a class that is visible to classes outside its own package.) As I mentioned in a comment above, I sometimes keep all the source for smaller Java projects in a single file. I tend to put classes into their own files when the program becomes easier to read that way, not because Java makes me do it.

    2. Erik says:

      C++? A “freedom” language? Bwahahaha!

      At some level, any language with static typing and compilation is not really a “freedom” language. Look to Python or PHP or Tcl for examples of freedom.

      1. tmtvl says:

        “There is one and preferably only one way to do it” is an example of a freedom language?
        A real freedom language follows “there is more than one way to do it.”
        And for proper typing, look at Perl 6: optional static typing. Thus avoiding the hassle of having to write your own input validation.

        1. Philadelphus says:

          It helps if you quote the Zen of Python correctly: “There should be one– and preferably only one –obvious way to do it.”
          Keyword there is obvious. There should be a single obvious way to do things rather than multiple non-obvious ways (or even multiple obvious ways). Of course, Python includes plenty of non-obvious ways too, but the obvious way should be obvious because it’s (generally speaking) the objectively best way to handle something; you should only need to do something else if your circumstances warrant it or if you’re deliberately writing weird or sub-optimal code for fun.

          Having everyone follow the obvious way also means it’s a lot easier for other people (or yourself) to figure out what you were doing later. There’s such a thing as too much of a good thing…

      2. Moridin says:

        I’ve never figured out what the supposed benefits of dynamic typing are. As far as I can tell, all it does is make it easier to mess up.

        1. Decius says:

          It also allows Array(16).join("wat" - 1) + " Batman!"

          1. Leeward says:

            That’s weak typing. They’re different things that both exist in some languages, so they get conflated.

            1. Decius says:

              What’s the precise pedantic difference?

              1. tmtvl says:

                Weak typing means the language doesn’t complain when you try to perform an operation with incompatible types, dynamic typing means the language doesn’t keep track of what type a variable is.

                1. Chad Miller says:

                  Generally, as I’ve heard the terms used, it looks something like this:

                  a = 3
                  b = "pasta"

                  print(a + b)

                  static typing generally means something like this won’t compile (either because they require type declarations and then complain when you try to add an integer to a string, or because they have type inference and will figure out that you’re trying to add an integer to a string)

                  dynamic typing means this will run, but with an error (Python and Ruby will start the script and then throw an exception when asked to add an integer to a string)

                  weak typing means the above will run (Javascript prints “3pasta”)

                  Python people in particular get twitchy about this distinction; they don’t appreciate being lumped in with the likes of JavaScript and PHP on this front just because they don’t explicitly declare the types of their variables.

                  (yes, I’m aware that by the above definition-by-example it’s possible for language A to be “stronger” in some respect than language B while language B is stronger in some other way. It’s debatable to what degree the difference between JS and Python demonstrated should be considered part of the type system at all. The whole discussion is kind of a mess.)

                  1. tmtvl says:

                    Just because weakly typed languages are usually dynamically typed doesn’t mean you couldn’t have a language where you can do something like…

                    my Str $string = "Hello!";
                    my Int $number = 1_000_000;
                    my IntStr $sum = $string + $number; # "Hello!" => 1000000

                    1. Chad Miller says:

                      You’re absolutely right; I was consciously oversimplifying.

                      Along similar lines to your hypothetical, here’s someone using Haskell typeclasses to implement truthiness, which is a similar idea.

                    2. Leeward says:

                      C is a static language with weak typing. ('c' + 1.0 == 'd') > 0 evaluates to true.

                      'c' is a char.
                      1.0 is a float.
                      The '==' operator returns a bool.
                      0 is an integer.

                      This expression compiles with all the warnings turned on under C11, C99, and C90 without any complaints.

                2. Leeward says:

                  That’s not true. Dynamically typed languages keep track of variables’ types; they just do it at runtime. This is even true of weakly typed dynamic languages.

              2. Leeward says:

                Weak typing: the language won’t complain when you implicitly try to use different types in operations. Things like adding numbers to strings, or comparing characters to booleans are weak typing.

                By contrast, strong typing prevents nonsense comparisons like true < 'b'.

                Static typing: the types of all things are known before runtime. If, in a strongly typed language, you try that true < 'b' thing, you’ll get an error. If the language has static typing, that error will happen at compile time. If it’s a dynamic language, it’ll probably show up while the program’s running. This is distinct from interpreted/compiled. Erlang and Java are both compiled to the same degree, but nobody would accuse Java of being a dynamic language.

                Some examples:

                Weak dynamic typing – Javascript, PHP, Erlang
                Strong dynamic typing – Python, Lisp
                Weak static typing – C
                Strong static typing – Haskell, Ada, C++, Rust, Go, Java

                While static/dynamic is pretty clear cut, strong/weak typing is a continuum. C is weakly typed compared to Python (adding a number and a letter is an error in Python), but stronger than Javascript (adding a number to a string gives an error in C). C++ coerces integers to booleans silently.

                It’s also not a goodness continuum. Dynamic languages are great at some things where static languages are terrible, and vice versa. The same can be said for strongly typed vs. weakly typed languages. Erlang is one of my personal favorites, and it says that lists are greater than integers.

          2. tmtvl says:

            Gary Bernhardt’s talk? That’s a good one.

        2. PeteTimesSix says:

          As much as I like my static typing and type-safety, I will admit JavaScript being able to just take arbitrary JSON, put it into an object, and start pulling things out of it without having to define a class is handy. Same goes for anonymous functions that just assume the provided object has a .name in it somewhere and don’t worry too much about the rest of it…

          I still define the class anyway though (TypeScript). I *like* my static typing.

          EDIT: Oops, forgot to obsess over C# there for a second.
          C# does however have the dynamic keyword, which gives you the benefits of dynamic typing when you really need them, so it’s still the best language under the sun.
          There. That was close.

    3. Abnaxis says:

      I feel you’re missing the point a bit. The definition “organized languages” isn’t specifically “one public class per file,” it’s “coding practices that are ‘organized’ are forced in the language specification.” Organized languages explicitly make it difficult to do things in a quick and dirty way, which makes life harder when you have to do design-by-programming, where you often prototype ugly code and clean it up later once you’ve worked it out.

      Java’s class file structure is one example that forces developers to organize their file structures in a way that makes it easy to find class definitions, but there are plenty of other examples in other languages. My usual go-to example for this issue is Python. Python requires you to have every nested block indented, which in general is good coding practice. However, I have had jobs where the code was deployed in an embedded environment which had no proper editing tools besides a minimally-featured, worse-than-notepad embedded application. That means any time I wanted to wrap a chunk of code in a block for testing or prototyping I either needed to download the code from the runtime deployment, modify it in a proper editing suite, and reupload it; or I needed to go in line by line and indent. And then, if my testing didn’t work and I learned how to better do my thing, I had to go back in and do it AGAIN until it was right.

      The fact that Python was the go-to language in an environment where I constantly needed to test and modify my code without a proper IDE was a pain in my ass day-in and day-out. You could say that this is a “Python problem,” and doesn’t apply elsewhere, but IME EVERY organized language has its own version of the “Python problem” or the “Java problem.” Contrary to popular belief in many circles, there are plenty of times and situations where you really want to write temporary code that isn’t necessarily production quality.

      1. Retsam says:

        I don’t like Python’s whitespace-significant indentation either; but I don’t think it’s a good example of an “organized language” feature. It’s a syntax choice I don’t care for, but it really doesn’t have anything to do with how code is organized. Maybe you feel slowed down by python’s whitespace, but then a python programmer would probably feel slowed down by C++’s semi-colons and braces (which are equally “forced in the language specification”).

        Even with its questionable whitespace rules, python is like the epitome of “prototyping languages” – quick prototyping is one of the major selling points of the language; so I’d really hesitate to classify it as an “organized” language.

        1. Ruethus says:

          I always found the whitespace thing in Python to be annoying, especially when text editors decided to “help” by swapping out spaces for tabs or vice versa.
          Also, having to do the indentation did not at all stop my 2am Python code for my college classes from being horrendous to read later, even ignoring the fact that I used single spaces as my indents out of protest for the system.
          Finally, while I definitely understand that many people can prototype quickly in Python, my C/C++ muscle memory makes it almost painful to transition back into as I spend half my time deleting semicolons and curly braces that were typed out of reflex.

          1. Zak McKracken says:

            If you’re using single-space indent, then yes, that would make it harder to read. But why would you sabotage the readability of your own code like that (except maybe to prove a point…)?

            I’ve gone from BASIC to Turbo Pascal to C, Fortran, Matlab, back to Fortran and finally Python. Of course any change of habit takes time, and everyone will slow down for a while, but now my fingers have a hard time putting brackets around code blocks. I’m typing LaTeX sometimes, but all those curly braces are just a pain, compared to not having to type them, and I find it hard to believe that there was ever a time when I was typing raw HTML to make a website…

            Nobody, of course, believes that indenting your code is sufficient to make it easy to read, but most programmers in most languages are already using whitespace to divide up their code blocks, so why not make that part of the syntax and spare them from having to type all those brackets?

            1. Richard says:

              Because not all white space is the same.

              The following two lines look identical but are not:
              Good afternoon
              Good afternoon

              (The first has a non-breaking space)
              There are several other “space” characters in Unicode. Which one would you like to use today?

              Thin space? Hair space? Ogham space mark? EN space, EM space?

              Ok, let’s assume you never, ever type them. What about a single [tab] and the ‘right number of’ spaces?
              [space][space][space][space]hello
              [tab]hello

              In a language where whitespace matters, all these are confusions.

              1. Zak McKracken says:

                I don’t even know how to type an em space (of course, by using its unicode code…), so the chances of me (or anyone, really) accidentally typing that into a program are extremely slim. And I bet you could confuse at least some C++ compilers with that, too.

                Now, tabs and spaces? Yes, that is a thing that happens sometimes (as it does in C++, according to Shamus). That’s also the reason why many IDEs have a default setting to automatically convert all tabs into spaces, remove whitespace at the ends of lines, and auto-indent (shift+tab will unindent, so you rarely have to manually count spaces). In practice it is way less of a problem for me than forgetting braces was in some of the other languages, and it makes the code much more readable.

                There’s a thing with Python though: because it is duck-typed and lets you mess with things a lot (like overriding what "+" does…), you absolutely can break code in obscure ways if you mess with it in the wrong way. Well, that’s the thing with power and responsibility: you can get a ton of stuff implemented in extremely short time and with very little code because you don’t have to manage memory and declare types. And that means that in cases where such details matter, you either have to keep an eye on them yourself (“Python is for good programmers”… or so) or, if this flavour of safety is a big priority for the type of project you’re working on, then Python is not the tool for it.

                …and I’m totally fine with that, and grateful for the people who implemented BLAS, LAPACK, numpy and sklearn in very efficient C++ code, so I get to use those fast implementations to do huge amounts of math with a few lines of code, and get a nice plot of the results with just one more, while my colleagues who’re still on Fortran are waiting for the compiler to finish so they can see where it crashes next.

        2. Abnaxis says:

          The difference between Python’s whitespace and C’s semicolons and braces is that the semicolons and braces can be anywhere. You can literally make an entire, compilable .c file one single long-ass line because C doesn’t give 2 shits about hard returns or white space.

          Whether or not a language is built for prototyping (or any other ease-of-use case) isn’t the point–the point is the degree to which the language specification is designed with a specific structure in mind which it subsequently forces on programmers. Java’s file structure restrictions are an optimization designed to enable fast turnaround when troubleshooting runtime errors with stack traces, but it’s still “organized”; by speeding up dealing with one headache, Java slows down other valid programming use cases. Python is the same–rapid prototyping does not come for free.

          Specifically, Python famously has “Pythonic” ways of intended program design even beyond forced white space, which is pretty much the epitome of “organized.”

          1. Retsam says:

            Python requires whitespace in all the same places that C++ requires braces and semi-colons. You’re not free to organize the whitespace however you want, just like you’re not free to omit braces or semi-colons in C++. Python programmers often find braces and semi-colons to be just as much of an unnecessary imposition as C programmers do for python’s whitespace.

            And if whitespacing is a question of “program organization” then it’s the lowest, most trivial aspect of it. That’s a question of syntax, and “program organization” is an abstraction level above syntax: how do you divide up code into modules or classes, what parts are abstracted and what parts are exposed, what does the file structure look like, etc.

            I just don’t see a connection between Shamus’s critique of Java’s one-public-class-per-file and your complaint about Python’s meaningful whitespace, or how to generalize the complaint into the idea of an “organized language”. If we’re talking about syntax concerns, is LISP “organized” because I have to use all those parens, (instead of just writing C-style code like God intended)? If we’re talking about about code organization, is C++ “organized” because it forces me to write header[1] files?

            And “Pythonic” just means “idiomatic python” – that there are certain ways of solving problems that are more natural in Python than others – this is true of every language. You can look up “idiomatic C++” and find the exact same vein of advice about the “right” way to write C++ code. In fact the top answer for What is Idiomatic? on StackExchange gives two examples, one for Python and one for C++.

            [1] Technically you’re not forced but it’s a very bad idea not to, AFAIK.

  12. Jamie Pate says:

    compile time doesn’t scale linearly with the number of files: each header is re-compiled (except with precompiled headers, I guess?) as it’s included in each .c file (the preprocessor copy/pastes the contents for every #include), and each header #includes many other headers, which include many other headers! (Yes, the compiler optimizes this, but it’s a hard problem.)

    There is a C++ modules proposal that’s being worked through the standards body for C++20, but people aren’t very optimistic and it’s been a SUPER long road.

    https://www.modernescpp.com/index.php/c-20-modules#h2-what-are-the-advantages-of-modules

  13. Joe says:

    The IDE pic made me laugh harder than I have in a while. Thanks!

  14. Asdasd says:

    Why’d you put a random picture of Zelda in the article?

      1. tmtvl says:

        I know what you mean, Paul, Zelda is a princess. That’s a picture of Kirby.

        1. Paul Spooner says:

          Kirby Krackle is the best.

        2. Nimrandir says:

          I beg your pardon, but Kirby swung a bat, not a sword.

        3. Decius says:

          Zelda isn’t a princess, Zelda is a sheik.

  15. Steve says:

    The Visual C++ HFLE screenshot is hilarious! I even have sympathy with its users, but I still PMSL. And to think I found the first episode in this series a touch smug and might have stayed away; the whole series has actually been a joy and that HFLE screenshot is the icing on the cake. Top stuff, Shamus, top stuff!

    1. Zak McKracken says:

      Yep, amazing!

      Although I think the actual reason for MS offering VS for free is that they believe most of those hipster freeloader indie developers will at some point move into employment with a big company to make commercial software (and actually I guess many professional developers did start coding as a hobby), and wouldn’t it be nice if all those people loved no IDE more than Microsoft’s?

      Same thing with many other large software companies. Many CAD and simulation packages for engineers are available for almost or completely no money to students, in hopes that either:
      1: they’ll fall in love with the program and convince their future employer to buy the software, because it’s so nice to use
      2: After finally managing to get the obtuse interface to do what they want, they become afraid of even trying to move on to a different, potentially equally obtuse but otherwise completely different interface.

  16. Agammamon says:

    We know why it’s slow – so programmers getting paid by the hour can play games ‘while their code compiles’ and get paid for it.

    1. Richard says:

      No idea what you mean

  17. Phill says:

    C++ will compile multithreaded just fine, and Visual Studio does a good job of it. BUT it is disabled by default, so you need to know that you need to go turn it on to get the benefits of it. And it does make a massive difference.

    1. Mephane says:

      While the effect for the programmer is essentially the same, it is not multi-threaded compilation. What the setting does is make Visual Studio launch multiple instances of the compiler simultaneously (you get to choose how many at most; generally you shouldn’t set this higher than the number of CPU cores of the computer), but it is still only one file per process.

      1. Richard says:

        That’s an arbitrary architectural choice, made for other reasons.

        (MSBuild/MSVC uses a master process that keeps track of what work remains, and hands pieces of work out to the subprocesses whenever they have capacity.)

        Multithreading vs multiprocessing are simply two ways to achieve the same goal.

        Multiprocessing is usually faster (when appropriate to the work), mostly because of memory access.

        – A 32bit process can only access approx. 4GB of RAM (real and virtual) in total, and I can guarantee that it takes rather more than that to compile something like Chrome or Windows.
        (Doesn’t matter anymore as a 64bit process can access 16 exabytes, which ought to be enough)

        – The CPU cache and memory subsystems don’t have to make sure that threads running on the other cores see the same values in the same places in memory, because it’s a different process and could never see that memory anyway.
        That gets particularly important in multi-socket systems, and CPUs with really large numbers of cores.

  18. Taxi says:

    I think you’re doing something wrong with the shipping version of the DeSync function. What you’re supposed to do is tell the user to check their internet connection.
