{"id":48221,"date":"2019-10-03T06:00:25","date_gmt":"2019-10-03T10:00:25","guid":{"rendered":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=48221"},"modified":"2019-10-03T11:39:51","modified_gmt":"2019-10-03T15:39:51","slug":"programming-vexations-part-6-the-compiler","status":"publish","type":"post","link":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=48221","title":{"rendered":"Programming Vexations Part 6: The Compiler"},"content":{"rendered":"<p>In the last entry, I said that waiting for a program to compile is one of the vexations of programming. I&#8217;ve spent a lot of time writing about code over the years. I&#8217;ve (hopefully) explained things in easy-to-understand terms. With any luck, you&#8217;ve learned something along the way.<\/p>\n<p>The time has come to admit that I&#8217;ve been lying to you. Or at least, I haven&#8217;t been giving you the full story. After all this time I figure you&#8217;re finally old enough to hear the truth.<\/p>\n<h3>Vexation #2: Compiling is Not Programming<\/h3>\n<p>In the past, I&#8217;ve described the process of compiling your program like this:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_slide1.jpg' width=100% alt='It really is that simple!' title='It really is that simple!'\/><\/div><div class='mouseover-alt'>It really is that simple!<\/div><\/p>\n<p>You just shove your source code through the magical compile-o-matic and out pops a video game!<\/p>\n<p>In truth, it&#8217;s not nearly so simple. In the world of C++, the &#8220;compiler&#8221; isn&#8217;t even a single program! The reality is closer to something like this:<\/p>\n<p><!--more--><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_slide2.jpg' width=100% alt='Okay, it&apos;s not that simple at all. It&apos;s actually really annoyingly complex.' title='Okay, it&apos;s not that simple at all. 
It&apos;s actually really annoyingly complex.'\/><\/div><div class='mouseover-alt'>Okay, it&apos;s not that simple at all. It&apos;s actually really annoyingly complex.<\/div><\/p>\n<p>This is <b>still<\/b> a pretty big simplification, but this is enough for the discussion we&#8217;re having today. The process of taking source code and turning it into an executable program involves at least three different programs and there are a lot of ways for things to go wrong.<\/p>\n<h3>IDE<\/h3>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_slide3.jpg' width=100% alt='I set Google to translate subtext and this was the page it gave me.' title='I set Google to translate subtext and this was the page it gave me.'\/><\/div><div class='mouseover-alt'>I set Google to translate subtext and this was the page it gave me.<\/div><\/p>\n<p>In our modern jet-setting future, we have lovely programming environments called IDE<span class='snote' title='1'>Integrated Development Environment.<\/span>s that handle all of your text-editing duties. For the vast majority of people programming on Windows, this means using Microsoft Visual Studio. I&#8217;m somewhat notorious for <a href=\"?p=47230\">bashing Microsoft<\/a> for their <a href=\"?p=34549\">many<\/a> <a href=\"?p=1403\">many<\/a> <a href=\"?p=26085\">egregious<\/a> <a href=\"?p=18309\">shortcomings<\/a>. But while <a href=\"?p=26081\">their operating system is a griefing engine<\/a> and their gaming software is <a href=\"?p=34749\">an affront to sanity<\/a>, I&#8217;ll admit their programming tools are actually pretty dang good. It&#8217;s even <a href=\"https:\/\/visualstudio.microsoft.com\/vs\/community\/\">free<\/a> these days. Good <b>and<\/b> free? That&#8217;s pretty un-Microsoft of them.<\/p>\n<p>Your IDE will recognize what programming language you&#8217;re using and can color the code accordingly. 
A good one will understand the code well enough to help you find definitions of things, set bookmarks, highlight undefined variables<span class='snote' title='2'>It&#8217;s basically a spell checker, but for code.<\/span>, search the code, and auto-complete things for you. It&#8217;s pretty great.<\/p>\n<p>At some point you&#8217;re done typing code and you wrongly assume you&#8217;re ready to compile. You&#8217;ll hit a keyboard shortcut and your IDE will invoke&#8230;<\/p>\n<h3>The Compiler<\/h3>\n<p>Like I said at the start of this series, C++ is descended from C, and thus it inherited a lot of things that were necessary in 1972 but made no damn sense in the 1990s and are farcical today.<\/p>\n<p>Back in the old days, memory was precious. You only had a few kilobytes to work with. There was no way the compiler could parse the dozens of files in your project in one go and hold all their information in memory. So instead the compiler was designed to compile only one file at a time. It runs once on aircraft.c and turns that file into binary data. Then it runs again on bullets.c, then cars.c, then guns.c, then helicopter.c, and so on<span class='snote' title='3'>I&#8217;m not sure why you&#8217;re trying to develop <a href=\"https:\/\/en.wikipedia.org\/wiki\/Arma_(video_game_series)\">Arma<\/a> in 1972. Did you check the target hardware? I don&#8217;t think you have enough memory for all of this.<\/span>. Each file can only be considered in isolation.<\/p>\n<p>This one-file-at-a-time approach remains today, and I don&#8217;t think you could change it without making huge disruptive changes to how the language works. The good news is that modern machines can run more than one instance of the compiler at a time. You can have a compiler working on bullets.c and another instance working on cars.c. 
The two instances can&#8217;t cooperate and they end up doing quite a bit of redundant work, but at least you don&#8217;t need to wait for bullets.c to finish before starting on cars.c.<\/p>\n<p>Compiling each file involves a lot of steps. Here are a few notable ones:<\/p>\n<h3>1. Preprocessor<\/h3>\n<p>As the name suggests, this is something done before the text is processed. Probably the simplest and most common use of this is for defined symbols. Perhaps the programmer has a specific number that gets used a lot in code:<\/p>\n<pre lang=\"cpp\">#define GRAVITY 9.8<\/pre>\n<p>Then elsewhere in the code, you&#8217;ll have something like this:<\/p>\n<pre lang=\"cpp\">void SpaceMarine::Falling ()\r\n{\r\n  \/\/Shouldn't this be multiplied by some sort of timescale? Do our physics run at 1FPS? -SY\r\n  fall_speed = fall_speed + GRAVITY;\r\n}<\/pre>\n<p>Everywhere in the code where we need to deal with acceleration due to gravity, we can use the #defined symbol GRAVITY. Before attempting to compile your program, the preprocessor will go through and replace every instance of the word GRAVITY with the value 9.8. This has two advantages. First, another programmer might not be able to understand what 9.8 means, but if they see the word GRAVITY then they&#8217;ll get what the code is doing. Second, this allows you to change the value in a single location. If you later realize this game is supposed to take place on the moon, you can replace 9.8 with 1.625. You only need to change a single line of code, rather than manually tracking down all gravity-related code and changing all instances of 9.8 individually.<\/p>\n<p>The other major thing the preprocessor does for us is let us use different code for different builds of the game. 
Maybe we have a function called &#8220;DeSync&#8221; that gets called in the event of some serious error. In the code below, we have two different versions of that function:<\/p>\n<pre lang=\"cpp\">#ifdef ALPHA\r\n\r\nvoid DeSync ()\r\n{\r\n  \/\/This is the public alpha build of the game, so here we would have some code to pop up a dialog box.\r\n  \/\/We could explain the desync situation and ask the user to provide additional details about what they were\r\n  \/\/doing before the error occurred. The game would send that back to the master server where it would be\r\n  \/\/stored for later review by the developers.\r\n}\r\n\r\n#else\r\n\r\nvoid DeSync ()\r\n{\r\n  \/\/This is the release build of the game. Just issue a generic \"Lost connection to server\" message\r\n  \/\/and dump the user back to the main menu.\r\n}\r\n\r\n#endif<\/pre>\n<p>Normally you can&#8217;t have two functions with the same name like this<span class='snote' title='4'>Yes, I know you can if they have different signatures, but I&#8217;m not going to stop and explain signatures right now.<\/span>. But the preprocessor checks whether ALPHA is defined for the build you&#8217;re compiling. It keeps the DeSync() you want and strips out the other before the compiler ever sees it.<\/p>\n<p>This is a simple example, but these options can get really complex and it might take you a minute to figure out which of the six different versions<span class='snote' title='5'>Maybe this game is being developed for the PC, PlayStation, and Xbox, and each platform needs different behavior for the shipping version of the game vs. the pre-release builds for reviewers.<\/span> of DeSync() is actually active. In a modern setup, your IDE can usually highlight whichever version is active and leave the others greyed out. 
We didn&#8217;t have this back in the 90s, and I remember having confusing moments where I was trying to fix a bug and wondering why my fixes weren&#8217;t doing anything, only to discover later that I was editing the wrong version of DeSync()<span class='snote' title='6'>Or whatever the function was called. I&#8217;ve long since forgotten.<\/span>. Eventually I learned to double-check which version was really being compiled. I&#8217;d type some lazy garbage into the function like &#8220;dsklja;gf&#8221; and hit compile. If the compiler didn&#8217;t complain, then I&#8217;d messed up and was working on a disabled bit of code.<\/p>\n<h3>2. Lexical \/ Syntax Analysis<\/h3>\n<p>This is actually two different steps, but we can treat them as one for the purposes of this discussion. The compiler has to read in the text and figure out what the programmer was trying to express. In C++, this job is apparently <a href=\"https:\/\/yosefk.com\/c++fqa\/defective.html#defect-2\">very difficult<\/a> for the people writing the compiler, but for the average game developer it&#8217;s as easy as slapping F7<span class='snote' title='7'>Or whatever the hotkey is in your IDE.<\/span>.<\/p>\n<p>If the code doesn&#8217;t make sense because of typos \/ stray characters \/ misspellings \/ missing characters, then the compiler will issue an error. If you&#8217;re having a good day, then the error will be simple and understandable, and it will point you to the source of the problem. If you&#8217;re having a normal day, then the compiler will report the problem in a confusing \/ misleading way<span class='snote' title='8'>Actually, error messages are pretty good these days. They&#8217;re not particularly newbie-friendly, but then C itself isn&#8217;t particularly newbie-friendly.<\/span>.<\/p>\n<h3>3. 
Object Code<\/h3>\n<p><table class='nomargin' cellspacing='0' width='100%' cellpadding='0' align='center' border='0'><tr><td><iframe loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/www.youtube.com\/embed\/MAlSjtxy5ak\" frameborder=\"0\" allowfullscreen class=\"embed\"><\/iframe><br\/><small><a href='http:\/\/www.youtube.com\/watch?v=MAlSjtxy5ak'>Link (YouTube)<\/a><\/small><\/td><\/tr><\/table><\/p>\n<p>If aircraft.c is valid &#8211; if it compiles with no errors &#8211; then the compiler will spit out aircraft.obj. This object file is something like an intermediate point between your text-based code and your final executable. The details are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Object_file\">pretty hairy<\/a> and generally not something most programmers bother worrying about, although we&#8217;re often plagued by a vague sense of guilt that we ought to learn about this stuff someday.<\/p>\n<p>Your IDE will run the compiler once for each file: aircraft.c, bullets.c, cars.c, and so on. Each successful compile will result in another object file.<\/p>\n<p>I should point out that these file names are perfectly arbitrary and the compiler doesn&#8217;t care what the files are called. In the much-newer Java programming language<span class='snote' title='9'>Java was introduced in 1995.<\/span>, <a href=\"http:\/\/www.cs-fundamentals.com\/tech-interview\/java\/what-are-java-source-file-declaration-rules.php\">a file&#8217;s name must match the public class it contains<\/a>. Your car code must go in a file called car.java and your soldier code must go in soldier.java. In C and C++, you&#8217;re free to organize your code however you like, which means you&#8217;re also free to <b>not organize it at all<\/b>. 
You could make a scavenger hunt for the rest of the team by getting rid of car.c and spreading the contents of the file throughout the rest of your project<span class='snote' title='10'>If you get caught doing this, don&#8217;t claim I suggested it. I just said it was POSSIBLE.<\/span>. It&#8217;ll still compile, but now everyone else has to hunt for the scattered pieces.<\/p>\n<p>This sounds crazy (and to be clear, it would be crazy to do this), but this freedom actually has some utility. If you&#8217;re doing design-by-programming, then you end up writing a lot of rough prototype code. For example:<\/p>\n<p>In the last meeting someone tossed out the idea of adding mounted guns to vehicles. So now I want some cars to have mounted guns. I could make the mounted gun part of the car code, or I could make it an independent game object, or I could make it a weapon that gets added to the player&#8217;s inventory like all other weapons, but is removed if they exit the vehicle<span class='snote' title='11'>I know I heard about a game that did this, but I can&#8217;t remember what it was.<\/span>.<\/p>\n<p>Hm. Which way is best? For now, I&#8217;ll just add the gun to the car. I&#8217;ll put all the mounted gun code here in car.c.<\/p>\n<p>Then later you decide you&#8217;re happy with this design, so you create turret.c and move all of the mounted gun code into that file. 
You generalize it so that other vehicles like helicopters and aircraft can also have mounted guns.<\/p>\n<p><table class='nomargin' cellspacing='0' width='100%' cellpadding='0' align='center' border='0'><tr><td><iframe loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/www.youtube.com\/embed\/Z5JC9Ve1sfI\" frameborder=\"0\" allowfullscreen class=\"embed\"><\/iframe><br\/><small><a href='http:\/\/www.youtube.com\/watch?v=Z5JC9Ve1sfI'>Link (YouTube)<\/a><\/small><\/td><\/tr><\/table><\/p>\n<p>In a language like Java, you&#8217;d be forced to put the new code into a new file right from the start, but the C language allows you to throw things together and figure out the overall file structure later, once you know what you&#8217;re doing. It seems trivial in this case, but in cases with complicated systems it&#8217;s often nice to be able to design first, and THEN decide how the code should be organized.<\/p>\n<p>In some programming domains, programmers are working from a rigorous spec and this &#8220;freedom&#8221; just seems like an invitation for chaos. In those domains, you know exactly what you&#8217;re going to build before you write your first line of code. Why allow the programmer to build something incorrectly?<\/p>\n<p>These programmers will complain that the freedom of C allows people to slap together disorganized, half-assed code. And it does. This brings us back to the question of how much we should restrict the programmer in the name of helping them. It&#8217;s the old &#8220;this language was made for good programmers&#8221; argument again. As it happens, that comes up a lot when you&#8217;re talking about language design.<\/p>\n<p>In any case, if all of the files compiled successfully, then your IDE will move on to&#8230;<\/p>\n<h3>4. 
The Linker<\/h3>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_link.jpg' width=100% alt='I tried to find a stock photo of the abstract process of linking, but this is the best I could do.' title='I tried to find a stock photo of the abstract process of linking, but this is the best I could do.'\/><\/div><div class='mouseover-alt'>I tried to find a stock photo of the abstract process of linking, but this is the best I could do.<\/div><\/p>\n<p>It&#8217;s called the linker because it takes all those disparate object files and links them together to make the final executable. The linker is also responsible for pulling in any external libraries you might be using. Maybe your program uses OpenGL for the rendering and the Windows SDK for window creation and managing user input. You don&#8217;t normally have the source code for these things. Instead, they come as pre-built files and their functionality is added to the program by the linker.<\/p>\n<p>This is where you can run into the last kind of build problem. Maybe in your code you declared a function for handling car crashes. The file car.c promises this function exists <b>somewhere<\/b>, but since C doesn&#8217;t oblige you to put car-related code into the car-specific file, the early stages of compilation have no way of knowing that you messed up. It&#8217;s like the end of a multi-player scavenger hunt, where you take an inventory of everything you&#8217;ve collected. It&#8217;s not until the very end that you discover an item is missing. Nobody knew the car crash code was missing until the linker tried to find it among the scattered object files and came up empty.<\/p>\n<h3>5. Why Does This Take So Long?<\/h3>\n<p>C has always been a bit slow to compile, and C++ even more so. I&#8217;ve always found these compile times unreasonably long. 
In my days of C++ programming, I&#8217;d smack F7 in Visual Studio and watch the filenames scroll by as each one was processed. After about thirty seconds I&#8217;d get restless and start thinking, &#8220;Hang on. Why is this taking so long? I was using this same codebase ten years ago and I swear it took about this long to compile. This computer is over ten times faster than the one I had back then. The codebase is bigger, but it&#8217;s not <b>ten times<\/b> bigger! What is the compiler doing with all the CPU cycles?&#8221;<\/p>\n<p>I&#8217;ll admit I don&#8217;t have hard numbers on any of this. I didn&#8217;t meticulously record compile times and computer specs back in the 90s so I&#8217;d have them available for comparison today. Without them, I can&#8217;t do a proper comparison between the compile times we had back then and the ones we have now. I&#8217;m sure compiling is faster today, but it&#8217;s not <b>that<\/b> much faster. It&#8217;s nowhere near 100x faster than it was 25 years ago, even though the increase in processing speed is in that ballpark. I doubt it&#8217;s even 10 times faster.<\/p>\n<p>So that&#8217;s the compiler. It&#8217;s slow, but your IDE will usually offer features to help speed things up. It&#8217;s complicated, but your IDE will usually have tools to hide or mitigate that complexity. It&#8217;s obtuse, but the IDE is designed to handle all the complicated \/ obscure options for you.<\/p>\n<p>Next week I&#8217;ll talk more about the compilation process and why compile times are more important than they seem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the last entry, I said that waiting for a program to compile is one of the vexations of programming. I&#8217;ve spent a lot of time writing about code over the years. I&#8217;ve (hopefully) explained things in easy-to-understand terms. 
With any luck, you&#8217;ve learned something along the way.\u00a0 The time has come to admit that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66],"tags":[],"class_list":["post-48221","post","type-post","status-publish","format-standard","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48221","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=48221"}],"version-history":[{"count":22,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48221\/revisions"}],"predecessor-version":[{"id":48268,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48221\/revisions\/48268"}],"wp:attachment":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=48221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=48221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=48221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}