Programming Vexations Part 10: Header Files

By Shamus Posted Thursday Nov 14, 2019

Filed under: Programming 96 comments

I’ve talked about C++ header files before. Like I said earlier in this series, the C language was designed in an age where memory was scarce and it wasn’t feasible for the compiler to hold your entire codebase in memory at once. So projects end up broken up into many different files.

The file marine.c needs to refer to the code in weapons.c, and vehicles.c. Somehow the compiler needs to be aware of the contents of those other files without loading them entirely. So we have a header file, which lists the contents of the other files. It’s like an inventory list. When the compiler is working on marine.c, it loads the header file weapons.h. Then the compiler can say, “Oh, I don’t know what the code for WeaponReload () looks like, but according to the header file I can tell that the code exists. I’ll just keep compiling and trust that the code for WeaponReload () will show up later when I’m compiling some other file.” This way the compiler can know that WeaponReload () exists, and the typo WeaponRelaod () doesn’t, enabling it to catch your error.
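Here is a minimal sketch of the inventory idea, squashed into a single file so it compiles on its own. The declaration near the top stands in for what weapons.h would contain. The Weapon struct, its fields, and MarineFire () are made up for illustration and aren't from any real codebase.

```cpp
#include <cassert>

// --- What a hypothetical weapons.h would hold: the inventory list. ---
struct Weapon {
    int ammo;
    int clip_size;
};
void WeaponReload(Weapon& w);   // declaration only: "this exists somewhere"

// --- What a hypothetical marine.c would hold. ---
// The compiler accepts the call below knowing only the declaration above.
// Misspelling it as WeaponRelaod(w) would be a compile error, because no
// such name was ever declared.
int MarineFire(Weapon& w) {
    if (w.ammo == 0) {
        WeaponReload(w);
    }
    w.ammo -= 1;
    return w.ammo;
}

// --- What a hypothetical weapons.c would hold: the actual code. ---
void WeaponReload(Weapon& w) {
    assert(w.clip_size > 0);    // illustrative sanity check
    w.ammo = w.clip_size;
}
```

In a real project these three chunks would live in three separate files, and the linker would match the call in marine.c to the code in weapons.c after both were compiled.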

Vexation #4: Maintaining Header Files is not Programming

Problem: We have an information sorting problem. Solution: Write a program to do the sorting for us. New problem: The source code for the solution is itself an information sorting problem.

People defend header files by pointing out how useful it is to have them. You can get an overview of the weapons code by glancing through weapons.h, rather than doing a full read of weapons.c. Of course, it’s always super-useful when other people do a bunch of work for you. The question isn’t whether or not they’re useful, but if it makes sense for programmers to create and maintain these things.

Header files are generated manually by the programmer. You write the code for WeaponReload (), and then you add the declaration for WeaponReload () to the header file.

It’s the programmer’s job to keep weapons.c and weapons.h in sync. Changes to one might necessitate changes to the other. The list is nice to have, but it’s not always useful, even though you always need to create it. Suddenly the programmer isn’t coding, they’re taking inventory and doing bookkeeping. For the computer.

Sometimes header files need to include other header files. Sometimes these dependencies can be circular. A needs B needs C needs A. So you need to arrange things so that files are included in the right order, and you need to add special qualifying lines to help the compiler along. You need to stop it from including the same file more than once so it doesn’t get trapped in a circular chain.
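The usual "special qualifying lines" are an include guard. The sketch below fakes a double include by pasting the contents of a hypothetical weapons.h twice into one file. The guard name WEAPONS_H and the AmmoAfterShot () helper are assumptions made for the example.

```cpp
#include <cassert>

// First "#include" of a hypothetical weapons.h.
#ifndef WEAPONS_H
#define WEAPONS_H
struct Weapon {
    int ammo;
};
#endif

// Second "#include" of the same file, e.g. pulled in again via marine.h.
// WEAPONS_H is already defined, so the preprocessor drops everything up
// to the #endif. Without the guard, this would be a redefinition error.
#ifndef WEAPONS_H
#define WEAPONS_H
struct Weapon {
    int ammo;
};
#endif

int AmmoAfterShot(Weapon w) {
    assert(w.ammo > 0);   // illustrative precondition
    return w.ammo - 1;
}
```

Most compilers also accept #pragma once at the top of a header for the same purpose, although it isn't part of the language standard.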

This is a horrendous system.

  • It greatly inflates compile times. Cars don’t care about guns, but they do care about marines, and if you want to refer to marines then you need access to the data types that marines use. So compiling cars.c means parsing the guns.h header file. That header file might include another, and another, and pretty soon this simple game object is 50 lines of useful code that includes 5,000 lines of header files for the compiler to chew on. And since the compiler is invoked once for every file in the project, it will need to redundantly parse those 5,000 lines many, many times.
  • It makes it annoying to add things to the project. If I add a backpack data structure and make it part of the space marine, then I have to go to every single file in the project that refers to the marine and add the line #include "backpack.h" to it. (Or include backpack.h within spacemarine.h, exacerbating the problem above.)
  • Files need to be compiled more often. Bullets don’t interact with backpacks, but because of the way header files connect to each other, making changes to the backpack header will force bullets.c to need a recompile.
  • Changing the data structures might necessitate re-arranging all the header files. If I make a change that means bullets.h is needed before lootbox.h, then I’ll need to manually edit all the source files that refer to both.
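Why does a file that only cares about marines end up needing the backpack header at all? A rough single-file sketch, with hypothetical struct names: if the marine holds the backpack by value, the compiler needs the full definition to lay the marine out in memory. If it only holds a pointer, a one-line forward declaration is enough.

```cpp
#include <cassert>

struct Backpack;   // forward declaration: "a type with this name exists"

// Holding a pointer only needs the name, so a file that uses this struct
// would not have to include a hypothetical backpack.h.
struct MarineWithPointer {
    Backpack* pack;
    int health;
};

// The full definition, as it would appear in backpack.h.
struct Backpack {
    int slots[8];
};

// Holding a Backpack by value needs the full definition, because the
// compiler must know its size to lay out the marine.
struct MarineByValue {
    Backpack pack;
    int health;
};

int CountFilledSlots(const MarineByValue& m) {
    int filled = 0;
    for (int item : m.pack.slots) {
        if (item != 0) {
            ++filled;
        }
    }
    assert(filled <= 8);
    return filled;
}
```

Forward declarations are one of the standard mitigations for the include chains described above, but they are still bookkeeping the programmer does by hand.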

C++ inherited all of this from C. C++ was introduced in 1985, which is unfortunate. If it had arrived just a few years later, it could have been constructed differently and taken advantage of the hardware gains to compile things on a project level rather than doing everything a single file at a time[1]. As it stands, C++ wastes programmer time (which is and always has been precious) to save on memory (which is ridiculously plentiful at this point) during compile.

Coders will point out that there are ways to mitigate all of these annoyances: We have precompiled headers, header files that include other header files, incremental builds, and a bunch of other things to make this less of a chore. Making it less painful is nice, but this is a hassle that shouldn’t exist at all. Text files are trivial, programmer time is precious, and this busywork doesn’t make the program better. The programmer is literally doing all this work to make it easier for the compiler to sort through data. That’s like having a construction worker use a shovel to loosen up the soil before they use the bulldozer to dig the hole. Why is the human doing work to make things easier for the machine?! Even if it only takes the programmer fifteen seconds to sort out a header problem, that’s literally billions of times longer than it would have taken the computer to sort it out.

So much work has been put into making compilers help the programmer by optimizing the code. It will change loops, ignore unused code, and do a bunch of esoteric sorcery to shave off a few processor cycles here and there. But then we still have all these programmers playing this absurd header file sorting game like it’s 1972.

Header files aren’t a part of any modern language. Rust, Go, C#, Java, PHP, Swift, Python, Visual Basic, Ruby, and Perl all seem to get along just fine without burdening the programmer with a bunch of inventory homework. Still, if you’re making a AAA game, then you want the power of C / C++. And if you want that power, then you need to endure the vexations of header files.

Before we cover the next vexation, let’s stop for an aside and talk about…

Semicolons

Why did we choose the semicolon to end a line? Going by how the symbol is used as punctuation for prose, it ought to signal that this is NOT the end of a line. Wouldn't it be better to have a period mark the end of a line? (The period has other uses now, but when the language was formed they were presumably free to choose whatever symbols they wanted.)

In C++, every line must end with a semicolon. This leads to the old programmer adage “A semicolon is mandatory everywhere it isn’t forbidden”. Semicolons have fallen out of favor in the last generation or so and I see a lot of people questioning why a language would have such a dumb feature.

All C family languages use them. So do Java, D, PHP, Perl, and Rust. But Python, Go, Groovy, Visual Basic, and Ruby don’t. Swift is an odd one, where semicolons are supported but optional. I get the sense that current-day attitudes among the younger coders lean away from semicolons.

I can understand why some people find them pointless. A semicolon is supposed to mark the end of a line.

a = a + 1;
b = a + 2;
c = a + b;

The compiler doesn’t care about whitespace, so you could format that code like this and it would be the same to the compiler:

a = a + 1; b = a + 2; c = a + b;

But don’t we already have a way to mark the end of a line? That’s what the return key is for! Why not mark the end of a line of code by actually ending the line of code?!

I can understand why people say this, but let me try to make a case for semicolons.

The Case for Semicolons

In the old days we'd only view one file at a time in a DOS-style window. These days your IDE can stack files side-by-side so you can get more information on the screen.

Because of the kind of work I do – procedural object generation – I sometimes wind up with a lot of really long lines. I’ll have a lot of short lines that only use the first 10 or 20 columns of horizontal space, and then a block of enormously long lines that extend far beyond 100.

Some of the code can get pretty complicated and hard to explain, but here’s a simplified example:

//Put the hat on the player's head.
hat.MoveTo (Vector3D (player.avatar.position.x, player.avatar.y + player.height, player.avatar.position.z));

We’re trying to put a hat on a character’s head in 3D space. So we take the player’s origin[2], add the player’s height to the Y component, and move the hat to the resulting location.

In my experience, most coding projects encourage you to keep individual lines of code under 80-ish characters long. Back in the old days, this was the limit of how much text you could fit on the screen horizontally. We’re obviously way beyond that now. I just tested it in my IDE,  and even with a large font my 1080p monitor can easily fit ~150 characters on a line without needing to do any horizontal scrolling. However, this is still discouraged. As monitor resolutions have gone up, the preference seems to be for having more files open side-by-side. Rather than leaving this vast expanse of unused empty space on the right, why not split the window in half and show different files on the left and right sides?

The line of code above isn’t outrageously long. It’s only 108 characters, plus some indenting depending on how deeply this is nested. If a single line jutted out to column 108, most project managers would overlook it. But at some point, that line sticks out so far that you can’t see it without horizontal scrolling. Having bits of code hidden beyond the right edge of the screen is annoying and scrolling back and forth to read code sucks.

In a language with semicolons, I can take that single line of code and break it up like so:

hat.MoveTo (Vector3D (player.avatar.position.x,
player.avatar.y + player.height,
player.avatar.position.z));

To the compiler, that’s still a single line of code, because a line doesn’t end until it sees the semicolon. In a language without semicolons, you can’t do this. If a line is ridiculously long, you need to either break it up into a series of discrete operations or tolerate the horizontal scrolling.

You could shorten this line like so:

Vector3D hat_pos = player.avatar.position;
hat_pos.y += player.height;
hat.MoveTo (hat_pos);

What I dislike is that I’m creating this extra variable. That variable will hang around as long as it’s in scope and there might be some overhead to creating it[3]. This variable isn’t needed by the code and it only exists for the purposes of layout. Making additional variables to solve formatting and spacing problems is like wearing oversized shoes because you like the extra spacing around the Nike logo.
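To make the hidden cost a little more concrete, here is a rough sketch that counts how many Vector3D objects each version builds. This Vector3D, Player, and Hat are simplified stand-ins I made up for the example (there is no avatar layer, and the counter is purely for demonstration), so treat it as an illustration and not a measurement of any real engine.

```cpp
#include <cassert>

// Hypothetical stand-in for Vector3D. It counts how many objects get
// built so the hidden work becomes visible.
struct Vector3D {
    static int constructed;
    float x, y, z;
    Vector3D(float x_, float y_, float z_) : x(x_), y(y_), z(z_) { ++constructed; }
    Vector3D(const Vector3D& o) : x(o.x), y(o.y), z(o.z) { ++constructed; }
    Vector3D& operator=(const Vector3D&) = default;
};
int Vector3D::constructed = 0;

struct Player {
    Vector3D position{0.0f, 0.0f, 0.0f};
    float height = 2.0f;
};

struct Hat {
    Vector3D position{0.0f, 0.0f, 0.0f};
    void MoveTo(const Vector3D& p) { position = p; }   // assignment, no new object
};

// One long statement: the Vector3D(...) expression builds a temporary.
// Returns how many Vector3D objects were constructed along the way.
int PlaceHatOneStatement(Hat& hat, const Player& player) {
    const int before = Vector3D::constructed;
    hat.MoveTo(Vector3D(player.position.x,
                        player.position.y + player.height,
                        player.position.z));
    return Vector3D::constructed - before;
}

// Three short statements: the named variable is built by copying.
int PlaceHatNamedVariable(Hat& hat, const Player& player) {
    const int before = Vector3D::constructed;
    Vector3D hat_pos = player.position;
    hat_pos.y += player.height;
    hat.MoveTo(hat_pos);
    assert(hat.position.y == hat_pos.y);   // the hat ended up where we asked
    return Vector3D::constructed - before;
}
```

In this particular setup both functions construct exactly one Vector3D, which matches the footnote's point that the one-line version may also create a temporary. With a different MoveTo () signature or a heavier constructor the numbers could come out differently.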

Like I said above, the whole point of having the compiler ignore whitespace is so that we can format code in whatever way makes visual sense to the reader. Semicolons make it easier to do that by decoupling how we view the code from how the compiler interprets the code.

In Jai, Jon Blow has specifically said that he’s “agnostic” on semicolons. Last time I looked, Jai seems to use them, but I get the sense that he’s not attached to them and could easily abandon them if he found a good reason. Like Blow, I don’t think this issue is that important. It’s an interesting thing to think about – particularly if you’re considering making a new language – but it doesn’t need to be part of the programming holy wars. I prefer them, but I’m perfectly happy using a language that doesn’t have them. I’m happy as long as the language has one of these two ideas:

  1. The line doesn’t end until you enter a semicolon.
  2. The line ends if you hit the return key, EXCEPT you can use some special symbol to indicate that this line is continued below.

As long as I have the option to have one logical line span multiple lines of text, then I don’t really care which system we use. I’m partial to semicolons because I’ve been using them for 29 years, but there’s nothing wrong with the alternative.
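As it happens, C++ contains both systems. Ordinary statements run until the semicolon, while the preprocessor is line-based and needs a backslash to continue a line. A small sketch, where the macro HAT_HEIGHT and the function HatY () are names invented for the example:

```cpp
#include <cassert>

// Idea 2: the preprocessor is line-based, so a macro that spans several
// lines needs a backslash at the end of every line but the last.
#define HAT_HEIGHT(feet_y, body_height) \
    ((feet_y) +                         \
     (body_height))

// Idea 1: ordinary statements run until the semicolon, so this one can be
// wrapped wherever it reads best.
int HatY(int feet_y, int body_height) {
    assert(body_height >= 0);   // illustrative precondition
    int result =
        HAT_HEIGHT(feet_y,
                   body_height);
    return result;
}
```

Forgetting one of those backslashes usually produces a confusing error several lines away from the actual mistake, which may be part of why people find continuation characters uglier than semicolons.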

 

Footnotes:

[1] Although to be honest, this would have been unlikely. Abandoning header files would have been a really big shift. For good or for ill, the prevailing assumption was that C++ ought to be able to parse all the extant C code.

[2] Typically, this is the point at the base of the model, even with the soles of the feet.

[3] Variable constructors can often hide a bunch of operations from you, so there might be an unseen cost to creating a new variable. Although the one-line version might ALSO create a temporary variable. It’s complicated.




96 thoughts on “Programming Vexations Part 10: Header Files”

  1. kikito says:

    Oh yeah. Header files are the manual transmissions of compilers.

    1. Herbert Herbertson says:

      Something the US for some reason has a hate boner for, and the rest of the world is confused why they don’t focus that hate on to something that matters?

    2. Americans Need To Stop Being Scared of Manual Transmissions says:

      It’s so easy, that teenagers can learn to drive a stick. Get over it. ^^;

      1. Warclam says:

        You realize that’s the exact same argument Shamus debunked in the section on headers? As in, the post in which you are commenting?

        “Wasting time doesn’t matter, as long as it’s only a little bit at once” is a bad argument. You should be arguing that it’s a bad comparison because using a manual transmission isn’t a waste of time, because [reason].

        1. Mistwraithe says:

          Yes, but using a manual transmission can at least be fun or feel empowering.

          I’m not sure how many programmers have ever felt empowered by maintaining their header files…

          1. Ruethus says:

            It’s certainly an empowering feeling when you realize you can cram an entire library into a header file, which nets at least three dandy benefits:
            -You only need to #include it in, and don’t have to change your compile commands at all (make sure it has #pragma once though)
            -It’s a single file, so you can share it around super easily
            -And it upsets the style activist people who complain about my code unless I triple my development time to satisfy them and gain no other benefits (despite them having no vested interest in my projects)!

        2. Americans Need To Stop Being Scared of Manual Transmissions says says:

          Sorry about that; I should have elaborated.

          The “time wasted” part of a manual transmission is only one part of the trade-off. Automatics are only recently capable of matching the fuel-efficiency of a manual, and some manufacturers’ automatics are still less efficient. They also don’t accelerate quickly – I’ve driven many new vehicles that are automatics, and they accelerate like a cow in a shopping cart, even with the gas pedal floored. You can get vehicles that accelerate more quickly, but they’re generally larger / more gas-hungry vehicles – there’s no physical reason an automatic shouldn’t be able to out-accelerate a manual, since the manual requires the driver to be skilled and not making any mistakes that day. Manual transmissions are also much cheaper to repair, and don’t break down as often. So if you’ve got money to burn, go ahead and save the time to learn how to drive a manual.

          I learned how to drive a manual in one day; That’s a very small amount of time, compared to the thousands of dollars I’ve spent on broken automatics in my time as a vehicle owner.

      2. Sartharina says:

        Teenagers can learn to do anything, so that says nothing about the difficulty of the task. In fact, teens and younger are the best at learning how to do stuff. Past 21, people can barely learn squat and have to rely on what they learned as a child or teen.

        1. Americans Need To Stop Being Scared of Manual Transmissions says:

          I learned how to drive stick at 23, in an afternoon, because that was the only vehicle we had, to drive to the outhouse. Really, it’s easy.

  2. Ancillary says:

    Scala doesn’t require semi-colons, but can infer a multi-line statement through context. I imagine a lot of other recent languages are the same.

    1. Groboclown says:

      I’m sure there will be a lot of comments about this, but, for example, Python syntax encourages new lines to break up long lines. This here is perfectly acceptable, because the trailing open parenthesis indicates through syntax that something must follow.

      hat.MoveTo (
      Vector3D(
      player.avatar.position.x,
      player.avatar.y + player.height,
      player.avatar.position.Z
      )
      )

      (And of course my nice formatting and <pre> tags are stripped)
      And if you have a statement that can’t be cleanly broken like this, you can add a backslash to indicate a line continuation, but to me, those are uglier than semi-colons.

      1. Misamoto says:

        I’m not a fan of Python, and that is one of the reasons – I don’t want to remember where it’s acceptable to change lines and where it’s not.
        Also, while generally it’s a bad practice, sometimes code can benefit from having what normally takes multiple lines written in one, and for that you need explicit EOL symbol

        1. Michael says:

          You can put multiple expressions on one line in Python by separating them with… wait for it… semicolons.

    2. Echo Tango says:

      Golang doesn’t require semicolons, and lets you break up long lines without a special mark. Best of both worlds dot jay peg.

  3. Wolle says:

    Semicolon-less and optional-semicolon languages have no problems with your multi-line example. They can easily recognize that you haven’t closed all your parentheses yet.

    I think (hope) semicolons are going the way of the dodo. They’re just another example of the programmer doing the work of the computer.

  4. Tektotherriggen says:

    I guess they couldn’t end lines with periods, because you may need to distinguish
    x = 3
    (an integer) from
    x = 3.
    (which forces floating point). Obviously this particular situation is solved by declaring x in advance, but I bet there are situations where it matters.

    1. Noah says:

      It would be a far-reaching change, no question. You couldn’t easily change existing C syntax to support it.

    2. Duncan Snowden says:

      My initial thought was that a new language could use the real decimal point (AltGr-period, at least under XOrg). But can you imagine the typos? Yikes…

    3. Nessus says:

      I’m not a programmer, so please forgive me if this is a stupid question, but why not just solve this by adding “spacebar” as a variable?

      That way the compiler can be very easily designed to tell a period from a decimal point by context.

      1. pseudonym says:

        There are no stupid questions. Only stupid answers.

        Whitespace is used for formatting code in every language that I know of. In Python it is mandatory. Lines of code that belong to each other are often indented the same level. This makes it easier to read. So I do not think space could be used as a variable. But I think that you mean to suggest that it can be used as a separator.

        I think your suggestion is a good one. 53 and 5 3 are different. So 3. and 3 . are then different as well. The problem with that is that ending a sentence with a space and then a point is not what we are used to as humans . It will cause a lot of mistakes . Probably annoy some people even .

        1. Paul Spooner says:

          Fascinating experiment.
          I somehow parse it as a truncated ellipsis . Which gives the impression of halting continuation rather than segregation .

        2. Nessus says:

          I’d simply make it so 3. is always read by the compiler as “three period”, never “three-point-zero”. After all, no one uses decimal points like that out in the world at large. If someone means 3 as a whole integer, they write 3 or 3.0, but not 3..

          If programmers do use decimal points that way… well, that’s a weird thing they probably only learned because some weird code syntax required it of them. As long as we’re talking about fixing semicolons as periods, might as well fix that too for the same reasons.

          1. Echo Tango says:

            It wouldn’t be a “code” syntax thing; It’s from math. Unless I’m mistaken, 3. is the correct way to signify the number three with exactly one significant digit, whereas 3.0 is two significant digits, and 3 is infinite precision.

            1. Echo Tango says:

              @Shamus
              The styling on your site seems to not do anything with _strong_ and _b_ tags, although _em_ and _i_ work fine.

    4. Michael says:

      Obviously this particular situation is solved by declaring x in advance

      It is? I didn’t think C had any problems with, e.g.

      int x;

      x = 3.0; /* actually 3; the floating point value gets cast to int */

      1. Echo Tango says:

        Automatic type-casting is nearly as bad as loosely-typed / duck-typed languages – ugh!

  5. Noah says:

    As a Ruby user, let me chime in on semicolons. Ruby, like JavaScript, is semicolon-optional rather than no-semicolons.

    The difference is that in JS, the compiler will (says the language spec) add lines until doing so would fail to parse, or until you put a semicolon, while in Ruby, it will stop as soon as it gets a legal parse. Here’s where that difference matters:

    // In JS, it will keep adding lines until doing so would be an error.
    myobject
    .callFunc1() // This line is fine, and calls .callFunc1() on myobject
    .callFunc2()
    .callFunc3()
    ThisLine().cantBeConcatenated().forLackOfADot().soItIsTreatedAsSeparate()

    Whereas…

    # In Ruby, it will stop as soon as it legally can
    so_this_line.is_fine.
    and_so_is_this().due_to_the_dot_on_previous_line
    .but_this_line_gets_a_syntax_error # Because the previous line parsed fine, and stopped as if there were a semicolon

    But in both JS and Ruby you can add a semicolon and it will stop parsing.

    1. Chad Miller says:

      Strictly speaking, Python has optional semicolons too. It’s just that ending lines with a superfluous semicolon is considered bad style, and using semicolons to cram too much on one line is also considered bad style, so they’re almost never used.

      1. Paul Spooner says:

        Plus it also has the special character to carry a logical line across several textual lines.

  6. Thomas says:

    Isn’t option number 2 for line breaks widespread? Most high level languages like R and VBA have them (typically ‘_’ or just brackets)

  7. Daimbert says:

    There is an advantage to having something LIKE header files which are kinda implemented in Java’s interfaces: the ability to share to anyone who might want to call code the interfaces/methods of that code without having to let them see the code itself. However, programmers generating them manually probably isn’t necessary anymore. Then again, I suspect that many IDEs will now generate and update header files automatically anyway (I don’t use IDEs for programming and so can’t say for certain).

    1. Adam says:

      If header files are not generated automatically these days, I’d be interested in understanding what language feature prevents that. It’s not elegant, but as an optional extra tool that “modern” C/C++ programmers could choose to use to make life easier it seems like it would be an easy win.

      1. Jaedar says:

        I’m interested in this as well. Like the import statements in Java files, it seems like something someone would have figured out how to make the compiler/IDE do automatically(while leaving the possibility for a human to fix it in case there’s something truly complex happening).

        Being vexed by something the compiler/IDE does for you doesn’t make a lot of sense after all, it’s just a minor technical problem.

        1. lethal_guitar says:

          Doing this automatically is much harder than it seems at first, because of the C++ preprocessor. The code you have on disk is not the code the compiler sees after preprocessing – the latter doesn’t even have a notion of header files anymore.

          What goes into a header and how it relates to things in corresponding cpp files is ultimately human-made convention, and irrelevant to the compiler – it only cares about the preprocessed source for a given translation unit being valid.

          So you can of course make something that implements a certain convention, and does that automatically very well, but it’s pretty much impossible to make an automated solution that knows the intention behind any possible constellation of files. And in a sufficiently large and old codebase, you’re practically guaranteed to have some code somewhere that defies the convention in some subtle way, and thus breaks your tooling.

          1. Daimbert says:

            From the IDE’s perspective, though, for C++ all it would need to do, as noted elsewhere, is add every public method of the class to its header file. Maybe also all protected and private (although you shouldn’t need to add private). For C, by default it could add all defined methods but you’d run into a problem if you didn’t want a function in the header file for some reason.

            Anything else could be added manually.

            1. The Dark Canuck says:

              In C++, these notions of “public” and “private” are part of the class declaration, which is precisely what is put in the header file in the first place. There isn’t any way to tell whether a given class method is public or private from the implementation in the .cpp file. This is in and of itself stupid, as it means you need to swap between files to properly grok the class you’re working on.

              1. Daimbert says:

                Yeah, I knew it had been awhile for C++, since I had forgotten that you don’t say public or private in the .cpp file.

                1. Richard says:

                  Well, technically you can. If you declare the class in the CPP file.

                  Which means there is no header file, yay!

                  Which means you can’t actually reference the class anywhere outside of the translation unit. Oh bugger.

  8. tmtvl says:

    Possible typolice: did you mean the programming language Perl instead of Pearl?

    Also, most non-semicolon languages allow breaking up a long line by adding a backslash where the split happens, escaping the following newline character.

    I still prefer semicolons.

    1. pseudonym says:

      Those backslashes are ugly

      if myvariablewithwaytoolongname == \
      myotherwaytoverbosevariablename:

      Ugly! But luckily in Python you can also use parentheses

      if (myvariablewithwaytoolongname ==
      myotherwaytoverbosevariablename):

      This is equally valid and no backslashes needed.

      1. tmtvl says:

        Wait, if you need the colon to open the body of the if-block, why would you need to escape the newline?

        Too little dwimmyness, 3/10.

        1. Chad says:

          It’s pretty easy for a human to look at a complete section of code and figure out what the programmer meant, especially where the expressions have been changed to phrases that remove ambiguity. A large part of this process is starting from identifying the start and end of the logical block, which is easy because the example-writer contrived it so, and because the reader knows that the example will follow certain conventions.

          The compiler has a much harder time with all of this, because it doesn’t have the reader’s birds eye view, nor can it assume that the programmer arranged the code to create a clean example; instead the compiler is looking for mistakes (and they’re very, *very* common).

          Modern compilers have gotten better at this by ‘virtue’ of becoming much more complicated, but the details there are pernicious (look at the relatively simple example of Ruby vs. JavaScript continuations above). C++ was built on top of C (literally; the first versions ran a pre-processing step over the code, output C code, and then ran the C compiler). That version of C is itself quite old; both C and C++ have changed quite a lot since then, and new versions want to be additive (roughly, the new version shouldn’t break your current code unless you were doing something very strange, and nobody ever thinks their particular weirdness is enough to justify a break).

          This leads us to the current version of C++, where the wooly corners actually cannot be parsed cleanly (search for “C++ parsing undecidable” for more details).

  9. Tablis says:

    The Julia language has a nice system for dealing with ending lines. If the line forms a correct expression it interprets it as a single command, no semicolon is needed. If the line looks like a part of the expression it continues reading to the next line.
    For example

    a = 1 + 2 + 3 +
    5 + 6

    is fine.

    a = 1 + 2 + 3
    + 5 + 6

    is an error, as it is the command “a = 1 + 2 + 3” and the next command “+ 5 + 6” is illegal. It works in the same way with anything, so you can write stuff like

    x = f(
    x,
    y,
    z
    )

    which is immensely useful for complicated functions used in plotting and statistics.

    1. Decius says:

      Please tell me that there is a reasonable IDE for Julia that clearly indicates which vertical rows of text are combined to form a correct expression.

      That should be a feature on IDEs for every language, because making programmers do it with indentation is expensive and can incorporate errors.

      1. Tablis says:

        That would be too much to ask probably, the IDE would need to know all operators, which in Julia can be defined wherever. Sure, doing the indentation is tiresome, but at least in this language it cannot cause errors.

        1. Decius says:

          If a programmer makes the indentation not match the grouping done by the compiler, the relevant type of error occurs.

          The IDE can simply… not define nonstandard operators. Just because the language supports it doesn’t mean the IDE has to let you do it.

          e.g
          if (a>b<c)
          will not throw a compile error in several languages. But it will give math majors an aneurysm and behaves differently in different languages.

  10. Thom says:

    Shamus said

    In my experience, most coding projects encourage you to keep individual lines of code under 80-ish characters long. Back in the old days, this was the limit of how much text you could fit on the screen horizontally.

    The 80-character limit actually dates back to punch cards, an example of a physical object leading to a habit or tradition.

    1. Decius says:

      Did punch cards inform the monitor width? 80 characters was a standard width of monitors in the BBS era, and most systems would want users to confirm the width of their display so that they could treat it nicely- if a user had a non-standard 75 character wide display, sending them 80 characters and a newline would cause them to display 75 characters, wrap to a new line, then display 5 characters and start a new line.

      1. Rob says:

        Yes, and the 80 character standard goes much further back than you might think, too. It’s a fascinating story about how a single decision made in the 1800’s set a standard that informed later standards, creating a line of “if it ain’t broke, don’t fix it” justifications that took over a century to kick. It’s like that (apocryphal) story about how today’s railroad track gauges are based on standards going back to Rome’s horse-driven chariots.

    2. Chad says:

      This is true, and that limit is pretty arbitrary, BUT it’s also true that human eyes have a much harder time reading long lines than short lines. That’s the reason that novels are formatted the way they are, that larger books use columnar layouts, and that systems designed for reading will try to reflow text into columns (either multiples side by side or even just one column with a lot of white space if necessary) when they can. Programming language code is not generally an exception to this (it’s when details matter and your head has to carry a lot of state from the end of one line to the beginning of the next that this hits hardest), but code has the common exception that nested indenting often gives code a “floating column” of reasonable (for eye strain) length that works its way farther and farther from the starting margin. You can see clear examples of this in the screenshot of Shamus’ editing session, along with some of the visual tricks that modern editors employ to help.

      The end result is that avoiding long lines in code is still a good idea in general, and that’s not likely to change as long as humans are absorbing code from mostly 2D surfaces with eyeballs. (Which hopefully isn’t “forever”, but practically will be with us for a good long time.)

  11. Chris says:

    Golang has semicolons. They’re just inserted automatically at parse time, which limits where you can put your newlines. As far as the grammar is concerned they’re still there. JavaScript has a similar feature, though it doesn’t work nearly as well, so most people still write explicit semicolons.

    The reason semicolons are so prevalent is that it’s really hard to design a language that parses unambiguously without them (or something equivalent). Especially if you want the language to parse efficiently (context-free grammar, etc.). In general, languages without semicolons are either strict about newlines (as pointed out in the article) since they take the place of semicolons, or they have some other, similar disambiguating syntax. The one exception I know is Lua. It’s been very carefully designed not to need semicolons or similar, and newlines don’t matter at all. One consequence is that return statements can only occur at the end of a block, since otherwise it would be ambiguous (usually resolved by semicolons).

    I agree that headers are no longer a good idea, and it’s good that new languages don’t have them, but I disagree with a couple of points:

    It greatly inflates compile times.

    Maybe this is true if you’re doing lots of template stuff in your headers or have lots of definitions (vs. declarations). Otherwise the effect of header inclusion on compile times is negligible. In my C programs this has never been an issue. The compiler spends all its time doing other things.

    Files need to be compiled more often.

    It depends on the language and tooling, but this generally would happen with or without header files. It depends more on how you structure your source code and program. If your compiler does translations on a file-by-file basis, putting unrelated things in the same file will result in redundant compilation during iterative rebuilds regardless of header files.

    1. Cubic says:

      More than one compiler implements ‘precompiled headers’ so I guess it can be a problem.

      Also just to be pedantic, computer languages are usually further restricted to something like LALR(1) or LL(1) for efficient parsing. Almost all context free languages can be parsed by an Earley parser though … in O(N^3).

      (C and C++ furthermore seem to need hacking the symbol table while parsing. Unless someone’s been clever in recent decades.)

  12. Thomas Adamson says:

    Implicit line continuation involving the ()[]{} has been a feature of every semicolonless language I’ve used.

    Otherwise a line continuation character (usually either \ or \\ or // ) is available.

    1. Leeward says:

      I was going to comment just to say this. I like my semicolons, but I almost never see escaped newlines in Python code. There are very few cases where you want to continue a line and that’s the best option.

      1. Richard says:

        The trouble with inferring continuation via brackets etc, or an explicit continuation character, is that the compiler often can’t spot typos where you miss a bracket or continuation, because what’s left will still be a valid statement.

        It just doesn’t do what you thought you wrote, which is a much worse problem.
        An explicit end-of-statement character avoids this, because missing an end-of-statement usually doesn’t result in a valid statement.

        Macro (ab)use in C/C++ and similar also suffers from this, which is one of the main reasons modern C++ strongly discourages the use of macros for anything other than ‘compile-time feature switching’.
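
        To make the macro case concrete, here is a small sketch (the macro and function names are invented for this example). The first macro expands to two statements, so putting it after an `if` still compiles but only guards the first statement. The `do { ... } while (0)` wrapper is the usual workaround.

```cpp
// A two-statement macro: valid code, but not what the caller meant.
#define ADD_AND_CLEAR_BAD(total, hits) total += hits; hits = 0

// The common workaround: wrap the body so it behaves as one statement.
#define ADD_AND_CLEAR_OK(total, hits) do { total += hits; hits = 0; } while (0)

// Returns the value of `hits` after the guarded macro call.
int HitsAfterBadMacro(bool enabled) {
    int total = 0, hits = 3;
    // Expands to: if (enabled) total += hits; hits = 0;
    // so `hits = 0` runs even when enabled is false.
    if (enabled) ADD_AND_CLEAR_BAD(total, hits);
    (void)total;  // only hits matters for this demonstration
    return hits;
}

int HitsAfterOkMacro(bool enabled) {
    int total = 0, hits = 3;
    // Expands to a single do/while statement, fully guarded by the if.
    if (enabled) ADD_AND_CLEAR_OK(total, hits);
    (void)total;
    return hits;
}
```

        Calling both with `false` shows the problem: the first version clears `hits` anyway, the second leaves it at 3.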

  13. Leeward says:

    For what it’s worth, Erlang does actually use periods. Functions are lists of expressions, and the last one is returned. The rules are a little more complex than I want to put in a post (though not very complex; Erlang is a pretty simple language) but yeah. It uses periods to end functions.

    1. Cubic says:

      Erlang was inspired by Prolog’s syntax, which has features like that.

      However, Prolog is defined by an operator precedence grammar (with some caveats) while Erlang requires a bit more.

      1. tmtvl says:

        Now I want to see someone code a game in prolog. That would be very schadenfreude-y.

        1. Cubic says:

          Might look something like

          ?- play_game, did_i_win.
          no.

  14. ElementalAlchemist says:

    That’s like having a construction worker use a shovel to loosen up the soil before they use the bulldozer to dig the hole.

    Technically you’d need an excavator for digging a hole. Although you could have a bulldozer with a backhoe (i.e. a small excavator bolted on to the back).

    1. Decius says:

      Depends on how steep the sides of the hole need to be.

      1. Echo Tango says:

        And the shape of the hole! U-shaped along all axes, or |_| shaped!

        1. Decius says:

          Also if you need to get the bulldozer out after the hole is dug.

      2. ElementalAlchemist says:

        What you’d use a bulldozer for would be more of a trench than a hole.

  15. Leeward says:

    The issue of header files is a much bigger deal in C++ than in C. Since the idiomatic way to export a class is to include its whole definition (including private members) in the header, changing the implementation of some class’s internal structure can cause a whole project to have to recompile. And since compilation is super hard in C++, that’s a big deal.
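
    One common way C++ projects reduce that recompilation cost is the “pimpl” idiom, sketched below in a single file. This is only an illustration under assumed names (Weapon, Impl, the capacity of 8); it is not taken from any project mentioned here. The part marked as the header never changes when the private data does.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- what would live in weapon.h (hypothetical) ----
class Weapon {
public:
    explicit Weapon(std::string name);
    ~Weapon();  // defined below, where Impl is a complete type
    void Reload();
    int RoundsLoaded() const;
    const std::string& Name() const;

private:
    struct Impl;                  // declared only; layout hidden from includers
    std::unique_ptr<Impl> impl_;  // the header only needs a pointer's size
};

// ---- what would live in weapon.cpp ----
struct Weapon::Impl {
    std::string name;
    int capacity = 8;  // assumed value, for illustration
    int rounds = 0;
};

Weapon::Weapon(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}
Weapon::~Weapon() = default;
void Weapon::Reload() { impl_->rounds = impl_->capacity; }
int Weapon::RoundsLoaded() const { return impl_->rounds; }
const std::string& Weapon::Name() const { return impl_->name; }
```

    The trade-off is an extra heap allocation and pointer hop per object, which is part of why the idiomatic approach still puts private members in the header.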

    I worked on a 2 million line C project for several years, and probably only changed things in widely-included headers once (when we ported to 64-bit).

    There’s also an advantage to header files that you’re ignoring, probably because it doesn’t help you in your typical use case. They document the interface to your code. When you’re working with a big code base written by lots of people with interlocking distinct modules, it’s important that the interfaces between those modules are well defined and easily comprehensible. If you use a third party library (that may or may not come with source code) its interface is defined in the headers.

    Modern languages that eschew header files have to have extra tooling (auto-generation of documentation) to work around this. Even with that, there’s still a missing feature in the concrete interface definition. It’s a worthwhile trade-off for most of them (some still have optional headers) but it’s actually a trade-off. The headers aren’t entirely worthless to humans.

    1. Xeorm says:

      That’s still not an advantage. If you do want to put in documentation of some sort, it’s good to have that be a dedicated documentation function. It’s easy as pie to auto-create documentation for a project that emulates what a header file at bare minimum does. Take the function definitions and export them to another file. Could even create it automatically if you really wanted to. The rest of what makes a header file good documentation via comments and such is easy enough to export too if that’s what you’re looking for.

      It’s still making things easier for the computer when it doesn’t need the help. These are things we should be having the computer do automatically, not burden the programmer with. Or in this case with how much hassle it creates with compiling, this is like telling the machine to wait while the worker loosens the soil. It’s not a necessary step and you’re getting in the way of the project in general.

  16. Retsam says:

    JS does “Automatic Semicolon Insertion” (ASI), as others have mentioned, and it has its quirks. A famous pitfall is something like:


    function oops()
    {
        return
        {
            foo: "bar"
        }
    }

    The intent is for that function to return the { foo: "bar" } object, but due to ASI, it’s interpreted as:


    function oops()
    {
        return; // <-- semi-colon inserted
        {
            foo: "bar"
        }
    }

    (This may be one reason the Allman brace style isn’t very commonly used in JS)

    A more insidious (albeit somewhat contrived) example, is something like:


    foo()
    ("foo" + "bar")
    ["Hello", "World"].forEach(msg => console.log(msg))

    This actually gets interpreted as:


    foo()("foo" + "bar")["Hello", "World"].forEach(msg => console.log(msg))

    Which calls the foo function, then attempts to call the return value of that as a function, then tries to treat the result of that as a dictionary, looking for a “World” key.

    Apparently the rule of “semi-colonless JS” is that you can’t start lines with `{`, `[`, or `(`, and the convention in those cases is to do:


    foo()
    ;("foo" + "bar")
    ;["Hello", "World"].forEach(msg => console.log(msg))

    Personally, I think it’s easier to just use semi-colons, or let the code-formatter do it for me.

  17. John says:

    Fortran has never used semicolons. Instead, programmers indicate that a single set of instructions is split over multiple lines through the use of continuation characters. The exact syntax varies depending on the Fortran standard. Fortran77 uses an exclamation mark (or other character) in column six to indicate that a line is the continuation of the previous line. Fortran77 is very particular about formatting. The first six columns of each line are for things like line numbers and continuation characters. Columns seven through seventy-two are for statements. Anything after column seventy-two gets ignored. Long lines therefore require the use of continuation characters. My understanding is that these conventions are punch-card related. Later Fortran standards aren’t so picky; you just need to put an ampersand at the end of one line to indicate that the next line is a continuation.

  18. Allen Gould says:

    1. Been forever since I programmed in C, but I remember being taught that header files were also your documentation – here’s what’s in the .c file, here’s what they do high-level so you don’t have to play archaeologist with the old code.

    2. Continuation characters are *terrible*. My work programming is in VBA, and they are finicky AF – I avoid them at all costs.
    (Now, part of the blame may end up belonging to the code editor, because it seems to enjoy finding reasons to move that _ elsewhere or otherwise parsing it incorrectly.)
    When doing SQL, I always end up remembering how awesome semicolons are because then you both know where the statement ends, and know where the compiler thinks the statement ends, and you gain all the flexibility of being able to arrange your statement in a readable format for Future You.

    1. sheer_falacy says:

      Header files as documentation is done as Interfaces in more modern object-oriented languages – it means that you have an API that your implementation has to follow, but allows multiple implementations of it, has easy compile time checking that you didn’t mess it up, and is just generally helpful.
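
      For comparison, the closest C++ equivalent is an abstract class with only pure virtual functions. The sketch below uses invented names (Weapon, Shotgun, ReloadAndFireOnce) and an assumed two-round capacity.

```cpp
// An abstract class playing the role of an "interface".
// All names and values here are hypothetical.
class Weapon {
public:
    virtual ~Weapon() = default;
    virtual void Fire() = 0;
    virtual void Reload() = 0;
    virtual int RoundsLoaded() const = 0;
};

// One of possibly many implementations. `override` makes the compiler
// check that each function really matches the interface.
class Shotgun final : public Weapon {
public:
    void Fire() override { if (rounds_ > 0) --rounds_; }
    void Reload() override { rounds_ = kCapacity; }
    int RoundsLoaded() const override { return rounds_; }

private:
    static constexpr int kCapacity = 2;
    int rounds_ = 0;
};

// Callers depend only on the interface, not on Shotgun.
int ReloadAndFireOnce(Weapon& weapon) {
    weapon.Reload();
    weapon.Fire();
    return weapon.RoundsLoaded();
}
```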

  19. Nessus says:

    To me it seems like it would be a non-trivial advantage in both readability and teachability if punctuation in programming languages meant approximately the same thing as in prose. If I were designing a programming language (which I acknowledge is a dumb thing for me to even say, since I’m not a programmer), I feel like I’d design it so this:

    hat.MoveTo (Vector3D (player.avatar.position.x, player.avatar.y + player.height, player.avatar.position.z));

    Would be formatted like this:

    Hat-MoveTo: Vector3D= (player/avatar/position:x, player/avatar:y + player/height, player/avatar/position:z).

    Or maybe this:

    Hat: MoveTo (player>avatar>position=x, player>avatar=y + player>height, player>avatar>position=z).

    Or maybe:

    [email protected](player/avatar/position:x, player/avatar:y + player/height, player/avatar/position:z).

    Those are still clunky AF, and I’m sure they contain many endlessly nitpickable issues I’m not aware of which are not the point so PLEASE don’t bother. I’m also sure that actual programmers see no difference in readability there, which IS kind of related to the point (experienced programmers are not going to be good judges of what is or isn’t readable to a noob/layman, so they’re also going to suck at gauging the advantages). I’m just saying it seems like it should be possible to design a programming language that follows the same punctuation rules as prose, and the only reason it hasn’t been done is either that programmers who’ve invested a lot in mastering existing languages would object, and/or fears like the ones Shamus expressed over expanding the variable set.

    The worries about expanding the variable set seem a little weird to me, because we already use a massively expanded variable set every day for prose, and we have programs designed to handle that load. If anything, the more strict expectations of code syntax would make things “easy mode” for a modern prose text parser, so I don’t see why expanding the variable set by just one or a few would be an issue for either the meat or the metal.

    1. Lucas Ieks Minicz says:

      An issue with this idea is that programming languages are *not* prose. Making a notation more similar to that would probably confuse newcomers even more.

      Take a look at (a slightly refactored version of) your example in a few more languages:
      hat.MoveTo(player.avatar.position + Vector3D(0, player.height, 0)); // C++
      hat moveTo: player avatar position + (Vector3D x: 0 y: player height z: 0). "Smalltalk, notice the dot and the slight difference in code"
      move hat (position (avatar player) + Vector (0, height player, 0)) -- Haskell, a more mathematical notation, though parentheses are not needed in function calls
      movehat(hat, position OF avatar OF player + VECTOR(0, height OF player, 0)); CO Algol 68 CO

      For an experienced programmer, all of those should be equally readable (except for the paradigm differences in the last two, but whatever), and for a newbie, none is going to be particularly intuitive if they don’t know what is actually happening there, but the more “English-like” ones run the risk of being misleading, since they would already “know” what it means (and likely be wrong).

      I have actually seen cases of that happening, even with the more usual C-ish languages (like thinking a(“string”, whatever) should be written a(“string,” whatever) note the closing quotes and comma). For a natural language analogy, think about issues with false cognates, like “when” in English and “wenn” in German (which actually means “if”). So I don’t think it would be an advantage, since you are already learning a whole new language anyway.

      As for the number of variables, for the compiler it makes no difference. For the computer running the code, it MIGHT make a difference. However, programmers might use up to 5 or so variables in a single (medium-sized) function (or at least I usually do). Using extra variables everywhere could increase this number by an order of magnitude, and make it a lot harder to keep track of everything, especially if you like using short names like me (i, id, pos, p) (I regularly hit into double declaration errors whenever I’m writing more complicated code, but I guess it is my fault for using overshort names).

    2. tmtvl says:

      Well, there’s also different paradigms…

      move(hat, top_of(player_avatar));

      hat.move(player.avatar.top);

      world.move(hats.hat, positions.top(player.avatar));

      position(avatar(player)).top.set(hat);

    3. The Puzzler says:

      It’s a pity some mathematical symbols, like – and . and * and ! don’t match their meaning as punctuation – that puts us at a disadvantage for trying to make programming symbols match both…

  20. baud says:

    Small typo:

    These days your IDE can stck files side-by-side so you can get more information on the screen.

  21. krellen says:

    COBOL actually does use periods as line breaks, because it was supposed to be as close to English grammar as possible.

    Possible wasn’t very, for the record.

  22. Decius says:

    Hats should care what direction the wearer’s head is pointed. For example, if I’m facing northeast, my hat is also facing northeast, and if I’m looking straight up, my hat is no longer above my head.

    1. Paul Spooner says:

      The example is for simple characters. For skeleton rigged ones, the hat would inherit the head transform matrix instead of the simple offsets.

  23. Decius says:

    >“Oh, I don’t know what the code for WeaponReload () looks like, but according to the header file I can tell that the code exists. I’ll just keep compiling and trust that the code for WeaponReload () will show up later when I’m compiling some other file.”

    Why does that header file need to be read more than once per project when compiling the project? Yes, if you’re only (re)compiling one file, you might need to read all the headers to know where other functions are, but if you’re compiling all of the files you shouldn’t need to read any given header more than once.

    1. Xeorm says:

      Compilation is done by file. The .cpp file is where the actual meat and potatoes are, with commands like
      if (reloadButton.isPressed()) {WeaponReload();}
      When the compiler first encounters this bit the WeaponReload function will be referenced in the .h file, but won’t have any code attached. All it’ll know is what it requires and what it returns.

      After the .cpp files are compiled, then the linker comes along and links up the code between them.
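
      The split described above can be sketched in one file. In a real project the three parts would be weapons.h, marine.cpp and weapons.cpp; the parameter, return type and magazine size are assumptions made up for this example.

```cpp
// --- as if from weapons.h: a declaration only, no body ---
int WeaponReload(int rounds_in_reserve);

// --- as if in marine.cpp: compiles knowing only the declaration ---
// The compiler checks the argument and return types and leaves a
// reference for the linker to fill in.
int OnReloadPressed(bool reload_button_pressed, int rounds_in_reserve) {
    if (reload_button_pressed) {
        return WeaponReload(rounds_in_reserve);
    }
    return 0;
}

// --- as if in weapons.cpp: the definition the linker later resolves ---
int WeaponReload(int rounds_in_reserve) {
    const int kMagazineSize = 6;  // hypothetical capacity
    return rounds_in_reserve < kMagazineSize ? rounds_in_reserve
                                             : kMagazineSize;
}
```

      Misspelling the call as WeaponRelaod would fail at the middle section, because no declaration with that name has been seen.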

      And you’d think the header files wouldn’t have to be read multiple times, but again: it’s done on a per file basis. It really is an incredibly inefficient way to compile a program.

      1. Decius says:

        Yes. It is done that way. But there’s no reason for it to be done that way.

        1. Mousazz says:

          And, therefore, if I understand it correctly, Jai doesn’t do it that way. It compiles everything at once.

          1. Richard says:

            That’s both meaningless and definitely wrong.

            C/C++ compilers actually don’t re-parse the header files over and over, though the standard requires them to behave “as-if” they did.

            1. Richard says:

              To clarify, as I’m out of the edit window:

              My suspicion is that the demonstrated Jai toolset is either only tokenising at “compile” (deferring actual compilation until the code is run), or is compiling-as-you-type.
              The former is what C#, Java et al do. The latter is quite lovely – some C/C++ IDEs do that as well, and I wish they all did.
              – If only for code completion, as this approach is many orders of magnitude better than Intellisense. Sadly you can only really do it with clang (LLVM) and GCC as other compilers don’t (currently) expose the necessary hooks into their parse trees.

              In the demos the Jai toolchain definitely isn’t doing any of the weird and wonderful optimisations that a modern compiler toolchain does (even in ‘debug’ builds).

              Most of the ‘new’ languages actually use LLVM (which is at heart a C/C++ backend), in order to benefit from some of the (in many cases frankly utterly insane) optimisations mature toolchains are able to do.

              Several of the more crazy optimisations are actually explicitly time-limited, where the compiler spends a defined maximum amount of wall-clock time (eg 0.1sec) trying out various possibilities before choosing whichever was the ‘best’.
              Thus faster compilation directly causes slower binaries…
              (These tend to be optional because many software companies prefer builds on different machines to pick the same result.)

    2. Eric says:

      Headers need to be re-read because the compiler may parse the header contents differently based on what other information has already been parsed.

      For example, weapon.h might include functions for reloading clips, but only if bullet_clip.h has already been parsed before weapon.h. When shotgun.cpp includes weapon.h, the clip functions will be skipped, and if the compiler reused the parse results for pistol.cpp, there would be errors about missing functions that pistol was expecting to exist.

      The real use of conditional headers is more likely to be for compiling to different targets (windows, mac, linux, ios, android) or features (debug, demo, achievements, drm). In those cases you wouldn’t change options between cpp files, but you could, so the compiler needs to re-process the whole set, just in case.

      1. Decius says:

        Why would weapon.h need to be compiled three times (once for itself, once for bullet_magazine.h, once for shotgun.cpp)?

      2. Richard says:

        You’re assuming that the externally-observable behaviour is the entirety of the actual behaviour. It isn’t.

        Modern compilers don’t have a separate preprocessor step, so while the ‘compiler’ behaves “as-if” (un)defining those macros altered the raw text seen by the compiler, this is not what has actually occurred.

        Consider how you’d optimise something that had a set of “if this then that” and text-replacement features, where well-defined parts may require multiple different sets of rules.

        Then consider features like “#pragma once”.
        To support that a compiler must know about file boundaries, and can clearly infer rather a lot from it.

        #pragma once was originally invented to speed up compilation, but these days it barely has any effect because the compiler knows what include guards look like.
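
        For anyone who hasn’t seen one, this is roughly what an include guard looks like, with the header text pasted twice to simulate a double inclusion. The names (WEAPONS_H, Weapon, WeaponReload) are placeholders for the example.

```cpp
// --- first inclusion of a hypothetical weapons.h ---
#ifndef WEAPONS_H
#define WEAPONS_H
struct Weapon {
    int rounds;
    int capacity;
};
inline void WeaponReload(Weapon& w) { w.rounds = w.capacity; }
#endif  // WEAPONS_H

// --- second inclusion, e.g. pulled in again through another header ---
// WEAPONS_H is already defined, so everything up to the #endif is
// skipped. Without the guard, the struct would be a redefinition error.
#ifndef WEAPONS_H
#define WEAPONS_H
struct Weapon {
    int rounds;
    int capacity;
};
inline void WeaponReload(Weapon& w) { w.rounds = w.capacity; }
#endif  // WEAPONS_H
```

        Because the whole file sits between one #ifndef and its #endif, a compiler that recognises the pattern can skip reopening the file on later inclusions, which is the same saving #pragma once was meant to provide.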

  24. Taxi says:

    What if you have a backpack that can be damaged by bullets shot from a gun directly from another backpack and mounted on a car, handled by a marine?

    1. Decius says:

      Why would you be able to mount bullets to a car?

      Yes, I know that the ambiguity was intentional.

  25. Blake says:

    Having done lots of C++ and Lua, I really don’t care whether or not a language has semi-colons.
    I think if I was writing a language I’d fall on the side of brevity as the standard case, and use something like an ellipsis to continue a line. It’d be rare to see, but still easy to write and understand without overloading the use of something like a backslash which might be useful elsewhere.

    As for headers I’d personally like it if you could just write the equivalent of a .cpp file, with a ‘public’ keyword on anything you wanted the rest of the code to be able to access, and a ‘make-header’ kind of function that worked from there, defining the class by its public members and a byte-array of the correct size without exposing any of the actual member types.

    On header compile times, have you looked into c++20 modules yet? It’s essentially a standard way of making ‘pre-compiled headers’ for individual components.
    Apparently the early implementations are compiling an order of magnitude faster. That’s not quite as fast as PCHs (which have had decades of optimisations), but they avoid a bunch of PCH issues, like touching anything that’s included there causing everything to recompile, since you only have the one PCH.
    Also macros don’t leak out of/into modules which is great.
    Long term I think modules will lead to a healthier c++ ecosystem as libraries become easier to write, but it’ll take a while for everyone to adopt them.

    1. Richard says:

      Indeed, the thing I find disturbing about all these “We need a new language instead of C/C++” is that many of them spend much of their argument knocking down the straw men of ancient versions.

      Idiomatic C++17 is a very different language to idiomatic C++98. It’d be surprising if it wasn’t.

      The main issue with the idea of ‘no-headers’ is that it makes it impossible to put whole objects on the stack.
      To construct an object on the stack, the calling code needs to know exactly how big it is, and the alignment requirements of the memory, and of course the constructor function to call.

      Those alignment requirements might change depending on the private members.
      (While on x86/amd64 they probably won’t, C/C++ are used on a lot of architectures)
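
      A small sketch of the size and alignment point, using two made-up revisions of a class (HatV1, HatV2) whose public interface is nearly the same. The only real change is one added private member, yet any caller that puts the object on the stack has to reserve different storage.

```cpp
#include <cstddef>

// Revision 1 of a hypothetical class.
class HatV1 {
public:
    float Height() const { return height_; }

private:
    float height_ = 1.0f;
};

// Revision 2: a private member with a stricter alignment was added,
// e.g. a colour stored so it can be loaded with SIMD instructions.
class HatV2 {
public:
    float Height() const { return height_; }
    float Tint(std::size_t i) const { return tint_[i]; }

private:
    float height_ = 1.0f;
    alignas(16) float tint_[4] = {1.0f, 1.0f, 1.0f, 1.0f};
};

// Code that writes `HatV2 hat;` must reserve sizeof(HatV2) bytes at
// alignof(HatV2), so it needs to see the private members.
static_assert(alignof(HatV2) == 16, "alignas(16) member raises alignment");
static_assert(sizeof(HatV2) > sizeof(HatV1), "private change alters size");
```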

      To call any methods in an object the compiler needs to know the function layout. If it inherits anything, it needs to know the vtable (inherited function) layout of what it is in the ‘current-context’.

      The linker can do some of that, however you really want to avoid putting more work into the (serial) linker instead of the (massively parallel) compilation stages.

      – The code also needs to know which allocator to use.
      There’s several reasons a class might need to use a non-default allocator – eg the object you’re constructing might be in a different heap, actually in different physical RAM (on a GPU or other external accelerator), have very special alignment needs (for AVX/GPU), or it might be a large performance boost to use an alternate scheme for keeping track of memory usage instead of the usual general purpose one etc.

  26. Nimrandir says:

    Mathematica (assuming nothing’s changed since I last used it) uses semicolons to suppress output, so I’d have a bunch of commands that had to be executed but whose results didn’t require displaying. Of course, I’m not terribly fond of Mathematica for a slew of other reasons.

  27. Tuck says:

    As a professional programmer turned professional hole-digger, using a bulldozer to dig a hole is like using PHP to program your open world 3D third-person MMORPG…

      1. Tuck says:

        Yeah, that or a backhoe (or 360/JCB/digger/etc here in the UK).

  28. Sniffnoy says:

    Worth noting here is Haskell’s approach to the semicolon issue, which isn’t quite either of your two options above. It’s sort of similar to Ruby’s and Javascript’s “automatic semicolon insertion” that other commenters have mentioned, but much less haphazard. Like those, you can write things in full form with semicolons and braces (although in Haskell I think basically nobody does this). But, you can also write it in what I’d guess you’d call a Python-formatting style where you use indentation to mark this sort of thing, and the compiler automatically transforms it into the full form before compilation.

    So in Haskell, you can continue a long line onto the next line, but you indicate that it’s still the same line by means of indentation rather than a particular special character. (Or I guess you could say tabs and spaces are that character. :P )
