Ask Me a Question:
What is “Trashing the heap”?

By Shamus Posted Wednesday Oct 6, 2010

Filed under: Programming 261 comments

In an earlier post, I talked about making programs trash the heap, and someone wanted to know what that was. Trashing the heap is something you’ve seen before. It often looks like this:

[Image: popup_crash.jpg, a program crash dialog]

Here is how it works:

The Heap…

You’ve probably noticed that programs take up computer memory. As they do stuff, they need to store information. The program says, “Hey, I need 8 bytes of memory.” The system finds a free spot in memory big enough to hold something 8 bytes in size, and tells the program where it is. This happens thousands of times a second for a busy program. Get four bytes. Get eight more bytes. Then release the four because you’re done with them. Now get 100 bytes. Now get a megabyte. Now drop the eight bytes.
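In C, that conversation looks something like this (a minimal sketch; the names are just for illustration):

#include <stdlib.h>

void busy_program (void)
{
  char* a = (char*)malloc (4);        // "Hey, I need 4 bytes of memory."
  char* b = (char*)malloc (8);        // Get eight more bytes.
  free (a);                           // Release the four; done with them.
  char* c = (char*)malloc (100);      // Now get 100 bytes.
  char* d = (char*)malloc (1048576);  // Now get a megabyte.
  free (b);                           // Now drop the eight bytes.
  free (c);
  free (d);
}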

This sea of data is called “the heap”. It’s also sometimes called the “free store”, but I’ve only ever heard old beardy types use that terminology. I think the last time I saw the words “free store” was in 1991-ish. And the book I was reading was already old.

Anyway, usually this activity is abstracted for a programmer. You just create variables when you need them and throw them away when you’re done.

But sometimes, in some languages, you do need to worry about memory. And this is where things get messy.

…and the trashing thereof.

Let’s say you’re programming in C or C++, and you have the program grab enough memory to store 20 bytes of data. And now you make a perfectly innocent mistake and accidentally copy 4,096 bytes into that 20-byte slot. (This is actually easy to do for a lot of reasons. More on that in a minute.) Your data will fill up those 20 bytes, and then overwrite the next 4,076 bytes of data. Any variables that happened to be occupying that space have now had their values replaced with something different.
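In code, the innocent mistake can be as small as this (a sketch; the names are hypothetical):

#include <stdlib.h>
#include <string.h>

void innocent_mistake (void)
{
  char* slot = (char*)malloc (20);  // "I need 20 bytes of memory."
  char big[4096] = {0};             // 4,096 bytes of data from somewhere
  // memcpy has no idea how big slot really is. It writes all 4,096
  // bytes, stomping the 4,076 bytes that live just after the slot.
  memcpy (slot, big, sizeof (big));
}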

Congratulations, you’ve just trashed the heap. If you are very very lucky, the program will crash right away.

If you are not lucky, it will continue to run but begin behaving oddly. See, those 4,076 bytes of memory might have been filled with crucial bits of data needed to keep this program operational. If it crashes instantly, you can look at what it was doing just before death and you’ll find the trouble spot. But those bytes of memory could also have been empty, in which case spewing a bunch of random garbage into that space is “harmless”. (This time.) What you’ve got in this case is a random crash bug. The program may run fine, act oddly, or insta-crash, all based on what happened to be in that spot of memory at the time.

WHYYYYY!?!?

This is the subject of holy wars. Some say that C is a crap language because you can (and must) interact with memory directly. Other people say the people in the first group are just crap programmers.

I am not an expert on languages. Other than doing a lot of BASIC in my teenage years I’ve spent limited time dabbling outside of C, so I can’t make any really good comparisons. But 90% of the problems I have in C are because it doesn’t have a simple way of handling strings of text.

Here is a bit of old-school BASIC code that adds two strings of data together:

100 a$ = "Twilight is a great book"
101 b$ = "- for starting arguments!"
102 c$ = a$ + b$
103 PRINT c$

Run this, and it will take two phrases:

Twilight is a great book

and:

- for starting arguments!

And merge them together to print the entire sentence.

Twilight is a great book - for starting arguments!

In many languages you can define bits of text, cut them up, join them, or whatever you like and you don’t have to worry about memory. In fact, it’s actually impossible to worry about memory in old-school BASIC – it has no tools for doing so. It’s simple. It’s readable. It’s impossible to make it crash. (Although it’s still possible to create all sorts of other bugs. But trashing the heap is not a risk.)

In C*, the language does not do all of this legwork for you. If you wanted to add those strings together you’d need to measure the length of the first string. Then measure the length of the second. Then allocate a block of memory large enough for both strings plus 1 extra byte. Then copy the first string into that spot of memory. Then copy the second string into the spot 24 bytes after that (just after the first string).

#include <stdlib.h>
#include <string.h>

char* TellMeAboutTwilight ()
{
  char a[] = "Twilight is a great book";
  char b[] = "- for starting arguments!";
  //See how big a and b are
  int len = strlen (a) + strlen (b);
  //Now allocate enough memory for both, plus 1 byte for the terminator
  char* c = (char*)malloc (len + 1);
  //Now copy a into c
  strcpy (c, a);
  //Now copy b just after that (overwriting a's terminator)
  strcpy (c + strlen (a), b);
  return c;
}

(Note to would-be nitpickers: I’m aware you’d use sprintf to save yourself a few lines, and I know you wouldn’t really do things just this way. This post is for non-coders. Don’t Be That Guy.)
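And the caller isn’t done yet: that memory came from malloc, so somebody has to give it back. A sketch of hypothetical calling code, assuming the function above:

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  char* answer = TellMeAboutTwilight ();
  printf ("%s\n", answer);
  free (answer);  // forget this line and we leak 50 bytes on every call
  return 0;
}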

The advantage of the C way is that it is crazy fast and memory efficient. This was important back when machines ran at sub-megahertz speeds and had 64k of memory. Which is not even enough memory to store this one image:

[Image: ram.jpg]

But today we’ve got computers with lots of power and spending three minutes trying to save ten bytes of memory is a horrifying waste of programmer resources. It’s like pushing your car through an intersection to save on gas. The effort is much greater than the savings and you’re a lot more likely to cause a crash somewhere.

90% of my memory mishaps are the result of juggling string data like this.

1) Measuring and allocating memory is annoying, adds a lot of extra lines of code, and is prone to mishaps.

2) You have to remember to explicitly free the memory later: “Okay, I’m done with this spot now. Something else can use that memory.” If you forget, then each time this part of the program is run it will grab more memory. (This is called a memory “leak”. The program eats up more and more memory the longer it runs.)

3) You have to remember to not use that spot after you’ve freed it. The variable might still be around, but after you’ve freed the memory it pointed to it’s just a crash waiting to happen.

4) Some programmers – myself included – save themselves the headache of measuring & allocating by just grabbing a space that’s “always going to be big enough”. Instead of measuring a and b, I’ll just grab… hmmmm… 100 bytes? That sounds good. In effect, I’m routing around the features of the language that were intended to make C fast and memory efficient by deliberately wasting memory. And of course, maybe in some unusual circumstances 100 might not be enough. Did I remember to add a bunch of code to catch and handle that case?

5) Strings must end in an invisible terminating character. When you print or copy a string, the code looks for this terminator to let it know when to stop. If that terminator isn’t there for some reason, it will keep printing or copying until it runs right out of the space you’ve allocated and will sail off into the heap looking for it. You’ll end up printing a bunch of garbage or (worse) copying a lot more stuff than you intended. This also means that the length of a string (in memory) is the number of characters it contains, plus one. It’s just really easy to make off-by-one mistakes like this. (The sketch after this list shows a few of these pitfalls in action.)
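Here’s that sketch, a hypothetical function walking through pitfalls 2, 3, and 5:

#include <stdlib.h>
#include <string.h>

void pitfalls (void)
{
  char* s = (char*)malloc (100);
  strcpy (s, "short string");    // fine: 12 characters plus 1 terminator

  // Pitfall 2: return here without free(s) and those 100 bytes leak.

  free (s);
  // Pitfall 3: s still points at the old spot. Using it now is a crash
  // waiting to happen:
  // strcpy (s, "boom");         // use-after-free

  // Pitfall 5: a 5-byte array holds "four" plus its terminator exactly,
  // but "fiver" puts the terminator one byte past the end.
  char tiny[5];
  strcpy (tiny, "four");         // OK: 4 characters + terminator = 5 bytes
  // strcpy (tiny, "fiver");     // off-by-one: writes 6 bytes into 5
}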

Interfacing directly with memory is really fast but also dangerous. It’s a powerful tool, like a flamethrower. Critics of C and C++ say that languages shouldn’t have flamethrowers. Supporters say that flamethrowers are fine, you just need to not make any mistakes. I’m of the opinion that having a flamethrower is a good thing, but I shouldn’t need to use it every time I want to light a cigarette.

I don’t mind this hassle when I’m dealing with something big. When I’m loading 10MB texture maps and complex 3d models into memory I don’t mind the overhead involved to take care of them efficiently. They’re big and you’re usually in an unbelievable hurry when you’re dealing with those. But juggling crappy little 10-byte strings like they were live hand grenades is tedious. Partly because they’re so trivial, and partly because it’s something that needs to be done often. Even after all these years I still get annoyed at how cluttered and inelegant it is when I want to deal with a couple of short strings. What would be a single line of code in any other language ends up being half a dozen. (If done properly. If done improperly it’s just two lines of code now and half an hour of pulling your hair out six months from now when you have to sort out why it’s crashing.)

There are add-ons for C++ out there that will help with this, but they’re not standard. If you use one, you may find your code is no longer portable. Or it will be a headache for other programmers to read and maintain. Or those add-ons might conflict with something else you’re trying to use.

And now you know.

 



261 thoughts on “Ask Me a Question: What is “Trashing the heap”?”

  1. Jarenth says:

    Interesting read, and understandable for C-laymen such as myself. Thank you.

    Question, though: Is it only the various C-languages that do this? I can’t remember Java or Delphi forcing me to do this kind of stuff, anyway; are there more programming languages that force this manual memory allocation?

    1. Sumanai says:

      Yes, but they’re low-level languages, and few people code in anything other than C. The only other one that I know is Assembly. Professional programmers, feel free to fill in.

    2. MrPyro says:

      Java has a memory manager built into the JVM, I think. I know that it definitely has a garbage collector which runs around freeing up memory that’s no longer in use.

      Delphi is Object Oriented Pascal, isn’t it? That has no manual memory management either.

      1. delphi for win32 is only one step above c++ (it has built in string management) but allows you to allocate memory if needed. (and requires you to clean up your own objects)

    3. Kdansky says:

      Algol, COBOL, FORTRAN and all these other really old things do this too. But “modern” languages like Java, Delphi (which is pretty much Pascal.v2 with GUI), VBasic, Scala, Python, Ruby or even C# do not require it (much).

      C and C++ are just very widely used and therefore prone to a lot of criticism.

    4. Erik says:

      It’s primarily the C family of languages, because C was created to write an operating system in. When you’re writing an OS (or any other application that needs direct hardware access and predictable timing), you really need the low-level features that C provides.

      Unfortunately, because it worked so well at its original task it became the “hot” language of the time, got used for general purpose applications, and was used by folks that had NO idea what they were doing. This resulted in amazing quantities of really bad code that was capable of crashing the entire system.

      For modern application programming, C would be an epically bad choice. C++ inherited all of C, including its low-level features, so although it’s better it’s still booby-trapped and (IMO) a bad choice for new development. Serious application development should be done using an appropriate language. Java & C# are adequate, though my personal favorite is Python and I’ve heard good things about Ruby.

      Really, the only reason to be using C/C++ is if you need to be close to the hardware for some reason. I live in it, since I do embedded development, but I’m the exception.

      1. Simon Buchan says:

        On any modern machine/OS, the only really hardware specific code is (should be) in drivers or the OS’s boot code – this is the code that does things like “put the bytes F1 3E 83 A9 in memory at physical address 5F0000, because that’s where the IO port that sends messages to the CD drive is” (not really).

        More probably, it’s because you want to use libraries. OpenGL, FMOD, Havok, maths (matrixes and linear algebra, not square root!), networking, threading and other plumbing. Somewhat ironically, you have to be much more knowledgeable about C to use a library from a non-C language in many cases:

        C’s binary interface is the de-facto standard for pretty much any operating system under the sun, so any generally useful library tends to be written for C usage first, and other languages have to have glue written to connect to them. This glue is incredibly boring *and* difficult to write, so unless it’s both a fairly popular language and a fairly popular library, you will have to learn the C API so you can write the glue anyway. In both cases, all the documentation and the help other users give you will be against the C API, so you have to convert everything in your head. On top of that, every time something goes wrong you have to check whether it’s the glue that’s wrong, on top of your code using the API or the API itself.

        C or C++:

        #include <windows.h>
        #include <gl/gl.h>

        … later …
        wglUseFontOutlines(…)

        C#:
        static class Native {
            [DllImport("opengl32", EntryPoint = "wglUseFontOutlines", CallingConvention = CallingConvention.Winapi)]
            public static extern bool wglUseFontOutlines(
                IntPtr hDC,
                [MarshalAs(UnmanagedType.U4)] UInt32 first,
                [MarshalAs(UnmanagedType.U4)] UInt32 count,
                [MarshalAs(UnmanagedType.U4)] UInt32 listBase,
                [MarshalAs(UnmanagedType.R4)] Single deviation,
                [MarshalAs(UnmanagedType.R4)] Single extrusion,
                [MarshalAs(UnmanagedType.I4)] Int32 format,
                [Out] Gdi.GLYPHMETRICSFLOAT[] lpgmf);
        }

        [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
        public struct GLYPHMETRICSFLOAT
        {
            public Single gmfBlackBoxX;
            public Single gmfBlackBoxY;
            public POINTFLOAT gmfptGlyphOrigin;
            public Single gmfCellIncX;
            public Single gmfCellIncY;
        }

        public struct POINTFLOAT
        {
            public Single x;
            public Single y;
        }

        … later…
        Native.wglUseFontOutlines(…)

        And C#’s “Platform Invoke” is one of the simpler to use “glues”! (Better known as Foreign Function Interfaces, or FFIs.)

    5. Jarenth says:

      And now I know.

      Appreciate the replies. Thanks. ;)

  2. Ingvar says:

    Well spoken! I have, at times, written small string libraries to get around most of my annoyances with C strings, but, well, there’s more than one way of implementing a (character) string and some ways are good for some things and other ways for other things, so in the end, I always end up with five partial implementations, all different, none covering the WHOLE need, but all (together) doing so, with painful impedance mismatch everywhere.

    Oh, another “fun” thing with overwritten allocations… Sometimes they corrupt the book-keeping that’s used to keep tabs on what is and isn’t allocated, and then things get very erratic.

  3. somebodys_kid says:

    My hat is off to you for daring to write a GUI program in C++…have you thought about porting it to, say, C#? Or would the time sink needed for that not be worth it? The present iteration of .NET is pretty darn good in this programmer’s opinion.

    1. Shamus says:

      Being an old geezer, I don’t know C#. I realize it’s the hip new thing now and I’ve been sort of meaning to check it out at some point, but on my personal timeline it feels like the language appeared five minutes ago.

      1. Aelyn says:

        C# = C++ – (memory management) + (liberal smattering of VB)

        Honestly, once you’ve got the basic logical structures of computer languages down the differences become syntactical. Those only take a little time to figure out.

        Esoteric, problem-specific languages may differ of course.

        1. Jason says:

          Actually C# = Java, with some minor syntax changes.

          It’s a fantastic language, and .NET is a good runtime.

          Just don’t be silly and use it on a server.

          1. Sumanai says:

            What if you need to run the software on a non-Windows system? Is the .NET actually available for others?

            1. neothoron says:

              There is something called Mono, developed by Novell, that is a free, cross-OS implementation of the .NET runtime. Unity and some medium-visibility Gnome software (Gnome Do, Banshee, F-Spot, Tomboy) run on Mono.

              1. Deadfast says:

                Mono is on par with .NET 2.0; with .NET 4.0 released earlier this year, it is not a good substitute.

                1. Kyte says:

                  Except .NET 4.0 didn’t really add that much to the plate in terms of features, esp. in the portable field. The biggest focus was interoperability with COM & P/Invoke (which don’t work outside Windows anyways) and features for C# 4.0, which are also mostly about interop. The rest is support for nice stuff like full support for dynamic languages, Contracts and Parallel processing and a couple new numeric types.

                  Sure, there are bugfixes and whatnot, but for the most part, developers can stay on 2.0 (3.0 and 3.5 work on top of the 2.0 CLR, and Mono already implemented most of the features anyways).
                  And if you’re really desperate, Mono 2.8 is already in preview with support for C# 4.0 (and associated framework changes).

          2. ACTUALLY… C# CLI is kind of like the JVM but the libraries are more like delphi (the dude that designed the original delphi being the head guy on c#)

            also, delphi DOES have access to memory management. Creating objects improperly can still create memory leaks etc, it’s maybe half a level above c++

            1. Mistwraithe says:

              Yes, but in Delphi strings are automatically allocated, reference counted and freed for you without having to do any work (unless you really WANT to do them the old fashioned way). Which helps because as Shamus said you play with strings a LOT in most programs.

        2. Sumanai says:

          “+ (liberal smattering of VB)”

          Right, that does it. I’m never touching C#. In my eyes Basic, especially Visual Basic, should’ve died over ten years ago.

          1. kingcom says:

            Awwww VB isnt all bad. I figure its kinda like crystal meth. You get this really nice GUI created really fast but when you come down, you have no idea what you’ve done and no idea of the kind of long term problems you’ve just caused yourself.

            1. Sumanai says:

              Replace GUI with something else and you’ve got programming with anything while drunk.

              And that’s like saying driving at top speed at a brick wall isn’t all bad. I mean, you get a nice rush for a while right?

          2. Kyte says:

            I think you missed the outcry from VB devs when they discovered that VB.net, in the spirit of fixing things, made it impossible to trivially migrate. C# and VB.net are NOT VB, and approaching (or avoiding) the languages with that prejudice is nothing short of idiotic.

            1. Aelyn says:

              It is not unlike the move from VB3 to VB4. That was when the move from 16 to 32 bit occurred. VBX’s went bye-bye and the OCX became king. The code changes in that version change were tremendous and time consuming.

            2. Sumanai says:

              I don’t think I get your comment. So C# and VB.net are better/not as bad as VB? That just feels like saying “eating gravel isn’t as bad as eating shit”. It doesn’t exactly make gravel attractive.

              Not that my hatred for VB is necessarily justifiable or fair anyway.

              Edit: I didn’t read Ian’s (or others’) comments before, so look at this in that light.

          3. Ian says:

            Visual Basic, as it was, is dead.

            Visual Basic .NET is a sanitized version of Visual Basic with true object oriented support (it’s kind of required now) that compiles code to the CLR.

            I don’t know where the previous poster got that C# contains a “liberal smattering of VB,” because it’s more Java-inspired than VB-inspired, though in my opinion it’s a bit less clunky and potentially less verbose than Java (that said, I’m far more experienced with C# than I am Java). In fact, I’d go so far as to say that the only reason that C# even remotely resembles VB.NET is because they both use the same support assemblies (kind of like a standard library of sorts).

            Basically: C# is kind of like a C++/Java hybrid that is designed to compile to CLR bytecode. VB.NET is a greatly enhanced (and surprisingly sensical, unlike VB6) Visual Basic-inspired language that is designed to compile to CLR bytecode. They are radically different in terms of syntax, and VB.NET tends to do a bit more handholding than C#. Which one you use is personal preference. They are both capable of doing the exact same things.

            1. Phil says:

              “They are both capable of doing the exact same things.”

              discounting amusing things like anonymous functions.

              1. Ian says:

                Ah, true. I suppose C# 3.0 did kind of make a liar out of me there. I haven’t touched VB.NET since VS.NET 2002, so meh.

                That being said, that is a language construct. Even if you can’t use that exact syntax you can still pull off the same thing in VB.NET (this also holds true with a couple of other features that I can’t remember off the top of my head).

                For most applications, particularly business-oriented ones, you can accomplish the same thing with similar code using both languages. Better? :)

          4. Tomas says:

            I don’t think C# is at all related to VB. Being a VB programmer in my teenage years, then C++, and now C#, I actually loath VB with every fibre of my being (pardon VB programmers :-)), but love C#. So don’t write off C# because of VB. After all, the things they have similar, like the garbage collector, aren’t they really the (few) good things about VB?

            1. Ian says:

              VB.NET is radically different than VB6. The language itself is a lot more sane and the performance is far better. It compiles to the same kind of bytecode that C# does.

              I can’t really think of any good reason to use VB over C#, though. It seems like more of a personal preference thing to me (if you aren’t forced to use one over the other by your employer, of course).

        3. Peter H. Coffin says:

          You might want to add “+ huge levels of dependency”. Running C# for GUIs means installing a .Net framework (or work-alike like Mono), of which there are several major incarnations which are not entirely compatible with one another, and myriad sub-versions within each that change the behavior of various things from one to the next. This also adds in a layer that rapidly becomes either a march of security update patches (roughly bimonthly) or a potential security problem. Frameworks in general may save a bunch of work on the front end, but in the summation, might end up being simply a push off of work to other people at other times.

          1. Ian says:

            While yes, the framework requirement is onerous (as is ensuring the correct version is installed on target hardware, which is often an issue on managed systems that aren’t allowed automatic updates on site), it’s not a whole lot worse than the Java requirement for all the Java code out there.

          2. Kyte says:

            By now, it’s a safe bet to assume the client’s running Framework 3.5 Client Profile, at the very least.

      2. Robyrt says:

        Being a young punk, C# feels like someone listed all the problems with C++, wrote an awkward implementation of the solutions, and declared it the new standard library. Which is exactly what I was looking for.

        1. Sumanai says:

          Someone else wrote online that he saw C# as a solution to all of the problems C++ doesn’t have, some of the problems C++ does have, and a heaping of whole new problems.

          I don’t have personal experience with C#, so that’s not my view. I’m however not interested in it at all, and have difficulty understanding why it’s so bloody popular.

          1. radio_babylon says:

            “have difficulty understanding why it's so bloody popular.”

            one word: microsoft.

            they dump metric F-TONS of money into getting higher education to teach that microsoft solutions are the be-all-end-all. then they dump metric F-TONS of money into convincing idiot business managers that microsoft solutions are the only way to go and that anyone who questions that should be fired immediately. end result being, if you arent brainwashed in the course of getting a CS degree, you quickly learn to shut your mouth and swallow the microsoft line because thems that goes along, get along, and youd like to keep getting your paycheck so you can pay off that huge student loan.

      3. Canthros says:

        C# is not without its warts (switch statements, for instance, bug the ever-loving life out of me: case fall-through no longer exists, but a case that doesn’t end in break, return or throw is a compile error), but it’s much easier to live with than C, C++ or Java (Java has always felt kinda clunky to me, for no good reason). IME, YMMV.

        Not that I’m crazy about it, but it has its advantages.

        1. houser2112 says:

          No case fall-through, but break is still required? How annoying. And no case fall-through makes it just a pretty if/else.

          1. Miral says:

            While C# doesn’t have case fall-through, it does have something better: case jumps. Which really is the best of both worlds — preventing fall-through eliminates a whole class of bugs caused by “oops, I forgot to ‘break'”, while adding jumps lets you explicitly specify where you’re trying to share implementation. Thusly:

            switch (something)
            {
              case 1:
              case 2:
                // some implementation for 1 & 2
                goto case 5;
              case 3:
                // some implementation for 3
                break;
              case 4:
              default:
                // some implementation for 4, and anything outside 1-5
                break;
              case 5:
                // some implementation for 1, 2, and 5
                break;
            }
            
            1. Muttley says:

              Goto? YIKES! Something that was declared evil already 40 years ago comes back from the dead. Doesn’t look like a good substitute for fall-through, honestly.

              1. Pickly says:

                What exactly annoys people about GoTo, out of curiosity?

                (I’ve heard a similar complaint somewhere else, and in my very slim programming experience have never actually run into it at all, so I don’t have any direct experience with what sorts of issues it might cause.)

                1. krellen says:

                  GoTo programming is very, very, very difficult to follow for anyone, including the person that wrote it. If you ever have to go back and look at the code again, GoTos make your job exponentially more difficult.

                2. Alan De Smet says:

                  Edsger W. Dijkstra’s letter to the editor, which the editor titled “Go To Statement Considered Harmful”, is the most famous argument against the goto statement. But to summarize: goto makes it easy to write “spaghetti code”, where the program’s logic jumps around all over. It can be hard to identify how you ended up at a particular location. Other language elements like loops, conditionals, and functions provide more explicit entry and exit points.

                  While all that is true, the idea that goto is always bad is an overreaction. goto is a powerful and potentially dangerous tool. One should avoid it because the other tools are almost always safer and easier to read. But every once in a while goto can actually simplify program flow, making the resulting code easier to read. For example, goto is sometimes the clearest way to handle errors by jumping to a shared recovery section of the code.

                  The counter-counter point is that it’s dangerous enough that you should scare people away from it. In 15 years of professionally programming, I use goto once every 5 or so years. I see other valid uses every year or two. The concern is that when you put dangerous features into the hands of random programmers, many of them lack the restraint to use it appropriately.

                  “Are programmers smart enough to use powerful and dangerous features?” is the core question dividing a lot of programming languages. :-)

              2. Jack Kucan says:

                I’d honestly love it if GOTO was limited to switch cases, since it would make it slightly more obvious when you were falling through to the next case, and you should pretty much never use it outside of a switch case in C#.

              3. WJS says:

                Isn’t the problem with goto that it can send you just about anywhere? A goto that’s limited to within the current block (probably on the same screen) doesn’t seem to have that problem.

        2. Kdansky says:

          No case-fall-through? I consider that an irrelevant feature, not a bug. Honestly, I’ve not ever used that ever, but I have fixed quite a few bugs where people forgot a break. That said, having to write ‘break;’ still seems really pointless, and it is the same as an if/else block. I have never been a fan of switch statements to begin with, they are glorified ifs anyway. The compiler can bloody well leave me alone with its optimisation issues.

          Java isn’t so clunky anymore starting from their fifth iteration, which introduces generics. Their compiler really sucks for generics and produces horrible code, but in 99.9% of all cases that doesn’t matter, and having things like Map makes everything so insanely simple to write, and they fixed all their issues with encapsulated primitives too. Compared to newfangled things like Scala or Ruby, it might feel clunky. Compared to C++, it feels as elegant as battles in Crouching Tiger Hidden Dragon.

          1. Ingvar says:

            Nah, switch statements are glorified jump tables (or, rather, jump tables that have a pretty surface syntax) and are, as such, pretty useful. However, what I’d REALLY like is something that takes a whole swoop of various conditional tests, with associated code bodies and execute the first block that has a matching condition.

            1. Matt says:

              Maybe something like this?

            2. Will says:

              So basically you want a switch statement.

              1. Alan De Smet says:

                In C/C++, switch is just a jump table. You can only switch on integers and you can only test against constant integers. It’s not possible to say something like this:


                switch(c) {
                case == random_int(): win_lottery(); break;
                case c == 32 && cheat == TRUE:
                win_lottery(); break;
                case >= 1 && <= 100: do_b(); break;
                }

                or perhaps:


                std::string token = get_next_token();
                switch(token) {
                case "xml": open_body(); break;
                case "html": open_html(); break;
                case "body": open_body(); break;
                // ...
                };

                Which is unfortunate, because I occasionally want to say things like that. Chained “else if” statements do the job (and are logically identical), but aren’t as elegant to read or write. It’s one of several pieces of syntactic sugar I wish C++ would steal, along with Pascal’s “with STRUCT do”.

              2. Ingvar says:

                No, basically, I want something where I can go:

                switch {
                  strcmp(foo, "foo"): do_something(); /* implicit break here */
                  flag & BITMASK != 0: do_something_else(); fallthrough; /* explicit no-break */
                }
                

                In, kinda, C-like syntax. And with the ability to test random variables, not a single one. The closest I’ve seen, so far, is COND in Common Lisp (well, most lisp-like environments), but that lacks the ability to fall through and that MAY be nice, I think.

                1. WJS says:

                  I’ve seen that kind of construct a few times. It looks something like this:

                  switch(true)
                  {
                      case (foo == "foo"):
                          do_something();
                          break;
                      case (flag & BITMASK != 0):
                          do_something_else();
                  }

                  Basically, the only difference I can see is that you want to reverse the default breaking behaviour.

          2. Canthros says:

            I’ve worked a very small amount (very, very small amount) with Java 5. I am familiar with it, at least in passing. The clunky feeling I’ve gotten from the language hadn’t really disappeared. Can’t really put a finger on the why, though some of it has to do with the load of syntax it takes just to get to “Hello, World,” a problem C# also has. Java is, or, at least, was, conceptually cleaner than C#, but the trade-off seemed to be requiring more code to get from A to B. IMHO, YMMV, etc.

            Anyway, case fall-through is handy in limited situations for reducing code repetition. If you have two cases where case B can be summarized as case-A-with-an-extra-step, you can do the extra step of case B, fall through to case A and then break.

            Aside from a certain brevity of expression for simple comparisons, it’s pretty much the one feature that switch has over if, and, in C#, it doesn’t even have that. As a result, C#’s switch statement offers very, very little and does so with syntax that no longer even makes sense.
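            A tiny C sketch of that case-A-with-an-extra-step pattern (the function names are hypothetical):

            void extra_step (void);
            void shared_work (void);

            void handle (int mode)
            {
              switch (mode) {
              case 2:              // case B: do the extra step first...
                extra_step ();
                /* fall through */ // ...then deliberately continue into case A
              case 1:              // case A: the shared part
                shared_work ();
                break;
              }
            }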

      4. Michael says:

        That may be more accurate than you think. C# seems to have come out of nowhere sometime between when I graduated with an AAS in Programming (2005), learned that I couldn’t do anything with that degree, and went back to get a BS in Political Science. So I literally haven’t the faintest idea what it is.

      5. Psivamp says:

        I personally haven’t used C# for anything, but my father — life-long coder — had to learn it and is using it for his current personal and professional projects.

        He prefers it to C++ because he doesn’t have to deal directly with memory. On another note, the Windows GUI objects that it ships with are garbage. Text labels won’t draw over images, for example. So, it’s capable of quickly making Windows GUI apps, but if the GUI is complex it might start to break.

        Also, don’t they charge for Visual Studio now?

        1. Kyte says:

          VS Express is free and good for most personal-project purposes.

        2. Svick says:

          Yes and no. Microsoft sells Visual Studio, but there is also a basic version (called “Express”) available for free.

        3. Simon Buchan says:

          All the stuff in “System.Windows.Forms” is just a thin wrapper over the native Windows… uh… window APIs. “new Form” becomes “CreateWindow("", …)”, “new Button” becomes “CreateWindow("BUTTON", …)”, “myButton.Label = "Foo"” becomes “SetWindowText(hwndMyButton, "Foo")” and so on. (Yeah, labels and buttons and all the other controls are windows as well. It’s called Windows for a good reason. :)

          If you want a fancy new GUI library in C#, use WPF, which implements all the controls and layout you want in fancy DirectX. After you learn all the insanely complex (but powerful) systems that support it.

  4. Ethan says:

    …and knowing is half the battle.

    1. Teldurn says:

      G.I. JOE!

      1. ClearWater says:

        The basketball heroes! (That’s what they say, isn’t it?)

    2. Taelus says:

      Dangit! This was the first thing I thought and I was hoping no one else was going to get there first. I really should know better on this site than to think I’m the only nerd here.

    3. HeadHunter says:

      But violence is the other half – that’s why it’s called “battle”! :p

  5. 4th Dimension says:

    I worked in C++ years ago, but isn’t one of the differences between C and C++ that C++ supports string concatenation, like stringC=stringA+stringB

    Of course you need to allocate stringC first I suppose.

    And I agree with this. Especially since I come from the other side of the problem. C# where there is no way to manually allocate memory (well there are ways to use pointers, and allocate memory, but you can’t deallocate it later). Yes, 99% of time you don’t need memory management, but in that 1% is where you are trying to do some complex Image Processing things, and you need any extra edge you can find.

    1. Mewse says:

      This is a somewhat pedantic reply, but I’m a programmer; fussing over the trivial details is what I’m paid to do. ;)

      C++ is basically the same as C, but with a bunch of new features added. Native support for strings wasn’t one of those new features; C++ still treats strings as raw character arrays. However, STL (the “Standard Template Library”) does provide really useful string implementations which allow you to easily combine, split, and otherwise munge strings to your heart’s content, exactly as easily as you would want.

      To my knowledge, the STL is now available pretty much everywhere you’ll find a C++ compiler. It’s not part of C++, but it almost always comes together with your C++ package. I’ve personally used STL’s string classes in programs running on PC, Mac, PlayStation 2, PSP, Dreamcast, Wii, XBox 360, PlayStation 3, and iPhone. So it’s pretty cross-platform and reliable.

      I’m one of those old geezers who normally advocates using only the basic compiler primitives, not big add-on libraries, for largely the same reasons that Shamus mentions. But I strongly, strongly recommend using STL strings (or a work-alike) instead of using the built-in C string functionality whenever possible. It’s just too easy to mess up when you’re handling raw C strings directly. It’s just not worth the risk.

      1. Kamil says:

        The Standard Template Library is part of the C++ ISO Standard. Compilers that don’t come with the STL are not standards-compliant.
        Btw Shamus, from your writing I get a sense that you program C++ the old way. Did you ever try the STL? Or Boost? The style of C++ programming has changed a lot this last decade.

      2. wererogue says:

        Yeah, std::string is as ubiquitous now as standards-compliant compilers are. But I almost never find people using them – we’re too used to handling strings ourselves now.

        1. Shamus says:

          That, and a lot of us have legacy libraries and structures that take char* parameters and such. I suppose you can use std::string and then grab the char out from under it when you need it. Probably a worthwhile approach, but I’ve never gotten into the habit.

          1. Veylon says:

            Actually, the string class has a .c_str() function that gets the char array for you. I’ve done this any number of times with old windows functions that want LPSTR and such. Very handy.
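            A sketch of that in practice (OldWindowsFunction is a hypothetical stand-in for whatever legacy API you’re feeding):

            #include <string>

            void OldWindowsFunction (const char* text);  // hypothetical legacy API

            void example ()
            {
              std::string s = std::string ("Twilight is a great book") + " - for starting arguments!";
              OldWindowsFunction (s.c_str ());  // borrow the char array; don't free it
            }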

          2. SatansBestBuddy says:

            Yeah, well, when your habits are actively causing problems and headaches, a change of habit is well advised.

            Think of it like optimizing yourself; it takes a while to find exactly what needs to be fixed, and takes another while to get that fix up and running, but once you do you’ll wonder why it took you so long to get around to fixing it.

          3. Mephane says:

            I hope you’re talking about char* as an output parameter. char* as input-only is pure evil (imho). At work we have to use a library whose authors couldn’t care less about const-correctness, so you can’t even read data from objects declared as const. And then it wants input as non-const references, which means that effectively const functions passing class members to that library require said members to be mutable. I am sure one day someone will stumble across our code and think “What the hell, why on earth are these variables declared mutable”. Then he decides to revert them to normal declaration, and nothing will compile any more. Heh.

            1. Miral says:

              Depending on how widespread the library is, I wouldn’t use mutable, I’d use const_cast in the specific cases that needed it instead (probably by writing wrapper functions around the library functions that had const-correct interfaces).

              Of course, either way you’re vulnerable to the library implementation changing something you’re not expecting it to. But you could mitigate that by copying the data yourself before passing it in (in the wrapper functions), to guarantee that the library couldn’t screw it up. That’s definitely the safest option, although it might hurt performance a little.
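              A sketch of that wrapper idea, assuming a hypothetical library function library_read that takes a non-const reference it never actually modifies:

              #include <string>

              // Hypothetical third-party function: declared non-const, but only reads.
              int library_read (std::string& data);

              // Const-correct wrapper: the const_cast is confined to this one spot.
              int library_read_wrapped (const std::string& data)
              {
                return library_read (const_cast<std::string&> (data));
              }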

              1. Mephane says:

                Well, that library is available as open source, so it’s not that we don’t know whether it might change something (because it doesn’t), but it’s just annoying that it breaks the whole concept of const-correctness. Anyway, don’t ask me why, but I didn’t even think of just copying the member variables before passing them; probably because I’ve become so used to very clear and strict const-concepts and const-reference argument passing that I pretty much automatically avoid any unnecessary copying without thinking about it, heh.

            2. Phil says:

              I like Win32’s OpenPrinter, I think, which takes a char* (or LPTSTR if you’re into that stuff). I can’t think of any reason it is NOT a const parameter, but it isn’t. One of the annoying cases where you either cast the const off string::c_str() (evil), or have to go and allocate another char array.
              Haven’t compared std::string v MFC/ATL::CString for a while… wonder if STL still wins in speed these days.

          4. Ell Jay says:

            “Grab the char out from under it”– is that like musical chars?

          5. Ian says:

            I was the same. I spent too long stuck in plain C on the PS2 (enforced, actually, because the same code ran on the Gamecube) and when I moved to Windows-based development elsewhere, one of the things I was encouraged to use after a performance review was the STL.

            It is well worth the time investment.

            P.S. I’m sorry about the terrible games I helped create

            1. Tizzy says:

              NAMES!…

        2. Mephane says:

          Well, simply said, std::string solves the very string operation problems Shamus is talking about – and still provides an interface to have it read by old-fashioned functions, but manipulation goes by included operators and methods.

          I guess Shamus knows this anyway, but for those who don’t:

          std::string allows you to do the very same thing as in the Basic example:

          std::string str1 = "Hello";
          std::string str2 = " world!";
          std::string str3 = str1 + str2; // str3 now is "Hello world!"

          This alone is already reason enough why I think that for 99.9% of all situations, C++ is clearly superior to C (because no such thing would even be possible there); you can write all your code as low-level as someone would do in C if you wish, but yet take advantage of std::string everywhere.

          That and std::vector, which does something similar but for arbitrary types, not specifically characters of text.

          1. King of Men says:

            std::string is not quite as convenient as having a built-in string class in the language. For example, you can’t do this:


            void myFunction (std::string s) {
            // Do something with s
            }

            (...)

            myFunction("string" + "literal");

            even though std::string has a conversion-from-char* constructor and a + operator. The compiler will still believe that you’re trying to add two string literals, ie char* objects, and complain. You have to do this:


            myFunction(std::string("string") + "literal");

            which is not a whole lot of extra code, admittedly, but still.

            Nonetheless, std::string really does solve a lot of the pain of working with strings. It’s still annoying to concatenate strings with numbers, though, as in


            int result = bigCalculation();
            std::string output("The number is ");
            output += result; // This doesn't work.

            Some idiot didn’t add an operator+(int s) method, and so we’re stuck still having to futz around with sprintf or buffering. Annoying.

            1. Mephane says:

              Yes, that is true. For some time I’ve done work with Borland Builder, where they have that AnsiString class which has overloads like that.

              Luckily, at my current workplace, we are also using boost (or more precisely: it is installed and we are allowed to use it, which doesn’t mean that people typically do so), which has that absolutely convenient lexical_cast function that works just like static_cast etc., only that it transforms numerical values to strings and back. Not as elegant as writing “string1 = string2 + int1”, though.

              But… hey, I just got an idea. I could write such an operator outside of the class (but inside the std namespace, of course). I guess I’ll just be making a wrapper for the aforementioned lexical_cast, but it would really be so convenient.
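              Something like this sketch (though it’s best defined outside namespace std; the standard doesn’t actually allow adding new overloads to std):

              #include <sstream>
              #include <string>

              std::string operator+ (const std::string& lhs, int rhs)
              {
                std::ostringstream out;
                out << lhs << rhs;
                return out.str ();
              }

              // usage: std::string msg = std::string ("The number is ") + 42;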

            2. Alan De Smet says:

              Overloading + to mean “concatenate strings” always rubs me wrong. It seems like a good idea, then people start doing clever stuff, like implementing “operator+(std::string &lhs, int rhs)”. (Which you can do, if you really want.) Now you’re facing code that looks like:

              std::string s1 = 20+"10";
              std::string s2 = "four: " + 2 + 2;

              and you have no idea what should happen. Which will surprise fewer programmers, 30 or 2010, 4 or 22? After all, if you’re silently converting integers into strings, why not convert strings into integers? Lots of languages will do exactly that! You end up with weird rules, like “The left hand side has to be a string,” or needing to pay very close attention to the precedence rules. You’re in the land of special cases and exceptions. You can obviously live there just fine (See also: Java), but I’m not convinced that it’s inherently better.

              Given C++’s grounding in C, I think the solution (stringstreams) was a reasonable one. For those not familiar with them, the above code looks like this:


              int result = bigCalculation();
              std::ostringstream output;
              output << "The number is " << result;
              // If you want the std::string, use output.str()

              While overloading the bitshift operator isn’t great, it is symmetric with the preexisting I/O stream interface C++ had, and bitshifting strings is kinda silly.

              As someone who has done a fair amount of string manipulation in C++ over the last few weeks, it’s not a big deal. It’s certainly not as convenient as, say, Perl, but

              1. Simon Buchan says:

                I prefer:

                typedef std::ostringstream osstream;

                std::string result = (osstream() << "Hello, " << user.name << ". Your age is: " << (now() - user.birthdate).years() << "!").str();

                Also, your first line "std::string s1 = 20+"10";" shows the *real* reason C++ should *never* have defined string operator+(). That is assigning whatever is 20 bytes past the start of the static string "10"!

      3. Simon Buchan says:

        SUPER NITPICK WARNING:

        The “STL” correctly only refers to the collections section of the C++ standard library: the stuff that uses iterators. std::string is a bit of an edge case, since it was in the pre-standard C++ library long before STL, but it is a collection of characters:

        for (std::string::iterator it = str.begin(); it != str.end(); ++it)
        {
            char c = *it;
        }

        The C++ standard library contains the C standard library, IO streams, strings, STL containers, STL algorithms (find and sort, plus a bunch of really weird stuff no one uses), memory (the *_ptr classes ESPECIALLY), exceptions, locale (whether your numbers use comma or dot, basically), the pretty anemic typeinfo, and some numeric stuff: complex numbers and limits (maximums, not the math type).

  6. Aelyn says:

    After programming C/C++ commercially, I declared that any language that didn’t require me to manage memory was a cake walk. I still stand by that.

    I actually failed to get a job once (listen up, Shamus… this could be you) at a Powerbuilder shop. I was leaving a job where I wrote C++. The programming manager asked me what the worst bug I’d ever had was and how I solved it. The bug involved memory management, double-deleted pointers and garbage collection.

    He said, “Um… okay.” He had no clue. He quickly realized I shouldn’t work for him and I came to the EXACT SAME CONCLUSION. Win.

    1. Simon Buchan says:

      I’ll just wait until you figure out what event listeners mean in a GCed language :).

  7. wtrmute says:

    I think you can, in fact, use a simple garbage collector to handle strings especially, like this Boehm-Demers-Weiser GC. It’s not necessarily right for every project, but as a fellow C guy, you probably have already taken this lesson to heart.
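    A minimal sketch of what that looks like with the Boehm collector (assuming libgc is installed; link with -lgc):

    #include <gc.h>
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      GC_INIT ();
      // GC_MALLOC instead of malloc; no free() needed, ever.
      char* c = (char*)GC_MALLOC (51);
      strcpy (c, "Twilight is a great book");
      strcat (c, " - for starting arguments!");
      printf ("%s\n", c);
      return 0;  // the collector reclaims c once it's unreachable
    }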

  8. Henebry says:

    So am I right in guessing that your comic-creation app is blazing fast precisely because it’s got a “flamethrower” for an engine?

    1. Adam says:

      Probably more because it was designed by a single, experienced programmer and built with exactly the features and UI he needs, and no more.

      Odds are most of the alternatives are written in C/C++ as well.

  9. Richard says:

    Hm. I finally know what a memory leak is, now.

    Interesting.

    1. Psithief says:

      I thought a memory leak was when memory didn’t get deallocated after the object inside was past its usefulness.

      1. wtrmute says:

        That’s exactly what he said in the article: the memory doesn’t get deallocated and thus, the program still thinks it’s in use. Next time it passes by that part of the code, it’ll allocate another piece of memory it’ll fail to deallocate, and so forth. Over time, the little pieces of memory that were “leaked” start accumulating and you’ll see the memory footprint of your process grow.

        1. WJS says:

          Which can either be borderline unnoticeable, or crash your program, depending on how often you do it. Fun.

  10. Ben Orchard says:

    Shamus,

    If you ever want to start writing text books and other books on programming, you could probably make a killing making C and C++ clear to new programmers. As someone who suffered through the dry barrens of Deitel & Deitel’s C book, I firmly believe that one of the most daunting aspects of learning a language is the absolute CRAP that passes for writing in those books.

    If you ever write an Intro to C book, I’ll buy it. I’d know that it wouldn’t be BORING. When I dive into a book on programming I don’t expect Tolkien or Asimov at their best, but I would like it to be useful for something OTHER than curing insomnia (hello, Android documentation, as an example of something that gave me much more sleep).

    1. Valaqil says:

      I was tempted to add a “Me too” or some other comment on some of the other stuff here, but this comment caught my eye. Even as someone who writes code for a living, I would like to see Shamus Young’s Guide to Programming. (Or whatever you would call it.) I’m completely serious. You have a good writing style, know how to make things clear, and keep it interesting. Would you ever consider writing one? I know I’d recommend it to people I know. Maybe you could send a couple of these articles to a publisher or something? (I’m not really sure how you would go about starting that.)

      1. Randy Johnson says:

        Oh man, I just had this picture of Shamus teaching online C classes. Six people sitting in Vent with him, and Shamus teaching them how to make a Hello World program. I am pretty sure I would pay for that course. Like I told him the other day, all I know is Java, and I should really learn something new.

        1. Gaukler says:

          I’d be in if Rutskarn would TA and come up with horrible coding puns.

          1. Roll-a-die says:

            Hey Shamus, you feeling, loopy today?

            Grr can’t think of any recursion puns. Recursion Puns.

            1. Valaqil says:

              Not original, I know, but I think the old saw:

              Recursion. (n)
              —If you don’t understand, see Recursion.

              counts as a recursive pun, although not your typical homophonic pun.

              1. Yar Kramer says:

                I misread that as “homophobic pun.” D’oh!

              2. SteveDJ says:

                Have you tried to Google the word “recursion”? Try it…

                1. Fists says:

                  That’s good, although I always misspell when I Google anyway, so I probably wouldn’t have noticed normally.

                2. silver says:

                  Have you tried to Google the word “recursion”? Try it…

      2. Michael says:

        Honestly, some of what Shamus has been writing on the subject reminds me of In the Beginning… Was the Command Line by Neal Stephenson. Though Stephenson was talking about general computer history and not specifically programming, IIRC.

        1. Technically, Stephenson was talking about the history of Operating Systems. He talks about the history of computers, but his focus is clearly on the OS.

          The thing that blew my mind from that book was when he talks about all the advances made by computers from the 1950’s to the 1980’s, and how a lot of those advances were basically invisible to the operating system. Whether you’re using punchcards, a keyboard and printer, or a keyboard and monitor, to the computer it’s all the same. Character input and character output. It wasn’t until 1984 that computer interfaces really changed (from the OS perspective).

  11. Ian says:

    No love for the Standard Template Library then?

    Solved all my C++ string problems back in the day (and it’s free).

    1. Mukik182 says:

      And standard too! I haven’t coded in C but I mainly use C++ as a hobbyist, so no “real world” problems for me, but I never had trouble using strings (note I’m referring to C++ STL strings). They come as part of the standard so are implemented in every compiler.

      On the other hand, I really liked your article, found it well explained and interesting. Keep on!

  12. froogger says:

    Thanks for the explanation. It’s good to get another confirmation that my decision to not delve into programming was the right choice. I quit that about the time they started talking about “object-oriented” as I found it tedious and not as rewarding as fixing existing systems. Seems not much has changed apart from the tools and languages since then.

    However, I still find scriptwriting necessary in my work with support and maintenance, so the skills haven’t been a complete waste.

    1. Rosseloh says:

      Same here; I’m always looking back at my decision to leave the generic “computer science” degree and move to the CCNP netadmin program. In fact it seems we made our minds up at the same point, with the object-oriented stuff. But, as you mention, knowing the fundamentals of programming and how computers act on instructions certainly helps.

      And Shamus, if the quality would be the same as this post, I must third the proposal that you write a programming book — I’d buy it and read it even though I don’t plan to write much code in my life, and I’d show it to everyone I know who teaches that kind of thing.

    2. Zak McKracken says:

      I’m still not doing much more with my coding skills than scripting (I was doing some Fortran coding before), but the concept of object orientation seems to me an extremely good thing, even though it did take a while to wrap my head around it.

      Initially it’s just a roundabout way of doing things, but once you’re settled in, you can do amazing things that would require brutal hacks without object orientation.

      In that respect:
      http://xkcd.com/353/

  13. CmdrMarkos says:

    So I must be wrong, but I thought the OS would take care of protecting a program from read/writing memory not allocated to it, or is that just for special cases? Of course that doesn’t stop a program from corrupting its own memory space.

    Also I never got why the string termination character was necessary _if_ your string takes the whole allocated memory space.

    But then I’ve only dabbled with C… back to FORTRAN 77!

    1. Ingvar says:

      There are usually multiple layers of allocation. What the program requests from the OS tends to be page-sized chunks (multiples of, usually, 4 KB). These are then divvied up by the runtime library into your 1-, 3-, 7-, 23- or anything-else-sized allocation requests (with a bit of padding for the book-keeping, because you never know when you’ll need to reclaim a chunk).

      1. silver says:

        Right. The OS will keep you from going well outside your memory bounds, but it won’t keep you from overwriting _your own_ data.

    2. Canthros says:

      The OS protection in question typically involves throwing an exception or other serious error. Unexpected errors tend to lead to application crashes. However, it is preferable for the at-fault application to crash rather than have a different application, or even the OS, crash instead.

      String termination allows for strings of variable length to be stored in the same memory location. If the value of string A changes from “Hello, world!” to “Hello!”, you don’t print “Hello! world!” later (or “Hello! “, for that matter).

      Additionally, the use of character-terminated strings allows for strings of, theoretically, arbitrary length (in practice, limited by available memory). An alternate approach used in some other languages, length-prefixing, typically has a known maximum length beyond which strings may not grow (ISTR this is 255 characters in at least some versions of Pascal, for instance).

    3. Deoxy says:

      So I must be wrong, but I thought the OS would take care of protecting a program from read/writing memory not allocated to it, or is that just for special cases?

      The “special case” is “every OS but those by Microsoft”. Indeed, handling those sorts of things is one of the most basic functions of the OS.

      1. Kyte says:

        BS. EVERY modern OS has memory protection. It’s what makes apps crash instead of OSs. However, the OS can’t know if you’re doing a valid assignment or just scribbling all over your variables, so trashing your own heap is perfectly possible.
        In *NIX/Linux/BSD/etc they’re called segfaults. In Windows they’re called Illegal operations or whatever. In the end, it’s the same thing.

        1. Deoxy says:

          BS. EVERY modern OS has memory protection. It's what makes apps crash instead of OSs.

          Which is why apps never crash Windows anymore in your world…. what planet are you living on?

          1. Alan De Smet says:

            The modern line of Windows is descended from NT, not 95, and has had high-quality memory protection from day one. Normal applications can merrily write wherever they want in their own memory with zero risk to the rest of the system.

            So, why doesn’t it work in practice? A variety of options: Buggy drivers by third parties remain a common problem. It’s (part of) why Microsoft is so keen on WHQL testing; they’re getting grief for Windows crashing, so they want to reduce the risk. Also, software that doesn’t play by the rules. There is a long list of crap that software shouldn’t do: poke around in the boot sector, install custom drivers, hook directly into the OS memory space. And modern Windows, by default, doesn’t allow it. But software publishers whine that they need to do dangerous things, mostly so they can implement their buggy DRM implementation of the week. So the software gets run at the Administrator or System security level where it’s free to fuck everything up. Add in viruses and spyware playing much the same games and you’ve got a recipe for crash soup.

            The problem isn’t the memory protection. It’s a culture of insecurity endemic to commercial software developers that Microsoft allows to continue in the name of backward compatibility. (And understandably. If DangerousGarbage 3.2 works on Windows N, but doesn’t on Windows N+1, users will be blaiming Microsoft, not DangerousGarbage for abusing its privileges.)

            Given all of this, to an extent it’s impressive how stable Windows is. And, by and large, it is stable. Several friends who work professionally as systems administrators all grudgingly yield that since Windows 2000, it’s a pretty good operating system.

            Believe it or not, I’m not a fan of Microsoft. (My heart goes to Canonical, makers of the Ubuntu distribution of Linux.) They do a lot of terrible stuff. But I have zero complaints about the memory management in modern Windows releases.

            1. Deoxy says:

              The modern line of Windows is descended from NT, not 95, and has had high quality memory protection from day one. Normal applications can merrily write wherever they want in memory with zero risk.

              snip

              Also, software that doesn't play by the rules. There is a long list of crap that software shouldn't do: poke around in the boot sector, install custom drivers, hook directly into the OS memory space.

              snip

              So the software gets run at the Administrator or System security level where it's free to fuck everything up.

              So, it has GREAT memory protection… it’s just optional, and basically up to the program itself whether or not to use it.

              To put it another way, I stand by what I said; the difference between “memory protection that the program can opt out of” and “NO memory protection” is not significant.

              (I non-grudgingly yield that Windows 2000 forward suck much less than other previous versions of Windows. Compared to other modern OSs, it still sucks.)

              1. Kyte says:

                Even at admin level (System is impossible to access unless you pull off a privilege-elevating exploit), processes each run in their own address spaces. Even if you want to, you can’t just scribble over someone else’s memory. To crash another program or Windows (without exploiting inputs), you have to get into the appropriate process, which can’t be done with an EXE. You need to inject a DLL for that. Admin just makes such a thing possible. (It’s how installers and debuggers work, after all.)
                Even then, it’s the exact same thing as on other OSs. Once malicious code reaches Admin/SU status, it’s got complete control. Memory protection is not engineered toward that. But the OS can’t know if something is legit or not. That’s why Admin is not the default level of accounts anymore, and programs have, by and large, learned not to ask for admin unless needed. In fact, Vista’s ever-derided UAC was designed to be a roadblock to privilege escalation. From there on, PEBCAK.

                But we’ve digressed now. You’re confusing memory protection with security. Mem protection is just meant to isolate processes. Security is much more complex.

                1. Ian says:

                  To take what you said a step further, the Administrators group in Windows doesn’t have as much control over the system as root in a *nix system. As a root user, you can easily nuke a filesystem, write garbage to kernel memory, and do a ton of other wicked and nasty things.

                  In the Windows world, your average Administrator has less direct control over low-level system functionality. You can’t destroy a system disk with “rm -rf /”. You can still do damage if you know what values to garble and such, but it takes far more than a simple typo to do so.

                2. Deoxy says:

                  To crash another program or Windows (without exploiting inputs), you have to get into the appropriate process, which can't be done with an EXE. You need to inject DLL for that.

                  Gee, that makes it better – who ever uses DLLs these days? Duh….

                  programs have, by and large, learned not to ask for admin unless needed.

                  Yay, MOST programs DON’T opt out anymore. Duh….

                  As I said, security you can opt out of is NOT security.

                  (Memory protection IS a form of security, as in the general usage of the word, just not “keeping people (especially hackers) out of the system” security, which is what we use that word to refer to most often when talking about computers these days. No, I was not confusing the two.)

                  1. WJS says:

                    Obvious troll is obvious. Everything you just said applies to other OSes too. You can “opt out” of security by running as root, and do everything (and more) that a windows admin can do.

          2. Phil says:

            One might also point out that most of the article deals not just with trashing the heap, but with trashing the stack as well. Though overwriting 4,000+ bytes on the stack is generally a lot more dangerous than doing it to the heap (i.e., buffer overruns).

            1. Phil says:

              Hmm, and this was supposed to be a general comment, not a reply to the above comment. Ah well. Trashing my own heap, it seems. :)

          3. Shamus says:

            There’s a difference between “don’t have this feature at all” and “this feature should be improved.”

            1. Deoxy says:

              So, a door that requires a code typed on a keypad to enter is a form of security… even if the door has a secondary handle on it (that anyone can use, if they notice it) that will open the door without said code?

              Having a form of security that is optional at the request of those you are supposed to secure against is NOT security at all.

              But it does LOOK like security. Yay! (I would like to thank the TSA for being an even better object lesson on this point.)

              1. Shamus says:

                I guess this would mean something if all processes were going around installing themselves as system drivers or whatever. Yes, they *can* do this. But I don’t see it as the OS’s job to “defend” me from malicious processes. (One of the reasons I never went to Vista.)

                Look, I use Windows. A lot. Twelve hours a day. Ten programs running at a time. Firefox with ten tabs open. Everything eating tons of memory. A couple of ill-behaved programs crash themselves now and again. Meanwhile, I’m writing software that sometimes crashes spectacularly. And yet yesterday my machine rebooted for the first time in a month. Yes, I’m aware that a Linux machine is a juggernaut, but your appraisal of the memory system in Windows is wrong. If you were right, this machine wouldn’t last an hour.

                1. Ian says:

                  Linux crashes for me just like Windows does, and for many of the same reasons. The only time Linux ever crashes for me is because of an errant driver (usually the crapware that NVIDIA and ATI like to call “drivers” [though they have gotten better over time, at least]).

                  In fact, as of Vista, Windows handles video card driver crap-outs far better than Linux does. While Linux either hard-locks or freaks out, Windows can usually reset the video card driver and have me back up and running in 10-15 seconds. Prior to Vista, that was a driver-specific feature, and I can say that ATI’s GPU reset feature worked fairly well in XP (for some reason, Rollercoaster Tycoon 3 tripped it on my laptop semi-regularly; probably a driver bug).

                2. Deoxy says:

                  I do agree with you that it is MUCH better than it used to be, and I use Windows every day, as well (computer programmer, all Windows based – I make my salary off of Microsoft, to a large extent).

                  But I blue-screened my brand new computer a few weeks ago when I tried to use PixelCity the first time on it. Yup, Windows is so stable…

                  And you rebooted it for the first time in a whole month? That people consider that a long time for a Windows machine (even in a case like yours) makes my case for me.

                  Windows leaks memory (or allows programs on it to do so, which is logically equivalent). If it didn’t, reboots would only be required after OS patches, and would be a COMPLETE waste of time the rest of the time.

                  1. WJS says:

                    So memory leaks are impossible on Linux? I’m somewhat skeptical about that…

      2. Ian says:

        Windows has had memory protection since NT came out in 1993…

        1. Deoxy says:

          Yes, it just doesn’t WORK very well. It was a big step up from what they had before (NOTHING), so, um, yay for them. It still sucks.

          1. Miral says:

            Actually, it works perfectly. One process can’t stomp on another process’s data unless the other process lets it. There’s no way the OS can tell whether a program is making a “good” write or a “bad” write to its own memory, though.

            Unfortunately, since Windows is the most popular OS at the moment, it has lots of people writing software for it — even people who shouldn’t be, since they write crappy software.

          2. Ian says:

            As Miral said above, it most certainly does work. NT was built as a server and workstation OS. If there were no memory protection, a Windows server could potentially drop like a ton of bricks over a bad PHP script. I’ve gotten upwards of three month uptimes using 2000, XP, Vista, and 7, and generally the only reason I reboot at that point is to do updates and other maintenance tasks. I don’t think I would have broken a week if NT’s memory protection “didn’t work very well.”

            The only thing that memory protection does not and can not protect the system from are kernel mode drivers, but that’s nothing new. What do you suppose would happen if you were to compile a Linux kernel module that trampled the system memory?

  14. Skip says:

    CmdrMarkos, yes and no, depending on the OS, but nothing stops a program from trashing its own allocated memory. Recent OSs mostly keep you from stomping on other processes, but you can still generally affect them by stomping on shared stuff and resources.

  15. Nidokoenig says:

    First thing this makes me think of is X-Com, where you’re limited to 80 items because of RAM limitations at the time, and the way old games like the original Pokemons and Sonic 3 would count items over 99 with two digits, one of which was garbled pixels. Very interesting.

    +1 to wanting to read a proper programming book by Shamus. Hell, there’s enough unique stuff on this site to put it together and tart it up a bit for a book, like Dave Sirlin did with his Playing to Win website. Could be an idea.

    1. Adam says:

      99 is a UI bug, and not that exciting code-wise.

      Look up the 255 max on stats in old RPGs. :P

      1. Klay F. says:

        Heh, I had always wondered why those programmers had limited themselves to just one byte to represent numbers.

        Though, now that I think about it, with as limited as computers were back then, I guess it would make sense to save every possible byte.

        But then a new question arises for me: even supposedly “next-gen” games have this weird limit, where space should no longer be an issue. Two specific games that come to mind are Oblivion and the first Starcraft. Starcraft limited the number of upgrades a particular unit could receive to 255, and Oblivion had the same limit on how much you could artificially raise your character’s attributes.

        1. Will says:

          You have to put a limit somewhere. In both games it was never expected that anyone would ever get either number anything like that high, so they just gave it a single byte (a nice round 256 values, 0 through 255) and left it at that.

        2. Kyte says:

          Why waste 2 bytes when you don’t expect the value to go over 100?

          1. WJS says:

            Because you have billions of the things? We’re talking about character statistics, you’re never going to have more than a dozen or so of them. Unless you have millions of (fully detailed) NPCs floating around, using an int instead of a byte isn’t going to make a dent in your memory.

      2. MrPyro says:

        I remember playing Phantasy Star on the Master System and discovering the XP and money capped at 65535, and being just geeky enough (at the time) to have an idea why.

  16. DougO says:

    STL is a wonderful band-aid over some of the issues of C++. It also comes with a few of its own caveats, but it takes away a bit of the “here’s more rope to hang yourself with”.

    C# [Disclaimer: Microsoft currently pays my mortgage] and its built-in memory management do wonders for letting you NOT worry about trashing the heap or stack, at the price of performance. Great for non-graphics apps that spend 95% of their time waiting on a user or a remote signal anyway.

    1. Simon Buchan says:

      Umm.. the standard library (see my pedant rant about STL above :)) is the POINT of C++. The “++” is about letting things like std::string and std::vector exist, so better to say “C++ is a wonderful band-aid over some of the issues of C”. Of course, C++ *also* gives you a whole bunch of rope to hang yourself with, if you don’t feel like using C’s gun to shoot yourself in the foot.

      Funny thing about (exact, compacting) GCed languages like C#: they are actually *faster* than manual (or conservative GCed) memory management for long-running processes, due to cache coherency and locality. When C# loses, it’s generally because a) the developer used libraries that exchange development speed for execution speed (e.g., LINQ), b) the C# compiler generates worse code than C++ (deliberately; it is a much simpler and faster compiler), which only matters rarely, or c) the total running time is short enough that the runtime initialization is dominant.

  17. And even in the cases where systems and compilers are intelligent enough to stop it spewing over random areas of memory, they usually kill the errant process anyway. So you’ve got the same problem, just possibly immediate, and easier to track back to the issue.

  18. Gary says:

    I enjoyed C when I used it. The memory management issues are nothing compared to babysitting the STACK in assembly. I loathe assembly. Powerful, yes. Enjoyable? Grrrrr…..

    I do have to commend you on your explanations of how these things work. Much more user-friendly than anything I’ve seen in a classroom.

  19. Newbie says:

    If someone would be kind enough to ignore the vagueness of my following question:

    Is C++ hard to learn? In my Theoretical Physics University course I am going to start I will be learning Fortran and C++ to code simulations for the problems we have to work out, I don’t code and have never coded. I would like some idea on difficulty level of the language.

    1. Valaqil says:

      Questions like this are a little hard to answer. Not because the answer is vague, but because the answer really does depend on _you_. How difficult the language is depends on how you’re learning it, and whether you, yourself, think in a way that works for programming. (And, arguably, a host of other small factors.)

      To answer your question, I don’t think so. C++ isn’t hard to learn. Some applications of the language can be difficult because they require you to learn something specific (e.g. graphics) or really focus on how you solve a problem (complex math). C++ is a lot easier to learn than C because it avoids some of the direct ways you can mess something up. I will suggest something, however. Since your course doesn’t sound like it is an introduction to programming, find a resource and start learning now. It can only help you later.

      1. Newbie says:

        That is what I meant by vague… English is not my strong point; I just sort of blunder through it. Complex maths won’t be a problem but graphics might be (although I might not need to work heavily with that). I think getting a head start might be a good idea. Thanks for your advice.

        1. Valaqil says:

          Just to clarify: I was only using those examples to illustrate that the language itself shouldn’t take long. You’ll probably be writing a program that successfully answers a given equation before very long at all. Writing a simulation may or may not be a while after that. However, if you do anything more complicated later, that new use of the language will be the difficult part, not using the language itself, since you’ll still be using the same keywords and structures, just in new ways.

          1. Newbie says:

            Ah! I see I took the examples too seriously then? Silly me.

      2. Yar Kramer says:

        I personally learn programming languages and the like through following tutorials and then fooling around with other people’s code. However, something tells me that C++ isn’t the language I want to do that kind of thing in …

    2. Sumanai says:

      Depends. You’ll have to wait and see, since ultimately it varies from person to person like math.

      1. Newbie says:

        Nice… answer a vague question with a vague answer. But thanks for answering anyway.

        1. Sumanai says:

          I didn’t mean to answer vaguely, but my original response was very long and meandering so I edited. A bit much apparently, but can’t be helped now.

          What I meant was that how difficult anyone will find C++ is largely dependent on the person, just like math is hard for some but easy for others. And that you shouldn’t really worry, and should just wait and see for yourself.

          I don’t think I’m clarifying myself much, but hope this’ll help.

          1. Newbie says:

            I liked the answer even before you clarified; it was something I would do. I know what you meant anyway. I just wondered about programming. It is new to me and kind of unexpected.

    3. Kdansky says:

      When learning to program, you have to learn two different sets of skills:

      1. The language itself. There are a lot of differences between C, Java, Visual Basic and Fortran. You will have to learn the intricacies at some point.

      2. But first, you need to learn to think in terms of process flows, memory, pointers and all those other abstract concepts. The differences in this regard between the languages are very minor (C vs Java) up to completely irrelevant (C# vs Java).

      Note that the second part takes quite a few years to get down decently, while the first point can be fixed in months, or even weeks, depending on experience and difficulty of the language.

      1. Newbie says:

        For 1: When you say this can be done in months/weeks, did you mean when starting from scratch or when learning a new language? If the latter, then would starting from scratch increase the time I would spend learning it?

        For 2: That is pretty much what I expected. Abstract concepts made me giggle, as my course is Theoretical Physics abstract concepts encompasses ALL that I am going to learn. Thanks for that it was most helpful.

        1. Kdansky says:

          If you can write code easily in language X, then it will take you three months at the most to get proficient at an acceptable level with language Y, because they only differ in syntax, and not in concepts. Sure, the uber-gurus will still use the features of their chosen language better, but you can painlessly get work done.

          That is to say, as long as both languages belong to the same category. Query languages (SQL, XQuery, XSLT) are a very different beast from procedural/OO languages (C*, Java, Python, Pascal), and so are logic and functional languages (Prolog, Haskell).

          1. Newbie says:

            I see thanks for that.

    4. Kyte says:

      In the end, it depends on how well you can grok pointers. Everything else is peanuts. If your brain isn’t wired to handle pointers, references and whatnot, C/C++ will be very difficult. The rest is the standard programming mindset: if you can think in an imperative style (“To solve X, you need to do A, then B, then repeat C until N is equal to M”), you’ll do fine.
      (Tidbit: another programming style is functional (“X can be represented as a composition of functions F, G & H for input A”), which is gaining popularity since it makes it easy to do stuff in parallel. Naturally, it lends itself well to physics, maths and the like. Check out Haskell.)

      1. Newbie says:

        Until I got to the first brackets I didn’t understand anything there. But they are technical terms I’m guessing so not important to know the word itself. I will indeed check out Haskell whatever it is…

        1. Kyte says:

          Heh. Pointers are variables just like any other variable. But instead of holding data, they hold the address where you can find said data. References are the address itself. You could say pointers are variables that hold references. So access to the data becomes indirect. Instead of accessing X for value N, you access X for location P which has value N.
          Some people have trouble thinking in terms of indirect access (and the ramifications thereof), which is what makes pointers so difficult to use.

    5. silver says:

      My experience with language learning:
      once I learned Lisp and C and anything with object-oriented features, I was able to learn every other language trivially.

      But while learning the language is trivial, the long part is learning the libraries. My first C program, for example, has a function that converts strings containing characters like ’12’ into the actual number 12. Because I didn’t know atoi() existed. I wasted time and thought and debugging power on something that comes with the language – because I didn’t know the libraries.

      Learning a language is like learning a grammar and consequently easy, but libraries are more akin to vocabulary, and take more brute memorization.

      1. Newbie says:

        So it would be like learning that there is a notation for square root instead of deriving it by a formula? I’m feeling swamped here…
        If that is what you mean, then judging from the comments I’ve gotten back, that might be what causes me the most problems, but one that can be remedied quickly. Thanks. I have had a lot of stuff to think about from all the people who responded, and I am grateful for it.

  20. RobertB says:

    I’ll join in and give a shoutout to STL. Between the string implementation and the containers, you can avoid a lot of string-related headaches.

    As for C#, yeah, it’s pretty much Java with the serial numbers filed off.

  21. Kdansky says:

    I want to add something important people never realize when they talk about performance of do-it-yourself memory management. Everyone thinks it’s faster because you don’t have a garbage collector. But that is actually not always true: A GC will do most of its work when the CPU would be idle anyway, while your memory management code will be executed at the point you specify. So your fast and cheap code takes up a few valuable cycles, while the slow GC takes up a ton of worthless cycles.

    Would you rather pay someone with a single ounce of gold, or with a pound of dirt? In the end, paying “more” can be cheaper.

    C++ can be the right choice. But in most cases, it is not. ;)

    1. Sumanai says:

      From what I’ve understood from various sources through the net is that anyone who trusts garbage collection to do its job properly is mining for pyrite.

      1. Kdansky says:

        That is pretty much the opposite of reality. If you mistrust garbage collection, you are asking for trouble.

        Modern GCs do a stunning job, nearly all of the time. At the point where a GC consistently runs into trouble, you are doing very unusual things, such as crazy high-performance calculations or memory allocation on a scale that was considered all but impossible just a few years ago. But your average desktop application requires less than 1 GB of RAM (most get by with a few to a few dozen megabytes, even; that is less than 1% of what your crappy netbook offers) and spends 90% of its time waiting for user input. Which means the GC can clean up your memory nine times for every key you press. If that’s not enough, you wrote really shitty code somewhere else. ;)

        If you want some evidence: take a look at Minecraft. It is able to simulate worlds that are bigger than the Earth, and it runs in Java, which not only has a GC, but doesn’t even compile directly to machine code (which makes everything slower by a ridiculous factor)!

        1. Another Scott says:

          Wait… bigger than the Earth?!?
          /disbelieve

          1. Kyte says:

            Truth. Not as deep, but the simulated area’s bigger than Earth’s surface.

          2. Alan De Smet says:

            Sort of. Minecraft generates terrain around you, about a kilometer (from my crude eyeballing). If you get close to the edge of the generated terrain, it generates some more. It will theoretically generate an infinite amount of terrain, but the practical limit is disk space.

            Also, it’s not accurate to say that it’s simulating it all at once. The game would quickly grind to a halt processing everything. There is a limited zone around you that is being simulated. Everything else exists in a sort of suspended animation. Of course, you’re not there to observe it, so you’re unlikely to notice.

            1. Veloxyll says:

              If a cave collapses in Minecraft, but no-one is around to see it, does it still make a sound?

              1. Kyte says:

                Yes, ’cause it’ll only collapse when you’re close enough to hear it. :D

                1. Will says:

                  No because caves don’t collapse in minecraft; gravity only affects sand and gravel blocks, and it only affects them when something changes around them.

                  You can often find levitating sand\gravel blocks that will happily float away until you poke a nearby block, at which point the game will go “OH SHIT” and make them fall.

              2. ColdDeath says:

                No, because there is no physics in Minecraft for a cave to collapse. It can only be blown up >:D

                1. Sumanai says:

                  So… if a creeper detonates and no-one is around to… wait, they don’t move or exist if the player is not around. Let’s try again.

                  If a TNT block detonates and no-one is around to see it, does it screw the player’s shit up?

                  Someone needs to run some testing.

        2. safetydan says:

          Actually, Java does compile down to machine code, just that it does it at runtime using profile-guided optimisation. Look into HotSpot one day and some of the amazing things it can do with JVM byte code.

          1. silver says:

            “Actually, Java CAN compile down to machine code”
            fixed.
            it does it with some compilers but not with others.

            1. safetydan says:

              Err… you’re not thinking of GCJ are you? That’s static compilation at build time. I’m talking about the runtime compilation that Sun’s (now Oracle’s) HotSpot and any sensible JVM (including IBM’s) does. Java byte code hasn’t been interpreted at runtime for years now (bar the initial startup before HotSpot kicks in).

          2. Kdansky says:

            You are pointing out the obvious. Of course Java code also needs to become machine code at some point, how else is it supposed to run at all? ;)

            1. neothoron says:

              You’re missing the difference between “Just-in-time compilation” and “interpretation”.

        3. Sumanai says:

          I think you misunderstood my “don’t trust the GC” comment.

          I meant that when you go around saying that you should trust something like that some dumb young programmer with bright eyes and lazy bones will write a program during the weekend. Happily thinking that all his memory management will be done by the garbage collector and ignorant of the fact that the GC doesn’t like it when you do that thing.

          You know, the one that all the experienced X-programmers know not to do, because X’s GC always throws a great big sulk over it and refuses to touch that part of the software? That either isn’t in the official documents or is hidden like the Ark of the Covenant? So every time the program starts that part of the code, some memory gets allocated but not freed until it’s closed completely, and you can bet that part gets called every other second.

          Then just to top it off, it’s supposed to run in the background, hours on end. Like, say, an IM client. And you’d like to be able to browse the net without cutting off a free method of communication with your friends, but you can’t do it, because your computer is chugging along since that bloody client is leaking memory like Pratchett.

          And all the other ones are either also written by similar fuckbends who didn’t bother to actually do any optimization because “X is almost as light as C”, forgetting that they’re running the damn thing on a high-end desktop computer and that they barely ever have more than two tabs open because more would confuse and frighten them.

          And the rest are written in C by other idiots who either have never heard of the term “feature creep” or never understood why it was a bad thing, so it’s in perpetual alpha. Never to be bugfixed, because “that’s what beta is for”.

          Now pardon me, I’ve got an orphanage to burn down. Those bloody gits’ happiness is annoying me and peeing on their parade apparently didn’t do more than temporarily annoy them.

          (Seriously though, I meant that you should always make sure that the GC can do its job. For instance, in Flash, if an island gets too big its GC won’t touch it, so you have to keep ’em small (which, incidentally, isn’t/wasn’t in the official documentation, apparently). Basically, instead of managing every damn memory allocation yourself, you just have to make sure that the GC is happy and working.)

          (Oh, and I don’t like it when people say things like “it’s stunning” or “it’s awesome” especially when they’re obviously trying to counterbalance someone else’s cynicism, and are going overboard to hyping. Which is bad.)

          1. Kyte says:

            leaking memory like Pratchett
            Bad analogy, dude.

            Also: Apart from that example you gave (which sounds like a shitty GC anyways), I’ve never heard of special restrictions in GC usage. Certainly not the JVM’s or CLR, at least.
            Of course, many people mix “GC” with “Resource disposal”, and that’s where the shit hits the fan. Nothing more fun than discovering the program leaked 4 file handles, 3 brushes, 5 bitmaps and a couple mutexes. (To throw numbers out there)

            *hands Sumanai a lighter.*

            1. Sumanai says:

              A few days in-between, but bail doesn’t pay itself. So:

              I ran into a blog post by someone who made a game in Flash which leaked memory after a while (can’t remember specifics, but it was a turn-based flight thingie). The reason was that the GC in Flash doesn’t touch islands that get past a certain size and assumes they’re always needed. This wasn’t, according to him, mentioned in the official documentation, and he found out about it by pure luck.

              And then got Flash fanboys claiming that it was documented and that it was his fault anyway and blah blah blah.

              The source for this piece of knowledge (assuming I remember correctly) mentioned that in every language he had worked in, he had to either manage the memory manually or make sure that the GC was doing it for him. To him it was apparently a case of “GC helps, but not as much as you’re given to believe”.

              Also, “bad analogy” as in “inaccurate”, “dude. Not cool” or “just plain sucks”? They’re all par for the course with me. Even at the same time.

  22. RobertB says:

    @Newbie: As Shamus mentions above, C/C++ offers you a lot of flexibility and performance, at the cost of making it much easier to shoot yourself in the foot. I know you might be constrained by the third-party libraries you’re required to use, but if that’s not the case I’d recommend pretty much _anything_ else.

    1. Newbie says:

      Almost missed this. Anyway, I don’t think I would have problems with mistakes (if that’s what you mean by shooting myself in the foot); I am obsessive to a very sharp point. And I am afraid it looks like I don’t have a choice. Thanks for the helpful input nonetheless.

      1. Deoxy says:

        EVERYONE makes mistakes. Everyone. Indeed, the more thorough you are, the fewer mistakes you make, the WORSE the ones you make usually are (in that they are so much harder and more complicated to find – “dumb” mistakes are the best to make because they are quick to find and easy to fix).

        I’m with RobertB – only use a flamethrower if you NEED a flamethrower. Using a flamethrower to cook you food or light your cigarette is REALLY REALLY STUPID. You might do it right the vast majority of the time, but the rare mistake is a very bad thing.

        1. Newbie says:

          Yes, but with maths and physics I am very capable of going through LOADS of lines of equations, diagrams and working-out to find my mistakes. Not to mention the mistakes I make will be fewer in number, because when making a mistake can ruin about 3 hours of your life you learn not to make so many (EDIT: also, when your exams are half that time, it helps reinforce the ability to make fewer mistakes). I don’t know whether I will need the “flamethrower”; I just know I am being taught how to use that and a “lighter?” (by the way, I have no idea whether comparing Fortran to a lighter is correct, I was just trying to be cool =D )

          1. Ingvar says:

            Fortran is more akin to a thermic lance, with a slippery grip and an obscure habit of occasionally swapping the operator and business end around. But, unlike the flamethrower of C and C++, it’s perfectly safe to use in gusting wind.

            1. Rosseloh says:

              Can I just say that all these analogies are making me quite happy?

              Yes, I also enjoy Rutskarn’s puns. Why do you ask?

              1. Sumanai says:

                The only problem I have with them is that at first I start thinking with them. For example I’ve felt inclined to post something like:

                “Suggesting that he should use a lighter isn’t really helping since he has already mentioned that he’ll be given a flamethrower.”

                Which is followed by my mind wandering and suddenly thinking about it literally.

                “But don’t flamethrowers have those little flames going on all the time? Couldn’t you just light your cigarette with that one?”

                At which point I’m a lost cause.

                1. Pickly says:

                  Was thinking the same thing, actually. :)

                2. WJS says:

                  A pilot light? No. That’s something you see in fiction. A pilot light works great on a boiler in a basement, but not so well on a hand-held tool used outside. Electric igniters don’t get blown out by the wind.

          2. Adeon says:

            Fortran is a pretty old language. As such it isn’t really designed to be a teaching language, so it isn’t always user-friendly. On the upside (and probably the reason your university teaches it), it is considered very good for numerical analysis, and there are quite a few libraries for that very purpose.

            1. Deoxy says:

              FORTRAN (now Fortran – no longer all caps, yay) is still in use because no language has been created to replace it. It is capable of using numbers of ANY level of precision (limited only by available memory). No other language does that (or at least nothing better enough to replace it – I don’t think it’s even been attempted in a long time), so Fortran is still in use.

              1. Simon Buchan says:

                Almost any newish scripty language, Python and Ruby for example, uses arbitrary-precision arithmetic. And they took it from Lisp (circa 1950s :), not Fortran.

                1. Deoxy says:

                  That was the explicit reason I was given for its continuing use – I suppose my source was just wrong (or perhaps outdated).

                2. Kyte says:

                  Fortran’s still in use for the same reason COBOL’s still in use: Lots of preexisting code.

                  1. WJS says:

                    Arguably, that’s also the reason that C(++) is still used as much as it is. Java (for example) is pretty common, but I doubt it even comes close to C in terms of library availability.

          3. RobertB says:

            The problem isn’t that you’re not going to be able to get a grip on C/C++ from a detail perspective. The problem is that the sorts of errors endemic to C/C++ memory management and pointer manipulation can be easy to make and tricky to find. So _if_ you don’t have a compelling reason to use C/C++ (i.e. locked into legacy libraries, performance is critical but not so critical you want to use assembler), then you should use something that doesn’t offer these sorts of opportunities for error.

            I code for a living, and probably 85-90% of my work is in C/C++. But if I’m doing one-off or smaller-scale apps where the conditions above don’t apply, I’ll do them in Java and not feel the least bit bad about it.

        2. Miral says:

          This reminds me of that old programming adage — “Debugging is twice as hard as programming. This means that if you write code in the cleverest way you can, by definition you are not smart enough to debug it.” Or, to put it another way: “keep it simple, stupid!”

  23. kilmor says:

    I’ve been programming only in C for the last 7 years(right out of school), and have to say that memory management issues really aren’t that much of a problem for us. We have pretty good memory tracking tools for detecting any kind of memory overwrite or memory leak, and utility functions for alot of str duplication and manipulation so you’re not copy/pasting the same code everywhere.

    Also, nerd note: snprintf is slow(ish); it’s faster to just use strncpy/strncat/etc. for simple stuff like that, if it’s code that will get run a lot. If you don’t care, then the clarity of snprintf is always nice.

  24. Deoxy says:

    I really like that flamethrower analogy – I think it gets the point across better than the handsaw analogy I used in the last C thread here.

    Sure, you only hurt yourself if you make a mistake with a flamethrower, but still, NOBODY USES FLAMETHROWERS for their normal sources of heat or flame (cooking food, lighting cigarettes, etc). If someone uses a flamethrower to light their cigarette and burns themself, no one would call them stupid for making a mistake with the flamethrower…

    They would call them stupid for USING THE FLAMETHROWER IN THE FIRST PLACE.

    And so it is with C. Excellent analogy.

    Edit: and I would totally buy the Shamus Young Book on Programming, as so many others have said. Really, Shamus, you have a gift for this stuff.

    1. WJS says:

      Weren’t you arguing that C shouldn’t even exist though? Because the flamethrower analogy utterly fails to support that position. There are a number of tasks that flamethrowers are used for, even if they’re not common.

  25. Avilan says:

    Just wanted to thank you for teaching me something today. Good work Sir!

  26. Mike Riddle says:

    I am a 15-year C programmer (still use it). Heard an interesting stat about 10 years ago: the last 10% of the bugs to be fixed in a C project take 90% of the bug-fix budget, and of those, 90% are memory issues (overruns, rogue pointers, etc.). I don’t know if it’s true, but I can easily believe it.

    After spending 14 years fixing memory issues (the first year I did not fix any, only made them), I really appreciate any language with garbage collection (in particular Python and Ruby). You can focus on the problem you are creating the program to solve, not the problems you are creating in the program.

  27. Daimbert says:

    Ultimately, the big thing here is a different attitude towards programming and programming languages. I’ve had to deal with C++ and Java (and done some Python and Smalltalk and some other languages) and the attitude difference is clear:

    C/C++: You’re the king, you know what you’re doing, you tell me what to do and I’ll do it. Even if I think it’s stupid, I’ll do it, just in case you know better than I do.

    Java: I’m the king, I know what I’m doing, and you get to use me to do what you want. I do as much as I can for you and hide the details, and I won’t let you do something that I think is stupid.

    Both have their downsides. C/C++ may take longer to code in some cases, and you have to be very careful that you don’t screw up. But if you need to do something odd, you can do it. On the Java side, you might not be able to do it if you want something non-standard, and sometimes it takes a lot of work to convince Java (GridBag, I’m looking at you here) to do what you want. And in Java, you never have to think about what you’re doing, so sometimes you just don’t know what you’re doing.

    The biggest example of the latter is that I recently spent an entire weekend tracking down a deadlock on threads, where basically normal, don’t-think-about-it locking managed to have two methods lock each other out, just because of the code path that it HAPPENED to follow. C/C++ makes you THINK about locking, while Java says “Just tell me to synchronize and I’ll lock things for you”. Bleh.

    For the former, I was recently cursing at the Java compiler because it wouldn’t compile, since I hadn’t initialized a variable in a new case I was adding … one that I knew always ran last and didn’t want to initialize to anything at that point anyway. C/C++ would have warned me and moved on.

    It depends, really, on what you like. I like more control, so I prefer C/C++, at the cost of one type of error and some extra work in some cases. Some like the ease of use, and so prefer Java at the cost of a different type of error and extra work in other cases. Really, the best language to use is … the best language to use for your feature, taking into account how you like to code.

  28. Factoid says:

    I haven’t programmed much C, and I haven’t programmed C++ regularly for about 6 years now (since I graduated undergrad). But I don’t remember doing any of that stuff in C++ for string manipulation.

    We used what I thought was a standard header (string.h) for C++ string manipulation.

    Is that not an industry standard? Maybe that’s specific to Microsoft’s Visual C++ compiler, which was the one we used in the computer science department.

    We had standard functions to do things like concatenate strings, but maybe they weren’t actually so standard.

    1. Alan De Smet says:

      <string.h> is now simply <string>, and it’s part of the Standard Template Library. The STL has been a formal part of C++ since the mid-90s, and there has been widespread support since the late 90s.

  29. Robert Conley says:

    C/C++ are part of what are now called unmanaged languages. As pointed out in the OP, you have to allocate and deallocate memory yourself. Visual Basic 6, Java, C#, and VB.NET all manage your memory for you and do automated garbage collection.

    Having coded and maintained a CAD/CAM application in VB6 for 15 years, along with a C++ add-on for Orbiter Space Simulator (Project Mercury and Project Gemini for Orbiter), I’ve concluded that unmanaged languages should not be used for general purpose application development.

    They should be used only when you absolutely need to use them. In my VB6 CAD/CAM application we have several libraries that are written in C or C++ and are called by VB6. The reason is that we need low-level access to the computer’s hardware to interface with the metal-cutting machines the software controls.

    Also, my advice really only applies to NEW projects. Existing projects should not switch unless there is another compelling reason. For a NEW project, any of the mainstream managed languages will be far more productive than their unmanaged counterparts. The time you save on testing and maintenance is considerable.

    1. Deoxy says:

      Very well said.

  30. Adamantyr says:

    A great post, Shamus. Count me in as a prospective buyer of your forthcoming programming book. A lot of programming books these days are like PhD dissertations converted to book form, and are about half as exciting to read. One of my favorite game book writers is Andre LaMothe, who writes in a clear and accessible fashion, with a lot of geeky pop culture references thrown in.

    Another factor with C/C++ besides self-managed dynamic memory is security. NONE of the original functions take security into account. Many viruses and malware take advantage of buffer overruns to get into memory areas they would otherwise be blocked from by the OS.

  31. SatansBestBuddy says:

    I’ll jump on the bandwagon and say that you could, and should, totally write a programming book, if only because it combines your writing work with your programming work, but also because you’re damn good at making it clear what’s going on it the program, even for me, who, at best, can write in HTML. =/

    It’ll keep you working until you get a new job, to boot.

  32. SteveDJ says:

    Love the article, but there are a couple of problems, and it is bugging me enough that I have to ask about them. Cannot figure out quoting, so I’ll just copy:

    You said “In C*, the language does not do all of this legwork for you”

    At first, I thought you had meant to say C# with a typo, but now I think you meant to have a *footnote, but there is none.

    Later, you said “…Which is not even enough memory to store this one image:”

    …what image?

    1. X2-Eliah says:

      1×1 pixel black dot, I think.

      Also, dibs on discovering the hidden Shamus-photo :P

    2. Kyte says:

      The asterisk is a wildcard. ;)
      And the image’s the one hotlinked from I can haz cheezburger. You might have the site blocked.

      1. Shamus says:

        Hm. It’s actually not hotlinked. I have the image on my site. (I had to reduce it to get it down to the required size.)

        Why can’t people see it, I wonder? Can ANYONE see it?

        1. Deadfast says:

          I can see the image just fine.

        2. Nidokoenig says:

          I can see it, too.

        3. Kacky Snorgle says:

          I see the one-pixel black dot, but from the other comments it sounds like that’s not what you’re asking about?

          The single-pixel image actually makes a certain amount of sense in the context of the post (though even my distinctly non-techie brain thinks that 64K sounds like a bit too much for the amount of overhead in an image format…).

        4. Gandaug says:

          No problems here.

          Firefox for what it matters.

        5. Kyte says:

          Oops, I’d checked the wrong link there. ¬¬
          But yeah, perfectly visible.

    3. Shamus says:

      C* was shorthand for C or C++. Kind of unclear, in retrospect.

      There really is an image after that paragraph. Are you viewing the site through a feed reader, or on the site directly?

      Or…

      Are you pulling my leg?

      1. Tarev says:

        I can see it, if that means anything.

      2. Daimbert says:

        I see only a dot, and I’m reading from the site directly. I actually thought that was the joke …

      3. David W says:

        I also cannot see the picture – but I attribute that to my work firewall. Many images, embedded videos, and such, not just on your site, do not show up on my work computer but work just fine at home.

        I can’t see the comment avatars, or any of Spoiler Warning, and so on, at work, but then I also can’t go to blogspot or twitter, so I think it’s all in the name of encouraging productivity.

        It’s vaguely possible that it’s an IE vs Firefox thing, since that’s another difference between work and home for me, but I think that’s unlikely.

      4. Robyrt says:

        I can see the picture through my work firewall. It’s thematically appropriate in a way that internet memes rarely are.

      5. SteveDJ says:

        Nope, I cannot see it. Looking at the site directly. I’m at work – perhaps some firewall/proxy thingy… :-(

      6. Daphne B says:

        +1 to the “I only see a dot” contingent. I’m using IE, and if I try to load the link in its own window, it loads just fine. I’m reading your site directly.

        eta: I bet it’s the image’s empty “width” attribute. If you had no “width” at all, it’d behave like your “crash” image up top. But IE and FF seem to treat a “width” attribute differently, so I bet IE is treating it as a width of 0 and not showing anything beyond that.

        1. Veloxyll says:

          It appears to be an IE error. Thanks Microsoft… (it shows for me on Firefox, but IE does not show the one image)

          1. Drew says:

            It looks like the source includes “width=”” in the img tag. I wonder if different browsers handle that differently, with IE scaling the image down to a width of zero and Firefox assuming it means nothing. That’s the only thing I can guess.

            Edit: And there we go: http://scottbush.net/web-dev/empty-width-and-height-attributes-prevent-image-rendering-in-ie/

            So that’s the problem.

        2. X2-Eliah says:

          I’m using IE9 and I see the black dot just fine, and the link to the goat image is clickable :)

  33. Gandaug says:

    Thank you for this, Shamus. I’m not a programmer by any means and have no interest in programming, but you’ve actually managed to explain some things elegantly for someone with no understanding of the subject matter.

    You really are an excellent writer. While you’re looking for work you should really consider professional writing.

  34. Davie says:

    Very interesting. Knowing now that the results of continued operation could be even more annoying, I’m less angry at my programs and games for crashing.

    1. Jabor says:

      A game crashing then and there is much better than it trying to struggle on, and only managing to write garbage into your save files before realizing that it’s getting nowhere.

      Of course, if it were well-programmed, neither of those would be a frequent occurrence, but sometimes the problem is outside of the game’s control (graphics card drivers, I’m looking at you…)

  35. silver says:

    If you’re writing an operating system, a new programming language, or a high performance game, you probably need to be using C and just learn to deal with memory efficiently and correctly.

    If you’re not, save yourself a ton of time and effort and problems and use a language with memory management and reflection and such. You’ll thank yourself later.

    Actually, in the game case, ideally you would find a way to reserve C for the graphics parts and do everything ELSE in the high level language. I doubt anyone does this in practice, however. Sad for them.

    1. Blake says:

      On a current project at work we drive most of the game through Lua, but any intense functionality (rendering, pathfinding, math-heavy code, anything network-related) has to be written in C.

      The tools we use for development are largely C# and most of our build scripts are in python.

      It really does come down to using the right tool for the job. As people keep saying, you’re an idiot if you use a flamethrower to light a cigarette because learning to use a cigarette lighter might be too hard.

      EDIT: Speaking of work, I just noticed someone else in the office using the PixelCity screen saver. +1 wins to Shamus.

      1. Deoxy says:

        I love PixelCity… but it runs SO SLOW, even on the new machine at work. It’s probably some kind of setting or graphics card issue, but it does make me sad. So many other people seem to get to really enjoy it.

  36. Mystyk says:

    a$ = "Twilight is a great book"
    b$ = "- for starting arguments!"
    c$ = a$ + b$
    PRINT c$

    Run this, and it will take two phrases:
    Twilight is a great book
    and:
    - for starting arguments!
    And merge them together to print the entire sentence.
    Twilight is a great book - for starting arguments!

    No, it most certainly will not. Do you really believe that the space in your final output between “book” and “-” will magically materialize if it is neither at the end of the former string nor at the beginning of the latter? [/pedant] ;)
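
    (He’s right. For what it’s worth, the same literal-mindedness applies to C++’s std::string; a minimal sketch, with the fix on the second output line:)

    #include <iostream>
    #include <string>

    int main() {
        std::string a = "Twilight is a great book";
        std::string b = "- for starting arguments!";
        std::cout << a + b << "\n";        // ...great book- for starting...
        std::cout << a + " " + b << "\n";  // the space must be supplied by hand
    }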

    1. mcgurker says:

      Hahahahaha, excellent catch! This is the sort of thing we practically get grilled on in my CS class.

  37. mark says:

    Try java. It’s fun.

    1. Veloxyll says:

      No, it isn’t.

  38. Blanko2 says:

    I like how it says at one point “this post is for non-coders”
    and then it is followed by 552 comments from people who know how to code.
    XD
    I am highly amused by this.

  39. BuschnicK says:

    I don’t agree with you on the “performance used to be important but isn’t any longer” part of the argument. You can easily die a death by a thousand paper cuts if you don’t pay attention to the little things (like string handling, for example!). Even on today’s machines:
    http://blog.buschnick.net/2010/09/performance-rant.html

  40. MaxDZ8 says:

    Shamus,
    I think you made a big mistake by using strings as an example. I understand your desire to make this easy for the layman, but it seems no one is grasping the general problem.

    Also, I find it somewhat disturbing that we previously complained about performance and now we’re taking for granted that there’s enough perf for everything… I’m with you on the STL hate.

    I seriously think your skills are still needed so good luck with your search. But next time, please strlen your strings first.

  41. Zaxares says:

    After reading all that, I just want to say that I now want to see somebody try to light a cigarette with a flamethrower. XD

    1. Will says:

      Jackass has probably already done this.

  42. EmmEnnEff says:

    Thing is, the vast majority of performance issues are caused by poor choice of algorithms rather than “poor” choice of languages.
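
    To illustrate with a small invented example: the same language solving the same problem, where only the algorithm differs. The quadratic scan will bury any constant-factor win a “faster language” could offer:

    #include <cstddef>
    #include <cstdio>
    #include <unordered_set>
    #include <vector>

    // Count how many elements have appeared earlier in the vector.
    // The nested scan is O(n^2); the hash set is O(n) on average.
    int duplicates_scan(const std::vector<int>& v) {
        int count = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            for (std::size_t j = 0; j < i; ++j)
                if (v[j] == v[i]) { ++count; break; }
        return count;
    }

    int duplicates_hashed(const std::vector<int>& v) {
        std::unordered_set<int> seen;
        int count = 0;
        for (int x : v)
            if (!seen.insert(x).second)  // insert() reports whether x was new
                ++count;
        return count;
    }

    int main() {
        std::vector<int> v = {1, 2, 3, 2, 1, 1};
        std::printf("%d %d\n", duplicates_scan(v), duplicates_hashed(v));  // 3 3
    }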

  43. ClearWater says:

    Variables ending in $, now that takes me back!

  44. MrPyro says:

    In my brief C++ career (about 8 months working on one project) I don’t think I ever trashed the heap by writing something too large to an array; my favourite memory bug was allocating memory to a pointer, moving the pointer, then calling delete on it.
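
    A guess at the shape of that bug (the details here are invented, not MrPyro’s actual code): delete must be handed the exact pointer the allocation returned.

    #include <cstring>

    int main() {
        char* buffer = new char[64];  // the address new[] handed back
        std::strcpy(buffer, "some data to parse");

        char* p = buffer;
        p += 5;        // walk the pointer while parsing, say

        delete[] p;    // undefined behavior: p no longer points at the start
                       // of the allocation. The fix: delete[] buffer;
    }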

  45. MadTinkerer says:

    What I don’t get is why C doesn’t have a Standard Template Library like C++ does. Obviously the Linux ninja-gurus don’t need all the tools the STL gives you, but surely a standard simple string class, for when you just need a standard simple string class, wouldn’t hurt. (Or vectors, at least! I never knew how easy array handling could be until I started using vectors instead of arrays for a lot of things.)

    1. Alan De Smet says:

      To an extent C can’t. The Template part of the STL requires language support. std::string may look simple, but it’s actually a specialization of std::basic_string. You can easily make strings of wchar_t to do Unicode. Not good enough? You can make strings of int, or even a custom character class if you really want. You can do roughly equivalent things in C, but they’ll always be crude and a bit error-prone. (You’re in the realm of #define macros and casting.)
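
      A minimal sketch of what that specialization looks like in practice:

      #include <string>

      int main() {
          // std::string is shorthand for one instantiation of a template:
          std::basic_string<char> s = "the very same type as std::string";

          // The same machinery, instantiated for wide characters:
          std::basic_string<wchar_t> w = L"this is what std::wstring is";

          // Other element types need a matching char_traits specialization,
          // which is where the custom-character-class option comes in.
      }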

      Okay, what if we jettison templates? Skip super-configurable objects, just give me plain strings, maybe in char and wchar_t variants. That’s much easier. But C won’t help clean up objects on the stack or nested inside of other objects. Part of why C++’s classes are awesome is that you write a good destructor and it will be automatically called when the object goes out of scope. C has no equivalent. You still need to pay attention to when you’re done with a given string and explicitly ask for it to be cleaned up. Still much more error-prone than C++’s std::string.
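
      The difference in a small sketch:

      #include <cstdlib>
      #include <cstring>
      #include <string>

      void cpp_version() {
          std::string s = "hello";
          s += ", world";
      }   // destructor frees the memory here, automatically, on every exit path

      void c_version() {
          char* s = (char*)std::malloc(32);
          if (!s) return;
          std::strcpy(s, "hello");
          std::strcat(s, ", world");
          std::free(s);  // every path out of this function must remember this
      }

      int main() { cpp_version(); c_version(); }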

      Okay, we’ll accept that. But there is still value in providing smarter string functions that do things like automatically growing to hold data, ensuring null termination, and perhaps tracking length so you can include the null character or at least make strlen really fast. A variety of libraries exist to do exactly that (and for a skilled C programmer it’s a few hours of work to whip one up). Why not include that?

      At this point we’re down to language design philosophy and history. C is very much “assembly, but better.” It was written for low-level programming, an area it continues to excel at. You want to keep the assumptions to a bare minimum. Making more powerful constructs part of the default assumptions comes with a cost (e.g. memory usage, code size, speed) that someone developing an operating system, a low-level library, or an embedded system frequently doesn’t want to pay. A given project might want some parts, but not all. There are also tradeoffs: for one project the ability to have null characters in a string is important and worth the cost; for others it isn’t. Something like the STL is quite good, but it’s designed to work for most people, most of the time. If you want more, you can easily build it yourself, get a third-party library, or move to C++.
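
      The kind of library he means might look roughly like this (a sketch with invented names; error handling mostly omitted):

      #include <cstdio>
      #include <cstdlib>
      #include <cstring>

      // A string that knows its own length and capacity: it grows on demand,
      // keeps the terminating null in place, and never scans to find its length.
      struct GrowStr {
          char*       data;
          std::size_t len;
          std::size_t cap;
      };

      void gs_init(GrowStr* s) {
          s->cap = 16;
          s->len = 0;
          s->data = (char*)std::malloc(s->cap);
          s->data[0] = '\0';
      }

      void gs_append(GrowStr* s, const char* text) {
          std::size_t n = std::strlen(text);
          if (s->len + n + 1 > s->cap) {                       // grow before overflow
              while (s->len + n + 1 > s->cap) s->cap *= 2;
              s->data = (char*)std::realloc(s->data, s->cap);  // (checks omitted)
          }
          std::memcpy(s->data + s->len, text, n + 1);          // copies the null too
          s->len += n;                                         // tracked, not scanned
      }

      void gs_free(GrowStr* s) { std::free(s->data); }

      int main() {
          GrowStr s;
          gs_init(&s);
          gs_append(&s, "Twilight is a great book ");
          gs_append(&s, "- for starting arguments!");
          std::puts(s.data);
          gs_free(&s);
      }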

      There is an elegance to C’s minimalism, and I think it’s part of C’s strength. The C Programming Language is, for all practical purposes, a complete definition of C, and is surprisingly thin and readable.

      I like C and C++. They’re great tools, each well suited to different tasks. I don’t see a lot of value in trying to make C a more versatile tool when you can easily reach over and pick up C++ instead.

  46. thebigJ_A says:

    Yay! A whole post dedicated to my question.

    I thank you, and also you’re welcome, since I know you love to talk about this stuff.

    I actually understood that. (Though some of the comments seem to be in a foreign language.) Now just teach me all the rest of programming, and we’ll start a software company. Voila! Both our money problems solved!!

  47. gkscotty says:

    I get the feeling some people might enjoy this.

    Lets Break Pokemon Blue

    The original Pokemon games are famously easy to trash the heap in, yet aren’t programmed to crash when it happens – that Z80 WILL soldier on no matter what nonsense you’re telling it – which leads to all kinds of strange corruptions.

  48. Corylea says:

    Dear Shamus,

    You, dear sir, are a gifted writer.

    I’m a psychologist by training; what I spend most of my time doing is making mods for The Witcher. So the only programming I know how to do is in the Neverwinter Nights scripting language – not exactly a big, important language. :-)

    But you’ve made the memory problems of C both interesting and understandable for someone like me. I read the whole article and enjoyed it. I read some parts aloud to my husband the Computer Science professor.

    You were wondering whether to go for a job coding or one writing. Only you know which way your heart leads. But I can tell you that you’re a gifted writer. Maybe you could write the manuals for a game company? I think you’d be really, really good at that.

    1. thebigJ_A says:

      Yeah, write a manual for Dwarf Fortress. That’d be awesome, although I think the laws of physics might prevent its completion.

  49. Tetris says:

    Shamus, you’re a literary genius. I love the pushing-your-car-through-the-intersection metaphor and how it works on two levels.
