Project Good Robot 22: Text Files

By Shamus
on Oct 9, 2013
Filed under:
Good Robot

No pretty screenshots for you today, just walls of text and some confused ramblings. I’m a little out of sorts and not really in any mental shape to be accomplishing interesting things.

I’m feeling a bit un-creative and run down today. So I want to get away from the stuff that requires a lot of creativity (gameplay, writing, art) and just work on something straightforward and mechanical. I’m trying to make sure I do something on the project every day, but it’s probably a bad idea to do anything tricky while I’m all muddle-brained. I check out my miles-long to-do list and find something really dull: reading ini files.

You know ini files. Stuff like this:

[Settings]
Setup=1
music=0.40
sound=1.00
fullscreen=0
[Window]
Width=1456
Height=887

This is important because right now my ini reader is the only part of my program that’s still tied to Windows. To read and write ini files I’m calling GetPrivateProfileString () and WritePrivateProfileString (). Boo. If I ever want to do a Linux port then I need to replace those with my own code.

You might remember that last Christmas I wrote an ini file parser. The good news is that the parser seemed to turn out okay. The bad news is that I was using Qt at the time. Which means I was using the much more useful and powerful Qt string class and not the bog-standard std::string.

Crud.

std::string is how you’re supposed to use strings in C++. The problem is that it sucks. It doesn’t have a built-in way to strip whitespace off the beginning and end of a string, even though I have never ever written a parser of any sort that didn’t need that. Also, being able to compare two strings in a case-insensitive way is pretty fundamental to most text-processing problems. I need to find the [settings] section of the file even if it’s called [Settings] or [SETTINGS]. Also, I need to convert between numerical values and strings. Qt strings do all of these. std::string does none of them.
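For reference, the three missing pieces are small enough to sketch with nothing but the standard library. This is only an illustration, not the code from the project: the names `trim`, `to_lower`, `iequals`, `to_float`, and `to_text` are made up here, and the case folding assumes plain ASCII, which is probably all an ini file of section and key names needs.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: copy of s without leading/trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";  // empty or all whitespace
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: ASCII-only lowercase copy (no Unicode awareness).
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Case-insensitive comparison built on to_lower.
bool iequals(const std::string& a, const std::string& b) {
    return to_lower(a) == to_lower(b);
}

// String -> number. Fails unless the whole (trimmed) string is a number.
bool to_float(const std::string& s, float& out) {
    std::istringstream in(trim(s));
    in >> out;
    return !in.fail() && in.eof();
}

// Number -> string, for anything that can be streamed.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

With these, `iequals("[Settings]", "[SETTINGS]")` is true and `to_float("0.40", volume)` fills in `volume`. Which is sort of the point: it is about forty lines, and everybody has their own slightly different copy of it.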

So what happens? Everyone needs this stuff, so everyone writes their own string-manipulating tools to cover all these deficiencies. Then everyone has their own little set of string tools, which end up in all the code they write. So now if I want to share my ini reading code with somebody they also need my string tools. They likely have the exact same string tools, and now they have duplicates hanging off my project, cluttering up my code, duplicating their code, causing name collisions, and just generally making life less easy.

I moaned about this on Twitter yesterday and got this reply:


I don’t even know what that means. Unicode still confuses me. I have no idea what “normalizing” a STRING would do. Now, normalizing a vector? That I get. But what does normalizing text do, make all the letters the same height? (I kid.)

I do wonder how Python, Java, and C# do these things, since they don’t have this problem. Regardless of the cause, it’s still a terrible problem that ultimately undermines the strengths of the language. I’ve seen a lot of string tool sets in my lifetime. Most of them were good, a few were great, none of them were complete and all of them were incompatible with each other. Stuff like this is one of the reasons that C++ isn’t nearly as portable as it should be.

Hey you: Language evangelist. Yeah you. Don’t tell me to learn your favorite language because it doesn’t have this problem. Just don’t. That’s like telling someone to learn to play the trumpet because they’re getting blisters playing the guitar. You’re not solving my problem, you’re giving me a new, larger problem: “Learn an entirely new language.” I’m trying to finish a game here, and switching languages mid-project would be idiotic even if I could be guaranteed that the new language could do everything I need it to do. (And that is not a guarantee by any stretch.)

It takes time to learn to use a language well. Sure, you can get pretty far in a couple of weeks, but if you’re trying to make a playable, portable, stable, smooth-running game then it requires a depth of knowledge you just can’t get noodling around with example programs. And even if the learning curve weren’t an issue, the simple conversion time to translate these 40k lines of code would be completely unreasonable.

Don’t be that guy. I’m telling you. That guy is obnoxious.

“Hey Shamus, just use boost!” Yeah, yeah. Of course, I’ve already got my own file-handling code, threading code, a bunch of math, and stuff for juggling strings. Should I re-write all my own stuff to use the more standardized boost library, or maybe just some of them? I mean, I hate to import a massive thing like boost SINCE ALL I WANTED TO DO WAS STRIP SOME SPACES OFF A STRING FOR THE LOVE OF KONRAD ZUSE, ARE YOU KIDDING ME?

Anyway.

So I’ve got this ini-reading code from last year. I pull out the Qt strings, add std::strings, and then plug the holes left by the transition. I run the program and it crashes.

Or at least, I assume it crashes. I hit the run button and the window fails to appear. I assume it silently bombed on startup? A couple of minutes later I realize the thing is still running, it just never got to the part of the program where it created the window. I kill it and then try again, but this time stepping through the code manually. It turns out the problem is that this new code is now the slowest code I have ever written in my entire life. Like, I’ve never seen anything this deeply screwed.

The file I’m reading looks like this:

[Map1]
Title=Chapter One
Subtitle=The Freezing Caves
Fog=0
WallColor=89f
LightColor=444
SkyColor=47a
StoryPoints=5 6 7
MusicTrack=steady-climb.ogg
Xp=150
Tileset=3
Stars=0
Pages=10
SwarmSize=5
Patterns=cave cave cave donut
RobotSwarm=cutter1
RobotMooks=worm1
RobotNormal=worm2
RobotBadass=worm3

That is the entire description for level one. The level description file itself is under 300 lines and is ~5k in size. For some danged reason, it takes ONE FULL SECOND to read the above data. That’s like taking eighteen months to make coffee. It’s so far beyond what’s sane or reasonable that I want to delete the parser to punish it for sucking so bad.

The same parser worked reasonably under Qt, and the only change I made was the aforementioned switch to std::string. I’ve heard complaints that std::string is slow, but there’s no way I can blame it for this insane dawdling. I don’t know what to blame, actually. Really, in a single second I should be able to read in the whole file, put every single character in alphabetical order, ROT-13 encode the whole thing, remove all instances of the letter “F”, save it back to disk, and still have nine-tenths of a second left over. What is going on here?

In a fit, I just rip the entire parser apart and re-write it from scratch. I make a fancy version that loads in the file, parses it, and keeps it cached in memory with key values already forced to lowercase for fast lookups. This solves the speed problem and now I can read ini files without using any Microsoft code.

This is actually kind of unfortunate. If I had my head in any kind of working order, I would have run some tests to figure out where the bottleneck was coming from. As it stands, I probably did more work than I needed to, and I destroyed the problem instead of studying it.
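The test that never got run could have been as crude as wrapping each suspect step in a timer. A rough sketch, assuming C++11's `<chrono>` is available; `time_ms` is a made-up helper, and the post gives no indication of where the time actually went, so which calls to wrap is anyone's guess.

```cpp
#include <chrono>

// Hypothetical helper: run fn once, return elapsed wall-clock milliseconds.
template <typename Fn>
double time_ms(Fn fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Time the file open, the line splitting, and a single key lookup separately, and a full second of delay has to show up in one of them.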

So that was my day. I did a ton of work to make a system that was probably overkill for my needs, all to replace a system that used to be a single line of Microsoft-specific code. I have no new features to show for my efforts. I’m not feeling too good about myself right now.

Since today is Post Huge Blocks Of Text Until People Fall Asleep day, let me hit you with another one. I have a reason for all of this text-parsing, and the reason is this:

(Protip: I don’t actually expect you to read this. Just skip to the bottom and I’ll explain.)

#Robot template
#First is the name of the robot, in brackets. This is the name used to spawn
#the robot in the console or in the levels file.
 
[Template]
 
#Ai core defines what kind of logic the bot will use to move. Options are:
#* beeline - the bot heads directly for the player in a straight line.
#* pounce - the bot attempts to circle around the player, spiraling inward until close, where it "pounces".
#* tunnel - the bot loops around the player, passing through level geometry if needed.
#* sentry - bot will move until close enough to attack, then root in place.
#* orbit - the bot will circle-strafe the player.
#* walk - moves along ground.
#* hitnrun - Bot heads towards the player when it's ready to attack and away when waiting to refire.
 
AiCore=walk
 
#determines if this is a boss or not. Bosses have their name overhead and special music.
 
Boss=1
 
#This is the name shown to the user. Only applies to bosses.
 
Name=Walter
 
#BodyParts defines what sprites will make up the robot's body.
#If it's a boss, it will use the parts Boss0 through Boss7.
#Otherwise it will use parts Robot0 through Robot29
#Use -1 to end the group.
#See sprite.ini 
 
BodyParts=9 24 23 24 23 21 -1
 
#This controls how the body parts are arranged. Options are:
#* fixed - Single-sprite body.
#* snake - Sprites are chained together, with the ending ones chasing after the head.
#* squid - Main body has 3 limbs of 2 segments each.
#* jelly - Fixed body with one additional sprite hanging below it. Intended to look sort of "jellyfish"-ish.
#* worm - Like snake, except the body segments will fall with gravity, collide with ground, and move up and down as the bot moves laterally.
#* turret - Uses three sprites: A fixed main body with a rotating arm on either side. 
 
BodyType=worm
 
# Size of the head in game units. For reference, player torso is about 0.1
 
BodySize=0.3
BodyColor=fff
 
#Movement speed
 
Speed=1.9
 
#How close the player must be before this robot will switch to alerted state.
 
SpotDistance=4
 
#Eye is the index of the eye used on the head. 
 
Eye=1
 
#This determines how the eye moves around. Options:
#sweep, scan, player, heading
 
EyeMovement=player
 
#color of the iris
 
EyeColor=000
 
#offset of eye from center of head 
 
EyeOffset=0 0
 
#Size of eye relative to head. 0.5 = half head size. 
 
EyeSize=0.4
 
#This is the overpower level of all attacks: Lasers, melee, and missiles.
#They correspond to the player levels. Bounce is the number of bounces each laser should do.
 
AttackPower=1
 
#The distance at which the robot can / should / try to use ranged attacks.
#How this number is actually used depends on which AICore is being used.
#Defaults to 3
 
AttackRange=6
 
#This controls if the robot will lead the player when aiming at them. (Makes them much more dangerous.)
#0 = Never
#1 = Always
#N = Every Nth shot is predicted.
 
AttackPredict=0
 
#This is used by lasers. Number of wall reflections before the bullet stops.
 
ShotBounce=0
 
#Number of shots in a single volley of laser fire. 0 to disable lasers.
#Refire is time between volleys, in milliseconds.
 
VolleyLaser=5
RefireLaser=2500
 
#Launcher is the kind of thing to fire. Options are:
#basic, homing, cluster, and robot <robot name>
 
Launcher=basic
 
#Volley is how many missiles to fire in a single attack.
#Refire is time between volleys, in milliseconds.
 
VolleyMissile=2
RefireMissile=1500
 
#This is how often (milliseconds) it can hit with melee attack. Set to 0 to disable melee
 
RefireMelee=150
 
#The MAXIMUM number of each powerup to drop. Defaults to 0.
#Actual drop rate is determined by dice rolls and game logic.
#If DropMissiles is non-zero, then it will always drop at least 1.
 
DropMissiles=10
DropShields=10
 
#Voice bank to play when seeing the player, being hit, or dying
 
Voice=2
 
#Armor is simple damage reduction. 
 
Armor=1
Hitpoints=10
 
# Legs controls how many legs the bot has. (Only works PROPERLY for "walk" AiCores)
# WalkHeight is the distance from the ground to the knee. WalkStride is the 
# lateral distance from the body to the knee. WalkCrouch is how high off the ground
# the HEAD is in crouching stance. WalkStand is how high when standing. These values
# are expressed in BODY LENGTHS, not world units. 
 
Legs=2
WalkHeight=3
WalkStride=3
WalkCrouch=2
WalkStand=5

Okay, the point of all that is that I wanted to move a bunch of the game systems into the config files. It’s now possible to add new robots to the game just by editing the text file. You can then fire up the game and spawn your creation using the console. Or you can open up the level file I showed you earlier and add the bot to the game.
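To give a feel for what happens on the code side, here is a hedged sketch of turning two of those values into something usable. Nothing here is from the game: `Color`, `parse_color3`, and `parse_body_parts` are invented names, and reading `89f` as three hex digits scaled to the 0..1 range is an assumption borrowed from CSS-style shorthand.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Color { float r, g, b; };  // each channel in 0..1 (assumed)

// Parse a three-digit hex color such as "89f". Returns false on bad input.
bool parse_color3(const std::string& text, Color& out) {
    if (text.size() != 3) return false;
    float channel[3];
    for (int i = 0; i < 3; ++i) {
        const char c = text[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        channel[i] = digit / 15.0f;  // 'f' maps to full intensity
    }
    out = Color{channel[0], channel[1], channel[2]};
    return true;
}

// Parse "9 24 23 24 23 21 -1": collect numbers until -1 or end of text.
std::vector<int> parse_body_parts(const std::string& text) {
    std::vector<int> parts;
    std::istringstream in(text);
    int value;
    while (in >> value && value != -1) parts.push_back(value);
    return parts;
}
```

So `BodyParts=9 24 23 24 23 21 -1` becomes a list of six sprite numbers, and `BodyColor=fff` becomes plain white.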

I originally opened up the game like this because a few people said they wanted to play around with modding. Once I was done I realized it was actually a lot nicer to make content for the game like this. I can mess around with game balance, difficulty, and pacing by altering the robot properties and without touching the code.

This also had some odd side-effects. At one point I had a special boss that was just hard-coded to create minions. When I generalized all robot properties I had to include this functionality. So now you can make any bot create other bots. (Instead of firing missiles, they fire robots.) I don’t even know what kind of trouble you can make with that. For fun, I made a robot that “attacks” by creating another robot of the same type. This creates an interesting powers-of-two problem that’s basically unsolvable if you let it go on too long. The robots can’t hurt you, but they can make copies of themselves until your computer runs out of memory, which I think is a kind of victory state for them.

So that’s how today went: I accomplished nothing, but I worked very hard on it.


  1. Ross says:

    ‘So that’s how today went: I accomplished nothing, but I worked very hard on it.’

    This is quite possibly the MOST accurate description of what software development actually entails I have ever read. Great.

    • Neruz says:

      That one line manages to completely encompass the entire software development process in a single thought.

      • Well sometimes it may go the other way. I’ve had moments where I went “Oh shit, this is gonna take days to do” (an hour later) “Well I’ll be damned”.

        I’ve also found myself surprised at my own foresight where I’ve made some functionality in such a way that I could easily change/improve/swap it later on.

        I’ve also searched for solutions to a problem on the net, only to find the solution written by a me a few years younger. *sigh*

        I always thought the phrase “I’ve forgotten more than you will ever learn!” was silly, but it’s not *facepalm*

        But yeah, a lot of times programming is a lot of faffing about.
        It is not unusual to write several pages worth of code and spend a day or two on it, then the next day you end up deleting it and writing a different way of doing it instead.
        A good programmer will generally delete more code than is actually “written” if that makes sense to anyone.

        I like the saying “As simple as possible but no simpler!” Some attribute that saying to Einstein (which is not really correct).
        THE gist of the saying though when applied to programming is that the code and anything related should be so simple and trimmed down as you can possibly make it, but without taking away functionality.

        A lot of the programming out there is very code bloated, partly due to nobody writing their own stuff, it’s just coding with Legos almost these days with some tiny code to glue stuff. And when bugs start flying nobody knows where the issue is or the performance bottleneck is.

        Any good programmer if they have the time and luxury to do so, usually prefer to write their own code from scratch. Which is why there is a ton of similar libraries of code or sets of tools out there.

        Unfortunately many higher up in companies that employ coders rarely know how messy it can get, and sometimes you can literally waste days doing nothing but try and understand how one thing is supposed to interact with another, or worse you need to re-write parts to make it work so you can finally use it (and realize you might be better off coding your own piece from scratch).

        Ever seen a plate of spaghetti and meatballs that have exploded? Trying to piece that together to what it originally looked like, that’s coding.

        • Volfram says:

          Yep, I’m forever saying “Well, this is going to be a pain in the neck to code.” 1 hour later. “Wow… was it really that easy?”

          Also “Alright, time to add a utility to…this… huh. So I wrote a utility for exactly this a year ago. And it’s twice as fast as what I was going to do.”

          The bottlenecks and difficulty of adapting other peoples’ code is why I decided to write my engine from scratch. I know exactly where everything is, and if something goes wrong(usually with my own code) I can dig down and fix it. Also a serious learning experience.

        • Steve C says:

          I’ve also searched for solutions to a problem on the net, only to find the solution written by a me a few years younger. *sigh*

          I hate that guy! He’s such an asshole! Stealing all my good ideas. Jerk.

  2. Aegis says:

    “The robots can’t hurt you, but they can create copies of themselves until your computer runs out of memory, which i guess is a victory state for them.” That’s a fantastic type of attack: the ultimate in passive aggressive antics.
    I love it! Keep up the good ‘work’.

  3. Probably not the answer you are looking for, but I found that making variable names (in ini files etc) all lowercase or all upper case to work very well.

    For example a [section] I would always do lower case and
    robot_badass=worm3

    just as an example.

    You can easily add some verifier to the loading code. Check if only a to z and _ are used for variable and section names and spit out “Error on line x” which would help find issues quickly.

    Likewise filenames, I always use all lowercase and underscore instead of a space. These are habits that I picked up during my early years of coding and web design (with *nix systems case does matter in filenames)
    Also with XML case matters so case Case CASE are all different.
    Another benefit is string matching as you can do simple binary matching which is lightning fast.

    Also, if you create a language.ini file for the strings themselves (if you want to support weird letters/characters like æ etc.), then using UTF8 is the best solution, as a UTF8 string can be carried by whatever ASCII/8-bit string code you have, and you just need to convert from UTF8 to the system’s character set at the GUI level.

    The only time you need to bite the bullet and make something case-insensitive is search and match stuff, like searching on google, or a forum or a website.
    Just think about your program, will the end user ever actually need case insensitivity for any strings of input or output? Can they name the save games? Number the files instead and let the user name them whatever they want even something the filesystem would not allow, it’s for display only instead.

    As for modders, just document that they need to write this or that in all upper or all lower case. Trust me, modders would rather be informed of such and adhere to it than have to find out through trial and error that some things must be this or that case and others can be anything.

    Also I see you are using decimals and hex. A little tip is to use $ at the front of hex numbers. And a . in floating point numbers. And while it’s tempting to write a shortform fff instead of $000fff I advise using the full hex form, also if colors can have an alpha then $000fff7f would be advised instead. (actually I wrote that in RGBA form while in memory on x86 systems it’s actually in ABGR form, but I’m getting sidetracked again…)

    What I’m saying is that you can simplify things if you just ensure consistency.
    Case is mostly just decorative, i find that _ serves a equal purpose in splitting up a variable name.
    I like how you have no whitespace around the = which is something I have pretty much by standard which allow the use of a strings that start with a space since there are no other whitespace to worry about.

    I know an old programmer saying is… Be strict in what you output and liberal in what you accept (I’m sure I remember the words incorrectly, google it for the original).
    But I have found that being strict in what I output and strict in what I accept makes for way more solid code.
    Because you can end up coding till you’re blue in the face to be liberal enough to handle “anything”.

    Seeing a line in the error log saying: Warning! variables must be lowercase, “Legs=2” found on line 42.
    Or: Warning! Variables must not contain any characters besides a to z and _ and 0 to 9, “Legs =2” found on line 42.

    Although using space might be useful for sections.
    [mook 1]

    [mook 2]

    And so on. this would allow to group the mooks, but also differentiate them (by the 1 and 2) with the space acting as a separator,
    which is much better than say [mook1] and [mook2] which you can’t easily loop/parse, at least not directly.

    • ET says:

      I personally will only ever use CAPS_WITH_UNDERSCORES or lowercase-with-dashes.
      Saves me the strain on my SHIFT-finger.

      The “Be generous in what you accept, and strict in what you output.” is from an article on the Linux philosophy, I think.
      I’m probably wrong, although it’s a good rule to follow.
      I just wish I could find the link to that old article.
      IIRC, it opened with an analogy using car sales lots:
      – Windows is a dealership selling rusty cars, which are actually built out of motorcycle frames with extra wheels bolted on. They get horrible fuel economy, but they were the first big dealership, and most people have gotten used to their quirks and performance problems.
      – Mac OS is a fancy foreign dealership which sells pristine white cars which get twice the fuel economy of the Windows cars, but *everything* is underneath a warranty sticker, and they usually just tell you to buy a new car if you want anything repaired.
      – Linux is a farmer’s field across from the other two dealerships, which the farmer has rented for a case of beer to some local university professors and hobbyists. They are giving away free armored tanks, which get quadruple the fuel economy of even the Mac cars, rarely break down, run great in the winter, can go 300 km/h, and accelerate from 0 to 100 km/h in four seconds. Hardly anybody drives them, because they don’t want to look like they are driving a tank, and don’t want to assemble it themselves from parts.

  4. Mephane says:

    I read the entire robot template, filled with glee. This is the stuff people used to mod Freelancer, which also had the entire game content (minus assets, of course) defined in INI files, even the weapons. It was even possible to change stuff like speed, turn rate and life time of missiles, to create anything from rockets that fly straight to homing missiles that will never stop their pursuit.

    It also led to some pretty impressive conversions of the game to completely different universes like Star Wars and Battlestar Galactica, and many variants of rebalancing mods of the original game (I played Victor’s Mod extensively).

    • swenson says:

      I read it all too! My first exposure to modding (and, now that I think about it, programming) was when I was opening random files and discovered a big file with all sorts of text about the monsters, NPCs, items, and spells of FATE. And you know the best part? When I changed stuff in that file, it changed it in the game!

      Needless to say, ever since I’ve had a fondness for a nice, easily-moddable ini file.

    • *nods vigorously in agreement*

      So it saddens me that ini and config and script files in games these days are encrypted in some weird way. (and these are single player games mind you or the single player client part of the game)

    • Chris Robertson says:

      Is it just me or is that file not valid?

      #determines if this is a boss or not. Bosses have their name overhead and special music.

      Boss=1

      #BodyParts defines what sprites will make up the robot’s body.
      #If it’s a boss, it will use the parts Boss0 through Boss7.
      #Otherwise it will use parts Robot0 through Robot29
      #Use -1 to end the group.
      #See sprite.ini

      BodyParts=9 24 23 24 23 21 -1

      I wonder what would happen in this case. Would it wrap around (using modulus or the like)? Would it buffer over run? Would the parser catch this and just error out?

      • Shamus says:

        Oooh. Good catch. The parser is far too stupid to catch this. Let me look…

        Okay, there are a LOT of sprites in the sheet. The first 32 are for normal robots. The next 8 are for bosses. Then there are 8 eyes. Then some particle effects.

        So if you say a robot is a boss, then it will try to use the first 8 boss sprites. If you go for #9, you’ll get the first eye. 8 more sprites and you’ll get the first particle effect. If you pass all of those, you’ll get some interface / HUD stuff.

        If you go past them all – say you reference body part 200 – then you will overflow a buffer and probably crash.

        I don’t actually care about grabbing non-robot sprites, but I should probably do a little bounds-checking to prevent the crash.
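A little bounds-checking along those lines might look like the sketch below. It assumes the sheet layout just described (32 robot sprites followed by 8 boss sprites); the constant names and `body_part_to_sprite` are hypothetical, and returning -1 for "no such sprite" is only one way to report the problem.

```cpp
// Assumed sheet layout from the description above.
const int kRobotSprites = 32;  // sheet indices 0..31
const int kBossSprites = 8;    // sheet indices 32..39

// Map a BodyParts entry to a sheet index, or -1 if it is out of range.
int body_part_to_sprite(int part, bool is_boss) {
    const int count = is_boss ? kBossSprites : kRobotSprites;
    if (part < 0 || part >= count) return -1;  // reject instead of overflowing
    return is_boss ? kRobotSprites + part : part;
}
```

Under this check a boss asking for part 9 or part 200 gets -1 back, so the loader can log a warning and skip the sprite rather than wander into eyes, particles, or past the end of the buffer.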

    • Bryan says:

      C&C: Red Alert (the first one) had a nice ini file as well, although it was embedded in the executable unless you copied it out to change it.

      Good system.

  5. Knut says:

    So this counts as your modding guide? :)

  6. Knut says:

    Also, for a nice explanation/introduction to Unicode, see this: http://www.youtube.com/watch?v=MijmeoH9LT4

  7. MichaelGC says:

    So now you can make any bot create other bots. (Instead of firing missiles, they fire robots.)

    This reminded me of the Gun Gun from Stolen Pixels.

  8. Svick says:

    Also, I need to convert between numerical values and strings. Qt strings do all of these. std::string does none of them.

    Aren’t you supposed to use std::stringstream for that? Though I admit it’s not very convenient.

    I do wonder how Python, Java, and C# do these things, since they don’t have this problem.

    For one, they don’t do the crazy thing C++ does that string is actually a template (std::basic_string<char>) and they specifically work with some Unicode encoding (AFAIK std::string doesn’t understand encoding at all).

    Specifically in C#, culture-sensitive operations are done using the current culture by default. This can actually be a headache, because it means "I".ToLower() returns different result in Turkey (ı) than in the rest of the world (i).

    But you can always specify the culture explicitly (or change the current culture). And there is also the “invariant culture”, which makes the most sense for things like parsing .ini files. (You want to use Settings, not Settıngs, even if you’re in Turkey.)

    • Csirke says:

      At least it seems like we got the standard number conversion functions for std::string in C++11: http://en.cppreference.com/w/cpp/string/basic_string

      This means it should work in Visual Studio >= 2012 and in gcc >= 4.7 (But you need the “-std=c++11” flag with gcc).

      • Nathon says:

        Aww, I’m sad GCC didn’t decide to go with “-std=C++0xb”. That would have been more awesome.

      • Svick says:

        Yeah, except there’s this part:

        It takes as many characters as possible to form a valid integer number representation and converts them to an integer value

        So, this code will work fine, even though I’m quite sure you don’t want it to work 99 % of the time:

        std::string s = "42c";
        int result = std::stoi(s);

        • Bryan says:

          That just matches the behavior of atoi() and strtol() in C. Those will pull digits off the start of the const char* you give them, until they find something that isn’t a digit. strtol(), at least, tells you how far it got.

          Which is not to say it’s *good* behavior, and using atoi() is particularly problematic because it has no way to signal an error, but it is there…

        • paercebal says:

          std::string s = "42c";
          int result = std::stoi(s);

          Looking at the API (it is always a good idea to RTFM), you’ll see that, like the C “strtod” class of functions, std::stoi lets you get the index of the first non-converted character (that would be 2, for ‘c’) and even the base (which is, by default, 10).

          In the end, your code should be:

          std::size_t i ;
          int result = std::stoi("42c", &i) ;
          std::cout << result << "\n" ; // prints 42
          std::cout << i << "\n" ; // prints 2

          This is not the coolest function around (std::string's interface was never good anyway), but it does its job, as long as it is correctly used.
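Putting the two points of this thread together, a strict wrapper is only a few lines. This is a sketch, not anything from the standard library: `parse_int_strict` is a made-up name, and it relies on `std::stoi` (C++11) reporting how many characters it consumed and throwing `std::invalid_argument` or `std::out_of_range` on failure.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical strict wrapper: succeed only if the whole string is an int.
bool parse_int_strict(const std::string& text, int& out) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(text, &used);
        if (used != text.size()) return false;  // trailing junk such as "42c"
        out = value;
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no digits at all
    } catch (const std::out_of_range&) {
        return false;  // does not fit in an int
    }
}
```

Note that `std::stoi` skips leading whitespace, so `" 42"` is still accepted by this version while `"42c"` and `"42 "` are not.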

  9. While looking at C compilers I stumbled across something that might be of interest to some folks here. (especially if you do C or C++ coding).

    This is the Joint Strike Fighter C++ programming style guide, http://www.stroustrup.com/JSF-AV-rules.pdf

    If you want to create code that is solid enough to ensure your fighter jet does not fall out of the sky, then this is a good starting point. Heh!
    Interesting to see that AV Rule 60 is what I practice (a pet peeve of mine really).

    • Volfram says:

      I call those “C-style” braces. They take up a bit more vertical space, but it makes it REALLY easy to spot a missed brace. Even when I was writing Java code(where they called C-style “bad” and Java-style “good”) I would write C-style and then convert to Java-style just before check-in.

      Possibly not as good a practice, but I also typically omit braces whenever possible, like an IF statement with only one instruction following the IF.

      Interesting: rule 59 is enforced in D by the compiler. It’ll refuse to compile if you have an if(stuff); or for(stuff); statement. Must be if(stuff){} or for(stuff){}

    • ET says:

      I like rule 60; It makes things easier to follow visually, at the expense of what…a single freaking line of code?
      Can you tell that I hate “Egyptian Brackets”? :P
      Seriously, though, whenever I get somebody who says “I need to save that one line of code; Scrolling is easier for large files!” I tell them to use a language which doesn’t even need those squiggle-brackets.
      Alternatively, I can usually find examples of code-mess, which easily exceeds one line of code per block/if/loop, because they have other habits, which are poor.

    • Nathon says:

      1: I’m appalled that people would consider a language like C++ in something so critical. (No, I’m not suggesting that Shamus should stop using C++)

      2: I used to care about brace styles, but Python and Go have conspired to make me stop. As long as you indent properly, it doesn’t matter where the braces go.

      • Deadfast says:

        As long as you indent properly, it doesn’t matter where the braces go.

        So nothing has actually been solved then :P

      • paercebal says:

        I’m appalled that people would consider a language like C++ in something so critical.

        “Appalled”?

        As a reference, the F-35’s software was coded 35% in C++, 55% in C, and the remainder in other languages.

        So, not only they did consider a language like C++, but they even used it, and they made the plane fly.

        Perhaps you did miss something at some point…

      • Richard says:

        They need performance.

        The microseconds lost in unnecessary run-time checks could be the difference between “plane flies” and “plane falls”.

        Also, the CPU isn’t very fast and it doesn’t have very much memory because its most important feature is that it keeps on working under rather nasty conditions.

        When used right, C++ and C give you much higher performance than managed code.

        When used wrong, the managed code still dies, just not as spectacularly as the unmanaged – so why bother? Just do it right, and test the blazes out of it to ensure you did it right.

  10. Alexander The 1st says:

    First off: Those are some awesome comments you have in the ini file structure. The only part I don’t know if I understand at 5:00 in the morning is the eye index part, and the eye offset – but even then, I imagine that would become relatively obvious with a bit of testing during a modding session.

    Having said that: you doing this makes me want to see a later project (Not anytime soon, since I’d prefer to be able to play the game first :p) where you turn this ini template system into a full-on engine, Unreal Development Kit style.

    Presumably with a behavior section, I guess? That would be essentially some subset of functional programming that allows people to create an AI section for the stuff or something.

    However, I totally recognise that that is almost certainly way more work than the current parser – just sounded interesting as an idea, is all.

    …Actually, now I want to try the idea in my own preferred language to kill time tomorrow. :p

  11. houser2112 says:

    “I need to find the [settings] section of the file even if it’s called [Settings] or [SETTINGS].”

    Assuming you’re not writing an ini parser in general, why? You’re making the ini files, so you know what they’re going to look like.

    “I do wonder how Python, Java, and C# do these things, since they don’t have this problem.”
    Not a Java evangelist, but it’s what I know:
    Stripping whitespace
    Convert String to integer
    Convert integer to String
    Case insensitive String comparison
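For comparison with the Java list above, the two numeric conversions do have standard counterparts in C++11: `std::stoi`, `std::stod` and `std::to_string`, all declared in `<string>`. The sketch below wraps them for ini-style values. The wrapper names `to_int`, `to_double` and `to_text` are made up for this illustration and are not from the post.

```cpp
#include <string>

// Hypothetical helper: parse an ini-style value such as "1456" into an int.
// std::stoi throws std::invalid_argument if the text does not start with a number.
int to_int(const std::string& text) {
    return std::stoi(text);
}

// Hypothetical helper: parse a value such as "0.40" into a double.
double to_double(const std::string& text) {
    return std::stod(text);
}

// Hypothetical helper: turn a number back into text for writing to the file.
std::string to_text(int value) {
    return std::to_string(value);
}
```

Stripping whitespace and case-insensitive comparison still have no one-call equivalent on `std::string`, so those two remain hand-written.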

    • swenson says:

      Because then you don’t have to remember what you did before, and anybody who mods it doesn’t have to worry about it later. The point of programming, after all, is to get the computer to do stuff you don’t want to do, and something I definitely don’t want to do is have to look up whether or not I should be capitalizing a setting name.

      I mean, if I have a setting for leg length, did I use leglength? LEGLENGTH? leg_length, LEG_LENGTH, LegLength, legLength, Leglength, Leg_Length, Leg_length, or leg_Length?

      It’s a whole lot easier to let the parser handle it, IMO.

      • Well if anyone ends up messing up leg_length that much then they got bigger issues I’m sure :P
        Messing up the spelling is not so bad if something tells you (debug window or error log) that you messed up.
        It’s worse if no error is raised and you realize that leg_length and leglength are actually two different ones, but you got so used to the parser to “fix” your mistakes.

        Personally I have an issue with something that “fixes” stuff I type automatically.
        The only exception would be the input parser of a text adventure game to avoid too much “hunt the verb” (old school folks that know what the IF Archive is and Infocom was, should know what I mean).

  12. Kronopath says:

    Ah yes, string normalization. A big pain in unicode. I’ll try and explain it.

    Check this letter out. What letter is it?

    а

    If you said it’s the letter ‘a’, you’re wrong. It’s actually the Cyrillic letter ‘а’ (unless Shamus’s commenting system converted it, that is).

    Without proper string normalization, I could make an account on this website called “Shаmus” with a Cyrillic ‘а’ and fool everyone into thinking I was our dear author.

    Here’s another pop quiz. What’s the difference between these two letters?

    é é

    Again, unless my computer or Shamus’s commenting system converted it, these are two different strings. One is a single character é, and another is an ‘e’ followed by a combining acute accent character ( ́).

    If your software doesn’t take these kinds of differences into account, you’ll have visually identical strings that are considered completely different by your software. And even professional software doesn’t always get it right – if I do a search in this page for “Shamus” using Chrome, it doesn’t pick up the slightly different Cyrillic-infused “Shаmus”.

    Normalization of a string is a way of handling this, by converting all these visually identical sets of characters into a single canonical form. For example, converting my two-character ‘é’ into the single-character ‘é’. But depending on which canonical representation the library you’re using chooses for each, things like sorting of strings can end up having different results.

    Long story short, it’s a pain to handle, and C++ does nothing to help you handle it.
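A minimal sketch of the problem described above, assuming the text is stored as UTF-8 in a plain `std::string`. The bytes are written as hex escapes so the encoding of the source file cannot alter them. The variable names and `same_bytes` are invented for this illustration.

```cpp
#include <string>

const std::string latin_a       = "a";          // U+0061 LATIN SMALL LETTER A
const std::string cyrillic_a    = "\xD0\xB0";   // U+0430 CYRILLIC SMALL LETTER A
const std::string e_precomposed = "\xC3\xA9";   // U+00E9, a single code point
const std::string e_combining   = "e\xCC\x81";  // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// std::string compares raw bytes, so text that looks identical on screen
// can still compare as different.
bool same_bytes(const std::string& a, const std::string& b) {
    return a == b;
}
```

Both pairs compare as unequal here, and the two forms of é do not even have the same length in bytes (2 versus 3).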

    • Shаmus says:

      Sorry, just had to try this.

      Edit: Ok, you definitely have a point there about the importance of normalization…

      • Volfram says:

        EVERYTHING I KNOW IS A LIE NOW!

      • That’s really cool, but also terrifying. Anyone could be an imposter – it’d be obvious if it was Shamus, but anyone else on this site could easily be impersonated with a simple switch of characters.

        I could even be an imposter.

        • Don’t be silly.No one would ever attempt such a thing.

        • Paul Spooner says:

          Except for the gravatar, which requires that the impostor knows your e-mail address. The address is very difficult to reverse-engineer. I know, I’ve tried.

          • For example, here’s a guess at my e-mail address. The name shows up the same (even with the cyrillic), but the gravatar doesn’t. Kind of a “seal of authenticity” if you are familiar with the commentators on the site.

            • Paul Spooner says:

              I suppose you could create a new gravatar account and upload a copy of someone else’s avatar in order to impersonate them. It would be a lot of work, and possibly difficult to find a copy of the picture with enough resolution to look the same (since this site scales them way down.)

              EDIT: Is this really Paul, or someone pretending to be him? Or even someone pretending to be him pretending to be someone else pretending to be him?

              I DON’T EVEN KNOW!!! WHAT IS REAL?

              • Not Daemian Lucifer says:

                Yep. Works just fine.
                Seriously though, I’m not DL.

                EDIT: In fact all but one of the above are actually Paul Spooner! Guess which one!

                • Paul Spooner says:

                  We must never use this power for evil. It must be sealed away for all time. And the seal broken into seven pieces and distributed to the seven sages. And maybe there can be some sort of element themed castles. And there would be, like, a “Light” castle, and maybe a “soap bubbles” castle, because soap bubbles are kind of like “air” right?

                • Cuthalion says:

                  Well, obviously. Pshaw. I mean, DL never uses spaces after periods.

      • bucaneer says:

        I don’t think you even need to use any special characters for that – nothing stops me from putting a plain “Shamus” in the name field here. It would be more relevant in the forums, especially since they don’t display join date or post count next to each post. Though impersonating Shamus is more difficult than impersonating a regular user, what with golden comments and red forum name.

        • Shamus says:

          Or does it?

          EDIT: Nope. Nothing stops me from using someone else’s name.

          Apparently the comments system isn’t intended to perform identity verification.

      • Kronopath says:

        Oh man, this is fantastic. But at least it knows well enough to not put the gold border around it.

        I should probably make myself a Gravatar for this email, lest someone show up with a Cyrillic “Kronopаth” account.

    • Drew says:

      That’s an excellent explanation, thanks. (And a pretty solid example as well, I might add)

    • Lalaland says:

      Fine explanation there mate, I only ran into this as the Irish language uses the fátha (aka an accent in French) which would bedevil me in college as college is the kind of place where David Murphy will decide his name is Daithí Ó Múrchu and annoy bad programmers like me :) Especially as young Daithí could be lazy and forget to type ‘i + AltGr’ and insist the system is at fault for not realising Daithi and Daithí are the same person.

      When the web moved to understanding Unicode URLs this became a very favourite technique for phishers too, that Cyrillic ‘a’ found whole new continents of people to befuddle

    • ET says:

      So…isn’t this a shortcoming of the underlying structure of Unicode?
      I mean, I’m sure there were reasons for including weird stuff like this, but…
      Shouldn’t the mapping from bytes-to-visual-characters be one-to-one?
      This (and other stuff, like the reverse-left-to-right character) seems like a headache that they should have foreseen when writing up Unicode.

      • Alan says:

        Bytes to visual characters? Bytes are (in practice) 8 bits. No way we’re fitting every language ever into the 256 options we have there.

        But assuming you just meant “a single number,” you’re still stuck. The Cyrillic a is different than the Latin a. While in many fonts they may be identical, because Cyrillic has such a different feel, a font’s author may decide to give it a different representation.

        Then you get to “close enough”. Limiting myself to just characters on my (US Qwerty) keyboard: I have |, l, and I. At small point sizes in sans serif, those can be difficult to distinguish, but they are different. And it turns out that other languages have similar, but not quite identical characters that will look identical at a small font size.

        (Edit: looks like I is distinct using the default font Shamus set up. Try copying this into the URL section of your web browser for a better example: |lIi )

        Finally, there are a few reasons that you can combine characters together, but a big one is that some languages really work that way; it would be impractical to enumerate every valid combination.

      • James Schend says:

        There are Unicode variants that have a 1:1 byte-sequence-to-character relationship, UTF16 being the most popular. (It’s what Microsoft standardized on before UTF8 existed, so it’s still very common.) There’s also UTF32, which is the same concept but has enough characters to support more Asian languages.

        But that doesn’t solve the problem of “visually identical strings will be considered different”. Many languages share characters, and many languages have characters “close enough” that humans can’t casually tell them apart. (Kronopath’s post has a couple good examples.) This is a huge problem with URLs, for example, where phishers or scammers can make a site that a reasonable person will read as “bankofamerica.com”. (Which is why your browser displays a warning if the URL contains international characters, BTW.)

        • Alan says:

          UTF-16 is variable length, although most of the time there is a 1:1 relationship.

          Even UTF-32 doesn’t escape, as some characters that people need to use don’t have dedicated Unicode code points. For example, the only way to represent “ю́” in UTF-32 is with two 32-bit characters (the ю and the accent over it).

          I recommend “UTF-8 Everywhere” to programmers, it includes some good coverage of why UTF-16 is pretty much never the right answer and why UTF-32 probably isn’t worth using. (Thanks again to whoever linked it in a comment in an earlier Project Good Robot post!)
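The two points above about code units can be checked with the C++11 string types. This is only a sketch: the emoji U+1F600 is an arbitrary example of a character outside the Basic Multilingual Plane and is not from the thread, and the variable names are made up.

```cpp
#include <string>

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 has to
// store it as a surrogate pair: two 16-bit units for one character.
const std::u16string emoji_utf16 = u"\U0001F600";

// The same character fits in a single 32-bit unit in UTF-32.
const std::u32string emoji_utf32 = U"\U0001F600";

// The accented letter from the comment: U+044E (ю) followed by
// U+0301 (combining acute accent). Still two units, even in UTF-32.
const std::u32string accented_yu = U"\u044E\u0301";
```

So `size()` reports 2, 1 and 2 respectively, and none of those numbers is reliably "the number of characters the user sees".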

          • Bryan says:

            Yeah, no kidding. Being able to use the fact that a single zero byte still terminates the string in UTF-8 saves a lot of headaches, and then you only need to convert in “one” place: where you display the bytes to the user.

            (And some of the time you don’t even need to do that…)

            Of course, parsing — especially “case-insensitively” — is still a huge pain. :-/

          • James Schend says:

            I stand corrected, thanks.

    • Svick says:

      You’re certainly right about the second part (combining characters).

      But as far as I know, none of the defined normalization forms would convert Cyrillic а to Latin a.

    • Myself I think normalization is just wrong.

      For example, “Roger Hågensen” in international form becomes “Roger Haagensen” however in Norway “Hågensen” and “Haagensen” are actually two distinct and different last names. Using “Hagensen” would also be wrong, but less likely to cause clashes.

      Then there is Norwegian and Swedish. Both languages have “a” to “z” like the English and US alphabet, but Norwegian has the three additional characters æøå (and the uppercase ÆØÅ)
      while Swedish has the three additional characters åäö (and the uppercase ÅÄÖ)

      While å and å are the same in both languages,
      ø and ö are not quite the same as you can see (hopefully),
      and with æ and ä it gets even weirder as ä could visually be mistaken for å.

      You would not want normalization as for names you would end up renaming a person (and depending on the database in question that could be a huge issue), there’s enough identity mixups with people having the same name even as it is.

      I’ve found the best way to handle unicode is to not handle it at all (I just store it as UTF-8 as that ensures that code that only understands ASCII text does not mangle the strings). And I lay the burden of unicode on the font system instead. (usually in my case that would be Windows itself).

      If a user wishes to save a file with a Hebrew filename or using Japanese characters then they are free to do so; none of my code directly supports it, nor does it prevent it. A filename is a filename, a variablename is a variable name, and so on.

      In some cases it would be harmful to normalize, take for example the user folder on Windows which can have odd characters, if you normalize that you end up writing either outside the user folder or into the wrong userfolder if you are really unlucky.

      Normalization of Unicode characters should never have been allowed to begin with.
      It’s enough of a security risk that people can get phished with lookalike characters, now if software in addition normalizes characters you could end up at the wrong site even if you did type a url correctly. (or as many do, type in google and click the very first link they see, lazy people).

      And on Linux for example “Filename” and “filename” are two different file names as the filesystem is case sensitive. Windows users have been spoiled by case insensitivity.
      XML is case sensitive. Web URLs (minus the domain part) are case sensitive (if the server is Linux based at least), webpages in HTML are not case sensitive unless it’s XML in which case it is.

      For a programmer it’s a minefield out there. Most users luckily never notice this, but that is mostly thanks to programmers cursing for hours to get things to cooperate with each other.

      Also a letter that looks identical to another letter usually does not get typed by accident.
      And in areas where the correct characters are vital a documented standard is specified to indicate what characters are allowed.

      • Zak McKracken says:

        One case where normalization still helps:

        1: Searches. If I’m outside of Norway, searching a list for someone named “Roger Hågensen” but don’t have the appropriate letters on my keyboard, it would be nice if I could still find him (and possibly his nemesis “Roger Haagensen” at the same time, but that’s fine). Or if I don’t remember which type of accent some French word is using … very very helpful!

        The important bit is that the original data is not changed but as in case-insensitive searches, you’re being special-character-insensitive.

    • Daemian Lucifer says:

      The solution is simple:Rimuv ol douz annecesari letrz.All around the world.

      Though I must admit,then we wouldnt have ÆØÅ,which would be a shame.

  13. mwchase says:

    On normalizing unicode: it’s kind of sort of analogous to normalizing vectors. Kind of.

    See, when you normalize a vector, you’re throwing away some information the vector encodes that you don’t care about. Now, the information that you don’t care about, in a unicode string, is the sequence of code points chosen to specify a particular glyph, because there are many, many glyphs that map to multiple sequences of code points. (For example, there’s a code point for é, but there’s also an “acute” combining mark that you can put with an e, and it corresponds to the same glyph.) If you don’t make sure that any given glyph has just one canonical representation in your strings, then comparing the strings’ bytes is not guaranteed to give “true” for two strings that look the same.

    So, normalizing a unicode string is basically mapping code points to glyphs to canonical code points. Basically.

    (Anyone want to correct me? I read up on this stuff years ago, for fun.)
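To make the "one canonical representation" idea concrete, here is a toy sketch that handles exactly one case: an `e` followed by the combining acute accent is rewritten to the precomposed é. Real normalization is driven by the Unicode data tables and covers far more than this. The function name `compose_e_acute` is invented for the illustration.

```cpp
#include <cstddef>
#include <string>

// Toy composition step, NOT real Unicode normalization: it only knows
// that 'e' + U+0301 (combining acute) can be written as U+00E9.
std::u32string compose_e_acute(const std::u32string& in) {
    const char32_t kCombiningAcute = 0x0301;
    const char32_t kEAcute = 0x00E9;

    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'e' && i + 1 < in.size() && in[i + 1] == kCombiningAcute) {
            out.push_back(kEAcute);
            ++i;  // skip the combining mark that was just consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

After this step, both spellings of "café" hold the same code points, so an ordinary `==` gives the answer a reader would expect.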

    • Zak McKracken says:

      only a small remark:
      The two “é” characters in Kronopath’s post further up look different on my machine. Meaning: I think that some fonts you might be using use the same glyph for different unicodes but that may be different for other fonts and/or other machines with different locales or whatdoiknow.

  14. Rodyle says:

    Shamus, if you want to alleviate these problems, just use Assembly. Sure, you’ll have to write a different version of the code for each architecture, but it’s a good deal, because you’ll never have to bother with strings again!

    /sarcasm off.

    I was wondering: what did you write that it took so long to parse just a few lines of text? I literally cannot think of a way in C++ to do that without cheating by using huge unnecessary loops.

    • Alan says:

      Don’t know if it was Shamus’s problem, but lots of calls to std::string::append (or its close relatives push_back or operator+=) can kill you as the memory needs to be reallocated and copied. Exactly how awful this is will depend on how often you’re doing it and how std::string was implemented for your platform (I believe the spare capacity size is implementation dependent). I’ve actually seen code that read one byte at a time and called append repeatedly.

      • Rod says:

        You can call string::reserve to preallocate a large chunk of memory before you need it. Unfortunately, esoteric stuff like that is sometimes needed for good performance in C++.
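One way to see what `reserve` buys is to count how often the capacity changes while appending one character at a time. This is a rough sketch, and `count_reallocations` is a name made up for it. The count without `reserve` depends on the standard library implementation, so no specific figure is claimed.

```cpp
#include <cstddef>
#include <string>

// Append `count` characters one at a time and report how many times the
// string's capacity changed, which is how many reallocations happened.
// If `preallocate` is true, reserve() is called once up front.
std::size_t count_reallocations(std::size_t count, bool preallocate) {
    std::string text;
    if (preallocate) {
        text.reserve(count);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = text.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        text.push_back('x');
        if (text.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = text.capacity();
        }
    }
    return reallocations;
}
```

With `preallocate` set, the count is zero. Without it, common implementations grow the buffer geometrically, so even a million appends should only trigger a few dozen reallocations.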

      • Simon Buchan says:

        C++ is *very* good at appending, actually. It’s why all containers have a .capacity() and a .reserve() function: no re-allocation needs to happen unless the string .size() would become greater than .capacity(), and when it does, the .capacity() can be doubled (or some other factor), meaning that for any arbitrary number of +=, append or push_back() calls, each character is only copied at most twice (or some other factor), and there are only O(log n) reallocations: this is *fast*.

        I would guess Shamus’ slow parsing is due to using <iostream> which is terrible: unnecessary per character(!) thread locking and <stdio.h> synchronisation, locale conversions or debugging support are really easy to invoke. Best most of the time to fread()/ifstream::read() it all into a buffer and parse from that.

        Also <regex> is pretty awesome.
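The "read it all into a buffer and parse from that" advice could look roughly like this. It is a sketch rather than anything from the actual game code, and `read_whole_file` is a hypothetical name. The file is opened in binary mode, so line endings arrive untranslated.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Read an entire file into a std::string with a single read() call.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& path) {
    // std::ios::ate opens the file positioned at its end.
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file) {
        return std::string();
    }
    const std::streamoff size = file.tellg();  // position at end == file size
    if (size <= 0) {
        return std::string();
    }
    file.seekg(0, std::ios::beg);
    std::string buffer(static_cast<std::size_t>(size), '\0');
    file.read(&buffer[0], static_cast<std::streamsize>(size));
    return buffer;
}
```

The parser then walks over the returned string in memory instead of pulling characters through the stream one at a time.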

        • paercebal says:

          I agree: It is my observation people using raw C-style char arrays will usually get buffer overruns, and rarely a perceptible performance win, and people using raw malloc-ed char * memory chunks will get worse performance memory and/or speed-wise, not even mentioning the memory leaks issue that plagues that kind of unsafe code.

          C++ string is heavily optimized in this. First, it will have the small string optimization, meaning that for small strings (usually, 14 characters plus the zero), there will be no allocation whatsoever. Then, unless the one who coded your STL is retarded (and in this case, I hope you didn’t pay for it), the allocation will increase the size of the string with a constant factor, usually something around 2.

          Meaning that AT MOST, you’ll have, on a 64-bit platform, 64 reallocations. My guess is that you’ll have other problems before reaching that number.

          Even then, as already mentioned, the “reserve” function can be used to preallocate the right amount of memory, thus avoiding reallocations altogether.

          And of course, everything there is exception safe, and fully inlined.

          For the same reason std::sort is usually faster than qsort, you’ll get a lot of performance benefits by using a std::string written by a professional, instead of using a “vanilla” String class.

          Now, if only the C++ committee could standardize a true string helper functions library on top of that…

  15. Volfram says:

    Aaaaaaaand suddenly I’m REALLY glad I use D, which has most of that stuff either built into the language(you get access to Strings from the get-go, without importing ANYTHING) or the std.string library(white space stripping, parsing, and there’s a Soundex feature, which I had never heard of before. No normalization, unfortunately.)

    No, this is NOT a suggestion for Shamus to drop everything and learn D now. That would be about seven shades of impractical at this point. I am trying to promote D, though, because as a language it really needs more love.

    • Dave B. says:

      You may have already answered this but…

      This question will probably reveal my ignorance, but here goes. I hear this discussion over and over about what language to use for game development or OS development, and the consensus is usually that you should use C++ because of its performance and low-level relationship to the hardware. My question is, has anyone designed a direct competitor to C? We’re swimming in high-level languages, but I’ve never heard of a low-level language that is intended to “do what C++ does, but better.”

      I skimmed the wikipedia article on D, and it looks to me like that’s exactly what it’s trying to do. Am I wrong?

      • WillRiker says:

        These days, high level languages are much more useful because 99.99% of programming tasks don’t require blinding performance; programmers are considerably more productive in higher level languages, at the cost of performance that they don’t need.

        Still, there are a few languages that attempt to be useful successors to C for systems programming and the like: D, as mentioned. There’s also Go (invented by none other than Rob Pike and Ken Thompson). Objective-C isn’t big outside the Apple world, but it’s another attempt at adding Object-Oriented features to C.

        Honestly, though, unless you are writing the next Crysis game, you don’t need the performance of hand-tuned C++. I am 100% confident that Good Robot could have been written in Python and still hit Shamus’s performance goals (note, this is not a suggestion that he should have written it in Python, it’s just an observation).

      • ET says:

        Another note about using C:
        Don’t use it unless you need to.
        As WillRiker said, most of the time you don’t need blinding performance.
        And when you do, then you’ll rewrite that single library function, algorithm, or whatever.
        (And if you really need performance, you might even need to write assembly!)
        For the love of Ketchup, keep the rest of your code in your high-level language!
        Remember, programmers are expensive, computers are cheap, and we’re still living with Moore’s Law.*
        Also remember, that before you go ahead and start throwing in optimization stuff in your original language, or any other language, that you check what the Big-Oh cost of your thing is.
        It doesn’t make much sense to go wasting weeks shaving microseconds off of something, if you could have switched algorithms, and shaved seconds or minutes off of run-time! :)

        * Well, actually another law, or an extension of Moore’s law that applies to things other than transistor-count. But you get the idea: we’re still on an exponential curve for performance in many areas.

        • Nathon says:

          On writing assembly code: I’ve had to write assembly for some very weird architectures, but these days C compilers are good enough that they can be convinced to generate decently performant object code the vast majority of the time. If you’re considering hand-writing a routine in assembly, write it in C first then use compiler tricks to get the assembly output to be fast. It’s more portable and it takes less time to write it.

          edit: Whoops, this was supposed to be a reply to an earlier comment by WillRiker.

      • Nathon says:

        Pedantry time! C is not the same language as C++. They are somewhat related in that it is possible to write a C program that will not cause a C++ compiler to barf, but that’s about as far as it goes. The ABIs are different, the name mangling is different, and the semantics of identically-named things are different.

        There are several languages that have been written with the goal of replacing C (and C++). Go and D are two great examples. As a C programmer, I think they’re both great languages.

        • Volfram says:

          C != C++

          Yes! This is a very important distinction! C is much closer to the hardware than C++, and you can write with a much better eye on code performance and making sure the program does EXACTLY what you want it to do. C++ and D are both more abstract than C, and will generally never provide the same level of performance or micro-management.

          • Daemian Lucifer says:

            Bah!Real programmers use machine code,not some fancy language.

            • My college agrees with you. It’s why I have a physics degree instead of a comp sci degree. Going from Java and practical programming to C and academic programming was way too painful for me. On the plus side, I can make an infrared sensor from parts and then create a program that will do something when it trips, which is nifty.

              Give me a high-level language any day of the week. I respect those who can deal with C, but I’m not one of them.

              Oh, and Shamus, love the clear files! Clear coding (and comments) are wonderful and so rare.

            • Neil Roy says:

              I still use C. I have written some simple games with C++ and I see the usefulness of it, but I guess I’m just old school. My current game I am working on is 100% C. I set my compiler (MinGW 4.7.0) to the 2011 standard of C actually.

          • paercebal says:

            C is much closer to the hardware than C++

            This is so wrong…
            :-)

            C and C++ have exactly the same level of access to hardware, because C++ has the very same API as C to do this.

            In fact, C++ can automatically benefit from more optimizations from the compiler than C. For example, it is easy to produce in C++ code that will be either partially or fully executed at compilation time, while a C compiler would only generate the code to execute it at runtime (the classical example is comparing qsort vs. std::sort on large arrays of doubles, for example, my own measurements on VC++2010 showed qsort to be 50% slower despite commonly believed to be “close to the metal” when compared to std::sort’s use of a functor).

            And while C++ has the same access to hardware and better optimization potential, C++ also has better constructs for efficient, type-safe, zero-cost abstractions. Which means fewer bugs, and the same performance.

            For example, function pointers will *never* really get optimized in C (or in C++), while C++ functors, and in some cases even C++ virtual functions, can be fully inlined. And at no moment will you use void * pointers, and casts, and other C tricks, meaning you’ll have fewer bugs.

            My advice: When comparing C and C++, please code the C++ part in real C++. Using a C++ compiler to compile C code will get little benefits, and is not a credible comparison.
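For readers who have not seen the two side by side, this is roughly what the qsort versus std::sort comparison looks like in code. The sketch only shows the structural difference and makes no timing claim. The names `compare_doubles`, `LessDouble`, `sort_with_qsort` and `sort_with_std_sort` are invented for the illustration.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C style: qsort reaches this comparison through a function pointer
// and passes the elements as untyped void pointers.
int compare_doubles(const void* a, const void* b) {
    const double x = *static_cast<const double*>(a);
    const double y = *static_cast<const double*>(b);
    return (x > y) - (x < y);
}

// C++ style: the functor's type is a template argument of std::sort,
// so the compiler can see operator() and is free to inline it.
struct LessDouble {
    bool operator()(double x, double y) const { return x < y; }
};

std::vector<double> sort_with_qsort(std::vector<double> values) {
    if (!values.empty()) {
        std::qsort(&values[0], values.size(), sizeof(double), compare_doubles);
    }
    return values;
}

std::vector<double> sort_with_std_sort(std::vector<double> values) {
    std::sort(values.begin(), values.end(), LessDouble());
    return values;
}
```

Both produce the same ordering. The typed version also needs no casts, which is the type-safety half of the argument.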

            • Anachronist says:

              Partly agree, partly disagree.

              I kinda skipped C++, going from C to other languages, but when I did write something in C++ I had no problems using the standard C string libraries, which can do pretty much everything Shamus wants.

              In fact, the first C++ compilers were really just fancy preprocessors that turned your C++ source code into C, and then used a regular C compiler to create the executable. I see no good reason, beyond introducing a bit of non-OOP code, to avoid using the highly portable and optimized standard C library functions to manipulate strings.

              The tools are all there. They are meant to be used.

              • Alan says:

                While C++ has a lot of sharp edges, std::string is a win for almost every situation over handing around char*s. You eliminate questions of who owns memory. If you need to be able to handle arbitrarily large strings, std::string will worry about managing the length for you, eliminating a bunch of possible bugs if you solved it yourself. For the functionality shared between std::string and char*, they’re both highly portable and optimized. There is a bit more overhead for the std::string, but the trade off is that in some cases (strlen vs std::string::length, strcat vs std::string::append) std::string is strictly faster.

                For those cases where std::string doesn’t directly offer functionality you’d find in, say , you’ve got two options. One, you inevitably can do it, often by dragging into to the mix. Regrettably, this is often verbose. Fortunately std::string plays nicely with the C string functions, I certainly feel no guilt about breaking them out!

                As for “which can do pretty much everything Shamus wants,” I don’t think in the sense that Shamus means. He calls out stripping white space and case insensitive string comparisons, and the standard C library isn’t going to give you functionality that does that directly. It’s not hard to write those functions. It’s equally not hard to write in C++ (in part because we cheat and use C’s functions. :-) ). Indeed for stripping whitespace, I think the C++ will be more succinct, as we’re looking at something like:

                std::string strip(const std::string & in) {
                    const char * WHITESPACE = " \t\n\r\f\v";

                    size_t start = in.find_first_not_of(WHITESPACE);
                    if(start == std::string::npos) { return ""; }

                    size_t end = in.find_last_not_of(WHITESPACE);

                    return in.substr(start, end-start+1);
                }

                versus something like http://stackoverflow.com/questions/122616/how-do-i-trim-leading-trailing-whitespace-in-a-standard-way
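The other function Shamus asks for, case-insensitive comparison, is similarly short when it borrows `tolower` from the C library. This sketch is ASCII-only (it does nothing sensible for the Unicode cases discussed elsewhere in the thread), and the name `equals_ignore_case` is made up.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// ASCII-only, case-insensitive equality test for two strings.
bool equals_ignore_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast to unsigned char first: passing a negative char value
        // to std::tolower is undefined behaviour.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) {
            return false;
        }
    }
    return true;
}
```

That is enough to match `[settings]`, `[Settings]` and `[SETTINGS]` as the same section header.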

              • James Schend says:

                In modern C++ compilers, if you turn on exception handling (on by default), believe it or not C++ does have a runtime and will make your program slower than the exact same code in C. (Even if you don’t use exceptions.)

                So I would say that the statement “C is closer to the hardware than C++” is true– assuming the C++ has exception handling enabled.

                • Alan says:

                  I would go so far as to say, if one is going to disable exception handling in C++, maybe they should use C. This isn’t a bad thing, there are situations where it’s (IMHO) clearly correct: OS kernels, libraries (C is the language almost every other language plays nice with), embedded systems, any situation in which memory and/or processor time is extremely precious. But if you turn off exception handling, a lot of C++ idioms fail. You end up creating code that makes it easy for new developers to do the wrong thing, like failing to check what new returned. You can’t (easily) use chunks of the standard libraries because some of their error reporting mechanisms disappeared. The benefits of being able to write what is essentially C With Classes are dubious in light of the costs.

        • Dave B. says:

          Thanks for clarifying. I was a bit unclear on how close C is to C++, and got careless with the terms.

        • James Schend says:

          You think *Go* is a great language? The Go that shows up in DailyWTF threads constantly?

          Hmm.

          • Alan says:

            You want as many of these to be true as possible before using a programming language:

            Core language is reasonably stable – Otherwise you’re going to be spending a lot of time keeping up and not getting actual work done.

            has been around for at least 10 years – Anything younger than this and you’re pretty much guaranteed there are weird holes. This is roughly how long it takes for a language to jettison “purity” for practicality. Even for languages that don’t give a crap about purity, you’re looking at roughly this long until the final major holes are identified and closed. (It took Java 9 years to add generics. Ah, the good old days of the 90s, when Java fans insisted that the lack of generics was a feature.)

            has an active user community – You want to be able to ask other people questions, to ask Google questions and find existing discussions. Also, an active user community is a good sign that a language will continue to be supported.

            has multiple implementations (the compiler and/or interpreter) – A single implementation is suspicious, doubly so if it’s proprietary. For your own safety, tie your fortunes to as few other companies as possible.

            has an open source implementation – Now you’re not dependent on anyone else! If you’re desperate enough, you can port and maintain it yourself.

            has a working debugger – Either this is self evident, or you really need to learn to use a debugger.

            has non-trivial third party libraries available – This is another sign of an active, healthy user community.

          • Volfram says:

            Go does WHAT with dependencies!?

            Because THAT isn’t a disaster waiting to happen at all!

            • James Schend says:

              Oh there’s more. Go doesn’t have constructors, for example. But it has a dumb convention that every class that requires construction has a function named NewX() which does the construction. But calling code has no particular requirement to use it.

              Which means instead of the language just supporting constructors in the first place, literally every class in Go has to have a bunch of boilerplate in every member function to ensure it’s been constructed properly first.

              Oh and classes that don’t need construction can omit the NewX() function. How do you tell whether a class needs construction or not? You read its documentation.

              Also, function and property visibility is determined based on whether its name starts with an uppercase or lowercase letter. Meaning, Go (for all practical purposes) doesn’t support Unicode variable names, since many languages do not have a concept of “capitalization”. This language was created in 2007.

        • Shivoa says:

          One of the interesting points of C++ is that it is the kitchen sink of languages and so does contain C (with some compatibility issues, so it is not a strict superset; some interpretations differ, but it goes as far as to include the complete C standard library in the C++ standard library, in the same std:: namespace). C++ is obviously a much larger language, but those who are quick to claim a vast difference rather than rough superset status should consider that C99 is not a superset of C89 either; that is not an important line in the sand.

          Speaking of the C standard library being in C++: if your C++ strings are just C strings (null-terminated char sequences), then the core functionality is right there. In the ’80s you would just call atoi/atol without worrying about error handling; later you learned to shun anyone so foolish and to call strtol instead (all in cstdlib), using the error handling it provides. atoi/atol have a specification that allows undefined behaviour if you didn’t validate your string before calling, which is why you should NEVER use them today. Let strtol do the validation built into the library rather than calling something that can do anything it wants the second you pass in a string where “the converted value would be out of the range of values representable” by the return type.
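
          A minimal sketch of the strtol approach, assuming a std::string holding something like an ini value. parse_int is a hypothetical helper, not a library function; the only real calls are std::strtol and the errno/ERANGE check it documents.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a base-10 int from text.
// Returns false on empty input, trailing junk, or out-of-range values.
bool parse_int(const std::string& text, int& out) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, 10);
    if (end == begin || *end != '\0') return false;  // nothing parsed, or junk after the number
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
    out = static_cast<int>(value);
    return true;
}
```

          Unlike atoi, a value such as "12abc" or an absurdly long number is reported as a failure instead of being silently mangled.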

      • Volfram says:

        Short answer: yes, D was designed as a replacement for C++, to help eliminate many of the pitfalls, flaws, and inbred madness that C++ brings to the table.

        My rule of thumb: D for application programming, C for embedded programming.

        Long answer: what WillRiker said. He said it better than I could, and probably knows the field better than I do.

      • Mark Erikson says:

        There’s a lot of discussion about that lately, actually. The answer is “sort of”. The Go language is advertised as being a competitor of C++ (and was discussed as such by its creator recently), but is really aimed somewhere between C++ and Python. The D language is close, but the fact that garbage collection is built-in to the runtime, and then needed by portions of the standard library, make it unsuitable for C++ programmers who want absolute control over how their code runs.

        The best example of a true bare-metal, full-control, systems programming competitor to C++ that we have is the Rust language, currently under development by Mozilla. It’s very much NOT ready for general usage yet, as the language syntax itself is still in flux, but I’ve seen numerous comments saying “I’m excited about this, and as soon as it’s ready, I’m looking forward to using it”. There was a Rust 0.8 release recently, with discussion on Hacker News and Reddit.

        • WillRiker says:

          I don’t have any firsthand experience with D, but my understanding was that while D provides garbage collection, it also provides facilities for doing manual memory management. This seems like the best of both worlds to me — you get automatic memory management, considerably increasing programmer productivity, but you still have the option to selectively manually manage the memory of the parts of your program that need the performance increase. Does it not work out like that in practice?

          • Volfram says:

            It does work like that in practice (I generally find that the garbage collector is good enough I don’t really need to do manual work), and in fact you *can* turn off the D garbage collector, but it’s a really bad idea because it makes a few things (dynamic arrays, mostly) not work anymore.

            You *can* tell the GC not to run during a given block or manually tell it to do a soft or hard compact as well. It’s one of the reasons I still prefer C for embedded programming.

            If you’re writing a large project in C++, 9 times out of 10 you’ll be importing an existing garbage collector anyway. If you’re writing a large project in C# or Java, those languages are both garbage collected as well. If you’re writing a large project in C, you should have a good reason, or you’re spending an awful lot of programmer time on a low-level language, but I believe even C has garbage collection libraries.

            • Alan says:

              If you want garbage collection, C++ seems like the wrong language (and I say that as a huge fan of C++). I appreciate that there are specialized cases for it, but I’d be shocked if 90% of large scale C++ projects used a garbage collector, even limiting it to new projects.

              (Randomly on the subject of strange specialized things for C++, you can interpret C++. The world is very strange. http://root.cern.ch/drupal/content/cint )

              • Volfram says:

                Admittedly I am making this assumption based on what I read while studying D, so there may be a small chance of very major bias.

                But it did apparently happen enough to be mentioned specifically, so perhaps 3 times out of 10?

                • Alan says:

                  I’ve got one big blind spot in this area: I have a poor grasp of how software built for use by companies is developed. As that’s the bulk of the software in the world, that’s a huge blind spot. Even so, I’m pretty confident that garbage collection is not commonly used in C++. Anyone doing new development who wants garbage collection has almost certainly moved to a language which supports it natively. Anyone maintaining existing code probably can’t easily use a garbage collector because it changes how the language works in potentially dangerous ways; existing code has a good chance of relying on Resource Acquisition Is Initialization for prompt release of precious resources, and garbage collection can’t keep that guarantee. (Indeed, lack of RAII support is the biggest thing that drives me crazy in most garbage collected languages. I gather that D offers garbage collection and guaranteed destruction on scope exit, which seems promising.)

                  • Mephane says:

                    Yepp. Not only is C++ not ideal for implementing and using a garbage collector (always easy to accidentally allocate stuff that is not detected by the GC), such a thing would always be in conflict with some features of C++, mostly RAII. For those who don’t program in C++ (or at all), the abbreviation stands for Resource Acquisition Is Initialization, although the most important effect of the technique is not obvious from the name. The gist of it is that you create an object that holds a resource on the stack, and when its scope ends (for example at the end of a function or loop iteration), the object is automatically destroyed, and when that happens, the object also automatically frees up the resource, be it memory, an opened file, or something else. You can thus explicitly control when, for example, a large chunk of memory is allocated and when it is deallocated while reducing the risk of memory leaks to a minimum.

                    Example:


                    #include <fstream>
                    #include <string>

                    void WriteTextToFile(const std::string& fileName, const std::string& text)
                    {
                      // Open the file for writing, create it if it does not exist.
                      std::ofstream out(fileName);
                      // Check whether the file is all good and can be used.
                      if (out)
                        // Write the text into the file.
                        out << text;
                      // Scope ends: the "out" object is automatically destroyed, the buffer
                      // flushed and the file closed. If an exception aborts the operation,
                      // the file is still closed properly. Note that even "goto" out of the
                      // scope runs the destructor; only abrupt exits such as std::abort()
                      // skip it.
                    }

                    Edit: I forgot how to properly format code in comments here. <pre> appears to be removed from comments automatically. I attempted to format it as best as I can with some HTML-Fu.

                    • paercebal says:

                      Actually, C++/CLI has an interesting take on mixing RAII code (native C++) and garbage collected code (managed C++).

                      In essence you can write (if my memory is right… I haven’t written C++/CLI in a few years):

                      // T is a “ref class”
                      // C is a normal C++ class
                      // both are declared in the same header
                      // and used in the same source

                      void foo()
                      {
                      T ^ t1 = gcnew T() ; // will be garbage collected
                      T t2 ; // will be disposed at the end of the scope
                      C * c1 = new C() ; // will need a delete, or a smart pointer
                      delete c1 ;
                      C c2 ; // will be destroyed at the end of the scope
                      }

                      So, mixing GC and RAII would not be a big deal in C++, as demonstrated by C++/CLI solution.

                      IIRC, C++11 has already a few bindings for a GC. It’s only a matter of time before C++ has a standard GC in addition to its other memory management schemes.

                      And, as always, pick the one that suits your needs. And in C++, most of the time, it will be RAII.

      • James Schend says:

        Ok there’s a few things:

        1) The vast, vast, vast majority of programs don’t require C++ performance. Yes, including video games. For the extremely tiny bits of code that do require C++ performance, high-level programs can call-out to C++-compiled DLLs.

        2) The vast, vast, vast majority of programming teams should be prioritizing programmer efficiency over hardware efficiency. Either you’ll beat your competition to market, or you’ll be able to spend much more time working on polishing up your product to make it perfect than just getting it to run in the first place.

        3) C++ isn’t as fast as its supporters say it is anyway. Sure, it doesn’t have periodic garbage collection, but its crummy design practically guarantees it’s going to spend ages doing expensive memory allocations that higher-level languages don’t need to worry about– there’s a post above just talking about the memory allocations required to do std::string appends, which is mostly a non-issue in higher-level languages.

        Based on those points, I’d say C# *is* a direct competitor to C++. And a really, really damned good one. And one that kicks C++’s ass six ways to sunday.

        There are two reasons left to use C++:
        1) Better tools support (you can practically guarantee any gaming-related library will have a C++ interface).
        2) “We’ve always done it that way”, the world’s crappiest reason to do anything.

        … to answer your question more literally:
        * C compares well to C++. No seriously. C is a better-designed language than C++, and most of the C++ “niceness” can be reproduced in C without too much trouble.
        * Objective-C was originally intended to fill the niche you point out. The current form of Objective-C, however, has a garbage collector, so it’s not as C-like as it was ten years ago.

        • Shamus says:

          I haven’t used C# myself, but being a Microsoft product I’m skeptical that C# is the way to go for people looking to make portable code.

          • WillRiker says:

            Interestingly enough, it is actually quite portable — the Mono project allows you to run .NET languages on Linux and (I assume) OSX. There are quite a few indie games written in C# using XNA (which is a surprisingly good game development framework) that work well on Linux and OSX because they’re tested against MonoGame. One example? Bastion.

          • An old but good saying is… “The right tool for the right job!” that will also hold true.

          • James Schend says:

            For normal desktop applications, you have Java-level portability. (Which means: Mono will let you run it on Linux and OS X, but it’ll look a bit weird on either. Especially on OS X.) It’s a shame that GUI portability isn’t better than Java on non-Windows platforms, but it does exist.

            For games, C# has *excellent* portability. All the C#-based game libraries/engines support all major PC OSes, plus Xbox 360, Xbox One, Playstation 3, Playstation 4 and now Vita. Basically the only thing you’re missing is support for Nintendo consoles.

          • Volfram says:

            This is effectively why I use D. D has mostly the same featureset as C# besides being compiled instead of interpreted (and there’s an LLVM project for D, meaning it can be interpreted if you want it to), and the GDC compiler allows compilation of binaries in any output format that GCC can output in (their words, not mine).

            I will not say that D is better than C# or C# is better than D. I will say I know of two major differences.

            1: C# is interpreted, D is compiled
            2: C# was released with the full weight of Microsoft behind it, D was a pet project of half a dozen guys.

            • Svick says:

              C# is not interpreted. When you compile C# program, the compiler emits CIL code (Common Intermediate Language). When you execute the CIL-containing binary, the runtime JIT compiles it into native code (JIT means only parts that are needed right now are compiled).

              This has some performance implications (the compilation takes some time; the JIT compiler can’t perform some optimizations that could take a long time), but it’s still much better than interpreting some code, as languages like PHP do.

            • James Schend says:

              1. C# is compiled into CIL, which is basically the .net equivalent to Java’s ByteCode. Some C# uses do just-in-time compilation (typically: web apps, in the PHP mold), some compile the entire program right away, but it’s always compiled.

              I honestly don’t know where you got the idea that C# was interpreted.

              2. The problem with working in D is tools support. What IDE do you use to code D? What debugger? Do you even have a graphical debugger for D? What about Edit and Continue? Does it allow debugging your GPU shaders? Intellisense? Automated refactoring tools?

              Like I said previously, the key to successful software is programmer productivity. If you don’t have good tools, you don’t have programmer productivity.

        • Matt Downie says:

          Replacing C++ with something better is like replacing the English language with something better; it can be done, but there’s not much advantage unless you can get everyone else to use it too.

        • Shamus says:

          “3) C++ isn’t as fast as its supporters say it is anyway. Sure, it doesn’t have periodic garbage collection, but its crummy design practically guarantees it’s going to spend ages doing expensive memory allocations that higher-level languages don’t need to worry about– there’s a post above just talking about the memory allocations required to do STD::string appends, which is mostly a non-issue in higher-level languages.”

          What do higher-level languages do instead of allocate memory?

          • James Schend says:

            They use a StringBuilder (or equivalent) class, specifically designed to not allocate on every append.

            But from Alan’s post above it sounds like I was wrong about that criticism– you can “guess” the size of the string in advance in C++ and tell it to pre-allocate that much space.

            I will stand by this though:
            1) C++ has a lot of features that are very difficult and confusing and can easily cause performance-killing weirdness in your code.

            2) C++ code, when benchmarked fairly, almost never beats equivalent Java or C# code, despite being more difficult to write.

            3) C++ coders who obsess over performance, and use the language because it has greater performance, virtually never actually measure it.

            A good summary of C++’s weaknesses is here: http://yosefk.com/c++fqa/defective.html

            • Alan says:

              C++ has a lot of sharp edges. It’s incredibly inconsistent. Mastery is very difficult (although practical fluency isn’t that awful). But it is what it had to be. It had to be highly compatible with C. It adopted a general belief that the language should try to support a wide variety of solutions to better map to different problems, that it should avoid telling you to convert your square peg problem into a round peg so that the language can remain pure in only offering round holes. The result was inevitable. It hasn’t continued to dominate for so long because programmers are stupid, it was the right language for the time and in some areas remains the right language.

              (For anyone curious about why C++ is the complex, sharp edged, idiosyncratic beast that it is, I recommend Bjarne Stroustrup’s The Design and Evolution of C++, which I found to be both entertaining and enlightening (of course, I’m the sort of person who finds discussions about tradeoffs in language design entertaining, so your mileage may vary).)

              1) I’ll yield very difficult and confusing, but performance-killing? Do you mean that one can unintentionally create poor performing code? My own experience (18ish-years as a C++ programmer) doesn’t support that claim. Terrible performance comes from the same sorts of mistakes you can make in any language: bad algorithms, misunderstanding the cost of operations. There is a popular idea that a+b might be unexpectedly expensive, but in practice how many operations that are logically addition are unexpectedly expensive? My own experience is none.

              2) I can just as fairly say that Java and C# code, when benchmarked fairly, almost never beats equivalent C++ code. The devil is in the definitions of “benchmarked” “fairly” and “equivalent.” When it comes to memory usage, garbage collected languages are fighting with one hand tied behind their back. Garbage collectors get increasingly slow as memory gets tighter (a major challenge on Android or anyone running long-lived services). Or to be extra snarky, what language is your favorite JVM implemented in?

              3) All too true. :-) For the majority of projects, C++ is the wrong solution, and most people who think they need the speed and/or reduced memory footprint are wrong.

              The “Defective C++” article seems like it could be reasonably summarized as “C++ is highly compatible with C and C++ isn’t Java.” It tells me more about the author than C++.

              • James Schend says:

                To give an example of the performance implications of not having a garbage collector:

                1) If you use RAII, and have a complex data structure in the class, the only way to pass one “branch” of that data structure to another class is to make a complete copy of it (otherwise, you’re violating the rules of RAII.) Depending on the size of your data structure, that could be a huge performance hit.

                It’s possible to design your classes so that’s not too much of an issue, but … why should you have to? A garbage collector makes it a non-issue.

                2) Since you don’t have a garbage collector, you can’t compact memory. Which means, over time, your C++ program will have little “slits” open up in its memory space that are too small for new objects to be inserted into. So any C++ program of any complexity “leaks” (for lack of a better word– not technically a leak) memory. Perhaps not an issue for a game, but definitely an issue for something like the F-35 fighter mentioned above.

                As far as C compatibility: the article’s point is more like: “C++ pointlessly duplicates features already available in C, but the C++ version is almost invariably worse.”

                But my point 3 is the most relevant one. I constantly hear from people who chose C++ or C because of “performance”, and ask, “did you actually benchmark it?” and the answer is always stunned silence. Sure, garbage collected languages were slower in 1995. It’s not 1995 anymore.

                • Richard says:

                  Or use a Smart Pointer, whether boost, Qt or C++11.

                  I use smart pointers all over my code – in fact, almost everything is either instantiated on the stack or handled by a smart pointer that is itself on the stack.

                  The ‘hard’ part is deciding which smart pointer is appropriate – is this a singleton, strongly-shared, weakly-shared etc.

                  Weak pointers are brilliant – “I want to know if this thing still exists”.

                  To some extent strongly-shared pointers and garbage collection are near-equivalents, with the difference being that GC works on “Everybody wait a moment while I go round killing the spares” and strongly-shared pointers on “Nobody needs me now, goodbye cruel world”

                  Suicide versus murder. Or something.

                  Did I take that analogy too far?
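
                  A rough sketch of the smart pointer idea using the C++11 flavour, applied to the "pass one branch of a data structure" case from the parent comment. Node and first_branch are invented names for illustration; std::shared_ptr, std::weak_ptr and std::make_shared are the real library pieces.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node: children are shared, not copied.
struct Node {
    int value;
    std::vector<std::shared_ptr<Node>> children;
    explicit Node(int v) : value(v) {}
};

// Hands out the first branch of the tree without copying it.
// Returns an empty pointer when the node has no children.
std::shared_ptr<Node> first_branch(const Node& root) {
    if (root.children.empty()) return nullptr;
    return root.children.front();
}
```

                  The branch stays alive for as long as anyone holds a shared_ptr to it, even after the root is gone, and a weak_ptr to it can answer "does this thing still exist?" through expired() or lock().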

        • Alan says:

          I was the person who wrote about std::string::append, and Simon Buchan correctly pointed out that the appends aren’t really a big deal. In the case I was thinking of, the real price was probably calling ::read() one byte at a time.

          Appending to a String in Java is, ironically, even worse. Because String is immutable, the closest you can do is to String.concat, which must allocate a new String every time. So instead, you’re encouraged to use StringBuilder, which is implemented pretty much like C++’s std::string. This does not improve my opinion of Java.
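
          For anyone who wants to see the append behaviour rather than argue about it, here is a small sketch that counts how often a std::string actually changes capacity while being appended to one character at a time. count_reallocations is a hypothetical helper; reserve(), capacity() and push_back() are the real std::string members. Exact growth policy is implementation-defined, so the sketch makes no claim about specific numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n copies of c one at a time and counts how
// often the string's capacity changed, i.e. how often it had to reallocate.
std::size_t count_reallocations(std::size_t n, char c, bool reserve_first) {
    std::string s;
    if (reserve_first) s.reserve(n);  // pre-allocate room for n characters
    std::size_t reallocations = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back(c);
        if (s.capacity() != capacity) {
            ++reallocations;
            capacity = s.capacity();
        }
    }
    return reallocations;
}
```

          With reserve_first set, no reallocation happens during the appends; without it, the count is still far smaller than the number of appends on any mainstream implementation.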

          • James Schend says:

            C#’s compiler will automatically find String objects with a lot of appends and swap-in StringBuilder objects as appropriate. Does Java not do this? (It wouldn’t surprise me– I don’t have much respect for Java’s tooling, especially since Oracle took it over.)

            • Alan says:

              The compiler is re-writing your code to use entirely different objects? That creeps me out.

              Can I teach the compiler to do the same trick for other classes? Maybe I have a final/immutable MyImage that can do compositing by copy, but for non-trivial use you should use MyImageCompositor. If I can teach the compiler about the replacement, it still feels weird, but at least it’s a general solution. If it’s limited to String/StringBuilder, it seems like an admission that their String class wasn’t actually developer friendly, so they hacked in a workaround instead of fixing the class.

              • James Schend says:

                > The compiler is re-writing your code to use entirely different
                > objects? That creeps me out.

                Why?

                It saves tons of programmer hours, it saves tons of runtime, it’s a complete no-brainer.

                Oh and BTW: Java does it too.

                > If it’s limited to String/StringBuilder, it seems like an
                > admission that their String class wasn’t actually developer
                > friendly, so they hacked in a workaround instead of fixing the
                > class.

                Well, fair enough. But it’s a heck of a lot better than the C++ alternative where the construct isn’t developer friendly, but the language designers don’t give a crap about ever fixing or addressing it.

                • Alan says:

                  When a single language construct gets special treatment, I’m hesitant to call it a no-brainer. I’m dubious of the compiler learning an optimization for a single use case. I’m dubious of a compiler optimizing code by replacing it with entirely different code paths. How does it even look in a debugger? Is the debugger sneaking behind my back and replacing a request for the value of mystr with a call to tmpstrbuilder.ToString()? Given the decision to make String immutable/final, it’s probably reasonable. But it’s creepy.

                  To the extent that std::string has faults, it’s that it lacks a bunch of functionality. std::string is a container for a pile of bytes and little more. But the core design is sound. It behaves like any other container class. “Fixing” it is straightforward: extend it through inheritance.

                  • James Schend says:

                    > When a single language construct gets special treatment, I’m
                    > hesitant to call it a no-brainer. I’m dubious of the compiler
                    > learning an optimization for a single use case.

                    Why? Do you have any actual reasons, or is it just an emotional gut feeling?

                    > I’m dubious of a compiler optimizing code by replacing it with
                    > entirely different code paths.

                    Again, why? Java and C# aren’t new little experimental baby languages, they’ve been doing this for years, and it works. What’s the issue?

                    > How does it even look in a debugger?

                    Debuggers have access to your source.

                    > Given the decision to make String immutable/final, it’s
                    > probably reasonable. But it’s creepy.

                    I think a software engineer making judgement based on whether something is “creepy” or not is… creepy.

                    Engineering is about building software. There’s no emotion involved; if a technique is superior, we use it.

                    • Alan says:

                      Whenever I said “creepy,” replace it with “raises a red flag based on my experience.”

                      Part of engineering is being able to identify patterns so we can make decisions based on what we’ve learned. We’re stuck with our limited human minds and frequently limited context, meaning the resulting pattern matching needs to be fuzzy to be useful. It allows us to look at a small section of code and realize that there may be a bigger problem. It literally feels wrong. It’s a key part of how humans cope with complexity. (I’m reminded of Schneier on something feeling “hinky.” That doesn’t mean the solution is wrong, but it is good evidence to be suspicious.)

                      > > I’m dubious of the compiler learning an optimization for a single use case.

                      > Why? Do you have any actual reasons, or is it just an emotional gut feeling?

                      When I encounter a one-off fix, I’m going to be suspicious.

                      There is a good chance that a deeper problem is being papered over, leaving a landmine hiding for others. “I made a trivial change, and suddenly my code got way slower.”

                      There is a good chance that a more general and useful solution is being overlooked. The ability as a developer to say, “If you see this pattern, replace it with this other pattern,” is intriguing. Maybe my Vector3D is final, and automatically replacing dense calls to its math functions with the non-final Vector3DMath class will speed up common cases without needing to rewrite code.

                      There is a good chance that the developer missed an important case, meaning the fix works 99% of the time, but makes things worse 1% of the time. This is admittedly far less likely in a key, widely used language element like this.

                      There is a time and a place for a one-off fix. As I said, given the decision to make String immutable/final, it’s probably reasonable. But it matches patterns that raise red flags.

                      > Debuggers have access to your source.

                      But my source isn’t being run. Obviously this is a problem with any optimization (ah, the fun of debugging -O3 binaries). But this crosses a line from “the PC is jumping around my code” or “that briefly used variable was optimized away” into “a wildly different code path is invoked.” If I’m in the middle of a blob of mystr.append calls that were optimized into using StringBuilder, how does it look in the debugger? Does the compiler tell me nothing, because mystr literally doesn’t exist? Does it sneak around my back and transform “show me mystr” into “show me the result of calling tmpStringBuilder.ToString()”?

                    • Svick says:

                      String is a type that’s already treated as special in C#/.Net (unlike std::string in C++), so I think making it even more special is not a big deal.

                      And you’re saying that there might be a more general solution, but what other type:

                      is immutable
                      often holds large data
                      is commonly concatenated

                      Also, there is no string.Append(), there is only string + string (or string += string). And I think that for an experienced developer, that should indicate there might be a performance problem, even if he’s new to C#.

                      One more thing: in C#, there are cases where the code that is being run is much more different than the code you wrote (yield return or the new await) and the VS debugger works fine for those. So I doubt a simple string-to-StringBuilder transformation would be an issue for it (but I haven’t actually checked).

        • paercebal says:

          There’s a post above just talking about the memory allocations required to do STD::string appends, which is mostly a non-issue in higher-level languages.

          The post above on std::string appends is bogus, so don’t quote it too much… :-)

          C compares well to C++. No seriously. C is a better-designed language than C++, and most of the C++ “niceness” can be reproduced in C without too much trouble

          This is laughable.

          There’s nothing in C that can achieve the power of C++’s OOP or templates with the same safety, genericity AND performance. On the other hand, C++ has access to all C features.

          My own experience (I have 13+ years of C++ development experience) shows most C frameworks mimicking C++ will get worse results both in maintenance and performance.

          Now, I would be happy to examine your own version of a “C++ “niceness” reproduced in C without too much trouble”.

          Of course, the C code should be as fast and as safe as the C++ “niceness” it aims to replace (as safe as in: “If the using code is wrong, the compiler will complain”, instead of “it will crash at runtime”).

          Try to mimic C++ capturing lambdas. Or private/public access to struct members (remember efficiency is important, so void * or forward declarations are NOT an option). Or C++ STL containers and iterators. Or C++ static and dynamic polymorphism. Or operator overloading. Or C++ compile-time computation…

          … And you’ll fail. Because each and every one of those standard C++ features is impossible to reproduce in a clear and efficient way with code fed to an unmodified C89, C99 or even C11 compiler…

          • James Schend says:

            Since C++ has no real encapsulation anyway (any change to a class requires a recompilation, since the users of that class need to know how much memory it consumes), and since C++ classes aren’t portable in object form, unless it’s compiled using the exact same compiler with the exact same settings on the exact same OS, and since any C++ object can dive in any other C++ object’s data, the public/private keywords are basically just decoration anyway. So putting functions in structs, C-style, is just as good as C++ classes.

            But that aside.

            The reason C is a superior language, IMO, is that it’s so much simpler that it’s easy to look at a snippet of C code and figure out what the heck it’s doing. C debuggers are useful. C doesn’t have weirdness like templates (which are nightmares to debug), it doesn’t have operator overloading (which makes you question that literally every line of code in your program is doing what it looks like it’s doing), C libraries are portable so you don’t have to recompile at the drop of a hat, C standard library functions are generally more useful and easier-to-use than the C++ equivalents, etc.

            C isn’t a superior language to C++ because it has more features; it’s a superior language because it’s simple enough that a human being can fit it all in his skull at one time, and still have room in there for keeping track of the problem domain. Remember the goal isn’t to dink around with your programming language, the goal is to produce useful software.

            Of course I’m not saying that C is a *good* language, or that I’d choose it for a new project. But I firmly believe it’s a *better* language than C++. My catchword here is “programmer productivity”. C has it. C++ doesn’t.

            The fact that you list operator overloading as a *positive* feature means we’ll probably never agree on these points. ;)

            • Alan says:

              “C isn’t a superior language to C++ … it’s a superior language because it’s simple enough that a human being can fit it all in his skull at one time…”

              I might argue the merits of “superior,” but you’re absolutely right that one of C’s strengths is its simplicity. There is a lot of value in that. (Another big strength you brought up is the portability of libraries. If a language will play nicely with code in another language, it’s inevitably C, and for good reason.)

              There is a tension between being simple and consistent versus offering a level of messiness that maps to the messiness of the real world. English is such a train wreck of special cases, inconsistent rules, and general weirdness that it makes C++ look simple. Nonetheless, I manage to fit enough English into my skull while still keeping track of my problem domain to write specifications and documentation. On the other hand, gaining fluency in English was a multi-year process. Esperanto has a certain appeal.

              C++’s classes provide defenses against accidentally sticking your nose into someone else’s implementation details. If you try to play with someone else’s privates, the compiler will smack you. Sure, you can circumvent it, but why would you do so? C++ allows you to move more errors from run-time to compile time, which is pure win.

              Also, while you can implement inheritance, even multiple inheritance, in C, it’s nowhere near as clear as expressing it in C++. Classes also give you guarantees that the destructor will be called; if I forget to call fclose (which serves the same purpose for the FILE object), I’m just quietly leaking some memory and a file descriptor. That guarantee allows many common cases (file handles, sockets, database connections, locks) to be expressed very clearly and succinctly.
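The destructor guarantee can be sketched as a tiny wrapper around FILE. This is only an illustration: ScopedFile and write_greeting are hypothetical names, not part of any library, and a real project would more likely reach for std::fstream.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical wrapper (not from any library): owns a FILE* and closes it
// in the destructor, so every exit path from the scope releases the handle.
class ScopedFile {
public:
    ScopedFile(const std::string& path, const char* mode)
        : file_(std::fopen(path.c_str(), mode)) {
        if (!file_) throw std::runtime_error("could not open " + path);
    }
    ~ScopedFile() { std::fclose(file_); }

    // Copying would lead to a double fclose, so forbid it.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    std::FILE* get() const { return file_; }

private:
    std::FILE* file_;
};

// Writes one line. The file is closed when `out` goes out of scope,
// whether the function returns normally or an exception propagates.
void write_greeting(const std::string& path) {
    ScopedFile out(path, "w");
    std::fputs("hello\n", out.get());
}
```

There is no fclose call to forget in write_greeting; the equivalent C code has to remember it on every return path.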

              There are perfectly fine C++ debuggers, I’ve had no complaints with Visual C++’s debugger, nor gdb, over the last 15 or so years.

              Operator overloading is… complicated. It’s tempting for new C++ programmers to use them for “clever” things with the result being unreadable code. But every time I’m forced to say

              MyVector new_position = add( old_position, multiply(velocity, delta));

              instead of

              MyVector new_position = old_position + velocity * delta;

              I die a little inside. And seriously, “makes me question what literally every line of code…is doing”? Do you work with crazy people who think redefining what float*float means is a good idea?
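For reference, the overloads needed to make the second form compile are small. The sketch below assumes a two-component MyVector with float fields x and y; the field names and the step helper are made up for illustration.

```cpp
// Hypothetical 2D vector; the fields and free-function style are assumptions.
struct MyVector {
    float x;
    float y;
};

// Component-wise addition of two vectors.
inline MyVector operator+(const MyVector& a, const MyVector& b) {
    return MyVector{a.x + b.x, a.y + b.y};
}

// Scale a vector by a scalar.
inline MyVector operator*(const MyVector& v, float s) {
    return MyVector{v.x * s, v.y * s};
}

// The motion update from the comment, written with the overloads.
inline MyVector step(const MyVector& old_position,
                     const MyVector& velocity, float delta) {
    return old_position + velocity * delta;
}
```

Nothing here touches what float*float means; the overloads only apply when a MyVector is involved.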

              “C standard library functions are generally more useful and easier-to-use than the C++ equivalents”

              Remember, we cheat and offer all the C functions, so it’s pure win for C++. :-)

              Slightly more seriously, this is a matter of perspective. I’ve used C container implementations, and they are consistently more error prone and harder to use than the STL’s offerings. Exceptions mean you can blindly ignore return codes in many cases (assuming you decide crashing is an acceptable result, and sometimes it is).

              On the whole, I find the C++ versions more useful and easier-to-use than the C equivalents. I say that as someone who came to C++ after a few years of C. I still occasionally write some C. I find myself strictly more productive in C++ than C. On the other hand, it took me years before I was truly fluent in C++.

    • Nathon says:

      Oh, there’s also FORTRAN. I hear it’s fast for certain work loads, but I’ve never written code in it.

      Ada is a very serious and heavy language that can compete with C++ on speed while still having all sorts of safety features that can be turned on or off at compile time. I particularly like the feature whereby it will crash your program if you overflow a buffer instead of just silently letting execution continue until a totally unrelated part of the program goes to access the now-corrupted memory and dumps core (or, even worse, behaves incorrectly), leading to weeks worth of head scratching and cursing followed by a psychotic break and eventually a shooting rampage/suicide.

      Buffer overflows are not fun to debug. Languages that just let them happen should die. Yes, I use one such language every day.
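For comparison, C++ does have an opt-in version of that check: std::vector::at throws std::out_of_range on a bad index, while operator[] with the same index is undefined behavior. A minimal sketch, where read_is_rejected is a hypothetical helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading buffer[index] through at() was rejected.
bool read_is_rejected(const std::vector<int>& buffer, std::size_t index) {
    try {
        (void)buffer.at(index);   // bounds-checked: throws std::out_of_range
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

The catch is that the check only happens where the programmer remembered to write at() instead of [].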

      • Volfram says:

        My initial image loading function was stolen from a forum post somewhere and the original poster neglected to account for count-up and count-down differences in loops.

        Short version: he had a buffer overflow in his algorithm which also caused the image to appear incorrectly. Because he used a > instead of a >=.

      • bubba0077 says:

        There’s a reason Ada is frequently used in avionics.

  16. I’ve known James Iry since he was 18 years old. He certainly hasn’t gotten any LESS confusing with time.

    So don’t feel bad, Shamus. I’ve spent 20+ years not understanding him!

    POGOFISH!

  17. Tim Keating says:

    “This is actually kind of unfortunate. If I had my head in any kind of working order, I would have run some tests to figure out where the bottleneck was coming from. As it stands, I probably did more work than I needed to, and I destroyed the problem instead of studying it.”

    Well, you can always go back and pull the prior revision out of source control if you get curious. Uh, you ARE using source control, right Shamus? Shamus?

    • Shamus says:

      I am, but I didn’t check-in my broken code that crashed on startup. So I no longer have that parser.

      • UTAlan says:

        Out of curiosity, which source control do you use?

        • Nathon says:

          From previous posts, it’s probably safe to assume he’s still using Mercurial.

        • Myself I prefer Ctrl-C and Ctrl-V combined with “Create new folder” and a quick rename ;)

          I prefer if everything is fully self contained (goes for any Programs I code too), if the end user has to install anything then that is a big no-no in my book.

          • Retsam says:

            I’m not sure you understand what source control is, or what it does. At its most basic form source control lets you back up to previous versions of your code, but your files and file structure are completely the same; there’s nothing about source control that requires end-user installation or anything like that.

            You could even think of it as a system that does that copy paste for you, and hides it away so that you don’t even have to look at it.

            Source control for single person projects is nice. Source control for more than one person? Pretty much essential.

            (Or perhaps I’m completely not understanding your post)

            • I actually veered off the subject of my comment during “I prefer if everything is fully self contained” to a side comment that I prefer my finished code and source to have the same traits.

              Which is… just a folder that can be moved around.

              As to source control systems, not had a need for those yet, I’m pondering something GIT and github related if it comes to that (lots of folks have been bragging about that one).

              • Simon Buchan says:

                I’ve not heard of a single source control system that cares where the directory is (your GUI tool might have a recently used project list or something). CVS and Subversion would litter the project with .cvs and .svn in every subdirectory, but the newer and less crappy Git and Mercurial both use a single .git or .hg at the root directory.

                The switching cost of learning how to use source control usefully might be the only real reason to not use it, but even then it’s less bad than copy-pasting (being able to ask “what changed in this file since it worked last” alone is huge!)

              • James Schend says:

                Git has a nightmare learning curve, has terrible tooling on Windows*, has a tendency to just barf itself for no reason (“detached head!”). You can tell it was made by Linux users. If you can’t use a commandline interface, forget about it. (My dyslexia basically makes CLI programs like git utterly unusable for me.)

                *) You can use GitHub for Windows, which doesn’t do merges, or the awful TortoiseGit (which probably won’t even install, it’s so buggy). Ironically, the only Git client for Windows that doesn’t completely suck is the one Microsoft made for Visual Studio.

                • Alan says:

                  From what I’ve heard, what James says about the Windows situation is accurate. Git was developed with a Linux/Unix/POSIX mindset, and makes a lot of assumptions based on that. As a result, the Windows versions tend to be very alien.

                  That said, the learning curve isn’t as bad as James is suggesting, but Git is a powerful and complex beast. It was designed to manage a group of hundreds of developers who only erratically merge their hundreds of branches. But if you’re not doing that, you can ignore a lot of the complexity. For the case of a single person, it’s on par with any other command line revision control system. For a centralized team, which is the environment I mostly use it in, it’s strictly more complex than many other offerings. But the tradeoff is more functionality, functionality we’re glad to have.

                  To the extent that it’s hard to learn, it’s partially because any distributed revision control system is complex, and partially because it carries a lot of terminology. “detached head” is a real complaint, although it’s not “barf[ing] itself for no reason”, it’s complaining that you asked to see an older version of the code, then you started modifying that old code, and now it’s confused about why you did that. You’ll need to pick a solution: discarding your changes, or creating a new branch with your changes and merging the two sets of changes together.

                  My assessment is that it was a pain in the ass to learn, especially when working with others, but the benefits have totally been worth it. It’s very convenient even when developing projects for myself, because it’s easy to set up and use in that case.

                  If the Unixy-ness of Git is offputting, and you’re looking for something free, I recommend looking at Subversion (old school centralized server model) or Mercurial (Git’s biggest open source competitor in distributed revision control).

                  • James Schend says:

                    > To the extent that it’s hard to learn, it’s partially because
                    > any distributed revision control system is complex, and
                    > partially because it carries a lot of terminology. “detached
                    > head” is a real complaint, although it’s not “barf[ing] itself
                    > for no reason”, it’s complaining that you asked to see an older
                    > version of the code, then you started modifying that old code,
                    > and now it’s confused about why you did that.

                    Actually GitHub for Windows was implemented in such a way that Git *would* return “detached head” randomly and for no reason, and the only solution was to back-up your repo, delete it from disk and re-sync. The developers of the product confirmed the bug with me, and had to make several patches to fix it. (IIRC, it had to do something with Windows file-locking, which the Git developers, being Linux weenies, didn’t consider and the GitHub for Windows developers, being incompetent, didn’t address before release.)

                    Revision control doesn’t have to be complex or hard to learn– you can teach anybody Microsoft Word’s revision control in an hour, easily. But for some reason, the kind of people who write revision control for code seem to think “usability” is a dirty word.

                    • Alan says:

                      Ah, sorry, I hadn’t realized the complaint was about GitHub for Windows (and possibly plain old Git on Windows). That’s pretty awful. It’s certainly true that Git’s core developers really don’t care about Windows, and the software reflects it. That Git is deeply non-native on Windows and weird stuff breaks as a result is a pretty compelling argument to avoid it if you’re doing Windows development.

                      I’m not sure it’s fair to call them Linux weenies for ignoring a platform they don’t care to support any more than calling someone a Windows weenie for choosing to use Direct3D instead of OpenGL.

                      Simple revision control doesn’t need to be complex or hard to learn. But the moment you want branches and merges, it starts getting complex. Simply understanding what merging means in the context of branches and reverting is non-trivial. Add in support for being truly distributed and you’re stuck with a minimum level of complexity and learning curve. This isn’t to excuse Git; it could absolutely be more helpful. My own experience is that day-to-day work really isn’t a big deal, but when you run into an exceptional situation you either need a deep understanding of how Git thinks, or to get familiar with StackOverflow.

                      Ironically, I spent an hour or so last week with Word 2010’s Compare and Combine functionality. I found it neither simple nor easy to learn.

                    • James Schend says:

                      Alan: I wouldn’t mind it not working on Windows; the problem is they claim it does. I hate liars.

                      But again, Git’s a terrible product on every OS. It’s not learnable, discoverable, accessible, or usable in any way. It’s the most user-hostile piece of software I’ve had the misfortune to use since Lotus Notes. (Actually, it’s worse than Lotus Notes: at least Notes tries and fails. Git doesn’t even try.)

              • William Newman says:

                Roger Hågensen wrote

                “Which is… just a folder that can be moved around.”

                Source code control systems tend to work as annotation on “folders” (aka directories) and don’t interfere with moving the folder/directory around; the annotation follows naturally. I’m very familiar with the CVS and git source control systems; both of them work that way. As I understand it, subversion works that way too. And AFAIK it’s very common for other systems as well.

                “As to source control systems, not had a need for those yet, I’m pondering something GIT and github related if it comes to that (lots of folks have been bragging about that one).”

                Source code control systems have some advantages compared to just copying your directory when you have a version you want to save. Off the top of my head…

                (1) They make it cheap and fast to save small changes in big projects. It’d be easy to save all your work every half an hour or so for months or years, even if you were making changes to a sizable project like a computer game, and still not begin to fill up a little old 1GB flash thumb drive. I like checking in the state of my project pretty much every time a new bit of functionality works, so sometimes ten checkins per day or so; the resulting source code control data remain small enough to back up and take with me very easily.

                (2) They systematize asking how the code has changed since a previous version. It takes only a few seconds to ask things like “what has changed in foo.c since my last checkin” or “what has changed in the project since last weekend”? In my experience, when manually shepherding many named directories containing previous states of the project, queries like that tend to take tens of seconds looking for the appropriate folder to compare.

                (3) They often provide clever support for multi-person projects, e.g. making it convenient to merge patches or to query “who wrote this line of code, and when?”

  18. Tim Keating says:

    Also — this is NOT a language recommendation — Andrei Alexandrescu’s book “The D Programming Language” has the best explanation of Unicode for programmers I’ve ever read.

  19. swenson says:

    “The robots can’t hurt you, but they can make copies of themselves until your computer runs out of memory, which I think is a kind of victory state for them.”

    I feel that’s an oddly appropriate way for a robot to win.

    “Jellyfish-ish” is a great word, by the way.

    • Steve C says:

      That would be a great way to lose! I’m being serious- that’s a great idea. Have a little pop up saying like the Bad Robots overloaded Good Robot’s sensors and ran him out of memory. Good Robot has been Defeated. Treat it like your hp went to 0 if there are too many Bad Robots.

      “Good Robot has crashed! No not the program, just you. Don’t let that many Bad Robots multiply next time.”

      • WJS says:

        While that might be a fun idea, you’d have to figure out some lower level symptoms first. Even in a game with high mortality, it’s no fun “dying” completely out of the blue.

  20. BenD says:

    Shamus, as a result of you introducing me (all of us) to Starcraft commentary viewing on Youtube, now whenever I see ‘Protip:’ in your blog or really anywhere else, all I can hear is “PROOOO TIP OF THE DAAAAAAAAY.”

  21. Hitchmeister says:

    I kind of like the robot that fires robots idea, but I’m not so keen on simply crashing the program. Can a robot change its behavior if certain conditions are met? I’m thinking a robot that comes out firing robots that fire robots etc. But before firing they look to see how many robot firing robots are on the screen and if there are too many (some suitably dangerous number well short of crashing the program), they switch to self-destruct mode and blow themselves, and anything too close to them, up. So, as the Good Robot, you can either try to kill them faster than they “breed,” or just let them fill most of the screen then hide someplace safe as they blow themselves up and win that way. Though it might not always be possible to find a safe spot by the time there’s enough to trigger self-destruct and waiting long enough to realize that means killing them all is that much more unlikely. On the other hand, trying to kill them all individually could take a ridiculously long time as they continuously increase in numbers.

  22. silver Harloe says:

    My response to Iry’s comment about “std::string won’t do it because C++ doesn’t want to marry unicode” would be “that’s a great reason for std::string not to do it, and a terrible, terrible reason for std::string::unicode not to exist”

  23. Jack V says:

    LOL. That was an awesome description. I’m still getting over how bad C strings are, and I’d not thought about the insufficiency of C++ strings until you said so, but you’re right: everyone having their own little library of helper functions is what happens.

    I like C++ a lot, but I agree with all the drawbacks you describe.

    FWIW, I know you don’t really want people telling you things, but a minimum piece of advice I’d give for anyone using C++ is that it is almost always easier if you go ahead and use boost.

    Obviously, don’t rewrite everything in boost that you’ve already done! But whenever you need _new_ functionality which is already in boost, it’s probably easier to install boost than to write it yourself, and more portable — it’s pretty normal to need to install boost to compile a project. It’s not worth it for one thing, but there’s always a bunch of other stuff where there’s already a class for it in boost.

    And I assume you already know this, but all of the commonly used bits of boost are header files, so you only need to install them, you don’t need to worry about libraries.

    PS. And it’s really old, but I found http://www.joelonsoftware.com/articles/Unicode.html a good description of why unicode matters. OTOH, if you’re ignoring all special characters _anyway_, there’s no reason you can’t have a function which could, for instance, at least trim white space even if it wouldn’t work with ALL world languages.
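As a rough sketch of what such ASCII-only helpers might look like on plain std::string: the names trim_ascii and equals_ignore_case_ascii are made up here, and neither function makes any attempt to be correct for non-ASCII text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Strips ASCII whitespace from both ends. Returns "" for all-blank input.
std::string trim_ascii(const std::string& s) {
    const char* ws = " \t\r\n\f\v";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Case-insensitive equality for ASCII text only.
bool equals_ignore_case_ascii(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast to unsigned char first: passing a negative char value
        // to std::tolower is undefined behavior.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

With these, a section header read from the file can be trimmed and then matched against [settings] regardless of how it was capitalized.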

    • Ithilanor says:

      Oh, C strings…we’re learning about them in my C++ class at the moment, and we’re required to use them for our current projects. I don’t care how slow std::string is, it has to be better than C-string madness.

      • Volfram says:

        std::string is a set of handler functions for C-strings. You’re still working with a null-terminated variable-length character array.

        • Alan says:

          It’s not quite that awful. std::string stores the length and is perfectly happy to store binary data, including nulls. Because it knows the length, things that are slow with a char* (strlen, strcat) aren’t. For compatibility with C functions, it also offers to stick a null just past the last element, but that’s only guaranteed if you ask for it (std::string::c_str). Compared to a char*, it’s a lot easier. The big weakness is what Shamus observed: it comes with damn near zero functionality for doing anything intelligent, especially user facing.
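A small sketch of that difference, using hypothetical helper names. The string holds seven bytes with a null in the middle; std::string reports all seven, while a C function going through c_str() stops at the null.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 7-byte string with a null in the middle: "abc", '\0', "def".
// The explicit length is required; without it the literal would be cut
// at the first null.
std::string make_binary_string() {
    return std::string("abc\0def", 7);
}

// Length as std::string tracks it (stored, no scan needed).
std::size_t stored_length(const std::string& s) { return s.size(); }

// Length as a C function sees it via c_str(): stops at the first null.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

For the string above, stored_length gives 7 and c_length gives 3.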

          • Mephane says:

            Yepp, it is but a container. A good one, I think, but really limited to that single role. I still don’t understand why the standard does not include more sophisticated string manipulation (conversion to lower/upper case, trimming, tokenization, unicode normalization etc.) in separate library functions. It’s one thing to decide that by design, std::string is a pure container, but another thing to decide that everyone must refer to external libraries for stuff that is nowadays just basic, or implement it all for themselves.

  24. Rick says:

    Congratulations on making the final piece of your code portable, I wouldn’t say that you achieved nothing.

  25. Abnaxis says:

    Can I make a suggestion? Put something in an ini file that lets people create and index their own sprite sheets. I guarantee there are going to be people who want to fly around shooting stuff besides robots, and that would make it easy.

    EDIT: Oh man, I thought about it and you must be doing this. With the way you did collision, it would be so friggin’ trivial to add custom graphics if there’s an interface to do it. The prospective modder wouldn’t have to do anything other than draw the sprites and the engine will handle it…

  26. David says:

    You asked about Python. INI files are standard, “import ConfigParser” is all you need. If you want an automagically-created object from your ini file, then you can get one with this.

  27. The Rocketeer says:

    Hey Shamus, one of the launcher types is called ‘cluser,’ is that supposed to be ‘cluster,’ or were you trying to keep it to six letters? Or are there cluser missiles now that I’m unaware of? I shouldn’t have let my subscription to Pop Mech lapse.

  28. Neko says:

    In Boost’s defence, its parts are mostly self contained and (at least for strings and most other bits) implemented purely as headers. No new library dependencies to link with, no extra dlls required on the user’s end. It uses a lot of template magic, which can be both a blessing and a curse.

    Kronopath already explained Unicode normalisation way better than I could, so hats off to you, sir.

  29. Vipermagi says:

    There’s typos in the last .ini block you suggested skipping:
    “#basic, homing, cluser, and robot ”
    “#Voce bank to play when seeing the player, being hit, or dying”

    The lines also don’t consistently start with capital letters, but oh well.

  30. Veylon says:

    The INI doesn’t look that bad to go after with C++ strings.
    First, you break it up into separate lines.
    list<string> lines;
    Second, you throw away the lines where the first non-whitespace is ‘#’
    Then, you parse through the list, attaching each variable to its parent bracketed header.
    map<string,list<string> >
    I don’t see any non-valued variables in there, so you can probably just break every variable across the ‘=’, trim the whitespace from the front and back of the values, and break them across the internal spaces
    map<string,map<string,list<string> > >
    And when you finally get that, you can just step across the

    So, you essentially need a tokenizing function to break strings across a given character, a whitespace-stripping function, and a final function(s) to parse the resultant multi-level container. Or a class wrapper would work, too.

    I’ll grant that all this STL isn’t super efficient, but unless you’re wrangling INI files by the hundred, it isn’t likely to matter.
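The steps above can be sketched in a few dozen lines. This is one possible reading of the outline, not Shamus’s parser: the names IniData, trim and parse_ini are made up, section and key lookups are case-sensitive, and it folds the intermediate containers into a single pass.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <sstream>
#include <string>

// section -> key -> list of whitespace-separated values
typedef std::map<std::string,
                 std::map<std::string, std::list<std::string> > > IniData;

// Strips spaces, tabs and line endings from both ends of a string.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(ws) - first + 1);
}

IniData parse_ini(const std::string& text) {
    IniData data;
    std::string section;               // keys before any header land in ""
    std::istringstream lines(text);
    std::string raw;
    while (std::getline(lines, raw)) {                 // break into lines
        const std::string line = trim(raw);
        if (line.empty() || line[0] == '#') continue;  // blank or comment
        if (line[0] == '[' && line[line.size() - 1] == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;         // no '=': skip line
        const std::string key = trim(line.substr(0, eq));
        // operator>> skips whitespace, so this both trims the value and
        // breaks it across its internal spaces.
        std::istringstream words(line.substr(eq + 1));
        std::list<std::string>& values = data[section][key];
        std::string word;
        while (words >> word) values.push_back(word);
    }
    return data;
}
```

After parsing, something like data["Levels"]["Patterns"] would be a list of the individual pattern names, assuming the file has a [Levels] section with a Patterns key.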

  31. Deoxy says:

    Since today is Post Huge Blocks Of Text Until People Fall Asleep day, let me hit you with another one.

    All this terrible text is like torturing me with the Comfy Chair. “No, not more (yummy, awesome) text! Whatever shall I do??!??!! (yay!)”

    Also, this

    The robots can’t hurt you, but they can make copies of themselves until your computer runs out of memory, which I think is a kind of victory state for them.

    is among your more awesome statements, which is quite saying something, really. Also, if there is any XP in the game, something like that utterly breaks it.

    As to ini files and modding, yes, that’s DEFINITELY the way to go, both for other people modding and for yourself “modding” more content. I really don’t understand why anyone would do it any other way (though they aren’t usually called “ini files”).

  32. james says:

    Patterns=cave cave cave donut

    Now I understand how the levels are made.

    And sorry, I read the enemy ai definition bit. It’s given me ideas for my games.

  33. fscan says:

    What’s wrong with using boost for this? Most of the stuff (eg the trim) is header only so you only pay for what you use .. no need to link to anything. And no need to convert other working stuff to using boost.

    Boost is kinda a “almost standard library” … there’s a nice map of the c++ land :)

  34. AndyL says:

    “import a massive thing like boost” feels like a weird thing to say. A misunderstanding, perhaps.

    Most of Boost isn’t even a library. It’s just headers.

    Besides that, Boost isn’t a monolith, you just take the parts you need.

  35. Neil Roy says:

    An interesting note I read about the C11 (C 2011 standard) is it has “Improved Unicode support”.

  36. Neil Roy says:

    Another note, Allegro 5 also has built in support for reading and writing INI files, unicode, crossplatform etc… and you can still use C++ with it, many do. :) Just saying, given the problems you have had, you might consider trying it out. The source code for it is also available and if you see a problem with the library, they’re always looking for people to help out with it. Could be right up your alley.

  37. Nixitur says:

    Oh man, I’m super late to the party, but that approach to creating content for your game reminds me a lot of how Toady does things with Dwarf Fortress.
    Without even touching the code, you can make a civilization of sentient humanoids made out of beer with three arms and four legs who for some reason fight using wooden maces.
    Hell, there was (is?) a project for Dwarf Fortress which literally just adds hundreds of real life animals to the game.

    Of course, if you’re not handling the text files right, you might end up with all quadrupeds missing back legs.

    • WJS says:

      This is hardly unusual you know. Virtually all games have asset files separate from the core logic. It’s not quite so common for them to be ascii, or stored in a regular directory rather than an archive, but DF is far from the first to do any of this. Really, the only thing I would call strange is the choice of ini format. Usually it would be XML or Lua.

  38. Misamoto says:

    Late too, but that’s awesomely meta – shoot the robots faster than they can crash your PC
