The Strange Evolution of OpenGL Part 3

By Shamus Posted Thursday Apr 9, 2015

Filed under: Programming 61 comments

So last time I described how OpenGL used to work in the pre-graphics card stone age. Those were simpler, clearer days. Yes, they were also slow as hell and unable to do much in the way of fancy graphics. But, you know, simple.

But the simplicity couldn’t last. The evolution of OpenGL is basically a long series of refinements where more and more work was gradually moved to the graphics card. Let’s go over them.

1. Rasterization

As far as I can tell, this was the “killer app” of graphics cards. Your game shoves the texture maps over to the graphics card. Then it tells OpenGL how big the canvas (the screen) is. Then it sends a bunch of 2D triangles along the lines of, “This triangle occupies this part of the screen and uses such-and-such part of this texture.” The graphics card would do all the work of coloring those triangles in with pixels.

Let’s talk about a CPU. The CPU inside your computer is a complex beast. Even ignoring the fact that it’s actually many cores strapped together, it needs to be able to perform all sorts of operations. Code needs to be able to branch. (If X then do thing A or else do thing B.) It needs to be able to switch between running one program and another, completely different program constantly. It’s got this complex system where it brings in data from main memory and moves it into progressively faster but smaller banks of memory. Basically, it needs to be able to run anything. I’ve heard people refer to this as being a machine that is Turing Complete.

Not so with graphics cards. They just needed to fill in triangles with pixels. They didn’t need to do three completely different things at once, or handle complex branching code. Since their output was in colored pixels and not hard data, a certain degree of slop was allowed. The math could make certain approximations and shortcuts because if the output was 0.01% off, nobody would be able to tell. Even if you had superhuman eyes that could spot the subtle color differences, you’re using a monitor that can’t display differences that slight.

All of this meant that graphics processors could be much simpler. It’s a bit like comparing a classic fast-food place to the work of a single highly trained chef. The chef can make you almost anything you can ask for, while the fast food place can only make hamburgers. But the fast food place is optimized for it and can crank out 24 hamburgers a minute[1]. Simplicity is speed.

Simplicity didn’t just make the cores faster, it also made them smaller. So instead of one giant core they could have a whole bunch of them. And those cores would do nothing but take in 2D triangles and spit out pixels.
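To make "take in 2D triangles and spit out pixels" concrete, here is a rough sketch in Python of what one of those jobs looks like. It is not how any particular card did it; it uses the common edge-function test (is this pixel center on the inside of all three edges?), and the names `edge` and `rasterize` are made up for this example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when P is on the left of the edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centers fall inside a 2D triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    # Flip the winding if needed so "inside" always means all three tests are >= 0.
    if edge(x0, y0, x1, y1, x2, y2) < 0:
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            if (edge(x0, y0, x1, y1, px, py) >= 0 and
                    edge(x1, y1, x2, y2, px, py) >= 0 and
                    edge(x2, y2, x0, y0, px, py) >= 0):
                covered.add((x, y))
    return covered
```

Notice that every pixel is tested independently of every other pixel, with no branching beyond "in or out". That is exactly the kind of work you can hand to a big pile of small, simple cores.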

2. Transform and Lighting

So it’s sometime in the late 90’s and we’ve got these graphics cards that take 2D triangles[2] and fill them in with pixels. That’s cool, I guess. It was certainly a massive boost in terms of our rendering capabilities. But it was clear they could be doing a lot more.

There’s a step I’ve been sort of glossing over here. That’s the bit of math you have to do to figure out if a particular triangle will end up on-screen, and if so, where. “Hm. Given that the camera is in position C and looking in direction D, and this particular triangle is so many units away, then the vertices of this triangle will end up on this part of the screen.” Like coloring in triangles with pixels, this is yet another brute-force, bulk, doing-math-on-three-numbers kind of job[3]. Which means it’s a good thing to offload onto the graphics card.

This process of translating vertices from game-space to screen-space is called “Transform and lighting”[4].
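For the curious, the "transform" half can be sketched in a few lines of Python. This is a simplified illustration, not the real pipeline: it assumes a camera that looks straight down the negative Z axis with no rotation, and the function name `project` and its parameters are invented for the example. Real pipelines fold all of this (rotation included) into 4x4 matrix multiplies.

```python
import math


def project(vertex, camera_pos, fov_degrees, width, height):
    """Map a 3D world-space point to 2D pixel coordinates.

    Returns None if the point is at or behind the camera. Assumes, for
    illustration only, a camera looking down -Z with no rotation.
    """
    # Step 1: express the vertex relative to the camera.
    x = vertex[0] - camera_pos[0]
    y = vertex[1] - camera_pos[1]
    z = vertex[2] - camera_pos[2]
    if z >= 0:
        return None  # at or behind the camera plane; nothing to draw
    # Step 2: perspective divide. Farther points land closer to the center.
    f = 1.0 / math.tan(math.radians(fov_degrees) / 2.0)
    aspect = width / height
    ndc_x = (f / aspect) * x / -z
    ndc_y = f * y / -z
    # Step 3: map the -1..1 range to pixels (screen Y grows downward).
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return (sx, sy)
```

A vertex whose result lands outside the 0..width or 0..height range is off-screen. The same few multiplies and divides get repeated for every vertex in the scene, every frame, which is why it was such an obvious candidate for offloading.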

Because I spent so much of my career riding the tail end of the technology curve, the timeline is always a bit muddled for me. Apparently all of this happened in 1999, but I didn’t really think about it for at least another four years or so[5].

According to the NVIDIA page, the new T&L-capable cards offered “an order of magnitude increase in visual complexity”. An “order of magnitude” is just engineer talk for “ten times more / less”, but it sounds so much more impressive and technical than just saying “ten times”. On one hand, I’d caution against believing breathless claims made by marketing. On the other hand, that sounds pretty accurate. We really did get a huge jump up in model complexity.

Here is a screenshot from the pre-T&L game Thief:

Ha ha. The graphics in the past were so terrible. The gameplay, on the other hand. . .

The canister on that table is a cylinder with 5 sides. (Not counting the top and bottom.) It looks ridiculous. At the time, every polygon was a precious thing, and we couldn’t afford to waste them. So our games were filled with coarse geometric shapes. When we were able to offload a bunch of polygon processing to the GPU, we suddenly had enough polygons to make round things look round.

The problem was, this jump was almost too big. It was so big that we kind of didn’t need to make another. If we double the number of sides in the canister above, you’ll end up with something that looks completely round to the user. Maybe if they mash their face into the object they will be able to see the angled edges, but it’s no longer the glaring problem. The jump from 5 to 10 polygons is a lot more visually important than the jump from (say) 10 to 100, or even 1,000. We suddenly had as many polygons as we could hope to use on those old machines.

3. Vertex Buffers

Remember that a graphics card is, in a lot of ways, a separate computer. It’s got its own memory and its own processors. So when you want to tell the graphics card to render a triangle, you need to send it all of the information about that triangle. There is a bit of a choke point between the devices, meaning it takes much longer to send a triangle to the graphics card than it does to (say) move the triangle from one part of memory to another.

Once you have T&L, you’ll quickly notice that you spend a lot of time sending the exact same data to the GPU over and over again. Every single frame, you tell the graphics card about the same exact polygons that are still in the same positions. (The walls of your level, and other non-dynamic stuff.) The camera is moving, not the walls. So why do I need to keep sending the same huge load of data every frame?

Vertex buffers give us a way to shove all that data over from the PC once and store it on the GPU. Instead of sending 1,000 polygons every frame, I just need to tell the card, “Remember where I gave you 1,000 triangles? Draw those again, but with the camera in this new location.”
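Here is a toy model of the difference, in Python. To be clear, `ToyGPU` is a hypothetical stand-in invented for this sketch, not the OpenGL API (in real OpenGL the equivalent calls are along the lines of `glGenBuffers`, `glBufferData`, and `glDrawArrays`). All it does is count how many floats cross the choke point under each approach.

```python
class ToyGPU:
    """Hypothetical stand-in for a graphics card; counts floats sent over the bus."""

    def __init__(self):
        self.buffers = {}      # buffer id -> vertex data held in "card memory"
        self.floats_sent = 0   # running total of traffic across the choke point
        self._next_id = 1

    def draw_immediate(self, vertices, camera):
        """Old style: the full vertex list crosses the bus on every draw."""
        self.floats_sent += len(vertices) * 3 + len(camera)

    def create_buffer(self, vertices):
        """Upload once and get back a handle for later draws."""
        buffer_id = self._next_id
        self._next_id += 1
        self.buffers[buffer_id] = list(vertices)
        self.floats_sent += len(vertices) * 3
        return buffer_id

    def draw_buffer(self, buffer_id, camera):
        """Buffered style: only the camera data is sent; vertices are already there."""
        if buffer_id not in self.buffers:
            raise KeyError(buffer_id)
        self.floats_sent += len(camera)


def compare(frames=100, triangles=1000):
    """Return (immediate-mode traffic, buffered traffic) in floats."""
    vertices = [(0.0, 0.0, 0.0)] * (triangles * 3)
    camera = (0.0, 0.0, 0.0)
    old, new = ToyGPU(), ToyGPU()
    level = new.create_buffer(vertices)
    for _ in range(frames):
        old.draw_immediate(vertices, camera)
        new.draw_buffer(level, camera)
    return old.floats_sent, new.floats_sent
```

With the defaults (1,000 triangles over 100 frames), the immediate-mode side sends 900,300 floats while the buffered side sends 9,300. The numbers are made up, but the shape of the savings is the point.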

4. Shaders

Like I said a few paragraphs ago: If we wanted games to continue improving visually, it was pretty clear that mindlessly cranking up the polygon counts wasn’t the way to go. To take the next step, we needed to change how we drew those polygons. And for that we needed shaders.

From this point, it was no longer possible to offer a generalized solution for all games. You could make a graphics card that was good at cel shading, but that would only help with games that were cel shaded. You could make a card good at bump mapping, but that wouldn’t do anything for games without bump maps. Instead of adding more “features” to the graphics card, we just needed to give developers a way to control all that raw rendering power directly. We needed to give them a way to write programs that ran on the graphics hardware.

So now developers need to make two shaders: A vertex shader to do the Transform & Lighting, and a fragment shader to decide the color of each pixel as the triangles are rasterized.

Not pictured: A complete lack of documentation, ambiguous standards, sloppy implementation, and standards-breaking features from the big GPU companies.
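The division of labor can be mimicked on the CPU with a short Python sketch. Everything here is illustrative: real shaders are written in a shading language such as GLSL and run on the card, and the names `vertex_shader`, `fragment_shader`, `inside`, and `draw` are invented for this example. The idea it shows is that the plumbing stays fixed while the developer supplies two small programs, one run per vertex and one run per pixel.

```python
def vertex_shader(position, camera_offset):
    """Runs once per vertex: move it from game-space into screen-space."""
    return (position[0] - camera_offset[0], position[1] - camera_offset[1])


def fragment_shader(x, y, width, height):
    """Runs once per covered pixel: decide its color (here, a simple gradient)."""
    return (int(255 * x / (width - 1)), int(255 * y / (height - 1)), 128)


def inside(tri, px, py):
    """True when point (px, py) lies inside the 2D triangle, either winding."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    d0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    d1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    d2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
    return (d0 >= 0 and d1 >= 0 and d2 >= 0) or (d0 <= 0 and d1 <= 0 and d2 <= 0)


def draw(triangle, camera_offset, width, height,
         vs=vertex_shader, fs=fragment_shader):
    """Fixed plumbing around two developer-supplied programs."""
    screen_tri = [vs(v, camera_offset) for v in triangle]
    image = {}
    for y in range(height):
        for x in range(width):
            if inside(screen_tri, x + 0.5, y + 0.5):
                image[(x, y)] = fs(x, y, width, height)
    return image
```

Swapping in a different `fs` (say, one that returns flat red, or one that quantizes colors for a cel-shaded look) changes the whole appearance of the output without touching `draw` at all.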

Shaders made a lot of things possible or practical: Light bloom, anti-aliasing, various lighting tricks, normal maps. This was a massive turning point in game development. In one leap:

  1. Games took a massive step forward in visual quality. I think this was actually really important from a cultural perspective. Serious games now looked good enough that they could show up in a TV commercial without looking ridiculous. Before this point, only cartoon-y games (Mario, et al) could get away with this.
  2. Games took a huge jump in expense to produce.
  3. Games took a big jump in complexity. In the late 90’s, it was still possible for a small team to get together and make a AAA game without publisher backing[6]. This was the beginning of the end of that era. (You could argue that with the release of so many free AAA game engines, those days are returning. But that’s another article.)

Like a lot of technical advancements, you can argue about where to “officially” draw this particular line. I draw it at 2004. Half-Life 2. Doom 3. Thief Deadly Shadows. Compare each of those games with their predecessors[7] to see just how extreme this move was. Visually, I think Doom 3 has more in common with the games of 2015 than it does with the games of 1999.

Left is without normal maps. Right is standard rendering. Note the keyboard and face. Click for LOLHUGE! view.

5. More Shaders

Since 2004, the graphics race has mostly been about what we can do with shaders. I can’t think of any big changes since then. Every once in a while shader programs get some new ability. Sometimes the ability is explicit. We’ve added some ability to manipulate pixel data in ways that weren’t possible before. Sometimes the feature is implicit. Graphics hardware is now fast enough to do some heavy-duty lighting effect that would have been too slow on the old cards. But either way, the steps have been more evolutionary than revolutionary.

And to be honest, for me all the changes kind of blur together here. The documentation on the OpenGL shading languages isn’t that great to begin with, so it’s pretty hard to piece together the various iterations of the language if you weren’t already following them when they happened.

The point of all this is: Right now, in 2015, you can use any or none of these features of OpenGL. You can render with all of the advancements of the last 20 years, or you can render raw immediate-mode triangles like it’s 1993. This has made OpenGL sort of cluttered, confusing, obtuse, and ugly. It’s poisoned the well of documentation by making sure there are five conflicting answers to every question, and it’s difficult for the student to know if the answer they’re reading is actually the most recent.

So now the Khronos Group – the folks behind OpenGL – are wiping the slate clean and trying to come up with something specifically designed for the world of rendering as it exists today. Vulkan is the new way of doing things, and it is not an extension of OpenGL. It’s a new beast.

I know nothing about it, aside from the overview-style documents I’ve read and the tech demos I’ve seen. Given my habit of lagging behind technology until the documentation has a chance to catch up, I probably won’t mess with Vulkan for a few years.

In the meantime: OpenGL is strange and difficult, and there’s nothing we can do about it. This is a rotten time to be learning low-level rendering stuff. The old way is a mess and the new way isn’t ready yet.



[1] Something like that. You young people don’t realize this, but fast food used to be FAST. Back before McNuggets, McChicken, McRib, McFish, McSoup, McPizza, Fajitas, Grilled Chicken, Chicken Tenders, six different salads and ten different types of hamburger.

[2] That is, a triangle expressed in terms of where it will appear on the screen. The graphics card has NO IDEA where the vertex might be in terms of your 3D game.

[3] Yes nitpickers, it’s actually FOUR numbers. But I don’t want to burn through a couple of paragraphs explaining why. We’re already in a two-levels-deep digression. Let It Go.

[4] I’m glossing over the whole “lighting” thing here because I don’t think we need to go over it in detail. And we’ve got enough ground to cover as it is.

[5] Which is a long time in computer graphics, and a really long time at this particular point in graphics evolution.

[6] BioWare is the oft-cited example of this.

[7] To be clear, I mean you should compare Doom 3 with Quake III Arena, not with Doom 2.

61 thoughts on “The Strange Evolution of OpenGL Part 3”

  1. krellen says:

    Typo check: I think the steps have been evolutionary, not the staps.

    1. Wide And Nerdy says:

      And in the same sentence “the steps have been more evolutionary than revolutionary”

      1. MadTinkerer says:

        While we’re pointing out typos, I think it’s supposed to be Quake III Arena in footnote 7.

        1. Ziusudra says:

          And “a long series or refinements” in second paragraph; “or” should be “of”.

          1. shiroax says:

            “Once” in paragraph 4 should be “one”

            Interesting how everybody catches one typo.

            1. David says:

              And nobody’s caught this one yet: “Every singe frame,…”

              Unless we’re talking about Furmark and you’re literally burning the frames with your video card. :)

              1. Alexander The 1st says:

                Maybe that’s what Shamus’ custom OpenGL shader does.

                That, or corrects typos on screen without fixing them in the actual document sent through WordPress.

              2. Worthstream says:

                Also, it’s the Khronos group, not the Kronos group.

    2. AileTheAlien says:

      You beat me to it! Stahp! :P

    3. The Rocketeer says:

      (I was pointing out the Quake III Arena typo, but MadTinkerer beat me to it. By over an hour. No, I hadn’t refreshed the article since I loaded it ages ago. :) )

  2. Da Mage says:

    I think every person writing a “how to” or giving advice through forums for OpenGL has avoided anything before 2.1 for the last 5 years. More recently people are starting to drift towards the 3.3 core or 4.x version where all of the old crap was cut from the library.

    The trouble is this old openGL advice is still online and searching for a “how to” with openGL leads to forums and guides from 10 years ago. By actually changing the name of the rendering library they will do more good since it will mean future searches will no longer bring up the openGL 1.x stuff.

    What i really hate is the number of sites that say “Here is opengl guide for X” and will turn up no matter what version numbering you use. Then you close it as soon as they start with glBegin(). It just wastes a lot of time that people don’t have when trying to solve problems.

    1. Lanthanide says:

      On google, you can give a date range for the results you want back. That should trim out a lot of the ancient stuff easily.

      1. Bropocalypse says:

        That won’t stop outdated advice that was recently given, unfortunately.

        1. Or ancient forums that have been recently posted to, I think.

      2. xKiv says:

        How about adding a “-glBegin”?

  3. CrazyYarick says:

    Shamus, I think since you have had modeling experience in the past you should try out the DCC software of today as a neat experiment to see if you can create models that would look up to today’s “standard.” Also to see where artist’s tools have improved/stagnated. You can get student versions for free(not a euphemism for piracy) and even your hated Blender is actually really nice now.

    1. AileTheAlien says:

      I too, would like to see a (mini?) post on this. Like, one afternoon, see if you can make a 3D model that looks nice. Maybe a gingerbread house? (Mmm, tasty!)

    2. Piflik says:

      “and even your hated Blender is actually really nice now”

      Sorry, but no, it’s not. Having used 3dsmax, Maya, C4D, Modo and XSI I can say that Blender’s UI is still as unusable as ever, even with a custom control scheme and each and every critique Shamus hurled at it in those old articles are still just as valid.

      1. CrazyYarick says:

        I would have to respectfully disagree. I have used 3Ds Max, Maya, Modo, Mudbox, and ZBrush (Modo is my favorite commercial app with Max at second and Maya last) and will say that Blender, for the most part, is fine now. This is especially true if you are doing simple work and not massive team based projects.
        Almost nothing is hidden as it was before. You have to get used to the way that it operates in “modes” on an object, but that can be said of all packs(ie. Max you have to make sure it’s an edit/editable poly, and ZBrush is the unholy beast of cumbersome until you get used to it). In Blender almost all of the options are either in menus or in the “T” or “N” bars.
        My only two gripes are that the transform widgets are not able to lock to two/all axis (at this point I don’t use them so I can’t comment too much) and that left click to select is not standard (some really strange people actually do seem to prefer this though ;) ).

        I’m sorry but I forgot about the multi-monitor support. It can be done, but feels really awkward. You can add that to my list of grievances.

  4. HeroOfHyla says:

    All this talk of fancy 3D rendering, and I’m just using Swing, AWT, and Graphics2D.

  5. Bropocalypse says:

    A year or two ago, I actually tried to teach myself OpenGL.
    Boy, that was a fruitless venture. I learned since then that it wasn’t my fault that I wasn’t understanding the documentation that I could find. The code’s a mess, there’s no definitive, up-to-date guide that you can rely on, and even the most recent parts of OpenGL don’t even work as designed(Just ask the Clockwork Empires guys).
    I’m glad there’s stuff like Unity now.

  6. WILL says:

    I can absolutely guarantee the new way will still be big old buffers of floats, so at least storing/sending data to the GPU will work about the same. I also seriously doubt the basic vertex/fragment shader pipeline will disappear.

    1. mhoff12358 says:

      It’s actually sort of all about changing how you send data to the GPU…
      Not in the sense that you no longer send large chunks of data, but in changing how the data is treated, making things less about the specific graphics pipeline and more developer controlled.

  7. lethal_guitar says:

    Small nitpick: Anti-aliasing was possible before shaders. But shaders enabled variants (e.g. FXAA) that are fast and still improve the image quality quite a bit.

    In fact, heavily shader based renderers, like the Unreal Engine 3, weren’t able to use classic anti-aliasing techniques in the beginning.

    1. WILL says:

      I can’t be the only one who thinks FXAA looks like trash.

      1. Da Mage says:

        Agreed, I prefer sharp edges with jaggies than have a blurry image from FXAA.

      2. lethal_guitar says:

        I absolutely agree that it looks Bad compared to MSAA or FSAA. I still prefer it over no AA at all :) But of course that’s personal preference

  8. EwgB says:

    While we’re nitpicking, I think you meant “cel shading” and not “cell shading”.

  9. bickerdyke says:

    As usual: brilliant article, brilliantly written.

  10. Nick Powell says:

    Have you posted the Doom 3 comparison shot before? I’m getting deja vu looking at it…

    1. BeardedDork says:

      All the images in this article have been previously used, with the possible exception of the Thief screen shot, that I might just recognize that from having replayed Thief about a year ago.

  11. Cybron says:

    Excellent article. Good accounts of recent technical history are always surprisingly difficult to find. This does the job wonderfully.

  12. Retsam says:

    Nitpick; I’m not sure the implication that a GPU isn’t Turing Complete is accurate. I’m obviously not disagreeing with “CPUs are very complicated in ways that GPUs aren’t” point, but it’d surprise me if GPU’s aren’t Turing Complete, because that bar, as I understand it, isn’t terribly high. Conway’s Game of Life and Minecraft’s redstone circuitry are both Turing Complete, despite both being very simple (well, very simple compared to CPUs and GPUs).

    1. Shamus says:

      As I understand it (which is based on some pretty informal learning) not being able to loop and branch will disqualify a processor from being Turing complete. Shaders can’t branch. If you have a conditional:

      if (thing) {
        DoFoo ();
      } else {
        DoBar ();
      }

      Both DoFoo and DoBar are expanded by the compiler as if they were inline. And at run time, it will execute both blocks of code and simply throw away the result it doesn’t need. It also unrolls loops, so:

      for (int i=0; i<3; i++) {
        DoFoo ();
      }

      becomes:

      DoFoo ();
      DoFoo ();
      DoFoo ();

      Which means you can't have a loop unless its boundaries can be known at compile time. There will be some problems you just can't solve, no matter how much time and memory you give it. Which (as I understand the term) denies it the TC descriptor.

      1. ulrichomega says:

        I believe that SL 3 is Turing Complete. From a quick search it allows arbitrary iteration, and you can do branching in it (even if the branching is done in the way you mention). I have no word on whether you’d want to take advantage of its Turing Complete features or not, but I think it is.

        Also of note is that I believe that SPIR-V, Vulkan’s new compute shader, is Turing Complete. Or, rather, it supports compiling a subset of C++ to SPIR-V, which can then be run as a shader.

      2. rmt says:

        Actually, I’m fairly sure that modern OpenGL shaders are (theoretically) Turing complete.

        For example, it is possible to implement Conway’s Game of Life, which is Turing complete, exclusively in a pixel shader. Therefore, the pixel shader has to be Turing complete as well. But that’s cheating, because you still have to rely on the CPU to issue draw commands.

        But apart from that, you can in fact loop over variables defined at run time, you can have branches (though with limitations, as pointed out by Shamus), and I believe you can also have recursion (Basically a way of implementing loops by having functions call themselves). That’s enough to be Turing complete, given little things like infinite time and memory space of course :)

        1. Knut says:

          I haven’t tried writing recursive code in shaders, but I know it’s not allowed in OpenCL. And not in GLSL either according to this. Not sure about HLSL or the other shading languages, but I’m guessing not.

          1. rmt says:

            Yes, I think you’re right.

            But you could probably fake it using an array as your call stack and an index into the array as your stack pointer… I think.

      3. Jimmy Bennett says:

        As someone with some formal learning (but way less experience programming on GPUs) this analysis seems correct. The two things you need for a language/machine to be Turing complete are branching and loops/recursion.

        For conditional statements, depending on how the GPU “decides which value to use” it might have some limited branching ability, but without the ability to loop indefinitely, it’s not Turing complete. (I’m assuming the GPU also can’t run recursive functions. If that’s wrong, then the analysis changes).

      4. Richard says:

        True, however you can run the same shader an arbitrary number of times, mutating the dataset each time – eg Game Of Life simulations.

        Which is a pretty good description of the actual Turing Machine.

        Is it Turing Complete if it needs an external device to ‘crank the handle’?

        1. Knut says:

          If it needs the CPU to initiate each frame, it’s not Turing complete by itself. (but the whole system is)

          1. Kian says:

            Of course, any system involving the cpu is going to be Turing complete.

            1. Zukhramm says:

              In that case, can you not say any system involving reality is Turing complete?

    2. WJS says:

      I’m not sure I’d call Minecraft’s redstone turing complete. You can build turing complete things with it, sure, but “redstone circuits” is about as broad a descriptor as “electronic circuits”. And I wouldn’t say that a transistor is turing complete either. Or a logic gate. It’s only turing complete when you put many, many many of these things together.

  13. Daemian Lucifer says:

    Im always puzzled by what “order of magnitude” means for a binary number system used in computers.Does it mean twice as good/bad?Or 8 times?Or 16,32,64?Or maybe even 1024?

    1. Julian says:

      Usually twice

    2. Jacob Albano says:

      I believe it just means the numbers are all shifted over one decimal place.

      1. WJS says:

        One binary place. :P

    3. Flakey says:

      An order of magnitude usually refers to adding a digit. In the classic way it was first used. 10 is an order of magnitude more than 1. 100 is an order of magnitude more than 10 etc etc

      This can be complicated by marketing people co opting the word because it can sound impressive, and using it for big improvements but not that big.

      1. Chris Robertson says:

        Speaking of marketing people co opting terms, ever since reading the commentary on this Irregular Webcomic, I find great amusement in marketing people using “quantum leap” to describe the improvements in their products.

      2. Richard says:

        It’s a Quantum Leap in efficacy!

      3. Daemian Lucifer says:

        I know the definition of the order of magnitude.But I have no clue what it means in computers because I have no idea what scale is used.Is it just the binary bit,in which case it is 2?Or is the byte considered the basic unit,in which case it is 8?Or is it a number of bytes in which case it could be 16,32 or 64?Or is it going by the scale data is going by for files,in which case it would be 1024?So what does it mean if one memory chip is an order of magnitude better than another?

        And thats not even going into the whole thing of stuff like graphics cards that use both the binary and decimal system for their memory and their speed.

        Oh yeah,quantum leap.That one is even worse.

        1. Kian says:

          Saying something is an order of magnitude better than another thing is meaningless unless you specify how you define better. Aside from that, if someone with a technical background is saying it, it probably means 10 times. Computers may be binary, but people are used to thinking in decimal and converting numbers to different bases in speech is a pain (is 0x10 said “ten base 16” or “sixteen”?)

    4. default_ex says:

      Usually in programming we’re talking magnitude as going up or down powers of two. Not sure if that’s intended or not but typically what you get if you use a technique said to be orders of magnitude more or less complex.

  14. Majromax says:

    >> [3] Let It Go.

    The code glows white on the phosphors tonight,
    Not a traceback to be seen.
    A function in isolation,
    and it just might compile clean.

    Vectors transforming like a swirling storm outside,
    Couldn’t keep it scaled, heaven knows I tried.

    We need more rows than just these three,
    The matrix, it has to be 4D!
    Rotate, rescale, what do you know?
    Well, now they know!

    The code never bothered me anyway.

    1. Richard says:


    2. topazwolf says:

      Very good, even follows the beat well.

  15. I’m kind of impressed with how the guy that did the hands of that guy in Doom 3 only created the index finger separately for the typing animation, the rest of the hand is just a blob, but the normal map makes it look like the other fingers are modeled as well. Pretty impressive.

    1. Geebs says:

      It’s also adequately explained in-universe: the extremely poor lighting in the Mars base resulted in the rash of horrific industrial accidents which made it look like everybody’s fingers had each been broken four or five times

      1. WJS says:

        I have no idea whatsoever if that’s true or not. You could be making shit up, or that could be a joke in the actual game. Well played.
