Good Robot #40: Overdraw

By Shamus
on Dec 29, 2015
Filed under:
Good Robot

My to-do list grows and shrinks as the project rolls on. I’ll have 20 items on my to-do list one week. I’ll get 13 of them done. Then at our next weekly meeting, those 13 are reviewed. Some are marked as done. Some end up back on the list because my solution was too narrow, or didn’t work in all cases, or I misunderstood the problem. Then a few new issues will get piled onto the list.

So after the meeting my to-do list will be back up to 25 or so items and we’ll begin again. So it goes.

But some items have never been touched. They’ve haunted the bottom of the list, never getting done, never getting looked at. The oldest item on my list now is actually a collection of bullet-points that can all roughly be summed up as “performance problems”. To wit: The game runs too slow.

Not on my machine, mind you. It’s fine on my machine. But on craptops[1] it runs at about half the framerate it should. So what’s going on?

Finding a Problem You Don’t Have

This is a 2D game. We’re never rendering more than a few thousand polygons per frame, and even old laptops should be able to handle polygon loads in the millions. The game is not performing as well as it should, given what we’re asking the hardware to do.

Unfortunately, I can’t test the problem on my end. I have a decent machine with a mid-range graphics card, and the game hits a stable 60fps for me at all times. I’m trying to solve a problem I can’t detect or measure.

I complained about this on Twitter, and several people suggested using a virtual machine. For the record: You can set up a virtual machine to emulate a computer with low memory or processing power, but last I checked there was no way to usefully do this for graphics hardware. Unless there’s been some amazing leap forward in the graphics capabilities of your average VM over the last couple of years, then this isn’t possible. The emulator would need to take draw calls targeted at the graphics hardware it’s pretending to be, constrain them in various ways, and then hand them off to the actual graphics hardware in my system. It would mean emulating all those horrendous and proprietary driver layers. Even if a VM could do that reliably, I imagine the overhead in such a conversion would be at least as large as the effect I’m trying to measure. To wit: I could spend hours isolating and solving some graphics bottleneck, only to find out I was working around some slow-down inherent in the emulation, and not the problem that real craptops are exhibiting.

I could probably send an email out to our testers and find someone with a suitable machine. But sending builds to people and having them report back their non-technical observations in imprecise terms is a slow and frustrating way to work. I’ll go that way if I have to, but let’s see if I can nail this down on my own.

Fill Rate Problem

We have one important clue to go on:

The problem gets better when the user turns down the resolution.

This actually indicates that the problem probably doesn’t have anything to do with how many polygons we’re drawing.

Happy little BSP trees.


Let’s imagine our graphics card is good ol’ Bob Ross. There are three main choke points you have to worry about while rendering:

Throughput: How fast can I tell Bob what to paint? Like, “Make a red stroke at the top” or, “Dab some green around this spot here.” Throughput used to be a big problem back in the day, but I haven’t had to worry about throughput problems in about a decade. I’m not saying they never happen, but it’s just not a common case. It’s certainly not a big deal for those of us operating in retro-styled 2D.

Memory: Think of this as “how many different colors of paint can we keep on the palette at once.” (It’s also tied to how big the canvas can be, and a bunch of other details, but it’s not worth getting into right now.) So if we’re trying to paint with seven different colors (texture maps) but we can only hold onto five of them at a time, then it’s going to be a pain in the ass for our poor painter. I tell him, “Dab some green around this spot.” But he looks down and sees he doesn’t have any green on his palette. So he scrapes off the blue to make room, loads up on green, and paints the requested strokes. And then I tell him, “Paint some blue lines right here.” So he has to dump another color to get blue back, and so on.

Memory problems tend to be really modal. Everything runs normally until the instant you run out, at which point the framerate drops from “normal” to “OMG has the game crashed?”

Fill rate: How much paint is Bob putting down? It doesn’t take much throughput for me to say, “Fill the canvas with blue paint.” And it only uses one color, so the memory cost is minimal. But obviously covering the entire canvas is time consuming when compared to (say) drawing a single stroke. You can imagine a situation where I tell Bob to cover the canvas with dark blue, and then cover it again with white, and then yet another layer of white, over and over, until it’s sky-colored. That’s going to take bloody ages compared to just starting with a lighter shade of blue in the first place.
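To put rough numbers on the fill rate idea: the work is roughly the number of pixels on the canvas multiplied by the number of layers painted over each one. The sketch below is only an illustration of that arithmetic. The resolutions and the layer count are made-up example values, not measurements from Good Robot.

```python
def pixels_filled(width, height, layers):
    """Rough fill cost: every layer touches every pixel of the canvas once."""
    return width * height * layers

# Hypothetical numbers: 8 full-screen layers at two resolutions.
full = pixels_filled(1920, 1080, 8)
half = pixels_filled(960, 540, 8)

print(full)          # 16588800 pixels written per frame at 1920x1080
print(half)          # 4147200 pixels written per frame at 960x540
print(full // half)  # 4: halving both dimensions cuts the fill work to a quarter
```

Polygon count appears nowhere in that formula, which is why a slowdown that tracks resolution points at fill rate rather than at geometry.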

Note the faint blue glow coming from the walls.


So based on the described symptoms – that turning down the resolution will greatly speed up the game – I’m thinking we’re dealing with a fill rate problem. I don’t know how. We’re not drawing a lot of stuff or covering the canvas over and over. I’ve had projects that did more and ran faster on worse hardware. But this is the assumption I’m starting with, based on what I’ve been told.

The problem doesn’t seem to change with the number of robots and particles on-screen. The framerate is low, but stable. So I’ll begin with the idea that the slowdown is related to level geometry.

As a brute-force method of testing, I add a loop to the rendering code. Every time it needs to draw part of the level, it will actually draw it ten times.

So the rendering loop looks like this:

  1. Draw the distant background image. (Ten times)
  2. Draw the most distant layer of walls. (Ten times)
  3. Draw the player’s “flashlight” beam.
  4. Draw the medium-distance layer of walls. (Ten times)
  5. Draw the glowing “aura” around the walls. (Ten times)
  6. Draw the floating dust particles.
  7. Draw the enemy robots.
  8. Draw the player.
  9. Draw another version of the flashlight beam.[2]
  10. Draw the non-robot stuff like doors, projectiles, machines, and so on.
  11. Draw the particle effects.
  12. Draw the big black foreground walls. (Ten times)

At the risk of making this too complicated to follow: Items 4-7 are actually drawn twice. It draws a “dark” version of the whole thing, except masked out so that it only draws inside of shadows. Then it draws a bright version of it, but masked out so that it only draws the parts that are NOT shadowed. Yes, it probably seems like there are faster ways of achieving this effect (like just drawing a black shadow over some parts to darken them) but I don’t want to get sidetracked for thousands of words explaining why we have to do things this way. Shadowing is more than just a brute-force “darken”. Trust me.

The point is that items 4-7 are kind of multiplied by two. So they’re actually being drawn twenty times now.
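For anyone who wants to tally it up, here is a small sketch of how many times each pass ends up being drawn under this test. The pass table is transcribed from the list above, but the data structure, the flag names, and the `draw_counts` helper are hypothetical and are not taken from the game's code.

```python
OVERDRAW = 10  # the brute-force multiplier added for testing

# (pass name, is level geometry?, drawn in both the shadowed and lit passes?)
PASSES = [
    ("distant background",           True,  False),
    ("far walls",                    True,  False),
    ("flashlight beam",              False, False),
    ("mid-distance walls",           True,  True),
    ("wall glow aura",               True,  True),
    ("dust particles",               False, True),
    ("enemy robots",                 False, True),
    ("player",                       False, False),
    ("second flashlight beam",       False, False),
    ("doors, projectiles, machines", False, False),
    ("particle effects",             False, False),
    ("foreground walls",             True,  False),
]

def draw_counts(overdraw):
    """Return how many times each pass is drawn per frame."""
    counts = {}
    for name, is_level, shadowed in PASSES:
        times = overdraw if is_level else 1  # only level geometry is repeated
        if shadowed:
            times *= 2                       # once in shadow, once in light
        counts[name] = times
    return counts

for name, times in draw_counts(OVERDRAW).items():
    print(f"{times:3d}x  {name}")
```

With a multiplier of 10 the mid-distance walls and the glow aura come out at 20 draws each, while the dust and robots are only doubled because they are not part of the level geometry.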

So now the level drawing is ten times slower. What’s the effect?

Ugh. Color banding.


No effect. The game is still running at 60fps.

You can now see the glow effects around the walls. Those are there to add some brightness and color variation. They make the level “pop” a little more. Looking back, I suppose this particular area of the game was a bad place to take these screenshots. The glow layer is usually set up to use a different / contrasting color in relation to the background. In this particular level it’s just a brighter version of the background.

Obviously drawing a rock wall ten times doesn’t make the wall look any different. A wall drawn once looks identical to a wall drawn ten times. But the glow layer is blended with what’s already on-screen. Like the example above where we kept adding layers of white paint to a dark blue canvas, the image gets a little brighter each time.
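The difference between the two kinds of drawing can be shown with a single color channel. This sketch assumes the glow is blended additively and clamped at 255. The brightness values are invented for the example.

```python
def draw_opaque(dest, src):
    """Opaque draw: the new color simply replaces what was there."""
    return src

def draw_additive(dest, src):
    """Additive blend: add the new color, clamped to the 0..255 range."""
    return min(255, dest + src)

def repeat(draw, dest, src, times):
    """Apply the same draw to one pixel several times in a row."""
    for _ in range(times):
        dest = draw(dest, src)
    return dest

background = 40  # hypothetical single-channel brightness values
wall = 90
glow = 12

print(repeat(draw_opaque, background, wall, 1))     # 90
print(repeat(draw_opaque, background, wall, 10))    # 90, no visible change
print(repeat(draw_additive, background, glow, 1))   # 52
print(repeat(draw_additive, background, glow, 10))  # 160, noticeably brighter
print(repeat(draw_additive, background, glow, 50))  # 255, fully saturated
```

The opaque wall costs ten times as much to draw and looks the same, while the blended glow both costs more and washes out the image.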

Okay, let’s try 20x overdraw.

Nope, still 60fps.

O-kay. Let’s jump to fifty(!) times overdraw.

Finally the framerate dips down a tiny bit.

Okay. Let’s go to 80x overdraw.

Finally the framerate drops into the 30fps range that craptop users are reporting.[3]

I have to say: The power delta between shitty integrated graphics and a proper graphics card is truly staggering. Just imagine how much larger this difference would be if I had a high-end card!
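The flat 60fps readings at 10x and 20x are what a refresh-rate cap would produce (the comments below mention that SDL 1 waits for the next refresh on every buffer swap). Here is a rough model of what the counter shows under a cap. The millisecond costs are hypothetical, and the model assumes frame time grows linearly with the overdraw multiplier, which a real GPU will not follow exactly.

```python
REFRESH_CAP = 60.0  # frames per second; assumes a 60 Hz display

def displayed_fps(other_ms, level_ms, overdraw, cap=REFRESH_CAP):
    """FPS counter reading when the frame rate cannot exceed the cap."""
    frame_ms = other_ms + level_ms * overdraw
    return min(cap, 1000.0 / frame_ms)

# Hypothetical costs: 2 ms of non-level work, 0.3 ms to draw the level once.
for overdraw in (1, 10, 20, 50, 80):
    print(overdraw, round(displayed_fps(2.0, 0.3, overdraw), 1))
```

Under these assumptions the counter sits at 60 until the extra work finally uses up the spare time in each frame. A steady 60fps says the frame fits inside the budget, not how much room is left over.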

Not an improvement.


I fly around the game a little. After a while I notice that the game runs more slowly when there’s a lot of rock on the screen. If I’m flying around an empty chamber, the frame rate goes up a little. If I pass through a narrow tunnel, the frame rate dips.

That’s a bit counter-intuitive.

Maybe it’s the shadow system? If there are walls nearby, then it needs to muck around projecting shadows from them. I’ve spent a lot of time tuning this stuff and the system is supposedly pretty lightweight by now, but maybe I bungled something? So I turn off shadows to see if that helps.

The fps drops into the single digits.

That’s… really interesting. I would expect no change. Barring that, maybe a tiny increase. But instead the game slows down to unplayable levels?

So the walls are hiding something that’s slowing down the game? Imagine Bob Ross paints an intricate little homestead with people, vehicles, livestock, and a windmill, but then he paints a hill in the foreground that completely obscures it all. It would take him ages to paint, even though the final product looks like a simple picture. I can’t imagine what could be back there, but I turn off the wall rendering layer so I can get a look.

Again, ignore the FPS counter in the corner. I rarely remember to get screenshots when I’m solving a problem, and creating the screenshots later can lead to re-enactment discrepancies like this.

Hmm. That looks… normal? I don’t see what’s causing…

Hang on. Let me turn off shadows…

Again, ignore the FPS counter in the corner. I rarely remember to get screenshots when I’m solving a problem, and creating the screenshots later can lead to re-enactment discrepancies like this.

Well, shit. I guess that explains it.

That region of solid cyan color shouldn’t be there. Instead we should be seeing through to the background layers. But here we have hundreds and hundreds of glowy spots all stacked on top of each other. The glow layer is supposed to be this sort of “aura” around the walls. There’s no reason to have those glowy circles drawn BEHIND the walls where they can never be seen. When I turned off shadows it had to draw all of that crap twice as much, since it was drawing it in both the shadow and non-shadow passes.

Looking through the code, it looks like I inverted a bit of logic. Instead of “never ever put a glow spot where you can’t see them” we wound up with “always do exactly that”. I made a bug that would slow down the game, and also conceal itself visually. That’s diabolical. And stupid.
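The actual logic error isn't shown here, so the sketch below does not try to reproduce it. It only illustrates the effect of the fix on a toy level: skipping glow spots on rock that is completely surrounded by other rock. The tile grid, the four-neighbor test, and every name in it are hypothetical.

```python
# Hypothetical tile grid: '#' is rock, '.' is open space.
LEVEL = [
    "#####",
    "#####",
    "##..#",
    "#####",
]

def is_rock(x, y):
    """Treat everything outside the map as solid rock."""
    in_bounds = 0 <= y < len(LEVEL) and 0 <= x < len(LEVEL[0])
    return not in_bounds or LEVEL[y][x] == "#"

def is_buried(x, y):
    """True when a tile has rock on all four sides, so its glow is never seen."""
    neighbors = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return all(is_rock(x + dx, y + dy) for dx, dy in neighbors)

def glow_spots(cull_hidden):
    """List the rock tiles that get a glow sprite."""
    spots = []
    for y, row in enumerate(LEVEL):
        for x, tile in enumerate(row):
            if tile != "#":
                continue
            if cull_hidden and is_buried(x, y):
                continue  # nothing here can ever show through the wall
            spots.append((x, y))
    return spots

print(len(glow_spots(cull_hidden=False)))  # 18 glow sprites, most of them hidden
print(len(glow_spots(cull_hidden=True)))   # 6 glow sprites, all next to open space
```

Both versions produce the same picture once the foreground walls are drawn on top, which is how a bug like this can hide for so long.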

It’s a simple fix:

After removing all of the pointless glow that you couldn’t see anyway...

This screenshot sort of undersells just how bad the problem was. That area on the right was stacked thick with glowy spots. Perhaps a third of the screen area looks different in the screenshot, but in practical terms we’re drawing a fraction of the previous load. (Working it out exactly would require doing some really annoying surface-area calculations that I’m too lazy to work out.)

With this fix in place, the game can keep up a stable 60fps even with the GPU-killing 50x overdraw in place. That is, the game can hit the target framerate even if it has to draw the level fifty times every frame.

I don’t know if this will fix THE problem, but it was certainly A problem.

The final product looks exactly like what we started with, except it’s now achieving that image with a fraction of the work.

This is actually just the first screenshot again, for comparison.


I anticipate one objection from graphics programmers:

Hey Shamus, why don’t you draw the level front-to-back, instead of back-to-front? If you did that, then this bug would have been harmless. None of the extra glow crap would ever have been drawn, since it would have been skipped on account of being behind the walls.

That’s how the project was originally when I was working solo. The art style was all hard-edged pixels. But when I teamed up with Pyro, the artists wanted to smooth all the coarse edges off the pixels. Giving pixels soft edges means that you have to draw stuff back-to-front, or there would be ugly transparency problems where the walls met the background.
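A one-pixel example of why soft edges force the draw order. This sketch assumes conventional alpha blending plus a depth test that rejects anything behind what has already been drawn. The grayscale values are invented.

```python
def over(src_color, src_alpha, dest_color):
    """Conventional alpha blend: src is drawn on top of what is already there."""
    return src_alpha * src_color + (1.0 - src_alpha) * dest_color

background = 0.8  # hypothetical grayscale values, 0 = black, 1 = white
wall_edge = 0.0   # a black wall...
edge_alpha = 0.5  # ...whose soft edge covers this pixel by half

# Back-to-front: the background goes down first, then the soft edge blends over it.
back_to_front = over(wall_edge, edge_alpha, background)

# Front-to-back with a depth test: the edge is blended over an empty (black)
# framebuffer, and the background is then rejected at this pixel because
# something nearer has already been drawn there.
front_to_back = over(wall_edge, edge_alpha, 0.0)

print(back_to_front)  # 0.4, a proper half-blend of wall and background
print(front_to_back)  # 0.0, the soft edge has turned into a hard black one
```

With hard-edged pixels every alpha is either 0 or 1, so the two orders agree and front-to-back is safe. As soon as an edge has partial alpha, the background has to be there before the edge is drawn.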

My to-do list is getting short these days. The game will be coming out in a few months. The artists will probably wrap up their work around the end of the year, leaving Arvind and me to worry about bugs and marketing.

So, uh… buy my game?



[1] Crappy laptops, or any similarly under-powered machine.

[2] I forget why we draw it twice, but I know there’s a good reason. Something about how it needs to interact with the various layers to brighten them.

[3] Note that I took these screenshots after the fact, and I took them with 50x overdraw, not 80x. So don’t be confused by the fact that the fps counter in the corner disagrees with this write-up.

There are now 90 comments. Almost a hundred!


  1. guy says:

    Maybe it’s the shadow system? If there are walls nearby, then it needs to muck around projecting shadows from them. I’ve spent a lot of time tuning this stuff and the system is supposedly pretty lightweight by now, but maybe I bungled something? So I turn off shadows to see if that helps.

    The fps drops into the single digits.

    Me: WHAT?

    Mandatory dumb car analogy: It’s like taking your foot off the brake and having the car jerk to a complete stop.

    • BenD says:

      Yeah, same. I am not even remotely a programmer, yet “I turned off a system that requires power to run, and the apparent power output sank through the floor” is so startling that I went back and read it several times. Stunning! Glad you found this bug. Looking forward to hearing how the craptop testers report in. ;)

      • Daemian Lucifer says:

        And if you followed his ramblings on Twitter, you would’ve been boggled by the “why” for quite a few days now. Finally got to read the reason for this inconceivable blunder.

    • Xeorm says:

      I ran into a funny reaction like that somewhat recently. Was experimenting with working on android programming, and only doing some basics. Following a tutorial (mostly) all I was doing was drawing an image on the screen. Put in some more code to track the fps, and found I was getting a measly 15 fps. I was doing things differently, but not that differently, but I changed things closer to the tutorial, still no real change.

      Curious at how bad things were, I drew 100 of the same image across the screen, and was surprised to see the fps counter at close to 20 fps. Very confusing. Stopped using the emulator and started debugging with the phone and got 60 fps no problem. Didn’t think the emulator would be that bad.

      • Geebs says:

        The android emulator is notoriously slow because it actually does emulate the whole widget; the iOS simulator is much faster because it runs all of the objective-c stuff as native code on the developer’s actual computer. Emulating an Android program which requires OpenGL will generally be much slower than 2D stuff.

    • Volfram says:

      Basically it means that whatever was causing the slowdown *only* existed in the “light side” draw phases(was my thought when I read it). For the car analogy, it would mean that whatever’s causing your car to stop is caused by the brake being entirely disabled.

      I assume that by “lots of glowy bits,” what Shamus actually means is “every single pixel which was behind a wall(and not in shadow) had a glow image attached to it.” Small wonder he was getting slowdown.

    • Phantos says:

      Stuff like that is why I refuse to learn how to code. I would run out of the patience to deal with problems like that, which seem to have no end and never seem to make any sense.

      Every game is a tiny miracle if it manages to function at all.

      • Bitterpark says:

        But what about the thrill of the bug hunt? It’s like solving a case, or diagnosing a disease: eliminating possibilities until there’s just a couple threads to follow, and then following the chain of connections until you finally find the point of origin and nail that bug!

        Granted, doctors and investigators rarely get the urge to slap themselves for causing the problem in the first place, but still…

    • Dt3r says:

      Same reaction here! Counter-intuitive problems like that always have interesting explanations. It comes up a lot in biology because of how complicated the systems can get. You spend years learning that X causes Y and suddenly something breaks that model. It can be maddening, but it’s also a great learning opportunity. (And thanks to Shamus for teaching us!)

      X causes Y (except when it doesn’t)
      Always remember the second half.

  2. hemebond says:

    So, uh… buy my game?

    Releasing on GNU/Linux x86_64?

  3. Nidokoenig says:

    Regarding craptops, have you considered asking recycling centres and computer repair shops? The PC parts you can get there should be cheap and suitably crappy, though time cost to get them working would be a factor. You might end up solving problems specific to dodgy parts, though.

    • Muspel says:

      Clearly, the proper solution is to start mailing Shamus all of our old, broken computers.

      • psivamp says:

        I’m not against it. SMART systems have secure drive erasure and that ’09 MacBook under the pile of trash near the desk isn’t doing me any good.

        Bonus: It has a SSD and already runs 2 OSes — could easily be 3. And it has a sweet Firefly sticker on the cover.

      • Felblood says:

        –but then what would I use to compute with? That’s all I own.

        It’s funny; all these computers were considered top of the line gaming PCs, workstations and even rendering servers, once upon a time.

        Now they are just craptops that barely run Warframe if you spend months performance tuning them.

      • Dt3r says:

        Haha, might not be a bad idea. Remember the diecast episode with Roses?

    • Rosseloh says:

      Can’t speak for all computer shops, but we have a pile of crappy stuff lying around that, assuming I knew you (obviously this is just a thought experiment since Shamus lives rather far from me), I would likely just give you for free. Or at least lend you one, on the condition that you bring us pizza or donuts or something. The kicker with ours is that they all work – we wouldn’t keep them around otherwise.

      Hell, if you wanted to check if GR would run on Windows Fundamentals For Legacy PCs, we have something running it. It’s got a P3. Granted, it also has a graphics card, but that’s easy enough to rectify. (We actually had it running a FEAR Combat server at one point…for…some reason.)

  4. Rodyle says:

    So how bad of specs should I be thinking of when I hear ‘craptop’? 4 year old netbook bad? What kind of other stuff can’t these people do on their computers properly?

    • Lanthanide says:

      For me, the number one aspect of a ‘craptop’ is an onboard integrated GPU. And probably a crappy one at that.

      • Felblood says:

        My Q45 chipset is definitely the weak link in this computer, but I’m having trouble finding an upgrade card option, because the mobo doesn’t support PCI-E 2.1.

        Does anyone here spend enough time with old cards to recommend a PCI-E 2.0 x16 card?

        I honestly don’t know what’s good, and anything has to be better than a Q45, so I’m very open to suggestions here, but the price has to be dirt cheap.

        If you really know your stuff, here are some bonus challenges:

        I’d prefer something with nVidia PhysX support.

        I’d like both VGA and HDMI outputs, but I know that one of the chief reasons that you can’t run PCI-E 2.1 cards on older hardware is a lack of power, so I suspect anything that does that will want 2.1 voltages.

        EDIT: Oh, I almost forgot to mention the other important requirement. It has to be SFF compatible. Good luck!

        • John says:

          You probably want something along the lines of a GeForce 9 series card–by which I mean something with a name like “9600 GT”. I know those are PCI Express 2.0 cards. I did a lot of antiquated graphics card research earlier this month and it looks like PCI Express 1.0 support stops with the 8 series. (The most powerful card I could find for my antiquated PCI Express 1.0 motherboard was the GeForce 8600 GTS. I ended up settling for a less powerful 8600 GT and saved myself $25.) Where 2.0 support ends, alas, I do not know. Nor can I tell you much about VGA and HDMI outputs. I would have preferred a card with one or both of those myself, but the card I bought only has s-Video and DVI. I’m using a DVI-to-HDMI adapter. It looks fine, but there is–obviously–no audio signal to the TV and I have to use external speakers. You may be out of luck there.

          I did my graphics card research on the Passmark GPU benchmark site. They’re pretty good about telling you which standards each card (type) generally supports, though you should always check with a specific card’s manufacturer to be sure.

          • Cilvre says:

            everything that is pci 3 through pci 1.1 will work in any pci express 16x slot, hurray for forward thinking backwards compatibility!

            • John says:

              I’m pretty sure I knew that you could use an older card in a newer slot–that sounds very familiar–but I absolutely did not know that you could also use a newer card in an older slot. Huh.

              • AdmiralJonB says:

                I can confirm this. I’ve got an NVIDIA 970 GTX (which is PCI-E 3.0) in a 2.0 slot and it runs perfectly fine. From what I gathered on the internet, I’m unlikely to notice any difference as a 2.0 slot is fast enough for anything you need graphically (or rather, there would be other limitations such as hard drive speed).

            • Felblood says:

              Sadly, this was originally a PCI-E x16 1.0 slot, and firmware patches will only take me as far as 2.0. The hardware just doesn’t have the capability to power a good 2.1 card.

              That is still some really impressive engineering, but my mobo is just slightly too old to have benefited from it.

              Which is a shame, because somebody gave me a pretty good ATI card that I can’t use because it demands 2.1.

        • Cilvre says:

          actually you can use pci express 3.0 cards on the lower 2.1 and 2.0 slots. They just wont run at full speed but through testing, this was found not to be a huge issue yet. You just have to have a power supply capable of running them.

  5. Tektotherriggen says:

    Seeing all that glowy light – did J. J. Abrams get access to your source code somehow?

    “Needs more lens flare!”

  6. Cuthalion says:

    That was fascinating. I’m a novice programmer, so these sorts of gotchas are always nice to keep in the back of my mind for future reference.

    • Lanthanide says:

      Eh, this specific problem is very particular to the type of game and the various decisions that led up to the current rendering pipeline. It’s not very illustrative in itself.

      The much more useful part is the debugging approach that Shamus used to track down the problem. Generalising that debugging technique (simulate slowness, then tweak individual settings one by one) allows it to be applied to almost any project.

      • Cuthalion says:

        Well, since my game is also a 2d, back-to-front-drawn game, I could totally see something along these lines happening where draw order hides a problem.

        But yes, the technique is more broadly useful.

  7. AR+ says:

    So since you’re not actually running it on slower systems…

    How do you plan to specify the minimum requirements?

    • Lanthanide says:

      Shamus may not be running it on slower systems, but that doesn’t mean other people aren’t (or won’t be).

      Shamus is specifically trying to debug a slow performance problem, which needs hands-on access to the slow platform with his eyes in front of it.

      Generating systems requirements isn’t Shamus’ job, and will be done once all of the performance problems like the above are solved.

    • Alexander The 1st says:

      “If your computer is more than 80x slower than these specs that I used, it’s below the minimum requirements.”

    • AileTheAlien says:

      Steam seems to think that these are the minimum specs:

      2 GHz Intel Core 2 Duo or AMD equivalent
      2 GB RAM
      OpenGL 2.1 compatible graphics card with 1 GB memory
      1 GB available space

  8. Aerik says:

    Shamus, do you see Good Robot as a good game to play with the Steamtroller?

    You said that the Steamtroller was a solution in search of a problem, so, uh, maybe you can be the problem!

    • Lachlan the Mad says:

      Based on what we know about the game from the dev blog, I think that the Steamtroller would work super well, because it would give you extremely good control over both your movement and aiming directions. Although I would suggest using the left touchpad as a “steering wheel” and a left trigger as the “accelerator” if you wanted to play in a dual-touchpad way.

  9. Chris says:

    So, uh… buy my game?

    You truly are the Don Draper of our generation.

  10. Hector says:

    Something I’m curious about: did you cap your frame rate? Given the kind of graphics technology at work, and even accounting for a heavy level of physics calculation, it seems like you could easily achieve 100+ fps without really taxing your hardware.

    • Shamus says:

      I’m using SDL to talk to Windows, and SDL 1 (no point in migrating to SDL 2 so late in the project) has a firm 60fps cap. When you call SDL_GL_SwapBuffers (), SDL actually just waits for the next refresh before it returns. It’s actually really annoying.

      • Da Mage says:

        I don’t know if this also works for SDL 1, but with SDL 2 it will cap at 60 until you turn off vSync in the Nvidia control panel. Might be worth checking, in case your game has problems at high framerates and you need to build a cap yourself.

      • Volfram says:

        I figured I could use that(in SDL2) to automatically figure out what my monitor’s refresh rate was. I don’t have any displays with >60Hz refresh, so I haven’t been able to check if it actually matches the display refresh rate.

        Moving to SDL2 was interesting. It supports multiple windows, which is nice, and according to the migration wiki, “doesn’t lose the OpenGL context when you resize the window,” which I…assume? means I don’t have to cache and retrieve the GPU buffers every time the display gets upscaled?

        It seems to detect the mouse properly when I upscale now, and that was my only metric before, so…

      • Hector says:

        Thank you for responding. The numbers just didn’t quite add up in my head.

      • AdmiralJonB says:

        I’m not suggesting you do migrate to SDL2, but just in case one day you feel like experimenting, in my own projects I’ve found this to be incredibly simple. The only changes really was how you created a window, and I think creating the surfaces as well, but otherwise a very minimal thing.

  11. Piflik says:

    In your last post, you had a wireframe shot of the game, and I wondered about the overdraw there. I actually wanted to comment on it, but then forgot about it after reading the whole text. I initially thought these were overlapping rock-sprites to make the walls black, which would have been strange enough, but invisible glow-sprites are even worse.

    Regarding rendering front-to-back: this is also possible with transparency. You can render transparent objects back-to-front or front-to-back, it just has to be consistent. Usually you would have to sort the objects, but in your case you have distinct layers that can be drawn one-by-one (no need to sort within a layer, since the objects are either completely opaque or, with the glow, additive). The only thing to note is that the equations to calculate the resulting color and transparency are slightly different:

    Usual back-to-front:
    Color = sourceAlpha * sourceColor + (1 - sourceAlpha) * destinationColor
    Alpha = sourceAlpha * 1 + (1 - sourceAlpha) * destinationAlpha

    Front-to-back:
    Color = destinationAlpha * sourceAlpha * sourceColor + destinationColor
    Alpha = (1 - sourceAlpha) * destinationAlpha

    (Source is what is drawn in the current pass, destination is what is already in the framebuffer, if that is not clear…in the back-to-front case, destinationAlpha is usually always 1, in the front-to-back case, it is initialized with 1)

    More on this here

    • Richard says:

      Thankyou! I’ve been trying to find that for weeks…

    • Draklaw says:

      This won’t help with performance without some culling. Fragments that fall behind an opaque surface will still be rendered and blended. A solution would be to write depth only for fully opaque fragments so that the early depth-test does the culling, but I’m quite sure this can not be done in a single pass. And doing a pass for the color and a second pass for the depth would require switching shaders all the time… So I’m not convinced there is anything to win here.

  12. Diego says:

    Nice problem solving. I could get a rush just by reading about it. Are we weird people?

    And, just cause you’re asking nicely, I will buy the game :P

  13. The Snide Sniper says:

    Some other things you can do:

    Partial front-to-back rendering. If you can identify solid regions or objects, draw them front-to-back, then render (possibly) transparent objects back-to-front.

    Glow as “fringe” quads. It looks like you’re drawing tons and tons of blended quads. From my days back with a craptop, I remember blended surfaces being significantly more expensive than solid or early-exit fully- transparent surfaces. You could eliminate some remaining overdraw by rendering the glow as an extrusion from the wall surface, with a few extra triangles thrown in at bends.

  14. Da Mage says:

    ahhh fill-rate, the bane of any graphics programming.

    You talk about turning systems off until the problem was found, but did you also go through and check for any slow-downs in your fragment shaders? I’ve often found errors or over-complexity in the fragment shaders is what causes fill-rate problems.

    Also, how many different shaders do you use in Good Robot? It would be interesting to know.

    • Shamus says:

      One frag shader. Very small and simple. In fact, here’s the source:

      #version 150 compatibility

      #define TEX0 gl_TexCoord[0]
      #define SPRITE_GRID 32
      uniform sampler2D uni_texture;

      void main()
      {
        gl_FragColor.rgba = texture (uni_texture, TEX0.xy)*gl_Color;
      }

      • Shamus says:

        For context: The only thing I do with the shaders is move all the T&L and sprite lookup onto the hardware.

        “Here’s a sprite centered at XY, using sprite entry S from the atlas, with a size of N, rotated A degrees. There. YOU figure out the vertex and UV values.”
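
A rough CPU-side sketch of the work being described: turning one sprite description into four corner positions and UVs. This is not the game's vertex shader. The function name, the `Vertex` struct, the corner order, and the reading of `SPRITE_GRID` as "the atlas is a 32 by 32 grid of equal cells" are all assumptions for illustration.

```cpp
#include <array>
#include <cmath>

// Assumption: the atlas is a SPRITE_GRID x SPRITE_GRID grid of equal
// cells, numbered row by row starting at 0.
constexpr int SPRITE_GRID = 32;

struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinates into the atlas
};

// Expands "sprite at (cx, cy), atlas entry, size, rotation" into a quad.
// `size` is treated as the full edge length of the square sprite.
std::array<Vertex, 4> ExpandSprite(float cx, float cy, int entry,
                                   float size, float angle_degrees) {
    const float pi = 3.14159265358979f;
    const float radians = angle_degrees * pi / 180.0f;
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    const float half = size * 0.5f;

    // Top-left UV of the atlas cell, plus the width of one cell.
    const float cell = 1.0f / SPRITE_GRID;
    const float u0 = (entry % SPRITE_GRID) * cell;
    const float v0 = (entry / SPRITE_GRID) * cell;

    // Unit corner offsets, counter-clockwise from (-1, -1).
    const float corner[4][2] = {
        {-1.0f, -1.0f}, {1.0f, -1.0f}, {1.0f, 1.0f}, {-1.0f, 1.0f}};

    std::array<Vertex, 4> quad{};
    for (int i = 0; i < 4; ++i) {
        const float ox = corner[i][0] * half;
        const float oy = corner[i][1] * half;
        // Rotate the offset about the sprite centre, then translate.
        quad[i].x = cx + ox * c - oy * s;
        quad[i].y = cy + ox * s + oy * c;
        quad[i].u = u0 + (corner[i][0] > 0.0f ? cell : 0.0f);
        quad[i].v = v0 + (corner[i][1] > 0.0f ? cell : 0.0f);
    }
    return quad;
}
```

Doing this per vertex on the GPU means the CPU only has to upload the compact sprite description instead of four finished vertices per sprite.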

  15. Mephane says:

    So, uh… buy my game?

    The stage is empty. Brightly illuminated it sits bare, before a simple, unadorned grey wall and a few blue lights where it meets the ceiling.

    Enter a red-haired man in his mid twenties, clean-shaven, clad in sneakers, trousers and a vibrantly red jacket worn openly on top of a simple white t-shirt. His face contorted into an expression of eager and angry excitation, masking his inexplicable glee, he grabs into his left trouser pocket. Thus he produces a loose bundle of hundred dollar bills and silently waves them face-high towards another person off-stage, unseen. He then lets his arms droop, turns to face the audience, and with a serene, almost zen-like expression on his face, performs a deep, humbling bow as the audience bursts into applause and the curtain falls before the man, still unmoving in his bowing pose.

  16. Mephane says:

    And now for something completely different: How much say do you have about the music in the game? I would assume it is being made precisely to your tastes which you outlined in your various music-making posts. :)

  17. Herecomesjohnny says:

    Like reading a murder mystery or detective story.

  18. MikhailBorg says:

    I don’t know who on your team decided on an OS X version, but you just sold at least one more copy. Thanks!!

    • Fists says:

      I could be mis-remembering or making this up completely but I think cross-platform compatibility might have been one of Shamus’ original design goals back when this was a little hobby/blog project. A project to help him separate from Windows dependent programming.

  19. Steve C says:

    I read a couple of articles today about the[url=] inner workings of a a successful Indie developer[/url] (Amplitude Studios). It’s an order of magnitude bigger than Pyrodactyl Games (50 employees vs 5). Still thought it was interesting and possibly relevant. I particularly liked the [url=]video talk at GDC about managing community driven development[/url]. It talked a lot about transparency.

    It made consider how Shamus is the king of that philosophy. He turns his bugs into interesting content that can stand on it’s own. And mentioning bugs get feedback like [i]”huh, interesting. Also I’ll buy your game just as soon as you put it up for sale”[/i].

  20. Steve C says:

    Hmm. I’m not seeing my comments (sometimes) going into moderation anymore. They aren’t being displayed at all. I wonder, are they being deleted automatically?

    • Shamus says:

      I’m never sure what angers the Spam Filter Gnomes, but in this case I THINK the problem is you used BB code (common on forums) in the comments, which only recognize (bits of) HTML.

      I did find your comment and rescue it. (It was on top of the pile of 12k spam comments. If you’d waited an hour to say anything it probably would have been too hard for me to find.)

      Anyway. So that’s what happened.

    • Steve C says:

      Oops. I’ll reformat it and repost it since that is a mess and I can’t edit it. I must have inadvertently done that a few times since this isn’t the first time it’s happened. Thanks.

      I would have easily noticed that error and edited it if it had been posted. It is ironic that, in a post about a bug that was easy to see but covered up by walls, that is essentially what happened to my comments.

  21. WILL says:

    Unity3d has a nifty little mode that lets you view the overdraw directly, and it’s relatively simple to implement.

    • Volfram says:

      Yeah, that should be as simple as turning off textures and drawing all polygons at a fixed opacity. I could probably implement it in my engine in 5 minutes if I didn’t already have the whole graphics pipeline in pieces on the workbench undergoing a complete overhaul. (Single-threaded to multi-threaded. It required a little more forethought than I’d previously anticipated.)
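
The same idea can be sketched on the CPU: count how many times each pixel gets drawn, which is what the fixed-opacity view shows as brightness. This is only an illustration with axis-aligned rectangles standing in for quads; `Rect`, `CountOverdraw`, and `AverageOverdraw` are hypothetical names, not part of Unity or Good Robot.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// An axis-aligned quad in pixel coordinates (a stand-in for a sprite).
struct Rect {
    int x, y, w, h;
};

// Returns one counter per pixel: how many quads touched that pixel.
std::vector<int> CountOverdraw(int width, int height,
                               const std::vector<Rect>& quads) {
    std::vector<int> hits(static_cast<std::size_t>(width) * height, 0);
    for (const Rect& q : quads) {
        // Clip the quad to the screen, as the rasterizer would.
        const int x0 = std::max(q.x, 0);
        const int y0 = std::max(q.y, 0);
        const int x1 = std::min(q.x + q.w, width);
        const int y1 = std::min(q.y + q.h, height);
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                ++hits[static_cast<std::size_t>(y) * width + x];
            }
        }
    }
    return hits;
}

// Average number of times a pixel was drawn; 1.0 means no overdraw.
double AverageOverdraw(const std::vector<int>& hits) {
    if (hits.empty()) return 0.0;
    long long total = 0;
    for (int h : hits) total += h;
    return static_cast<double>(total) / static_cast<double>(hits.size());
}
```

An average well above 1.0 is the kind of number that points at a fill-rate problem rather than a polygon-count problem.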

  22. Steve C says:

    I read a couple of articles today about the inner workings of a successful Indie developer (Amplitude Studios). It’s an order of magnitude bigger than Pyrodactyl Games (50 employees vs 5). Still thought it was interesting and possibly relevant. I particularly liked the video talk at GDC about managing community driven development. It talked a lot about transparency while being transparent how they accomplish that.

    It made me consider how Shamus is the king of that philosophy. He turns his bugs into interesting content that can stand on its own. And mentioning bugs generates feedback like ”huh, interesting. Also I’ll buy your game just as soon as you put it up for sale”.

    • Volfram says:

      I actually started my own blog to do exactly that. It turns out I’m not as good at it as Shamus is…

      Which still wouldn’t be a bad thing if I wasn’t actually objectively terrible at it.

  23. MadTinkerer says:

    Well I have a now-ancient Dell laptop from 2009. It has Windows 7 installed (I won’t install Win 8 or Win 10 on any computer I own ever), and runs every pre-Portal 2 Valve game at maximum settings. Portal 2 only needs a few tweaks down from maximum settings. DOTA 2 and CS:GO require some serious compromises to run at decent framerates.

    The Muffet fight in Undertale causes a little bit of framerate drop (likely due to all the under-the-hood 3D effects used to make the “2D” graphics scroll and scale during the fight), but otherwise Undertale runs perfectly.

    Most games either run fantastically (almost the entire GoG catalog) or are ones I know it is pointless to even attempt (most AAA games from 2011 onward).

    Gone Home, Dear Esther, and The Stanley Parable all run great at max settings (possibly because of no character animation in any of those). FRACT OSC slows to a crawl. The Magic Circle is too choppy once several characters are in the same area.

    Nearly every non-Crysis pre-Bioshock 2 FPS can be run at max settings. Bioshock and Bioshock 2 run fine at almost max settings. Bioshock Infinite requires fiddling with settings, but is playable.

    Skyrim requires a few compromises but rarely slows down once those compromises are met. Fallout 3 requires a few tweaks. Probably not going to attempt to play Fallout 4 on this machine.

    Every pre-Serious Sam 3 game, including the remakes of First and Second Encounters, can be run at max settings. However, Serious Sam 3 has too many issues even at minimum settings. :(

    Minecraft actually destroyed my hard drive once when it was still in warranty. When I got the new HD, I jury-rigged an external laptop fan and a mini-fan so that heat will never be a problem again. Eventually Notch figured out the overheating problem (it wasn’t just my machine: it affected a lot of older devices at the time) and patched it and I forgave him. But I still keep the jury-rigged air-cooling system on at all times just in case. My “laptop” has been a desktop for five years.

    Oh, and Minecraft works perfectly now. Other than max draw distance, every other setting can be maximised with no problem.

    So if that qualifies as a “craptop”, and you want me to test the Alpha for you, let me know.

    EDIT: Oh, and I’ll definitely let you know how the final version runs, regardless.

    • Tuck says:

      (I won’t install Win 8 or Win 10 on any computer I own ever)
      I’m hardly a Microsoft fanboy, but all three of my machines run better after upgrading to Win 10. There are definitely some things I dislike about the OS (lack of proper update control is the main one), but the performance gain and stability is worth it for me. It’s simply a far lesser load on the hardware. Oh, I did purchase and install StartIsBack to give me a proper Win 7 style start menu, but from memory that cost all of £3 when I originally bought it for Win 8. There are few really valid reasons for not upgrading.

      Of course, it might not be possible to get compatible hardware drivers for a 2009 laptop. :)

      • MadTinkerer says:

        I’ve heard good things about performance gains and various new features. Here are the things I consider more important than all the nice new features:

        1) I once explained to my Apple-product using friends that Apple products were fine consumer products, but Windows machines were more like power tools. Like how a VCR belongs in the living room, but a power saw belongs in the garage. This was before Windows 8 was announced.

        What happened next, I don’t compare to categories of consumer goods vs. tools. Instead, I use this analogy: Company A was really good at making adding machines. Really REALLY good at making adding machines. Then they saw that Company B was making a lot of money off of their rotary-dial phones because rotary-dial phones were popular. So Company A decided that all of their adding machines needed rotary-dial interfaces because that was “the future”.

        Microsoft deciding to panic and follow Apple’s design philosophy (but not really, because Macs didn’t instantly use the iOS interface when the iPhone 1 was released YOU CRETINS) means that by definition they’re just following whatever trends they think have been established. By definition they’re failing to lead and have long lost all confidence IN DESKTOPS as a CONCEPT. Microsoft LOST CONFIDENCE IN DESKTOPS. This is like… New Coke. Exactly like New Coke. But worse, because Microsoft failed to learn what Coca-Cola learned.

        2) I haven’t actually used Windows 8 myself. But my brother’s user experience with it is more than enough to convince me that I really don’t want or need to give it a chance. I’d really rather just finally learn the Linux interface.

        3) “lack of proper update control is the main one” FUCK NO

        4) There are articles titled things like “How to turn off spying in windows 10”. Shit like that is unacceptable. Regarding Windows 8, I am mostly just annoyed and frustrated and disappointed with Microsoft. Regarding Windows 10, it’s a matter of principle. LINE DRAWN, STAND TAKEN.

        5) Other reasons, but I’m supposed to be asleep now, and my brother is complaining about my typing.

        • Daemian Lucifer says:

          What made me 100% certain that I will never get windows 10 for my home computer is that it has an actual key logger integrated in it.Meaning it can log every key you press.And send it to microsoft.Even turned off,I dont want that shit on my os.Though I will definitely advertise that fact to anyone who is managing a company,theyll love it.

          As for your talk about what happened to windows,thats the sad thing of what happened to the opera browser.It used to have all these neat features that everyone copied from them(it introduced tabs,amongst other things).Then they decided to stop innovating and ape chrome.This led to them removing bookmarks(amongst other things),and only got them back after a huge backlash from the users.Its still a cool browser,but not nearly as powerful or as innovative as it used to be.

          • Morden says:

            If you haven’t already heard of it, and want an alternative to old Opera, you might want to check out Vivaldi.

          • Zak McKracken says:

            Yeah, I’m still hoping that Vivaldi will eventually have an e-mail client with frictionless import of all e-mail stuff from Opera. I’ve got over 15 years’ worth of e-mail, all sorted into categories and stuff, and the procedure for moving to Thunderbird or anything else requires one to export every single folder by itself, and loses anything but name and e-mail address from your contacts. … that would take me days to do, and I’d still lose a lot of data…

        • John says:

          Unless you’re referring to the command line, there is no such thing as “the Linux interface”. No, it’s much worse than that. There must be dozens of Linux GUIs by now. Every time one developer does something another developer disapproves of even slightly, a new fork is born and the number of GUIs increases.

          • Volfram says:

            personally I’m a fan of KDE and Cinnamon. Gnome tries too hard to pretend it’s a mac, and every time I’ve ever tested it out, it’s run more poorly than the KDE equivalents. Cinnamon seems to be fastest and it strongly resembles everything we liked about Windows 7’s interface.

            I’m currently running Win7 on a laptop and Win8.1 on a desktop. The Metro interface isn’t nearly so bad if you have both a touchscreen and a mouse, but if you don’t have both a touchscreen and a mouse, it’s a wreck.

            My biggest objection to Win10 is that since they integrated GFWL, they also had to integrate the draconian GFWL EULA, which includes a class action waiver.

            And those can go screw themselves with a rusty chainsaw. In the butt.

            • John says:

              I’ve heard good things about Cinnamon. I definitely plan to check it out if I ever work my way up to hardware that doesn’t absolutely require the lightweight-ness of LXDE.

              • Volfram says:

                Just run off a Mint liveUSB, that’s what I did.

                Speaking of which, HUGE fan of Linux Mint. My older laptop runs like it did 3 years ago when I boot it with Mint instead of Windows 7. (At least, when I’m in Live mode and it’s running from a ramdisk instead of suffering through the 5mb/sec USB bottleneck when I load Persistent mode.)

                • Chris says:

                  I’ve been running Mint on my laptop for a few months now. I quite like it. I had a ton of stability problems with Ubuntu (and I lack the Linux skills to track down the problem), but Mint is a rock. Cinnamon is a very comfortable UI for a long-time Windows user. The only problem I’ve encountered is that there isn’t any support for the AC radio in my WiFi NIC because it’s still on kernel 3.19.

              • Zak McKracken says:

                I’ve heard good things about Bodhi Linux for older machines.
                That said: I’m running the latest KDE on a 7 year old desktop, only with some desktop effects turned off, and it’s fairly smooth sailing.
                And I just realized that there’s the “Trinity” desktop, a fork of KDE. So if you like KDE then that might be your thing. Similar with Mate if you liked the “old” Gnome 2.0. Both are aiming to be lightweight and work on older machines.

        • Chris says:

          All the 8 hate is very easily remedied. I dropped $8 for Start8 and ModernMix, and every complaint I had about Windows 8 evaporated immediately. Once you override the lousy UI decisions, 8.1 is great. It’s much snappier and more responsive than 7 was on the same hardware, and most of the little tweaks MS made to the desktop shell are fantastic.

          If you’re opposed to spending a few dollars to get rid of the tablet stuff, Classic Shell is free.

  24. SlothfulCobra says:

    Wow, that’s a real neat way to discover a problem. It’s like lifting up a rock to see a million bugs skittering around underneath.

    Or in this case, one bug drawn a million times.

  25. Draklaw says:

    I know I’m late, but there are some tricks you can use if your fix happens to not be enough.

    First, you might be able to reduce the fill rate by enabling alpha test if you don’t do it already. Just set the alpha threshold to 0 so it will only discard fragments that are fully transparent and thus skip the blending. I guess it should work well on craptops. Alternatively (depending on which GL version you are using), you can discard these fragments in the shader, but I expect it to be less efficient on low-end GPUs.

    The second idea is a bit more complicated to implement. You can do a first depth-only pass where you render the walls. By “the walls”, I mean the polygons you use for the shadow computation. I guess the aliased edges it will create are hidden when drawing the real walls, so it should not be a problem. Then you render everything which should be behind the walls at a deeper depth with depth test enabled. The GPU should cull all the fragments that fall behind the walls before even hitting the fragment shader (that should be roughly half the fragments used for the glow).

    Well, the second idea may be overkill, but the first is definitely easy to implement. Two lines in the initialization code, I guess:
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.0f);
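
To get a feel for how much work the two ideas might save, here is a toy CPU model that counts which fragments would still reach the blender. It is a simplification, not how any particular GPU schedules its tests: the `Fragment` and `FragmentStats` types, the function name, and the choice to apply the depth check before the alpha check are all assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// One fragment produced by rasterizing a glow quad.
struct Fragment {
    std::size_t pixel;  // index into the depth buffer
    float depth;        // smaller is nearer the viewer
    float alpha;        // sampled texture alpha
};

struct FragmentStats {
    int blended;
    int rejected_by_depth;
    int rejected_by_alpha;
};

// depth_buffer is the result of a depth-only pass over the wall polygons;
// 1.0 means nothing was drawn at that pixel.
FragmentStats CountBlendedFragments(const std::vector<float>& depth_buffer,
                                    const std::vector<Fragment>& fragments) {
    FragmentStats stats{0, 0, 0};
    for (const Fragment& f : fragments) {
        if (f.pixel >= depth_buffer.size()) continue;  // off-screen
        // Depth test (pass only if nearer): hidden behind a wall.
        if (f.depth >= depth_buffer[f.pixel]) {
            ++stats.rejected_by_depth;
            continue;
        }
        // Alpha test with threshold 0: drop fully transparent fragments.
        if (!(f.alpha > 0.0f)) {
            ++stats.rejected_by_alpha;
            continue;
        }
        ++stats.blended;
    }
    return stats;
}
```

Only the `blended` count costs a read-modify-write on the framebuffer, which is the expensive part on fill-rate-limited hardware.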

  26. WJS says:

    Has it really been that long? I’m pretty sure I remember you running into throughput issues on one of your previous projects on here. (You fixed it with a vertex buffer, IIRC) Oh well, I guess I’ll just have to read them all again. :)

  27. defaultex says:

    Had a similar problem with clouds in the graphics engine I was working on a couple years ago. At any given time the game had at most 2/3 of the sky visible. This sky was procedural using atmospheric equations combined with some estimation to eliminate the really complex stuff. Eventually I decided to rotate the sky based on geometric location and all of a sudden I saw big dips in framerate. At first I thought it was the sky, after all it’s doing some pretty crazy stuff with scattering to look really nice. Filled a 4096 by 4096 image with just the sky and framerate was still plenty over my target 60FPS. Then I added clouds and saw the framerate dip. Turns out clouds were rendering at positive and negative angles, not really mirroring due to using 4D noise but same locations above and below horizon. Was one lone dot product result that didn’t get clamped out of a ridiculous amount of math.
