The most central problem to rendering any wide open space is, “How do we avoid drawing everything in the entire world?” You can see this problem at work in open world games like Grand Theft Auto. You’re navigating around a massive city. There are literally tens of thousands (maybe even hundreds of thousands) of objects around the city: Street lights, dumpsters, trash cans, newspaper stands, benches, trees, mailboxes, awnings, telephone poles, parking meters, street signs, traffic cones, trash bags, chain-link fences, and jersey barriers.
And crates. Can’t forget the crates.
I don’t care how much horsepower you have, how much memory you’ve got, or how many surfboard-sized graphics cards you glue together and jam inside your PC: Taking all of that clutter for the entire city and hurling it at the graphics hardware would be ruinous.
So the game needs some way of controlling what things get drawn and how detailed they are. The trash cans two miles away? They don’t even need to be in memory. The street lamp four blocks away? That gets drawn, but we’re going to draw a crude simplified version of it, probably a simple vertical beam with four sides. At this distance it doesn’t matter how crappy it looks, we just need a little black pole to stand in for the real thing. But this mailbox right beside the camera? That needs to be rendered in full detail.
The process of sorting this out is called Level Of Detail. It’s a complicated and interesting branch of knowledge. The trick is that the optimal LOD solution will vary a great deal based on your project. The system used by Grand Theft Auto IV is going to be very different from the one in Spore, which is again different from the one in World of Warcraft. Or Far Cry. Or Minecraft.
The thing is, in the vast majority of cases LOD is something that gets sorted out by the CPU. In my very first programming project here on the site, I spent the entire time sorting through polygons and topography, figuring out what parts were worth drawing and what parts could be simplified. But we live in a strange world now.
You know what we need? A Terrible Car Analogy:
We’ve got a pizza place at the end of a mile-long driveway. Our CPU is the cook inside. He’s handling the phone, taking orders, counting money, making pies. He’s got a fleet of, say, a dozen drivers who each have their own personal jet-powered Batmobile. This fleet is the graphics cardOnce of the hallmarks of a really good Terrible Car Analogy is a really complicated metaphor.. The cook can just barely cook pies fast enough to keep all the drivers busy making deliveries. But for some reason the cook gets it into his head that he’s going to “save some time” for his drivers. He stops cooking and walks a stack of pies out to the end of the driveway so the drivers won’t have to drive as far to pick up their pies.
Maybe that was more a pizza analogy than a car analogy, but you get the idea.
In any case, this is what we’re doing when we spend CPU cycles trying to lighten the load of the graphics card.
Mostly. Usually. It depends.
It depends on how much crap you’re trying to draw. If you push enough staticIn this context, “static” is just programmer talk for “doesn’t change”. polygons and render enough pixels, you will eventually reach the limit of what the GPU and candle. And if the CPU is spending a lot of time idleMaybe this is a graphics demo with no physics, sound, AI, caching, or networking going on. then you might as well have it do something to lighten the GPU load.
Basically, we’ve got a two-part asymmetrical job being performed by a two-part asymmetrical pair of workers. On the PC, this makes for some serious coding challenges. Maybe this game is being run in the future (from the programmer’s perspective) and the user has some incredible CPU. But maybe they’re using a crappy integrated GPU that’s far behind the times. Maybe the user is running a very old computer with a slow CPU, but they’ve put a super-deluxe graphics card into it in the hopes of speeding things up. This wild and unpredictable load balance is why PC gamers want lots of graphics options to tweak. Something like FSAA is basically free for the CPU, but devours a lot of GPU power. At the same time, turning up the detail draw distance or model complexity will likely hit the CPU way harder than the GPU.
For the purposes of this project, we’re all about shoving as much stuff onto the GPU as we can. So many of my projects – being both low-tech and procedural – have been CPU heavy and GPU trivial. The goal here is to push the GPU to get a better feel for this load-balance stuff.
So our first job is to abandon the idea of wasting CPU cycles to save GPU cycles. The GPU can deal with it.
But that’s not to say we want to be completely irresponsible with our GPU. I mean, we don’t want to throw the whole 2048×2048 world at it every frame. That’s 20482 triangle pairs. Your graphics card can probably handle 8,388,608 triangles okay, but we can cut that number way, way down with some basic tricks that won’t cost us any CPU.
So what we do is this:
We build a mesh that’s dense in polygons in the center, but steps down in resolution away from the center. It’s a completely flat grid. It doesn’t need to be the size of the whole dang world. a 1km draw distance is pretty cool, but if the world itself is only 2km2 it’s kind of strange. It means the instant something vanishes over the western horizon it appears to the east. So let’s make our mesh just 1km on a side.
We can pack this perfectly static, unchanging grid into a vertex buffer and forget all about it. We never need to touch it once the program is running. I know the rainbow coloring makes it kind of hard to see what you’re looking atI do this when designing the mesh. The rainbow makes it easy to see if triangles are improperly arranged.. Here is a drawing to give the general idea of how it’s built:
|Hand-drawn. I forgot to cut a few rectangles into triangles, and technically the rectangles should all be cut going the same direction. But this is close enough. You get the idea.|
We make a vertex shader that will take this mesh as input. For each vertex:
- Look at where the camera is. Round its horizontal position off to the nearest grid pointEvery 16th grid point, actually. Otherwise the terrain feels “jittery” by changing very slightly as you move around..
- Add this rounded value to the position of the vertex. This will effectively shift the grid to always be directly under the player, with the highest detail right under them.
- Use this new value to figure out what part of the world this vertex is in. Then look in the heightmap, and lift the vertex up.
So instead of messing around with 8 million triangles, we’ve got just…
78,144. We can live with that. This is all done by the GPU. As far as the CPU is concerned we’re just rendering the exact same flat plane over and over, but once it goes through the shader it becomes this complex, seamless, infinitely tiling topography. And since we’ve got the camera position handy, it’s easy to make the land fall away in the distance, thus creating a faux-spherical world.
Are we rendering more triangles than we need to? Sure. A proper optimization could save us even more. But that would mean the mesh would need to be updated when the camera moved around, which would be the equivalent of our pizza cook hiking a mile to cut down on how far the Batmobiles have to drive. And yes the analogy is strange.
We need one more thing here. We need to color the landscape. Stuff touching the water should look like beach. Steep surfaces should be bare dirt instead of grass. Very steep should be rock instead of dirt. Since the landscape changes over time, we can’t just work this out at load time. We need to keep updating it. But the data doesn’t change quickly. This isn’t something that needs to be updated every single frame.
So what we do is create a background thread. It’s really low-key. It begins by downloading a copy of the heightmap. (Remember that the heightmap lives on the GPU, where it’s formed by the erosion shaders.) Then it passes over the landscape a little bit at a time, looking at height values and doing a lot of if/then/else if/else if/then/else fallback type logic to figure out if a given cell should be grass, dirt, rock, snow, beach. It comes up with a color value for each point, with slight variations so the world doesn’t get too monotonous.
When it’s done, it uploads the color values into a texture that we’ll be using on the terrain from now on. I’m calling this the color map, and it’s the same size as the heightmap. The two go together, with one defining the elevations and the other defining the color. Once the color map is uploaded, the thread downloads a fresh updated version of the heightmap and starts over. The whole cycle takes a few seconds.
Yes, technically we could probably offload this job to the GPU. But the GPU is really inefficient at branching logic, so a lot of power would go to waste. More importantly, it would be a pain in the ass to set this up. We need to keep a lot of variables around for comparing cells. Is one of my neighbors water? Is one of them rock? Am I rock? Do I have any neighbors that are dirt that aren’t touching sand? Gah. Doing that kind of logic with a shader would be horrendous. We’d need to make another shader to hold all of these values to keep track of all these little attributes of individual cells. We’d need yet another entire shader pass just to fill in the variables so we could then do another pass for filling in the color map.
And yes, maybe we would have to suck it up and do that if this was a time-critical job. But it’s not. The color map can go for several seconds without being updated without it causing any problems. There’s no reason to kill ourselves building complex interfaces to optimize low-priority tasks.
Moreover, the point of this project is to get some experience building a variety of typical shader systems from scratch. Using a shader to do something that outlandish doesn’t really advance that goal.
Anyway. It works. I think we’re nearing the end of my list of planned features and experiments. I want to put some foliage on the terrain and muck about with shaders a bit more. Then we need to see how performance is. I’m still thinking about Oculus Rift support. I’m not planning on adding VR to this little project, but for educational purposes I want to see how difficult it is to draw the entire scene 120 times a second.
 Once of the hallmarks of a really good Terrible Car Analogy is a really complicated metaphor.
 In this context, “static” is just programmer talk for “doesn’t change”.
 Maybe this is a graphics demo with no physics, sound, AI, caching, or networking going on.
 I do this when designing the mesh. The rainbow makes it easy to see if triangles are improperly arranged.
 Every 16th grid point, actually. Otherwise the terrain feels “jittery” by changing very slightly as you move around.
The Best of 2012
My picks for what was important, awesome, or worth talking about in 2012.
A Lack of Vision and Leadership
People fault EA for being greedy, but their real sin is just how terrible they are at it.
The No Politics Rule
Here are 6 reasons why I forbid political discussions on this site. #4 will amaze you. Or not.
Game at the Bottom
Why spend millions on visuals that are just a distraction from the REAL game of hotbar-watching?
Skyrim Thieves Guild
The Thieves Guild quest in Skyrim is a vortex of disjointed plot-holes, contrivances, and nonsense.